Methodology

How scores are computed today, what's being measured, and where the current approach stops short.

Status: static parameters

Today every score is derived from static signals — file existence and content-length checks on the cloned tree. No agent is actually run. Per-model weights are illustrative, not yet derived from measured agent success. This is enough to produce meaningfully different rankings and to show how the UX of per-model scoring feels, but it should not be read as a benchmark.

The plan to replace illustrative weights with measured ones is the v0.3.0 milestone on the roadmap (tasks/0.3.0/01-benchmark-harness.md). Until then, treat the numbers as a directional signal, not a verdict.
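
To make "static signals" concrete: each check boils down to a file-existence and content-length test on the cloned tree, returning a pass value in [0, 1]. The sketch below is illustrative only; the function name, the README filename list, and the 200/800-character thresholds are assumptions, not the actual implementation.

```python
from pathlib import Path


def readme_signal(repo_root: str) -> float:
    """Illustrative static check for the `readme` signal.

    Returns a pass value in [0, 1]: 0.0 if no README exists,
    partial credit for a thin one, 1.0 for a substantive one.
    The thresholds here are assumptions for the sketch.
    """
    for name in ("README.md", "README.rst", "README"):
        path = Path(repo_root) / name
        if path.is_file():
            length = len(path.read_text(errors="replace"))
            if length >= 800:   # substantive README: full credit
                return 1.0
            if length >= 200:   # thin README: partial credit
                return 0.3
            return 0.1          # stub README: token credit
    return 0.0                  # no README at all
```

Because no agent is run, a check like this can only attest that an artifact exists and is non-trivial, not that it is accurate or useful; that limitation is what the v0.3.0 harness is meant to close.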

Score formula

per-model score = Σ(signal.pass × model.weight[signal]) / Σ(model.weight) × 100
overall         = mean(per-model scores)
improvement     = (1 - signal.pass) × model.weight[signal] / Σ(model.weight) × 100   (points unlocked by closing a signal's gap)

signal.pass is a float in [0, 1] — partial credit is allowed (e.g. a thin README gets 0.3, a long one gets 1.0).
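
The three formulas translate directly into code. This is a minimal sketch, assuming signal passes and model weights arrive as plain dicts keyed by signal id (the data shapes are assumptions, not the tool's actual internals):

```python
def per_model_score(passes: dict, weights: dict) -> float:
    """per-model score = Σ(signal.pass × model.weight[signal]) / Σ(model.weight) × 100"""
    weighted = sum(passes[signal] * w for signal, w in weights.items())
    return weighted / sum(weights.values()) * 100


def overall(model_scores: list) -> float:
    """overall = mean(per-model scores)"""
    return sum(model_scores) / len(model_scores)


def improvement(signal: str, passes: dict, weights: dict) -> float:
    """Points unlocked by closing a signal's remaining gap."""
    return (1 - passes[signal]) * weights[signal] / sum(weights.values()) * 100
```

For example, with two equally weighted signals where one fully passes and one fully fails, the model scores 50 and closing the failing signal's gap unlocks the remaining 50 points.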

Signals (12)

  • AGENTS.md / CLAUDE.md
    agents_md
    Presence of an agent-oriented instructions file, with substantive content.
    Improve: Add an AGENTS.md covering project goals, layout, setup commands, and conventions. Aim for 800+ chars of real guidance (not boilerplate).
  • README
    readme
    Non-trivial README so the agent can learn the project quickly.
    Improve: Expand your README to cover what the project does, how to install, the common commands, and the high-level layout.
  • Test suite
    tests
    Detectable tests — agents rely on feedback loops.
    Improve: Add a tests/ (or test/, __tests__/, spec/) directory with runnable tests. Document how to run them in AGENTS.md.
  • CI configuration
    ci
    Defined pipeline the agent can reason about / emulate locally.
    Improve: Add a CI workflow (e.g. .github/workflows/ci.yml or .gitlab-ci.yml) that runs tests + linter on every PR.
  • Linter / formatter config
    linter
    Agents get immediate feedback on style rather than ambiguous drift.
    Improve: Configure a linter/formatter (ESLint+Prettier, Biome, Ruff, rustfmt+clippy, golangci-lint) and commit the config.
  • Dependency manifest
    deps_manifest
    Machine-readable dependency list so the agent can reproduce the env.
    Improve: Commit a proper manifest (package.json, pyproject.toml, Cargo.toml, go.mod, etc.) plus a lockfile.
  • Reproducible dev env
    dev_env
    One-command setup the agent can run (Makefile / devcontainer / Nix / Docker).
    Improve: Add a Makefile or devcontainer or Dockerfile so the agent can set up the project in one command.
  • Type configuration
    type_config
    Static types help agents reason about call sites without running code.
    Improve: Add a type config (tsconfig.json for JS/TS, mypy.ini or pyrightconfig.json for Python). Rust/Go are typed by default.
  • License file
    license
    Clarity on what an agent is allowed to do with the code.
    Improve: Add a LICENSE (or COPYING) file — MIT, Apache-2.0, BSD, GPL, etc. — at the repo root.
  • CONTRIBUTING guide
    contributing
    Explicit contribution workflow an agent can follow.
    Improve: Add CONTRIBUTING.md describing branch naming, commit style, test commands, and the PR process.
  • Pre-commit / git hooks
    pre_commit
    Catches problems locally before the agent wastes a CI cycle.
    Improve: Set up pre-commit (.pre-commit-config.yaml), husky, or lefthook to run format+lint on every commit.
  • Manageable size
    size
    Very large repos strain an agent's context window.
    Improve: If possible, split into smaller modules or carve out a focused entry path. Document where to start in AGENTS.md.

Models & weight profiles (4)

  • Claude Code
    Weights AGENTS.md and tests heavily — Claude Code leans on an instructions file and a fast feedback loop.
    Weights
    agents_md        1.00
    readme           0.70
    tests            1.00
    ci               0.50
    linter           0.60
    deps_manifest    0.70
    dev_env          0.90
    type_config      0.60
    license          0.30
    contributing     0.40
    pre_commit       0.40
    size             0.50
  • Cursor
    Weights type config and a detailed README highly — Cursor's inline edits benefit from static types and skim-readable docs.
    Weights
    agents_md        0.60
    readme           1.00
    tests            0.70
    ci               0.40
    linter           0.80
    deps_manifest    0.80
    dev_env          0.50
    type_config      1.00
    license          0.30
    contributing     0.30
    pre_commit       0.30
    size             0.40
  • Devin
    Weights CI and reproducible envs highly — Devin runs in a sandboxed VM and needs end-to-end automation.
    Weights
    agents_md        0.60
    readme           0.70
    tests            0.90
    ci               1.00
    linter           0.50
    deps_manifest    0.90
    dev_env          1.00
    type_config      0.50
    license          0.30
    contributing     0.50
    pre_commit       0.50
    size             0.60
  • GPT-5 Codex
    Balanced profile as a reference point.
    Weights
    agents_md        0.70
    readme           0.80
    tests            0.80
    ci               0.70
    linter           0.60
    deps_manifest    0.70
    dev_env          0.70
    type_config      0.70
    license          0.30
    contributing     0.40
    pre_commit       0.40
    size             0.50
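
To see how the profiles produce different rankings for the same repo, the sketch below scores one hypothetical repo under the Claude Code and Devin weights from the tables above. The weights are copied from this page; the pass values are invented for illustration (a repo with a strong AGENTS.md and tests but no CI, dev env, CONTRIBUTING, or hooks).

```python
def per_model_score(passes: dict, weights: dict) -> float:
    """Same formula as the Score formula section."""
    weighted = sum(passes[s] * w for s, w in weights.items())
    return weighted / sum(weights.values()) * 100


# Weights copied from the profile tables above.
CLAUDE_CODE = {"agents_md": 1.0, "readme": 0.7, "tests": 1.0, "ci": 0.5,
               "linter": 0.6, "deps_manifest": 0.7, "dev_env": 0.9,
               "type_config": 0.6, "license": 0.3, "contributing": 0.4,
               "pre_commit": 0.4, "size": 0.5}
DEVIN = {"agents_md": 0.6, "readme": 0.7, "tests": 0.9, "ci": 1.0,
         "linter": 0.5, "deps_manifest": 0.9, "dev_env": 1.0,
         "type_config": 0.5, "license": 0.3, "contributing": 0.5,
         "pre_commit": 0.5, "size": 0.6}

# Hypothetical repo: good docs and tests, no automation.
passes = {"agents_md": 1.0, "readme": 1.0, "tests": 1.0, "ci": 0.0,
          "linter": 1.0, "deps_manifest": 1.0, "dev_env": 0.0,
          "type_config": 1.0, "license": 1.0, "contributing": 0.0,
          "pre_commit": 0.0, "size": 1.0}

print(round(per_model_score(passes, CLAUDE_CODE), 1))  # → 71.1
print(round(per_model_score(passes, DEVIN), 1))        # → 62.5
```

The same repo scores roughly nine points lower under Devin's profile because the missing CI and dev-env signals carry full weight there, which is exactly the per-model differentiation the scoring is designed to surface.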

What isn't measured yet

  • Whether tests actually pass (we only detect their presence).
  • Whether the linter actually runs cleanly.
  • Whether the dev-env artifact (Makefile, Dockerfile) works end-to-end.
  • Commit-history signals — churn, commit frequency, contributor count. We use --depth 1 --single-branch, which fetches the whole working tree at HEAD of the default branch but no history. Closing this gap is planned as v0.7.0 on the roadmap.
  • How agents actually perform on the repo — that's the v0.3.0 benchmark harness.