Benchmark dataset for AI code review tools. It contains pull requests from ten popular open-source repositories that were reviewed by CodeRabbit or Greptile, with full PR metadata and the bots' inline review comments as ground truth.
```
data/                              # Full PR metadata (diff, files, commits, SHAs)
  coderabbit/
    oven-sh-bun.json
    appwrite-appwrite.json
    pingcap-tidb.json
    elastic-kibana.json
    unkeyed-unkey.json
  greptile/
    hoppscotch-hoppscotch.json
    twentyhq-twenty.json
    BerriAI-litellm.json
    PostHog-posthog.json
    activepieces-activepieces.json
ground_truth/                      # Inline review comments left by the bots
  coderabbit/
    oven-sh-bun.json
    appwrite-appwrite.json
    pingcap-tidb.json
    elastic-kibana.json
    unkeyed-unkey.json
  greptile/
    hoppscotch-hoppscotch.json
    twentyhq-twenty.json
    BerriAI-litellm.json
    PostHog-posthog.json
    activepieces-activepieces.json
```
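As a rough sketch of how the two trees line up, the Python below pairs each `data/<tool>/<name>.json` with the `ground_truth/<tool>/<name>.json` of the same name. It assumes matching files share a filename, as in the listing; `iter_repo_files` is a hypothetical helper, not something shipped with the dataset.

```python
import json
from pathlib import Path
from typing import Iterator, Tuple, Union


def iter_repo_files(root: Union[str, Path]) -> Iterator[Tuple[str, str, dict, dict]]:
    """Yield (tool, file_stem, metadata, ground_truth) for each repository file.

    Hypothetical helper. Assumes data/<tool>/<name>.json is paired with
    ground_truth/<tool>/<name>.json under the same dataset root.
    """
    root = Path(root)
    for data_file in sorted((root / "data").glob("*/*.json")):
        tool = data_file.parent.name  # e.g. "coderabbit" or "greptile"
        gt_file = root / "ground_truth" / tool / data_file.name
        if not gt_file.exists():
            continue  # no ground truth for this repository; skip it
        with data_file.open(encoding="utf-8") as f:
            metadata = json.load(f)
        with gt_file.open(encoding="utf-8") as f:
            ground_truth = json.load(f)
        yield tool, data_file.stem, metadata, ground_truth
```

For example, `for tool, stem, meta, gt in iter_repo_files("."): print(tool, stem, len(meta["prs"]))` would list every repository with its PR count when run from the dataset root.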

Repositories reviewed by CodeRabbit:

| Repo | Stars | PRs |
|---|---|---|
| oven-sh/bun | ~88.7k | 10 |
| appwrite/appwrite | ~55.5k | 10 |
| pingcap/tidb | ~39.9k | 10 |
| elastic/kibana | ~21k | 10 |
| unkeyed/unkey | ~5.2k | 10 |

Repositories reviewed by Greptile:

| Repo | Stars | PRs |
|---|---|---|
| hoppscotch/hoppscotch | ~78.8k | 10 |
| twentyhq/twenty | ~43.5k | 10 |
| BerriAI/litellm | ~41.9k | 10 |
| PostHog/posthog | ~32.3k | 10 |
| activepieces/activepieces | ~21.5k | 10 |

Each file in `data/` contains the repository, the reviewing tool, and an array of PRs with full metadata, which is everything needed to run the benchmark offline:

```json
{
  "repo": "owner/repo",
  "tool": "coderabbit|greptile",
  "prs": [
    {
      "number": 123,
      "title": "...",
      "body": "PR description...",
      "state": "open|closed",
      "merged": true,
      "author": "...",
      "created_at": "...",
      "updated_at": "...",
      "merged_at": "...",
      "base_branch": "main",
      "head_branch": "feature-x",
      "head_sha": "abc123...",
      "base_sha": "def456...",
      "labels": [],
      "additions": 100,
      "deletions": 50,
      "changed_files": 5,
      "diff": "full unified diff",
      "files": [
        {"filename": "...", "status": "modified", "additions": 10, "deletions": 5, "patch": "..."}
      ],
      "commits": [
        {"sha": "...", "message": "...", "author": "...", "date": "..."}
      ]
    }
  ]
}
```

Each file in `ground_truth/` contains an array of PRs, each with its inline review comments and the `commit_id` they were made on:
```json
{
  "repo": "owner/repo",
  "tool": "coderabbit|greptile",
  "bot_account": "...[bot]",
  "prs": [
    {
      "pr_number": 123,
      "review_comments": [
        {
          "id": 456,
          "body": "review comment text...",
          "path": "src/foo.ts",
          "line": 42,
          "start_line": null,
          "original_line": 42,
          "side": "RIGHT",
          "subject_type": "line",
          "commit_id": "abc123...",
          "original_commit_id": "abc123...",
          "diff_hunk": "@@ -10,5 +10,8 @@...",
          "created_at": "...",
          "updated_at": "..."
        }
      ]
    }
  ]
}
```

Dataset totals:

| Tool | PRs | Review Comments |
|---|---|---|
| CodeRabbit | 50 | 295 |
| Greptile | 50 | 96 |
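The totals can be recomputed from the ground-truth files with a short script along these lines. This sketch assumes every reviewed PR has an entry in a file's `prs` array even when it received no comments; if such PRs are left out, count PRs from the matching `data/` file instead. `comment_totals` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Tuple, Union


def comment_totals(ground_truth_root: Union[str, Path]) -> Tuple[Counter, Counter]:
    """Count PRs and inline review comments per tool (hypothetical helper).

    Reads every <tool>/*.json file under ground_truth_root and groups
    the counts by each file's "tool" field.
    """
    prs: Counter = Counter()
    comments: Counter = Counter()
    for gt_file in sorted(Path(ground_truth_root).glob("*/*.json")):
        with gt_file.open(encoding="utf-8") as f:
            doc = json.load(f)
        tool = doc["tool"]
        prs[tool] += len(doc["prs"])  # assumes every reviewed PR has an entry here
        comments[tool] += sum(len(pr["review_comments"]) for pr in doc["prs"])
    return prs, comments
```

Calling `prs, comments = comment_totals("ground_truth")` and printing `f"| {tool} | {prs[tool]} | {comments[tool]} |"` for each tool should give rows in the same shape as the table.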

Data collected on 2026-04-02. Each PR records its `head_sha` and `base_sha` for reproducibility, and each review comment records the `commit_id` of the exact commit it was made on.
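Because both files carry commit SHAs, each comment can be checked against its PR's `head_sha`. The sketch below (hypothetical helper `classify_comments`) joins ground-truth `pr_number` to metadata `number` and sorts comment IDs by whether the chosen SHA field equals the PR's `head_sha`. It defaults to `commit_id`; GitHub's REST API also describes `original_commit_id` as the original commit a comment applies to, so pass `sha_field="original_commit_id"` if that is the reference you need.

```python
from typing import Dict, List


def classify_comments(metadata: dict, ground_truth: dict,
                      sha_field: str = "commit_id") -> Dict[str, List[int]]:
    """Sort review-comment IDs by whether they point at the PR's head_sha.

    Hypothetical helper. Joins ground_truth["prs"][*]["pr_number"] to
    metadata["prs"][*]["number"] for the same repository.
    """
    head_by_pr = {pr["number"]: pr["head_sha"] for pr in metadata["prs"]}
    result: Dict[str, List[int]] = {"on_head": [], "on_other_commit": [], "unmatched_pr": []}
    for pr in ground_truth["prs"]:
        head_sha = head_by_pr.get(pr["pr_number"])
        for comment in pr["review_comments"]:
            if head_sha is None:
                result["unmatched_pr"].append(comment["id"])  # PR missing from data/
            elif comment[sha_field] == head_sha:
                result["on_head"].append(comment["id"])
            else:
                result["on_other_commit"].append(comment["id"])
    return result
```

Comments in `on_other_commit` were anchored to a commit other than the final head, so replaying them against `head_sha` alone may misplace them.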