Verify that every citation in your manuscript is real.
Catch hallucinated references before submission.
You wrote a research paper with AI assistance. The references look real. But are they?
LLMs hallucinate citations. They generate author names, journal titles, and DOIs that look plausible but don't exist. A single fake reference in a published paper can lead to retraction, damaged reputation, and wasted time for everyone who cites your work.
CiteCheck verifies every citation in your manuscript against real academic databases. In seconds.
```shell
pip install citecheck
citecheck my_paper.docx
```

```
CiteCheck v0.1.0
Checking: my_paper.docx
Found 47 references. Verifying...

[1] McMurray JJV, Solomon SD, et al. Dapagliflozin in... ✅ crossref
[2] Gandhi L, Rodriguez-Abreu D, et al. Pembrolizumab... ✅ pubmed
...
[31] Fakeman AB, Notreal CD. The impact of quantum... ❌ not found
[32] Smith J. Nonexistent study on fake outcomes... ❌ not found
...

CiteCheck Report
==================================================
Total citations: 47
Verified: 45
Not found: 2

Potentially hallucinated citations:
[31] Fakeman AB, Notreal CD. The impact of quantum healing on chronic...
[32] Smith J. Nonexistent study on fake outcomes in imaginary patients...
```
```python
from citecheck import check

# Check a file
report = check("my_paper.docx")

# Or raw text
report = check("1. Smith J. Some paper. Nature. 2023;600:1-10.")

# Results
print(report.summary())
print(f"Verified: {report.verified}/{report.total}")

if report.has_hallucinations:
    for r in report.hallucinated:
        print(f"  [{r.number}] {r.raw[:80]}")

# Export
report.to_json()      # JSON report
report.to_markdown()  # Markdown table
```

Level 1 checks if a paper exists. Level 2 checks if it says what you claim.
```python
from citecheck import deep_check

result = deep_check(
    claim="Drug X reduces mortality by 40%",
    source_title="Effect of Drug X on cardiovascular outcomes",
    source_abstract="...our trial showed a 15% reduction in mortality...",
)
print(result.verdict)      # "partially_supported"
print(result.explanation)  # "The paper reports 15% reduction, not 40%"
```

Level 2 requires an LLM API key:

```shell
pip install 'citecheck[deep]'
```
| Feature | Description |
|---|---|
| Multi-source verification | CrossRef, PubMed, Semantic Scholar, OpenAlex (240M+ papers) |
| No API key needed | Level 1 uses only free, public APIs |
| Multiple formats | Reads .docx, .pdf, .txt, .md |
| Smart matching | Fuzzy title matching catches minor differences |
| DOI detection | Automatically extracts and verifies DOIs |
| CLI + Python API | Use from terminal or import in your code |
| JSON/Markdown export | Machine-readable reports for automation |
| GitHub Action | Verify citations in CI/CD |
| Level 2 deep check | Verify claim-source alignment (optional, needs LLM) |
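The "smart matching" row above refers to fuzzy title comparison. As a rough illustration of the idea (not CiteCheck's actual implementation — the normalization and default threshold here are assumptions), a similarity check like this can be built on Python's standard `difflib`:

```python
from difflib import SequenceMatcher

def titles_match(candidate: str, hit: str, min_similarity: float = 0.70) -> bool:
    """Compare two titles after basic normalization (case, whitespace)."""
    a = " ".join(candidate.lower().split())
    b = " ".join(hit.lower().split())
    return SequenceMatcher(None, a, b).ratio() >= min_similarity

# Minor casing/punctuation differences still match...
print(titles_match("Effect of Drug X on Cardiovascular Outcomes",
                   "effect of drug x on cardiovascular outcomes."))  # True

# ...while an unrelated title does not.
print(titles_match("Effect of Drug X on Cardiovascular Outcomes",
                   "Quantum healing in chronic disease"))            # False
```

Raising the threshold (compare the `--min-similarity` CLI flag) trades recall for stricter matching.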
Add citation checking to your CI pipeline:
```yaml
# .github/workflows/citecheck.yml
name: CiteCheck
on: [pull_request]
jobs:
  check-citations:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: tuyentran-md/citecheck@main
        with:
          file: paper/manuscript.docx
          fail-on-hallucination: true
```

Every PR that touches your manuscript will be automatically checked. Hallucinated citations block the merge.
We prompted major LLMs to write literature reviews and verified every citation they generated.
| Rank | Model | Citations | Accuracy | Hallucination Rate |
|---|---|---|---|---|
| 🥇 | coming soon | — | — | — |
| 🥈 | coming soon | — | — | — |
| 🥉 | coming soon | — | — | — |
Contribute benchmark data! See benchmark/ for instructions.
```shell
# Run the benchmark yourself
pip install 'citecheck[all]'
python -m benchmark.collect --model gpt-4o --n 10
python -m benchmark.evaluate
```

```
Your manuscript
      │
      ▼
Extract references (numbered list at end of document)
      │
      ▼
For each reference:
  ├─ Extract DOI (if present) → direct lookup
  └─ Bibliographic query → fuzzy title match
      │
      ▼
Check against (in order):
  1. CrossRef (150M+ works)
  2. PubMed (36M+ biomedical)
  3. Semantic Scholar (200M+ papers)
  4. OpenAlex (240M+ works)
      │
      ▼
Report: ✅ verified or ❌ not found
```
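The per-reference flow above can be sketched roughly as follows. This is an illustrative stand-in, not CiteCheck's internals: the `verify` function, the stub lookups, and the example DOI are all hypothetical, and the DOI pattern is the regex commonly recommended by Crossref for modern DOIs.

```python
import re
from typing import Callable, Optional

# Crossref's recommended pattern for modern DOIs.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+\b")

def verify(reference: str, sources: dict[str, Callable[[str], bool]]) -> Optional[str]:
    """Try each source in order; return the name of the first source that
    confirms the reference, or None if no source finds it."""
    doi = DOI_RE.search(reference)
    # A DOI, when present, gives a direct lookup; otherwise fall back to a
    # bibliographic query on the raw reference text.
    query = doi.group(0) if doi else reference
    for name, lookup in sources.items():  # dicts preserve insertion order
        if lookup(query):
            return name
    return None

# Stub lookups standing in for real CrossRef/PubMed/... API calls.
sources = {
    "crossref": lambda q: q == "10.1056/NEJMoa1911303",
    "pubmed":   lambda q: False,
}

print(verify("McMurray JJV, et al. ... doi:10.1056/NEJMoa1911303", sources))  # crossref
print(verify("Fakeman AB, Notreal CD. The impact of quantum...", sources))    # None
```

Falling through the whole source list is what produces a ❌ "not found" verdict in the report.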
```shell
# Core (text files only, no external dependencies beyond requests)
pip install citecheck

# With DOCX support
pip install 'citecheck[docx]'

# With PDF support
pip install 'citecheck[pdf]'

# With Level 2 deep verification
pip install 'citecheck[deep]'

# Everything
pip install 'citecheck[all]'
```

```shell
citecheck my_paper.docx                          # Basic check
citecheck paper.pdf --email you@uni.edu          # Faster (polite API pool)
citecheck paper.txt --format json -o report.json # JSON output
citecheck paper.md --format markdown             # Markdown table
citecheck paper.docx --min-similarity 0.70       # Stricter matching
citecheck paper.docx --sources crossref,pubmed   # Specific databases only
```
- Researchers using AI to draft papers — verify before submission
- Journal editors — screen submissions for fake references
- Systematic review teams — batch-verify hundreds of references
- AI tool developers — add citation verification to your pipeline
Contributions welcome! Especially:
- Benchmark data from additional LLMs
- Parser improvements for different reference formats
- Bug reports with example manuscripts
```shell
git clone https://github.com/tuyentran-md/citecheck.git
cd citecheck
pip install -e ".[dev,all]"
pytest tests/
```

If you use CiteCheck in your research, please cite:
```bibtex
@software{citecheck2026,
  title  = {CiteCheck: Verify citations in AI-assisted manuscripts},
  author = {Tran, Tuyen},
  year   = {2026},
  url    = {https://github.com/tuyentran-md/citecheck}
}
```

MIT