CiteCheck

Verify that every citation in your manuscript is real.
Catch hallucinated references before submission.



The Problem

You wrote a research paper with AI assistance. The references look real. But are they?

LLMs hallucinate citations. They generate author names, journal titles, and DOIs that look plausible but don't exist. A single fake reference in a published paper can lead to retraction, damaged reputation, and wasted time for everyone who cites your work.

CiteCheck verifies every citation in your manuscript against real academic databases. In seconds.

Quick Start

```shell
pip install citecheck
citecheck my_paper.docx
```

```
CiteCheck v0.1.0
Checking: my_paper.docx

Found 47 references. Verifying...
  [1] McMurray JJV, Solomon SD, et al. Dapagliflozin in...    ✅ crossref
  [2] Gandhi L, Rodriguez-Abreu D, et al. Pembrolizumab...    ✅ pubmed
  ...
  [31] Fakeman AB, Notreal CD. The impact of quantum...       ❌ not found
  [32] Smith J. Nonexistent study on fake outcomes...         ❌ not found
  ...

CiteCheck Report
==================================================
Total citations:  47
Verified:         45
Not found:        2

Potentially hallucinated citations:
  [31] Fakeman AB, Notreal CD. The impact of quantum healing on chronic...
  [32] Smith J. Nonexistent study on fake outcomes in imaginary patients...
```

Python API

```python
from citecheck import check

# Check a file
report = check("my_paper.docx")

# Or raw text
report = check("1. Smith J. Some paper. Nature. 2023;600:1-10.")

# Results
print(report.summary())
print(f"Verified: {report.verified}/{report.total}")

if report.has_hallucinations:
    for r in report.hallucinated:
        print(f"  [{r.number}] {r.raw[:80]}")

# Export
report.to_json()       # JSON report
report.to_markdown()   # Markdown table
```

Level 2: Does the citation actually support your claim?

Level 1 checks if a paper exists. Level 2 checks if it says what you claim.

```python
from citecheck import deep_check

result = deep_check(
    claim="Drug X reduces mortality by 40%",
    source_title="Effect of Drug X on cardiovascular outcomes",
    source_abstract="...our trial showed a 15% reduction in mortality...",
)

print(result.verdict)      # "partially_supported"
print(result.explanation)  # "The paper reports 15% reduction, not 40%"
```

Level 2 requires an LLM API key: pip install 'citecheck[deep]'

Features

| Feature | Description |
|---|---|
| Multi-source verification | CrossRef, PubMed, Semantic Scholar, OpenAlex (240M+ papers) |
| No API key needed | Level 1 uses only free, public APIs |
| Multiple formats | Reads .docx, .pdf, .txt, .md |
| Smart matching | Fuzzy title matching catches minor differences |
| DOI detection | Automatically extracts and verifies DOIs |
| CLI + Python API | Use from terminal or import in your code |
| JSON/Markdown export | Machine-readable reports for automation |
| GitHub Action | Verify citations in CI/CD |
| Level 2 deep check | Verify claim-source alignment (optional, needs LLM) |
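The "smart matching" row refers to fuzzy title comparison. A minimal sketch of the idea using Python's standard-library `difflib` (CiteCheck's actual matcher and normalization may differ):

```python
from difflib import SequenceMatcher

def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two titles, ignoring case and extra whitespace."""
    norm = lambda s: " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

# Minor case/punctuation differences still score far above a 0.70 threshold
score = title_similarity(
    "Dapagliflozin in Patients with Heart Failure",
    "Dapagliflozin in patients with heart failure.",
)
```

A threshold like the CLI's `--min-similarity 0.70` would then decide whether a database hit counts as a verified match.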

GitHub Action

Add citation checking to your CI pipeline:

```yaml
# .github/workflows/citecheck.yml
name: CiteCheck
on: [pull_request]

jobs:
  check-citations:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: tuyentran-md/citecheck@main
        with:
          file: paper/manuscript.docx
          fail-on-hallucination: true
```

Every PR that touches your manuscript will be automatically checked. Hallucinated citations block the merge.

Benchmark: Which LLM hallucinates citations the most?

We prompted major LLMs to write literature reviews and verified every citation they generated.

| Rank | Model | Citations | Accuracy | Hallucination Rate |
|---|---|---|---|---|
| 🥇 | coming soon | | | |
| 🥈 | coming soon | | | |
| 🥉 | coming soon | | | |

Contribute benchmark data! See benchmark/ for instructions.

```shell
# Run the benchmark yourself
pip install 'citecheck[all]'
python -m benchmark.collect --model gpt-4o --n 10
python -m benchmark.evaluate
```

How it works

```
Your manuscript
     │
     ▼
 Extract references (numbered list at end of document)
     │
     ▼
 For each reference:
     ├─ Extract DOI (if present) → direct lookup
     └─ Bibliographic query → fuzzy title match
     │
     ▼
 Check against (in order):
     1. CrossRef (150M+ works)
     2. PubMed (36M+ biomedical)
     3. Semantic Scholar (200M+ papers)
     4. OpenAlex (240M+ works)
     │
     ▼
 Report: ✅ verified  or  ❌ not found
```

Installation options

```shell
# Core (text files only, no external dependencies beyond requests)
pip install citecheck

# With DOCX support
pip install 'citecheck[docx]'

# With PDF support
pip install 'citecheck[pdf]'

# With Level 2 deep verification
pip install 'citecheck[deep]'

# Everything
pip install 'citecheck[all]'
```

CLI options

```shell
citecheck my_paper.docx                           # Basic check
citecheck paper.pdf --email you@uni.edu           # Faster (polite API pool)
citecheck paper.txt --format json -o report.json  # JSON output
citecheck paper.md --format markdown              # Markdown table
citecheck paper.docx --min-similarity 0.70        # Stricter matching
citecheck paper.docx --sources crossref,pubmed    # Specific databases only
```
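The JSON output can gate automation outside of GitHub Actions. A hypothetical sketch, assuming the report's JSON fields mirror the Python API (`total`, `verified`); inspect a generated `report.json` for the real schema:

```python
import json

def ci_gate(report_json: str) -> int:
    """Return a CI exit code: 1 if any citation failed verification, else 0.

    Assumes top-level `total` and `verified` counts (an assumption, mirroring
    the Python API attributes), not a documented schema.
    """
    report = json.loads(report_json)
    return 1 if report["verified"] < report["total"] else 0

# e.g. after: citecheck paper.txt --format json -o report.json
# exit_code = ci_gate(open("report.json").read())
```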

Who is this for?

  • Researchers using AI to draft papers — verify before submission
  • Journal editors — screen submissions for fake references
  • Systematic review teams — batch-verify hundreds of references
  • AI tool developers — add citation verification to your pipeline

Contributing

Contributions welcome! Especially:

  • Benchmark data from additional LLMs
  • Parser improvements for different reference formats
  • Bug reports with example manuscripts

```shell
git clone https://github.com/tuyentran-md/citecheck.git
cd citecheck
pip install -e ".[dev,all]"
pytest tests/
```

Citation

If you use CiteCheck in your research, please cite:

```bibtex
@software{citecheck2026,
  title = {CiteCheck: Verify citations in AI-assisted manuscripts},
  author = {Tran, Tuyen},
  year = {2026},
  url = {https://github.com/tuyentran-md/citecheck}
}
```

License

MIT
