CiteCheck

Verify that every citation in your manuscript is real.
Catch hallucinated references before submission.



The Problem

You wrote a research paper with AI assistance. The references look real. But are they?

LLMs hallucinate citations. They generate author names, journal titles, and DOIs that look plausible but don't exist. A single fake reference in a published paper can lead to retraction, damaged reputation, and wasted time for everyone who cites your work.

CiteCheck verifies every citation in your manuscript against real academic databases. In seconds.

Quick Start

```shell
pip install citecheck
citecheck my_paper.docx
```

```
CiteCheck v0.1.0
Checking: my_paper.docx

Found 47 references. Verifying...
  [1] McMurray JJV, Solomon SD, et al. Dapagliflozin in...    ✅ crossref
  [2] Gandhi L, Rodriguez-Abreu D, et al. Pembrolizumab...    ✅ pubmed
  ...
  [31] Fakeman AB, Notreal CD. The impact of quantum...       ❌ not found
  [32] Smith J. Nonexistent study on fake outcomes...         ❌ not found
  ...

CiteCheck Report
==================================================
Total citations:  47
Verified:         45
Not found:        2

Potentially hallucinated citations:
  [31] Fakeman AB, Notreal CD. The impact of quantum healing on chronic...
  [32] Smith J. Nonexistent study on fake outcomes in imaginary patients...
```

Python API

```python
from citecheck import check

# Check a file
report = check("my_paper.docx")

# Or raw text
report = check("1. Smith J. Some paper. Nature. 2023;600:1-10.")

# Results
print(report.summary())
print(f"Verified: {report.verified}/{report.total}")

if report.has_hallucinations:
    for r in report.hallucinated:
        print(f"  [{r.number}] {r.raw[:80]}")

# Export
report.to_json()       # JSON report
report.to_markdown()   # Markdown table
```

Level 2: Does the citation actually support your claim?

Level 1 checks if a paper exists. Level 2 checks if it says what you claim.

```python
from citecheck import deep_check

result = deep_check(
    claim="Drug X reduces mortality by 40%",
    source_title="Effect of Drug X on cardiovascular outcomes",
    source_abstract="...our trial showed a 15% reduction in mortality...",
)

print(result.verdict)      # "partially_supported"
print(result.explanation)  # "The paper reports 15% reduction, not 40%"
```

Level 2 requires an LLM API key: pip install 'citecheck[deep]'

Features

| Feature | Description |
|---|---|
| Multi-source verification | CrossRef, PubMed, Semantic Scholar, OpenAlex (240M+ papers) |
| No API key needed | Level 1 uses only free, public APIs |
| Multiple formats | Reads .docx, .pdf, .txt, .md |
| Smart matching | Fuzzy title matching catches minor differences |
| DOI detection | Automatically extracts and verifies DOIs |
| CLI + Python API | Use from terminal or import in your code |
| JSON/Markdown export | Machine-readable reports for automation |
| GitHub Action | Verify citations in CI/CD |
| Level 2 deep check | Verify claim-source alignment (optional, needs LLM) |
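The "smart matching" row refers to fuzzy title comparison. A minimal sketch of the idea using Python's standard-library `difflib` (CiteCheck's actual matcher and normalization may differ):

```python
from difflib import SequenceMatcher

def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two titles, ignoring case and extra whitespace."""
    norm = lambda s: " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

# Minor case/punctuation differences still score far above a 0.70 threshold
score = title_similarity(
    "Dapagliflozin in Patients with Heart Failure",
    "Dapagliflozin in patients with heart failure.",
)
```

A threshold like the CLI's `--min-similarity 0.70` would then decide whether a database hit counts as a verified match.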

GitHub Action

Add citation checking to your CI pipeline:

```yaml
# .github/workflows/citecheck.yml
name: CiteCheck
on: [pull_request]

jobs:
  check-citations:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: tuyentran-md/citecheck@main
        with:
          file: paper/manuscript.docx
          fail-on-hallucination: true
```

Every PR that touches your manuscript will be automatically checked. Hallucinated citations block the merge.

Benchmark: Which LLM hallucinates citations the most?

We prompted major LLMs to write literature reviews and verified every citation they generated.

| Rank | Model | Citations | Accuracy | Hallucination Rate |
|---|---|---|---|---|
| 🥇 | coming soon | | | |
| 🥈 | coming soon | | | |
| 🥉 | coming soon | | | |

Contribute benchmark data! See benchmark/ for instructions.

```shell
# Run the benchmark yourself
pip install 'citecheck[all]'
python -m benchmark.collect --model gpt-4o --n 10
python -m benchmark.evaluate
```

How it works

```
Your manuscript
     │
     ▼
 Extract references (numbered list at end of document)
     │
     ▼
 For each reference:
     ├─ Extract DOI (if present) → direct lookup
     └─ Bibliographic query → fuzzy title match
     │
     ▼
 Check against (in order):
     1. CrossRef (150M+ works)
     2. PubMed (36M+ biomedical)
     3. Semantic Scholar (200M+ papers)
     4. OpenAlex (240M+ works)
     │
     ▼
 Report: ✅ verified  or  ❌ not found
```

Installation options

```shell
# Core (text files only, no external dependencies beyond requests)
pip install citecheck

# With DOCX support
pip install 'citecheck[docx]'

# With PDF support
pip install 'citecheck[pdf]'

# With Level 2 deep verification
pip install 'citecheck[deep]'

# Everything
pip install 'citecheck[all]'
```

CLI options

```shell
citecheck my_paper.docx                           # Basic check
citecheck paper.pdf --email you@uni.edu           # Faster (polite API pool)
citecheck paper.txt --format json -o report.json  # JSON output
citecheck paper.md --format markdown              # Markdown table
citecheck paper.docx --min-similarity 0.70        # Stricter matching
citecheck paper.docx --sources crossref,pubmed    # Specific databases only
```
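The JSON output can gate automation outside of GitHub Actions. A hypothetical sketch, assuming the report's JSON fields mirror the Python API (`total`, `verified`); inspect a generated `report.json` for the real schema:

```python
import json

def ci_gate(report_json: str) -> int:
    """Return a CI exit code: 1 if any citation failed verification, else 0.

    Assumes top-level `total` and `verified` counts (an assumption, mirroring
    the Python API attributes), not a documented schema.
    """
    report = json.loads(report_json)
    return 1 if report["verified"] < report["total"] else 0

# e.g. after: citecheck paper.txt --format json -o report.json
# exit_code = ci_gate(open("report.json").read())
```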

Who is this for?

  • Researchers using AI to draft papers — verify before submission
  • Journal editors — screen submissions for fake references
  • Systematic review teams — batch-verify hundreds of references
  • AI tool developers — add citation verification to your pipeline

Contributing

Contributions welcome! Especially:

  • Benchmark data from additional LLMs
  • Parser improvements for different reference formats
  • Bug reports with example manuscripts

```shell
git clone https://github.com/tuyentran-md/citecheck.git
cd citecheck
pip install -e ".[dev,all]"
pytest tests/
```

Citation

If you use CiteCheck in your research, please cite:

```bibtex
@software{citecheck2026,
  title = {CiteCheck: Verify citations in AI-assisted manuscripts},
  author = {Tran, Tuyen},
  year = {2026},
  url = {https://github.com/tuyentran-md/citecheck}
}
```

License

MIT
