Complete reference for all RagTune commands and flags.
| Command | Purpose |
|---|---|
| `ingest` | Load documents into a vector store |
| `explain` | Debug retrieval for a single query with score distribution analysis |
| `simulate` | Batch benchmark with metrics (Recall, MRR, NDCG, Coverage, NeedleCoverage) plus failure analysis |
| `compare` | Compare embedders or configs |
| `report` | Generate Markdown reports |
| `import-queries` | Import queries from CSV or JSON |
| `audit` | Quick health check with pass/fail |
## ingest

Splits documents into chunks, generates embeddings, and stores them in the vector DB.

```bash
ragtune ingest ./docs --collection prod --chunk-size 512 --embedder ollama
```

| Flag | Default | Description |
|---|---|---|
| `--collection` | required | Collection name |
| `--embedder` | `openai` | Embedding backend |
| `--chunk-size` | 512 | Characters per chunk |
| `--chunk-overlap` | 64 | Overlap between chunks |
| `--store` | `qdrant` | Vector store backend |
Example output:

```
Reading documents from ./docs...
Found 42 documents
Created 187 chunks (avg 489 chars)
Using embedding dimension: 768 (auto-detected from ollama)
✓ Ingested 187 chunks into collection 'prod'
```
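The interplay of `--chunk-size` and `--chunk-overlap` can be pictured with a short sketch. This is hypothetical logic, not RagTune's actual chunker: with a chunk size of 512 and an overlap of 64, each chunk starts 448 characters after the previous one.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size character chunks, each sharing `overlap`
    characters with the previous chunk."""
    step = chunk_size - overlap  # 448 with the defaults above
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

# A 1000-character document yields chunks starting at offsets 0, 448, and 896.
chunks = chunk_text("x" * 1000)
print(len(chunks))       # 3
print(len(chunks[0]))    # 512
```

Real chunkers usually also respect word or sentence boundaries; this sketch only shows why a larger overlap produces more (and more redundant) chunks per document.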
## explain

Shows exactly which chunks are retrieved for one query. Use `--save` to build your test suite incrementally.

```bash
ragtune explain "How do I reset my password?" --collection prod --save
```

| Flag | Default | Description |
|---|---|---|
| `--collection` | required | Collection name |
| `--embedder` | `openai` | Embedding backend |
| `--top-k` | 5 | Results to retrieve |
| `--save` | false | Save query to golden queries file |
| `--golden-file` | `golden-queries.json` | Path to golden queries file |
| `--relevant` | (inferred) | Explicit relevant doc path |
The output includes:

- Score statistics (range, mean, std dev)
- Quartiles (Q1, median, Q3) and distribution shape
- Top gap analysis (distance between #1 and #2)
- Automatic insights and warnings
| Signal | Meaning | Action |
|---|---|---|
| Score > 0.85 | Strong match | Good retrieval |
| Score 0.60-0.85 | Moderate match | May need tuning |
| Score < 0.60 | Weak match | Check chunk size, embedder |
| Right doc missing | Retrieval failure | Increase chunk size or try different embedder |
| All scores similar | No clear winner | Query may be too vague |
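The statistics above are standard descriptive measures; a minimal sketch of how they might be computed from a list of retrieval scores (illustrative only, not RagTune's code):

```python
import statistics

def score_stats(scores: list[float]) -> dict[str, float]:
    """Summarize a retrieval score distribution: spread, quartiles, top gap."""
    ranked = sorted(scores, reverse=True)
    q1, median, q3 = statistics.quantiles(scores, n=4)
    return {
        "min": min(scores), "max": max(scores),
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "q1": q1, "median": median, "q3": q3,
        # distance between the #1 and #2 results; a large gap suggests a clear winner
        "top_gap": ranked[0] - ranked[1],
    }

stats = score_stats([0.91, 0.72, 0.70, 0.69, 0.65])
print(round(stats["top_gap"], 2))  # 0.19
```

A top gap near zero with all scores in a narrow band matches the "all scores similar" signal in the table above: the embedder sees no clear winner.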
## simulate

Runs many queries and computes aggregate metrics. Use `--ci` for automated quality gates. Supports optional needle annotations for sub-document content coverage analysis.

```bash
ragtune simulate --collection prod --queries golden-queries.json
```

| Flag | Default | Description |
|---|---|---|
| `--collection` | required | Collection name |
| `--queries` | required | Path to queries JSON file |
| `--embedder` | `openai` | Embedding backend |
| `--top-k` | 5 | Results to retrieve |
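Recall@K and MRR are the standard definitions; a minimal sketch of both (illustrative, assuming retrieved results are doc paths matched against a relevant set, which may differ from how RagTune matches internally):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of relevant docs that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)

def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant result; 0 if none was retrieved.
    MRR is the mean of this value over all queries."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["intro.md", "password.md", "faq.md"]
print(recall_at_k(retrieved, {"password.md"}))      # 1.0
print(reciprocal_rank(retrieved, {"password.md"}))  # 0.5 (first hit at rank 2)
```

Recall@K rewards the relevant doc appearing anywhere in the top K; MRR additionally rewards it appearing early.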
CI mode flags:

| Flag | Default | Description |
|---|---|---|
| `--ci` | false | Enable CI mode (exit 1 if thresholds fail) |
| `--min-recall` | 0 | Minimum Recall@K threshold |
| `--min-mrr` | 0 | Minimum MRR threshold |
| `--min-coverage` | 0 | Minimum Coverage threshold |
| `--max-latency-p95` | 0 | Maximum p95 latency in ms (0 = no limit) |
Bootstrap flags:

| Flag | Default | Description |
|---|---|---|
| `--bootstrap` | 0 | Number of bootstrap samples for confidence intervals (0 = disabled) |
| `--bootstrap-seed` | 42 | Random seed for reproducibility |
When `--bootstrap N` is set, metrics are computed N times on samples drawn with replacement from the query set. Output includes mean ± standard deviation for each metric, letting you distinguish real changes from random variance.
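Bootstrap resampling itself is simple; a sketch of the idea applied to per-query recall values (illustrative, not RagTune's implementation, and the seed default of 42 is just mirrored from the flag table):

```python
import random
import statistics

def bootstrap_metric(per_query: list[float], n_samples: int = 20, seed: int = 42):
    """Resample per-query scores with replacement n_samples times;
    return (mean, stdev) of the resampled metric means."""
    rng = random.Random(seed)  # seeded for reproducibility
    means = [
        statistics.mean(rng.choices(per_query, k=len(per_query)))
        for _ in range(n_samples)
    ]
    return statistics.mean(means), statistics.stdev(means)

recalls = [1.0, 1.0, 0.0, 1.0, 0.5, 1.0, 0.0, 1.0]
mean, sd = bootstrap_metric(recalls)
print(f"Recall@5: {mean:.3f} ± {sd:.3f}")
```

If an observed improvement between two configs is smaller than the reported standard deviation, it is likely noise rather than a real change.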
### NeedleCoverage@K

Standard Recall@K checks whether the right document was retrieved, but it can't tell you whether the retrieved chunks actually contain the specific content needed to answer the query. A chunk from the right document may contain surrounding context (preambles, recitals, related sections) rather than the operative text.
NeedleCoverage@K solves this. You annotate queries with "needles" — short text spans that must be present in the retrieved chunks for a complete answer. RagTune checks each needle against the concatenated retrieved text (case-insensitive) and reports the fraction found.
Queries file with needles:

```json
{
  "queries": [{
    "id": "q1",
    "text": "What fines can be imposed under the GDPR?",
    "relevant_docs": ["gdpr.txt"],
    "needles": [
      {"text": "up to 20 000 000 EUR", "source": "Art 83(5)", "difficulty": "easy"},
      {"text": "up to 10 000 000 EUR", "source": "Art 83(4)", "difficulty": "easy"},
      {"text": "imposed on public authorities", "source": "Art 83(7)", "difficulty": "hard"}
    ]
  }]
}
```

Each needle has a required `text` field (the span to search for) and optional `source` and `difficulty` fields for your own tracking. If a query has no `needles` field, it is skipped for this metric, so existing queries files work unchanged.
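The check described above (each needle matched case-insensitively against the concatenated retrieved text) can be sketched in a few lines. Illustrative only; RagTune's matching may differ in details such as whitespace handling:

```python
def needle_coverage(retrieved_chunks: list[str], needles: list[dict]) -> float:
    """Fraction of needles found, case-insensitively, in the
    concatenated text of the retrieved chunks."""
    if not needles:
        return 0.0  # callers skip queries without needles for this metric
    haystack = " ".join(retrieved_chunks).lower()
    found = sum(1 for n in needles if n["text"].lower() in haystack)
    return found / len(needles)

chunks = ["Fines of up to 20 000 000 EUR may be imposed...", "See also Article 83(4)."]
needles = [{"text": "up to 20 000 000 EUR"}, {"text": "up to 10 000 000 EUR"}]
print(needle_coverage(chunks, needles))  # 0.5
```

Note the substring check is strict: a needle paraphrased in the chunk ("twenty million euros") would not count, which is why needles should be short, exact spans.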
```bash
# Basic run
ragtune simulate --collection prod --queries golden.json

# With needle coverage (just use a queries file that has needles)
ragtune simulate --collection prod --queries needles.json --embedder ollama

# With bootstrap confidence intervals
ragtune simulate --collection prod --queries golden.json --bootstrap 20
# Output: Recall@5: 0.664 ± 0.012 (n=20)

# CI mode with thresholds
ragtune simulate --collection prod --queries golden.json \
  --ci --min-recall 0.85 --min-coverage 0.90
```

Exits with code 1 if thresholds are not met.
## compare

Compares collections (e.g. different chunk sizes) or embedders.

```bash
# Compare chunk sizes
ragtune compare --collections prod-256,prod-512,prod-1024 --queries queries.json

# Compare embedders (auto-ingests)
ragtune compare --embedders ollama,openai --docs ./docs --queries queries.json
```

| Flag | Default | Description |
|---|---|---|
| `--collections` | | Comma-separated collection names |
| `--embedders` | | Comma-separated embedder names |
| `--docs` | | Path to documents (required with `--embedders`) |
| `--queries` | required | Path to queries JSON file |
## audit

Pass/fail health report with recommendations. Great for daily checks or exec summaries.

```bash
ragtune audit --collection prod --queries golden-queries.json
```

| Flag | Default | Description |
|---|---|---|
| `--collection` | required | Collection name |
| `--queries` | required | Path to queries JSON file |
| `--min-recall` | 0.85 | Minimum Recall@K threshold |
| `--min-mrr` | 0.70 | Minimum MRR threshold |
| `--min-coverage` | 0.90 | Minimum Coverage threshold |
| `--max-latency-p95` | 0 | Maximum p95 latency in ms (0 = no limit) |

Returns exit code 0 (pass) or 1 (fail).
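The pass/fail gate reduces to comparing each metric against its minimum; a sketch of the idea (hypothetical metric names and structure, not RagTune's internals):

```python
def audit_gate(metrics: dict[str, float], thresholds: dict[str, float]) -> int:
    """Return exit code 0 if every metric meets its minimum, else 1."""
    failures = [
        name for name, minimum in thresholds.items()
        if metrics.get(name, 0.0) < minimum
    ]
    for name in failures:
        print(f"FAIL {name}: {metrics.get(name, 0.0):.2f} < {thresholds[name]:.2f}")
    return 1 if failures else 0

code = audit_gate({"recall": 0.88, "mrr": 0.65}, {"recall": 0.85, "mrr": 0.70})
print(code)  # 1 (mrr is below its threshold)
# A real CLI would call sys.exit(code) so shells and CI runners see it.
```

Missing metrics default to 0.0 here, so an unreported metric fails its threshold rather than silently passing.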
## report

Creates a Markdown or JSON report from a simulation run.

```bash
ragtune report --input runs/latest.json --format markdown > report.md
```

| Flag | Default | Description |
|---|---|---|
| `--input` | required | Path to simulation run JSON |
| `--format` | `markdown` | Output format (`markdown` or `json`) |
## import-queries

Import queries from CSV or JSON files.

```bash
ragtune import-queries queries.csv --output golden-queries.json
```

Example CSV:

```csv
query,relevant_docs
"How do I reset password?",docs/auth/password.md
"What are rate limits?",docs/api/limits.md;docs/api/quotas.md
```

The header row is required. Use a semicolon to separate multiple relevant docs.
## Global flags

These flags work across most commands:

| Flag | Default | Description |
|---|---|---|
| `--collection` | required | Collection name |
| `--embedder` | `openai` | Embedding backend (`ollama`, `openai`, `tei`, `cohere`, `voyage`) |
| `--top-k` | 5 | Results to retrieve |
| `--store` | `qdrant` | Vector store backend |
### Vector stores

| Flag | Description |
|---|---|
| `--store qdrant` | Use Qdrant (default) |
| `--store pgvector --pgvector-url URL` | Use PostgreSQL with pgvector |
| `--store weaviate --weaviate-host HOST` | Use Weaviate |
| `--store chroma --chroma-url URL` | Use ChromaDB |
| `--store pinecone --pinecone-host HOST --pinecone-api-key KEY` | Use Pinecone |
### Embedders

| Embedder | Required Flags / Environment |
|---|---|
| `ollama` | Ollama must be running locally |
| `openai` | `OPENAI_API_KEY` environment variable |
| `tei` | `--tei-addr http://localhost:8080` |
| `cohere` | `COHERE_API_KEY` environment variable |
| `voyage` | `VOYAGE_API_KEY` environment variable, optional `--voyage-model` |