Biobtree Documentation

Welcome to the Biobtree documentation. Biobtree provides unified access to 70+ biological databases through intuitive chain queries.

Quick Navigation

| Section | Description |
|---|---|
| Getting Started | Installation, quickstart, configuration |
| Concepts | Architecture, data model, query model |
| API Reference | REST API, query syntax, filters |
| MCP Server | LLM integration, Claude Desktop setup |
| Datasets | All 70+ supported databases |
| Development | Contributing, adding datasets, testing |
| Internals | Technical deep-dives (k-way merge, bucket system) |

Getting Started

Quickstart

# Build all datasets (production - runs in background)
./bb.sh                      # Update all datasets
./bb.sh --status             # Check progress
./bb.sh --generate           # Build database
./bb.sh --activate           # Activate new version
./bb.sh --web                # Start web server (localhost:9292)

# Query via API
curl "localhost:9292/ws/map/?i=BRCA1&m=>>ensembl>>uniprot&mode=lite"

Build Management

# Update specific datasets
./bb.sh --only uniprot,chembl      # Update specific datasets
./bb.sh --from pubchem             # Resume from dataset
./bb.sh --check                    # Check for source changes

# Database versions
./bb.sh --db-versions              # Show versions
./bb.sh --activate                 # Activate latest
./bb.sh --cleanup                  # Remove old versions

Core Concepts

Chain Query Syntax

Use >> to traverse datasets:

identifier >> dataset1 >> dataset2 >> dataset3

Examples:

# Gene symbol → Ensembl → UniProt → Drug targets
curl "localhost:9292/ws/map/?i=TP53&m=>>ensembl>>uniprot>>chembl_target&mode=lite"

# Protein → Pathways
curl "localhost:9292/ws/map/?i=P04637&m=>>reactome&mode=lite"

# Disease → Genes
curl "localhost:9292/ws/map/?i=breast%20cancer&m=>>mondo>>gencc>>hgnc&mode=lite"
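
The same chain queries can be assembled programmatically. The following is a minimal sketch, not part of Biobtree itself: `build_map_url` and `BASE_URL` are hypothetical names, and it assumes the server decodes percent-encoded query parameters in the usual way, so that `%3E%3E` is read as `>>`.

```python
from urllib.parse import quote, urlencode

# Assumption: a local server started with ./bb.sh --web
BASE_URL = "http://localhost:9292"


def build_map_url(identifier, datasets, mode="lite", base_url=BASE_URL):
    """Return a /ws/map/ URL that walks `identifier` through `datasets` in order."""
    # Each hop is prefixed with '>>', giving e.g. '>>ensembl>>uniprot'.
    chain = "".join(">>" + name for name in datasets)
    # Percent-encode spaces and '>' so the URL is safe for any HTTP client.
    query = urlencode({"i": identifier, "m": chain, "mode": mode}, quote_via=quote)
    return f"{base_url}/ws/map/?{query}"


if __name__ == "__main__":
    print(build_map_url("TP53", ["ensembl", "uniprot", "chembl_target"]))
    print(build_map_url("breast cancer", ["mondo", "gencc", "hgnc"]))
```

Building the `m` parameter from a list keeps the hop order explicit and avoids hand-escaping spaces in identifiers such as `breast cancer`.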

Filters

Apply CEL-based filters at any step:

# -g (--globoff) stops curl from treating [ ] in the URL as a glob range

# Reviewed proteins only
curl -g "localhost:9292/ws/map/?i=TP53&m=>>uniprot[reviewed==true]&mode=lite"

# High-resolution structures
curl -g "localhost:9292/ws/map/?i=P04637&m=>>pdb[resolution<2.0]&mode=lite"

# Pathogenic variants
curl -g "localhost:9292/ws/map/?i=BRCA1&m=>>alphamissense[am_class==\"likely_pathogenic\"]&mode=lite"
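
Filtered steps follow the pattern `dataset[expression]`, so the chain string can be composed from small pieces. This is a sketch only: `step` and `chain` are hypothetical helpers, the filter text is passed through verbatim without any CEL validation, and the last example combines a plain hop with a filtered one on the assumption that filters compose with multi-step chains as the text above suggests.

```python
def step(dataset, cel_filter=None):
    """Format one chain step, optionally with a bracketed filter expression."""
    return f"{dataset}[{cel_filter}]" if cel_filter else dataset


def chain(*steps):
    """Join steps into the value of the `m` parameter, e.g. '>>ensembl>>uniprot'."""
    return "".join(">>" + s for s in steps)


if __name__ == "__main__":
    # The three filter examples above, rebuilt from their parts.
    print(chain(step("uniprot", "reviewed==true")))
    print(chain(step("pdb", "resolution<2.0")))
    print(chain(step("alphamissense", 'am_class=="likely_pathogenic"')))
    # Assumed composition: an unfiltered hop followed by a filtered one.
    print(chain(step("ensembl"), step("uniprot", "reviewed==true")))
```

The resulting string still needs URL encoding before it is placed in a request.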

Response Modes

  • lite: Compact output. This is the recommended mode, especially for LLM clients.
  • full: Complete data. Avoid it unless lite mode does not return enough detail.

curl "localhost:9292/ws/map/?i=TP53&m=>>ensembl>>uniprot&mode=lite"

Dataset Categories

Biobtree integrates 70+ databases across these categories:

| Category | Examples |
|---|---|
| Genomics | Ensembl, HGNC, Entrez, RefSeq, dbSNP |
| Proteins | UniProt, AlphaFold, PDB, InterPro |
| Chemistry | ChEMBL, PubChem, ChEBI, HMDB |
| Pathways | Reactome, STRING, IntAct, SIGNOR |
| Disease | ClinVar, MONDO, HPO, Orphanet, GWAS |
| Ontologies | GO, EFO, UBERON, Cell Ontology |
| Expression | Bgee, CELLxGENE, FANTOM5, SCXA |

See Datasets Index for the complete list.


Web API

# Search
GET /ws/?i={terms}&s={dataset}&mode={full|lite}

# Map through datasets
GET /ws/map/?i={terms}&m={chain}&mode={full|lite}

# Get entry details
GET /ws/entry/?i={identifier}&s={dataset}

# List all datasets
GET /ws/meta

See API Reference for full documentation.
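
The endpoint templates above translate directly into URL builders. The sketch below is illustrative rather than an official client: the function names and `BASE_URL` are hypothetical, every templated parameter is treated as required because this page does not say which ones are optional, and `uniprot` is assumed to be a valid value for `s`.

```python
from urllib.parse import quote, urlencode

# Assumption: a local server started with ./bb.sh --web
BASE_URL = "http://localhost:9292"


def _url(path, params=None, base_url=BASE_URL):
    """Join base URL, path, and percent-encoded query parameters."""
    if not params:
        return f"{base_url}{path}"
    return f"{base_url}{path}?{urlencode(params, quote_via=quote)}"


def search_url(terms, dataset, mode="lite"):
    """GET /ws/?i={terms}&s={dataset}&mode={full|lite}"""
    return _url("/ws/", {"i": terms, "s": dataset, "mode": mode})


def entry_url(identifier, dataset):
    """GET /ws/entry/?i={identifier}&s={dataset}"""
    return _url("/ws/entry/", {"i": identifier, "s": dataset})


def meta_url():
    """GET /ws/meta"""
    return _url("/ws/meta")


if __name__ == "__main__":
    print(search_url("TP53", "uniprot"))
    print(entry_url("P04637", "uniprot"))
    print(meta_url())
```

Each function returns a plain string, so the result can be passed to any HTTP client.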


MCP Server (LLM Integration)

Biobtree includes an MCP server for Claude Desktop/CLI integration:

cd mcp_srv
python -m mcp_srv --mode http

Tools available:

  • biobtree_search - Search 70+ databases
  • biobtree_map - Map through dataset chains
  • biobtree_entry - Get full entry details
  • biobtree_meta - List available datasets

See MCP Server Documentation for setup instructions.
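
The four tool names line up with the four Web API routes listed earlier. The lookup below records that correspondence as a sketch. The pairing is inferred from the tool names and descriptions alone and is not confirmed by this page, and `TOOL_TO_ENDPOINT` and `endpoint_for` are hypothetical names, not part of `mcp_srv`.

```python
# Assumed pairing of MCP tools with REST routes, inferred from names only.
TOOL_TO_ENDPOINT = {
    "biobtree_search": "/ws/",
    "biobtree_map": "/ws/map/",
    "biobtree_entry": "/ws/entry/",
    "biobtree_meta": "/ws/meta",
}


def endpoint_for(tool_name):
    """Return the REST path assumed to back `tool_name`."""
    try:
        return TOOL_TO_ENDPOINT[tool_name]
    except KeyError:
        raise ValueError(f"unknown tool: {tool_name!r}") from None


if __name__ == "__main__":
    for name in TOOL_TO_ENDPOINT:
        print(f"{name} -> {endpoint_for(name)}")
```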


Resources