The Smart Document Router is an open-source data layer for document processing.
- It ingests unstructured documents from faxes, emails, and ERPs through REST APIs and integrations.
- It processes documents at scale with OCR and LLMs.
- It chunks, embeds, and organizes documents into queryable knowledge bases.
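As a rough illustration of the chunking step, here is a minimal sketch of fixed-size chunking with overlap. It is not taken from the DocRouter codebase: the function name `chunk_text`, its parameters, and the character-window strategy are all assumptions made for this example, and the project may well chunk documents differently (for example by tokens or by document structure).

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only; not the DocRouter API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

The overlap keeps text that straddles a chunk boundary visible in both neighboring chunks, which tends to help retrieval once each chunk has been embedded.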
The Document Router is designed to run standalone or with a human in the loop, and it can process medical, insurance, financial, supply chain, and legal documents.
It acts as a system of record for extraction schemas and prompts, and it is portable across all major clouds and LLM providers.
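To make the "system of record" idea concrete, the sketch below shows one way an extraction schema and its prompt could be stored together as a versioned record and used to check an extraction result. Everything here is hypothetical: `ExtractionSpec`, `missing_fields`, and the field names are invented for this example and do not reflect the DocRouter data model. It uses only the standard library to stay self-contained; the project itself lists Pydantic in its stack.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExtractionSpec:
    """Hypothetical record pairing a prompt with the fields it must extract."""
    name: str
    version: int
    prompt: str
    required_fields: tuple[str, ...]


def missing_fields(spec: ExtractionSpec, result: dict) -> list[str]:
    """Return the required fields that are absent or empty in a result."""
    return [f for f in spec.required_fields if result.get(f) in (None, "")]


if __name__ == "__main__":
    spec = ExtractionSpec(
        name="invoice",
        version=1,
        prompt="Extract the invoice number and the total amount.",
        required_fields=("invoice_number", "total"),
    )
    # The spec serializes to JSON, so it can be stored and moved between systems.
    print(json.dumps(asdict(spec), indent=2))
    # An incomplete extraction result is flagged for review.
    print(missing_fields(spec, {"invoice_number": "INV-1"}))
```

Keeping the prompt and the schema in one versioned record means a processed result can always be traced back to the exact instructions that produced it.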
A Document Agent is available to configure prompts and extractions, and to review processed results.
- NextJS, NextAuth, MaterialUI, TailwindCSS
- FastAPI
- MongoDB
- Pydantic
- LiteLLM
- OpenAI, Anthropic, Gemini, Vertex AI for GCP, AWS Bedrock, xAI, OpenRouter...
The PyData Boston DocRouter Slides (Feb '24) give more detail on the tech stack and on how Cursor AI was used to build the DocRouter.
- Smart Document Router Slides from Boston PyData, Spring 2025
- DocRouter.AI: Adventures in CSS and AI Coding, Summer 2025
- Installation
- Development


