Kreuzberg is a polyglot document intelligence framework built around a high-performance Rust core. It helps developers extract text, structure, metadata, and embeddings from 56+ document formats at native speed, without requiring GPUs.
Kreuzberg is and will remain MIT-licensed and open-source. We're currently building a hosted cloud service around it to make document processing reliable, scalable, and easy to integrate into modern pipelines.
A high-performance, extensible document intelligence engine.
- Rust core with streaming parsers and full parallelism
- Native bindings for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, TypeScript (Node/Bun/Wasm/Deno)
- 56+ supported formats including PDF, Office, images, HTML, XML, email, archives, and scientific formats
- OCR with table extraction (Tesseract, EasyOCR, PaddleOCR, extensible via plugins)
- Built-in semantic chunking and optional embeddings for RAG pipelines
- CLI, REST API, Docker images, and MCP server
Read more: https://kreuzberg.dev/
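To make the chunking step of a RAG pipeline concrete, here is a minimal, library-independent sketch in Python. It is not Kreuzberg's API and it does not do semantic chunking: it swaps in plain fixed-size word windows with overlap, the simplest relative of that technique. The function name `chunk_words` and its parameters are hypothetical, chosen only for illustration.

```python
def chunk_words(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word windows of at most max_words, sharing `overlap` words.

    Illustrative only: a fixed-size sliding window, not semantic chunking
    and not Kreuzberg's implementation.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(12))
    for chunk in chunk_words(sample, max_words=5, overlap=2):
        print(chunk)
```

The overlap keeps a little shared context between neighbouring chunks, so a sentence cut at a window boundary is still partly visible in both chunks when they are embedded or retrieved.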
A fully managed document intelligence API powered by the same engine.
Planned features include:
- Hosted REST API
- Async jobs and webhooks
- Built-in chunking for RAG pipelines
- Premium OCR backends
- Usage dashboards and analytics
- Simple pay-as-you-go pricing
A high-performance HTML → Markdown converter powered by Rust. Available as a Rust crate, Python package, PHP extension, Ruby gem, Elixir Rustler NIF, Node.js bindings, WebAssembly, and a standalone CLI, with identical rendering behavior across platforms.
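For readers unfamiliar with what an HTML → Markdown conversion involves, the following is a toy sketch built only on Python's standard-library `html.parser`. It is not the converter described above and does not reflect its API or rendering rules; the class `MarkdownSketch` and the function `to_markdown` are hypothetical names, and only headings, paragraphs, emphasis, links, and list items are handled.

```python
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class MarkdownSketch(HTMLParser):
    """Toy converter: maps a handful of HTML tags to Markdown markers."""

    def __init__(self):
        super().__init__()
        self.out = []      # collected Markdown fragments
        self.href = ""     # target of the link currently being read

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.out.append("#" * int(tag[1]) + " ")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")
        elif tag == "li":
            self.out.append("- ")

    def handle_endtag(self, tag):
        if tag in HEADINGS or tag == "p":
            self.out.append("\n\n")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag == "a":
            self.out.append(f"]({self.href})")
        elif tag == "li":
            self.out.append("\n")

    def handle_data(self, data):
        self.out.append(data)


def to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


if __name__ == "__main__":
    print(to_markdown("<h1>Title</h1><p>Hello <strong>world</strong>.</p>"))
```

A production converter also has to deal with nesting, whitespace normalization, escaping, tables, and malformed markup, which is where a dedicated engine earns its keep.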
- Truly polyglot: same engine across languages
- High throughput: optimized for batch workloads and multi-GB documents
- Memory efficient: streaming architecture keeps memory usage predictable
- Flexible deployment: use via CLI, REST API, MCP server, and more
- MIT licensed: safe for enterprise, commercial, and closed-source use
- Built for RAG: native chunking, embeddings, and customization
Join our dev community to ask questions, share feedback, and show what you're building.
Discord: https://discord.gg/xzx4KkAPED
Subreddit: https://www.reddit.com/r/kreuzberg_dev/
LinkedIn: https://www.linkedin.com/company/kreuzberg-dev/
X/Twitter: https://x.com/kreuzberg_dev
Contributions are welcome.
- Open an issue to propose a change
- Submit a pull request
- Maintainers review and merge
See CONTRIBUTING.md in the relevant repository for details.
Kreuzberg repository: https://github.com/kreuzberg-dev/kreuzberg
All open-source code is MIT licensed. It's permissive, enterprise-safe, and commercial-friendly. That means you can use Kreuzberg freely in both commercial and closed-source products, with no copyleft (viral) effects and no obligations beyond keeping the copyright and license notice.
Built with love in Kreuzberg, Berlin.