LuminaDoc is a powerful, privacy-focused document analysis tool that brings RAG (Retrieval-Augmented Generation) capabilities to your local environment. Process, analyze, and interact with your documents using advanced LLMs—all without requiring an internet connection.
- 100% Offline Processing: Your documents never leave your machine.
- Built-in RAG Pipeline: Advanced retrieval and generation workflows.
- Local Vector Storage: Supports ChromaDB for efficient retrieval.
- Format Support: PDF, DOCX, TXT, CSV, XLSX.
- Intelligent Chunking: Precise document splitting that powers accurate semantic search.
- Conversational Interface: Ask questions, get contextual answers.
- Custom Knowledge Base: Build and manage your knowledge repositories.
- Multi-file Upload: Upload and process multiple documents at once.
- Simple Streamlit UI: Clean, native Streamlit interface—no HTML required.
- 📊 Researchers: Analyze private datasets securely.
- ⚖️ Legal Professionals: Handle confidential legal documents locally.
- 💼 Business Analysts: Process sensitive data without cloud dependencies.
- 🧑‍💻 Developers: Build offline-first AI-powered applications.
- 🔐 Privacy Advocates: Ensure your data stays secure and private.
- Backend: Python 3.9+
- Frontend: Streamlit
- Database: Local Vector DB (ChromaDB)
- Embedding Model: Ollama (`nomic-embed-text`)
- LLM Support: Local LLMs (e.g., `llama3.2:latest`)
- Cross-Encoder: Sentence Transformers (`ms-marco-MiniLM-L-6-v2`)
- No Cloud Services: All processing happens locally.
- No API Keys Required: Run seamlessly without third-party dependencies.
- No Data Sharing: Your documents remain secure on your machine.
Follow these steps to install and run LuminaDoc locally:
Open your terminal and clone the LuminaDoc repository:
```bash
git clone https://github.com/yourusername/luminadoc.git
cd luminadoc
```
Create an isolated Python environment:
```bash
python3 -m venv venv
source venv/bin/activate   # On Linux/macOS
venv\Scripts\activate      # On Windows
```
Install the required Python packages:
```bash
pip install --upgrade pip
pip install -r requirements.txt
```
Ensure `requirements.txt` includes the required dependencies:
```
streamlit
langchain
langchain-community
langchain-ollama
chromadb
sentence-transformers
PyMuPDF
httpx
ollama
```
If Ollama is not installed, run the following commands:
```bash
brew install ollama                            # macOS
curl -fsSL https://ollama.com/install.sh | sh  # Linux
ollama --version
```
Make sure to download the required models:
```bash
ollama pull nomic-embed-text
ollama pull llama3.2:latest
```
Start the Ollama server:
```bash
ollama serve
```
Verify the models:
```bash
curl http://localhost:11434/api/tags
```
Ensure the `nomic-embed-text` and `llama3.2` models are listed.
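If you'd rather script this check than read the raw JSON by eye, a small stdlib-only helper can parse the `/api/tags` response and report what's missing. The function names here are illustrative, not part of LuminaDoc:

```python
import json
import urllib.request

REQUIRED_MODELS = ("nomic-embed-text", "llama3.2")

def missing_models(tags_json: str) -> list[str]:
    """Given the JSON body of /api/tags, return the required models that are absent."""
    names = [m["name"] for m in json.loads(tags_json).get("models", [])]
    return [req for req in REQUIRED_MODELS
            if not any(name.startswith(req) for name in names)]

def check_server(url: str = "http://localhost:11434") -> list[str]:
    """Fetch /api/tags from a running Ollama server and report missing models."""
    with urllib.request.urlopen(f"{url}/api/tags") as resp:
        return missing_models(resp.read().decode())
```

An empty list from `check_server()` means both models are pulled and the server is reachable.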
Run the Streamlit app locally:
```bash
streamlit run app.py
```
The app will start, and you'll see a URL in your terminal, usually:
```
Local URL: http://localhost:8501
```
Open this in your browser.
- Upload Documents:
  - In the sidebar, upload one or more documents (PDF, DOCX, TXT, CSV, XLSX).
  - Click the Process Document(s) button.
  - Need another format? Request it in the feedback!
- Ask Questions:
  - Enter your question in the Ask a Question text area.
  - Click Ask to retrieve relevant information.
- View Results:
  - The system shows the retrieved context and the AI-generated answer.
  - Explore retrieved documents and relevant text via expandable sections.
If you want to customize configurations:
Create a `.env` file:
```
OLLAMA_SERVER_URL=http://localhost:11434
VECTOR_DB_PATH=./rag-chroma
```
Update your code to load `.env` using `python-dotenv` if needed.
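If you prefer not to add `python-dotenv` as a dependency, a minimal stdlib-only loader covers simple files like the one above. This is a sketch with illustrative names, not LuminaDoc's actual config code:

```python
import os

def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

def load_env(path: str = ".env") -> None:
    """Load .env into os.environ without overriding variables already set."""
    if os.path.exists(path):
        for key, value in parse_env(open(path).read()).items():
            os.environ.setdefault(key, value)
```

Note that `os.environ.setdefault` lets real environment variables take precedence over the file, which is the usual dotenv convention.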
- Port Conflicts: Ensure Ollama is running on `http://localhost:11434`.
- Dependency Issues: Run `pip install -r requirements.txt` again.
- Streamlit Errors: Clear the cache: `streamlit cache clear`
- Logs: Check server logs for errors: `ollama serve --verbose`
- Document Upload & Processing:
  - Documents are uploaded and split into smaller semantic chunks.
  - Chunks are embedded using `nomic-embed-text` and stored in ChromaDB.
- User Query:
  - Your query searches the vector database for relevant chunks.
  - Results are re-ranked using a CrossEncoder.
- LLM Response:
  - The retrieved context is passed to the LLM (e.g., `llama3.2`) for final answer generation.
  - The answer is displayed in the Streamlit interface.
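The chunking and re-ranking stages above can be sketched with two pure functions. The chunk size, overlap, and `top_k` values here are illustrative assumptions; LuminaDoc's actual pipeline uses LangChain's splitters and the `sentence-transformers` CrossEncoder for scoring:

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size chunking with overlap: a simple stand-in for semantic splitting."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size]
            for start in range(0, len(text), step)]

def rerank(scored_chunks: list[tuple[str, float]], top_k: int = 3) -> list[str]:
    """Keep the top_k chunks by cross-encoder score (higher means more relevant)."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```

In the real pipeline, the scores come from running the cross-encoder over (query, chunk) pairs, and only the top-ranked chunks are handed to the LLM as context.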
- Explore advanced features like custom pipelines.
- Experiment with different local LLM models.
- Contribute to the project on GitHub!
Illuminate your documents with AI-powered insights—locally and securely. 🚀📚
