RAG Agent 🤖

A production-ready Retrieval-Augmented Generation (RAG) agent built with LangGraph, ChromaDB, and Streamlit. This system enables intelligent Q&A interactions with PDF documents through a modern web interface.

🚀 Features

🔍 Intelligent Document Retrieval: ChromaDB vector database for semantic document search
🧠 Advanced AI: Powered by Google Gemini 2.0 Flash for high-quality responses
⚡ LangGraph Workflow: Structured RAG pipeline with a retrieve → generate flow
🎨 Modern UI: Dark-themed Streamlit interface with custom styling
🛡️ Production Security: API key validation, rate limiting, and error handling
💾 Persistent Storage: Automatic vector database persistence and reuse
🧪 Comprehensive Testing: Full test suite with unit, integration, and performance tests
⚙️ CI/CD Ready: GitHub Actions workflow for automated testing and deployment

🏗️ Architecture

```
┌─────────────────┐    ┌──────────────┐    ┌─────────────────┐
│   Streamlit     │───▶│  RAG Agent   │───▶│   ChromaDB      │
│   Frontend      │    │  (LangGraph) │    │  Vector Store   │
└─────────────────┘    └──────────────┘    └─────────────────┘
        │                       │                     │
        │                       ▼                     │
        │              ┌──────────────┐               │
        └──────────────│ Google Gemini│◀──────────────┘
                       │  2.0 Flash   │
                       └──────────────┘
```
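
The retrieve → generate flow in the diagram can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in agent_core.py: it does not use LangGraph, the word-overlap retriever stands in for ChromaDB's semantic search, the `generate` stub stands in for the Gemini call, and every name (`RagState`, `retrieve`, `generate`, `run_pipeline`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RagState:
    """State passed between pipeline steps (hypothetical, for illustration)."""
    question: str
    context: List[str] = field(default_factory=list)
    answer: str = ""


def retrieve(state: RagState, documents: List[str], k: int = 2) -> RagState:
    """Toy retriever: rank documents by words shared with the question."""
    query_words = set(state.question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    state.context = ranked[:k]
    return state


def generate(state: RagState) -> RagState:
    """Stub generator: a real implementation would send this prompt to the LLM."""
    prompt = "Context:\n" + "\n".join(state.context) + f"\n\nQuestion: {state.question}"
    state.answer = f"[LLM response to a {len(prompt)}-character prompt]"
    return state


def run_pipeline(question: str, documents: List[str], k: int = 2) -> RagState:
    """Run the two steps in order: retrieve, then generate."""
    state = RagState(question=question)
    state = retrieve(state, documents, k=k)
    return generate(state)


docs = [
    "ChromaDB stores document embeddings for semantic search.",
    "Streamlit renders the chat interface.",
    "LangGraph orchestrates the retrieve and generate steps.",
]
pipeline_result = run_pipeline("What does ChromaDB store?", docs, k=1)
print(pipeline_result.context)
```

Keeping the state in one object that each step reads and updates mirrors how a graph-based workflow passes data between nodes, which is why the sketch is structured this way.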

📁 Project Structure

rag_agent/
├── src/
│   ├── agent/
│   │   ├── __init__.py
│   │   └── agent_core.py          # Core RAG logic with LangGraph
│   └── utils/
│       ├── __init__.py
│       ├── load_db.py             # Vector database setup
│       ├── security.py            # API key validation
│       └── rate_limit.py          # Request rate limiting
├── tests/
│   ├── conftest.py                # Test fixtures and configuration
│   ├── test_agent_core.py         # Agent functionality tests
│   ├── test_integration.py        # End-to-end tests
│   └── test_streamlit_app.py      # UI component tests
├── .streamlit/
│   ├── config.toml                # Streamlit theming
│   └── secrets.toml.example       # Environment variables template
├── .github/
│   └── workflows/                 # CI/CD configuration
├── resources/
│   └── *.pdf                      # Sample documents
├── streamlit_app.py               # Main Streamlit application
├── requirements.txt               # Production dependencies
└── pyproject.toml                 # Development configuration

🛠️ Installation

Prerequisites

Python 3.9+
Google AI API key (available from Google AI Studio)

Quick Start

Clone the repository

```
git clone https://github.com/CJRockball/rag_agent.git
cd rag_agent
```

Install dependencies
```
pip install -r requirements.txt
```

Set up environment variables

```
# Copy the example secrets file
cp .streamlit/secrets.toml.example .streamlit/secrets.toml

# Edit with your configuration
nano .streamlit/secrets.toml
```

Add your configuration:

```
GOOGLE_API_KEY = "your-google-api-key-here"
CHROMA_DB_PATH = "src/utils/vectorstore/db_chroma"
COLLECTION_NAME = "v_db"
DOC_PATH = "resources/your-document.pdf"
```

Run the application
```
streamlit run streamlit_app.py
```

💡 Usage

Web Interface

Launch the Streamlit app
The system automatically indexes your PDF documents
Ask questions in natural language
Get AI-powered answers with source context

API Integration

```
from src.agent.agent_core import ask_rag_with_connection
from src.utils.load_db import setup_vector_database

# Initialize database
db = setup_vector_database(
    doc_path="path/to/document.pdf",
    db_path="vectorstore/db_chroma",
    collection_name="documents"
)

# Ask questions
response = ask_rag_with_connection("What is the main topic?", db)
print(response)
```

🧪 Testing

The project includes a comprehensive test suite with 90%+ coverage.

Run Tests

```
# All tests
pytest tests/

# Unit tests only
pytest tests/ -m unit

# Integration tests
pytest tests/ -m integration

# With coverage report
pytest tests/ --cov=. --cov-report=html
```

Test Categories

Unit Tests: Individual component testing with mocking
Integration Tests: End-to-end pipeline validation
Performance Tests: Response time and memory usage
Error Handling: Resilience under failure conditions
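
As an illustration of the "unit tests with mocking" category, the sketch below tests a small function against mocked dependencies, so no vector database or API key is needed. The function `answer_question` and its signature are hypothetical and are not taken from the project's test suite; in a pytest project the test would typically also carry a marker such as `@pytest.mark.unit` so that `-m unit` selects it.

```python
from unittest.mock import Mock


def answer_question(question, retriever, llm):
    """Hypothetical function under test: fetch context, then ask the model."""
    context = "\n".join(retriever(question))
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


def test_answer_question_uses_retrieved_context():
    # Both dependencies are mocks, so the test runs offline.
    retriever = Mock(return_value=["Paris is the capital of France."])
    llm = Mock(return_value="Paris")

    answer = answer_question("What is the capital of France?", retriever, llm)

    assert answer == "Paris"
    retriever.assert_called_once_with("What is the capital of France?")
    # The prompt passed to the model should contain the retrieved text.
    prompt = llm.call_args.args[0]
    assert "Paris is the capital of France." in prompt


# pytest would discover the test by name; calling it directly also works.
test_answer_question_uses_retrieved_context()
```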

⚙️ Configuration

Environment Variables

| Variable | Description | Default |
|---|---|---|
| `GOOGLE_API_KEY` | Google AI API key | None (required) |
| `CHROMA_DB_PATH` | Vector database path | `src/utils/vectorstore/db_chroma` |
| `COLLECTION_NAME` | ChromaDB collection name | `v_db` |
| `DOC_PATH` | Path to PDF document | `resources/document.pdf` |
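
One way to apply these defaults in code is sketched below. It is an assumption about how configuration could be resolved, not the project's actual loader: `Settings`, `DEFAULTS`, and `load_settings` are hypothetical names, and the function accepts any mapping (for example `os.environ` or parsed secrets) rather than a specific Streamlit object.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    chroma_db_path: str
    collection_name: str
    doc_path: str


# Defaults copied from the table above; GOOGLE_API_KEY has none.
DEFAULTS = {
    "CHROMA_DB_PATH": "src/utils/vectorstore/db_chroma",
    "COLLECTION_NAME": "v_db",
    "DOC_PATH": "resources/document.pdf",
}


def load_settings(secrets: Mapping[str, str]) -> Settings:
    """Build Settings from a mapping, failing fast if the API key is missing."""
    api_key = secrets.get("GOOGLE_API_KEY", "").strip()
    if not api_key:
        raise ValueError("GOOGLE_API_KEY is required but was not provided")
    # Values supplied by the caller override the documented defaults.
    merged = {**DEFAULTS, **{k: v for k, v in secrets.items() if k in DEFAULTS}}
    return Settings(
        google_api_key=api_key,
        chroma_db_path=merged["CHROMA_DB_PATH"],
        collection_name=merged["COLLECTION_NAME"],
        doc_path=merged["DOC_PATH"],
    )


settings = load_settings(
    {"GOOGLE_API_KEY": "your-google-api-key-here", "COLLECTION_NAME": "docs"}
)
print(settings.collection_name, settings.chroma_db_path)
```

Failing at startup when the key is absent gives a clearer error than a failed request later on.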

Rate Limiting

Built-in rate limiting prevents API abuse:

10 requests per minute per session
0.1 requests/second to the LLM, i.e. one request every 10 seconds (Gemini free tier)
Automatic backoff and retry logic
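
The per-session limit of 10 requests per minute can be sketched as a sliding-window counter. This is a minimal illustration and not necessarily how rate_limit.py is implemented; the class name `SessionRateLimiter` and the injectable `clock` parameter are assumptions introduced for the example.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SessionRateLimiter:
    """Allow at most `max_requests` per `window_seconds` for each session ID."""

    def __init__(
        self,
        max_requests: int = 10,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, session_id: str) -> bool:
        now = self._clock()
        timestamps = self._history[session_id]
        # Drop timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True


# Demonstrate with a fake clock so the example runs instantly.
fake_now = [0.0]
limiter = SessionRateLimiter(clock=lambda: fake_now[0])
results = [limiter.allow("session-a") for _ in range(11)]
print(results.count(True), results[-1])  # 10 allowed, the 11th is rejected

fake_now[0] = 60.0  # one full window later, the session may send again
print(limiter.allow("session-a"))
```

Injecting the clock keeps the limiter testable without real waiting; production code would simply use the default `time.monotonic`.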

Customization

Theming: Modify `.streamlit/config.toml` for UI customization
Chunking: Adjust `chunk_size` and `chunk_overlap` in `load_db.py`
Retrieval: Change the `k` parameter in `agent_core.py` for more/fewer results
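
To show how `chunk_size` and `chunk_overlap` interact, here is a simplified character-based splitter. It is a sketch for intuition only: load_db.py may use a library text splitter with different boundary rules, and the function name `chunk_text` is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters sharing chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
print(chunk_text(sample, chunk_size=10, chunk_overlap=2))
# ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
print(len(chunk_text(sample, chunk_size=10, chunk_overlap=5)))
# 5
```

A larger overlap preserves more context across chunk boundaries but produces more chunks to embed and store, so the two settings trade retrieval quality against index size.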

🔧 Development

Development Setup

```
# Install development dependencies
pip install -r dev-requirements.txt

# Install pre-commit hooks
pre-commit install

# Run code formatting
black src/ tests/
```

Adding New Features

Implement feature with proper typing
Add comprehensive tests
Update documentation
Ensure CI/CD passes

Code Quality

Black: Code formatting
pytest: Testing framework
pre-commit: Git hooks for quality checks
Type hints: Full typing support

🚀 Deployment

Streamlit Cloud

Fork this repository
Connect to Streamlit Cloud
Add secrets in deployment settings
Deploy automatically

Docker (Optional)

```
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "streamlit_app.py"]
```

🤝 Contributing

Fork the repository
Create a feature branch (`git checkout -b feature/amazing-feature`)
Make your changes with tests
Commit your changes (`git commit -m 'Add amazing feature'`)
Push to the branch (`git push origin feature/amazing-feature`)
Open a Pull Request

Contribution Guidelines

Follow existing code style and patterns
Add tests for new functionality
Update documentation as needed
Ensure all CI/CD checks pass

📋 Dependencies

Core Dependencies

streamlit: Web interface framework
langchain-core: LangChain core components
langchain-google-genai: Google AI integration
langchain-chroma: ChromaDB vector store
langgraph: Agent workflow orchestration
pypdf: PDF document processing

Development Dependencies

pytest: Testing framework
pytest-cov: Coverage reporting
black: Code formatting
pre-commit: Git hooks

🐛 Troubleshooting

Common Issues

"API key not found"

Ensure `GOOGLE_API_KEY` is set in `.streamlit/secrets.toml`
Verify that the API key starts with "AIza"
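
A quick format check along these lines can catch copy-paste mistakes before any request is sent. This is a hedged sketch, not the contents of security.py: the function name is hypothetical, the whitespace check is an added assumption, and passing the check does not mean the key is valid or has quota.

```python
def looks_like_google_api_key(key: str) -> bool:
    """Cheap sanity check: a string with the "AIza" prefix and no whitespace."""
    if not isinstance(key, str):
        return False
    if any(ch.isspace() for ch in key):
        return False  # stray spaces or newlines usually come from copy-paste
    return key.startswith("AIza")


print(looks_like_google_api_key("AIzaEXAMPLE"))   # True
print(looks_like_google_api_key(" AIzaEXAMPLE"))  # False (leading space)
print(looks_like_google_api_key(""))              # False
```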

"Database connection failed"

Check `CHROMA_DB_PATH` permissions
Ensure sufficient disk space for vector database

"Rate limit exceeded"

Wait 60 seconds for rate limit reset
Check Gemini API quotas in Google AI Studio
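
When calling the agent programmatically, waiting and retrying can be automated with exponential backoff. The sketch below is illustrative only: `RateLimitError` is a hypothetical stand-in for whatever exception the LLM client raises when throttled, and `call_with_backoff` with its parameters is not part of the project's API.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the error raised when the API throttles a call."""


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on RateLimitError with delays of base_delay * 2**attempt."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


# Demonstrate with a function that fails twice; delays are recorded, not slept.
calls = {"count": 0}
delays: List[float] = []


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("throttled")
    return "answer"


backoff_result = call_with_backoff(flaky, sleep=delays.append)
print(backoff_result, delays)  # answer [1.0, 2.0]
```

Doubling the delay after each failure reduces pressure on the API quickly while still recovering fast from short throttling windows.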

"Module import errors"

Verify all dependencies are installed: `pip install -r requirements.txt`
Check the Python path: `export PYTHONPATH=$PYTHONPATH:$(pwd)`

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

LangChain for the RAG framework
Google AI for Gemini API and embeddings
ChromaDB for vector database capabilities
Streamlit for the interactive web interface

📞 Support

For issues and questions:

Open an Issue
Check the Documentation
Review existing Discussions

Built with ❤️ for intelligent document interaction
