This project delivers a full-stack AI-powered query system that can ingest employee databases and unstructured documents, automatically discover schemas, and serve natural language queries with production-ready optimizations.
- Backend: FastAPI service providing ingestion, schema discovery, and query endpoints
- Frontend: React application for database configuration, document uploads, query interface, and analytics
- Document Processing: Multi-format ingestion with adaptive chunking and embedding generation
- Query Engine: Unified SQL and semantic retrieval with caching, optimization, and monitoring hooks
- Python 3.11+
- Node.js 18+
- PostgreSQL (optional, SQLite works for testing)
- Create a virtual environment and install dependencies:

  ```bash
  cd backend
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  pip install -r ../requirements.txt
  ```

- Run the FastAPI server (dev):

  ```bash
  uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
  ```
- (Optional) Enable background workers with Redis. Set in `config.yml`:

  ```yaml
  queue:
    enabled: true
    redis_url: "redis://localhost:6379/0"
    queue_name: "ingestion"
  ```

  Then start Redis and a worker in another shell:
  ```bash
  # Start worker using config.yml
  python backend/worker.py
  ```
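  The worker's internals aren't shown in this README; as a rough sketch (assuming, without confirmation, that jobs arrive on the configured Redis list as JSON payloads), a minimal consumer loop could look like:

  ```python
  # Hypothetical consumer loop; the real backend/worker.py may differ.
  import json

  import redis

  r = redis.Redis.from_url("redis://localhost:6379/0")

  while True:
      # BLPOP blocks until a job is available, returning (queue_name, payload).
      _queue, payload = r.blpop("ingestion")
      job = json.loads(payload)
      print(f"processing job {job.get('id')} ...")
      # ...hand the job to the ingestion pipeline here...
  ```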
- Test the API:

  ```bash
  curl http://localhost:8000/health
  ```

- Install Node.js dependencies:
  ```bash
  cd frontend
  npm install
  ```

- Start the development server:

  ```bash
  npm run dev
  ```

- Point the frontend at the deployed API (local dev). Create `frontend/.env.local` so Vite knows which backend URL to call, and restart `npm run dev` after saving:

  ```
  VITE_API_BASE_URL=http://localhost:8000
  ```

- Access the web interface: open http://localhost:5173 in your browser.
- Run with Docker Compose:

  ```bash
  docker-compose up --build
  ```

  This starts:

  - Backend API: http://localhost:8000
  - Frontend UI: http://localhost:5173
  - PostgreSQL: localhost:5432
  - Redis: localhost:6379
- Set an environment variable in your hosting provider (Vercel → Project Settings → Environment Variables):

  ```
  VITE_API_BASE_URL=""
  ```

- Trigger a new deployment so the SPA builds with the production API base URL.
To persist embeddings and enable scalable semantic search, you can switch the vector store to PgVector.
- Ensure your Postgres has the `pgvector` extension installed.
- Update `config.yml`:

  ```yaml
  vector_store:
    type: pgvector
    connection_string: ${VECTOR_DB_URL}  # e.g., postgresql+psycopg://user:pass@localhost:5432/embeddings
    table_name: document_chunks
  ```

If the PgVector initialization fails, the app will fall back to the in-memory FAISS store.
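The fallback behavior can be pictured roughly like this (a sketch only; `PgVectorStore`, `FaissStore`, and `build_vector_store` are hypothetical names, not necessarily those in `backend/api/services/vector_store.py`):

```python
# Illustrative only: these stand-ins mimic the classes that would live in
# backend/api/services/vector_store.py; the names here are hypothetical.
import logging

class PgVectorStore:
    def __init__(self, connection_string: str, table_name: str):
        # The real store would connect and ensure the pgvector table exists;
        # this stub just simulates an initialization failure.
        raise ConnectionError("pgvector unavailable in this sketch")

class FaissStore:
    """In-memory fallback; embeddings are lost on restart."""

def build_vector_store(config: dict):
    vs = config.get("vector_store", {})
    if vs.get("type") == "pgvector":
        try:
            return PgVectorStore(vs["connection_string"], vs.get("table_name", "document_chunks"))
        except Exception:
            logging.exception("PgVector init failed; falling back to FAISS")
    return FaissStore()

store = build_vector_store({"vector_store": {"type": "pgvector", "connection_string": "..."}})
print(type(store).__name__)  # -> FaissStore
```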
Ingest an existing database:

```bash
curl -X POST http://localhost:8000/api/ingest/database \
  -H "Content-Type: application/json" \
  -d '{"connection_string": "postgresql+psycopg://user:password@localhost:5432/employees"}'
```

Upload unstructured documents:

```bash
curl -X POST http://localhost:8000/api/ingest/documents \
  -F "files=@resume.pdf" \
  -F "files=@handbook.docx" \
  -F "connection_string=postgresql+psycopg://user:password@localhost:5432/employees"
```
You can quickly test CSV uploads using the provided sample files in `sample-data/`:

```bash
curl -X POST http://localhost:8000/api/ingest/documents \
  -F "files=@sample-data/employees.csv" \
  -F "files=@sample-data/departments.csv" \
  -F "files=@sample-data/salaries.csv" \
  -F "files=@sample-data/skills.csv" \
  -F "connection_string=postgresql+psycopg://user:password@localhost:5432/employees"
```

Then poll the ingestion job status:

```bash
curl -s http://localhost:8000/api/ingest/jobs | jq
```
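Polling also works from Python (a sketch; the response shape, a JSON list of jobs with a `status` field, is an assumption about `/api/ingest/jobs`):

```python
# Hypothetical polling helper; adjust field names to the real API response.
import time

import requests

def wait_for_ingestion(base_url: str = "http://localhost:8000", timeout_s: int = 300):
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        jobs = requests.get(f"{base_url}/api/ingest/jobs", timeout=10).json()
        if all(job.get("status") in ("completed", "failed") for job in jobs):
            return jobs
        time.sleep(2)
    raise TimeoutError("ingestion jobs did not finish in time")

print(wait_for_ingestion())
```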
You can upload CSV/XLSX files and have the backend create/append tables in your Postgres (e.g., Neon/Supabase):

```bash
curl -X POST http://localhost:8000/api/ingest/tabular \
  -F "files=@sample-data/employees.csv" \
  -F "files=@sample-data/departments.csv" \
  -F "connection_string=${DATABASE_URL}" \
  -F "if_exists=append"
```

Optional form fields:

- `if_exists`: `append` (default), `replace`, or `fail`
- `table_name`: defaults to the sanitized filename
- `delimiter`: CSV delimiter; defaults to `,` (`.tsv` uses `\t`)
- `sheet_name`: Excel sheet to load, e.g. `Sheet1`

This returns a job id you can monitor with `/api/ingest/jobs`. Large CSVs are loaded in chunks to avoid memory spikes. By default, tables are appended; use `if_exists=replace` to overwrite.
Ask a natural-language question:

```bash
curl -X POST http://localhost:8000/api/query \
  -H "Content-Type: application/json" \
  -d '{"query": "How many employees hired this year?", "top_k": 5}'
```

The query engine can also perform safe UPDATE/INSERT/DELETE operations generated by the LLM. The system avoids adding LIMIT to DML and commits the transaction if successful.
```bash
curl -X POST http://localhost:8000/api/query \
  -H "Content-Type: application/json" \
  -d '{
    "connection_string": "postgresql+psycopg://user:password@localhost:5432/employees",
    "query": "increase the salary of hr department by 10 percent"
  }'
```

Example response:

```json
{
  "results": [{"affected_rows": 3, "status": "success"}],
  "query_type": "sql",
  "metrics": {"response_ms": 120.5, "cache_hit": false}
}
```

Note: Always verify the result (`affected_rows`), and consider running within a transaction against non-production databases.
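The DML safety behavior can be sketched like this (illustrative only; the real logic lives in the query engine service and may differ, and `execute_generated_sql` is a hypothetical name):

```python
# Hypothetical execution guard, illustrating the documented behavior:
# LIMIT is appended only to SELECTs; DML is committed on success.
from sqlalchemy import create_engine, text

def execute_generated_sql(sql: str, connection_string: str, limit: int = 100):
    first_word = sql.lstrip().split(None, 1)[0].upper()
    is_dml = first_word in {"UPDATE", "INSERT", "DELETE"}
    if not is_dml and "LIMIT" not in sql.upper():
        sql = f"{sql.rstrip().rstrip(';')} LIMIT {limit}"
    engine = create_engine(connection_string)
    with engine.begin() as conn:  # commits on success, rolls back on error
        result = conn.execute(text(sql))
        if is_dml:
            return [{"affected_rows": result.rowcount, "status": "success"}]
        return [dict(row._mapping) for row in result]
```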
If you uploaded documents and lost the job id, you can list current jobs:

```bash
curl -s http://localhost:8000/api/ingest/jobs
```

Run the backend test suite:

```bash
cd backend
python -m pytest tests/ -v
```
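Tests follow the usual pytest layout; a minimal example of the style (hypothetical, not an actual repo test; it assumes the app is importable as `backend.main.app` and exposes `/health`, per the uvicorn command above):

```python
# tests/test_health.py -- illustrative sketch.
from fastapi.testclient import TestClient

from backend.main import app

client = TestClient(app)

def test_health_returns_ok():
    response = client.get("/health")
    assert response.status_code == 200
```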
Edit `config.yml` to customize:

- Database connection pooling
- Embedding model settings
- Cache configuration
- Logging levels
Notes:
- When using PostgreSQL with SQLAlchemy 2.x, prefer the psycopg v3 driver by specifying `postgresql+psycopg://...` in the connection string. This avoids requiring `psycopg2`.
- If running the backend inside Docker Compose, the connection string host should be the service name: `postgresql+psycopg://user:password@postgres:5432/employees`. If running the backend locally against Dockerized Postgres on your machine, use `postgresql+psycopg://user:password@localhost:5432/employees`.
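A quick way to confirm the driver selection from Python (a minimal sketch using the local connection string above):

```python
# Verifies SQLAlchemy resolves the psycopg v3 driver; psycopg2 is never needed.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg://user:password@localhost:5432/employees")

with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())  # prints 1
```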
This project can generate SQL with an LLM (Llama via Groq) for robust natural language understanding, including JOINs and DML. Configure your key in `config.yml`:

```yaml
groq:
  api_key: "<YOUR_GROQ_API_KEY>"
  model: "llama-3.1-8b-instant"
```

Security tip: Prefer setting `GROQ_API_KEY` via an environment variable or secrets management in production.
```text
backend/
├── api/
│   ├── routes/                    # FastAPI endpoints
│   ├── services/                  # Business logic
│   │   ├── schema_discovery.py    # Auto-discover DB schemas
│   │   ├── query_engine.py        # Unified query processing
│   │   ├── document_processor.py  # Document ingestion
│   │   └── vector_store.py        # Semantic search
│   └── models/                    # Pydantic schemas
└── tests/                         # Unit tests
```

```text
frontend/src/
├── components/
│   ├── AppLayout.tsx  # Main layout
│   └── contexts/      # React contexts
├── App.tsx            # Root component
└── main.tsx           # Entry point
```
✅ Schema Discovery: Automatically detects table structures and relationships
✅ Natural Language Queries: Converts English to SQL and semantic search
✅ Document Processing: Handles PDFs, Word docs, CSVs with intelligent chunking
✅ Caching: TTL-based response caching for performance (see the sketch after this list)
✅ Multi-format Results: Returns structured data and document excerpts
✅ Query History: Tracks and caches previous queries
✅ LLM-powered SQL: Uses Groq (Llama) to translate natural language to SQL
✅ Data Modification (DML): Natural language UPDATE/INSERT/DELETE with safety guards (no LIMIT on DML, affected_rows returned)
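The TTL-based caching above can be pictured roughly like this (a sketch only; the project's actual cache implementation may differ, and `config.yml` holds the real knobs):

```python
# Hypothetical TTL cache for query responses: entries expire after ttl_s
# seconds and are evicted lazily on the next lookup.
import time

class TTLCache:
    def __init__(self, ttl_s: float = 300.0):
        self.ttl_s = ttl_s
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # lazily evict the expired entry
            return None
        return value

    def set(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic() + self.ttl_s, value)
```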
- `No module named 'psycopg2'`
  - Cause: SQLAlchemy is attempting to use the old psycopg2 driver when your environment only has psycopg v3 installed (as specified in `requirements.txt`).
  - Fix: Use a connection string that explicitly selects psycopg v3: `postgresql+psycopg://user:password@host:5432/dbname`.
  - Optional: You could install `psycopg2-binary`, but psycopg2 may not support the newest Python versions promptly (e.g., 3.13). Prefer psycopg v3.

- psycopg import errors on Windows (psycopg v3): "couldn't import psycopg 'c'/'binary' implementation" or "libpq library not found"
  - Cause: The base `psycopg` package tries to load a compiled implementation and falls back to a pure-Python variant, which requires `libpq` on your system. On Windows, `libpq` typically isn't present by default.
  - Fix: Install the binary wheel that bundles `libpq`. We pin this in `requirements.txt` as `psycopg[binary]==3.2.3`. If you already installed dependencies before this change, update your venv:

    ```bash
    # from repo root, using the existing .venv
    source .venv/Scripts/activate  # on Windows bash; PowerShell: .venv\Scripts\Activate.ps1
    pip uninstall -y psycopg2 psycopg2-binary || true
    pip install --upgrade "psycopg[binary]==3.2.3"
    ```

  - Ensure your connection string uses psycopg v3: `postgresql+psycopg://...` (not `postgresql://` and not `+psycopg2`).