A comprehensive AI-powered platform that helps developers and organizations analyze, understand, and interact with legacy codebases through intelligent parsing, semantic chunking, and natural language queries.
- GitHub Integration: Connect and analyze any GitHub repository (Future Scope)
- Smart Code Parsing: Semantically chunk code into meaningful units (functions, classes, blocks)
- AI-Powered Analysis: Extract features and metadata using Google Gemini AI
- Knowledge Graph: Build interactive visualizations in Neo4j to map code relationships
- Natural Language Queries: Ask questions about your codebase using GraphRAG
- Web Interface: User-friendly React.js frontend for seamless interaction
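To make the Smart Code Parsing feature more concrete, here is a minimal sketch of regex-based chunking for Perl subroutines. It is not the project's regex_chunker: the function name `chunk_perl_subs` and the pattern are illustrative assumptions, and a real chunker would also need to handle nested braces, POD blocks, and anonymous subs.

```python
import re
from typing import Dict, List

# Illustrative pattern: a line that opens a named Perl subroutine.
SUB_START = re.compile(r"^[ \t]*sub\s+(\w+)", re.MULTILINE)


def chunk_perl_subs(source: str) -> List[Dict[str, str]]:
    """Split Perl source into one chunk per named subroutine.

    Each chunk runs from its `sub name` line up to the next `sub` line
    (or the end of the file), so comments that follow a subroutine stay
    attached to it.
    """
    matches = list(SUB_START.finditer(source))
    chunks = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(source)
        chunks.append({
            "name": match.group(1),
            "code": source[match.start():end].strip(),
        })
    return chunks
```

Code that appears before the first subroutine (for example `use strict;`) is skipped in this sketch; a fuller chunker might keep it as a separate "header" chunk to preserve context.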
CodeScrybe-Legacy-Repository-AI/
├── Server/                  # Backend API (FastAPI)
│   ├── backend/
│   │   ├── agents/          # Core parsing, graph builder, and analysis modules
│   │   ├── database/        # Database models and services
│   │   ├── app.py           # FastAPI application
│   │   └── pipeline.py      # End-to-end processing pipeline
│   ├── venv410/             # Python virtual environment
│   ├── .env                 # Environment variables
│   └── README.md            # Server-specific documentation
├── Client/                  # Frontend (React.js)
│   ├── src/                 # All UI and business logic files
│   ├── public/
│   ├── package.json
│   └── README.md            # Client-specific documentation
├── .gitignore
└── README.md                # This file
| Language | Extensions | Parser Type | Chunk Type |
|---|---|---|---|
| C, C++ | .c, .cpp, .h, .hpp | regex_chunker | Function + Context |
| Java | .java | regex_chunker | Function + Context |
| Shell | .sh, .bash | regex_chunker | Function-like blocks |
| Perl | .pl, .pm | regex_chunker | Subroutines |
| COBOL | .cob, .cbl, .cpy | sas_cobol_chunkers | Paragraphs |
| SAS | .sas | sas_cobol_chunkers | PROC & DATA blocks |
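
The table above can be read as a lookup from file extension to parser. The sketch below transcribes it into a dictionary; the names `PARSER_BY_EXTENSION` and `pick_parser` are hypothetical and are not taken from the project's code, which may dispatch differently.

```python
from pathlib import Path
from typing import Optional

# Extension -> parser name, transcribed from the supported-languages table.
PARSER_BY_EXTENSION = {
    ".c": "regex_chunker",
    ".cpp": "regex_chunker",
    ".h": "regex_chunker",
    ".hpp": "regex_chunker",
    ".java": "regex_chunker",
    ".sh": "regex_chunker",
    ".bash": "regex_chunker",
    ".pl": "regex_chunker",
    ".pm": "regex_chunker",
    ".cob": "sas_cobol_chunkers",
    ".cbl": "sas_cobol_chunkers",
    ".cpy": "sas_cobol_chunkers",
    ".sas": "sas_cobol_chunkers",
}


def pick_parser(filename: str) -> Optional[str]:
    """Return the parser name for a file, or None if the extension is unsupported."""
    # Legacy repositories often use upper-case extensions (e.g. PAYROLL.CBL),
    # so the lookup is case-insensitive.
    return PARSER_BY_EXTENSION.get(Path(filename).suffix.lower())
```

For example, `pick_parser("PAYROLL.CBL")` returns `"sas_cobol_chunkers"`, while a file with an unlisted extension returns `None` and can be skipped.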
- Python 3.8+ for backend
- Node.js 18+ for frontend
- Neo4j Aura account (free tier available)
- Google AI Studio API key
git clone <repository-url>
cd CodeScrybe-Legacy-Repository-AI

Create a .env file in the Server directory:
# Neo4j Configuration (Aura instances typically use a neo4j+s:// URI; copy the scheme shown in your Aura console)
NEO4J_URI=bolt://<your-neo4j-uri>
NEO4J_USERNAME=<your-username>
NEO4J_PASSWORD=<your-password>
# Google Gemini AI
GEMINI_API_KEY=<your-gemini-api-key>

Neo4j Aura:
- Visit Neo4j Aura
- Create a free account and new instance
- Copy the connection details to your .env file
Google Gemini:
- Go to Google AI Studio
- Create a new API key
- Add it to your .env file
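
Before starting the backend, it can help to confirm that none of the four settings is still empty or a placeholder. The sketch below is an assumption-laden helper, not part of the project: `missing_settings` is a hypothetical name, and it inspects the process environment, so it assumes the .env values have already been loaded or exported there (how the backend loads .env is not stated in this README).

```python
import os
import re
from typing import List, Mapping

# Keys taken from the .env template above.
REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "GEMINI_API_KEY")

# Matches template placeholders such as <your-password>.
PLACEHOLDER = re.compile(r"<your-[^>]+>")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required keys that are unset, empty, or still placeholders."""
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "").strip()
        if not value or PLACEHOLDER.search(value):
            missing.append(key)
    return missing
```

Calling `missing_settings()` with no arguments checks the current environment; an empty list means every key has a real-looking value, though it does not prove the credentials are valid.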
# Create virtual environment (if venv410 doesn't work)
python -m venv Server/venv
source Server/venv/bin/activate # Linux/Mac
# or
Server\venv\Scripts\activate.bat # Windows
# Install dependencies
pip install -r Server/requirements.txt
# Start the backend server
uvicorn Server.backend.app:app --reload

The backend will be available at: http://127.0.0.1:8000
# Navigate to client directory (from root)
cd Client
# Install dependencies
npm install
# Start the development server
npm start

The frontend will be available at: http://localhost:3000
GET /health
Visit http://127.0.0.1:8000/docs for complete API documentation.
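
To verify from a script that the backend is up, the health endpoint can be polled with the Python standard library. This is a minimal sketch: `health_url` and `check_health` are hypothetical helper names, and because the response body of /health is not documented here, only the HTTP status code is checked.

```python
import urllib.request
from urllib.parse import urljoin


def health_url(base_url: str = "http://127.0.0.1:8000") -> str:
    """Build the /health URL, tolerating a trailing slash on the base URL."""
    return urljoin(base_url.rstrip("/") + "/", "health")


def check_health(base_url: str = "http://127.0.0.1:8000", timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib's URLError and HTTPError subclass OSError, as do refused
        # connections and socket timeouts, so all of them land here.
        return False
```

A `False` result means either the server is not running at that address or it returned a non-200 status; the interactive docs at /docs are the place to check the exact response schema.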
- Test Individual Components:
cd Server
source venv410/bin/activate # or your venv
# Test repo cloning
python backend/agents/repo_loader.py
# Test code parsing
python -m backend.agents.code_parsers
# Test NER extraction
python backend/agents/ner_extractor.py
# Test knowledge graph building
python backend/agents/graph_builder.py
# Test GraphRAG queries
python backend/agents/graph_rag.py
- Run Complete Pipeline:
python backend/pipeline.py
cd Client
npm start # Start development server

Virtual Environment Issues:
- If venv410 doesn't work, create a new virtual environment
- Ensure Python 3.8+ is installed
- Check path issues when moving project directories
API Connection Issues:
- Verify .env file is in the Server directory
- Check Neo4j instance is running
- Validate Gemini API key is active
Port Conflicts:
- Backend default: 8000
- Frontend default: 3000
- Modify ports in respective configuration files if needed
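
To find out whether a port conflict is the cause of a startup failure, a quick TCP probe of the two default ports can help. The sketch below uses only the standard library; `port_in_use` is a hypothetical helper name, and it only detects processes that are accepting connections on the given host.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an errno value on failure,
        # instead of raising for ordinary connection errors.
        return sock.connect_ex((host, port)) == 0


# Defaults taken from the list above.
for name, port in (("backend", 8000), ("frontend", 3000)):
    state = "in use" if port_in_use(port) else "free"
    print(f"{name} port {port}: {state}")
```

If a port is reported as in use before you start the servers, either stop the other process or run the affected server on a different port.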
- Start both backend and frontend servers
- Navigate to http://localhost:3000/dashboard
- Click on Add Repo
- Enter a GitHub repository URL
- Wait for analysis to complete
- Explore the knowledge graph and ask questions in the chat section
- "What are the main functions in this codebase?"
- "Which files depend on the database module?"
- "Show me all the API endpoints defined"
- "What security-related functions are implemented?"
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Google Gemini AI for intelligent code analysis
- Neo4j for graph database capabilities
- FastAPI for robust backend development
- React.js for modern frontend development
CodeScrybe - Legacy Repository AI | Transforming Legacy Code Understanding Through AI
Need Help? Check the individual README files in the Server/ and Client/ directories for more detailed setup instructions.