An AI Hackathon MVP that moves beyond traditional resume screening.
Skills are verified, candidates are anonymized to prevent bias, and every hiring decision is explained by AI agents.
- What Problem Does This Solve?
- How It Works – End-to-End Flow
- Architecture Diagram
- Module-by-Module Breakdown
- The Three AI Agents
- Composite Scoring Formula
- API Reference
- Project Structure
- Tech Stack
- Quick Start
- Environment Variables
- Demo / Hackathon Fallback
Traditional recruitment tools suffer from three critical flaws:
| Problem | Traditional ATS | This Platform |
|---|---|---|
| Resume lying | Takes skills at face value | Verifies skills against real GitHub code |
| Hiring bias | Exposes name, gender, location | Auto-strips all PII before analysis |
| Black-box decisions | Shows only a score | 3 AI agents write a recruiter justification |
This platform builds a Multi-Agent System (MAS) where specialized AI agents inspect candidates from multiple angles and produce a ranked, scored, and explained pipeline, in seconds.
```
RECRUITER                     SYSTEM                          AI AGENTS
─────────                     ──────                          ─────────
Upload PDF ────────────────▶ pdf_parser.py
                                 │  (raw text)
                                 ▼
                             anonymizer_agent.py ◀── LangChain + LLM
                                 │  (PII-free JSON profile)
Enter GitHub URL ──────────▶ github_scraper.py
                                 │  (real repo deps, languages)
Enter CF Handle ───────────▶ codeforces_scraper.py
                                 │  (rating, rank, contest history)
                                 ▼
                            ┌───────────┐
                            │ vector_db │ ◀── ChromaDB (cosine similarity)
                            └─────┬─────┘
Paste Job Description ─────▶ get_top_matches()
                                 │  (top-k candidates ranked by similarity)
                                 ▼
                             evaluation_agents.py
                              ├── TechLeadAgent       → skill verification
                              ├── TrajectoryAgent     → learning velocity (1–5)
                              └── ExplainabilityAgent → 3-sentence justification
                                 │
                                 ▼
                             Synthesizer (weighted final_score)
                                 │
                                 ▼
                             Next.js Dashboard ◀── recruiter sees ranked cards
```
```
┌─────────────────────────────────────────────────────────────────┐
│                        FRONTEND (Next.js)                       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐   │
│  │ UploadPanel  │  │   JDPanel    │  │  CandidateCard ×N    │   │
│  │ drag & drop  │  │ textarea JD  │  │ score ring + agents  │   │
│  └──────┬───────┘  └──────┬───────┘  └──────────────────────┘   │
└─────────┼─────────────────┼─────────────────────────────────────┘
          │ POST /api/      │ POST /api/match
          │ upload-candidate│
┌─────────┼─────────────────┼─────────────────────────────────────┐
│         ▼                 ▼      FASTAPI BACKEND (main.py)      │
│  ┌──────────────┐   ┌───────────┐                               │
│  │ pdf_parser   │   │ vector_db │                               │
│  │ anonymizer   │──▶│ ChromaDB  │                               │
│  │ gh_scraper   │   └─────┬─────┘                               │
│  │ cf_scraper   │         │ top-k matches                       │
│  └──────────────┘         ▼                                     │
│          ┌──────────────────────────────┐                       │
│          │     evaluation_agents.py     │                       │
│          │   TechLead + Trajectory +    │                       │
│          │  Explainability + Synthesizer│                       │
│          └──────────────────────────────┘                       │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
             LLM API (OpenAI GPT-4o / Google Gemini)
             External APIs (GitHub REST, Codeforces API)
             ChromaDB (local disk persistence)
```
Extracts raw text from uploaded PDF resumes using a two-library strategy:
- PyPDF2 – fast, handles standard text-based PDFs
- pdfminer.six (fallback) – slower but handles complex layouts, multi-column, non-standard encoding

Exposes two functions:

- `extract_text_from_pdf(file_path)` – file on disk
- `extract_text_from_pdf_bytes(pdf_bytes)` – in-memory bytes (for FastAPI `UploadFile`)
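The two-library strategy can be sketched generically. Here the extractor callables stand in for the PyPDF2 and pdfminer.six code paths; the function name and injection style are illustrative, not the module's exact API:

```python
def extract_with_fallback(pdf_bytes: bytes, extractors) -> str:
    """Try each extractor in order; return the first non-empty text.

    In the real module the first extractor would wrap PyPDF2 and the
    second pdfminer.six; here they are injected so the fallback
    strategy is visible on its own.
    """
    last_error = None
    for extract in extractors:
        try:
            text = extract(pdf_bytes)
            if text and text.strip():
                return text
        except Exception as exc:  # this parser choked on the PDF; try the next
            last_error = exc
    raise ValueError(f"all extractors failed: {last_error!r}")
```

The pattern also covers the case where the fast parser succeeds but returns empty text (common with scanned or multi-column PDFs): an empty result falls through to the next extractor just like an exception does.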
A LangChain LCEL chain that takes raw resume text and:
- Strips all PII: names, emails, phone numbers, addresses, LinkedIn URLs, gender markers, age/DOB
- Structures the output into a strict JSON schema with skills, work history, projects, education, and certifications
- Includes robust JSON parsing that handles markdown fences and extra prose from LLMs
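The fence-stripping step described above can be sketched as follows (function name hypothetical; the real chain's parser may differ):

```python
import json
import re

def parse_llm_json(raw: str) -> dict:
    """Parse a JSON object out of an LLM reply that may be wrapped in
    markdown fences or surrounded by extra prose."""
    # Drop ```json ... ``` fences if present
    cleaned = re.sub(r"```(?:json)?", "", raw).strip()
    # Grab the outermost {...} span in case the model added commentary
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in LLM output")
    return json.loads(cleaned[start : end + 1])

reply = 'Here is the profile:\n```json\n{"skills": ["Python"], "pii_removed": true}\n```'
profile = parse_llm_json(reply)
```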
Why anonymize? Removing identity signals forces the system (and ultimately the recruiter) to evaluate purely on capability, reducing unconscious bias.
Output schema:
```json
{
  "skills": ["Python", "FastAPI"],
  "years_of_experience": 5,
  "education": [{"degree": "...", "institution": "[Redacted]", "year": 2020}],
  "work_history": [{"role": "...", "company": "[Redacted]", "duration_months": 24, "key_achievements": [...]}],
  "certifications": [...],
  "projects": [{"name": "...", "description": "...", "tech_stack": [...]}],
  "pii_removed": true
}
```

Fetches real evidence of a candidate's coding activity from GitHub's REST API:
- Profile stats – public repos count, followers, following
- Top 5 repos – sorted by stars, including language, description, and topics
- Language distribution – percentage breakdown across all public repos
- Dependency verification – fetches `requirements.txt` and `package.json` from each repo and parses the actual libraries used
This is the anti-lying layer: a candidate claiming "I know FastAPI" is checked against whether any of their repos actually depend on `fastapi` in their package manifests.
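A minimal sketch of the manifest-parsing side of this check (the real parser may handle more of the requirements syntax than this):

```python
import re

def parse_requirements(text: str) -> list[str]:
    """Pull bare, lowercased package names out of a requirements.txt body,
    dropping comments, option lines, extras, and version specifiers."""
    names = []
    for line in text.splitlines():
        line = line.split("#")[0].strip()      # strip inline comments
        if not line or line.startswith("-"):   # skip blanks and -r/-e options
            continue
        # Cut at the first extras bracket, version operator, or marker
        name = re.split(r"[\[=<>~!; ]", line)[0]
        if name:
            names.append(name.lower())
    return names

parse_requirements("fastapi[all]==0.110.0\nuvicorn>=0.29")
# → ["fastapi", "uvicorn"]
```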
Requires a GitHub Personal Access Token in .env (increases rate limit from 60 to 5,000 requests/hour).
Fetches competitive programming data from the Codeforces public API:
- `user.info` – current/max rating, rank title, contribution score
- `user.rating` – full contest history (count, recent deltas)
- `user.status` – approximation of solved problem count (distinct accepted submissions)

Safe-by-design: if a handle doesn't exist or the API is unreachable, it returns a clean dict with null values and an error message – it never crashes the pipeline.

Codeforces rank titles: Newbie → Pupil → Specialist → Expert → Candidate Master → Master → International Master → Grandmaster → International Grandmaster → Legendary Grandmaster
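The safe-by-design shape can be sketched with only the standard library. The field names (`rating`, `maxRating`, `rank`) come from the public Codeforces `user.info` method; the function names and output keys here are illustrative:

```python
import json
import urllib.parse
import urllib.request

def normalize_cf_response(handle: str, data: dict) -> dict:
    """Flatten a raw user.info payload into a stable dict; never raises."""
    out = {"handle": handle, "rating": None, "max_rating": None,
           "rank": None, "error": None}
    if data.get("status") != "OK":
        out["error"] = data.get("comment", "unknown API error")
        return out
    user = data["result"][0]
    out.update(rating=user.get("rating"), max_rating=user.get("maxRating"),
               rank=user.get("rank"))
    return out

def fetch_cf_user(handle: str, timeout: float = 5.0) -> dict:
    """Any network or JSON failure becomes an error dict, not an exception."""
    url = ("https://codeforces.com/api/user.info?"
           + urllib.parse.urlencode({"handles": handle}))
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return normalize_cf_response(handle, json.load(resp))
    except Exception as exc:  # DNS failure, timeout, malformed JSON, ...
        return {"handle": handle, "rating": None, "max_rating": None,
                "rank": None, "error": str(exc)}
```

Because every failure path funnels into the same null-filled dict, downstream code can treat a missing handle and a network outage identically.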
The semantic matching engine using ChromaDB:
- Embedding: converts candidate profiles and JDs to dense vector embeddings using OpenAI `text-embedding-3-small` or Google Gemini embeddings
- Profile serialization: converts the JSON candidate profile into a rich natural-language document before embedding (skills, role history, projects, verified deps, CF rating all included)
- Storage: ChromaDB persists data to disk (`./chroma_db/`) and survives restarts
- Retrieval: uses cosine similarity to find the top-k most semantically relevant candidates for a given JD
- Score conversion: ChromaDB returns cosine distance in [0, 2]; we convert with `similarity = 1 - (distance / 2)`, giving [0, 1]
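The score conversion is a one-liner, but worth pinning down (function name illustrative):

```python
def distance_to_similarity(distance: float) -> float:
    """Map ChromaDB cosine distance in [0, 2] to a similarity in [0, 1].

    Identical vectors have distance 0 → similarity 1.0;
    opposite vectors have distance 2 → similarity 0.0.
    """
    return 1.0 - (distance / 2.0)
```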
Three LangChain-powered LLM agents that run after vector retrieval:
(See The Three AI Agents section below)
FastAPI server that ties everything together. Exposes 7 REST endpoints, handles file uploads via multipart/form-data, and caches each processed candidate in memory to avoid re-running the full pipeline on every match query.
Purpose: Verify if the candidate is telling the truth about their skills.
Input: List of claimed skills (from anonymized resume) + list of verified dependencies (from GitHub)
Process: Compares the two lists – any claimed skill that appears in actual `requirements.txt`/`package.json` files is marked verified. The rest are flagged as unverified claims.
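The core of this comparison is a case-insensitive membership check; a sketch (the `verdict` line in the output below is written by the LLM, not computed, so it is omitted here):

```python
def verify_skills(claimed: list[str], verified_deps: list[str]) -> dict:
    """Split claimed skills into GitHub-verified and unverified lists."""
    deps = {d.lower() for d in verified_deps}
    verified = [s for s in claimed if s.lower() in deps]
    unverified = [s for s in claimed if s.lower() not in deps]
    rate = round(len(verified) / len(claimed), 2) if claimed else 0.0
    return {
        "verified_skills": verified,
        "unverified_claims": unverified,
        "verification_rate": rate,
    }
```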
Output:
```json
{
  "verified_skills": ["Python", "FastAPI", "scikit-learn"],
  "unverified_claims": ["Kubernetes", "Spark"],
  "verification_rate": 0.71,
  "verdict": "Strong alignment – most core skills are GitHub-verified."
}
```

Purpose: Score how fast the candidate is growing as a developer.
Input: Work history, projects, Codeforces contest history and rating trend
Process: The LLM analyzes the arc of the candidate's career: are they taking on more complex systems? Increasing their competitive programming rating? Leading teams? The score is holistic.
Scoring rubric:
| Score | Meaning |
|---|---|
| 1 | Stagnant – repetitive roles, no visible skill growth |
| 2 | Slow – minor incremental improvement |
| 3 | Steady – consistent, reasonable growth |
| 4 | Fast – clear skill expansion, increasing impact |
| 5 | Exceptional – rapid multi-domain growth, leadership, innovation |
Purpose: Write a recruiter-facing justification so hiring decisions are never a black box.
Input: JD text, vector similarity score, Tech Lead findings, Trajectory score
Output: Exactly 3 sentences structured as:
- Strongest technical evidence (specific verified skills/GitHub data)
- Learning trajectory and growth potential
- Overall hiring recommendation with confidence level
Example output:
This candidate demonstrates strong Python and FastAPI expertise confirmed through direct GitHub dependency analysis, with 71% of claimed skills verified in production codebases. Their Codeforces rating improvement of +320 over 24 months and consistent progression from junior backend to distributed systems engineering indicates fast, multi-domain learning. Overall a high-confidence match for the Senior Backend Engineer role, with Kubernetes and Spark claims warranting brief technical interview verification.
The final score that drives candidate ranking is a weighted composite of three signals:
```
final_score = (vector_similarity × 0.40)
            + (verification_rate × 0.30)
            + (trajectory_score / 5 × 0.30)
```
| Component | Weight | What It Measures |
|---|---|---|
| Vector Similarity | 40% | Semantic relevance of the candidate's full profile to the JD |
| Verification Rate | 30% | What % of resume claims are backed by real GitHub code |
| Trajectory Score | 30% | Learning velocity and career growth (1–5 normalized) |
This weighting intentionally penalizes inflated resumes (low verification rate) even if the semantic similarity is high.
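As code, the composite is a direct transcription of the formula above (function name illustrative):

```python
def final_score(vector_similarity: float, verification_rate: float,
                trajectory_score: int) -> float:
    """Weighted composite: 40% semantic similarity, 30% verification
    rate, 30% trajectory (1-5 scale, normalized to [0, 1])."""
    return (vector_similarity * 0.40
            + verification_rate * 0.30
            + (trajectory_score / 5) * 0.30)
```

For example, a candidate with similarity 0.85, a 71% verification rate, and trajectory 4/5 scores roughly 0.79, while the same similarity with a 20% verification rate drops them well below a fully verified but slightly less similar rival.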
Base URL: `http://localhost:8000`

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/health` | Health check |
| `POST` | `/api/upload-candidate` | Upload PDF + GitHub/CF handles, then run the full ingestion pipeline |
| `POST` | `/api/add-jd` | Store a JD in ChromaDB |
| `POST` | `/api/match` | Run vector match + all 3 agents, return ranked results |
| `GET` | `/api/candidates` | List all stored candidate IDs |
| `DELETE` | `/api/candidates/{id}` | Remove a specific candidate |
| `DELETE` | `/api/reset` | Clear all stored candidates and JDs |
`POST /api/upload-candidate` – Content-Type: `multipart/form-data`

Fields:

- `resume` (file, required) – PDF resume file
- `github_username` (string, optional) – GitHub username
- `codeforces_handle` (string, optional) – Codeforces handle
- `candidate_label` (string, optional) – Custom ID (auto-generated if omitted)
`POST /api/match` – JSON body:

```json
{
  "jd_text": "We are looking for a Senior Python engineer...",
  "top_k": 5
}
```

Response: Ranked array of `MatchResult` objects with scores, verified skills, trajectory, and explanation.
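As a quick sanity check, the match call can be exercised from Python with only the standard library (helper name is illustrative; the endpoint path and body shape come from the API table above):

```python
import json
import urllib.request

def build_match_request(jd_text: str, top_k: int = 5) -> urllib.request.Request:
    """Build a POST /api/match request against the local backend."""
    body = json.dumps({"jd_text": jd_text, "top_k": top_k}).encode()
    return urllib.request.Request(
        "http://localhost:8000/api/match", data=body,
        headers={"Content-Type": "application/json"}, method="POST")

req = build_match_request("Senior Python engineer with FastAPI experience")
# with urllib.request.urlopen(req) as resp:   # requires the backend running
#     results = json.load(resp)
```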
```
eightfold/
├── backend/
│   ├── __init__.py
│   ├── config.py              – Central env settings
│   ├── logger.py              – Shared logger factory
│   ├── pdf_parser.py          – Phase 1: PDF text extraction
│   ├── github_scraper.py      – Phase 1: GitHub profile + deps
│   ├── codeforces_scraper.py  – Phase 1: CF rating/contests
│   ├── anonymizer_agent.py    – Phase 2: LLM PII stripping
│   ├── vector_db.py           – Phase 3: ChromaDB embed + match
│   ├── evaluation_agents.py   – Phase 4: 3 agents + synthesizer
│   └── main.py                – Phase 5: FastAPI server
│
├── frontend/
│   ├── next.config.js         – API proxy config
│   ├── package.json
│   ├── tsconfig.json
│   └── src/
│       ├── app/
│       │   ├── layout.tsx     – Root layout + fonts + toasts
│       │   ├── globals.css    – Dark design system
│       │   └── page.tsx       – Main dashboard
│       └── components/
│           ├── UploadPanel.tsx   – PDF drag-drop + inputs
│           ├── JDPanel.tsx       – JD textarea + save
│           ├── CandidateCard.tsx – Ranked result card
│           └── StatsBar.tsx      – Live pipeline metrics
│
├── chroma_db/                 – Auto-created, ChromaDB persistence
├── requirements.txt
├── .env.example
└── README.md
```
| Layer | Technology | Why |
|---|---|---|
| Backend | FastAPI + Python 3.10+ | Async, fast, automatic API docs |
| Frontend | Next.js 14 + React + TypeScript | SSR-ready, great DX, built-in proxy |
| LLM / Agents | LangChain LCEL + OpenAI GPT-4o / Gemini | Composable chains, dual-provider support |
| Vector DB | ChromaDB (local) | Zero-infra, persistent, open source |
| Embeddings | OpenAI text-embedding-3-small / Gemini | State-of-the-art semantic search |
| PDF Parsing | PyPDF2 + pdfminer.six | Redundant fallback for robustness |
| GitHub Data | GitHub REST API | Real dependency verification |
| Coding Stats | Codeforces Public API | Objective algorithmic skill signal |
| Animations | Framer Motion | Smooth UI transitions |
| Icons | Lucide React | Consistent, lightweight iconset |
- Python 3.10+
- Node.js 18+
- An OpenAI API key OR Google Gemini API key
- (Optional) GitHub Personal Access Token for higher API rate limits
```shell
cd "C:\Users\Kuldeep Sharma\Desktop\eightfold"

# Create and activate virtual environment
python -m venv venv
venv\Scripts\activate

# Install all Python dependencies
pip install -r requirements.txt

# Copy the example env file and fill in your keys
copy .env.example .env
```

Edit `.env`:

```
OPENAI_API_KEY=sk-your-key-here
GITHUB_TOKEN=ghp_your-token-here
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o
USE_MOCK_DATA=False
```

Run the server:

```shell
uvicorn backend.main:app --reload --port 8000
```

Visit http://localhost:8000/docs to see the auto-generated Swagger UI.
```shell
cd "C:\Users\Kuldeep Sharma\Desktop\eightfold\frontend"
npm install
npm run dev
```

Visit http://localhost:3000 to use the dashboard.
- Upload Candidates tab: Drop a PDF resume. Optionally add a GitHub username and Codeforces handle. Click Ingest Candidate. Repeat for all candidates.
- Job Description tab: Paste your JD (or click "Use Sample"). Optionally save it to the vector DB.
- Click Match Candidates – the AI agents run and results appear ranked by final score.
- Click any card to expand it and see verified skills, unverified claims, score breakdown, and the AI explanation.
| Variable | Required | Description | Default |
|---|---|---|---|
| `OPENAI_API_KEY` | If using OpenAI | OpenAI API key | – |
| `GOOGLE_API_KEY` | If using Gemini | Google Gemini API key | – |
| `LLM_PROVIDER` | No | `openai` or `gemini` | `openai` |
| `LLM_MODEL` | No | Model name (e.g. `gpt-4o`) | `gpt-4o` |
| `GITHUB_TOKEN` | Recommended | GitHub PAT for higher rate limits | – |
| `USE_MOCK_DATA` | No | `True` to skip all external APIs | `False` |
| `CHROMA_PERSIST_DIR` | No | ChromaDB storage directory | `./chroma_db` |
| `APP_PORT` | No | FastAPI server port | `8000` |
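A hypothetical sketch of how `backend/config.py` might read these variables with their documented defaults (the real module's names and structure may differ):

```python
import os

def load_settings() -> dict:
    """Read the environment variables above, applying documented defaults."""
    return {
        "llm_provider": os.getenv("LLM_PROVIDER", "openai"),
        "llm_model": os.getenv("LLM_MODEL", "gpt-4o"),
        # Env vars are strings, so "True"/"true" must be parsed explicitly
        "use_mock_data": os.getenv("USE_MOCK_DATA", "False").lower() == "true",
        "chroma_persist_dir": os.getenv("CHROMA_PERSIST_DIR", "./chroma_db"),
        "app_port": int(os.getenv("APP_PORT", "8000")),
    }
```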
🛡️ Never crash during a live demo.

Set `USE_MOCK_DATA=True` in your `.env` file. This activates pre-canned data for every external call:
| Module | Mock behavior |
|---|---|
| `github_scraper.py` | Returns a realistic profile for "octocat" with 8 repos, Python/JS distribution, and verified deps |
| `codeforces_scraper.py` | Returns the "tourist" profile – Legendary Grandmaster, 247 contests, 3979 max rating |
| `anonymizer_agent.py` | Returns a Senior Software Engineer profile with 5 years experience and ML skills |
| `evaluation_agents.py` | Returns pre-written agent outputs with a 71% verification rate and trajectory score 4/5 |
With `USE_MOCK_DATA=True`, no API keys are required – the entire pipeline runs locally, instantly.
Each backend module can be run directly from the command line for quick testing:
```shell
# Test GitHub scraper
python -m backend.github_scraper torvalds

# Test Codeforces scraper
python -m backend.codeforces_scraper tourist

# Test PDF parser
python -m backend.pdf_parser path\to\resume.pdf

# Test anonymizer (uses pdf_parser internally)
python -m backend.anonymizer_agent path\to\resume.pdf

# Test vector DB with sample data
python -m backend.vector_db

# Test evaluation agents (uses mock data)
python -m backend.evaluation_agents
```

Built for the AI Hackathon 2026 – Agentic Talent Intelligence.