An AI-powered scraper and tracker for fellowships, research internships, and open-source mentorship programs — built for CS students in Bangalore, India.
The pipeline uses Gemini to generate search queries, crawls official program pages, extracts deadlines and eligibility details, stores everything in MongoDB Atlas, and serves it through a FastAPI backend with an HTML frontend.
Live: fellowship-tracker.vercel.app
- AI Query Generation — Gemini generates targeted Google search queries for each program instead of using a hardcoded list (a rough sketch of the whole pipeline follows this list)
- Web Search — Serper API searches Google for official program pages
- AI Link Filtering — Gemini filters out blogs, aggregators, and social media — keeping only official application pages
- Page Crawling — crawl4ai scrapes each page's content
- AI Data Extraction — Gemini reads each page and extracts name, deadline, stipend, eligibility, mode, and tags
- MongoDB Storage — All data is upserted into MongoDB Atlas
- Frontend — Terminal-aesthetic UI with search, filters, and live stats
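The stages above map onto a handful of library calls. The sketch below is illustrative only and is not the code in `scraper/main.py`: the database and collection names, the prompt text, and the omission of the crawling and extraction steps are assumptions made for brevity.

```python
# Illustrative single pass of the pipeline (not the actual scraper/main.py).
# Assumed: the "fellowship_tracker"/"fellowships" names and the prompt text.
import asyncio
import os

import httpx
from google import genai
from motor.motor_asyncio import AsyncIOMotorClient

SERPER_URL = "https://google.serper.dev/search"

async def run_once(program: str) -> None:
    gemini = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    mongo = AsyncIOMotorClient(os.environ["MONGO_URL"])
    fellowships = mongo["fellowship_tracker"]["fellowships"]  # assumed names

    # 1. Ask Gemini for a targeted search query for this program.
    query = gemini.models.generate_content(
        model="models/gemini-2.0-flash-lite",
        contents=f"Write one Google search query that finds the official application page for {program}.",
    ).text.strip()

    # 2. Search Google through the Serper API.
    async with httpx.AsyncClient(timeout=30) as http:
        resp = await http.post(
            SERPER_URL,
            headers={"X-API-KEY": os.environ["SERPER_API_KEY"]},
            json={"q": query},
        )
    links = [item["link"] for item in resp.json().get("organic", [])]

    # 3. In the real pipeline, Gemini filters these links, crawl4ai fetches
    #    each page, and Gemini extracts deadline/stipend/eligibility here.

    # 4. Upsert whatever was extracted, keyed by program name.
    await fellowships.update_one(
        {"name": program},
        {"$set": {"name": program, "links": links}},
        upsert=True,
    )

asyncio.run(run_once("Summer of Bitcoin"))
```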
| Category | Programs |
|---|---|
| Open Source | LFX Mentorship, GSoC, DWoC, KWoC, CNCF Mentorship, FOSS United Fellowship |
| Bitcoin / Web3 | Summer of Bitcoin |
| Research | SRFP (JNCASR), SRIP (IIT Gandhinagar), IIT SURGE / SPARK / SRF, MSR India |
| Scholarships | Reliance Foundation, Grace Hopper Celebration (GHC), LIFT Fellowship |
| + AI suggested | 5 additional programs per run based on your student profile |
```
Fellowship_Tracker/
├── scraper/
│   ├── main.py              # AI scraper pipeline (runs locally)
│   └── requirements.txt     # Scraper-only dependencies (not deployed)
├── api/
│   └── index.py             # FastAPI server + serves frontend
├── index.html               # Frontend UI
├── requirements.txt         # API-only dependencies (deployed to Vercel)
├── vercel.json              # Vercel deployment config
├── .vercelignore            # Excludes scraper from Vercel bundle
└── .env                     # API keys (never commit this)
```
Why two `requirements.txt` files? Vercel has a 500MB Lambda size limit. `crawl4ai` + `playwright` alone are ~400MB and are only needed for scraping locally. The deployed API only reads from MongoDB — it never scrapes.
- Python 3.10+
- MongoDB Atlas account (free tier works)
- Serper API key → serper.dev (2500 free searches)
- Gemini API key → aistudio.google.com (free tier)
```bash
git clone https://github.com/DuttaNeel07/FELLOWSHIP_TRACKER.git
cd FELLOWSHIP_TRACKER
```

```bash
python -m venv venv

# Mac/Linux
source venv/bin/activate

# Windows
venv\Scripts\activate
```

```bash
# API dependencies
pip install -r requirements.txt

# Scraper dependencies (local only)
pip install -r scraper/requirements.txt
playwright install chromium
```

Create a `.env` file in the project root:

```
MONGO_URL=mongodb+srv://your_connection_string
SERPER_API_KEY=your_serper_key
GEMINI_API_KEY=your_gemini_key
```

Run the scraper:

```bash
python scraper/main.py
```

This will take 20–60 minutes on the free Gemini tier due to rate limiting. The scraper handles this automatically with exponential backoff — just let it run.

Start the API:

```bash
uvicorn api.index:app --reload --port 8000
```

Open http://localhost:8000 in your browser.
You do NOT need to run `python -m http.server`. FastAPI serves the frontend directly.
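If you want to see how that works, a minimal version of the idea looks like the sketch below. This is not the actual `api/index.py`; the paths and the stats payload are placeholders.

```python
# Minimal sketch of a FastAPI app that serves both the JSON API and index.html.
# Not the real api/index.py; routes and paths here are illustrative.
from pathlib import Path

from fastapi import FastAPI
from fastapi.responses import FileResponse

app = FastAPI()
INDEX_HTML = Path(__file__).resolve().parent.parent / "index.html"

@app.get("/api/stats")
async def stats():
    # The real endpoint aggregates counts from MongoDB via Motor.
    return {"total": 0, "open": 0}

@app.get("/")
async def frontend():
    # Serving the static file from FastAPI means no separate
    # `python -m http.server` is needed.
    return FileResponse(INDEX_HTML)
```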
`requirements.txt` (root — API only, no scraper packages):

```
fastapi
motor
dnspython
python-dotenv
pydantic
uvicorn
httpx
```
`scraper/requirements.txt` (local only — never deployed):

```
crawl4ai
playwright
google-genai
httpx
python-dotenv
motor
dnspython
```
`.vercelignore` (prevents scraper from being bundled):

```
scraper/
venv/
__pycache__/
*.pyc
```
Go to your project on vercel.com → Settings → Environment Variables and add:
- `MONGO_URL`
- `SERPER_API_KEY`
- `GEMINI_API_KEY`
```bash
git add .
git commit -m "update"
git push
```

Vercel auto-deploys on every push. The `vercel.json` routes all traffic through FastAPI:
```json
{
  "version": 2,
  "rewrites": [
    { "source": "/api/(.*)", "destination": "/api/index.py" },
    { "source": "/(.*)", "destination": "/api/index.py" }
  ]
}
```

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Frontend UI |
| GET | `/api/fellowships` | All opportunities (supports `?tag=`, `?open=true`, `?search=`, `?limit=`) |
| GET | `/api/stats` | Total, open, and deadline counts |
| GET | `/api/tags` | All distinct tags in the database |
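As a quick usage example, here is how the fellowships endpoint can be queried from Python with `httpx`. The tag value and the response field names (`name`, `deadline`) are assumptions based on the fields the scraper extracts; the actual JSON shape may differ.

```python
# Query the local dev server started with uvicorn (see the setup steps above).
import httpx

resp = httpx.get(
    "http://localhost:8000/api/fellowships",
    params={"tag": "open-source", "open": "true", "limit": 10},
)
resp.raise_for_status()
for item in resp.json():
    print(item.get("name"), "-", item.get("deadline"))
```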
The free Gemini tier allows ~15 requests/minute. The scraper sleeps 35 seconds between AI calls and retries with exponential backoff on 429 errors. To remove rate limits entirely, add billing to your Google AI Studio project — cost is under $0.01 per full scraper run.
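The retry behaviour can be approximated with a small wrapper like the one below. This is a sketch, not the code in `scraper/main.py`; matching on "429" in the error string is a simplification of however the scraper actually detects rate-limit errors.

```python
# Sketch of the rate-limit handling described above: pause between calls and
# back off exponentially on 429 responses. Not the actual scraper code.
import os
import time

from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

def generate_with_backoff(prompt: str, retries: int = 5) -> str:
    delay = 35  # seconds; doubles after each 429
    for _ in range(retries):
        try:
            response = client.models.generate_content(
                model="models/gemini-2.0-flash-lite",
                contents=prompt,
            )
            time.sleep(35)  # stay under ~15 requests/minute on the free tier
            return response.text
        except Exception as exc:  # the real code may catch a specific SDK error
            if "429" not in str(exc):
                raise
            time.sleep(delay)
            delay *= 2
    raise RuntimeError("Gemini kept returning 429 after all retries")
```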
To change the Gemini model, edit line 36 of `scraper/main.py`:

```python
GEMINI_MODEL = "models/gemini-2.0-flash-lite"
```

Available models confirmed for this project: `models/gemini-2.5-flash`, `models/gemini-2.0-flash`, `models/gemini-2.0-flash-lite`
Edit `STUDENT_PROFILE` and `MUST_HAVE_PROGRAMS` at the top of `scraper/main.py` to target different programs or locations.

```python
STUDENT_PROFILE = {
    "location": "Bangalore, Karnataka, India",
    "education": "B.Tech / B.E. (undergraduate) or M.Tech (postgraduate)",
    "domains": ["computer science", "software engineering", "AI/ML", "open source", "research"],
    "year": "2025-2026 cycle",
}
```

| Layer | Technology |
|---|---|
| AI | Google Gemini 2.0 Flash Lite |
| Search | Serper API (Google Search) |
| Scraping | crawl4ai + Playwright |
| Database | MongoDB Atlas |
| Backend | FastAPI + Motor (async) |
| Frontend | Vanilla HTML/CSS/JS + Tailwind CDN |
| Deployment | Vercel |