Real-time safety intelligence for Chicago public transit.
TransitGuard is an AI-powered platform that predicts safety incidents across the Chicago Transit Authority (CTA) system. It was built as a capstone project for Northwestern University's MS in Data Science program.
The CTA serves 950,000+ daily riders across 128 bus routes and 8 rail lines. Crime on the system increased post-pandemic, but incident reporting remains reactive — passengers and transit authorities only know about problems after they happen.
TransitGuard shifts from reactive to predictive. We analyzed 50,000+ CTA-related crime incidents (2014-2024) alongside ridership patterns, streetlight outages, and environmental factors to build models that identify high-risk stations, routes, and time windows before incidents occur.
| Component | Description | Repo |
|---|---|---|
| Dashboard | Real-time safety metrics and hotspot visualization | transitguard-dashboard |
| Mobile App | SMS/push alerts for riders and dispatchers | transitguard-app |
| RAG API | GenAI chatbot backend — Pinecone + Claude for natural language safety queries | TransitGuardRAG |
| Predictive Models | XGBoost, Logistic Regression, time series forecasting | Integrated |
The TransitGuardRAG API powers natural language queries about CTA safety:
- "What are the stations near me?"
- "Total number of crimes today"
- "Safest line in the last 7 days"
- "Total number of traffic accidents today"
The API is built with FastAPI, Pinecone vector search, and Claude 3 Haiku for answer generation, and is deployed on Railway.
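The retrieve-then-generate flow behind such an endpoint can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the TransitGuardRAG implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory `DOCS` list stands in for a Pinecone index, and the assembled prompt is what would be passed to Claude. All function names and sample records are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical safety summaries standing in for records stored in a vector index.
DOCS = [
    "Red Line: 12 incidents reported in the last 7 days",
    "Blue Line: 4 incidents reported in the last 7 days",
    "Bus route 9: 2 traffic accidents reported today",
]

def embed(text):
    """Toy embedding: lowercase token counts (a real system would use a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, context):
    """Assemble the grounded prompt that would be sent to the language model."""
    lines = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{lines}\n\nQuestion: {query}"

question = "How many traffic accidents today?"
context = retrieve(question, DOCS, k=1)
prompt = build_prompt(question, context)
```

Keeping retrieval and prompt assembly as separate functions makes it straightforward to swap the toy pieces for a hosted vector index and a model call without changing the overall flow.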
- CTA trains have 3x more incidents than buses despite lower ridership — environment matters more than volume
- Theft and battery account for 52% of all CTA-related crimes
- Incident rates peak midweek (Tue-Thu) and during the summer months
- Spatial clustering (DBSCAN) identified persistent hotspots across the rail network
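To illustrate how density-based clustering separates hotspots from isolated incidents, here is a small pure-Python DBSCAN over latitude/longitude pairs using haversine distance. It is a sketch under assumptions, not the project's pipeline: the coordinates are made up, the `eps_m` and `min_samples` values are arbitrary, and a production workflow would more likely rely on a library implementation such as the one in scikit-learn.

```python
import math

EARTH_RADIUS_M = 6_371_000

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def dbscan(points, eps_m, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    def neighbors(i):
        # Includes the point itself, so min_samples counts the point too.
        return [j for j, q in enumerate(points) if haversine_m(points[i], q) <= eps_m]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is a core point, so keep growing the cluster
    return labels

# Hypothetical incident coordinates: two dense groups and one isolated point.
incidents = [
    (41.8850, -87.6280), (41.8853, -87.6282), (41.8848, -87.6277),
    (41.9470, -87.6536), (41.9473, -87.6539), (41.9468, -87.6534),
    (41.7500, -87.7000),
]
labels = dbscan(incidents, eps_m=150, min_samples=3)
```

Points labeled `-1` are treated as noise rather than forced into a cluster, which is the property that makes DBSCAN a natural fit for hotspot detection.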
| Goal | Method |
|---|---|
| Identify hotspots | DBSCAN spatial clustering, KDE |
| Predict incidents | XGBoost, Logistic Regression, time series forecasting |
| Segment risk zones | ZIP code and community area aggregation |
| Surface insights | Real-time dashboard, RAG chatbot, mobile alerts |
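As a rough illustration of the risk-zone segmentation row, the sketch below counts incidents per community area and time-of-day window. The records, the 4-hour window width, and the helper name `time_window` are all hypothetical; the actual aggregation used by the team is not described here.

```python
from collections import Counter
from datetime import datetime

# Hypothetical incident records: (community area, ISO timestamp).
incidents = [
    ("Loop", "2024-07-09T17:30:00"),
    ("Loop", "2024-07-10T17:05:00"),
    ("Loop", "2024-07-10T08:15:00"),
    ("Uptown", "2024-07-11T23:40:00"),
]

def time_window(ts, hours=4):
    """Bucket a timestamp into a fixed-width window label such as '16-20'."""
    start = datetime.fromisoformat(ts).hour // hours * hours
    return f"{start:02d}-{start + hours:02d}"

# Count incidents for each (area, window) pair, then rank by frequency.
counts = Counter((area, time_window(ts)) for area, ts in incidents)
ranked = counts.most_common()
```

The same grouping key could be swapped for ZIP code or station without changing the rest of the logic.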
All data comes from the City of Chicago Data Portal:
- Crime data (8.29M records, 2001-present)
- CTA ridership (daily boarding totals, L station entries, bus routes)
- 311 reports (streetlight outages, graffiti removal)
- Traffic crashes and fatalities
- Geographic boundaries (community areas, wards, ZIP codes)
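The portal exposes its datasets through the Socrata API, so a CTA-specific slice of the crime data can be requested with a SoQL filter. The sketch below only builds the request URL and makes no network call. It rests on several assumptions: that `ijzp-q8t2` is the identifier of the "Crimes - 2001 to Present" dataset, that CTA-related incidents can be selected by a `location_description` value starting with `CTA`, and that the dataset has a numeric `year` column. The function name `cta_crime_url` is hypothetical, and this is not necessarily how the team extracted its data.

```python
from urllib.parse import urlencode

# Assumed endpoint for the portal's "Crimes - 2001 to Present" dataset.
BASE_URL = "https://data.cityofchicago.org/resource/ijzp-q8t2.json"

def cta_crime_url(start_year, end_year, limit=1000):
    """Build a Socrata query URL for CTA-located crimes in a year range."""
    params = {
        # Assumption: CTA incidents are identified by their location description.
        "$where": (
            f"location_description like 'CTA%' "
            f"AND year between {start_year} and {end_year}"
        ),
        "$order": "date",
        "$limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"

url = cta_crime_url(2014, 2024)
```

Because Socrata responses are paged, a full extract would repeat the request with an increasing `$offset` until no rows are returned.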
Built by a 5-person team at Northwestern University (MSDS 498 Capstone, 2025):
- Kevin Ou — Dashboards, GenAI
- Derek Plemons — GenAI Development, App Development, Modeling
- Sergio Valentini — App Development, Modeling
- Summer Xia — Data Cleaning, Visualization, Modeling
- Sophie Xiao — Data Cleaning, Visualization
Python, XGBoost, scikit-learn, Pandas, Folium, Streamlit, React Native, FastAPI, Pinecone, Claude API
License: MIT
Northwestern University — MS in Data Science Capstone Project — April 2025
