foxintheloop/TransitGuard


TransitGuard

Real-time safety intelligence for Chicago public transit.

TransitGuard is an AI-powered platform that predicts safety incidents across the Chicago Transit Authority (CTA) system. It was built as a capstone project for Northwestern University's MS in Data Science program.

[Image: TransitGuard Dashboard]

The Problem

The CTA serves 950,000+ daily riders across 128 bus routes and 8 rail lines. Crime on the system increased post-pandemic, but incident reporting remains reactive: passengers and transit authorities learn about problems only after they happen.

The Solution

TransitGuard shifts from reactive to predictive. We analyzed 50,000+ CTA-related crime incidents (2014-2024) alongside ridership patterns, streetlight outages, and environmental factors to build models that identify high-risk stations, routes, and time windows before incidents occur.

Components

| Component | Description | Repo |
| --- | --- | --- |
| Dashboard | Real-time safety metrics and hotspot visualization | transitguard-dashboard |
| Mobile App | SMS/push alerts for riders and dispatchers | transitguard-app |
| RAG API | GenAI chatbot backend: Pinecone + Claude for natural language safety queries | TransitGuardRAG |
| Predictive Models | XGBoost, Logistic Regression, time series forecasting | Integrated |

RAG Chatbot Capabilities

The TransitGuardRAG API powers natural language queries about CTA safety:

  • "What are the stations near me?"
  • "Total number of crimes today"
  • "Safest line in the last 7 days"
  • "Total number of traffic accidents today"

Built with FastAPI, Pinecone vector search, and Claude Haiku 3 for answer generation. Deployed on Railway.
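The retrieve-then-generate flow behind a query like those above can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the TransitGuardRAG implementation: the `DOCUMENTS` snippets, the bag-of-words `embed` function, and the prompt wording are all hypothetical stand-ins. In the deployed API, retrieval would go through Pinecone and the assembled prompt would be sent to Claude.

```python
import math
from collections import Counter

# Hypothetical snippets standing in for records stored in a vector index.
DOCUMENTS = [
    "Red Line: 4 theft reports in the last 7 days",
    "Blue Line: 9 battery reports in the last 7 days",
    "Route 66 bus: 1 traffic crash reported today",
]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(text.lower().replace(":", " ").replace("?", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k snippets most similar to the question."""
    q = embed(question)
    ranked = sorted(DOCUMENTS, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def build_prompt(question: str) -> str:
    """Assemble the grounded prompt that would be passed to the language model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How many theft reports on the Red Line?"))
```

Restricting the model to retrieved context is what keeps answers tied to the underlying safety data rather than to the model's general knowledge.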

Key Findings

  • CTA trains have 3x more incidents than buses despite lower ridership — environment matters more than volume
  • Theft and battery account for 52% of all CTA-related crimes
  • Incident rates peak midweek (Tue-Thu) and during the summer months
  • Spatial clustering (DBSCAN) identified persistent hotspots across the rail network
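To illustrate how DBSCAN turns raw incident coordinates into hotspots, here is a small dependency-free sketch using great-circle (haversine) distance. The coordinates, the 100 m radius, and the `min_samples` value are hypothetical choices for the example, not the project's settings. With scikit-learn (listed in the stack), `sklearn.cluster.DBSCAN` would be the usual choice instead of a hand-written loop.

```python
import math

EARTH_RADIUS_M = 6_371_000

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def dbscan(points, eps_m, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Includes the point itself, matching the usual min_samples convention.
        return [j for j, q in enumerate(points) if haversine_m(points[i], q) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # only core points expand the cluster
                queue.extend(nbrs)
    return labels

# Hypothetical incident locations: three close together, one far away.
incidents = [(41.8800, -87.6300), (41.8803, -87.6300),
             (41.8800, -87.6303), (41.9500, -87.6500)]
print(dbscan(incidents, eps_m=100, min_samples=3))
```

Points labeled `-1` are isolated incidents; each non-negative label marks a dense group that could be treated as a candidate hotspot.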

Technical Approach

| Goal | Method |
| --- | --- |
| Identify hotspots | DBSCAN spatial clustering, KDE |
| Predict incidents | XGBoost, Logistic Regression, time series forecasting |
| Segment risk zones | ZIP code and community area aggregation |
| Surface insights | Real-time dashboard, RAG chatbot, mobile alerts |
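As a rough illustration of the risk-zone aggregation step, the sketch below counts incidents per community area, weekday, and hour to surface the busiest time windows. The records and the `risk_windows` helper are hypothetical; the actual pipeline may aggregate differently (for example, by ZIP code or with ridership normalization).

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical incident records: (community area, ISO timestamp).
incidents = [
    ("Loop", "2024-07-09T17:30:00"),
    ("Loop", "2024-07-16T17:05:00"),
    ("Loop", "2024-07-10T08:15:00"),
    ("Uptown", "2024-07-09T23:40:00"),
]

def risk_windows(records, top_n=3):
    """Count incidents per (area, weekday, hour) and return the busiest windows."""
    counts = Counter()
    for area, timestamp in records:
        t = datetime.fromisoformat(timestamp)
        counts[(area, DAYS[t.weekday()], t.hour)] += 1
    return counts.most_common(top_n)

print(risk_windows(incidents, top_n=1))
```

Counts like these could feed a dashboard heatmap directly or serve as features for the predictive models.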

Data Sources

All data from the City of Chicago Data Portal:

  • Crime data (8.29M records, 2001-present)
  • CTA ridership (daily boarding totals, L station entries, bus routes)
  • 311 reports (streetlight outages, graffiti removal)
  • Traffic crashes and fatalities
  • Geographic boundaries (community areas, wards, ZIP codes)

Team

Built by a 5-person team at Northwestern University (MSDS 498 Capstone, 2025):

  • Kevin Ou — Dashboards, GenAI
  • Derek Plemons — GenAI Development, App Development, Modeling
  • Sergio Valentini — App Development, Modeling
  • Summer Xia — Data Cleaning, Visualization, Modeling
  • Sophie Xiao — Data Cleaning, Visualization

Stack

Python, XGBoost, scikit-learn, Pandas, Folium, Streamlit, React Native, FastAPI, Pinecone, Claude API

License

MIT


Northwestern University — MS in Data Science Capstone Project — April 2025

About

Predictive safety intelligence for Chicago CTA. ML models (XGBoost, DBSCAN clustering) trained on 50K+ crime incidents to identify high-risk stations before incidents occur. Includes RAG chatbot (Pinecone + Claude), real-time dashboard, and mobile alerts. Northwestern MSDS capstone.
