This project aims to predict the next day’s Air Quality Index (AQI) using historical pollutant measurements. The dataset contains hourly readings of pollutants such as CO, NH₃, NO₂, O₃, PM10, PM2.5, and SO₂, which are aggregated into daily averages for modeling.
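The hourly-to-daily aggregation mentioned above can be sketched with pandas. This is only an illustration: the column names and the synthetic readings are assumptions, not the project's actual dataset or code.

```python
import numpy as np
import pandas as pd

# Synthetic hourly readings standing in for the real dataset (hypothetical values).
rng = np.random.default_rng(0)
hours = pd.Timestamp("2024-01-01") + pd.to_timedelta(np.arange(72), unit="h")
pollutants = ["co", "nh3", "no2", "o3", "pm10", "pm2_5", "so2"]  # assumed column names
hourly = pd.DataFrame(
    rng.uniform(1, 100, size=(len(hours), len(pollutants))),
    index=hours,
    columns=pollutants,
)

# Collapse the 24 hourly rows of each calendar day into one row of means.
daily = hourly.resample("D").mean()
print(daily.round(1))
```

Three days of hourly data (72 rows) become three daily rows, one column per pollutant.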
I have also made a page about this project: AQI Prediction model.
A Random Forest Regressor is used to capture the nonlinear relationships between pollutant levels and AQI. Features include both pollutant concentrations and temporal attributes such as month and weekday.
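A minimal sketch of this setup with scikit-learn is shown below. The synthetic data, the column names, the chronological 80/20 split, and the hyperparameters are all assumptions made for illustration; the project's real configuration may differ.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic daily averages (hypothetical values, two pollutants for brevity).
rng = np.random.default_rng(42)
days = pd.Timestamp("2023-01-01") + pd.to_timedelta(np.arange(200), unit="D")
daily = pd.DataFrame(
    {"pm2_5": rng.uniform(10, 200, 200), "no2": rng.uniform(5, 80, 200)},
    index=days,
)
daily["aqi"] = 0.8 * daily["pm2_5"] + 0.3 * daily["no2"] + rng.normal(0, 5, 200)

# Temporal attributes derived from the date index.
features = daily.copy()
features["month"] = features.index.month
features["weekday"] = features.index.dayofweek

# Target: the AQI of the following day. The last row has no "next day" and is dropped.
features["target_next_day_aqi"] = features["aqi"].shift(-1)
features = features.dropna()

X = features.drop(columns="target_next_day_aqi")
y = features["target_next_day_aqi"]

# Chronological split (assumed): train on the earlier 80%, predict the later 20%.
split = int(len(features) * 0.8)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X.iloc[:split], y.iloc[:split])
preds = model.predict(X.iloc[split:])
print(preds[:5])
```

Splitting by time rather than at random keeps future days out of the training set, which matters when the goal is forecasting.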
With Diwali approaching, I wanted to observe how AQI levels deviate during the festival period, since air quality typically worsens significantly due to fireworks and increased emissions.
I have also written a blog post, "Effects of Diwali on AQI: Insights from my model".
Predictions before Diwali were highly accurate, as shown by standard regression metrics:
- MAE, MSE, and R² scores indicated solid performance, with R² ≈ 0.86 on seen data.
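For reference, the three metrics named above can be computed with scikit-learn as sketched here. The arrays are made-up example values, not the project's predictions or results.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted AQI values for four days.
y_true = np.array([120.0, 135.0, 150.0, 160.0])
y_pred = np.array([118.0, 140.0, 145.0, 165.0])

mae = mean_absolute_error(y_true, y_pred)  # average absolute error, in AQI points
mse = mean_squared_error(y_true, y_pred)   # average squared error, penalizes large misses
r2 = r2_score(y_true, y_pred)              # share of variance in y_true explained

print(f"MAE={mae:.2f}  MSE={mse:.2f}  R2={r2:.3f}")
```

MAE is the easiest of the three to read here, because it is expressed directly in AQI points.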
Score Results:
Predicted vs Actual Graph:
Predictions during Diwali were less consistent, often off by 20–25 AQI points. This drop in performance occurred because Diwali arrived earlier than usual, and the month/weekday features couldn’t effectively capture the sudden festival-related changes in emissions.
The feature importance graph showed that AQI was the most important feature. A stress test was then run to see how the model performs without it.
With AQI Feature
Without AQI Feature
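A stress test of this kind can be sketched as an ablation: train the same model twice, once with the AQI column and once without, and compare the scores. Everything below is a hypothetical illustration on synthetic data; `fit_and_score` is a helper invented for this sketch, and the numbers it prints are not the project's results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Synthetic, autocorrelated AQI series so that today's AQI carries signal
# about tomorrow's (hypothetical values).
rng = np.random.default_rng(7)
n = 300
aqi = np.empty(n)
aqi[0] = 150.0
for t in range(1, n):
    aqi[t] = 15.0 + 0.9 * aqi[t - 1] + rng.normal(0, 10)

data = pd.DataFrame({
    "pm2_5": 0.6 * aqi + rng.normal(0, 8, n),
    "no2": 0.2 * aqi + rng.normal(0, 5, n),
    "aqi": aqi,
})
data["target"] = data["aqi"].shift(-1)  # next-day AQI
data = data.dropna()

split = int(len(data) * 0.8)
train, test = data.iloc[:split], data.iloc[split:]

def fit_and_score(feature_cols):
    """Train on the given columns and return the model and its test R²."""
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(train[feature_cols], train["target"])
    score = r2_score(test["target"], model.predict(test[feature_cols]))
    return model, score

with_aqi_cols = ["pm2_5", "no2", "aqi"]
without_aqi_cols = ["pm2_5", "no2"]

model_with, r2_with = fit_and_score(with_aqi_cols)
model_without, r2_without = fit_and_score(without_aqi_cols)

# Impurity-based importances of the full model, largest first.
importances = pd.Series(
    model_with.feature_importances_, index=with_aqi_cols
).sort_values(ascending=False)

print(importances)
print(f"R2 with AQI: {r2_with:.3f}   R2 without AQI: {r2_without:.3f}")
```

Comparing the two scores on the same held-out days shows how much the model leans on the AQI feature compared with the pollutant readings alone.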
- Model: Random Forest Regressor
- Data: Daily averages of major air pollutants
- Goal: Next-day AQI prediction
- Result: Strong general accuracy (R² ≈ 0.86) on seen data, reduced accuracy during unmodeled festival events