Description
Stream 3 - Software Development
Mentors
- Sebastian Steinig
- Miha Razinger
- Mark Parrington
- Cathy Wing Yi Li
- Auke Visser
Skills Required
- Experience with using Python (e.g. xarray, numpy, earthkit) for geospatial data processing workflows is essential
- Familiarity with meteorological datasets (GRIB, NetCDF) is desirable
- Experience with feature detection or image segmentation algorithms (e.g. U-Net/convolutional neural networks) would be an advantage
Goal
Develop a Python workflow to automatically screen our daily atmospheric composition forecasts for a configurable set of significant air quality “events” (e.g. wildfire smoke, Saharan dust, volcanic eruptions) and produce feature masks and summary statistics (e.g. number, location, size, strength). Compare approaches (e.g. thresholding, shape-based methods, deep learning) and assess performance versus computational complexity. The goal is to move beyond today’s threshold-based alerts towards event definitions with minimal hardcoded thresholds or labelled training data, enabling scalable detection, visualisation, and forecast evaluation across datasets and variables.
Figure 1: Animation of long-range transport of wildfire smoke (purple) and Saharan dust (orange) into Europe.
These aerosol plumes are an example of air pollution events we want to detect automatically.
The associated forecast field is available as sample data from here.
Note: The funding source for this challenge is ECMWF funding. For details on the eligibility, please refer to Article 3 of the Terms and Conditions.
Description of the Challenge
The Copernicus Atmospheric Monitoring Service (CAMS) provides daily global atmospheric composition forecasts of pollutants, aerosols and greenhouse gases. These products are widely used in public-facing communication (web charts, news articles, social media) to provide quality-controlled information on air pollution and health. A key goal is to highlight significant air quality events, in particular long-range transport (e.g. wildfire smoke from North America or Saharan dust impacting Europe). Doing this routinely requires automated detection and tracking in high-volume forecast data.
A common first approach is a fixed-threshold metric: flagging grid points where a variable exceeds a predefined value. This is currently used for the CAMS Aerosol Alerts service, but it has important limitations: arbitrary threshold choices, variable-specific cutoffs, a non-stationary background, no structural/shape information, and limited transferability across datasets.
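To make the baseline concrete, a minimal fixed-threshold detector might look like the sketch below. The field, grid, and cutoff value are entirely synthetic and illustrative, not taken from any CAMS product:

```python
import numpy as np

# Synthetic aerosol optical depth (AOD) field on a toy lat/lon grid;
# purely illustrative stand-in for a real CAMS forecast field.
rng = np.random.default_rng(42)
aod = rng.gamma(shape=2.0, scale=0.1, size=(180, 360))

THRESHOLD = 0.5  # arbitrary cutoff -- exactly the limitation noted above

# Boolean event mask: True wherever the variable exceeds the fixed value.
mask = aod > THRESHOLD
print(f"{mask.sum()} of {mask.size} grid points flagged")
```

The single hard-coded `THRESHOLD` is the point of this sketch: everything downstream (alerting, statistics, evaluation) inherits that one arbitrary choice.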
The central question of this challenge is:
"Can we do better than simple threshold metrics for automated atmospheric composition feature detection?"
You will find, implement and evaluate one or more techniques to detect and characterise events in gridded atmospheric composition fields. A ~20-year reanalysis can be used as reference (climatology/background state; see sample file here), with the longer-term goal of routinely screening operational 5-day forecasts to flag significant events for further analysis. To keep the project feasible, we propose an MVP-first approach: start with one primary target feature (long-range aerosol plumes; see animation above) and build a robust end-to-end workflow around it. Additional targets (e.g. hemispheric ozone transport, volcanic SO2 plumes, or generic "anomalies") can be discussed as stretch goals during the project once the core pipeline is in place.
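One way the reanalysis could serve as a background state is via standardised day-of-year anomalies. The sketch below uses xarray on synthetic data; the variable name, grid, reference period, and 3-sigma cutoff are all assumptions for illustration, not the operational product's:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily "reanalysis" standing in for a multi-year reference;
# coordinates and values are illustrative only.
time = pd.date_range("2004-01-01", "2023-12-31", freq="D")
rng = np.random.default_rng(0)
data = xr.DataArray(
    rng.gamma(2.0, 0.1, size=(time.size, 18, 36)),
    coords={
        "time": time,
        "lat": np.linspace(-85, 85, 18),
        "lon": np.linspace(0, 350, 36),
    },
    name="aod",
)

# Day-of-year climatology and spread from the reference period.
clim = data.groupby("time.dayofyear").mean("time")
spread = data.groupby("time.dayofyear").std("time")

# Standardised anomaly for one "forecast" day: each grid point is judged
# against its own local background rather than one global threshold.
day = data.sel(time="2023-08-15")
doy = day.time.dt.dayofyear
z = (day - clim.sel(dayofyear=doy)) / spread.sel(dayofyear=doy)
event_mask = z > 3  # "significant" = 3 sigma above local climatology
```

Because the cutoff is expressed in local standard deviations, the same value can be reused across variables and regions, which is one route towards reducing hardcoded, variable-specific thresholds.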
Possible approaches (in increasing complexity) include:
- "Simple" thresholding, potentially extended to vertical levels. This provides a transparent baseline and helps quantify what is gained from more sophisticated methods.
- Shape/gradient-based feature extraction using local geometry (e.g. Hessian-based shape measures) to better capture plume-like structures and reduce sensitivity to background shifts. Concrete starting points could include the SCAFET framework or the tobac library.
- ML-based segmentation/classification (e.g. CNN/U-Net), ideally with weakly-supervised, self-supervised or unsupervised elements to reduce reliance on labelled data. This could range from learning plume masks directly to anomaly detection that flags "interesting" patterns without fixed thresholds.
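As a toy illustration of the shape-based idea (not the SCAFET or tobac implementations), the sketch below builds a 2D Hessian from Gaussian derivatives with scipy and uses its smaller eigenvalue as a "ridge strength" for plume-like structures; the synthetic field, smoothing scale, and masking rule are all assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Synthetic field with one elongated "plume" (a diagonal ridge) on noise;
# a real workflow would read a GRIB/NetCDF forecast field instead.
y, x = np.mgrid[0:100, 0:200]
plume = np.exp(-((x - y - 50.0) ** 2) / 50.0)
rng = np.random.default_rng(1)
field = plume + 0.05 * rng.standard_normal(plume.shape)

# Hessian of the field at a chosen smoothing scale, via Gaussian derivatives.
sigma = 3
hyy = gaussian_filter(field, sigma, order=(2, 0))
hxx = gaussian_filter(field, sigma, order=(0, 2))
hxy = gaussian_filter(field, sigma, order=(1, 1))

# Eigenvalues of the 2x2 symmetric Hessian; on a bright ridge the
# cross-ridge curvature (smaller eigenvalue) is strongly negative.
disc = np.sqrt(((hyy - hxx) / 2.0) ** 2 + hxy**2)
lam_small = (hyy + hxx) / 2.0 - disc
ridge_strength = -lam_small  # large where the field is locally ridge-like

# Data-driven mask on ridge strength (no fixed physical threshold).
ridge_mask = ridge_strength > ridge_strength.mean() + 2 * ridge_strength.std()
```

Because the measure responds to local curvature rather than absolute magnitude, it is less sensitive to shifts in the background level than a fixed-value threshold.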
Expected outputs include an end-to-end, configurable workflow that produces feature masks and summary statistics, a small set of case studies, and a comparison of methods against the threshold baseline (skill, robustness, computational cost). We see a number of immediate applications for this new workflow. These don't need to be part of your initial proposal but could, depending on your interest and progress, provide real-world targets for your workflow (details can be discussed during the project phase):
- Detecting 3D aerosol plume structure for improved visualisation and communication of long-range air pollution events.
- Event-based evaluation of forecast performance against in-situ observations.
- Combining multiple variables (e.g. greenhouse gases and co-emitted pollutants) to better detect and track emission plumes with more difficult signal-to-noise ratios.
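Whichever detection method produces the mask, the summary statistics mentioned above (number, location, size, strength) can be derived from it generically. A minimal sketch with scipy's connected-component labelling, on hypothetical data:

```python
import numpy as np
from scipy import ndimage

# Toy field and event mask (in practice the mask would come from any of
# the detection methods: threshold, shape-based, or ML segmentation).
rng = np.random.default_rng(7)
field = rng.gamma(2.0, 0.1, size=(90, 180))
mask = field > 0.6

# Connected-component labelling turns the mask into discrete features.
labels, n_features = ndimage.label(mask)
idx = range(1, n_features + 1)

# Per-feature summary statistics: size, location, strength.
sizes = ndimage.sum_labels(mask, labels, index=idx)          # grid points
centroids = ndimage.center_of_mass(mask, labels, index=idx)  # (row, col)
peaks = ndimage.maximum(field, labels, index=idx)            # max value

print(f"{n_features} features; largest spans {int(sizes.max())} grid points")
```

Keeping this summary step independent of the detection method makes the method comparison (threshold vs. shape-based vs. ML) straightforward: each method only needs to emit a mask.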
Please remember that the selection process will not simply be based on the complexity of the proposal, but rather on the feasibility of deliverables given your existing skills. If you have particular experience or interest in a specific detection method, feel free to structure your proposal around this technique.
Figure 2: Examples of shape-based feature extraction of the 3D jet stream and atmospheric rivers with the SCAFET library (Nellikkattil et al. (2024))
Evaluation criteria
- Feasibility
- Innovative approach
- Transferability
- Matching requirements