Description
Stream 3 - Software Development
Mentors
- Sebastian Steinig
- Miha Razinger
- Mark Parrington
- Cathy Wing Yi Li
- Auke Visser
Skills Required
- Experience with using Python (e.g. xarray, numpy, earthkit) for geospatial data processing workflows is essential
- Familiarity with meteorological datasets (GRIB, NetCDF) is desirable
- Experience with feature detection or image segmentation algorithms (e.g. U-Net/convolutional neural networks) would be an advantage
Goal
Develop a Python workflow to automatically screen our daily atmospheric composition forecasts for a configurable set of significant air quality “events” (e.g. wildfire smoke, Saharan dust, volcanic eruptions) and produce feature masks and summary statistics (e.g. number, location, size, strength). Compare approaches (e.g. thresholding, shape-based methods, deep learning) and assess performance versus computational complexity. The goal is to move beyond today’s threshold-based alerts towards event definitions with minimal hardcoded thresholds or labelled training data, enabling scalable detection, visualisation, and forecast evaluation across datasets and variables.
Figure 1: Animation of long-range transport of wildfire smoke (purple) and Saharan dust (orange) into Europe.
These aerosol plumes are an example of air pollution events we want to detect automatically.
The associated forecast field is available as sample data from here.
Note: The funding source for this challenge is ECMWF funding. For details on the eligibility, please refer to Article 3 of the Terms and Conditions.
Description of the Challenge
The Copernicus Atmospheric Monitoring Service (CAMS) provides daily global atmospheric composition forecasts of pollutants, aerosols and greenhouse gases. These products are widely used in public-facing communication (web charts, news articles, social media) to provide quality-controlled information on air pollution and health. A key goal is to highlight significant air quality events, in particular long-range transport (e.g. wildfire smoke from North America or Saharan dust impacting Europe). Doing this routinely requires automated detection and tracking in high-volume forecast data.
A common first approach is a fixed-threshold metric: flagging grid points where a variable exceeds a predefined value. This is currently used for the CAMS Aerosol Alerts service, but it has important limitations: arbitrary threshold choices, variable-specific cutoffs, a non-stationary background, no structural/shape information, and limited transferability across datasets.
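To make the baseline concrete, a minimal fixed-threshold detector might look like the sketch below. The field, grid, and cutoff value are entirely synthetic and illustrative, not taken from any CAMS product:

```python
import numpy as np

# Synthetic aerosol optical depth (AOD) field on a toy lat/lon grid;
# purely illustrative stand-in for a real CAMS forecast field.
rng = np.random.default_rng(42)
aod = rng.gamma(shape=2.0, scale=0.1, size=(180, 360))

THRESHOLD = 0.5  # arbitrary cutoff -- exactly the limitation noted above

# Boolean event mask: True wherever the variable exceeds the fixed value.
mask = aod > THRESHOLD
print(f"{mask.sum()} of {mask.size} grid points flagged")
```

The single hard-coded `THRESHOLD` is the point of this sketch: everything downstream (alerting, statistics, evaluation) inherits that one arbitrary choice.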
The central question of this challenge is:
"Can we do better than simple threshold metrics for automated atmospheric composition feature detection?"
You will find, implement and evaluate one or more techniques to detect and characterise events in gridded atmospheric composition fields. A ~20-year reanalysis can be used as reference (climatology/background state; see sample file here), with the longer-term goal of routinely screening operational 5-day forecasts to flag significant events for further analysis. To keep the project feasible, we propose an MVP-first approach: start with one primary target feature (long-range aerosol plumes; see animation above) and build a robust end-to-end workflow around it. Additional targets (e.g. hemispheric ozone transport, volcanic SO2 plumes, or generic "anomalies") can be discussed as stretch goals during the project once the core pipeline is in place.
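One way the reanalysis could serve as a background state is via standardised day-of-year anomalies. The sketch below uses xarray on synthetic data; the variable name, grid, reference period, and 3-sigma cutoff are all assumptions for illustration, not the operational product's:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily "reanalysis" standing in for a multi-year reference;
# coordinates and values are illustrative only.
time = pd.date_range("2004-01-01", "2023-12-31", freq="D")
rng = np.random.default_rng(0)
data = xr.DataArray(
    rng.gamma(2.0, 0.1, size=(time.size, 18, 36)),
    coords={
        "time": time,
        "lat": np.linspace(-85, 85, 18),
        "lon": np.linspace(0, 350, 36),
    },
    name="aod",
)

# Day-of-year climatology and spread from the reference period.
clim = data.groupby("time.dayofyear").mean("time")
spread = data.groupby("time.dayofyear").std("time")

# Standardised anomaly for one "forecast" day: each grid point is judged
# against its own local background rather than one global threshold.
day = data.sel(time="2023-08-15")
doy = day.time.dt.dayofyear
z = (day - clim.sel(dayofyear=doy)) / spread.sel(dayofyear=doy)
event_mask = z > 3  # "significant" = 3 sigma above local climatology
```

Because the cutoff is expressed in local standard deviations, the same value can be reused across variables and regions, which is one route towards reducing hardcoded, variable-specific thresholds.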
Possible approaches (in increasing complexity) include:
- "Simple" thresholding, potentially extended to vertical levels. This provides a transparent baseline and helps quantify what is gained from more sophisticated methods.
- Shape/gradient-based feature extraction using local geometry (e.g. Hessian-based shape measures) to better capture plume-like structures and reduce sensitivity to background shifts. Concrete starting points could include the SCAFET framework or the tobac library.
- ML-based segmentation/classification (e.g. CNN/U-Net), ideally with weakly-supervised, self-supervised or unsupervised elements to reduce reliance on labelled data. This could range from learning plume masks directly to anomaly detection that flags "interesting" patterns without fixed thresholds.
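As a toy illustration of the shape-based idea (not the SCAFET or tobac implementations), the sketch below builds a 2D Hessian from Gaussian derivatives with scipy and uses its smaller eigenvalue as a "ridge strength" for plume-like structures; the synthetic field, smoothing scale, and masking rule are all assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Synthetic field with one elongated "plume" (a diagonal ridge) on noise;
# a real workflow would read a GRIB/NetCDF forecast field instead.
y, x = np.mgrid[0:100, 0:200]
plume = np.exp(-((x - y - 50.0) ** 2) / 50.0)
rng = np.random.default_rng(1)
field = plume + 0.05 * rng.standard_normal(plume.shape)

# Hessian of the field at a chosen smoothing scale, via Gaussian derivatives.
sigma = 3
hyy = gaussian_filter(field, sigma, order=(2, 0))
hxx = gaussian_filter(field, sigma, order=(0, 2))
hxy = gaussian_filter(field, sigma, order=(1, 1))

# Eigenvalues of the 2x2 symmetric Hessian; on a bright ridge the
# cross-ridge curvature (smaller eigenvalue) is strongly negative.
disc = np.sqrt(((hyy - hxx) / 2.0) ** 2 + hxy**2)
lam_small = (hyy + hxx) / 2.0 - disc
ridge_strength = -lam_small  # large where the field is locally ridge-like

# Data-driven mask on ridge strength (no fixed physical threshold).
ridge_mask = ridge_strength > ridge_strength.mean() + 2 * ridge_strength.std()
```

Because the measure responds to local curvature rather than absolute magnitude, it is less sensitive to shifts in the background level than a fixed-value threshold.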
Expected outputs include an end-to-end, configurable workflow that produces feature masks and summary statistics, a small set of case studies, and a comparison of methods against the threshold baseline (skill, robustness, computational cost). We see a number of immediate applications for this new workflow. These don't need to be part of your initial proposal but could, depending on your interest and progress, provide real-world targets for your workflow (details can be discussed during the project phase):
- Detecting 3D aerosol plume structure for improved visualisation and communication of long-range air pollution events.
- Event-based evaluation of forecast performance against in-situ observations.
- Combining multiple variables (e.g. greenhouse gases and co-emitted pollutants) to better detect and track emission plumes with more difficult signal-to-noise ratios.
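Whichever detection method produces the mask, the summary statistics mentioned above (number, location, size, strength) can be derived from it generically. A minimal sketch with scipy's connected-component labelling, on hypothetical data:

```python
import numpy as np
from scipy import ndimage

# Toy field and event mask (in practice the mask would come from any of
# the detection methods: threshold, shape-based, or ML segmentation).
rng = np.random.default_rng(7)
field = rng.gamma(2.0, 0.1, size=(90, 180))
mask = field > 0.6

# Connected-component labelling turns the mask into discrete features.
labels, n_features = ndimage.label(mask)
idx = range(1, n_features + 1)

# Per-feature summary statistics: size, location, strength.
sizes = ndimage.sum_labels(mask, labels, index=idx)          # grid points
centroids = ndimage.center_of_mass(mask, labels, index=idx)  # (row, col)
peaks = ndimage.maximum(field, labels, index=idx)            # max value

print(f"{n_features} features; largest spans {int(sizes.max())} grid points")
```

Keeping this summary step independent of the detection method makes the method comparison (threshold vs. shape-based vs. ML) straightforward: each method only needs to emit a mask.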
Please remember that the selection process will not simply be based on the complexity of the proposal, but rather on the feasibility of deliverables given your existing skills. If you have particular experience or interest in a specific detection method, feel free to structure your proposal around this technique.
Figure 2: Examples of shape-based feature extraction of the 3D jet stream and atmospheric rivers with the SCAFET library (Nellikkattil et al. (2024))
Evaluation criteria
- Feasibility
- Innovative approach
- Transferability
- Matching requirements