Challenge 21 - ML-Driven Downscaling of Global Air Pollution Fields
Stream 2 - Machine Learning for Earth Sciences Applications
Goal
Develop and test an application that downscales global-scale pollutant concentrations (e.g., from the CAMS Global Reanalysis or Forecasts) to regional-scale concentrations using Machine Learning (ML) approaches. The focus of this challenge is to design and evaluate ML models capable of learning the relationship between global air quality fields and regional-scale outputs, using the regional products as ground truth during training in Europe. The model should integrate additional spatial and environmental features such as land use, meteorology, topography, and geographic location to improve accuracy.
This effort will explore optimal ML architectures (e.g., convolutional neural networks, random forests, or other suitable methods) and the key features that drive improved downscaling. Both the global and the downscaled regional model results will be evaluated against air quality observations using FAIRMODE metrics (BIAS, RMSE, MQI). The aim of the downscaling is to achieve better agreement with observations than the global fields provide. As a final step, the trained model will be applied to regions outside Europe (e.g., the U.S. and China) to assess its generalizability and performance in different geographical contexts.
Mentors
Martin Ramacher, Johannes Bieser (Helmholtz-Zentrum Hereon)
Johannes Flemming, Miha Razinger, Paula Harder (ECMWF)
Skills Required
- Familiarity with different ML algorithms (e.g., XGBoost, Random Forest, Neural Networks, CNNs, …)
- Working with geospatial data formats (e.g., netCDF, GeoTIFF, shapefiles, …)
- Programming proficiency (Python, R, …) sufficient for data processing and ML implementation
- Understanding of error metrics (e.g., RMSE, BIAS, MAE) and performance evaluation techniques, such as the FAIRMODE evaluation framework and Model Quality Indicator (MQI)
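As a rough illustration of the metrics named in the last item, the sketch below computes BIAS, RMSE, and an MQI-style indicator in plain Python. The MQI form used here (RMSE divided by beta times the RMS of the observation uncertainty) reflects one reading of the FAIRMODE guidance and should be checked against the current guidance document. The uncertainty parameters `u_rv`, `rv`, and `alpha` are pollutant-specific; the values and the concentration series in the example are illustrative placeholders, not real data.

```python
import math


def bias(model, obs):
    """Mean bias: average of (model - observation)."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)


def rmse(model, obs):
    """Root-mean-square error between model and observations."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def mqi(model, obs, u_rv, rv, alpha, beta=2.0):
    """MQI-style indicator: RMSE / (beta * RMS_U).

    Assumed uncertainty model (to be verified against FAIRMODE guidance):
        U(O_i) = u_rv * sqrt((1 - alpha^2) * O_i^2 + alpha^2 * rv^2)
    """
    mean_u2 = sum(
        u_rv ** 2 * ((1 - alpha ** 2) * o ** 2 + alpha ** 2 * rv ** 2) for o in obs
    ) / len(obs)
    return rmse(model, obs) / (beta * math.sqrt(mean_u2))


# Made-up station values in ug/m3 (hypothetical, for illustration only).
obs = [20.0, 35.0, 50.0, 15.0]
global_model = [12.0, 20.0, 30.0, 10.0]   # coarse global field at the stations
downscaled = [18.0, 33.0, 46.0, 17.0]     # downscaled field at the stations

# Placeholder uncertainty parameters; replace with the guidance values
# for the pollutant being evaluated.
params = dict(u_rv=0.24, rv=200.0, alpha=0.2)

for name, series in [("global", global_model), ("downscaled", downscaled)]:
    print(
        name,
        "BIAS=%.2f" % bias(series, obs),
        "RMSE=%.2f" % rmse(series, obs),
        "MQI=%.3f" % mqi(series, obs, **params),
    )
```

Running the same three functions on the global and the downscaled series side by side gives a direct check of whether downscaling moved the fields closer to the observations.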
Description
What is the current problem/limitation?
Global-scale atmospheric composition products, such as those from CAMS, often fail to capture regional-scale variations in pollutant concentrations, particularly in areas with complex spatial features such as coastlines, mountains, and urban regions. While regional products do exist (e.g., CAMS European Reanalysis), there is potential to improve their accuracy and extend their applicability to regions outside Europe through ML-based downscaling methods. The goal is to explore and identify optimal ML methods and input features for this downscaling task.
This challenge invites participants to design and implement an ML-based downscaling framework with the following key steps:
- Training the ML Model in Europe: Train an ML model to downscale global CAMS pollutant concentrations (e.g., NO2, PM2.5, O3) using CAMS regional-scale concentrations for Europe as ground truth. Explore different ML architectures (e.g., random forests, XGBoost, CNNs) and input features (e.g., land use, topography, meteorology) to determine which combination yields the most accurate downscaling results.
- Feature Selection and Exploration: Identify and evaluate the impact of key input variables (e.g., topography, wind speed, urban density, land use, …) on model performance.
- Evaluation in Europe: Evaluate the downscaled results in Europe using FAIRMODE principles, including BIAS, RMSE, and the Model Quality Indicator (MQI), based on observations in Europe.
- Application to the U.S. and/or China: Apply the trained model to downscale global CAMS fields in regions outside Europe, such as the U.S. and China, and evaluate its performance using local ground-based measurements.
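The training and feature-exploration steps above can be sketched on synthetic data. This is a minimal sketch, not a proposed solution: all fields are randomly generated stand-ins, the "global" field is assumed to be a simple block mean of the "regional" field (real CAMS global and regional products do not relate this simply), and a linear least-squares fit stands in for the random forest, XGBoost, or CNN models named in the challenge. The ablation loop refits the model with one feature left out at a time to show how much each input contributes on a held-out part of the domain.

```python
import numpy as np

rng = np.random.default_rng(0)
FACTOR = 4          # fine cells per coarse cell along each axis (assumption)
NY, NX = 40, 40     # size of the fine "regional" grid (assumption)

# Synthetic stand-ins for real inputs (all hypothetical).
yy, xx = np.mgrid[0:NY, 0:NX]
background = 20.0 + 10.0 * np.sin(xx / 6.0) * np.cos(yy / 9.0)
urban = rng.random((NY, NX))                      # urban fraction in [0, 1)
elevation = rng.uniform(0.0, 1000.0, (NY, NX))    # metres
regional = (
    background + 15.0 * urban - 0.005 * elevation
    + rng.normal(0.0, 0.5, (NY, NX))
)

# "Global" field: block mean of the regional field, then repeated back
# onto the fine grid (nearest-neighbour upsampling).
coarse = regional.reshape(NY // FACTOR, FACTOR, NX // FACTOR, FACTOR).mean(axis=(1, 3))
coarse_up = np.repeat(np.repeat(coarse, FACTOR, axis=0), FACTOR, axis=1)

features = {"global_conc": coarse_up, "urban": urban, "elevation": elevation}

# Spatial hold-out: train on the left half, test on the right half.
train = xx < NX // 2
test = ~train


def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


def fit_and_score(names):
    """Fit a linear model on the training pixels, return held-out RMSE."""
    def design(mask):
        cols = [features[n][mask] for n in names] + [np.ones(mask.sum())]
        return np.column_stack(cols)

    coef, *_ = np.linalg.lstsq(design(train), regional[train], rcond=None)
    return rmse(design(test) @ coef, regional[test])


# Baseline: use the upsampled global field directly, without any model.
baseline_rmse = rmse(coarse_up[test], regional[test])
full_rmse = fit_and_score(list(features))

# Leave-one-feature-out ablation.
ablation = {
    name: fit_and_score([n for n in features if n != name]) for name in features
}

print("baseline (upsampled global) RMSE:", round(baseline_rmse, 3))
print("all features RMSE:", round(full_rmse, 3))
for name, score in ablation.items():
    print("without", name, "RMSE:", round(score, 3))
```

With real data, the same structure would apply with the synthetic arrays replaced by regridded CAMS fields and auxiliary layers, and the least-squares fit replaced by the model under study. The spatial split matters: holding out a contiguous region is closer to the transferability question than a random pixel split.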
Evaluation criteria
- Feasibility
- Innovative approach
- Transferability
- Easy to maintain / Future-proof approach
- Comprehensibility
- Matching requirements