This is a list of raw weather and climate datasets commonly used in ML research, in alphabetical order.
- Reference: Eyring et al. 2016
- Data: https://pcmdi.llnl.gov/CMIP6/
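- Description: CMIP6 (Coupled Model Intercomparison Project Phase 6) is a very large archive of global climate model simulations from many modeling groups, run under a range of historical and future scenarios.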
- Examples of papers using this dataset: Ham et al. 2019
- Reference: Hersbach et al. 2020
- Data: https://cds.climate.copernicus.eu/
- Description: ERA5 is a global reanalysis dataset covering roughly the last 40 years (1950 to 1978 is available as a preliminary version) at 0.25 degree resolution. Hourly data are available for most atmospheric and surface variables.
- Notes: Take care with some surface variables, such as precipitation and wind. These often do not match direct observations closely.
- Examples of papers using this dataset: WeatherBench
- Reference: Bougeault et al. 2010
- Data: https://apps.ecmwf.int/datasets/data/tigge/levtype=sfc/type=cf/
- Description: TIGGE is a 15-year archive of operational global ensemble forecasts from multiple forecasting centers. The forecasts are not available in real time.
- Examples of papers using this dataset: WeatherBench
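
To get a feel for what "0.25 degree global resolution" with hourly data means in practice, the following is a minimal back-of-the-envelope sketch. It assumes a regular latitude-longitude grid that includes both poles and uncompressed 32-bit values; both are illustrative assumptions, and the function names are hypothetical rather than part of any dataset's tooling.

```python
def regular_latlon_grid_shape(resolution_deg: float) -> tuple[int, int]:
    """Return (n_lat, n_lon) for a global regular grid.

    Assumes latitudes run from -90 to 90 inclusive and longitudes
    from 0 up to (but not including) 360.
    """
    n_lat = round(180 / resolution_deg) + 1
    n_lon = round(360 / resolution_deg)
    return n_lat, n_lon


def hourly_field_size_gb(
    resolution_deg: float, n_years: int, bytes_per_value: int = 4
) -> float:
    """Rough uncompressed size in GB of one hourly 2D field over n_years.

    Ignores leap days and any file-format overhead or compression.
    """
    n_lat, n_lon = regular_latlon_grid_shape(resolution_deg)
    n_hours = n_years * 365 * 24
    return n_lat * n_lon * n_hours * bytes_per_value / 1e9


if __name__ == "__main__":
    print(regular_latlon_grid_shape(0.25))
    print(round(hourly_field_size_gb(0.25, n_years=40), 1), "GB")
```

Under these assumptions, a single variable on a single level comes to about 1.46 TB for 40 years of hourly data, which is why most ML studies work with a regridded or temporally subsampled subset.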