Replication package for "Combining predictive distributions for time-to-event outcomes in meteorology"
C. Cunen ([email protected]) and T. Roksvåg ([email protected]), October 2025.
This replication package provides all code related to the paper "Combining predictive distributions for time-to-event outcomes in meteorology" by C. Cunen, T. Roksvåg, C. Heinrich‐Mertsching, and A. Lenkoski.
All 14 figures in the paper can be reproduced from the code provided. The code also provides the content of 5 tables in the paper.
This repository contains the following five folders:
scripts/
: R scripts to run simulation and case-study and create figures and tables;figures/
: folder of generated figures as PDF files;simulation_data/
: folder containingRData
files of the simulation results;application_data/
: folder containing the time-to-hard-freeze datasets used in the case-study and as illustrations;application_results/
: folder containing intermediary files for the case-study.
The provided R scripts (inside scripts/
) fall into four groups:
- The simulation study: the scripts
sim_largeSample_sc1-8.R
,sim_smallSample_sc9-16.R
andsimulation_tables_figures.R
. See details below. - The case-study: the scripts
frostData_exp_est_comboParams.R
,frostData_exp_score.R
andfrostData_exp_arrScores.R
. See details below. - Files with R functions:
nr_func.R
andcc_func_new_aug23.R
. These are called by all the other scripts. - Illustrations:
temp_fig_curves_fig1.R
,small_illustration_fig2-3.R
,data_map_fig8.R
. These make use of the time-to-hard-freeze datasets and serve illustrative purposes in the paper. These scripts generate figures 1, 2, 3 and 8 in the paper.
All scripts should be run with the script location as working directory (i.e. the scripts/
folder). The working directory may be changed within the script, relative to this location.
The analyses were run on R 4.3.1, and the following packages are explicitly called in the analysis files: data.table
(1.14.8), MASS
(7.3-60), tidyr
(1.3.0), ggplot2
(3.4.2), ggpubr
(0.6.0), RColorBrewer
(1.1-3), rworldmap
(1.3-8), sp
(1.6-1), xtable
(1.8.4), foreach
(1.5.2), doParallel
(1.0.17), parallel
(base R).
The script simulation_tables_figures.R
can be run to reproduce the figures and tables presenting the results of the simulation study in Section 5 of the paper. The script loads intermediary files from the folder simulation_data/
and generates figures 4, 5, 6 and 7 in the paper, and tables A1, A2 and A3.
One may also re-run the simulation study using the scripts sim_largeSample_sc1-8.R
and sim_smallSample_sc9-16.R
. These scripts produse result files of the same type as the files which are found in the folder simulation_data/
. The first, sim_largeSample_sc1-8.R
, provides the code for simulation scenarios 1 to 8 (scenarios with large training datasets). The code specifically runs scenario 1, but the others scenarios may be obtained by simple changes to the simulation parameters at the top of the file (and according to Table 2 in the paper). The second, sim_smallSample_sc9-16.R
, provides the code for simulation scenarios 9 to 16 (scenarios with small training datasets). The code specifically runs scenario 9, but the others scenarios may be obtained by simple changes to the simulation parameters at the top of the file (and according to Table 2 in the paper).
The simulation scripts do not require any particular setup, but are quite time-consuming, particularly the scenarios with small training datasets. Each scenario can take a couple of days to run with the number of replication we used (10^4), but this number can be adjusted by a simple change in the script.
The script frostData_exp_arrScores.R
can be run to reproduce the figures and tables presenting the case-study in Section 6 of the paper. The script requires intermediary files from the folders inside application_results/
and generates figures 9, 10, 11, A1, A2 and A3, and the content for tables 3 and A5. The case-study presents two sets of results: using post-processed forecasts and using raw forecasts. Inside frostData_exp_arrScores.R
one may choose either option at the top of the file: pp
for post-processed and raw
otherwise.
One may also re-run the entire case-study using the scripts frostData_exp_est_comboParams.R
and frostData_exp_score.R
. The script frostData_exp_est_comboParams.R
must be run first and estimates combination parameters for all combinations of locations and target year. This script reads the time-to-hard-freeze datasets inside the application_data/
folder, and writes intermediary result files into empty subfolders (new_results_pp/
and new_results_raw/
) inside that folder. Then, frostData_exp_score.R
, can be run. This script scores all the forecats for each combination of location and target year. The script reads the time-to-hard-freeze datasets and loads the combination parameters from the files inside new_results_pp/
and new_results_raw/
, and writes result files with scores into the same subfolders. Again, note that the scripts may be run using either post-processed or raw forecasts.
The script frostData_exp_arrScores.R
does not require any particular setup, but the scripts frostData_exp_est_comboParams.R
and frostData_exp_score.R
make use of the packages foreach
, doParallel
and parallel
, and cannot be run as written on a Windows computer.
These two scripts take some hours to run.
The following datasets are available in the replication package.
ttf_seasonal9.csv
and ttf_seasonal9_raw.csv
.
Dataset of the time to hard freeze for post-processed and raw seasonal weather forecasts issued September 1 for 1999-2018.
- Time: The day of the first hard freeze after the issue date, or the maximum lead time (if hard freeze never occurs).
- Obs: 1 if hard freeze is observed before the maximum lead time, 0 otherwise (censored).
- forecast_year: The year that the forecast was issued.
- system: Name of the weather center that issued the forecast.
- system_number: System number of the forecast.
- member: Ensemble member number.
- lon_sN: longitude.
- lat_sN:latitude.
ttf_seasonal10.csv
and ttf_seasonal10_raw.csv
are similar datasets, but for forecasts issued on October 1. These files are only used for scoring (inside frostData_exp_score.R
).
ttf_subseasonal_frostblending_1999_2018.csv
and ttf_subseasonal_frostblending_1999_2018_raw.csv
.
Dataset of the time to hard freeze for post-processed and raw subseasonal weather forecasts from ECMWF for 1999-2018 for a set of initialization dates.
- Time: The day of the first hard freeze after the issue date, or the maximum lead time (if hard freeze never occurs).
- Obs: 1 if hard freeze is observed before the maximum lead time, 0 otherwise (censored).
- forecast_year: The year that the forecast was issued.
- init_date: The date when the forecast, with corresponding hindcasts, were released. We consider forecasts released in 2019, hence the init_date year is always 2019. Each init_date in 2019 has a corresponding hindcast with forecast year given in the "forecast_year" column. In the paper we use data with init_date=30/09/2019 and forecast_year in 1999:2018.
- member: Ensemble member number.
- lon_sN: longitude.
- lat_sN:latitude.
ttf_senorge.csv
The observed time to hard freeze for locations in Fennoscandia according to the SeNorge2018 product.
- forecast_year: observation year.
- lon_sN: longitude.
- lat_sN:latitude.
- Time: The day of the first hard freeze after September 1, or day 122 (December 31) if hard freeze never occurs.
- Obs: 1 if hard freeze is observed before the maximum day, 0 otherwise (censored).
Hard freeze is defined as a day with average temperature below 0 degrees Celsius.
seas9.csv
, subseas.csv
and sn.csv
These datasets are subsets of the full temperature data which was used to construct the time-to-hard-freeze datasets described above. They are subsetted to only contain the two locations necessary for producing figure 1 in the paper. They are subsetted because these are extremely large datafiles - their provenance is described in the next section.
Seasonal temperature forecasts initialized on September 1 (in seas9.csv
).
- forecast_year: observation year.
- lon_sN: longitude.
- lat_sN:latitude.
- system: Name of the weather center that issued the forecast.
- member: Ensemble member number.
- target_day: Day after September 1.
- temperature: forecasted mean daily 2-meter temperature (in Kelvin).
- temperature_pp: forecasted mean daily 2-meter temperature after post-processing (in Kelvin). See next section for some explanations around post-processing.
Subseasonal temperature forecasts initialized on September 30 (in subseas.csv
).
- forecast_year: observation year.
- lon_sN: longitude.
- lat_sN:latitude.
- init_date: The date when the forecast, with corresponding hindcasts, were released.
- member: Ensemble member number.
- target_day: Day after September 1.
- temperature: forecasted mean daily 2-meter temperature (in Kelvin).
- temperature_pp: forecasted mean daily 2-meter temperature after post-processing (in Kelvin). See next section for some explanations around post-processing.
The actual temperature observations from SeNorge2018 (in sn.csv
).
- year: observation year.
- lon_sN: longitude.
- lat_sN:latitude.
- st: mean daily 2-meter temperature (in Kelvin).
We use temperature data from three different sources: (1) seasonal forecasts initialized on September 1; (2) subseasonal forecasts initialized on September 30; and (3) the actual time-to-hard-freeze observations which we obtain from the gridded, observation-based data product SeNorge2018 (version 18.12).
Seasonal forecasts of 2-meter temperatures, initialized on September 1, were downloaded from the Copernicus Climate Data Store (CDS) (https://confluence.ecmwf.int/display/CKB/C3S+Seasonal+Forecasts\%3A+dataset+documentation). Ensemble forecasts were available from five weather centres: The European Centre for Medium-Range Weather Forecasts (ECMWF), the Euro-Mediterranean Center for Climate Change (CMCC), Deutscher Wetterdienst (DWD), Météo France and the UK Met Office. Forecasts from 1992-2021 with lead times 5-7 months and spatial resolution
The subseasonal forecasts we use, are the 2-meter temperature forecasts delivered by ECMWF, initialized on September 30 (https://confluence.ecmwf.int/display/S2S/ECMWF+model+description). We use a forecast produced in 2019, with forecast cycle number CY46R1. Hindcasts from the same cycle number for 1999-2018 were available with a spatial resolution of
The temporal resolution of both the subseasonal and seasonal forecasts was 6 hours, but to match the SeNorge data, the data were converted to mean daily temperatures by computing averages of the 6 hour temperatures. To ensure that the forecasted temperatures better match the temperatures observed at the target locations, the seasonal and subseasonal forecasts are post-processed by adjusting the forecast means and standard deviations relative to historical means and standard deviations from SeNorge. See further descriptions in the paper.
The SeNorge product is a dataset developed by the Norwegian Meteorological Institute and contains mean daily temperature data for the mainland of Norway, and parts of Sweden, Finland and Russia (Lussana, C., Tveito, O. & Uboldi, F. (2018). Three-dimensional spatial interpolation of two-meter temperature over Norway. Quarterly Journal of the Royal Meteorological Society 144). The SeNorge data can be downloaded from https://thredds.met.no/thredds/catalog/senorge/seNorge_2018/catalog.html . The product is based on 2-meter temperature observations from meteorological stations that are interpolated to a 1 km
In order to obtain the datasets of forecasted times-to-hard-freeze, we extract the day of the first occurrence of hard freeze after September 1 for each combination of year, location, ensemble member and forecast system, from the raw and post-processed seasonal and subseasonal forecast datasets. Likewise, we otain the observed times-to-hard-freeze data by taking out the first day of observed hard freeze for SeNorge.
The following tables summarize the information on which scipt generates which table and figure. The figure and table numbering is the one used in the paper.
Figure number | File name | script |
---|---|---|
1 | two_temperatures_day1_day122.pdf |
temp_curves_fig1.R |
2 | ex_training_new.pdf |
small_illustration_fig2-3.R |
3 | ex_2018_new.pdf |
small_illustration_fig2-3.R |
4 | sim_pitExpSD_large_jun25.pdf |
simulation_tables_figures.R |
5 | sim_pit_sc1_oct24.pdf sim_skill_large_jun25.pdf |
simulation_tables_figures.R |
6 | sim_pitExpSD_small_jun25.pdf |
simulation_tables_figures.R |
7 | sim_pit_sc9_oct24.pdf sim_skill_small_jun25.pdf |
simulation_tables_figures.R |
8 | dataplot.pdf |
data_map_fig8 |
9 | frost_pit_thea.pdf |
frostData_exp_arrScores.R |
10 | skill_blending_bestmethod_lp_tr.pdf |
frostData_exp_arrScores.R |
11 | skillmap_septref_cc.pdf skillmap_subseasref_cc.pdf weightmap.pdf |
frostData_exp_arrScores.R |
A1 | frost_pit_thea_raw.pdf |
frostData_exp_arrScores.R |
A2 | parameter_values_real_pp.pdf |
frostData_exp_arrScores.R |
A3 | parameter_values_real_raw2.pdf |
frostData_exp_arrScores.R |
Table number | Content | script |
---|---|---|
1 | Description of simulation experiment | - |
2 | Description of simulation experiment | - |
3 | Results from case-study | frostData_exp_arrScores.R |
A1 | Results from simulation study | simulation_tables_figures.R |
A2 | Results from simulation study | simulation_tables_figures.R |
A3 | Results from simulation study | simulation_tables_figures.R |
A4 | Description of datasources | - |
A5 | Results from case-study | frostData_exp_arrScores.R |