Replication package for "Combining predictive distributions for time-to-event outcomes in meteorology"

C. Cunen ([email protected]) and T. Roksvåg ([email protected]), October 2025.

Overview & contents

This replication package provides all code related to the paper "Combining predictive distributions for time-to-event outcomes in meteorology" by C. Cunen, T. Roksvåg, C. Heinrich‐Mertsching, and A. Lenkoski.

All 14 figures in the paper can be reproduced from the code provided. The code also provides the content of 5 tables in the paper.

This repository contains the following five folders:

scripts/: R scripts to run simulation and case-study and create figures and tables;
figures/: folder of generated figures as PDF files;
simulation_data/: folder containing RData files of the simulation results;
application_data/: folder containing the time-to-hard-freeze datasets used in the case-study and as illustrations;
application_results/: folder containing intermediary files for the case-study.

The provided R scripts (inside scripts/) fall into four groups:

The simulation study: the scripts sim_largeSample_sc1-8.R, sim_smallSample_sc9-16.R and simulation_tables_figures.R. See details below.
The case-study: the scripts frostData_exp_est_comboParams.R, frostData_exp_score.R and frostData_exp_arrScores.R. See details below.
Files with R functions: nr_func.R and cc_func_new_aug23.R. These are called by all the other scripts.
Illustrations: temp_fig_curves_fig1.R, small_illustration_fig2-3.R, data_map_fig8.R. These make use of the time-to-hard-freeze datasets and serve illustrative purposes in the paper. These scripts generate figures 1, 2, 3 and 8 in the paper.

General instructions

All scripts should be run with the script location as working directory (i.e. the scripts/ folder). The working directory may be changed within the script, relative to this location.

The analyses were run on R 4.3.1, and the following packages are explicitly called in the analysis files: data.table (1.14.8), MASS (7.3-60), tidyr (1.3.0), ggplot2 (3.4.2), ggpubr (0.6.0), RColorBrewer (1.1-3), rworldmap (1.3-8), sp (1.6-1), xtable (1.8.4), foreach (1.5.2), doParallel (1.0.17), parallel (base R).

Reproducing the simulation study

The script simulation_tables_figures.Rcan be run to reproduce the figures and tables presenting the results of the simulation study in Section 5 of the paper. The script loads intermediary files from the folder simulation_data/ and generates figures 4, 5, 6 and 7 in the paper, and tables A1, A2 and A3.

One may also re-run the simulation study using the scripts sim_largeSample_sc1-8.R and sim_smallSample_sc9-16.R. These scripts produse result files of the same type as the files which are found in the folder simulation_data/. The first, sim_largeSample_sc1-8.R, provides the code for simulation scenarios 1 to 8 (scenarios with large training datasets). The code specifically runs scenario 1, but the others scenarios may be obtained by simple changes to the simulation parameters at the top of the file (and according to Table 2 in the paper). The second, sim_smallSample_sc9-16.R, provides the code for simulation scenarios 9 to 16 (scenarios with small training datasets). The code specifically runs scenario 9, but the others scenarios may be obtained by simple changes to the simulation parameters at the top of the file (and according to Table 2 in the paper).

The simulation scripts do not require any particular setup, but are quite time-consuming, particularly the scenarios with small training datasets. Each scenario can take a couple of days to run with the number of replication we used (10^4), but this number can be adjusted by a simple change in the script.

Reproducing the case-study

The script frostData_exp_arrScores.R can be run to reproduce the figures and tables presenting the case-study in Section 6 of the paper. The script requires intermediary files from the folders inside application_results/ and generates figures 9, 10, 11, A1, A2 and A3, and the content for tables 3 and A5. The case-study presents two sets of results: using post-processed forecasts and using raw forecasts. Inside frostData_exp_arrScores.R one may choose either option at the top of the file: pp for post-processed and raw otherwise.

One may also re-run the entire case-study using the scripts frostData_exp_est_comboParams.R and frostData_exp_score.R. The script frostData_exp_est_comboParams.R must be run first and estimates combination parameters for all combinations of locations and target year. This script reads the time-to-hard-freeze datasets inside the application_data/ folder, and writes intermediary result files into empty subfolders (new_results_pp/ and new_results_raw/) inside that folder. Then, frostData_exp_score.R, can be run. This script scores all the forecats for each combination of location and target year. The script reads the time-to-hard-freeze datasets and loads the combination parameters from the files inside new_results_pp/ and new_results_raw/, and writes result files with scores into the same subfolders. Again, note that the scripts may be run using either post-processed or raw forecasts.

The script frostData_exp_arrScores.R does not require any particular setup, but the scripts frostData_exp_est_comboParams.R and frostData_exp_score.R make use of the packages foreach, doParallel and parallel, and cannot be run as written on a Windows computer. These two scripts take some hours to run.

Description of datasets

The following datasets are available in the replication package.

Seasonal time-to-hard-freeze forecasts

ttf_seasonal9.csv and ttf_seasonal9_raw.csv.

Dataset of the time to hard freeze for post-processed and raw seasonal weather forecasts issued September 1 for 1999-2018.

Time: The day of the first hard freeze after the issue date, or the maximum lead time (if hard freeze never occurs).
Obs: 1 if hard freeze is observed before the maximum lead time, 0 otherwise (censored).
forecast_year: The year that the forecast was issued.
system: Name of the weather center that issued the forecast.
system_number: System number of the forecast.
member: Ensemble member number.
lon_sN: longitude.
lat_sN:latitude.

ttf_seasonal10.csv and ttf_seasonal10_raw.csv are similar datasets, but for forecasts issued on October 1. These files are only used for scoring (inside frostData_exp_score.R).

Subseasonal time-to-hard-freeze forecasts

ttf_subseasonal_frostblending_1999_2018.csv and ttf_subseasonal_frostblending_1999_2018_raw.csv.

Dataset of the time to hard freeze for post-processed and raw subseasonal weather forecasts from ECMWF for 1999-2018 for a set of initialization dates.

Time: The day of the first hard freeze after the issue date, or the maximum lead time (if hard freeze never occurs).
Obs: 1 if hard freeze is observed before the maximum lead time, 0 otherwise (censored).
forecast_year: The year that the forecast was issued.
init_date: The date when the forecast, with corresponding hindcasts, were released. We consider forecasts released in 2019, hence the init_date year is always 2019. Each init_date in 2019 has a corresponding hindcast with forecast year given in the "forecast_year" column. In the paper we use data with init_date=30/09/2019 and forecast_year in 1999:2018.
member: Ensemble member number.
lon_sN: longitude.
lat_sN:latitude.

Observed times-to-hard-freeze

ttf_senorge.csv

The observed time to hard freeze for locations in Fennoscandia according to the SeNorge2018 product.

forecast_year: observation year.
lon_sN: longitude.
lat_sN:latitude.
Time: The day of the first hard freeze after September 1, or day 122 (December 31) if hard freeze never occurs.
Obs: 1 if hard freeze is observed before the maximum day, 0 otherwise (censored).

Hard freeze is defined as a day with average temperature below 0 degrees Celsius.

Temperature datasets

seas9.csv, subseas.csv and sn.csv

These datasets are subsets of the full temperature data which was used to construct the time-to-hard-freeze datasets described above. They are subsetted to only contain the two locations necessary for producing figure 1 in the paper. They are subsetted because these are extremely large datafiles - their provenance is described in the next section.

Seasonal temperature forecasts initialized on September 1 (in seas9.csv).

forecast_year: observation year.
lon_sN: longitude.
lat_sN:latitude.
system: Name of the weather center that issued the forecast.
member: Ensemble member number.
target_day: Day after September 1.
temperature: forecasted mean daily 2-meter temperature (in Kelvin).
temperature_pp: forecasted mean daily 2-meter temperature after post-processing (in Kelvin). See next section for some explanations around post-processing.

Subseasonal temperature forecasts initialized on September 30 (in subseas.csv).

forecast_year: observation year.
lon_sN: longitude.
lat_sN:latitude.
init_date: The date when the forecast, with corresponding hindcasts, were released.
member: Ensemble member number.
target_day: Day after September 1.
temperature: forecasted mean daily 2-meter temperature (in Kelvin).
temperature_pp: forecasted mean daily 2-meter temperature after post-processing (in Kelvin). See next section for some explanations around post-processing.

The actual temperature observations from SeNorge2018 (in sn.csv).

year: observation year.
lon_sN: longitude.
lat_sN:latitude.
st: mean daily 2-meter temperature (in Kelvin).

Data provenance and processing

We use temperature data from three different sources: (1) seasonal forecasts initialized on September 1; (2) subseasonal forecasts initialized on September 30; and (3) the actual time-to-hard-freeze observations which we obtain from the gridded, observation-based data product SeNorge2018 (version 18.12).

Seasonal forecasts of 2-meter temperatures, initialized on September 1, were downloaded from the Copernicus Climate Data Store (CDS) (https://confluence.ecmwf.int/display/CKB/C3S+Seasonal+Forecasts\%3A+dataset+documentation). Ensemble forecasts were available from five weather centres: The European Centre for Medium-Range Weather Forecasts (ECMWF), the Euro-Mediterranean Center for Climate Change (CMCC), Deutscher Wetterdienst (DWD), Météo France and the UK Met Office. Forecasts from 1992-2021 with lead times 5-7 months and spatial resolution $1 ^\circ \times 1 ^\circ$, were downloaded with corresponding hindcasts for the same historical period (1992-2021). The number of ensemble members varies among the forecasting systems and years, but each year there are approximately $150$ ensemble members in total for the five systems combined.

The subseasonal forecasts we use, are the 2-meter temperature forecasts delivered by ECMWF, initialized on September 30 (https://confluence.ecmwf.int/display/S2S/ECMWF+model+description). We use a forecast produced in 2019, with forecast cycle number CY46R1. Hindcasts from the same cycle number for 1999-2018 were available with a spatial resolution of $1.5^\circ \times 1.5 ^\circ$, 11 ensemble members and a maximum lead time of 46 days (November 15).

The temporal resolution of both the subseasonal and seasonal forecasts was 6 hours, but to match the SeNorge data, the data were converted to mean daily temperatures by computing averages of the 6 hour temperatures. To ensure that the forecasted temperatures better match the temperatures observed at the target locations, the seasonal and subseasonal forecasts are post-processed by adjusting the forecast means and standard deviations relative to historical means and standard deviations from SeNorge. See further descriptions in the paper.

The SeNorge product is a dataset developed by the Norwegian Meteorological Institute and contains mean daily temperature data for the mainland of Norway, and parts of Sweden, Finland and Russia (Lussana, C., Tveito, O. & Uboldi, F. (2018). Three-dimensional spatial interpolation of two-meter temperature over Norway. Quarterly Journal of the Royal Meteorological Society 144). The SeNorge data can be downloaded from https://thredds.met.no/thredds/catalog/senorge/seNorge_2018/catalog.html . The product is based on 2-meter temperature observations from meteorological stations that are interpolated to a 1 km $\times$ 1 km grid. Here, we use the SeNorge data as a substitute for local, observed surface temperatures. See further descriptions in the paper.

In order to obtain the datasets of forecasted times-to-hard-freeze, we extract the day of the first occurrence of hard freeze after September 1 for each combination of year, location, ensemble member and forecast system, from the raw and post-processed seasonal and subseasonal forecast datasets. Likewise, we otain the observed times-to-hard-freeze data by taking out the first day of observed hard freeze for SeNorge.

Overview of tables and figures

The following tables summarize the information on which scipt generates which table and figure. The figure and table numbering is the one used in the paper.

Figure number	File name	script
1	`two_temperatures_day1_day122.pdf`	`temp_curves_fig1.R`
2	`ex_training_new.pdf`	`small_illustration_fig2-3.R`
3	`ex_2018_new.pdf`	`small_illustration_fig2-3.R`
4	`sim_pitExpSD_large_jun25.pdf`	`simulation_tables_figures.R`
5	`sim_pit_sc1_oct24.pdf` `sim_skill_large_jun25.pdf`	`simulation_tables_figures.R`
6	`sim_pitExpSD_small_jun25.pdf`	`simulation_tables_figures.R`
7	`sim_pit_sc9_oct24.pdf` `sim_skill_small_jun25.pdf`	`simulation_tables_figures.R`
8	`dataplot.pdf`	`data_map_fig8`
9	`frost_pit_thea.pdf`	`frostData_exp_arrScores.R`
10	`skill_blending_bestmethod_lp_tr.pdf`	`frostData_exp_arrScores.R`
11	`skillmap_septref_cc.pdf` `skillmap_subseasref_cc.pdf` `weightmap.pdf`	`frostData_exp_arrScores.R`
A1	`frost_pit_thea_raw.pdf`	`frostData_exp_arrScores.R`
A2	`parameter_values_real_pp.pdf`	`frostData_exp_arrScores.R`
A3	`parameter_values_real_raw2.pdf`	`frostData_exp_arrScores.R`

Table number	Content	script
1	Description of simulation experiment	-
2	Description of simulation experiment	-
3	Results from case-study	`frostData_exp_arrScores.R`
A1	Results from simulation study	`simulation_tables_figures.R`
A2	Results from simulation study	`simulation_tables_figures.R`
A3	Results from simulation study	`simulation_tables_figures.R`
A4	Description of datasources	-
A5	Results from case-study	`frostData_exp_arrScores.R`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Replication package for "Combining predictive distributions for time-to-event outcomes in meteorology"

Overview & contents

General instructions

Reproducing the simulation study

Reproducing the case-study

Description of datasets

Seasonal time-to-hard-freeze forecasts

Subseasonal time-to-hard-freeze forecasts

Observed times-to-hard-freeze

Temperature datasets

Data provenance and processing

Overview of tables and figures

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 83 Commits
application_data		application_data
application_results		application_results
figures		figures
scripts		scripts
simulation_data		simulation_data
README.md		README.md

NorskRegnesentral/CombiningTimeToEvent

Folders and files

Latest commit

History

Repository files navigation

Replication package for "Combining predictive distributions for time-to-event outcomes in meteorology"

Overview & contents

General instructions

Reproducing the simulation study

Reproducing the case-study

Description of datasets

Seasonal time-to-hard-freeze forecasts

Subseasonal time-to-hard-freeze forecasts

Observed times-to-hard-freeze

Temperature datasets

Data provenance and processing

Overview of tables and figures

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages