Here we provide the relevant code from `_targets.R` and describe what each target does.
-
Create the age-specific population data you would like to generate synthetic contact matrices for.
    tar_target(
      in_data_wpp,
      wpp_age(years = 2015)
    ),
If you would like to use your own age-specific population data, load your data as a target object here using a function such as `read_csv()`. Your data must have the following variables: `country` for country name, `year` for the year of survey, `lower.age.limit` for the lower age limit of each age group, and `population` for the size of each age group.
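As a minimal sketch of the expected shape, the base R snippet below builds a small data frame with the four required variables and checks that none is missing. The object name `my_population` and all values are made up for illustration; they do not come from the workflow.

```r
# Hypothetical example data; the values are made up for illustration.
my_population <- data.frame(
  country = rep("Australia", 4),
  year = rep(2015, 4),
  lower.age.limit = c(0, 5, 10, 15),
  population = c(1500000, 1450000, 1400000, 1420000)
)

# Check that every variable the workflow expects is present.
required_vars <- c("country", "year", "lower.age.limit", "population")
missing_vars <- setdiff(required_vars, names(my_population))
if (length(missing_vars) > 0) {
  stop("Missing required variables: ", paste(missing_vars, collapse = ", "))
}
```

A check like this could be run on your own data before it enters the pipeline, so that a misnamed column fails early rather than in a later target.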
In this instance, we use the `wpp_age()` function from the `socialmixr` package to obtain our population data. The data is saved in the target object `in_data_wpp`.
-
Clean the data.
    tar_target(
      cleaned_wpp,
      in_data_wpp %>%
        mutate(country = case_when(
          # Renames the following, otherwise picked up as "China"
          country == "China, Hong Kong SAR" ~ "Hong Kong",
          country == "China, Taiwan province of China" ~ "Taiwan, Province of China",
          .default = country
        )) %>%
        # The following is otherwise picked up as "China"
        filter(country != "Less developed regions, excluding China")
    ),
This step sorts out issues unique to the WPP dataset. In this instance, we manually rename the regions Hong Kong and Taiwan and remove observations labelled “Less developed regions, excluding China”. If this were not done manually, the next step would relabel these observations as “China”, which is problematic for the subsequent steps of the analysis.
If you are using your own data within this workflow, we recommend you explore and manually clean your data in this step.
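For your own data, the cleaning step might look something like the sketch below. The data, the label variants, and the object names are all hypothetical; the point is only to show the same `mutate()` / `case_when()` / `filter()` pattern applied to a different dataset. The `.default` argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical raw data with inconsistent labels (made up for illustration).
raw_population <- tibble(
  country = c("australia ", "Australia", "Regional total"),
  year = 2015,
  lower.age.limit = c(0, 5, 0),
  population = c(1500000, 1450000, 9000000)
)

cleaned_population <- raw_population %>%
  # Trim stray whitespace, then unify a known spelling variant.
  mutate(
    country = trimws(country),
    country = case_when(
      country == "australia" ~ "Australia",
      .default = country
    )
  ) %>%
  # Drop aggregate rows that are not individual countries.
  filter(country != "Regional total")
```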
-
Standardise the country names.
    tar_target(
      standardised_wpp_data,
      standardise_country_names(
        cleaned_wpp,
        column_name = "country",
        conversion_destination_code = "iso3c"
      )
    ),
The function `standardise_country_names()` relies on the `countryname()` function in the `countrycode` package. The argument `conversion_destination_code` defaults to ISO-3 codes (`"iso3c"`) but can be changed to any other code that `countrycode` supports (see footnote 1). The converted, standardised country names are saved in the existing data frame as a new variable called `std_country_names`.
-
Check excluded region names.
    tar_target(
      excluded_names,
      standardised_wpp_data %>%
        filter(is.na(std_country_names)) %>%
        select(country) %>%
        distinct(country)
    ),
As the population data obtained from `wpp_age()` includes global and regional population data in addition to country-level population data, we would like to exclude these. Names that do not match a country name are labelled as missing (`NA`) in the `std_country_names` variable. These values are returned in a data frame for manual checking.

To obtain this data frame, use `tar_load(excluded_names)`.
-
Split the data frame into a list of data frames, one per standardised country name.

    tar_target(
      list_of_data,
      split(
        standardised_wpp_data,
        standardised_wpp_data$std_country_names
      )
    ),
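To illustrate what this step produces, here is a small base R sketch on made-up data. The object names and values are hypothetical. Note that base R's `split()` drops rows whose grouping value is `NA`, so rows without a standardised country name do not appear in the resulting list.

```r
# Hypothetical standardised data (values made up for illustration).
standardised_example <- data.frame(
  country = c("Australia", "Australia", "New Zealand", "New Zealand", "WORLD"),
  std_country_names = c("AUS", "AUS", "NZL", "NZL", NA),
  lower.age.limit = c(0, 5, 0, 5, 0),
  population = c(1500000, 1450000, 300000, 310000, 650000000)
)

# One list element per standardised country name; the NA row is dropped.
list_example <- split(
  standardised_example,
  standardised_example$std_country_names
)

names(list_example)      # names of the list elements
list_example[["NZL"]]    # retrieve one country's data frame by name
```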
-
Select which countries you would like to create synthetic contact matrices for.
    tar_target(
      selection_of_countries,
      list_of_data[1:200]
    ),
`list_of_data[1:200]` returns all 200 countries in the WPP data. Change these numbers if you would only like a subset of these 200.

Alternatively, use `dplyr::filter()` if you have a list of country names you would like to filter.
-
Convert our data to `conmat` population data.

    tar_target(
      population_data,
      create_population_data(selection_of_countries)
    ),
The function `create_population_data()` uses the `as_conmat_population()` function from `conmat`.
-
Generate synthetic contact matrices from our population data.
    tar_target(
      contact_matrices_data,
      create_contact_matrices(
        population_data = population_data,
        start_age = 0,
        end_age = 80
      )
    ),
If you would like to adjust the age limits for the synthetic contact matrices, change the `start_age` and `end_age` arguments in the `create_contact_matrices()` function. The default is 0 to 80+ years. To change it to 0 to 60+ years, set `end_age = 60`.

The `create_contact_matrices()` function uses the `extrapolate_polymod()` function from `conmat`.
-
Save the generated synthetic contact matrices as csv files.
    tar_target(
      csv_output,
      save_conmat_as_csv(
        matrix_list = contact_matrices_data,
        path = "./output-contact-matrices",
        subfolder = FALSE
      ),
      format = "file"
    ),
The `path` argument specifies where the csv files are saved; change it if you would like them saved elsewhere. The folder specified in `path` must already exist.

The `subfolder` argument defaults to `FALSE`, which means the generated contact matrices for all countries are saved within the specified path without subdirectories. In other words, all csv files are saved in the one folder.

Alternatively, if the `subfolder` argument is set to `TRUE`, the five resulting contact matrices for each country are saved in that country's own subdirectory. In other words, the five synthetic contact matrices generated for one country (for the environments all, home, other, school, and work) are saved in their own subfolder within the specified path; for Australia, for example, the subfolder is labelled ‘AUS’.
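Because the output folder must exist before the pipeline writes to it, it can help to create it up front. The base R sketch below does this and illustrates the two layouts described above. It uses a temporary directory so that it can be run anywhere; in the pipeline you would use your actual `path`. The file names are hypothetical and shown only to contrast the layouts; the names that `save_conmat_as_csv()` actually writes may differ.

```r
# Create the output folder if it does not already exist.
# A temporary directory is used here purely for illustration.
output_path <- file.path(tempdir(), "output-contact-matrices")
dir.create(output_path, showWarnings = FALSE, recursive = TRUE)

# The five environments for which matrices are generated.
settings <- c("all", "home", "other", "school", "work")

# Hypothetical file names, for illustration only.
# subfolder = FALSE: every csv file sits directly in the output folder.
flat_layout <- file.path(output_path, paste0("AUS_", settings, ".csv"))

# subfolder = TRUE: each country's files sit in their own subfolder.
nested_layout <- file.path(output_path, "AUS", paste0(settings, ".csv"))
```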
Footnotes
-
Refer to the `countrycode` documentation or type `?codelist` for a list of available codes.