Skip to content

Improve / standardize the way that we pass through default parameters and build-specific parameters #1131

@trvrb

Description

@trvrb

Context

Currently, the workflow is a bit of mess when it comes to how one can specify parameters. There seems to be 3 different approaches:

1 – parameters.yaml only

For the frequencies rule, the parameter pivot_interval is retrieved via config["frequencies"]["pivot_interval"]. This makes it grab exactly what's in parameters.yaml under frequencies:pivot_interval. If someone tries to specify a 4-week build-specific pivot_interval via

frequencies:
  global_all-time:
    pivot_interval: 4

The workflow will secretly just keep using the pivot_interval specified in paramaters.yaml.

Most parameters in the workflow work like this.

2 – builds.yaml override with necessity of default

For the traits rule, the parameter sampling_bias_correction is retrieved via _get_sampling_bias_correction_for_wildcards which is defined as

def _get_sampling_bias_correction_for_wildcards(wildcards):
    if wildcards.build_name in config["traits"] and 'sampling_bias_correction' in config["traits"][wildcards.build_name]:
        return config["traits"][wildcards.build_name]["sampling_bias_correction"]
    else:
        return config["traits"]["default"]["sampling_bias_correction"]

Ie it first looks for a build-specific list of traits:{build_name}:sampling_bias_correction and if doesn't find it, it expects traits:default:sampling_bias_correction in builds.yaml or parameters.yaml. This is described in the docs as

   traits:
     default:
       sampling_bias_correction: 2.5
       columns: ["country"]
     washington:
       # Override default sampling bias correction for
       # "washington" build and continue to use default
       # trait columns.
       sampling_bias_correction: 5.0

This works, but requires parameters.yaml to look like:

traits:
  default:
    sampling_bias_correction: 2.5
    columns: ["country"]

with an extra default key compared to other entries in parameters.yaml.

This strategy is only used for the traits rule.

3 – builds.yaml override without default

For the frequencies rule, the parameter min_date is retrieved via _get_min_date_for_frequencies which is defined as

def _get_min_date_for_frequencies(wildcards):
    if wildcards.build_name in config["frequencies"] and "min_date" in config["frequencies"][wildcards.build_name]:
        return config["frequencies"][wildcards.build_name]["min_date"]
    elif "frequencies" in config and "min_date" in config["frequencies"]:
        return config["frequencies"]["min_date"]
    else:
        # If not explicitly specified, default to 1 year back from the present
        min_date_cutoff = datetime.date.today() - datetime.timedelta(weeks=52)
        return numeric_date(
            min_date_cutoff
        )

Ie it starts with trying to grab build-specific frequencies:{build_name}:min_date from builds.yaml. If this doesn't exist, it looks for frequencies:min_date in builds.yaml or parameters.yaml and if this doesn't exist it directly returns a sensible default.

This strategy is only used for the frequencies rule.

Description

I believe that we should replace the strategy 2 above used only in traits rule with a setup like 3 above. This is what I did when I realized we weren't collecting narrow_bandwidth properly. In PR #1130 I followed strategy 3 to specify narrow_bandwidth as:

def _get_narrow_bandwidth_for_wildcards(wildcards):
    # check if builds.yaml contains frequencies:{build_name}:narrow_bandwidth
    if wildcards.build_name in config["frequencies"] and 'narrow_bandwidth' in config["frequencies"][wildcards.build_name]:
        return config["frequencies"][wildcards.build_name]["narrow_bandwidth"]
    # check if parameters.yaml contains frequencies:narrow_bandwidth
    elif "frequencies" in config and "narrow_bandwidth" in config["frequencies"]:
        return config["frequencies"]["narrow_bandwidth"]
    # else return augur frequencies default value
    else:
        return 0.0833

We have the issue that if we swap _get_sampling_bias_correction_for_wildcards to use strategy 3, we'll need to update parameters.yaml to read

traits:
  sampling_bias_correction: 2.5
  columns: ["country"]

This will break custom profiles that external users are running. We could provide backwards compatibility however by looking first for traits:sampling_bias_correction and then for traits:defaults:sampling_bias_correction.

Does this seem reasonable?

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions