Context
This issue was originally raised in nextstrain/measles#9 (review), and I wanted to document the options and the consensus on how we should standardize Nextstrain config schemas for files. This discussion is specifically about the config files that workflows require, such as reference.gb, exclude.txt, include.txt, etc.
Historically, these files were defined at the top of the Snakefile. They were grouped together rather than placed in rule-specific params because a single file may be used by multiple rules.
To make the files easily configurable, the ncov workflow moved them into the config YAML under a top-level `files` key. Other pathogen repos have also provided config files through the config YAML, but with varying schemas. mpox uses top-level file name keys (i.e., it drops the `files` key) and also uses rule-specific file name keys:
```yaml
auspice_config: "config/mpxv/auspice_config.json"
include: "config/mpxv/include.txt"
reference: "config/reference.fasta"
...
filter:
  exclude: "config/exclude_accessions.txt"
```

rsv uses both the top-level `files` key and top-level file name keys:
```yaml
exclude: "config/outliers.txt"
description: "config/description.md"
...
files:
  color_schemes: "config/colors.tsv"
  auspice_config: "config/auspice_config.json"
```

Open questions
- Is providing config files through the config YAML still the way forward?
- Do we want to standardize config file schemas across pathogens? Should this be decided per pathogen repo?
- If we do standardize config schemas, should they use the top-level `files` key, top-level file name keys, or rule-specific file name keys? Are there other schemas we should consider?
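To make the trade-offs in these questions concrete, here is a minimal Python sketch of how a Snakefile might read file paths under each candidate schema. The dicts stand in for Snakemake's `config` object after the YAML is loaded. The paths, the `align`/`refine` rule names, and the `lookup` and `count_definitions` helpers are hypothetical illustrations, not code from any existing Nextstrain workflow.

```python
# Hypothetical configs mirroring the three candidate schemas, written as the
# Python dicts a Snakefile's `config` would hold after loading the YAML.

# Schema 1: file paths grouped under a top-level `files` key (ncov-style).
FILES_KEY_CONFIG = {
    "files": {
        "reference": "config/reference.gb",
        "exclude": "config/exclude.txt",
    },
}

# Schema 2: top-level file name keys (no `files` key).
TOP_LEVEL_CONFIG = {
    "reference": "config/reference.gb",
    "exclude": "config/exclude.txt",
}

# Schema 3: rule-specific file name keys. A file used by several rules
# (here the reference, assumed to be needed by hypothetical `align` and
# `refine` rules) has to be repeated under each rule.
RULE_SPECIFIC_CONFIG = {
    "filter": {"exclude": "config/exclude.txt"},
    "align": {"reference": "config/reference.gb"},
    "refine": {"reference": "config/reference.gb"},
}


def lookup(config: dict, schema: str, rule: str, name: str) -> str:
    """Return the path a given rule would read for the config file `name`."""
    if schema == "files_key":
        return config["files"][name]
    if schema == "top_level":
        return config[name]
    if schema == "rule_specific":
        return config[rule][name]
    raise ValueError(f"unknown schema: {schema!r}")


def count_definitions(config: dict, name: str) -> int:
    """Count how many places in a (possibly nested) config define `name`."""
    total = 0
    for key, value in config.items():
        if key == name:
            total += 1
        if isinstance(value, dict):
            total += count_definitions(value, name)
    return total


if __name__ == "__main__":
    for schema, config in [
        ("files_key", FILES_KEY_CONFIG),
        ("top_level", TOP_LEVEL_CONFIG),
        ("rule_specific", RULE_SPECIFIC_CONFIG),
    ]:
        path = lookup(config, schema, "align", "reference")
        print(schema, path, count_definitions(config, "reference"))
    # files_key config/reference.gb 1
    # top_level config/reference.gb 1
    # rule_specific config/reference.gb 2
```

Under the rule-specific schema, a file shared by several rules is defined once per rule, which is the duplication that the original top-of-Snakefile grouping avoided; the `files` key and top-level schemas each define it once and differ only in whether file paths are namespaced apart from other config values.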