Prototype a linear version of our Snakemake workflow in WDL #809

@huddlej

Context

We would like to support a canonical Nextstrain workflow for SARS-CoV-2 analyses that can run on Terra, DNAnexus, and similar web-based, cloud-backed platforms. However, these platforms require workflows to be implemented in WDL and do not support Snakemake (the language we use for our current ncov workflow).

We can’t lift over our current Snakemake workflow to WDL because it relies on multiple Snakemake-specific features, including dynamic graph definitions associated with subsampling and date-based filters that are calculated dynamically with Python logic. We have pull requests to address the subsampling and date filter issues, but both are still under review (and the subsampling logic is likely to change soon).

In addition to supporting users who would like to run our specific workflow, a WDL implementation would allow other groups, like those at the Broad Institute and Theiagen, to compose their own workflows from the WDL tasks we publish. It would also allow us to push new features like Nextalign alignments, Nextclade diagnostics, fitness annotations, etc. out to more users. This kind of support for Terra users is especially important given the recent federal support dedicated to helping public health labs move their workflows to Terra.

Description

Instead of lifting over the Snakemake workflow directly, we could prototype a simpler linear workflow that expects a fixed number of inputs (e.g., only the Nextstrain open data hosted on data.nextstrain.org), skips subsampling, and optionally deploys its results (to a Nextstrain Group, for example).
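To make "linear" concrete, here is a minimal Python sketch of the constraint such a workflow would satisfy: a fixed list of steps, run in order, where each step reads only the initial inputs or files produced by earlier steps. The step names and file names (`align`, `aligned.fasta`, and so on) are hypothetical placeholders, not the actual rules of the ncov workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step of a linear workflow (all names here are hypothetical)."""
    name: str
    inputs: tuple
    output: str


def linear_plan(initial_inputs, steps):
    """Return step names in execution order.

    Raises ValueError if a step needs a file that is neither an initial
    input nor the output of an earlier step, i.e. if the list of steps
    cannot be run top to bottom without any dynamic graph resolution.
    """
    available = set(initial_inputs)
    order = []
    for step in steps:
        missing = [path for path in step.inputs if path not in available]
        if missing:
            raise ValueError(f"step {step.name!r} is missing inputs: {missing}")
        available.add(step.output)
        order.append(step.name)
    return order


# Hypothetical fixed chain with no subsampling step.
EXAMPLE_STEPS = [
    Step("align", ("sequences.fasta",), "aligned.fasta"),
    Step("tree", ("aligned.fasta",), "tree_raw.nwk"),
    Step("refine", ("tree_raw.nwk", "aligned.fasta", "metadata.tsv"), "tree.nwk"),
    Step("export", ("tree.nwk", "metadata.tsv"), "auspice.json"),
]

if __name__ == "__main__":
    print(linear_plan(["sequences.fasta", "metadata.tsv"], EXAMPLE_STEPS))
```

A chain that passes this check should map onto one WDL task per step, with each task's inputs wired to earlier outputs.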

This prototype would allow us to run more realistic workflows than our previous WDL prototype and help us identify additional parts of our Snakemake workflow that could be linearized or converted from Snakemake logic into standalone scripts that could run in multiple workflow languages.
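As one illustration of moving Snakemake-embedded logic into a standalone script, the sketch below resolves a relative date filter ("the last N days") into an absolute ISO date. The function name, the `--days-back` and `--today` options, and the behavior are assumptions made for this example; they are not taken from the existing workflow or its pending pull requests.

```python
import argparse
from datetime import date, timedelta


def resolve_min_date(days_back, today=None):
    """Return the ISO date that lies `days_back` days before `today`.

    `today` defaults to the current date; passing it explicitly keeps
    the result reproducible across workflow runs.
    """
    if days_back < 0:
        raise ValueError("days_back must be non-negative")
    if today is None:
        today = date.today()
    return (today - timedelta(days=days_back)).isoformat()


def main(argv=None):
    """Command-line entry point (hypothetical interface)."""
    parser = argparse.ArgumentParser(
        description="Resolve a relative date filter to an absolute date."
    )
    parser.add_argument("--days-back", type=int, required=True)
    parser.add_argument("--today", type=date.fromisoformat, default=None)
    args = parser.parse_args(argv)
    print(resolve_min_date(args.days_back, args.today))
```

Because the calculation runs as an ordinary command that prints a single value, a Snakemake rule or a WDL task could call `main()` from a script and pass the printed date to a later filtering step, without either workflow language evaluating Python itself.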

Alternate approaches we have discussed include:

  • wrapping the entire Snakemake workflow in a single WDL task
  • implementing one WDL task per Snakemake rule, where each task calls the corresponding Snakemake rule

See this epic Slack thread for more context.

Metadata

Labels

enhancement (New feature or request)
