Prototype a linear version of our Snakemake workflow in WDL #809
Comments
Notes from Started implementation of both options in two separate task.wdl files. (Can simply delete the one we don't use later.)
Option 1 is shown below with some notes
To import into Terra: dockstore/mini_wdl Working on:
Current wdl pathogen build: https://dockstore.org/workflows/github.com/j23414/wdl_pathogen_build:main?tab=info
@j23414 Given that the WDL pathogen build workflow is working so well on Terra, we should migrate that workflow to this repo, so we can continue to refine the interface more collaboratively. We can start by adding the WDL workflow in a |
Sounds great! I added a |
Cool, thank you! This worked well for me from Terra, so I'm happy to see the |
@j23414 Whenever you are ready, I think the |
Context
We would like to support a canonical Nextstrain workflow for SARS-CoV-2 analyses that can run on Terra, DNAnexus, and other similar web-based, cloud-backed platforms. However, these platforms require workflows to be implemented in WDL and do not support Snakemake (the language we use for our current ncov workflow).
We can’t lift over our current Snakemake workflow to WDL because it relies on multiple Snakemake-specific features, including dynamic graph definitions associated with subsampling and date-based filters that are calculated dynamically with Python logic. We have pull requests to address the subsampling and date filter issues, but both are still under review (and the subsampling logic is likely to change soon).
In addition to supporting users who would like to run our specific workflow, a WDL implementation would allow other groups, like those at the Broad Institute and Theiagen, to compose their own workflows from the WDL tasks we publish. It would also let us push new features like Nextalign alignments, Nextclade diagnostics, fitness annotations, etc. out to more users. This kind of support for Terra users is especially important given the recent federal support dedicated to helping public health labs move their workflows to Terra.
Description
Instead of lifting over the Snakemake workflow directly, we could prototype a simpler linear workflow that expects a fixed number of inputs (e.g., only the Nextstrain open data hosted on data.nextstrain.org), skips subsampling, and optionally deploys the result to, for example, a Nextstrain Group.
This prototype would allow us to run more realistic workflows than our previous WDL prototype and help us identify additional parts of our Snakemake workflow that could be linearized or converted from Snakemake logic into standalone scripts that could run in multiple workflow languages.
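To make the "linear workflow" idea concrete, here is a minimal sketch of what such a WDL file might look like. It is not the prototype from this issue. The task names, the workflow name `linear_build`, the output file names, and the `nextstrain/base:latest` image tag are all assumptions for illustration, and `augur align` stands in for whichever aligner the real workflow would use. The point is only that each step takes files from the previous step, with no subsampling and no dynamically computed graph.

```wdl
version 1.0

# Hypothetical task: align input sequences to a reference with augur.
task align {
  input {
    File sequences
    File reference
    String docker_img = "nextstrain/base:latest"  # assumed image tag
  }
  command <<<
    augur align \
      --sequences ~{sequences} \
      --reference-sequence ~{reference} \
      --output aligned.fasta
  >>>
  output {
    File alignment = "aligned.fasta"
  }
  runtime {
    docker: docker_img
  }
}

# Hypothetical task: build a raw tree from the alignment.
task tree {
  input {
    File alignment
    String docker_img = "nextstrain/base:latest"  # assumed image tag
  }
  command <<<
    augur tree \
      --alignment ~{alignment} \
      --output tree_raw.nwk
  >>>
  output {
    File raw_tree = "tree_raw.nwk"
  }
  runtime {
    docker: docker_img
  }
}

# Linear workflow: fixed inputs, no subsampling, one step after another.
workflow linear_build {
  input {
    File sequences
    File reference
  }
  call align { input: sequences = sequences, reference = reference }
  call tree { input: alignment = align.alignment }
  output {
    File raw_tree = tree.raw_tree
  }
}
```

Because every dependency is a plain file passed from one `call` to the next, other groups could import individual tasks like these into their own workflows, which is the composability benefit described under Context.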
Alternate approaches we have discussed include:
See this epic Slack thread for more context.