Prototype a linear version of our Snakemake workflow in WDL #809

Closed
huddlej opened this issue Dec 8, 2021 · 7 comments

huddlej commented Dec 8, 2021

Context

We would like to support a canonical Nextstrain workflow for SARS-CoV-2 analyses that can run on Terra, DNANexus, and other similar web-based, cloud-backed platforms. However, these platforms require workflows to be implemented in the WDL language and do not support Snakemake (the language we use for our current ncov workflow).

We can’t lift over our current Snakemake workflow to WDL directly because it relies on multiple Snakemake-specific features, including dynamic graph definitions associated with subsampling and date-based filters that are calculated at runtime with Python logic. We have pull requests to address the subsampling and date filter issues, but both are still under review (and the subsampling logic is likely to change soon).

In addition to supporting users who would like to run our specific workflow, a WDL implementation would allow other groups like those at the Broad Institute and Theiagen to compose their own workflows from the WDL tasks we publish. This WDL implementation would allow us to push new features like Nextalign alignments, Nextclade diagnostic, fitness annotations, etc. out to more users. This kind of support for Terra users is especially important given the recent federal support dedicated to helping public health labs move their workflows to Terra.

Description

Instead of lifting over the Snakemake workflow directly, we could prototype a simpler linear workflow that expects a fixed number of inputs (e.g., only the Nextstrain open data hosted on data.nextstrain.org), skips subsampling, and optionally deploys to a Nextstrain Group.

This prototype would allow us to run more realistic workflows than our previous WDL prototype and help us identify additional parts of our Snakemake workflow that could be linearized or converted from Snakemake logic into standalone scripts that could run in multiple workflow languages.
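As a rough illustration of what "linear" means here, the sketch below chains two WDL tasks so that the output of one feeds the next, with no dynamic graph logic. It is not the actual ncov workflow: the task names, the workflow name `linear_build`, and the choice of `augur align` followed by `augur tree` are assumptions made for illustration, as is the assumption that the `nextstrain/base` image provides `augur`.

```wdl
version 1.0

# Hypothetical task: align input sequences against a reference.
task align {
    input {
        File sequences
        File reference
        String docker_image = "nextstrain/base:latest"  # assumed to provide augur
    }
    command <<<
        augur align \
            --sequences "~{sequences}" \
            --reference-sequence "~{reference}" \
            --output aligned.fasta
    >>>
    output {
        File alignment = "aligned.fasta"
    }
    runtime {
        docker: docker_image
    }
}

# Hypothetical task: build a raw tree from the alignment.
task build_tree {
    input {
        File alignment
        String docker_image = "nextstrain/base:latest"  # assumed to provide augur
    }
    command <<<
        augur tree \
            --alignment "~{alignment}" \
            --output tree_raw.nwk
    >>>
    output {
        File newick = "tree_raw.nwk"
    }
    runtime {
        docker: docker_image
    }
}

# Fixed inputs, fixed order of steps: each call consumes the previous call's output.
workflow linear_build {
    input {
        File sequences
        File reference
    }
    call align { input: sequences = sequences, reference = reference }
    call build_tree { input: alignment = align.alignment }
    output {
        File raw_tree = build_tree.newick
    }
}
```

Because every step is an ordinary task with declared inputs and outputs, other groups could in principle import individual tasks into their own workflows.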

Alternate approaches we have discussed include:

  • wrapping the entire Snakemake workflow in a single WDL task
  • implementing one WDL task per Snakemake rule, where each task calls the corresponding Snakemake rule
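A minimal sketch of the second approach, assuming the Snakemake workflow directory is passed in as a tarball with the Snakefile at its top level. The task name `run_snakemake_rule`, the inputs `workflow_tarball` and `target`, and the choice to return the whole working directory as a tarball are all hypothetical.

```wdl
version 1.0

# Hypothetical task: run a single Snakemake target inside a WDL task.
task run_snakemake_rule {
    input {
        File workflow_tarball   # assumed: tarball with the Snakefile at its top level
        String target           # assumed: a Snakemake target (rule name or output path)
        String docker_image = "nextstrain/base:latest"  # assumed to provide snakemake
    }
    command <<<
        set -euo pipefail
        mkdir workdir
        tar -xzf "~{workflow_tarball}" -C workdir
        cd workdir
        # Let Snakemake build only the requested target.
        snakemake --cores "$(nproc)" "~{target}"
        cd ..
        # Return the whole working directory so a later task can pick up where this one stopped.
        tar -czf outputs.tar.gz -C workdir .
    >>>
    output {
        File outputs = "outputs.tar.gz"
    }
    runtime {
        docker: docker_image
    }
}
```

One cost of this design is that each task has to carry the full working directory forward, since Snakemake decides what to rerun from the files it finds on disk.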

See this epic Slack thread for more context.

@huddlej huddlej added the enhancement New feature or request label Dec 8, 2021
@j23414 j23414 moved this from Committed to In Progress in Nextstrain planning (archived) Dec 9, 2021

j23414 commented Jan 5, 2022

Notes from 2022-01-03 meeting with huddlej and j23414 (slides):

Started implementation of both options in two separate task.wdl files. (Can simply delete the one we don't use later.)

Option 1 (wrapping the entire Snakemake workflow in a single WDL task) is shown below with some notes:

task nextstrain_build {
    input {
        File input_dir
        String dockerImage = "nextstrain/base:latest"
    }
    command {
        PROC=`nproc`        # <= select max number of available threads
        nextstrain build --cpus $PROC --native "${input_dir}"
        cp -rf "${input_dir}/results" results    # <= file copying is necessary due to how cromwell does caching
        cp -rf "${input_dir}/auspice" auspice  
    }
    output {
        File auspice_dir = "auspice"            # <= could return the auspice.json file instead
    }
    runtime {
        docker: dockerImage
    }
}

To import into Terra: dockstore/mini_wdl

Working on:

  • Terra vs local cromwell run behavior
  • Add a wget ncov task to pass in scripts, or add it to the Dockerfile
  • Identify reasonable Input / Output behavior in Terra
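For the second item, one possible shape for a "wget ncov" task is sketched below. The task name `fetch_ncov`, the default archive URL, and the assumption that `wget` is available in the image are illustrative and not taken from the actual implementation.

```wdl
version 1.0

# Hypothetical task: download a copy of the ncov repository so later tasks can use its scripts.
task fetch_ncov {
    input {
        # Assumed default: GitHub archive of the master branch.
        String ncov_url = "https://github.com/nextstrain/ncov/archive/refs/heads/master.zip"
        String docker_image = "nextstrain/base:latest"  # assumed to provide wget
    }
    command <<<
        set -euo pipefail
        wget -O ncov.zip "~{ncov_url}"
    >>>
    output {
        File ncov_zip = "ncov.zip"
    }
    runtime {
        docker: docker_image
    }
}
```

Baking the repository into the Dockerfile instead would avoid a network download on every run, at the cost of rebuilding the image whenever the scripts change.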


j23414 commented Jan 27, 2022

Current wdl pathogen build: https://dockstore.org/workflows/github.com/j23414/wdl_pathogen_build:main?tab=info

  • Adding video documentation
  • Finding beta testers


j23414 commented Feb 1, 2022


huddlej commented Feb 15, 2022

@j23414 Given that the WDL pathogen build workflow is working so well on Terra, we should migrate that workflow to this repo so we can continue to refine the interface more collaboratively. We can start by adding the WDL workflow in a wdl branch, placing the files in workflow/wdl. This would allow you to test publishing the workflow through Dockstore before we merge into master.


j23414 commented Feb 15, 2022

Sounds great! I added a wdl branch, and linked it to Dockstore. Feel free to launch and test from the following link:


huddlej commented Feb 16, 2022

Cool, thank you! This worked well for me from Terra, so I'm happy to see the wdl branch merged into master. Then we can close this issue and move on to other refinements.


huddlej commented Mar 2, 2022

@j23414 Whenever you are ready, I think the wdl branch can be merged and we can close this issue.

@j23414 j23414 closed this as completed Mar 2, 2022
Repository owner moved this from In Progress to Done in Nextstrain planning (archived) Mar 2, 2022