Add mad4hatter/scientific overview here
- To run the main Mad4Hatter workflow, you'll need to set up your Terra workspace with the appropriate metadata tables. Since the workflow is designed to run once per dataset (not once per sample), two tables are required in your Terra workspace. For this example, we'll call the first table `sample` and the second table `sample_set`. However, these can be customized if you'd prefer different names.
- For the `sample` table, you can always add additional columns, but the following columns are required at a minimum:
  - `sample_id` - The sample ID
  - `read1` - The path to the forward FASTQ file
  - `read2` - The path to the reverse FASTQ file
- The easiest way to add this to Terra is to create a tsv (for example, called `sample.tsv`); see the sketch below. Ensure the primary key header is labeled with the name of the file followed by `_id` (for example, `sample_id` in this case). The remaining columns can have any headers that make sense for the metadata if `read1` and `read2` are not desired. Any additional columns can be added to the tsv as well, if desired.
- Once you have created the tsv, navigate to the "Data" tab in your Terra workspace, and click on the "Import Data" button. Select the tsv file you created, and Terra will create a new table in your workspace with the contents of the tsv. This table will be called `sample` (or whatever you named the tsv file).
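  For illustration, a minimal `sample.tsv` might look like the following. The sample names and `gs://` paths are placeholders (substitute the FASTQ locations from your own workspace bucket or other accessible storage), and fields must be separated by tab characters:

  ```
  sample_id	read1	read2
  Sample01	gs://my-bucket/fastqs/Sample01_R1.fastq.gz	gs://my-bucket/fastqs/Sample01_R2.fastq.gz
  Sample02	gs://my-bucket/fastqs/Sample02_R1.fastq.gz	gs://my-bucket/fastqs/Sample02_R2.fastq.gz
  ```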
- Next, you'll need to create a `sample_set` table, which is how Terra will know which samples to process as part of one dataset. If you're following the naming convention in this example, you'll need the following headers exactly:
  - `sample_set_id` - This will be the dataset name - use the same name for all rows (for example, `MyDataset1`)
  - `sample` - This will be all samples to be included in the dataset - each sample should be listed in its own row
- As with the `sample` table, create a tsv (for example, called `sample_set.tsv`) with the appropriate headers and contents; see the sketch below. Then, navigate to the "Data" tab in your Terra workspace, and click on the "Import Data" button. Select the tsv file you created, and Terra will create a new table in your workspace with the contents of the tsv. This table will be called `sample_set` (or whatever you named the tsv file).
- Once both tables are created, you can navigate to the "Data" tab in your Terra workspace to view and verify that the tables have been created correctly with the appropriate contents.
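  Continuing the example above, a `sample_set.tsv` that groups the two placeholder samples into one dataset might look like this (again, `MyDataset1` and the sample names are placeholders, and fields are tab-separated):

  ```
  sample_set_id	sample
  MyDataset1	Sample01
  MyDataset1	Sample02
  ```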
- Next, import your workflows (see directions below).
There are three workflows available in this repository (Mad4Hatter, Mad4HatterPostProcessing, and Mad4HatterQcOnly), which can be run via Terra. To import your desired workflow into your Terra workspace, please follow the instructions below:
- Create a new Terra workspace, use an existing one, or clone an existing one. Note that if you're cloning an existing workspace that already has your desired workflow(s), you can skip the rest of these steps.
- Navigate to the "Workflows" tab in your Terra workspace.
- Click on "Find a Workflow" and select the "Dockstore.org" option. This will bring you to the Dockstore website.
- In Dockstore, search for "MAD4HatTeR" and select the appropriate workflow from the search results.
- In the new page that opens, under "Launch with", select Terra.
- Enter your destination workspace name in the new page that opens and select "Import".
- You will be redirected back to your Terra workspace, where you can configure and run the workflow (see directions below).
- Prerequisites include setting up your metadata and importing the workflow into your Terra workspace.
- Once those steps are complete, navigate to the "Workflows" tab in your Terra workspace.
- If running the Mad4Hatter workflow, select the workflow under the "Workflows" tab. This will bring up the configuration page. First, select the "Run workflow(s) with inputs defined by data table" option. Under "Step 1: Select data table", choose the `sample_set` table (or whatever you named this table in the earlier steps). Under "Step 2: Select Data", toggle the "Choose specific sample_sets to process" option in the popup, and then select your desired dataset. Click "OK".
- Next, you'll have to configure your inputs. The two inputs to pay attention to specifically are `forward_fastqs` and `reverse_fastqs`. The "Input value" for `forward_fastqs` should be `this.samples.read1` (`read1` is the column header, so if you named it something different, use that instead). The input for `reverse_fastqs` should be `this.samples.read2` (or whatever you named that column if not `read2`). A sketch of this mapping appears at the end of this section.
- The rest of the inputs can be configured as desired. If you uploaded additional columns to your `sample` table, you can use those as inputs here as well by using `this.samples.{column_name}`. If you uploaded additional columns to your `sample_set` table, you can use those as inputs here as well by using `this.{column_name}`. Otherwise, you can put in literal hard-coded strings and file paths as needed.
- Once all inputs are configured, you can click "Save" and then "Launch" to start the workflow. If everything was configured correctly, you'll see "You are launching 1 workflow run in this submission." in the popup. If you see that more than one workflow is being launched, go back through the configuration steps and ensure that a "set" of samples has been selected, as this workflow is designed to run once per dataset.
- After launching, you can monitor the progress of the workflow in the "Submission History" tab. By default, Terra only displays workflows that have been launched in the past 30 days. If you want to see submission history from all time, make sure you select "All submissions" from the "Date range" drop-down at the top of the page.
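For reference, assuming the default column names from the examples above, the key input expressions described in the configuration steps map as follows (additional columns follow the same pattern):

```
forward_fastqs    this.samples.read1    # read1 column of the sample table
reverse_fastqs    this.samples.read2    # read2 column of the sample table
# other sample table columns:      this.samples.{column_name}
# other sample_set table columns:  this.{column_name}
```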