---
title: Setup
---
{% capture generic_tab_content %}
We will create a new git repository and import a library of existing tool definitions that will help us build our workflow.
Create a new empty git repository to hold our workflow with this command:
git init rnaseq-cwl-training-exercises
{: .language-bash }
Next, change into the new repository and import bio-cwl-tools with these commands:
cd rnaseq-cwl-training-exercises
git submodule add https://github.com/common-workflow-library/bio-cwl-tools.git
{: .language-bash }
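A quick check can confirm that the submodule landed where later steps expect it. The sketch below is only an illustration: `check_submodule` is a hypothetical helper name, and it assumes the `bio-cwl-tools/STAR/STAR-Index.cwl` path used later in this lesson is present in the imported library.

```shell
# check_submodule is a hypothetical helper, not part of git or bio-cwl-tools.
# It succeeds only if the repository records the submodule in .gitmodules and
# the STAR index tool used later in this lesson is present on disk.
check_submodule() {
  repo_dir="${1:-.}"
  [ -f "$repo_dir/.gitmodules" ] || return 1
  grep -q 'bio-cwl-tools' "$repo_dir/.gitmodules" || return 1
  [ -f "$repo_dir/bio-cwl-tools/STAR/STAR-Index.cwl" ]
}

# Example: run from inside the repository.
# check_submodule . && echo "bio-cwl-tools is in place"
```

If the check fails, running `git submodule status` inside the repository shows which submodules git knows about.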
Start from your rnaseq-cwl-training-exercises directory.
mkdir rnaseq
cd rnaseq
wget --mirror --no-parent --no-host --cut-dirs=1 https://download.jutro.arvadosapi.com/c=9178fe1b80a08a422dbe02adfd439764+925/
{: .language-bash }
Running STAR requires index files generated from the reference.
The pre-built index is a rather large download (4 GB). Depending on your bandwidth, it may be faster to generate the index yourself.
mkdir hg19-chr1-STAR-index
cd hg19-chr1-STAR-index
wget --mirror --no-parent --no-host --cut-dirs=1 https://download.jutro.arvadosapi.com/c=02a12ce9e2707610991bd29d38796b57+2912/
{: .language-bash }
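To judge whether the download is worthwhile, a rough transfer-time estimate may help. The sketch below is a back-of-the-envelope calculation, not a measurement: `download_minutes` is a hypothetical helper, the link speeds are made-up examples, and it covers only the download side (how long index generation takes depends on your machine and is not estimated here).

```shell
# download_minutes SIZE_GB MBIT_PER_S
# Rough transfer time in whole minutes (rounded up), ignoring protocol
# overhead. Uses 1 GB = 8000 megabits and integer arithmetic only.
download_minutes() {
  size_gb="$1"
  mbit_per_s="$2"
  seconds=$(( (size_gb * 8000 + mbit_per_s - 1) / mbit_per_s ))
  echo $(( (seconds + 59) / 60 ))
}

download_minutes 4 50     # hypothetical 50 Mbit/s link: prints 11
download_minutes 4 1000   # hypothetical 1 Gbit/s link: prints 1
```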
Create `chr1-star-index.yaml`:
InputFiles:
  - class: File
    location: rnaseq/reference_data/chr1.fa
    format: http://edamontology.org/format_1930
IndexName: 'hg19-chr1-STAR-index'
Gtf:
  class: File
  location: rnaseq/reference_data/chr1-hg19_genes.gtf
Overhang: 99
{: .language-yaml }
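Before running the tool, it can help to confirm that every local `location:` path in the inputs file exists. The following is a minimal sketch under stated assumptions: `check_locations` is a hypothetical helper, it expects unquoted `location:` values like those in the file above, and it resolves relative paths from the current directory, so run it from the directory that holds the inputs file.

```shell
# check_locations is a hypothetical helper, not part of any CWL runner.
# It reads every "location:" value from an inputs file and reports local
# paths that do not exist. It returns non-zero if anything is missing.
check_locations() {
  sed -n 's/^[ -]*location:[[:space:]]*//p' "$1" | {
    missing=0
    while IFS= read -r path; do
      case "$path" in
        ''|keep:*|http:*|https:*) continue ;;  # skip blanks and remote references
      esac
      if [ ! -e "$path" ]; then
        echo "missing: $path" >&2
        missing=1
      fi
    done
    [ "$missing" -eq 0 ]
  }
}

# Example:
# check_locations chr1-star-index.yaml && echo "all input files found"
```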
Generate the index with your local cwl-runner.
cwl-runner bio-cwl-tools/STAR/STAR-Index.cwl chr1-star-index.yaml
{: .language-bash }
{% endcapture %}
{% capture arvados_tab_content %}
We will create a new git repository and import a library of existing tool definitions that will help us build our workflow.
When using the recommended VSCode environment to develop on Arvados, start by forking the arvados-vscode-cwl-template repository.
- VSCode: On the left sidebar, choose `Explorer`
- Select `Clone Repository` and enter https://github.com/arvados/arvados-vscode-cwl-template, then click `Open`
- If asked `Would you like to open the cloned repository?`, choose `Open`
Next, import the bio-cwl-tools repository:
- VSCode: In the top menu, select `Terminal` → `New Terminal`
- This will open a terminal window in the lower part of the screen
- Run this command:
git submodule add https://github.com/common-workflow-library/bio-cwl-tools.git
{: .language-bash }
You may already have access to this collection. You can check by going to Workbench and pasting `9178fe1b80a08a422dbe02adfd439764+925` into the search box. If you arrive at a collection page instead of a "not found" error, then you do not need to perform this download step.
{: .callout}
- Go to https://workbench2.jutro.arvadosapi.com and sign in; this will create an account
- Go to `Get an API token` under the user menu
- Log into the shell node of your Arvados cluster
- On the shell node, copy the host name and token for the `jutro` cluster into the file `~/.config/arvados/jutro.conf` as described on the page for arv-copy.
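The last step can be sketched as a small shell helper. This is an illustration rather than an official procedure: `write_cluster_conf` is a hypothetical name, the `ARVADOS_API_HOST` and `ARVADOS_API_TOKEN` settings follow the file format described in the arv-copy documentation, and the host name and token in the example are placeholders that you need to replace with the values shown on the `Get an API token` page.

```shell
# write_cluster_conf is a hypothetical helper, not an Arvados command.
# Usage: write_cluster_conf CONF_DIR CLUSTER_ID API_HOST API_TOKEN
# It writes CONF_DIR/CLUSTER_ID.conf with the host and token settings.
write_cluster_conf() {
  conf_dir="$1"; cluster_id="$2"; api_host="$3"; api_token="$4"
  mkdir -p "$conf_dir" || return 1
  conf_file="$conf_dir/$cluster_id.conf"
  # umask 077 makes a newly created file private to the current user.
  ( umask 077
    printf 'ARVADOS_API_HOST=%s\nARVADOS_API_TOKEN=%s\n' \
      "$api_host" "$api_token" > "$conf_file" )
}

# Example with placeholder values (the host name is an assumption and the
# token is not a real credential):
# write_cluster_conf "$HOME/.config/arvados" jutro jutro.arvadosapi.com 'PASTE_TOKEN_HERE'
```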
Now, on the shell node of your Arvados cluster, use `arv-copy` to copy the collection:
arv-copy --src jutro 9178fe1b80a08a422dbe02adfd439764+925
{: .language-bash }
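Collection identifiers are long and easy to truncate when pasting. As a guard, the sketch below checks that a string has the same shape as the identifiers used in this lesson: 32 lowercase hexadecimal digits, a `+`, and a decimal number. `looks_like_collection_hash` is a hypothetical helper, and it checks only the shape, not whether the collection exists.

```shell
# looks_like_collection_hash is a hypothetical helper. It accepts strings
# shaped like the identifiers in this lesson and rejects everything else.
looks_like_collection_hash() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{32}\+[0-9]+$'
}

# Example:
# looks_like_collection_hash 9178fe1b80a08a422dbe02adfd439764+925 && echo "shape ok"
```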
Running STAR requires index files generated from the reference.
The pre-built index is a rather large download (4 GB). Depending on your bandwidth, it may be faster to generate the index yourself.
As above, you can check by going to Workbench and pasting `02a12ce9e2707610991bd29d38796b57+2912` into the search box to see if you already have access to this collection.
{: .callout}
Use `arv-copy` to copy the collection:
arv-copy --src jutro 02a12ce9e2707610991bd29d38796b57+2912
{: .language-bash }
Create `chr1-star-index.yaml`:
InputFiles:
  - class: File
    location: keep:9178fe1b80a08a422dbe02adfd439764+925/reference_data/chr1.fa
    format: http://edamontology.org/format_1930
IndexName: 'hg19-chr1-STAR-index'
Gtf:
  class: File
  location: keep:9178fe1b80a08a422dbe02adfd439764+925/reference_data/chr1-hg19_genes.gtf
Overhang: 99
{: .language-yaml }
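Each `location:` here is a `keep:` reference made of the collection identifier followed by the path inside the collection. If you already have an inputs file that uses local paths, a small filter can rewrite it into this form. The sketch below is one possible approach: `to_keep_locations` is a hypothetical helper, and it assumes the local prefix contains no characters that are special in a `sed` pattern.

```shell
# to_keep_locations is a hypothetical helper. It rewrites "location:" values
# that start with LOCAL_PREFIX so they point into a Keep collection instead.
# Usage: to_keep_locations LOCAL_PREFIX COLLECTION < local.yaml > arvados.yaml
to_keep_locations() {
  local_prefix="$1"; collection="$2"
  sed "s|\(location:[[:space:]]*\)$local_prefix|\1keep:$collection/|"
}

# Example:
# to_keep_locations 'rnaseq/' '9178fe1b80a08a422dbe02adfd439764+925' \
#   < local-inputs.yaml > chr1-star-index.yaml
```

Lines without a matching `location:` value pass through unchanged, so the rest of the inputs file is preserved.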
Generate the index with arvados-cwl-runner.
arvados-cwl-runner bio-cwl-tools/STAR/STAR-Index.cwl chr1-star-index.yaml
{: .language-bash }
{% endcapture %}
{% include links.md %}