Commit dd19271

updates to run DITTO pipeline and instructions in readme
1 parent 6c26b86 commit dd19271

5 files changed: 29 additions and 20 deletions

.test_data/README

Lines changed: 4 additions & 1 deletion
@@ -1,7 +1,10 @@
+# Test Data Directory
+
 This directory has 3 files -
 
 `oc_test_data.vcf.gz` - test multi-sample VCF data from OpenCRAVAT
 
 `testing_variants_hg38.vcf.gz` - We custom made a test VCF file with few variants from every chromosome (1-22,X,Y)
 
-`file_list.txt` - contains list of above 2 test vcf files with relative path. This file is used to test nextflow pipeline
+`file_list.txt` - contains list of above 2 test vcf files with relative path. Please add the full paths to this file and
+test nextflow pipeline
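
The updated note asks users to put full paths into `file_list.txt`. A minimal sketch of one way to do that, assuming the sheet holds one path per line; the helper name `abs_paths` is hypothetical and not part of the DITTO repository:

```sh
# Hypothetical helper (not part of the DITTO repository).
# Prints each path listed in a sample sheet as an absolute path.
# Usage: abs_paths SHEET BASE_DIR
abs_paths() {
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip empty lines
    [ -z "$line" ] && continue
    case "$line" in
      # Already absolute: print unchanged
      /*) printf '%s\n' "$line" ;;
      # Relative: drop a leading "./" and prefix the base directory
      *)  printf '%s/%s\n' "${2%/}" "${line#./}" ;;
    esac
  done < "$1"
}
```

Run from the repository root, something like `abs_paths .test_data/file_list.txt "$PWD" > file_list.abs.txt` would write a rewritten copy that can be reviewed before replacing the original sheet.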

README.md

Lines changed: 17 additions & 11 deletions
@@ -64,7 +64,18 @@ To fetch DITTO source code, change in to directory of your choice and run:
 git clone https://github.com/uab-cgds-worthey/DITTO.git
 ```
 
-#### Setup OpenCravat (only one-time installation)
+#### Run DITTO pipeline on UAB Cheaha
+
+To run on UAB cheaha, please update the `model.job` and `.test_data/file_list.txt` files with complete file paths for all
+necessary files and tools and submit a slurm job using the command below
+
+```sh
+sbatch model.job
+```
+
+#### Run DITTO pipeline outside of UAB Cheaha
+
+***Setup OpenCravat (only one-time installation)***
 
 Please follow the steps mentioned in [install_openCravat.md](docs/install_openCravat.md).
 
@@ -75,7 +86,7 @@ Please follow the steps mentioned in [install_openCravat.md](docs/install_openCr
 <!-- markdown-link-check-enable -->
 > These will be ignored when running the pipeline.
 
-#### Run DITTO pipeline
+***Setup Nextflow***
 
 Create an environment via conda. Below is an example to install `nextflow`.
 
@@ -91,7 +102,8 @@ conda activate ditto-env
 conda install bioconda::nextflow
 ```
 
-Please make a samplesheet with VCF files (incl. path). Please make sure to edit the directory paths as needed and run
+Please make a samplesheet `.test_data/file_list.txt` with VCF files (incl. path).
+Please make sure to edit the directory paths as needed and run
 the pipeline as shown below.
 
 ```sh
@@ -103,24 +115,18 @@ nextflow run pipeline.nf \
 --sample_sheet .test_data/file_list
 ```
 
-To run on UAB cheaha, please update the `model.job` file and submit a slurm job using the command below
-
-```sh
-sbatch model.job
-```
-
 ## Reproducing the DITTO model
 
 Detailed instructions on reproducing the model is explained in [build_DITTO.md](docs/build_DITTO.md)
 
 ## Download DITTO DB (Precomputed scores)
 
-Precomputed scores for all possible SNVs and known Indels from gnomADv3.0 in main chromosomes in hg38 reference genome
+Precomputed scores for all possible SNVs and known Indels from gnomADv3.0 in main chromosomes in hg38 reference genome
 are available to download here - <https://s3.lts.rc.uab.edu/cgds-public/dittodb/dittodb.html>
 
 ## How to cite?
 <!-- markdown-link-check-disable -->
-Mamidi, T.K.K.; Wilk, B.M.; Gajapathy, M.; Worthey, E.A. DITTO: An Explainable Machine-Learning Model for
+Mamidi, T.K.K.; Wilk, B.M.; Gajapathy, M.; Worthey, E.A. DITTO: An Explainable Machine-Learning Model for
 Transcript-Specific Variant Pathogenicity Prediction. Preprints 2024, 2024040837. <https://doi.org/10.20944/preprints202404.0837.v1>
 <!-- markdown-link-check-enable -->
 ## Contact information

cheaha.config

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 conda {
 enabled = true
-cacheDir = '/nextflow/nextflow-conda-env-cache/'
+cacheDir = '/data/project/worthey_lab/tools/nextflow/nextflow-conda-env-cache/'
 }
 
 // Define the Scratch directory

model.job

Lines changed: 5 additions & 5 deletions
@@ -5,7 +5,7 @@
 #
 # Number of tasks needed for this job. Generally, used with MPI jobs
 #SBATCH --ntasks=1
-#SBATCH --partition=amd-hdr100-res
+#SBATCH --partition=amd-hdr100
 #SBATCH --time=06:00:00
 #
 # Number of CPUs allocated to each task.
@@ -23,10 +23,10 @@ module load Java/13.0.2
 module load Anaconda3
 #conda activate nextflow
 
-#Modify paths and run the pipeline here
-/data/project/worthey_lab/tools/nextflow/nextflow-22.10.7/nextflow run ../pipeline.nf \
---outdir /data/results \
--work-dir .work_dir/ \
+#Modify paths to include full paths and run the pipeline here
+/data/project/worthey_lab/tools/nextflow/nextflow-22.10.7/nextflow run pipeline.nf \
+--outdir /data \
+-work-dir $USER_SCRATCH \
 --build hg38 -c cheaha.config -with-report \
 --sample_sheet .test_data/file_list.txt -resume
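
The job script now passes `$USER_SCRATCH` unquoted as the work directory, so an unset or empty variable would leave `-work-dir` without a value. A minimal sketch of a pre-flight check that could run before the `nextflow` command, assuming the cluster environment defines `USER_SCRATCH` as the script implies; the helper name `require_dir` is hypothetical:

```sh
# Hypothetical pre-flight check (not part of model.job).
# Succeeds only if the argument is non-empty and names an existing directory.
# Usage: require_dir PATH
require_dir() {
  if [ -z "$1" ] || [ ! -d "$1" ]; then
    echo "missing directory: '$1'" >&2
    return 1
  fi
}
```

In a job script this might be used as `require_dir "${USER_SCRATCH:-}" || exit 1`, so the job stops early instead of starting Nextflow with an incomplete command line.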

pipeline.nf

Lines changed: 2 additions & 2 deletions
@@ -30,7 +30,7 @@ log.info """\
 process runOC {
 
 // Define the conda environment file to be used
-conda '../configs/envs/open-cravat.yaml'
+conda './configs/envs/open-cravat.yaml'
 
 input:
 path var_ch
@@ -72,7 +72,7 @@ process parseAnnotation {
 process prediction {
 
 // Define the conda environment file to be used
-conda '../configs/envs/ditto-nf.yaml'
+conda './configs/envs/ditto-nf.yaml'
 
 input:
 path var_parse_ch
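
Both conda environment paths in `pipeline.nf` changed from `../configs/...` to `./configs/...`. A minimal sketch for confirming that the two YAML files are reachable from the launch directory, assuming the relative paths are resolved against the directory the pipeline is started from (an assumption about this setup, not something the commit states); the helper name `check_files` is hypothetical:

```sh
# Hypothetical check (not part of the DITTO repository).
# Reports every listed file that does not exist; fails if any is missing.
# Usage: check_files FILE...
check_files() {
  missing=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "not found: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

From the repository root, `check_files configs/envs/open-cravat.yaml configs/envs/ditto-nf.yaml` would return non-zero if either environment file cannot be found there.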
