@@ -14,6 +14,37 @@ Markdown](https://github.com/uab-cgds-worthey/DITTO/actions/workflows/linting.ym
 DITTO is an explainable neural network that can be helpful for accurate and rapid interpretation of small
 genetic variants for pathogenicity using patient’s genotype (VCF) information.

+## Getting Started
+
+- [Prerequisites](#prerequisites)
+- [Using DITTO](#using-ditto)
+- [Webapp](#webapp)
+- [API](#api)
+- [Prediction](#prediction)
+
+## Prerequisites
+
+The following prerequisites must be installed in the target environment to deploy and run the DITTO
+prediction model.
+
+### Tools
+
+- [Python 3.10](https://www.python.org/) - [install](https://www.python.org/downloads/)
+  - The specified OpenCravat version requires Python 3.10.
+- [Anaconda3 25.7+](https://anaconda.com/) - [install](https://www.anaconda.com/docs/getting-started/anaconda/install)
+- [OpenCravat 2.4.1](https://www.opencravat.org/) - [install](https://github.com/KarchinLab/open-cravat/releases/tag/2.4.1)
+- [Git](https://git-scm.com/)
+  - Set up Git with your preferred client. See this [GitHub Guide](https://github.com/git-guides/install-git)
+    for different platforms.
+- [Nextflow 22.10.7+](https://www.nextflow.io/) - [install](https://www.nextflow.io/docs/latest/install.html)
+
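Before moving on, it can help to confirm that these tools are actually on your `PATH`. The snippet below is a minimal sketch, not part of DITTO: `check_tools` is a hypothetical helper name, it assumes a POSIX-compatible shell, and it only checks that each command exists (plus the Python 3.10 requirement), not that the minimum Anaconda or Nextflow versions listed above are met.

```sh
# Hypothetical helper (not part of DITTO): report required CLIs missing from PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeeds only when every tool was found.
    [ "$missing" -eq 0 ]
}

# `oc` is the OpenCravat command-line tool.
check_tools python3 conda oc git nextflow || echo "Install the missing tools before continuing."

# OpenCravat 2.4.1 expects Python 3.10 (sys.exit(True) exits with status 1).
if command -v python3 >/dev/null 2>&1 &&
    ! python3 -c 'import sys; sys.exit(sys.version_info[:2] != (3, 10))'; then
    echo "WARNING: python3 is not Python 3.10."
fi
```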
+### System Requirements
+
+- CPU: >2 cores
+- RAM: ~25GB for a WGS VCF sample
+- Storage: 1TB
+  - Most of this storage hosts the OpenCravat annotators; storing all annotators requires ~600GB.
+
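A rough way to compare a machine against these numbers is sketched below. It is an illustration only: `check_resources` is a hypothetical helper, it reads memory from `/proc/meminfo` on Linux or `sysctl hw.memsize` on macOS, and it reports sizes in GiB (rounded down), so the thresholds only approximate the figures above.

```sh
# Hypothetical helper (not part of DITTO): compare this machine to the stated requirements.
# Usage: check_resources <dir that will hold the OpenCravat modules>
check_resources() {
    dir=${1:-.}
    cpus=$(getconf _NPROCESSORS_ONLN)
    if [ -r /proc/meminfo ]; then
        # Linux: MemTotal is reported in kB.
        ram_gib=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
    else
        # macOS: hw.memsize is reported in bytes.
        ram_gib=$(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
    fi
    # df -Pk: POSIX output in 1024-byte blocks; column 4 is the available space.
    free_gib=$(df -Pk "$dir" | awk 'NR == 2 {print int($4 / 1024 / 1024)}')

    echo "CPU cores: $cpus (need >2)"
    echo "RAM: ${ram_gib} GiB (need ~25GB for a WGS VCF sample)"
    echo "Free storage under $dir: ${free_gib} GiB (need 1TB)"

    # 25 GB is about 23 GiB; 1 TB is about 931 GiB.
    [ "$cpus" -gt 2 ] && [ "$ram_gib" -ge 23 ] && [ "$free_gib" -ge 931 ]
}

check_resources . || echo "WARNING: this machine may not meet DITTO's requirements."
```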
 ## Using DITTO

 DITTO scores for variants can be obtained by the below 3 ways. Webapp and API are for single variant analysis and the
@@ -30,52 +61,26 @@ DITTO is available for public use at this [website](https://cgds-ditto.streamlit
 DITTO is not hosted as a public API but one can serve up locally to query DITTO scores. Please follow the instructions
 in this [GitHub repo](https://github.com/uab-cgds-worthey/DITTO-API).

-### Setting up to use locally
-
-> ***NOTE:*** This setup will allow one to annotate a VCF sample and make DITTO predictions. Currently tested only in
-> Cheaha (UAB HPC) because of resource limitations to download datasets from OpenCRAVAT.
-> Docker versions may need to be explored later to make it useable in Mac and Windows.
-
-#### System Requirements
-
-*Tools:*
-
-- Anaconda3
-- OpenCravat-2.4.1
-- Git
-
-*Resources:*
-
-- CPU: > 2
-- Storage: ~1TB
-- RAM: ~25GB for a WGS VCF sample
+### Prediction

 #### Installation

-Requirements:
-
-- DITTO repo from GitHub
-- OpenCravat with databases to annotate
-- Nextflow >=22.10.7
-
 To fetch DITTO source code, change in to directory of your choice and run:

 ```sh
 git clone https://github.com/uab-cgds-worthey/DITTO.git
+cd DITTO
 ```

-#### Run DITTO pipeline on UAB Cheaha
+### Local Prediction

-To run on UAB cheaha, please update the `model.job` (outdir and samplesheet) and `.test_data/file_list.txt` (inout vcfs)
-files with complete file paths and submit a slurm job using the command below
-
-```sh
-sbatch model.job
-```
+> ***NOTE:*** This setup lets you annotate a VCF sample and make DITTO predictions. It has currently been tested only on
+> Cheaha (UAB HPC) because of the resources required to download the OpenCRAVAT datasets.
+> Docker versions may be explored later to make it usable on Mac and Windows.

-#### Run DITTO pipeline outside of UAB Cheaha
+#### Setup Steps

-***Setup OpenCravat (only one-time installation)***
+1. ***Set up OpenCravat (one-time installation)***

 Please follow the steps mentioned in [install_openCravat.md](docs/install_openCravat.md).

@@ -86,7 +91,7 @@ Please follow the steps mentioned in [install_openCravat.md](docs/install_openCr
 <!-- markdown-link-check-enable -->
 > These will be ignored when running the pipeline.

-***Setup Nextflow***
+2. ***Set up Nextflow***

 Create an environment via conda. Below is an example to install `nextflow`.

@@ -102,9 +107,23 @@ conda activate ditto-env
 conda install bioconda::nextflow=22.10 conda-forge::conda=23.1
 ```

+3. ***Sample Sheet***
+
 Please make a samplesheet `.test_data/file_list.txt` with VCF files (incl. path).
-Please make sure to edit the directory paths as needed and run
-the pipeline as shown below.
+
+Example `file_list.txt`:
+```bash
+# Example paths use a macOS home folder
+
+/Users/<username>/Workspace/DITTO/.test_data/oc_test_data.vcf.gz
+/Users/<username>/Workspace/DITTO/.test_data/testing_variants_hg38.vcf.gz
+```
+
+DITTO will then run predictions for both VCF files listed in `file_list.txt`.
+
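If your VCFs already sit in one directory, the sample sheet can also be generated instead of typed by hand. A minimal sketch, assuming the inputs are `*.vcf.gz` files directly inside that directory; `make_sample_sheet` is a hypothetical helper, not part of DITTO. It writes one absolute path per line and no comment lines.

```sh
# Hypothetical helper (not part of DITTO): write one absolute VCF path per line.
# Usage: make_sample_sheet <vcf_dir> <output_file>
make_sample_sheet() {
    vcf_dir=$1
    out=$2
    # Resolve to an absolute path so the pipeline receives complete file paths.
    abs_dir=$(cd "$vcf_dir" && pwd -P) || return 1
    # Only look at files directly inside vcf_dir; sort for a stable order.
    find "$abs_dir" -maxdepth 1 -type f -name '*.vcf.gz' | sort > "$out"
}

# Example, run from the DITTO checkout:
#   make_sample_sheet .test_data .test_data/file_list.txt
```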
+4. ***Run the Nextflow pipeline***
+
+Please make sure to edit the directory paths as needed and run the pipeline as shown below.

 ```sh
 # Note: NextFlow work directory is defined as `-work-dir` in the run command parameters
@@ -113,10 +132,27 @@ nextflow run pipeline.nf \
     -work-dir ./work_dir \
     --outdir ./data/ \
     --build hg38 -with-report \
-    --oc_modules /data/opencravat/modules \
+    --oc_modules /<path-to>/opencravat/modules \
     --sample_sheet .test_data/file_list.txt
 ```

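A failed run late in the pipeline is expensive, so it may be worth checking that every path in the sample sheet exists before launching the command above. The helper below is a hypothetical sketch, not part of DITTO; it skips blank lines and `#` comment lines, which is an assumption about what the sheet may contain rather than documented pipeline behavior.

```sh
# Hypothetical helper (not part of DITTO): report sample-sheet entries that are not readable files.
# Usage: check_sample_sheet <sample_sheet>
check_sample_sheet() {
    sheet=$1
    missing=0
    # The `|| [ -n "$path" ]` also handles a last line without a trailing newline.
    while IFS= read -r path || [ -n "$path" ]; do
        case "$path" in
            ''|'#'*) continue ;;  # skip blank and comment lines (assumption)
        esac
        if [ ! -r "$path" ]; then
            echo "Missing or unreadable: $path" >&2
            missing=$((missing + 1))
        fi
    done < "$sheet"
    [ "$missing" -eq 0 ]
}

# Example, run from the DITTO checkout:
#   check_sample_sheet .test_data/file_list.txt && nextflow run pipeline.nf ...
```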
+### HPC Prediction with Cheaha
+
+To run on UAB Cheaha, first follow the [Installation](#installation) step to clone the DITTO repository into a Cheaha directory.
+
+1. Update `.test_data/file_list.txt` (input VCFs) with complete file paths, for example:
+
+```bash
+/home/<username>/Workspace/DITTO/.test_data/oc_test_data.vcf.gz
+/home/<username>/Workspace/DITTO/.test_data/testing_variants_hg38.vcf.gz
+```
+
+2. Update `model.job` (outdir and samplesheet) and submit a Slurm job using the command below:
+
+```sh
+sbatch model.job
+```
+
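On Cheaha you may also want a script to block until the job finishes. A minimal sketch, assuming a standard Slurm setup where `sbatch --parsable` prints only the job ID and `squeue -h -j <id> -t <states>` prints nothing once the job is no longer pending or running; `submit_and_wait` is a hypothetical helper, not part of DITTO, and should be run from the DITTO directory that contains `model.job`.

```sh
# Hypothetical helper (not part of DITTO): submit a Slurm job and wait for it to leave the queue.
# Usage: submit_and_wait [job_script] [poll_seconds]
submit_and_wait() {
    job_script=${1:-model.job}
    poll_seconds=${2:-60}
    # --parsable prints "<jobid>" or "<jobid>;<cluster>".
    job_id=$(sbatch --parsable "$job_script") || return 1
    job_id=${job_id%%;*}
    echo "Submitted job $job_id"
    # -h drops the header, so empty output means the job is no longer in an active state.
    # Note: a transient squeue error also yields empty output and ends the wait early.
    while [ -n "$(squeue -h -j "$job_id" -t PENDING,CONFIGURING,RUNNING,SUSPENDED,COMPLETING 2>/dev/null)" ]; do
        sleep "$poll_seconds"
    done
    echo "Job $job_id left the queue; check its Slurm log and the outdir set in $job_script."
}

# Example, from the DITTO directory on Cheaha:
#   submit_and_wait model.job 300
```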
 ## Reproducing the DITTO model

 Detailed instructions on reproducing the model is explained in [build_DITTO.md](docs/build_DITTO.md)
@@ -138,7 +174,7 @@ For queries, please open a GitHub issue.

 For urgent queries, send an email with clear description to

-| Name | Email |
-| ------| --------|
-| Tarun Mamidi | <[email protected]> |
-| Liz Worthey | <[email protected]> |
+| Name           | Email                |
+| -------------- | -------------------- |
+| Tarun Mamidi   | <[email protected]> |
+| Liz Worthey    | <[email protected]> |