Commit 774e32d

committed
Updating the README.md
1 parent 9f41cc1 commit 774e32d

1 file changed

+78
-42
lines changed

README.md

Lines changed: 78 additions & 42 deletions
@@ -14,6 +14,37 @@ Markdown](https://github.com/uab-cgds-worthey/DITTO/actions/workflows/linting.ym
1414
DITTO is an explainable neural network that can be helpful for accurate and rapid interpretation of small
1515
genetic variants for pathogenicity using patient’s genotype (VCF) information.
1616

17+
## Getting Started
18+
19+
- [Prerequisites](#prerequisites)
20+
- [Using DITTO](#using-ditto)
21+
- [Webapp](#webapp)
22+
- [API](#api)
23+
- [Prediction](#prediction)
24+
25+
## Prerequisites
26+
27+
The following prerequisites must be installed in the target environment to deploy and run the DITTO
28+
prediction model.
29+
30+
### Tools
31+
32+
- [Python 3.10](https://www.python.org/) - [Install](https://www.python.org/downloads/)
33+
- The specified OpenCravat version requires Python 3.10
34+
- [Anaconda3 25.7+](https://anaconda.com/) - [install](https://www.anaconda.com/docs/getting-started/anaconda/install)
35+
- [OpenCravat 2.4.1](https://www.opencravat.org/) - [install](https://github.com/KarchinLab/open-cravat/releases/tag/2.4.1)
36+
- [Git](https://git-scm.com/)
37+
- Setup with your favorite git client. Here is a [GitHub Guide](https://github.com/git-guides/install-git)
38+
for different platforms.
39+
- [Nextflow 22.10.7+](https://www.nextflow.io/) - [install](https://www.nextflow.io/docs/latest/install.html)
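
A quick way to confirm the tools listed above are on `PATH` is sketched below. This is a hypothetical helper, not part of the DITTO repository; the command names (`python3`, `conda`, `oc` for the OpenCravat CLI, `git`, `nextflow`) are assumptions about how each tool is exposed in your shell, and `version_ge` relies on `sort -V` being available.

```sh
#!/bin/sh
# Hypothetical helper, not part of the DITTO repository.

# Print every command name from the arguments that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Succeed when version $1 is at least version $2 (relies on `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed command names; adjust them to match your installation.
missing=$(missing_tools python3 conda oc git nextflow)
if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing"
fi
```

Once you know an installed version string, `version_ge "<installed-version>" "22.10.7"` can be used to compare it against a minimum such as the Nextflow requirement above.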
40+
41+
### System Requirements
42+
43+
- CPU: >2
44+
- RAM: ~25GB for a WGS VCF sample
45+
- Storage: 1TB
46+
- Most of this storage is for hosting the OpenCravat annotators; storing all annotators requires ~600GB of data
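
The CPU and storage figures above can be checked roughly from a shell. The sketch below is a hypothetical helper, not part of DITTO; it assumes either `getconf` or `nproc` is available and that `df -Pk` reports available space in the fourth column, as POSIX specifies. Memory is not checked because the commands for that differ between platforms.

```sh
# Hypothetical helpers, not part of the DITTO repository.

# Print the number of online CPUs.
cpu_count() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || nproc
}

# Print free space, in whole GB, on the filesystem holding path $1.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

cpus=$(cpu_count)
if [ "$cpus" -le 2 ]; then
    echo "Warning: $cpus CPU(s) found; more than 2 are recommended"
fi
```

For example, `free_gb /<path-to>/opencravat` reports how much room is left on the filesystem where the annotators will be stored.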
47+
1748
## Using DITTO
1849

1950
DITTO scores for variants can be obtained in the three ways below. Webapp and API are for single variant analysis and the
@@ -30,52 +61,26 @@ DITTO is available for public use at this [website](https://cgds-ditto.streamlit
3061
DITTO is not hosted as a public API, but it can be served locally to query DITTO scores. Please follow the instructions
3162
in this [GitHub repo](https://github.com/uab-cgds-worthey/DITTO-API).
3263

33-
### Setting up to use locally
34-
35-
> ***NOTE:*** This setup will allow one to annotate a VCF sample and make DITTO predictions. Currently tested only in
36-
> Cheaha (UAB HPC) because of resource limitations to download datasets from OpenCRAVAT.
37-
> Docker versions may need to be explored later to make it useable in Mac and Windows.
38-
39-
#### System Requirements
40-
41-
*Tools:*
42-
43-
- Anaconda3
44-
- OpenCravat-2.4.1
45-
- Git
46-
47-
*Resources:*
48-
49-
- CPU: > 2
50-
- Storage: ~1TB
51-
- RAM: ~25GB for a WGS VCF sample
64+
### Prediction
5265

5366
#### Installation
5467

55-
Requirements:
56-
57-
- DITTO repo from GitHub
58-
- OpenCravat with databases to annotate
59-
- Nextflow >=22.10.7
60-
6168
To fetch the DITTO source code, change into a directory of your choice and run:
6269

6370
```sh
6471
git clone https://github.com/uab-cgds-worthey/DITTO.git
72+
cd DITTO
6573
```
6674

67-
#### Run DITTO pipeline on UAB Cheaha
75+
### Local Prediction
6876

69-
To run on UAB Cheaha, please update the `model.job` (outdir and samplesheet) and `.test_data/file_list.txt` (input VCFs)
70-
files with complete file paths and submit a slurm job using the command below
71-
72-
```sh
73-
sbatch model.job
74-
```
77+
> ***NOTE:*** This setup will allow one to annotate a VCF sample and make DITTO predictions. Currently tested only in
78+
> Cheaha (UAB HPC) because of resource limitations to download datasets from OpenCRAVAT.
79+
> Docker versions may need to be explored later to make it useable in Mac and Windows.
7580
76-
#### Run DITTO pipeline outside of UAB Cheaha
81+
#### Setup Steps
7782

78-
***Setup OpenCravat (only one-time installation)***
83+
1. ***Set up OpenCravat (only one-time installation)***
7984

8085
Please follow the steps mentioned in [install_openCravat.md](docs/install_openCravat.md).
8186

@@ -86,7 +91,7 @@ Please follow the steps mentioned in [install_openCravat.md](docs/install_openCr
8691
<!-- markdown-link-check-enable -->
8792
> These will be ignored when running the pipeline.
8893
89-
***Setup Nextflow***
94+
2. ***Set up Nextflow***
9095

9196
Create an environment via conda. Below is an example to install `nextflow`.
9297

@@ -102,9 +107,23 @@ conda activate ditto-env
102107
conda install bioconda::nextflow=22.10 conda-forge::conda=23.1
103108
```
104109

110+
3. ***Sample Sheet***
111+
105112
Please create a sample sheet `.test_data/file_list.txt` that lists the VCF files, one per line, including their full paths.
106-
Please make sure to edit the directory paths as needed and run
107-
the pipeline as shown below.
113+
114+
Example `file_list.txt`:
115+
```bash
116+
# Example uses a macOS home folder
117+
118+
/Users/<username>/Workspace/DITTO/.test_data/oc_test_data.vcf.gz
119+
/Users/<username>/Workspace/DITTO/.test_data/testing_variants_hg38.vcf.gz
120+
```
121+
122+
This will run DITTO prediction for both VCF files listed in `file_list.txt`.
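
The sample sheet can also be generated and checked from the shell. The helpers below are a hypothetical sketch, not part of the DITTO pipeline: `list_vcfs` prints absolute paths for the `*.vcf.gz` files in a directory, and `missing_entries` reports sheet entries that do not exist on disk. Skipping blank lines and `#` comment lines is an assumption made for this check only; it does not describe how the pipeline itself parses the file.

```sh
# Hypothetical helpers, not part of the DITTO repository.

# Print the absolute path of every *.vcf.gz file directly inside directory $1.
list_vcfs() {
    dir=$(cd "$1" && pwd) || return 1
    for f in "$dir"/*.vcf.gz; do
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Print each sample-sheet entry in $1 that is not an existing file.
# Assumption: blank lines and lines starting with `#` are not entries.
missing_entries() {
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) continue ;;
        esac
        [ -f "$line" ] || printf '%s\n' "$line"
    done < "$1"
}
```

For example, `list_vcfs .test_data > .test_data/file_list.txt` writes the sheet, and `missing_entries .test_data/file_list.txt` should then print nothing if every listed path exists.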
123+
124+
4. ***Run the NextFlow pipeline***
125+
126+
Please make sure to edit the directory paths as needed and run the pipeline as shown below.
108127

109128
```sh
110129
# Note: NextFlow work directory is defined as `-work-dir` in the run command parameters
@@ -113,10 +132,27 @@ nextflow run pipeline.nf \
113132
-work-dir ./work_dir \
114133
--outdir ./data/ \
115134
--build hg38 -with-report \
116-
--oc_modules /data/opencravat/modules \
135+
--oc_modules /<path-to>/opencravat/modules \
117136
--sample_sheet .test_data/file_list.txt
118137
```
119138

139+
### HPC Prediction with Cheaha
140+
141+
To run on UAB Cheaha, see the [installation](#installation) step to clone the DITTO repository into a Cheaha directory.
142+
143+
1. Update the `.test_data/file_list.txt` file (input VCFs) with complete file paths, as in the example below.
144+
145+
```bash
146+
/home/<username>/Workspace/DITTO/.test_data/oc_test_data.vcf.gz
147+
/home/<username>/Workspace/DITTO/.test_data/testing_variants_hg38.vcf.gz
148+
```
149+
150+
2. Update `model.job` (outdir and samplesheet) and submit a Slurm job using the command below
151+
152+
```sh
153+
sbatch model.job
154+
```
155+
120156
## Reproducing the DITTO model
121157

122158
Detailed instructions on reproducing the model are provided in [build_DITTO.md](docs/build_DITTO.md)
@@ -138,7 +174,7 @@ For queries, please open a GitHub issue.
138174

139175
For urgent queries, send an email with a clear description to
140176

141-
|Name | Email |
142-
|------|--------|
143-
|Tarun Mamidi | <[email protected]>|
144-
|Liz Worthey | <[email protected]>|
177+
| Name | Email |
178+
|--------------|--------------------|
179+
| Tarun Mamidi | <[email protected]> |
180+
| Liz Worthey | <[email protected]> |
