DITTO Clean Up and Local Prediction instructions #22
base: main
Changes from 15 commits
706751c
23f59d4
82168db
92bc76e
0deeada
9f41cc1
774e32d
dbf095c
2bf24f3
a432c54
471d0ba
5d89a29
db5d1ef
1c5bc9d
9b7a691
54ec4ec
f9d8c03
305940e
6cd3306
4104446
3122d20
b585a5b
8692431
b9c4bd1

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -1,2 +1,2 @@` |
| | | .test_data/oc_test_data.vcf.gz |
| | | .test_data/testing_variants_hg38.vcf.gz |
| | | .test_data/testing_variants_hg38.vcf.gz |
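
The entries above are relative paths to the test VCFs. Before pointing the pipeline at a list like this, a quick existence and integrity check can save a failed run. This is a minimal sketch, assuming a plain text file with one VCF path per line and relative entries resolved against the current directory; `check_sample_sheet` is a hypothetical helper, not part of DITTO.

```shell
# Hypothetical helper (not part of DITTO): verify that every VCF listed in a
# sample sheet exists and is a valid gzip/bgzip stream.
# Usage: check_sample_sheet path/to/file_list.txt
check_sample_sheet() {
  sheet="$1"
  status=0
  # Read one path per line; `|| [ -n "$vcf" ]` also keeps a final line
  # that has no trailing newline.
  while IFS= read -r vcf || [ -n "$vcf" ]; do
    [ -z "$vcf" ] && continue              # skip blank lines
    if [ ! -f "$vcf" ]; then
      echo "missing: $vcf" >&2
      status=1
    elif ! gzip -t "$vcf" 2>/dev/null; then
      # bgzip output is multi-member gzip, so `gzip -t` can test it
      echo "not a valid gzip file: $vcf" >&2
      status=1
    fi
  done < "$sheet"
  return "$status"
}
```

Running it from the repository root, for example `check_sample_sheet .test_data/file_list.txt`, reports every problem at once instead of stopping at the first.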

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -14,6 +14,50 @@ Markdown](https://github.com/uab-cgds-worthey/DITTO/actions/workflows/linting.ym` |
| | | DITTO is an explainable neural network that can be helpful for accurate and rapid interpretation of small |
| | | genetic variants for pathogenicity using patient’s genotype (VCF) information. |
| | | |
| | | ## Getting Started |
| | | |
| | | - [Prerequisites](#prerequisites) |
| | | - [Using DITTO](#using-ditto) |
| | | - [Webapp](#webapp) |
| | | - [API](#api) |
| | | - [Prediction](#prediction) |
| | | - [Local Prediction](#local-prediction) |
| | | - [HPC Prediction with Cheaha](#hpc-prediction-with-cheaha) |
| | | - [Reproducing the DITTO model](#reproducing-the-ditto-model) |
| | | - [Download DITTO DB (Precomputed scores)](#download-ditto-db-precomputed-scores) |
| | | - [How to cite?](#how-to-cite) |
| | | - [Contact](#contact-information) |
| | | |
| | | ## Prerequisites |
| | | |
| | | The following prerequisites are required to be installed in the target environment for deploying and running DITTO |
| | | prediction model. |
| | | |
| | | ### Tools |
| | | |
| | | - [Python 3.10](https://www.python.org/) - [Install](https://www.python.org/downloads/) |
| | | - The specified OpenCravat version requires Python 3.10 |
| | | - [Anaconda3 25.7+](https://anaconda.com/) - [install](https://www.anaconda.com/docs/getting-started/anaconda/install) |
| | | - [OpenCravat 2.4.1](https://www.opencravat.org/) - [install](https://github.com/KarchinLab/open-cravat/releases/tag/2.4.1) |
| | | - [Git](https://git-scm.com/) |
| | | - Setup with your favorite git client. Here is a [GitHub Guide](https://github.com/git-guides/install-git) |
| | | for different platforms. |
| | | - [Nextflow 22.10.7+](https://www.nextflow.io/) - [install](https://www.nextflow.io/docs/latest/install.html) |
| | | |
| | | > ***NOTE:*** Current version of OpenCravat that we're using doesn't support "Spanning or overlapping deletions" |
| | | > variants i.e. variants with `*` in `ALT Allele` column. More on these variants |
| | | `<!-- markdown-link-check-disable -->` |
| | | > [here](https://gatk.broadinstitute.org/hc/en-us/articles/360035531912-Spanning-or-overlapping-deletions-allele). |
| | | `<!-- markdown-link-check-enable -->` |
| | | > These will be ignored when running the pipeline. |
| | | |
| | | ### System Requirements |
| | | |
| | | - CPU: >2 |
| | | - RAM: ~25GB for a WGS VCF sample |
| | | - Storage: 1TB |
| | | - The storage requirements are for hosting the OpenCravat annotators ~600GB of data required to store all annotators |
| | | |
| | | ## Using DITTO |
| | | |
| | | DITTO scores for variants can be obtained by the below 3 ways. Webapp and API are for single variant analysis and the |
| | | `@@ -30,89 +74,90 @@ DITTO is available for public use at this [website](https://cgds-ditto.streamlit` |
| | | DITTO is not hosted as a public API but one can serve up locally to query DITTO scores. Please follow the instructions |
| | | in this [GitHub repo](https://github.com/uab-cgds-worthey/DITTO-API). |
| | | |
| | | ### Setting up to use locally |
| | | ### Prediction |
| | | |
| | | #### Installation |
| | | |
| | | To fetch DITTO source code, change into a directory of your choice and run: |
| | | |
| | | ```sh |
| | | git clone https://github.com/uab-cgds-worthey/DITTO.git |
| | | cd DITTO |
| | | ``` |
| | | |
| | | ### Local Prediction |
| | | |
| | | > ***NOTE:*** This setup will allow one to annotate a VCF sample and make DITTO predictions. Currently tested only in |
| | | > Cheaha (UAB HPC) because of resource limitations to download datasets from OpenCRAVAT. |
| | | > Docker versions may need to be explored later to make it usable in Mac and Windows. |
| | | |
| | | #### System Requirements |
| | | #### Setup Steps |
| | | |
| | | *Tools:* |
| | | - ***Setup OpenCravat (only one-time installation)*** |
| | | |
| | | - Anaconda3 |
| | | - OpenCravat-2.4.1 |
| | | - Git |
| | | Please follow the steps mentioned in [install_openCravat.md](docs/install_openCravat.md). |
| | | |
| | | *Resources:* |
| | | - ***Setup Nextflow*** |
| | | |
| | | - CPU: > 2 |
| | | - Storage: ~1TB |
| | | - RAM: ~25GB for a WGS VCF sample |
| | | Create an environment via conda. Below is an example to install `nextflow`. |
| | | |
| | | ```sh |
| | | # create environment. Needed only the first time. Please use the above link if you're not using Mac. |
| | | conda create --name ditto-env |
| | | |
| | | #### Installation |
| | | |
| | | Requirements: |
| | | conda activate ditto-env |
| | | |
| | | - DITTO repo from GitHub |
| | | - OpenCravat with databases to annotate |
| | | - Nextflow >=22.10.7 |
| | | # Install nextflow |
| | | conda install bioconda::nextflow=22.10 conda-forge::conda=23.1 |
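
The Tools list above sets minimum versions (for example Nextflow 22.10.7+), and the System Requirements list asks for more than 2 CPUs. A small preflight sketch for both; `version_ge` and `cpu_ok` are hypothetical helpers, and the version check assumes a `sort` that supports `-V` (GNU coreutils or a recent BSD `sort`).

```shell
# Hypothetical preflight helpers (not part of DITTO).

# version_ge ACTUAL REQUIRED: succeed when ACTUAL >= REQUIRED.
# Both strings are version-sorted; if REQUIRED sorts first (or the two are
# equal), ACTUAL is at least REQUIRED.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# cpu_ok MIN: succeed when the number of online CPUs is greater than MIN.
cpu_ok() {
  [ "$(getconf _NPROCESSORS_ONLN)" -gt "$1" ]
}
```

For example, `version_ge "$(nextflow -v | awk '{print $3}')" 22.10.7` assumes `nextflow -v` prints a line of the form `nextflow version <x>`, so confirm that on your install first. `cpu_ok 2` mirrors the ">2" requirement. Because the Tools list says the pinned OpenCravat needs Python 3.10 specifically, an exact match on the interpreter version is safer there than a "greater or equal" check.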

sdhutchins marked this conversation as resolved.

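
The NOTE above says OpenCravat 2.4.1 does not support spanning or overlapping deletions (`*` in the ALT column) and that such variants are ignored. Counting them up front shows how many records a run will skip. This is a sketch assuming a standard tab-delimited VCF (ALT is column 5) compressed with gzip or bgzip; `count_spanning_deletions` is a hypothetical helper.

```shell
# Hypothetical helper (not part of DITTO).
# count_spanning_deletions FILE.vcf.gz: print how many data records carry a
# `*` allele in ALT, either alone or inside a multi-allelic list such as "A,*".
count_spanning_deletions() {
  gzip -dc "$1" | awk -F '\t' '
    /^#/ { next }                    # skip meta-information and header lines
    $5 ~ /(^|,)\*(,|$)/ { n++ }      # `*` as a whole ALT allele
    END { print n + 0 }              # print 0 when nothing matched
  '
}
```

The regex only matches `*` as a complete allele, so symbolic alleles such as `<*>` (used by some callers for a non-reference placeholder) are not counted.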

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | ``` |
| | | |
| | | To fetch DITTO source code, change into a directory of your choice and run: |
| | | |
| | | ```sh |
| | | git clone https://github.com/uab-cgds-worthey/DITTO.git |
| | | ``` |
| | | - ***Sample Sheet*** |
| | | |
| | | #### Run DITTO pipeline on UAB Cheaha |
| | | Please make a samplesheet `.test_data/file_list.txt` with VCF files (incl. path). |
| | | |
| | | To run on UAB Cheaha, please update the `model.job` (outdir and samplesheet) and `.test_data/file_list.txt` (input vcfs) |
| | | files with complete file paths and submit a slurm job using the command below |
| | | Example `file_list.txt`: |
| | | |
| | | ```sh |
| | | sbatch model.job |
| | | ``` |
| | | ```bash |
| | | # Example is using MacOS home folder |
| | | |
| | | #### Run DITTO pipeline outside of UAB Cheaha |
| | | `/Users/<username>/Workspace/DITTO/.test_data/oc_test_data.vcf.gz` |
| | | `/Users/<username>/Workspace/DITTO/.test_data/testing_variants_hg38.vcf.gz` |
| | | ``` |
| | | |
| | | ***Setup OpenCravat (only one-time installation)*** |
| | | This will run DITTO prediction for both vcf files in the `file_list.txt`. |
| | | |
| | | Please follow the steps mentioned in [install_openCravat.md](docs/install_openCravat.md). |
| | | - ***Run the NextFlow pipeline*** |
| | | |
| | | > ***NOTE:*** Current version of OpenCravat that we're using doesn't support "Spanning or overlapping deletions" |
| | | > variants i.e. variants with `*` in `ALT Allele` column. More on these variants |
| | | `<!-- markdown-link-check-disable -->` |
| | | > [here](https://gatk.broadinstitute.org/hc/en-us/articles/360035531912-Spanning-or-overlapping-deletions-allele). |
| | | `<!-- markdown-link-check-enable -->` |
| | | > These will be ignored when running the pipeline. |
| | | Please make sure to edit the directory paths as needed and run the pipeline as shown below. |
| | | |
| | | ***Setup Nextflow*** |
| | | ```sh |
| | | # Note: NextFlow work directory is defined as `-work-dir` in the run command parameters |
| | | # Note: `--output` cannot be relative, set a path nextflow can access. ex. `/tmp/DITTO/output` |
| | | |
| | | Create an environment via conda. Below is an example to install `nextflow`. |
| | | nextflow run pipeline.nf \ |
| | | -work-dir ./work_dir \ |
| | | --build hg38 -c ./configs/nextflow/local.config -with-report \ |
| | | --sample_sheet .test_data/file_list.txt \ |
| | | `--oc_modules /<path-to>/opencravat/modules \` |
| | | --outdir $PWD/data/output |
| | | ``` |
| | | |
| | | - [Anaconda virtual environment](https://docs.anaconda.com/free/anaconda/install/index.html) |
| | | ### HPC Prediction with Cheaha |
| | | |
| | | ```sh |
| | | # create environment. Needed only the first time. Please use the above link if you're not using Mac. |
| | | conda create --name ditto-env |
| | | To run on UAB Cheaha, see the [installation](#installation) step to clone the DITTO repository into a Cheaha directory. |
| | | |
| | | conda activate ditto-env |
| | | - Update the `.test_data/file_list.txt` (input vcfs) files with complete file paths and submit a slurm job using the |
| | | command below |
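
The Sample Sheet step asks for a `file_list.txt` listing VCF files with their paths. A sketch that builds one from a directory of `.vcf.gz` files using absolute paths, which avoids any ambiguity about what relative entries are resolved against; `make_sample_sheet` is a hypothetical helper and the directory layout is assumed.

```shell
# Hypothetical helper (not part of DITTO).
# make_sample_sheet VCF_DIR OUT_FILE: write the absolute path of every
# *.vcf.gz under VCF_DIR (recursively) to OUT_FILE, one per line, sorted.
make_sample_sheet() {
  abs_dir=$(cd "$1" && pwd) || return 1   # resolve VCF_DIR to an absolute path
  find "$abs_dir" -type f -name '*.vcf.gz' | LC_ALL=C sort > "$2"
}
```

For example, `make_sample_sheet /path/to/vcfs "$PWD/my_file_list.txt"` produces a file that could then be passed as `--sample_sheet`. Writing to a new file keeps the repository's example list untouched.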

Suggested change:

> - Create a text file listing the path to VCF file(s) (1 path per line) with variants to score
>   - Paths can be full absolute paths **or** relative paths (relative to the directory where the pipeline will be run from, **not** the directory where the `pipeline.nf` file is)
>   - See the example input file [.test_data/file_list.txt](.test_data/file_list.txt) (lists 2 testing example input VCFs)
>     for reference or as an input file for testing (default behavior of `model.job`)

Updated this text to be a bit more explicit and clear on what to do for input (see the suggestion on supporting relative paths for input files in `pipeline.nf`).

In the README.md, provided real examples of relative and absolute paths in addition to this clarification.

JmScherer marked this conversation as resolved.

JmScherer marked this conversation as resolved.

sdhutchins marked this conversation as resolved.
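
The suggestion above specifies how relative entries are interpreted: relative to the directory the pipeline is run from, not the directory holding `pipeline.nf`. A shell sketch of that rule, useful for inspecting a sheet before launching; `resolve_sample_sheet` is hypothetical and only mirrors the documented behavior, it is not the pipeline's own code.

```shell
# Hypothetical helper (not part of DITTO).
# resolve_sample_sheet SHEET [RUN_DIR]: print each non-blank entry of SHEET as
# an absolute path. Absolute entries pass through unchanged; relative ones are
# prefixed with RUN_DIR (default: the current directory, i.e. where the
# pipeline would be launched from).
resolve_sample_sheet() {
  run_dir="${2:-$PWD}"
  while IFS= read -r entry || [ -n "$entry" ]; do
    case "$entry" in
      '') ;;                                             # skip blank lines
      /*) printf '%s\n' "$entry" ;;                      # already absolute
      ./*) printf '%s/%s\n' "$run_dir" "${entry#./}" ;;  # drop a leading ./
      *) printf '%s/%s\n' "$run_dir" "$entry" ;;
    esac
  done < "$1"
}
```

Passing the intended launch directory as the second argument shows exactly which files a run from there would pick up.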

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -1,7 +1,9 @@` |
| | | dependencies: |
| | | - pip |
| | | - python=3.10 |
| | | - pip: |
| | | - pytabix==0.1 |
| | | - open-cravat==2.4.1 |
| | | |
| | | #- joblib==1.3.2 |
| | | #- git+https://github.com/tkmamidi/open-cravat.git |

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,7 @@` |
| | | conda { |
| | | enabled = true |
| | | } |
| | | |
| | | env { |
| | | TMPDIR="/tmp/DITTO/" |
| | | } |