DITTO Clean Up and Local Prediction instructions #22
Open: JmScherer wants to merge 24 commits into main from local-prediction
24 commits
706751c JmScherer: Initial cleaning up the configs folder, organizing by service, and up…
23f59d4 JmScherer: Updating the configs folder for the nextflow configurations
82168db JmScherer: moved shap_plots to docs and removed hardcoded path from ./.test_data…
92bc76e JmScherer: Update the model.job with some changes and the pipeline.nf
0deeada JmScherer: Moved oc ditto package into config/opencravat
9f41cc1 JmScherer: Updated readme
774e32d JmScherer: Updating the README.md
dbf095c JmScherer: More README.md updates
2bf24f3 JmScherer: Markdown linting on README.md
a432c54 JmScherer: Hopefully finished README markdown linting
471d0ba JmScherer: Updated README.md to include local.config for local prediction instru…
5d89a29 JmScherer: Worked out an output directory path and updated the documentation
db5d1ef JmScherer: fixing ./text_data/file_list.txt
1c5bc9d JmScherer: Updating model.job output folder
9b7a691 JmScherer: updating the folder for output on nextflow command
54ec4ec JmScherer: Update pipeline.nf
f9d8c03 JmScherer: Update README.md
305940e JmScherer: Update README.md
6cd3306 JmScherer: Update README.md
4104446 JmScherer: Markdown linting for README
3122d20 JmScherer: Added ditto-env.yaml to conda envs, updated README to reflect, and ad…
b585a5b JmScherer: Updated the README to discuss relative and absolute pathing for the f…
8692431 JmScherer: markdown linting
b9c4bd1 JmScherer: Updated the ditto-nf.yaml conda env to include h5py binary as it was …
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,2 @@ | ||
| .test_data/oc_test_data.vcf.gz | ||
| .test_data/testing_variants_hg38.vcf.gz | ||
| .test_data/testing_variants_hg38.vcf.gz |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -14,6 +14,50 @@ Markdown](https://github.com/uab-cgds-worthey/DITTO/actions/workflows/linting.ym | |
| DITTO is an explainable neural network that can be helpful for accurate and rapid interpretation of small | ||
| genetic variants for pathogenicity using patient’s genotype (VCF) information. | ||
|
|
||
| ## Getting Started | ||
|
|
||
| - [Prerequisites](#prerequisites) | ||
| - [Using DITTO](#using-ditto) | ||
| - [Webapp](#webapp) | ||
| - [API](#api) | ||
| - [Prediction](#prediction) | ||
| - [Local Prediction](#local-prediction) | ||
| - [HPC Prediction with Cheaha](#hpc-prediction-with-cheaha) | ||
| - [Reproducing the DITTO model](#reproducing-the-ditto-model) | ||
| - [Download DITTO DB (Precomputed scores)](#download-ditto-db-precomputed-scores) | ||
| - [How to cite?](#how-to-cite) | ||
| - [Contact](#contact-information) | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| The following prerequisites must be installed in the target environment to deploy and run the DITTO | ||
| prediction model. | ||
|
|
||
| ### Tools | ||
|
|
||
| - [Python 3.10](https://www.python.org/) - [Install](https://www.python.org/downloads/) | ||
| - The specified OpenCravat version requires Python 3.10 | ||
| - [Anaconda3 25.7+](https://anaconda.com/) - [install](https://www.anaconda.com/docs/getting-started/anaconda/install) | ||
| - [OpenCravat 2.4.1](https://www.opencravat.org/) - [install](https://github.com/KarchinLab/open-cravat/releases/tag/2.4.1) | ||
| - [Git](https://git-scm.com/) | ||
| - Setup with your favorite git client. Here is a [GitHub Guide](https://github.com/git-guides/install-git) | ||
| for different platforms. | ||
| - [Nextflow 22.10.7+](https://www.nextflow.io/) - [install](https://www.nextflow.io/docs/latest/install.html) | ||
|
|
||
| > ***NOTE:*** Current version of OpenCravat that we're using doesn't support "Spanning or overlapping deletions" | ||
| > variants i.e. variants with `*` in `ALT Allele` column. More on these variants | ||
| <!-- markdown-link-check-disable --> | ||
| > [here](https://gatk.broadinstitute.org/hc/en-us/articles/360035531912-Spanning-or-overlapping-deletions-allele). | ||
| <!-- markdown-link-check-enable --> | ||
| > These will be ignored when running the pipeline. | ||
|
|
||
| ### System Requirements | ||
|
|
||
| - CPU: >2 | ||
| - RAM: ~25GB for a WGS VCF sample | ||
| - Storage: 1TB | ||
| - Storage is mainly needed to host the OpenCravat annotators; storing all annotators requires ~600GB of data | ||
|
|
||
| ## Using DITTO | ||
|
|
||
| DITTO scores for variants can be obtained in the three ways below. Webapp and API are for single variant analysis and the | ||
|
|
@@ -30,89 +74,129 @@ DITTO is available for public use at this [website](https://cgds-ditto.streamlit | |
| DITTO is not hosted as a public API but one can serve up locally to query DITTO scores. Please follow the instructions | ||
| in this [GitHub repo](https://github.com/uab-cgds-worthey/DITTO-API). | ||
|
|
||
| ### Setting up to use locally | ||
| ### Prediction | ||
|
|
||
| #### Installation | ||
|
|
||
| To fetch the DITTO source code, change into a directory of your choice and run: | ||
|
|
||
| ```sh | ||
| git clone https://github.com/uab-cgds-worthey/DITTO.git | ||
| cd DITTO | ||
| ``` | ||
|
|
||
| ### Local Prediction | ||
|
|
||
| > ***NOTE:*** This setup will allow one to annotate a VCF sample and make DITTO predictions. Currently tested only in | ||
| > Cheaha (UAB HPC) because of resource limitations to download datasets from OpenCRAVAT. | ||
| > Docker versions may need to be explored later to make it useable in Mac and Windows. | ||
|
|
||
| #### System Requirements | ||
| #### NextFlow Conda Vs. Mamba Setup | ||
|
|
||
| *Tools:* | ||
| ***NOTE:*** If your environment uses Mamba instead of Conda, NextFlow can be configured to use it by editing the | ||
| `configs/nextflow/local.config` file and setting the **useMamba** parameter to match your environment: | ||
|
|
||
| - Anaconda3 | ||
| - OpenCravat-2.4.1 | ||
| - Git | ||
| ```groovy | ||
| // This parameter defaults to false; change it to true if using Mamba | ||
|
|
||
| *Resources:* | ||
| useMamba = true | ||
| ``` | ||
|
|
||
| - CPU: > 2 | ||
| - Storage: ~1TB | ||
| - RAM: ~25GB for a WGS VCF sample | ||
| #### Setup Steps | ||
|
|
||
| #### Installation | ||
| - ***Setup OpenCravat (only one-time installation)*** | ||
|
|
||
| Requirements: | ||
| Please follow the steps mentioned in [install_openCravat.md](docs/install_openCravat.md). | ||
|
|
||
| - DITTO repo from GitHub | ||
| - OpenCravat with databases to annotate | ||
| - Nextflow >=22.10.7 | ||
| - ***Setup Nextflow*** | ||
|
|
||
| To fetch DITTO source code, change in to directory of your choice and run: | ||
| Create an environment via conda. Below is an example to install `nextflow`. | ||
|
|
||
| ```sh | ||
| git clone https://github.com/uab-cgds-worthey/DITTO.git | ||
| ``` | ||
| ```sh | ||
| # create environment. Needed only the first time. Please use the above link if you're not using Mac. | ||
| conda env create -f ./configs/conda/ditto-env.yaml | ||
|
|
||
| #### Run DITTO pipeline on UAB Cheaha | ||
| conda activate ditto-env | ||
| ``` | ||
|
|
||
| To run on UAB cheaha, please update the `model.job` (outdir and samplesheet) and `.test_data/file_list.txt` (inout vcfs) | ||
| files with complete file paths and submit a slurm job using the command below | ||
| - ***Sample Sheet*** | ||
|
|
||
| ```sh | ||
| sbatch model.job | ||
| ``` | ||
| Create a sample sheet `.test_data/file_list.txt` listing the VCF files (including their paths). Paths to the | ||
| vcf.gz files may be relative or absolute. Relative paths are resolved against the directory that DITTO is | ||
| launched from (not the NextFlow `-work-dir`). | ||
|
|
||
| #### Run DITTO pipeline outside of UAB Cheaha | ||
| Example `file_list.txt` with relative paths: | ||
|
|
||
| ***Setup OpenCravat (only one-time installation)*** | ||
| ```bash | ||
| .test_data/oc_test_data.vcf.gz | ||
| .test_data/testing_variants_hg38.vcf.gz | ||
|
|
||
| Please follow the steps mentioned in [install_openCravat.md](docs/install_openCravat.md). | ||
| # Example, will become: /Users/<username>/Workspace/DITTO/.test_data/oc_test_data.vcf.gz | ||
| ``` | ||
|
|
||
| > ***NOTE:*** Current version of OpenCravat that we're using doesn't support "Spanning or overlapping deletions" | ||
| > variants i.e. variants with `*` in `ALT Allele` column. More on these variants | ||
| <!-- markdown-link-check-disable --> | ||
| > [here](https://gatk.broadinstitute.org/hc/en-us/articles/360035531912-Spanning-or-overlapping-deletions-allele). | ||
| <!-- markdown-link-check-enable --> | ||
| > These will be ignored when running the pipeline. | ||
| Or absolute paths | ||
|
|
||
| ***Setup Nextflow*** | ||
| ```bash | ||
| /Users/<username>/Desktop/test_data/oc_test_data.vcf.gz | ||
| /Users/<username>/Desktop/test_data/testing_variants_hg38.vcf.gz | ||
|
|
||
| Create an environment via conda. Below is an example to install `nextflow`. | ||
| # Example is using MacOS Desktop folder with test_data directory | ||
| ``` | ||
|
|
||
| - [Anaconda virtual environment](https://docs.anaconda.com/free/anaconda/install/index.html) | ||
| This will run DITTO prediction for both vcf files in the `file_list.txt`. | ||
|
|
||
| ```sh | ||
| # create environment. Needed only the first time. Please use the above link if you're not using Mac. | ||
| conda create --name ditto-env | ||
| - ***Run the NextFlow pipeline*** | ||
|
|
||
| conda activate ditto-env | ||
| Please make sure to edit the directory paths as needed and run the pipeline as shown below. | ||
|
|
||
| # Install nextflow | ||
| conda install bioconda::nextflow | ||
| ``` | ||
| ```sh | ||
| # Note: NextFlow work directory is defined as `-work-dir` in the run command parameters | ||
| # Note: `--outdir` cannot be a relative path; set an absolute path NextFlow can access, e.g. `/tmp/DITTO/output` | ||
|
|
||
| Please make a samplesheet `.test_data/file_list.txt` with VCF files (incl. path). | ||
| Please make sure to edit the directory paths as needed and run | ||
| the pipeline as shown below. | ||
| nextflow run pipeline.nf \ | ||
| -work-dir ./work_dir \ | ||
| --build hg38 -c ./configs/nextflow/local.config -with-report \ | ||
| --sample_sheet .test_data/file_list.txt \ | ||
| --oc_modules /<path-to>/opencravat/modules \ | ||
| --outdir $PWD/data/output | ||
| ``` | ||
|
|
||
| ### HPC Prediction with Cheaha | ||
|
|
||
| To run on UAB Cheaha, first follow the [installation](#installation) step to clone the DITTO repository into a Cheaha directory. | ||
|
|
||
| - Create a text file listing the path to VCF file(s) (1 path per line) with variants to score | ||
| - Paths can be full absolute paths **or** relative paths (relative to the directory where the pipeline will be run | ||
| from, **note** the directory where the `pipeline.nf` file is) | ||
| - See the example input file [.test_data/file_list.txt](.test_data/file_list.txt) (lists 2 testing example input VCFs) | ||
| for reference or as an input file for testing (default behavior of `model.job`) | ||
| - One can supply either relative paths or absolute paths to files for the vcf.gz files. Relative paths need to be | ||
| relative to the work directory that DITTO was executed from. | ||
|
|
||
| Example `file_list.txt` with relative paths: | ||
|
|
||
| ```bash | ||
| .test_data/oc_test_data.vcf.gz | ||
| .test_data/testing_variants_hg38.vcf.gz | ||
|
|
||
| # Example, will become: /home/<username>/Workspace/DITTO/.test_data/oc_test_data.vcf.gz | ||
| ``` | ||
|
|
||
| Or absolute paths | ||
|
|
||
| ```bash | ||
| /home/<username>/test_data/oc_test_data.vcf.gz | ||
| /home/<username>/test_data/testing_variants_hg38.vcf.gz | ||
|
|
||
| # Example is using Linux home directory with a test_data directory | ||
| ``` | ||
|
|
||
| - Update `model.job` (change the `--sample_sheet` option to your input file with VCF path(s) and | ||
| `--outdir` to the desired output location of DITTO predictions) | ||
|
|
||
| ```sh | ||
| nextflow run pipeline.nf \ | ||
| --outdir ./data/ \ | ||
| -work-dir ./wor_dir \ | ||
| --build hg38 -with-report \ | ||
| --oc_modules /data/opencravat/modules \ | ||
| --sample_sheet .test_data/file_list | ||
| sbatch model.job | ||
| ``` | ||
|
|
||
| ## Reproducing the DITTO model | ||
|
|
@@ -125,17 +209,19 @@ Precomputed scores for all possible SNVs and known Indels from gnomADv3.0 in mai | |
| are available to download here - <https://s3.lts.rc.uab.edu/cgds-public/dittodb/dittodb.html> | ||
|
|
||
| ## How to cite? | ||
|
|
||
| <!-- markdown-link-check-disable --> | ||
| Mamidi, T.K.K.; Wilk, B.M.; Gajapathy, M.; Worthey, E.A. DITTO: An Explainable Machine-Learning Model for | ||
| Transcript-Specific Variant Pathogenicity Prediction. Preprints 2024, 2024040837. <https://doi.org/10.20944/preprints202404.0837.v1> | ||
| <!-- markdown-link-check-enable --> | ||
|
|
||
| ## Contact information | ||
|
|
||
| For queries, please open a GitHub issue. | ||
|
|
||
| For urgent queries, send an email with clear description to | ||
|
|
||
| |Name | Email | | ||
| |------|--------| | ||
| |Tarun Mamidi | <[email protected]>| | ||
| |Liz Worthey | <[email protected]>| | ||
| | Name | Email | | ||
| |--------------|--------------------| | ||
| | Tarun Mamidi | <[email protected]> | | ||
| | Liz Worthey | <[email protected]> | | ||
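A note on the sample-sheet rules in the README diff above: relative entries are resolved against the directory the pipeline is launched from, so a typo or a launch from the wrong directory only surfaces once the run has started. The following is a minimal sketch of a pre-flight check and is not part of DITTO. The function name `check_sample_sheet` and its optional base-directory argument are hypothetical, and it assumes one path per line as in `.test_data/file_list.txt`.

```sh
# Hypothetical helper (not part of DITTO): verify that every path listed in a
# sample sheet points to an existing file before launching the pipeline.
# Usage: check_sample_sheet <sheet> [base_dir]   (base_dir defaults to $PWD)
check_sample_sheet() (
  sheet="$1"
  base="${2:-$PWD}"
  missing=0

  # Fail early if the sheet itself cannot be read.
  [ -r "$sheet" ] || { printf 'cannot read: %s\n' "$sheet" >&2; exit 2; }

  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      # Skip blank lines and lines starting with '#'. This is a convenience
      # assumption of this sketch; the pipeline itself may not accept comments.
      '' | '#'*) continue ;;
      # Absolute paths are used as-is.
      /*) path="$line" ;;
      # Relative paths are resolved against the launch directory.
      *) path="$base/$line" ;;
    esac
    if [ ! -f "$path" ]; then
      printf 'missing: %s\n' "$path" >&2
      missing=$((missing + 1))
    fi
  done < "$sheet"

  # Succeed only when no listed file is missing.
  [ "$missing" -eq 0 ]
)
```

For example, `check_sample_sheet .test_data/file_list.txt && sbatch model.job` would submit the job only when every listed file exists. The body runs in a subshell so the helper's variables do not leak into the calling shell.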
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| name: ditto-env | ||
|
|
||
| channels: | ||
| - bioconda | ||
| - conda-forge | ||
|
|
||
| dependencies: | ||
| - nextflow=22.10 | ||
| - conda=23.1 |
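The environment file above pins `nextflow=22.10`, while the README in this PR lists Nextflow 22.10.7+ as the prerequisite. A small helper for comparing dotted version strings can make that requirement checkable. This is only a sketch: `version_ge` is a hypothetical name, and the version strings are passed in by hand rather than parsed from any tool's output.

```sh
# Hypothetical helper: succeed when version $1 is greater than or equal to $2.
# Versions are compared field by field as numbers; missing fields count as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ".")
    m = split(b, y, ".")
    len = (n > m) ? n : m
    for (i = 1; i <= len; i++) {
      xi = (i <= n) ? x[i] + 0 : 0
      yi = (i <= m) ? y[i] + 0 : 0
      if (xi > yi) exit 0   # first differing field is larger: a > b
      if (xi < yi) exit 1   # first differing field is smaller: a < b
    }
    exit 0                  # all fields equal
  }'
}
```

With this sketch, `version_ge 22.10.7 22.10.7` succeeds, while `version_ge 22.10 22.10.7` fails because the missing patch field is treated as 0.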
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,14 @@ | ||
| name: ditto-nf | ||
|
|
||
| channels: | ||
| - conda-forge | ||
|
|
||
| dependencies: | ||
| - python=3.10.11 | ||
| - pandas=2.0.1 | ||
| - numpy=1.23.5 | ||
| - pyaml=23.7.0 | ||
| - pip=23.2.1 | ||
| - pip: | ||
| - --only-binary h5py | ||
| - tensorflow==2.11 |
File renamed without changes.
4 changes: 4 additions & 0 deletions
configs/envs/open-cravat.yaml → configs/conda/open-cravat.yaml
sdhutchins marked this conversation as resolved.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,11 @@ | ||
| name: opencravat-env | ||
|
|
||
| dependencies: | ||
| - pip | ||
| - python=3.10 | ||
| - pip: | ||
| - pytabix==0.1 | ||
| - open-cravat==2.4.1 | ||
|
|
||
| #- joblib==1.3.2 | ||
| #- git+https://github.com/tkmamidi/open-cravat.git |
File renamed without changes.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| conda { | ||
| enabled = true | ||
| useMamba = false | ||
| } | ||
|
|
||
| env { | ||
| TMPDIR="/tmp/DITTO/" | ||
| } |
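The `useMamba` flag in the `conda` scope above has to be edited by hand. As a rough aid, the sketch below prints the value that matches the current shell by checking whether `mamba` resolves as a command. The function name `suggest_use_mamba` is hypothetical, and treating "mamba is available" as "set `useMamba = true`" is an assumption of this sketch, not something the pipeline does on its own.

```sh
# Hypothetical helper: print "true" when a `mamba` command is available in the
# current shell, otherwise "false". The result is only a suggestion for the
# useMamba setting in configs/nextflow/local.config.
suggest_use_mamba() {
  if command -v mamba >/dev/null 2>&1; then
    echo true
  else
    echo false
  fi
}
```

Running `suggest_use_mamba` inside the activated environment gives the value to put next to `useMamba`; the config file itself still needs to be edited manually.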
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.