README.md (+7 -7)
@@ -5,17 +5,17 @@ Intel lab's open sourced data science framework for accelerating digital biology
# Introduction
We are in the epoch of digital biology, which is fueled by the convergence of three revolutions: 1) measurement of biological systems at high resolution, resulting in massive multi-modal, multi-scale, unstructured, distributed data; 2) novel data science (AI and data management) techniques applied to this data; and 3) widespread cloud use enabling massive compute and public data repositories, large collaborative projects, and consortia. Digital biology will require computing and data management at unprecedented scale and speed. However, performance alone would not suffice if it significantly compromised the productivity of the biologists and data scientists who are at the forefront of this transformation.

-With a goal to build a performant, cost effective and productive platform, we are building **Open Omics acceleration framework**: a one-click, containerized, customizable, open-sourced framework for accelerating digital biology research. The framework is being built with a modular design that keeps in mind the different ways the users would want to interact with it. As shown in the following block diagram, it consists of three layers:
-* **Pipeline layer**: for users who are looking for one click solution to run standard pipelines. Currently, we support the following pipelines:
+With a goal to build a performant, cost effective and productive platform, we are building **Open Omics acceleration framework**: a one-click, containerized, customizable, open-sourced framework for accelerating digital biology research. It provides tools and pipelines in the fields of genomics, transcriptomics, proteomics, drug molecule search, and de novo drug design. The framework is being built with a modular design that keeps in mind the different ways the users would want to interact with it. As shown in the following block diagram, it consists of three layers:
+* **Pipeline layer**: for users who are looking for one click solution to run standard pipelines. The pipelines can be accessed in the 'pipelines' subfolder, which provides instructions to build and run the Docker images. Currently, we support the following pipelines:
  * [**fq2sortedbam**](https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/tree/main/pipelines/fq2sortedbam): Given gzipped fastq files of an individual, this workflow performs sequence mapping ([BWA-MEM2](https://github.com/bwa-mem2/bwa-mem2)) and sorting ([SAMtools](https://github.com/samtools/samtools) sort) to output the sorted BAM file.
  * [**DeepVariant based germline pipeline for variant calling (fq2vcf)**](https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/tree/main/pipelines/deepvariant-based-germline-variant-calling-fq2vcf): Given paired end gzipped fastq files of an individual, this workflow performs sequence mapping ([BWA-MEM2](https://github.com/bwa-mem2/bwa-mem2)), sorting ([SAMtools](https://github.com/samtools/samtools) sort) and variant calling ([Open Omics DeepVariant](https://github.com/IntelLabs/open-omics-deepvariant)) to call the variants in the genome of the individual.
-  * [**AlphaFold2-based protein folding**](https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/tree/main/pipelines/alphafold2-based-protein-folding): Given one or more protein sequences, this workflow performs preprocessing (database search and multiple sequence alignment using Open Omics [HMMER](https://github.com/IntelLabs/hmmer) and [HH-suite](https://github.com/IntelLabs/hh-suite)) and structure prediction ([Open Omics AlphaFold2](https://github.com/IntelLabs/open-omics-alphafold)) to output the structure(s) of the protein sequences.
+  * [**AlphaFold2-based protein folding**](https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/tree/main/pipelines/alphafold2-based-protein-folding): Given one or more protein sequences, this workflow performs preprocessing (database search and multiple sequence alignment using Open Omics [HMMER](https://github.com/IntelLabs/hmmer) and [HH-suite](https://github.com/IntelLabs/hh-suite)) and structure prediction ([Open Omics AlphaFold2](https://github.com/IntelLabs/open-omics-alphafold)) to output the structure(s) of the protein sequences. It supports both AlphaFold2 monomer and AlphaFold2 multimer.
  * [**Single cell RNASeq analysis**](https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/tree/main/pipelines/single-cell-RNA-seq-analysis): Given a cell by gene matrix, this [scanpy](https://github.com/scverse/scanpy) based workflow performs data preprocessing (filter, linear regression and normalization), dimensionality reduction (PCA), and clustering (Louvain/Leiden/kmeans) to group the cells into different cell types, and visualizes those clusters (UMAP/t-SNE).
-* **Toolkit (applications) layer**: for users who want to use individual tools or to create their own custom pipelines by combining various tools.
-* **Building blocks (lib) layer**: for tool developers, this layer consists of key building blocks -- biology specific and generic AI algorithms and data structures -- that can replace ones used in existing tools to accelerate them or can be used as ingredients to build new efficient tools.
+* **Toolkit layer**: for users who want to use individual tools or to create their own custom pipelines by combining various tools. The toolkit layer can be accessed in the 'applications' subfolder. For each tool, we provide instructions to build and run it. Currently, the supported tools cover genomics (BWA-MEM, minimap2, bcftools, SAMtools, DeepVariant), transcriptomics (STAR aligner), protein folding (AlphaFold2, ESMFold), protein structure and sequence design (RFDiffusion, ProteinMPNN, LM-design, ESM2-inv, ProtGPT2, ESM2 embeddings), molecular docking (AutoDock, AutoDock-Vina), and de novo molecule generation (MoFlow).
+* **Building blocks layer**: for tool developers, this layer consists of key building blocks -- biology specific and generic AI algorithms and data structures -- that can replace ones used in existing tools to accelerate them or can be used as ingredients to build new efficient tools. This layer can be accessed in the 'lib' subfolder.

Open-Omics-Autodock-Vina is fast, efficient molecular docking software used to predict ligand-protein binding poses and affinities. It features a refined scoring function, parallel execution on multicore CPUs, and user-friendly configuration.

## Docker Setup Instructions

### 1. Build the Docker Image
To build the Docker image with the tag `docker_vina`, use the following commands based on your machine's proxy requirements:
Select any protein complex from the available dataset of **140** protein-ligand complexes (https://zenodo.org/records/4031961), which you can download from https://zenodo.org/records/4031961/files/data.zip?download=1. This guide uses the **5wlo** protein as an example.

1) Run the commands below to make the data download script executable, download the complete dataset, and extract the data for `5wlo`:

```bash
chmod +x data_download_script.sh
bash data_download_script.sh 5wlo
```
**Note: You can replace 5wlo with any other complex name from the complete dataset, which is available in the `data_original/data` directory.**

2) Create an output directory to store results specific to `5wlo`:
```bash
mkdir -p 5wlo_output
```

3) Set the environment variables for the `5wlo` protein as follows:
```bash
export INPUT_VINA="$PWD/5wlo"
export OUTPUT_VINA="$PWD/5wlo_output"
```

4) Grant write permission on the output folder so that Docker can write to it:
```bash
sudo chmod -R a+w "$OUTPUT_VINA"
```
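
Steps 2 to 4 above can be wrapped in a small helper so that the same sequence works for any complex name. This is a minimal sketch, not part of the repository: the function name `prepare_vina_run` is hypothetical, it assumes the data for the complex has already been extracted into a directory of the same name under the current directory, and it uses plain `chmod` on the assumption that the output directory is owned by the current user (add `sudo` otherwise).

```bash
#!/usr/bin/env bash
# Hypothetical helper (an illustration, not a script shipped with the project).
# Mirrors steps 2-4: create the output directory, export the two environment
# variables, and make the output directory writable for the container.
prepare_vina_run() {
    local complex="$1"
    if [ -z "$complex" ]; then
        echo "usage: prepare_vina_run <complex_name>" >&2
        return 1
    fi
    # Assumption: the download script already extracted ./<complex_name>
    if [ ! -d "$PWD/$complex" ]; then
        echo "input directory $PWD/$complex not found; run the download script first" >&2
        return 1
    fi
    mkdir -p "$PWD/${complex}_output" || return 1
    export INPUT_VINA="$PWD/$complex"
    export OUTPUT_VINA="$PWD/${complex}_output"
    # Assumption: the directory belongs to the current user, so sudo is not needed here
    chmod -R a+w "$OUTPUT_VINA"
}
```

After sourcing the function into the current shell, `prepare_vina_run 5wlo` leaves `INPUT_VINA` and `OUTPUT_VINA` set exactly as in step 3.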

### 3. Run the Docker Container
Verify that the Docker image was built successfully by listing Docker images:
```bash
docker images | grep docker_vina
```
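
For use inside a script, the same check can be expressed with an exit status instead of visual inspection. The sketch below is an illustration only: `require_image` is a hypothetical name, and it relies on `docker image inspect`, which returns a non-zero status when no local image matches the given name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a local Docker image with the given
# name exists. Defaults to the docker_vina tag used in this guide.
require_image() {
    local image="${1:-docker_vina}"
    if docker image inspect "$image" >/dev/null 2>&1; then
        echo "image '$image' found"
    else
        echo "image '$image' not found; build it first (see step 1)" >&2
        return 1
    fi
}
```

A script could then call `require_image docker_vina || exit 1` before attempting to start the container.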
If the image is listed, run AutoDock Vina with the following command:
This command will process your receptor and ligand files and place the results in the specified output directory.
### 4. Expected Output
After running the above command, you should find the output file (`rand-1_out.pdbqt`) in the output directory, such as `5wlo_output` for this example.
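
To inspect the result from the command line, the docked poses can be summarized with a short helper. This is a sketch under one assumption: that the output file contains the `REMARK VINA RESULT:` line that AutoDock Vina writes for each pose, whose first numeric field is the predicted binding affinity in kcal/mol. The function name `list_vina_affinities` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the predicted affinity of each docked pose.
# Assumes per-pose lines of the form
#   REMARK VINA RESULT: <affinity> <rmsd lower bound> <rmsd upper bound>
list_vina_affinities() {
    local pdbqt="$1"
    if [ ! -f "$pdbqt" ]; then
        echo "output file not found: $pdbqt" >&2
        return 1
    fi
    # Field 4 is the affinity; poses are numbered in the order they appear
    awk '/^REMARK VINA RESULT:/ { n++; print "pose " n ": " $4 }' "$pdbqt"
}
```

For the example in this guide, the call would be `list_vina_affinities "$OUTPUT_VINA/rand-1_out.pdbqt"`.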

---
The original README content of AutoDock-Vina follows:

## AutoDock Vina: Docking and virtual screening program

**AutoDock Vina** is one of the **fastest** and **most widely used** **open-source** docking engines. It is a turnkey computational docking program that is based on a simple scoring function and rapid gradient-optimization conformational search. It was originally designed and implemented by Dr. Oleg Trott in the Molecular Graphics Lab, and it is now being maintained and developed by the Forli Lab at The Scripps Research Institute.

* AutoDock4.2 and Vina scoring functions
* Support of simultaneous docking of multiple ligands and batch mode for virtual screening
* Support of macrocycle molecules
* Hydrated docking protocol
* Can write and load external AutoDock maps
* Python bindings for Python 3

## Documentation

The installation instructions, documentation and tutorials can be found on [readthedocs.org](https://autodock-vina.readthedocs.io/en/latest/).

## Citations
* [J. Eberhardt, D. Santos-Martins, A. F. Tillack, and S. Forli. (2021). AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. Journal of Chemical Information and Modeling.](https://pubs.acs.org/doi/10.1021/acs.jcim.1c00203)
* [O. Trott and A. J. Olson. (2010). AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of computational chemistry, 31(2), 455-461.](https://onlinelibrary.wiley.com/doi/10.1002/jcc.21334)