Commit 3cfa489

vasimuddinsri480673arunvelliyangiri18yuk12Rahamathullah365
authored
All the contents are ready for V3.0 release (#37)
* Autodock (#14) * Added Autodock workload * removed unnecessary contents * removed unnecessary contents * Removed old files and updated README and data download script * updated readme and data download script * updated readme and data download script * updated README * updated README * updated README * updated README * updated README * updated README * updated README * updated README * updated README * updated data download script for the test cases --------- Co-authored-by: arunvelliyangiri18 <[email protected]> * Autodock vina (#15) * Added Autodock workload * Added Autodock-Vina workload * removed unnecessary contents * removed unnecessary contents * removed unnecessary contents * removed unnecessary contents * removed Autodock * added data download script and removed 5wlo * Added updated README * updated README and data download script * updated README * updated README * updated README * updated README * updated README * updated README * updated README * updated data download script for the test cases * updated README * updated README --------- Co-authored-by: arunvelliyangiri18 <[email protected]> * Added ESM (#13) * add esm docker files and patch files * add esm docker files and patch files * add esm docker files and patch files * add esm docker files and patch files --------- * added STAR aligner (v2.7.11b) as a submodule in applications (#17) * Add ProtGPT2, RFdiffusion, and ProteinMPNN projects (#23) * adding protgpt2 * adding ProteinMPNN * adding RFdiffusion * update protgpt2.py file and README * update ProteinMPNN Dockerfile * In this commit, I have update the Dockerfile, RFdiffusion patch, symmetry file, and run_inference file. 
* Updated the Open Omics block diagram (#20) * Updated the Open Omics block diagram * Updated the block diagram * Update README.md to reflect 3.0 release * Uploaded the new Open Omics diagram for v3.0 * Update README.md * Updated the Open Omics diagram with better quality * Update path to Open Omics diagram * removed rfd, proteinmpnn, protgpt, will send the pr again * cleanup the files for Rfdiffusion and ProteinMPNN and Protgpt2 (#24) * adding protgpt2 * adding ProteinMPNN * adding RFdiffusion * update protgpt2.py file and README * update ProteinMPNN Dockerfile * In this commit, I have update the Dockerfile, RFdiffusion patch, symmetry file, and run_inference file. * removed RFdiffusion optimize code * removed ProteinMPNN optimize code * Update README.md * Update README.md * Update README.md * Update README.md * cleanup hidden files * cleanedup some stray hidden files * Added privacy notice (#16) (#19) Co-authored-by: sanchit-misra <[email protected]> * Autodock (#21) * Added Autodock workload * removed unnecessary contents * removed unnecessary contents * Removed old files and updated README and data download script * updated readme and data download script * updated readme and data download script * updated README * updated README * updated README * updated README * updated README * updated README * updated README * updated README * updated README * updated data download script for the test cases * updated README for proxy build command for docker --------- Co-authored-by: arunvelliyangiri18 <[email protected]> Co-authored-by: manasi-t24 <[email protected]> * Autodock vina (#22) * Added Autodock workload * Added Autodock-Vina workload * removed unnecessary contents * removed unnecessary contents * removed unnecessary contents * removed unnecessary contents * removed Autodock * added data download script and removed 5wlo * Added updated README * updated README and data download script * updated README * updated README * updated README * updated README * updated 
README * updated README * updated README * updated data download script for the test cases * updated README * updated README * updated README for proxy build command for docker --------- Co-authored-by: arunvelliyangiri18 <[email protected]> Co-authored-by: manasi-t24 <[email protected]> * updated ProteinMPNN and ProtGPT2 and RFdiffusion Dockerfiles (#28) * ESM: Dockerfiles updated to work independently (#26) * Add ESM changes * Add ESM changes * MoFlow workload added (#18) * Added privacy notice (#16) * Add MoFlow updates * Add MoFlow updates * Add MoFlow updates --------- Co-authored-by: sanchit-misra <[email protected]> * Update README.md (#29) * Updated ESM README changes (#31) Corrected LM design command line * Adding multimer support to v3.0-release branch (#33) * Adding support for AlphaFold2 Multimer * single docker files for handling monomer and multimer cases * changing to the main branch of open-omics-alphafold * Added privacy notice (#16) (#25) * removed the commented code --------- Co-authored-by: sanchit-misra <[email protected]> * git clone replaced with wget for release code (#34) * Update README.md to reflect v3.0 additions (#35) * fixed wget OO downloads urls in fq2bam and dv1 dockers for V3.0 release (#36) * updated docker files (fq2bam, dv1) with git download * Update README.md * Update README.md * clean up * clean up dockers --------- Co-authored-by: vasimuddin.md <[email protected]> Co-authored-by: vasimuddin.md <[email protected]> --------- Co-authored-by: sri480673 <[email protected]> Co-authored-by: arunvelliyangiri18 <[email protected]> Co-authored-by: Vasimuddin Md <[email protected]> Co-authored-by: Rahamathullah365 <[email protected]> Co-authored-by: sanchit-misra <[email protected]> Co-authored-by: Chirayu Haryan <[email protected]> Co-authored-by: manasi-t24 <[email protected]> Co-authored-by: Narendra Chaudhary <[email protected]> Co-authored-by: vasimuddin.md <[email protected]> Co-authored-by: vasimuddin.md <[email protected]>
1 parent 9b6d869 commit 3cfa489

50 files changed: +7526 -85 lines changed

.gitmodules

+3
@@ -22,3 +22,6 @@
 [submodule "applications/bcftools"]
 	path = applications/bcftools
 	url = https://github.com/samtools/bcftools.git
+[submodule "applications/STAR"]
+	path = applications/STAR
+	url = https://github.com/alexdobin/STAR.git
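
To check what the new entry resolves to without cloning anything, one option is to read a `.gitmodules`-style file with `git config -f`. The sketch below is not part of the commit: it writes a scratch copy of the two entries from this hunk, and the `tmp` and `star_url` variable names are purely illustrative.

```shell
# Scratch copy of the two submodule entries from the hunk (illustration only).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
[submodule "applications/bcftools"]
    path = applications/bcftools
    url = https://github.com/samtools/bcftools.git
[submodule "applications/STAR"]
    path = applications/STAR
    url = https://github.com/alexdobin/STAR.git
EOF

# Key layout is <section>.<subsection>.<variable>; the subsection may contain "/".
star_url=$(git config -f "$tmp" --get submodule.applications/STAR.url)
echo "$star_url"
rm -f "$tmp"
```

In an actual checkout, `git submodule update --init applications/STAR` should then fetch the submodule at whatever commit the repository records for it.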

README.md

+7-7
@@ -5,17 +5,17 @@ Intel lab's open sourced data science framework for accelerating digital biology
 # Introduction
 We are in the epoch of digital biology, that is fueled by the convergence of three revolutions: 1) Measurement of biological systems at high resolution resulting in massive multi-modal, multi-scale, unstructured, distributed data, 2) Novel data science (AI and data management) techniques on this data, and 3) Wide-spread cloud use enabling massive compute and public data repositories, large collaborative projects and consortia. It will require computing and data management at unprecedented scale and speed. However, performance alone would not suffice if it significantly compromised the productivity of biologists and data scientists who are at the forefront of this transformation.

-With a goal to build a performant, cost effective and productive platform, we are building **Open Omics acceleration framework**: a one-click, containerized, customizable, open-sourced framework for accelerating digital biology research. The framework is being built with a modular design that keeps in mind the different ways the users would want to interact with it. As shown in the following block diagram, it consists of three layers:
-* **Pipeline layer**: for users who are looking for one click solution to run standard pipelines. Currently, we support the following pipelines:
+With a goal to build a performant, cost effective and productive platform, we are building **Open Omics acceleration framework**: a one-click, containerized, customizable, open-sourced framework for accelerating digital biology research. It provides tools and pipelines in the field of genomics, transcriptomics, proteomics, drug molecule search and De novo drug design. The framework is being built with a modular design that keeps in mind the different ways the users would want to interact with it. As shown in the following block diagram, it consists of three layers:
+* **Pipeline layer**: for users who are looking for one click solution to run standard pipelines. The pipelines can be accessed in the 'pipelines' subfolder. It provides instructions to build & run the docker images. Currently, we support the following pipelines:
 * [**fq2sortedbam**](https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/tree/main/pipelines/fq2sortedbam): Given gzipped fastq files of an individual, this workflow performs sequence mapping ([BWA-MEM2](https://github.com/bwa-mem2/bwa-mem2)) and sorting ([SAMtools](https://github.com/samtools/samtools) sort) to output the sorted BAM file.
 * [**DeepVariant based germline pipeline for variant calling (fq2vcf)**](https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/tree/main/pipelines/deepvariant-based-germline-variant-calling-fq2vcf): Given paired end gzipped fastq files of an individual, this workflow performs sequence mapping ([BWA-MEM2](https://github.com/bwa-mem2/bwa-mem2)), sorting ([SAMtools](https://github.com/samtools/samtools) sort) and variant calling ([Open Omics DeepVariant](https://github.com/IntelLabs/open-omics-deepvariant)) to call the variants in the genome of the individual.
-* [**AlphaFold2-based protein folding**](https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/tree/main/pipelines/alphafold2-based-protein-folding): Given one or more protein sequences, this workflow performs preprocessing (database search and multiple sequence alignment using Open Omics [HMMER](https://github.com/IntelLabs/hmmer) and [HH-suite](https://github.com/IntelLabs/hh-suite)) and structure prediction ([Open Omics AlphaFold2](https://github.com/IntelLabs/open-omics-alphafold)) to output the structure(s) of the protein sequences.
+* [**AlphaFold2-based protein folding**](https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/tree/main/pipelines/alphafold2-based-protein-folding): Given one or more protein sequences, this workflow performs preprocessing (database search and multiple sequence alignment using Open Omics [HMMER](https://github.com/IntelLabs/hmmer) and [HH-suite](https://github.com/IntelLabs/hh-suite)) and structure prediction ([Open Omics AlphaFold2](https://github.com/IntelLabs/open-omics-alphafold)) to output the structure(s) of the protein sequences. It has support for both AlphaFold2 monomer and AlphaFold2 multimer.
 * [**Single cell RNASeq analysis**](https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/tree/main/pipelines/single-cell-RNA-seq-analysis): Given a cell by gene matrix, this [scanpy](https://github.com/scverse/scanpy) based workflow performs data preprocessing (filter, linear regression and normalization), dimensionality reduction (PCA), clustering (Louvain/Leiden/kmeans) to cluster the cells into different cell types and visualize those clusters (UMAP/t-SNE).
-* **Toolkit (applications) layer**: for users who want to use individual tools or to create their own custom pipelines by combining various tools.
-* **Building blocks (lib) layer**: for tool developers, this layer consists of key building blocks -- biology specific and generic AI algorithms and data structures -- that can replace ones used in existing tools to accelerate them or can be used as ingredients to build new efficient tools.
+* **Toolkit layer**: for users who want to use individual tools or to create their own custom pipelines by combining various tools. The toolkit layer can be accessed in the 'applications' subfolder. For each tool, we provide instructions to build and run it. Currently, the tools supported include: genomics (BWA-MEM, minimap2, bcftools, SAMtools, DeepVariant), transcriptomics (STAR aligner), protein folding (AlphaFold2, ESMFold), protein structure and sequence design (RFDiffusion, ProteinMPNN, LM-design, ESM2-inv, ProtGPT2, ESM2 embeddings), molecular docking (AutoDock, AutoDock-Vina), De novo molecule generation (MoFlow).
+* **Building blocks layer**: for tool developers, this layer consists of key building blocks -- biology specific and generic AI algorithms and data structures -- that can replace ones used in existing tools to accelerate them or can be used as ingredients to build new efficient tools. This layer can be accessed in the 'lib' subfolder.

 <p align="center">
-<img src="https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/blob/main/images/Open-Omics-Acceleration-Framework-v2.0.JPG" height="300"/a></br>
+<img src="https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/blob/main/images/Open-Omics-Acceleration-Framework-v3.0.jpg" height="300"/a></br>
 </p>

 With a goal of providing a one-stop platform, this framework brings our following repositories for digital biology under one umbrella:
@@ -37,7 +37,7 @@ In addition, we also use several existing AI libraries: oneDNN, oneDAL, oneCCL,
 # Getting Started
 ```sh
 # Download release
-wget https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/releases/download/2.1/Source_code_with_submodules.tar.gz
+wget https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/releases/download/3.0/Source_code_with_submodules.tar.gz
 tar -xzf Source_code_with_submodules.tar.gz

 # Clone master

applications/AutoDock-Vina/Dockerfile

+31
@@ -0,0 +1,31 @@
+FROM condaforge/miniforge3:4.10.2-0
+ENV DEBIAN_FRONTEND=noninteractive
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    build-essential \
+    libboost-all-dev \
+    swig \
+    vim \
+    gcc-8 \
+    g++-8 \
+    numactl \
+    time && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+ENV CC=gcc-8
+ENV CXX=g++-8
+WORKDIR /opt
+RUN git clone https://github.com/ccsb-scripps/AutoDock-Vina.git
+WORKDIR /opt/AutoDock-Vina
+RUN git checkout v1.2.2
+WORKDIR /opt/AutoDock-Vina/build/linux/release
+RUN make -j$(nproc)
+ENV SERVICE_NAME="autodock-vina-service"
+RUN groupadd --gid 1001 $SERVICE_NAME && \
+    useradd -m -g $SERVICE_NAME --shell /bin/false --uid 1001 $SERVICE_NAME
+RUN chown -R $SERVICE_NAME:$SERVICE_NAME /opt
+USER $SERVICE_NAME
+ENV PATH="/opt/AutoDock-Vina/build/linux/release:$PATH"
+WORKDIR /input
+HEALTHCHECK NONE
+CMD ["vina","--help"]

applications/AutoDock-Vina/README.md

+79
@@ -0,0 +1,79 @@
+## Open-Omics-Autodock-Vina
+Open-Omics-Autodock-Vina is a fast, efficient molecular docking software used to predict ligand-protein binding poses and affinities. It features a refined scoring function, parallel execution on multicore CPUs and user-friendly configuration.
+
+## Docker Setup Instructions
+
+
+### 1. Build the Docker Image
+To build the Docker image with the tag `docker_vina`, use one of the following commands, depending on your machine's proxy requirements:
+* For a machine without a proxy:
+```bash
+docker build -t docker_vina .
+```
+* For a machine with a proxy:
+```bash
+docker build --build-arg http_proxy=<http_proxy> --build-arg https_proxy=<https_proxy> --build-arg no_proxy=<no_proxy_ip> -t docker_vina .
+```
+
+
+### 2. Choose and Download Protein Complex Data
+Select any protein complex from the available dataset of **140** protein-ligand complexes (https://zenodo.org/records/4031961), which you can download from (https://zenodo.org/records/4031961/files/data.zip?download=1). This guide uses the **5wlo** protein as an example.
+
+1) Run the commands below to make the data download script executable, download the complete dataset and extract the data for `5wlo`:
+
+```bash
+chmod +x data_download_script.sh
+bash data_download_script.sh 5wlo
+```
+**Note: You can replace 5wlo with any other complex name from the complete dataset available in the `data_original/data` directory.**
+
+2) Create an output directory to store results specific to `5wlo`:
+```bash
+mkdir -p 5wlo_output
+```
+
+3) Set the environment variables for the `5wlo` protein as follows:
+```bash
+export INPUT_VINA=$PWD/5wlo
+export OUTPUT_VINA=$PWD/5wlo_output
+```
+
+4) Add the necessary permissions to the output folder so that Docker can write to it:
+```bash
+sudo chmod -R a+w $OUTPUT_VINA
+```
+
+### 3. Run the Docker Container
+Verify that the Docker image was built successfully by listing Docker images:
+```bash
+docker images | grep docker_vina
+```
+If the image is listed, run AutoDock Vina with the following command:
+```bash
+docker run -it -v $INPUT_VINA:/input -v $OUTPUT_VINA:/output docker_vina:latest vina --receptor protein.pdbqt --ligand rand-1.pdbqt --out /output/rand-1_out.pdbqt --center_x 16.459 --center_y -19.946 --center_z -5.850 --size_x 18 --size_y 18 --size_z 18 --seed 1234 --exhaustiveness 64
+```
+This command will process your receptor and ligand files and place the results in the specified output directory.
+### 4. Expected Output
+After running the above command, you should find the output file (`rand-1_out.pdbqt`) in the output directory, such as `5wlo_output` for this example.
+
+---
+The original README content of AutoDock-Vina follows:
+
+## AutoDock Vina: Docking and virtual screening program
+
+**AutoDock Vina** is one of the **fastest** and **most widely used** **open-source** docking engines. It is a turnkey computational docking program that is based on a simple scoring function and rapid gradient-optimization conformational search. It was originally designed and implemented by Dr. Oleg Trott in the Molecular Graphics Lab, and it is now being maintained and developed by the Forli Lab at The Scripps Research Institute.
+
+* AutoDock4.2 and Vina scoring functions
+* Support of simultaneous docking of multiple ligands and batch mode for virtual screening
+* Support of macrocycle molecules
+* Hydrated docking protocol
+* Can write and load external AutoDock maps
+* Python bindings for Python 3
+
+## Documentation
+
+The installation instructions, documentation and tutorials can be found on [readthedocs.org](https://autodock-vina.readthedocs.io/en/latest/).
+
+## Citations
+* [J. Eberhardt, D. Santos-Martins, A. F. Tillack, and S. Forli. (2021). AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. Journal of Chemical Information and Modeling.](https://pubs.acs.org/doi/10.1021/acs.jcim.1c00203)
+* [O. Trott and A. J. Olson. (2010). AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of computational chemistry, 31(2), 455-461.](https://onlinelibrary.wiley.com/doi/10.1002/jcc.21334)
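
The `docker run` step in the AutoDock-Vina README relies on two bind mounts: `/input` must contain the receptor and ligand files named on the command line, and `/output` must be writable. A minimal pre-flight sketch, assuming the `5wlo` folder really contains `protein.pdbqt` and `rand-1.pdbqt` as the example command implies, could look like the following. `check_vina_inputs` is a hypothetical helper, not something shipped in this commit, and it only tests writability for the calling user; whether UID 1001 inside the container can write there is what the README's `chmod -R a+w` step addresses.

```shell
# Hypothetical helper (not part of the commit): verify the bind-mount
# directories on the host before starting the container.
check_vina_inputs() {
    input_dir="$1"
    output_dir="$2"
    status=0
    # File names are taken from the README's docker run example.
    for f in protein.pdbqt rand-1.pdbqt; do
        if [ ! -f "$input_dir/$f" ]; then
            echo "missing input file: $input_dir/$f" >&2
            status=1
        fi
    done
    # Checks the calling user's write access only, not the container user's.
    if [ ! -d "$output_dir" ] || [ ! -w "$output_dir" ]; then
        echo "output directory missing or not writable: $output_dir" >&2
        status=1
    fi
    return "$status"
}
```

It could be chained in front of the documented command, for example `check_vina_inputs "$INPUT_VINA" "$OUTPUT_VINA" && docker run ...`, so that a missing file fails on the host rather than inside the container.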
@@ -0,0 +1,27 @@
+url="https://zenodo.org/records/4031961/files/data.zip?download=1"
+download_dir="./data_original"
+target_folder="$1"
+if [ ! -d "$download_dir/data" ]; then
+    echo "Downloading data.zip..."
+    mkdir -p "$download_dir"
+    wget -O "$download_dir/data.zip" "$url"
+
+    echo "Unzipping data.zip..."
+    unzip "$download_dir/data.zip" -d "$download_dir"
+    rm -f "$download_dir/data.zip"
+
+    echo "Data downloaded and extracted to $download_dir/data"
+else
+    echo "Data already exists in $download_dir/data. Skipping download and extraction."
+fi
+if [ -d "$target_folder" ]; then
+    echo "The folder '$target_folder' already exists in the current directory. Skipping copy."
+else
+    if [ -d "$download_dir/data/$target_folder" ]; then
+        cp -r "$download_dir/data/$target_folder" ./
+        echo "$target_folder folder successfully copied to the current directory."
+    else
+        echo "$target_folder folder not found inside '$download_dir/data'."
+    fi
+fi
+echo "'$target_folder' folder is now available in the current directory."
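
As committed, this script (apparently the `data_download_script.sh` that the README invokes) prints the final "now available" message even when the requested folder was not found, and it does not check that an argument was given. A stricter copy step might look like the sketch below. `copy_complex` and its return codes are hypothetical, chosen for illustration, and the download and unzip half of the script is left out.

```shell
# Hypothetical stricter variant of the copy step (not the committed script).
copy_complex() {
    data_dir="$1"   # e.g. ./data_original/data
    target="$2"     # e.g. 5wlo
    if [ -z "$target" ]; then
        echo "usage: copy_complex <data_dir> <complex_name>" >&2
        return 2
    fi
    if [ -d "./$target" ]; then
        echo "'$target' already exists in the current directory. Skipping copy."
        return 0
    fi
    if [ ! -d "$data_dir/$target" ]; then
        echo "'$target' not found inside '$data_dir'." >&2
        return 1
    fi
    # Report success only if the copy itself succeeded.
    cp -r "$data_dir/$target" ./ &&
        echo "'$target' copied to the current directory."
}
```

Returning a non-zero status for a missing complex lets a caller stop before the later `docker run` step instead of mounting a non-existent input folder.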

applications/Autodock/Dockerfile

+36
@@ -0,0 +1,36 @@
+FROM condaforge/miniforge3:4.10.2-0
+ENV DEBIAN_FRONTEND=noninteractive
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    vim \
+    git \
+    build-essential \
+    ocl-icd-opencl-dev \
+    clinfo && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+RUN conda install -c conda-forge \
+    python=3.10 \
+    requests=2.28.2 \
+    mkl=2023.1 \
+    dpcpp_linux-64=2023.1 \
+    dpcpp-cpp-rt=2023.1 \
+    mkl-devel=2023.1 && \
+    conda clean --all -f -y
+ENV LD_LIBRARY_PATH="/opt/conda/lib:${LD_LIBRARY_PATH}"
+WORKDIR /opt
+ENV SERVICE_NAME="autodock-service"
+RUN groupadd --gid 1001 $SERVICE_NAME && \
+    useradd -m -g $SERVICE_NAME --shell /bin/false --uid 1001 $SERVICE_NAME && \
+    mkdir -p /opt/AutoDock && \
+    chown -R $SERVICE_NAME:$SERVICE_NAME /opt/AutoDock
+USER $SERVICE_NAME
+WORKDIR /opt/AutoDock
+RUN git clone https://github.com/emascarenhas/AutoDock-GPU.git . && \
+    git checkout v1.4
+RUN make DEVICE=CPU NUMWI=64 && \
+    rm -rf .git build_temp
+ENV PATH="/opt/AutoDock/bin:${PATH}"
+HEALTHCHECK NONE
+WORKDIR /input
+CMD ["autodock_cpu_64wi","--help"]