Program website: https://websites.auth.gr/appbio/
Instructor: Grigorios Georgolopoulos, PhD
For help contact: [email protected]
Prior to the class you will need to set up a working directory, install the tools, and download the files we are going to work with. The tutorials folder contains HTML and PDF files for each step. To get started, either follow this README or download the 00_Setup.pdf document and follow its instructions.
Although the exercises here can be run on a local machine, it is highly recommended that you work on the AUTH HPC cluster. More information on the AUTH HPC cluster here: https://hpc.it.auth.gr/
Before setting up, Windows users need to complete a few extra steps for remote coding. If you are a Linux or macOS user skip to section
In order to use SSH (remote host access) to connect to the AUTH computer cluster, you will need either to have Windows Subsystem for Linux (WSL) installed and enabled, or to use an IDE (integrated development environment) such as VSCode (preferable) or MobaXterm.
If you are not connected to an AUTH network (e.g. working from home), make sure you have eduVPN enabled. More info here
Then open a terminal window or your IDE and type the following:
ssh [username]@aristotle.it.auth.gr
git clone https://github.com/ggeorgol/ATACseq_course
cd ATACseq_course
There is an established set of tools required for analyzing high-throughput sequencing data, and ATAC-seq in particular. For this reason we will create a virtual environment using the Anaconda/Miniconda (conda for short) package manager.
Specifically, we are going to need the following tools:
- htslib See SAMtools
- SAMtools The holy grail of HTS data processing. Your trusty hammer. An all-in-one kit for manipulating alignment files (BAM)
- picard Next to SAMtools there is Picard. A set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats.
- deepTools A suite of tools for exploring HTS data. Great for QC and visualization.
- bedTools A Swiss-army knife of tools for a wide range of genomics analysis tasks and genome arithmetic.
- bedops Similar to bedTools, BEDOPS is a fast, highly scalable and easily parallelizable genome analysis toolkit.
- subread A suite of software programs for processing next-gen sequencing read data, with featureCounts being one of the most popular read counters.
The following snippet will take a few minutes to complete
module load gcc miniconda3
source $CONDA_PROFILE/conda.sh
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
conda create -n atac python=3.10 htslib samtools picard deeptools bedtools bedops subread
To activate the environment type the following:
conda activate atac
In this course we are going to work with ATAC-seq data generated by the ENCODE project. We will work with naïve and activated T-cells from a female adult with the following accession numbers: ENCSR977LVI and ENCSR558ZSN. We will use the alignment (BAM) files and the already generated peaks.
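Once the atac environment is active, a quick check like the one below can confirm that the tools listed earlier are on your PATH before you start downloading data. This is only a sketch: the function name check_tools is made up for illustration, and the executable names passed to it (for example bamCoverage for deepTools, featureCounts for subread, bgzip for htslib) are assumed to be representative commands shipped by each package.

```shell
# Report which of the given commands cannot be found on PATH.
# Prints one "missing: <name>" line per absent command and
# returns non-zero if at least one command is missing.
# (check_tools is a hypothetical helper, not part of the course repository.)
check_tools() {
    local rc=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Assumed representative executables, one or more per installed package.
if check_tools samtools picard bamCoverage bedtools bedops featureCounts bgzip; then
    echo "All tools found"
fi
```

If any tool is reported as missing, make sure the environment is activated in the current shell before re-running the check.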
If you work on the AUTH cluster, the data should be stored in your personal scratch space, $SCRATCH. Keep in mind that data in $SCRATCH is kept for only 30 days before the scratch space is cleaned up.
If you work on the cluster, type:
DATADIR=${SCRATCH}/ATACseq_course/data
ln -s $DATADIR data # Make a data shortcut in your working directory
If you work locally, type:
DATADIR=data
Continue:
mkdir -p ${DATADIR}/{ENCSR977LVI,ENCSR558ZSN}
# Download ENCSR558ZSN dataset
# BAM files
wget -P ${DATADIR}/ENCSR558ZSN https://www.encodeproject.org/files/ENCFF287DFF/@@download/ENCFF287DFF.bam
wget -P ${DATADIR}/ENCSR558ZSN https://www.encodeproject.org/files/ENCFF218OSF/@@download/ENCFF218OSF.bam
# Peaks
wget -P ${DATADIR}/ENCSR558ZSN https://www.encodeproject.org/files/ENCFF002MKC/@@download/ENCFF002MKC.bed.gz
wget -P ${DATADIR}/ENCSR558ZSN https://www.encodeproject.org/files/ENCFF235RAD/@@download/ENCFF235RAD.bed.gz
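Large downloads occasionally fail partway, so it may be worth checking that every expected file exists and is non-empty once wget has finished. The sketch below assumes DATADIR is set as above (falling back to data if it is not) and uses a hypothetical helper named check_downloads; the file names are the ENCSR558ZSN files from the download commands.

```shell
# List expected files that are missing or empty under a directory.
# Usage: check_downloads DIR FILE...
# Returns non-zero if at least one file is missing or has zero size.
# (check_downloads is a hypothetical helper, not part of the course repository.)
check_downloads() {
    local dir=$1 rc=0 f
    shift
    for f in "$@"; do
        # -s is true only if the file exists and its size is greater than zero
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            rc=1
        fi
    done
    return "$rc"
}

# Assumption: fall back to the local "data" directory when DATADIR is unset.
DATADIR=${DATADIR:-data}
if check_downloads "${DATADIR}/ENCSR558ZSN" \
    ENCFF287DFF.bam ENCFF218OSF.bam ENCFF002MKC.bed.gz ENCFF235RAD.bed.gz; then
    echo "All ENCSR558ZSN files present"
fi
```

This only tests for presence and size. With the atac environment active, running samtools quickcheck on the BAM files can additionally flag truncated alignments.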