SAW: Stereo-seq Analysis Workflow

Workflow for analyzing Stereo-seq transcriptomic data. Stereo-seq Analysis Workflow (SAW) software suite is a set of pipelines bundled to map sequenced reads to their spatial location on the tissue section, quantify the corresponding gene expression levels and visually present spatial gene expression distribution.

DockerHub Link: https://hub.docker.com/r/stomics/saw/tags

Introduction

SAW processes the sequencing data of Stereo-seq to generate spatial gene expression matrices, and the users could take these files as the starting point to perform downstream analysis. SAW includes thirteen essential and suggest pipelines and auxiliary tools for supporting other handy functions.

System Requirements

Hardware

Stereo-seq Analysis Workflow (SAW) should be run on a Linux system that meets the following requirements:

8-core Intel or AMD processor (24 cores recommended)
128GB RAM (256GB recommended)
1TB free disk space
64-bit CentOS/RedHat 7.8 or Ubuntu 20.04

Software

Singularity: a container platform
SAW in the Singularity Image File (SIF) format
ImageQC version greater than v1.1.0

Quick installation of Singularity

## On Red Hat Enterprise Linux or CentOS install the following dependencies:
$ sudo yum update -y && \
     sudo yum groupinstall -y 'Development Tools' && \
     sudo yum install -y \
     openssl-devel \
     libuuid-devel \
     libseccomp-devel \
     wget \
     squashfs-tools \
     cryptsetup

## On Ubuntu or Debian install the following dependencies:
$ sudo apt-get update && sudo apt-get install -y \
    build-essential \
    uuid-dev \
    libgpgme-dev \
    squashfs-tools \
    libseccomp-dev \
    wget \
    pkg-config \
    git \
    cryptsetup-bin

## Install Go
$ export VERSION=1.14.12 OS=linux ARCH=amd64 && \
    wget https://dl.google.com/go/go$VERSION.$OS-$ARCH.tar.gz && \
    sudo tar -C /usr/local -xzvf go$VERSION.$OS-$ARCH.tar.gz && \
    rm go$VERSION.$OS-$ARCH.tar.gz

$ echo 'export GOPATH=${HOME}/go' >> ~/.bashrc && \
    echo 'export PATH=/usr/local/go/bin:${PATH}:${GOPATH}/bin' >> ~/.bashrc && \
    source ~/.bashrc

## Install singularity on CentOS without compile
$ yum install -y singularity

For additional help or support, please visit https://sylabs.io/guides/3.8/admin-guide/installation.html

Quick download SAW from DockerHub

Currently, the latest version of SAW is v5.4.0. You can download SAW by running the following command:

singularity build SAW_v5.4.0.sif docker://stomics/saw:05.4.0

Preparation

Build index for reference genome

A genome index has to be constructed before performing data mapping. The index files are used as reference when aligning reads. You can prepare the indexed reference before run SAW as follow:

singularity exec <SAW_version.sif> mapping --runMode genomeGenerate \
    --genomeDir reference/STAR_SJ100 \
    --genomeFastaFiles reference/genome.fa \
    --sjdbGTFfile reference/genes.gtf \
    --sjdbOverhang 99 \
    --runThreadN 12

For more information, refer to "script/pre_buildIndexedRef"

Get Stereo-seq Chip T mask file

If you want to access mask file (.h5/.bin) for your own data, please contact your salesmen or your team project administrator.
To access mask file for published paper, please go to CNGBdb > STOmicsDB > Collections.

RUN

The RUN examples and bash script for v5.1.3 are coming very soon.

Usage

# for saw_v5.4.0 & saw_v5.1.3
usage: sh <stereoPipeline.sh> -genomeSize -splitCount -maskFile -fq1 -fq2 -refIndex -genomeFile -speciesName -tissueType -annotationFile -outDir -imageRecordFile -imageCompressedFile -doCellBin -threads -sif
    -genomeSize : file size of the genome.fa (GiB) 
    -splitCount : count of splited stereochip mask file, usually 16 for SE+Q4 fq data and 1 for PE+Q40 fq data
    -maskFile : stereochip mask file
    -fq1 : fastq file path of read1, if there are more than one fastq file, please separate them with comma, e.g:lane1_read_1.fq.gz,lane2_read_1.fq.gz
    -fq2 : fastq file path of read2, if there are more than one fastq file, please separate them with comma, not requested for 'SE+Q4' fastq data, e.g:lane1_read_2.fq.gz,lane2_read_2.fq.gz    
    -refIndex : reference genome indexed folder, please build IT before SAW analysis run
    -speciesName : specie of the sample
    -tissueType : tissue type of the sample
    -annotationsFile :  annotations file in gff or gtf format, the file must contain gene and exon annotations
    -outDir : output directory path
    -imageRecordFile : image file(*.ipr) generated by ImageQC software, not requested
    -imageCompressedFile : image file(*.tar.gz) generated by ImageQC software, not requested 
    -doCellBin : [Y/N]
    -threads : the number of threads to be used in running the pipeline
    -sif : the file format of the visual software

# 1GiB=1024M=10241024KB=10241024*1024B
# SAW version : saw_v5.4.0 & saw_v5.1.3

# for saw_beta_v4.1.0 & saw_v4.1.0
usage: sh <stereoRun.sh> -m maskFile -1 read1 -2 read2 -g indexedGenome -a annotationFile -o outDir -i image -t threads -s visualSif -c genomeSize
    -m stereochip mask file
    -1 fastq file path of read1, if there are more than one fastq file, please separate them with comma, e.g:lane1_read_1.fq.gz,lane2_read_1.fq.gz
    -2 fastq file path of read2, if there are more than one fastq file, please separate them with comma, e.g:lane1_read_2.fq.gz,lane2_read_2.fq.gz
    -g genome that has been indexed by star
    -a annotation file in gff or gtf format, the file must contain gene and exon annotation, and also the transript annotation
    -o output directory path
    -i image directory path, must contains SN*.tar.gz and SN*.json file generated by ImageQC software, not required
    -t thread number will be used to run this pipeline
    -s docker image that packed analysis softwares
    -c genome fasta file size (GiB, Gibibyte)

# 1GiB=1024M=10241024KB=10241024*1024B
# SAW version : saw_beta_v4.1.0 & saw_v4.1.0

Example: Running the entire workflow

For SAW_v4, please use the stereoRun_singleLane_v4.1.0.sh or stereoRun_multiLane_v4.1.0.sh to run whole workflow.

For SAW_v5, please use the stereoPipeline.sh to run whole workflow.

Run stereoPipeline.sh bash script

# for saw_v5.4.0 & saw_v5.1.3
cd <Your Working Directory>

ulimit -n 10240
ulimit -v 33170449147
export NUMBA_CACHE_DIR=<Your Working Directory>

dataDir=<Your Working Directory>/rawData
outDir=<Your Working Directory>/resultresult

export SINGULARITY_BIND=$dataDir,$outDir
bash stereoPipeline.sh \
    -genomeSize 5 \
    -splitCount 1 \
    -maskFile $dataDir/mask/SN.h5 \
    -fq1 $dataDir/reads/lane1_read_1.fq.gz,...,$dataDir/reads/laneN_read_1.fq.gz  \
    -fq2 $dataDir/reads/lane1_read_2.fq.gz,...,$dataDir/reads/laneN_read_2.fq.gz \ # [optional] when the sequenced data is in PE format
    -speciesName <speciesName> \
    -tissueType <tissueName> \
    -refIndex $dataDir/reference/STAR_SJ100 \
    -annotationFile $dataDir/reference/genes.gtf \
    -imageRecordFile $dataDir/SN/image_dir_path/<imageQC result>.ipr \ # [optional] when image is given and has passed QC
    -imageCompressedFile $dataDir/SN/image_dir_path/<imageQC result>.tar.gz \ # [optional] when image is given and has passed QC
    -sif $dataDir/SAW/SAW_version.sif \
    -threads 16 \
    -doCellBin Y \ # [optional] when you want to do the cell segmentation and get cell gene expression data
    -outDir $outDir/result

Run stereoRun_singleLane_v4.1.0.sh bash script

If only one lane sequencing data was given, run the stereoRun_singleLane_v4.1.0.sh bash script as the following:

# for saw_beta_v4.1.0 & saw_v4.1.0
ulimit -n 10240 
dataDir=/Full/Path/Of/Input/File 
outDir=/Full/Path/Of/Output/File 
export SINGULARITY_BIND=$dataDir,$outDir
bash stereoRun_singleLane.sh \
    -m $dataDir/mask/SN.h5 \
    -1 $dataDir/reads/lane1_read_1.fq.gz \
    -2 $dataDir/reads/lane1_read_2.fq.gz \
    -g $dataDir/reference/STAR_SJ100 \
    -a $dataDir/reference/genes.gtf \
    -s $dataDir/SAW/SAW_version.sif \
    -c genome_size \ # (GiB, Gibibyte)
    -i $dataDir/SN/image_dir_path \ # [optional] when image is given and has passed QC
    -o $outDir/result
    
# 1GiB=1024M=10241024KB=10241024*1024B
# SAW version : v4.1.0

Run stereoRun_multiLane_v4.1.0.sh bash script

If more than one lane sequencing data was given, run the stereoRun_multiLane_v4.1.0.sh script as the following:

# for saw_beta_v4.1.0 & saw_v4.1.0
ulimit -n 10240 
dataDir=/Full/Path/Of/Input/File 
outDir=/Full/Path/Of/Output/File 
export SINGULARITY_BIND=$dataDir,$outDir
bash stereoRun_multiLane.sh \
    -m $dataDir/mask/SN.h5 \
    -1 $dataDir/reads/lane1_read_1.fq.gz,$dataDir/reads/lane2_read_1.fq.gz \
    -2 $dataDir/reads/lane1_read_2.fq.gz,$dataDir/reads/lane2_read_2.fq.gz \
    -g $dataDir/reference/STAR_SJ100 \
    -a $dataDir/reference/genes.gtf \
    -s $dataDir/SAW/SAW_version.sif \
    -c genome_size \ # (GiB, Gibibyte)
    -i $dataDir/SN/image_dir_path \ # [option] when tissue image was given
    -o $outDir/result
    

# 1GiB=1024M=10241024KB=10241024*1024B
# SAW version : v4.1.0

Name		Name	Last commit message	Last commit date
Latest commit History 196 Commits
Documents		Documents
script		script
test_data		test_data
README.md		README.md
SAW_v4.1.0_workflow.jpg		SAW_v4.1.0_workflow.jpg
SAW_v5.1.3_workflow.jpg		SAW_v5.1.3_workflow.jpg
SAW_v5.4.0_workflow.jpg		SAW_v5.4.0_workflow.jpg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SAW: Stereo-seq Analysis Workflow

Introduction

System Requirements

Hardware

Software

Quick installation of Singularity

Quick download SAW from DockerHub

Preparation

Build index for reference genome

Get Stereo-seq Chip T mask file

RUN

Usage

Example: Running the entire workflow

Run stereoPipeline.sh bash script

Run stereoRun_singleLane_v4.1.0.sh bash script

Run stereoRun_multiLane_v4.1.0.sh bash script

About

Uh oh!

Releases

Packages

Languages

James-lin9/SAW

Folders and files

Latest commit

History

Repository files navigation

SAW: Stereo-seq Analysis Workflow

Introduction

System Requirements

Hardware

Software

Quick installation of Singularity

Quick download SAW from DockerHub

Preparation

Build index for reference genome

Get Stereo-seq Chip T mask file

RUN

Usage

Example: Running the entire workflow

Run stereoPipeline.sh bash script

Run stereoRun_singleLane_v4.1.0.sh bash script

Run stereoRun_multiLane_v4.1.0.sh bash script

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages