Merge pull request #57 from yyoshiaki/developmentv1.2.3
Developmentv1.2.3
yyoshiaki authored Jul 2, 2020
2 parents 058b9fa + 300ece4 commit a91cf5a
Showing 5 changed files with 54 additions and 33 deletions.
15 changes: 9 additions & 6 deletions README.md
@@ -1,12 +1,14 @@
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3352573.svg)](https://doi.org/10.5281/zenodo.3352573)

# ikra v1.2.2 -RNAseq pipeline centered on Salmon-<img src="img/ikra.png" width="20%" align="right" />
# ikra v1.2.3 -RNAseq pipeline centered on Salmon-<img src="img/ikra.png" width="20%" align="right" />

A gene expression table (gene × sample) is automatically created from the experiment matrix. The output can be used as an input of [idep](http://bioinformatics.sdstate.edu/idep/). Ikra is an RNAseq pipeline centered on [salmon](https://combine-lab.github.io/salmon/).


## [Japanese documentation is here](./README_ja.md)

## Note: sra-tools must be installed locally because of an upgrade to NCBI's tools. Please install sra-tools (>=2.10.7).

## Usage

```
@@ -23,7 +25,8 @@ Options:
--fastq use fastq files instead of SRRid. The extension must be foo.fastq.gz (default : False)
-u, --udocker
-w, --without-docker
-pc, --protein-coding use protein coding transcripts instead of comprehensive transcripts.
-pc, --protein-coding use protein coding transcripts instead of comprehensive transcripts. (default : True)
-ct, --comprehensive-transcripts use comprehensive transcripts instead of protein coding transcripts. (default : False)
-t, --threads
-o, --output output file. (default : output.tsv)
-l, --log log file. (default : ikra.log)
@@ -44,14 +47,14 @@ Options:

**SRR mode**

| name | SRR | Layout | condition1 | ... |
| name | SRR | Layout | condition1 (optional) | ... |
| ---- | ---- | - | - | - |
| Treg_LN_1 | SRR5385247 | SE | Treg | ...|
| Treg_LN_2 | SRR5385248 | SE | Treg | ... |

**fastq mode**

| name | fastq(PREFIX) | Layout | condition1 | ... |
| name | fastq(PREFIX) | Layout | condition1 (optional) | ... |
| ---- | ---- | - | - | - |
| Treg_LN_1 | hoge/SRR5385247 | SE | Treg | ...|
| Treg_LN_2 | hoge/SRR5385248 | SE | Treg | ... |
@@ -117,13 +120,13 @@ $ git pull origin master
```bash
$ bash ikra.sh --version
...
ikra v1.2.2 -RNAseq pipeline centered on Salmon-
ikra v1.2.3 -RNAseq pipeline centered on Salmon-
...
```

### Version of each tool

- sra-tools : 2.10.0
- sra-tools : >= 2.10.7
- FastQC 0.11.5
- MultiQC : 1.4
- Trim Galore! : 0.6.3
9 changes: 5 additions & 4 deletions README_ja.md
@@ -1,6 +1,6 @@
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3352573.svg)](https://doi.org/10.5281/zenodo.3352573)

# ikra v1.2.2 -RNAseq pipeline centered on Salmon-<img src="img/ikra.png" width="20%" align="right" />
# ikra v1.2.3 -RNAseq pipeline centered on Salmon-<img src="img/ikra.png" width="20%" align="right" />

Automatically creates an expression table (gene × sample) from the experiment matrix for use as input to [idep](http://bioinformatics.sdstate.edu/idep/). Uses salmon.

@@ -20,7 +20,8 @@ Options:
--fastq use fastq files instead of SRRid. The extension must be foo.fastq.gz (default : False)
-u, --udocker
-w, --without-docker
-pc, --protein-coding use protein coding transcripts instead of comprehensive transcripts.
-pc, --protein-coding use protein coding transcripts instead of comprehensive transcripts. (default : True)
-ct, --comprehensive-transcripts use comprehensive transcripts instead of protein coding transcripts. (default : False)
-t, --threads
-o, --output output file. (default : output.tsv)
-l, --log log file. (default : ikra.log)
@@ -42,14 +43,14 @@ The experiment matrix is comma-separated (CSV format).

**SRR mode**

| name | SRR | Layout | condition1 | ... |
| name | SRR | Layout | condition1 (optional) | ... |
| ---- | ---- | - | - | - |
| Treg_LN_1 | SRR5385247 | SE | Treg | ...|
| Treg_LN_2 | SRR5385248 | SE | Treg | ... |

**fastq mode**

| name | fastq(PREFIX) | Layout | condition1 | ... |
| name | fastq(PREFIX) | Layout | condition1 (optional) | ... |
| ---- | ---- | - | - | - |
| Treg_LN_1 | hoge/SRR5385247 | SE | Treg | ...|
| Treg_LN_2 | hoge/SRR5385248 | SE | Treg | ... |
45 changes: 31 additions & 14 deletions ikra.sh
@@ -13,7 +13,7 @@ set -xe

PROGNAME="$( basename $0 )"

VERSION="v1.2.2"
VERSION="v1.2.3"

cat << "EOF"
__
@@ -41,7 +41,8 @@ Options:
--fastq use fastq files instead of SRRid. The extension must be foo.fastq.gz (default : False)
-u, --udocker
-w, --without-docker
-pc, --protein-coding use protein coding transcripts instead of comprehensive transcripts.
-pc, --protein-coding use protein coding transcripts instead of comprehensive transcripts. (default : True)
-ct, --comprehensive-transcripts use comprehensive transcripts instead of protein coding transcripts. (default : False)
-t, --threads
-o, --output output file. (default : output.tsv)
-l, --log log file. (default : ikra.log)
@@ -74,7 +75,7 @@ DOCKER=docker
THREADS=1
IF_TEST=false
IF_FASTQ=false
IF_PC=false
IF_PC=true
SUFFIX_PE_1=_1.fastq.gz
SUFFIX_PE_2=_2.fastq.gz
OUTPUT_FILE=output.tsv
@@ -95,6 +96,9 @@ for opt in "$@"; do
'-pc'|'--protein-coding' )
IF_PC=true; shift
;;
'-ct'|'--comprehensive-transcripts' )
IF_PC=false; shift
;;
'-u'|'--udocker' )
DOCKER=udocker; shift
;;
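
A minimal sketch of how the two transcript flags could toggle a single variable, assuming `-ct` is intended to switch protein-coding mode off and that the last flag given wins. `parse_transcript_mode` is a hypothetical helper for illustration and is not part of ikra.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the resulting IF_PC value for a list of flags.
parse_transcript_mode() {
    local if_pc=true   # protein coding is the documented default in v1.2.3
    local opt
    for opt in "$@"; do
        case "$opt" in
            -pc|--protein-coding)            if_pc=true ;;
            -ct|--comprehensive-transcripts) if_pc=false ;;  # assumption: -ct disables -pc
        esac
    done
    echo "$if_pc"
}

parse_transcript_mode        # prints: true
parse_transcript_mode -ct    # prints: false
```

Keeping one variable for both flags means later code only has to test `IF_PC` once.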
@@ -249,7 +253,7 @@ else
fi

COWSAY=cowsay
PREFETCH=prefetch
# PREFETCH=prefetch
FASTQ_DUMP=fastq-dump
FASTERQ_DUMP=fasterq-dump
FASTQC=fastqc
@@ -279,7 +283,9 @@ if [[ "$RUNINDOCKER" -eq "1" ]]; then
# chmod 777 .

COWSAY_IMAGE=docker/whalesay
SRA_TOOLKIT_IMAGE=quay.io/biocontainers/sra-tools:2.10.0--pl526he1b5a44_0
# quay.io/biocontainers/sra-tools:2.10.7--pl526haddd2b5_1 had an error.
# the earlier version may stop during the download.
SRA_TOOLKIT_IMAGE=quay.io/biocontainers/sra-tools:2.10.7--pl526haddd2b5_0
FASTQC_IMAGE=biocontainers/fastqc:v0.11.5_cv2
MULTIQC_IMAGE=maxulysse/multiqc:2.0.0
# TRIMMOMATIC_IMAGE=fjukstad/trimmomatic
@@ -302,10 +308,13 @@ if [[ "$RUNINDOCKER" -eq "1" ]]; then
$DOCKER pull $PIGZ_IMAGE

COWSAY="$DRUN $COWSAY_IMAGE $COWSAY"
PREFETCH="$DRUN -v $PWD:/root/ncbi/public/sra $SRA_TOOLKIT_IMAGE $PREFETCH"
FASTQ_DUMP="$DRUN $SRA_TOOLKIT_IMAGE $FASTQ_DUMP"
FASTERQ_DUMP="$DRUN $SRA_TOOLKIT_IMAGE $FASTERQ_DUMP"
FASTQC="$DRUN $FASTQC_IMAGE $FASTQC"
# PREFETCH="$DRUN -v $PWD:/root/ncbi/public/sra $SRA_TOOLKIT_IMAGE $PREFETCH"
# FASTQ_DUMP="$DRUN $SRA_TOOLKIT_IMAGE $FASTQ_DUMP"
FASTQ_DUMP="$FASTQ_DUMP"
# FASTERQ_DUMP="$DRUN $SRA_TOOLKIT_IMAGE $FASTERQ_DUMP"
# FASTQC="$DRUN $FASTQC_IMAGE $FASTQC"
FASTQ_DUMP="$FASTQ_DUMP"
FASTERQ_DUMP="$FASTERQ_DUMP"
MULTIQC="$DRUN $MULTIQC_IMAGE $MULTIQC"
# TRIMMOMATIC="$DRUN $TRIMMOMATIC_IMAGE $TRIMMOMATIC"
# TRIMMOMATIC="$DRUN $TRIMMOMATIC_IMAGE " # because of the entrypoint of fjukstad/trimmomatic
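
Because this hunk points `FASTQ_DUMP` and `FASTERQ_DUMP` at the locally installed binaries, a pre-flight version check may be useful. The sketch below assumes GNU `sort -V` is available and that `fastq-dump --version` prints a line containing the version number. `version_ge` and `MIN_SRA` are hypothetical names and do not appear in ikra.sh.

```bash
#!/usr/bin/env bash
MIN_SRA="2.10.7"   # minimum version stated in the README

# Succeeds when version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v fastq-dump >/dev/null 2>&1; then
    # Take the first dotted number found in the version output.
    installed="$(fastq-dump --version 2>/dev/null | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1)"
    if version_ge "${installed:-0}" "$MIN_SRA"; then
        echo "sra-tools ${installed} is new enough"
    else
        echo "sra-tools ${installed:-unknown} is older than ${MIN_SRA}" >&2
    fi
else
    echo "fastq-dump not found on PATH" >&2
fi
```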
@@ -386,7 +395,7 @@ EOF

if [ $IF_FASTQ = false ]; then
# fastq_dump
for i in `tail -n +2 $EX_MATRIX_FILE`
for i in `tail -n +2 $EX_MATRIX_FILE | tr -d '\r'`
do
name=`echo $i | cut -d, -f1`
SRR=`echo $i | cut -d, -f2`
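
The added `tr -d '\r'` guards against experiment matrices saved with Windows (CRLF) line endings, where a carriage return would otherwise stay attached to the last field of each row. A small demonstration, using a hypothetical two-line CSV written to a temporary file:

```bash
#!/usr/bin/env bash
tmp="$(mktemp)"
printf 'name,SRR,Layout\r\nTreg_LN_1,SRR5385247,SE\r\n' > "$tmp"

# Without stripping, the last field keeps the carriage return ("SE\r").
raw_layout="$(tail -n +2 "$tmp" | cut -d, -f3)"
# With stripping, the field is the plain string "SE".
clean_layout="$(tail -n +2 "$tmp" | tr -d '\r' | cut -d, -f3)"

printf 'raw length:   %s\n' "${#raw_layout}"     # prints: raw length:   3
printf 'clean length: %s\n' "${#clean_layout}"   # prints: clean length: 2
rm -f "$tmp"
```

A string comparison such as `[ "$layout" = "SE" ]` would fail on the raw value, which is why the stripping happens before the fields are cut.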
@@ -450,8 +459,16 @@ if [[ ! -f "multiqc_report_raw_reads.html" ]]; then
$MULTIQC -n multiqc_report_raw_reads.html .
fi

# determine threads for trim galore.
# the sweet spot for TG is 4
if [ $THREADS -gt 4 ] ; then
THREADS_TRIMGALORE=4
else
THREADS_TRIMGALORE=$THREADS
fi
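
The same cap can be expressed as a small reusable function. This is a sketch only; `cap_threads` is a hypothetical helper and the default maximum of 4 is taken from the comment above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: cap a requested thread count at a maximum (default 4).
cap_threads() {
    local requested="$1" max="${2:-4}"
    if [ "$requested" -gt "$max" ]; then
        echo "$max"
    else
        echo "$requested"
    fi
}

THREADS=16
THREADS_TRIMGALORE="$(cap_threads "$THREADS")"
echo "trim_galore --cores ${THREADS_TRIMGALORE}"   # prints: trim_galore --cores 4
```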


for i in `tail -n +2 $EX_MATRIX_FILE`
for i in `tail -n +2 $EX_MATRIX_FILE | tr -d '\r'`
do
if [ $IF_FASTQ = false ]; then
# fasterq_dump
@@ -489,7 +506,7 @@ do
fi

if [[ ! -f "${dirname_fq}${SRR}_trimmed.fq.gz" ]]; then
$TRIMGALORE ${dirname_fq}${SRR}.fastq.gz
$TRIMGALORE --cores ${THREADS_TRIMGALORE} ${dirname_fq}${SRR}.fastq.gz
fi

# fastqc
@@ -501,7 +518,7 @@ do
else
# trimmomatic
if [[ ! -f "${dirname_fq}${SRR}_1_val_1.fq.gz" ]]; then
$TRIMGALORE --paired ${dirname_fq}${SRR}${SUFFIX_PE_1} ${dirname_fq}${SRR}${SUFFIX_PE_2}
$TRIMGALORE --cores ${THREADS_TRIMGALORE} --paired ${dirname_fq}${SRR}${SUFFIX_PE_1} ${dirname_fq}${SRR}${SUFFIX_PE_2}
fi

# fastqc
@@ -527,7 +544,7 @@ if [[ ! -d "$SALMON_INDEX" ]]; then
$SALMON index --threads $THREADS --transcripts $REF_TRANSCRIPT --index $SALMON_INDEX --type quasi -k 31 --gencode
fi

for i in `tail -n +2 $EX_MATRIX_FILE`
for i in `tail -n +2 $EX_MATRIX_FILE | tr -d '\r'`
do
if [ $IF_FASTQ = false ]; then
# fasterq_dump
Binary file modified test/.DS_Store
Binary file not shown.
18 changes: 9 additions & 9 deletions test/Illumina_SE/Illumina_SE_SRR.csv
@@ -1,9 +1,9 @@
name,SRR,LayoutFoxp3,CNS2
Foxp3high_CNS2KO_1,SRR1269713,SE,Foxp3high,CNS2KO
Foxp3low_CNS2KO_1,SRR1269714,SE,Foxp3low,CNS2KO
Foxp3high_WT_1,SRR1269715,SE,Foxp3high,WT
Foxp3low_WT_1,SRR1269716,SE,Foxp3low,WT
Foxp3high_WT_2,SRR1269717,SE,Foxp3high,WT
Foxp3low_WT_2,SRR1269718,SE,Foxp3low,WT
Foxp3low_CNS2KO_2,SRR1269712,SE,Foxp3low,CNS2KO
Foxp3high_CNS2KO_2,SRR1269711,SE,Foxp3high,CNS2KO
name,SRR,Layout
Foxp3high_CNS2KO_1,SRR1269713,SE
Foxp3low_CNS2KO_1,SRR1269714,SE
Foxp3high_WT_1,SRR1269715,SE
Foxp3low_WT_1,SRR1269716,SE
Foxp3high_WT_2,SRR1269717,SE
Foxp3low_WT_2,SRR1269718,SE
Foxp3low_CNS2KO_2,SRR1269712,SE
Foxp3high_CNS2KO_2,SRR1269711,SE
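
The updated test matrix keeps only the three required columns, in line with the README change that marks condition columns as optional. A sketch of a header check for SRR mode, assuming the first three columns must be exactly `name`, `SRR`, and `Layout`. `check_matrix_header` is a hypothetical helper and is not part of ikra.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed when the header starts with name,SRR,Layout;
# any further (condition) columns are treated as optional.
check_matrix_header() {
    local header
    header="$(head -n 1 "$1" | tr -d '\r')"
    case "$header" in
        name,SRR,Layout|name,SRR,Layout,*) return 0 ;;
        *) echo "unexpected header: $header" >&2; return 1 ;;
    esac
}

matrix="$(mktemp)"
printf 'name,SRR,Layout\nFoxp3high_WT_1,SRR1269715,SE\n' > "$matrix"
if check_matrix_header "$matrix"; then
    echo "header ok"
fi
rm -f "$matrix"
```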
