2.3. Automated Pipeline Starting with raw counts

Breon Schmidt edited this page Jan 21, 2021 · 3 revisions

CURRENTLY IN PROGRESS

Got some raw counts files from htseq-count, featureCounts, or STAR? Counts can be generated in many different ways: which features are counted, whether reads were aligned to hg19 or hg38, how genes are named, which genes are present, and so on. For that reason it is preferable to use either the FASTQ/FASTA or BAM workflows.

But sometimes you just have some old hg19-aligned counts files kicking around, or perhaps they're hg38 and you just want to try it anyway. Provided ALLSorts receives all 20625 required genes and they were counted from hg19-aligned files, it will probably work OK. Outside of that, your results might be a bit sketchy, though you might still find them useful, so why not?

FOR AN ALREADY GENERATED COUNTS MATRIX, GO TO THE MANUAL EXECUTION

Before We Begin

ALLSorts has been installed

Just follow the installation instructions at https://github.com/Oshlack/ALLSorts/wiki.

Counts must be in this format

Ideally the counts will reflect the annotation in ftp://ftp.ensembl.org/pub/grch37/current/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.chr.gtf.gz. Don't worry about the Ensembl naming; the ALLSorts pipeline converts Ensembl IDs into gene names.
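Since ALLSorts expects all 20625 required genes, a quick sanity check before running anything is to count the Ensembl gene rows in each counts file. The sketch below assumes htseq-count style output (tab-separated, gene ID in the first column, summary rows such as `__no_feature` at the end); other counters lay their files out differently, so adjust accordingly. The function name `count_ensembl_genes` and the demo file are made up for illustration and are not part of ALLSorts.

```shell
# Count rows whose first column looks like an Ensembl gene ID (ENSG...).
# Summary rows such as "__no_feature" are not counted.
count_ensembl_genes() {
    awk -F '\t' '$1 ~ /^ENSG[0-9]+/ { n++ } END { print n + 0 }' "$1"
}

# Tiny demo file standing in for a real counts file (hypothetical values).
demo=$(mktemp)
printf 'ENSG00000000003\t12\nENSG00000000005\t0\n__no_feature\t345\n' > "$demo"

count_ensembl_genes "$demo"   # two gene rows, one summary row ignored
rm -f "$demo"
```

On a real file, e.g. `count_ensembl_genes /path/to/counts/sample1.txt`, a number below 20625 means some required genes are certainly missing. A larger number does not prove that the right genes are present; it only rules out the obvious failure.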

Running ALLSorts starting with raw counts files

Ok, ALLSorts has been installed? Raw counts in the correct format?

ALLSorts can be run with the following command (see the parameter descriptions below):

bpipe -p results=$results -p strand=$strand -p type=$type $COUNTSDIR/counts.groovy $counts

Parameters

Feel free to make these environment variables (I tend to) or just directly insert them into the command line snippet above.

$results = /path/to/desired/output

$type = "counts"

$strand = "yes", "no" or "reverse". "no" and "reverse" will be the two most commonly used ("no" for unstranded data, "reverse" for typical stranded data).

$COUNTSDIR/counts.groovy should be the path /your/allsorts/clone/path/tools/counts/counts.groovy

$counts - the path to your counts files. Can be something as simple as /path/to/counts/*.txt.
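Putting the parameters together, a minimal sketch of a wrapper script might look like this (POSIX sh or bash). Every path is a placeholder copied from the descriptions above, `strand="no"` is just an example choice, and the helper name `allsorts_counts_cmd` is invented here. The script prints the assembled command for review and only launches it if `bpipe` and the pipeline file can actually be found.

```shell
# Placeholder values taken from the parameter list; replace with your own paths.
results="/path/to/desired/output"
type="counts"
strand="no"        # "yes", "no" or "reverse"
COUNTSDIR="/your/allsorts/clone/path/tools/counts"
counts="/path/to/counts/*.txt"

# Print the full command for review.
# $counts is deliberately unquoted so the shell expands the glob into
# individual files (if nothing matches, the pattern is printed unchanged).
allsorts_counts_cmd() {
    echo bpipe -p results="$results" -p strand="$strand" -p type="$type" \
        "$COUNTSDIR/counts.groovy" $counts
}

allsorts_counts_cmd

# Only launch the pipeline when bpipe and the pipeline file are available.
if command -v bpipe >/dev/null 2>&1 && [ -f "$COUNTSDIR/counts.groovy" ]; then
    bpipe -p results="$results" -p strand="$strand" -p type="$type" \
        "$COUNTSDIR/counts.groovy" $counts
fi
```

Leaving `$counts` unquoted is what lets a pattern like `/path/to/counts/*.txt` turn into one argument per file; the trade-off is that paths containing spaces will be split, so keep the counts directory free of them.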

Still in testing

We're still testing this functionality, so please report any issues!