forked from tadKeys/tabsat

Targeted Amplicon Bisulfite Sequencing Analysis Tool


TABSAT

TABSAT (Targeted Amplicon Bisulfite Sequencing Analysis Tool) analyzes targeted bisulfite sequencing data generated on an Ion Torrent PGM or Illumina MiSeq. It performs:

  • Quality Assessment
  • Alignment using Bismark
  • Result aggregation into a table
  • Visualization as lollipop plots

Available as

  • Fully configured Docker image (Dockerfile); see the Docker section below.
  • Source code

Collaboration

Please contact us if you need help running your analyses. We have also developed an extended version for our collaborators, with the following additional features:

  • Interactive web-based visualization
  • Download FASTA of target regions
  • Strand specific CpGs
  • Automatic mapping of primers
  • Restriction enzyme positions
  • Start using web frontend
  • Pattern visualization and analysis

Publication

TABSAT is published:
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0160227

Example usage

${TABSAT} -l NONDIR -g hg19 -q 20 -m 10 -p 0.8 -r 0 -t target.csv -a tmap -o output_dir input.fastq

-t Target list in CSV format (example) [mandatory]. Strand can be "+", "-", or "+/-".
-e Sequencing library type: SE or PE (PE reads must be named *_1.fastq and *_2.fastq)
-g Genome (hg19, mm10)
-l Library mode of the bisulfite experiment
-a [optional] Aligner to use
-m [optional] Minimum read length; reads shorter than this threshold are filtered out.
-q [optional] Quality threshold for read trimming; bases below it are removed from the 3’ end of the reads.
-p [optional] Percent of the target that a read must cover to be included in pattern analysis.
-r [optional] Minimum number of mapped reads required at each CpG site.
-s [optional] Sorted list of samples that specifies their order in the lollipop plots.
-o Output directory
-d [optional] Directory of input files (absolute path); if not specified, the input files are given at the end of the command line.
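
The paired-end naming rule for -e (reads named *_1.fastq and *_2.fastq) is easy to get wrong, so a quick check before a PE run may help. The following is a minimal sketch, not part of TABSAT: `missing_mates` is a hypothetical helper that prints the name of every expected *_2.fastq file that is absent from a directory.

```shell
# Hypothetical helper (not part of TABSAT): list missing *_2.fastq mates.
# Usage: missing_mates /path/to/input_dir
missing_mates() {
    local dir="$1" r1 r2
    for r1 in "$dir"/*_1.fastq; do
        # Skip the unexpanded glob when the directory has no *_1.fastq files.
        [ -e "$r1" ] || continue
        # Derive the expected mate name by swapping the _1 suffix for _2.
        r2="${r1%_1.fastq}_2.fastq"
        # Print the mate's file name only if it does not exist.
        [ -e "$r2" ] || printf '%s\n' "$(basename "$r2")"
    done
}
```

Empty output means every forward read file has a mate; any printed name is a file that would need to be added or renamed before running in PE mode.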

Examples

Test with input file directory

tabsat -l NONDIR -g hg19 -t target.csv -d test_input_dir -a tmap -o test_output_dir

Test with separate input files

tabsat -l NONDIR -g hg19 -t target.csv -o test_output_files xy.fastq abs.fastq

Test data

Test data is available here

Installation

  • Check out the project (git clone)
  • Download the reference genome
    • Human
      • Broad: ftp://[email protected]/bundle/2.8/hg19/ucsc.hg19.fasta.gz
      • ENSEMBL: ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
      • NCBI: ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37.p13/seqs_for_alignment_pipelines/GCA_000001405.14_GRCh37.p13_no_alt_analysis_set.fna.gz
    • Mouse
  • Put the reference genome file into the correct folder
    • Human
      tabsat/reference/human/hg19/hg19.fasta
    • Mouse
      tabsat/reference/mouse/mm10/mm10.fasta
  • Prepare the reference genome
$ tabsat/reference/prepareReference.sh
  • Prepare the CpG file
apt-get install p7zip-full
7za e tabsat/tools/ait/all_cpgs_only_pos_hg19.7z
7za e tabsat/tools/ait/all_cpgs_only_pos_mm10.7z
  • Install Perl modules
    • Cairo.pm
    • Switch.pm
  • Run the 'install' script in the tabsat folder (installs SAMtools, Bedtools)
./install
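
Before preparing the reference, it can be worth confirming that the genome FASTA sits at the path listed above. The sketch below assumes only the two layouts given in this section (reference/human/hg19/hg19.fasta and reference/mouse/mm10/mm10.fasta); `check_reference` is a hypothetical helper, not a script shipped with TABSAT.

```shell
# Hypothetical helper (not part of TABSAT): verify the reference FASTA location.
# Usage: check_reference <path-to-tabsat-checkout> <hg19|mm10>
check_reference() {
    local root="$1" genome="$2" species fasta
    # Map the genome build to the species folder used in the layout above.
    case "$genome" in
        hg19) species=human ;;
        mm10) species=mouse ;;
        *) echo "unknown genome: $genome" >&2; return 2 ;;
    esac
    fasta="$root/reference/$species/$genome/$genome.fasta"
    # -s is true only for an existing, non-empty file.
    if [ -s "$fasta" ]; then
        echo "found: $fasta"
    else
        echo "missing: $fasta" >&2
        return 1
    fi
}
```

For example, `check_reference tabsat hg19` returns a non-zero status if the human reference has not yet been decompressed and renamed into place.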

Run example

Command line

  • After installation, go to tabsat/tools/zz_test
  • Execute
./test_tabsat.sh
  • Inspect output at tabsat/tabsat_test_output

Docker

  • Build the Docker image
    docker build -t tabsat:v1 .

  • Run it
    docker run -t --name tabsat -d tabsat:v1

  • Open a shell in the running container
    docker exec -ti tabsat /bin/bash

  • Stop container
    docker stop tabsat

  • Remove container
    docker rm tabsat

  • Remove image
    docker rmi tabsat:v1
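
The steps above do not cover moving data into the container. One possible approach is `docker cp`, sketched below under stated assumptions: the container is named tabsat as in the run command above, and /tmp is an assumed destination rather than a path documented by TABSAT. `copy_into_tabsat` is a hypothetical helper; setting DRY_RUN prints the command instead of running it.

```shell
# Hypothetical helper (not part of TABSAT): copy a file into the "tabsat" container.
# Usage: copy_into_tabsat <local-file> [container-dir]
# The default destination /tmp is an assumption, not a TABSAT convention.
copy_into_tabsat() {
    local src="$1" dest="${2:-/tmp}"
    if [ -n "${DRY_RUN:-}" ]; then
        # Dry run: show the command that would be executed.
        echo "docker cp $src tabsat:$dest/"
    else
        docker cp "$src" "tabsat:$dest/"
    fi
}
```

For instance, `DRY_RUN=1 copy_into_tabsat input.fastq` prints the `docker cp` command without touching the container, which is a cheap way to check the paths first.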
