TABSAT (Targeted Amplicon Bisulfite Sequencing Analysis Tool) analyzes targeted bisulfite sequencing data generated on an Ion Torrent PGM or Illumina MiSeq. It performs:
- Quality Assessment
- Alignment using Bismark
- Result aggregation into a table
- Visualization as lollipop plots
TABSAT is available as:
- Fully configured Docker image (Dockerfile); see the usage information below.
- Source code
Please contact us if you need help running your analyses. We have also developed an extended version for our collaborators with the following additional features:
- Interactive web-based visualization
- Download FASTA of target regions
- Strand specific CpGs
- Automatic mapping of primers
- Restriction enzyme positions
- Start using web frontend
- Pattern visualization and analysis
TABSAT is published:
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0160227
${TABSAT} -l NONDIR -g hg19 -q 20 -m 10 -p 0.8 -r 0 -t target.csv -a tmap -o output_dir input.fastq
-t Target list in CSV format [mandatory]. Strand can be "+", "-", or "+/-".
-e Sequencing library: SE or PE (PE reads must be named *_1.fastq, *_2.fastq)
-g Genome (hg19, mm10)
-l Library mode of the bisulfite experiment
-a [optional] Aligner to use
-m [optional] Minimum read length. Reads shorter than this threshold are filtered out.
-q [optional] Quality threshold. Bases below this threshold are trimmed from the 3’ end of the reads.
-p [optional] Percent of the target that a read must cover to be included in pattern creation and analysis.
-r [optional] Minimum number of mapped reads that must be present at each CpG site.
-s [optional] Sorted list of samples that specifies their order in the lollipop plots.
-o Output directory
-d [optional] Directory of input files (absolute path). If not specified, the input files are listed at the end of the command.
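Because paired-end input depends on the `*_1.fastq` / `*_2.fastq` naming rule, it can help to check a directory before starting a run. The following is a minimal sketch, not part of TABSAT: the function name `check_pe_pairs` is hypothetical, and it only checks that every `*_1.fastq` file has a matching `*_2.fastq` file in the same directory.

```shell
# Hypothetical helper (not shipped with TABSAT): report *_1.fastq files
# that have no matching *_2.fastq mate in the given directory.
check_pe_pairs() {
  local dir="$1" missing=0 r1 r2
  for r1 in "$dir"/*_1.fastq; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$r1" ] || continue
    r2="${r1%_1.fastq}_2.fastq"
    if [ ! -e "$r2" ]; then
      echo "missing mate for: $r1" >&2
      missing=1
    fi
  done
  # Exit status 0 when all pairs are complete, 1 otherwise.
  return "$missing"
}
```

Calling `check_pe_pairs /absolute/path/to/input_dir` before a PE run returns a non-zero status if any mate file is missing, so it can guard the `tabsat` call in a script.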
Test with input file directory
tabsat -l NONDIR -g hg19 -t target.csv -d test_input_dir -a tmap -o test_output_dir
Test with separate input files
tabsat -l NONDIR -g hg19 -t target.csv -o test_output_files xy.fastq abs.fastq
Test data is available here
- Check out the project (git clone)
- Download the reference genome
- Human
- Broad: ftp://[email protected]/bundle/2.8/hg19/ucsc.hg19.fasta.gz
- ENSEMBL: ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
- NCBI: ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37.p13/seqs_for_alignment_pipelines/GCA_000001405.14_GRCh37.p13_no_alt_analysis_set.fna.gz
- Mouse
- Put the reference genome file into the correct folder
- Human: tabsat/reference/human/hg19/hg19.fasta
- Mouse: tabsat/reference/mouse/mm10/mm10.fasta
- Prepare the reference genome
$ tabsat/reference/prepareReference.sh
- Prepare the CpG file
apt-get install p7zip-full
7za e tabsat/tools/ait/all_cpgs_only_pos_hg19.7z
7za e tabsat/tools/ait/all_cpgs_only_pos_mm10.7z
- Install Perl modules
- Cairo.pm
- Switch.pm
- Run 'install' script in tabsat folder (installs SAMtools, Bedtools)
./install
- After installation go to tabsat/tools/zz_test
- Execute
./test_tabsat.sh
- Inspect output at tabsat/tabsat_test_output
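The "download" and "put into the correct folder" steps above can be sketched as one small shell function. This is an illustration under assumptions: the function name `place_reference` is hypothetical, and it assumes that TABSAT expects an uncompressed FASTA file at the target path (the listed downloads are gzip archives, while the target paths end in `.fasta`).

```shell
# Hypothetical helper (not shipped with TABSAT): decompress a downloaded
# reference archive and store it at the path named in the steps above.
place_reference() {
  local archive="$1" target="$2"
  # Create the target folder if it does not exist yet.
  mkdir -p "$(dirname "$target")"
  # Write the decompressed FASTA to the target, keeping the archive.
  gunzip -c "$archive" > "$target"
}

# Example call (the archive name depends on which download you chose):
# place_reference ucsc.hg19.fasta.gz tabsat/reference/human/hg19/hg19.fasta
```

Keeping the original archive makes it easy to repeat the step if the reference preparation has to be rerun.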
- Build the Docker image from the Dockerfile
docker build -t tabsat:v1 .
- Run it
docker run -t --name tabsat -d tabsat:v1
- Connect to the container
docker exec -ti tabsat /bin/bash
- Stop the container
docker stop tabsat
- Remove the container
docker rm tabsat
- Remove the image
docker rmi tabsat:v1
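The run command above does not mount a data directory, so input files have to reach the container some other way. One option is `docker cp`. The sketch below is an assumption-laden illustration: the helper names `copy_in` and `copy_out`, the `DRY_RUN` switch, and any container paths are hypothetical, and only the container name `tabsat` comes from the commands above.

```shell
# Hypothetical helpers (not shipped with TABSAT) around `docker cp`.
# With DRY_RUN=1 the command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy a local file into the running container named "tabsat".
copy_in() { run docker cp "$1" "tabsat:$2"; }

# Copy a file or directory from the container back to the host.
copy_out() { run docker cp "tabsat:$1" "$2"; }

# Example calls (container paths are placeholders, adjust to your setup):
# copy_in input.fastq /tmp/input.fastq
# copy_out /tmp/output_dir ./output_dir
```

Setting `DRY_RUN=1` first shows the exact `docker cp` commands without touching the container, which is a cheap way to check the paths.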