Binning Marmics2018
During the assembly session we produced a mapping file with all reads aligned to our assembly, as well as a file containing the coverage statistics.
Visualizing your assembly helps you understand how complex your community is. It also gives you an overview of the basic parameters (such as coverage and GC content) that can help you tease apart your genomes of interest.
1. Coverage-GC plot
To visualize our assembly as a coverage-GC plot, we will use gbtlite.
--> You need the coverage statistics file that you produced earlier: 909_A.covstats.txt
2. Contig connectivity
Visualize the assembly in Bandage to see which contigs are connected to each other. Read pair information is used to connect two contigs to each other.
Open Bandage and load the file assembly_graph.fastg, then click "Draw graph". This input file is produced automatically by the SPAdes assembler.
Example for library 909_A:

Automated binning with MetaBat
MetaBat uses tetranucleotide frequencies and abundance information (as inferred from read coverage of each contig) to automatically group your contigs into separate bins.
0. Create links to your input files
ln -s path-to/909_A.bam ./909_A.bam
ln -s path-to/scaffolds.fasta ./909_A.fasta
1. Create depth file:
This file provides coverage information of each contig to the binning tool. To create this file run the following 3 commands:
samtools sort 909_A.bam -o 909_A.sort.bam
samtools index 909_A.sort.bam
jgi_summarize_bam_contig_depths --outputDepth 909_A.depth.txt 909_A.sort.bam
--outputDepth specifies the output file with the coverage information needed to run metabat in step 2.
2. Run metabat:
To run the binning tool, using the depth file 909_A.depth.txt that you have created in step 1, run:
metabat2 -i 909_A.fasta -a 909_A.depth.txt -o 909_A.metabat -t 10 -m 1500 -v --unbinned
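-i your input assembly file
-a your depth file created in step 1
-o your output identifier (prefix of the bin files)
-t number of CPUs
-m minimum contig length
-v verbose mode
--unbinned write contigs that were not assigned to any bin to a separate file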
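To get a feeling for what the -m cutoff does, you can count how many contigs in your assembly are long enough to be considered for binning. The following is a minimal sketch, assuming a standard FASTA file; count_contigs_min_length is a hypothetical helper name and not part of MetaBat.

```shell
# Count contigs in a FASTA file whose sequence length is at least min_len.
# count_contigs_min_length is a hypothetical helper, not a MetaBat command.
count_contigs_min_length() {
    fasta="$1"
    min_len="$2"
    awk -v min="$min_len" '
        # On each header line, evaluate the contig that just ended
        /^>/ { if (seen && len >= min) n++; len = 0; seen = 1; next }
        # Sequence lines: add their length to the current contig
        { len += length($0) }
        # Evaluate the last contig in the file
        END { if (seen && len >= min) n++; print n + 0 }
    ' "$fasta"
}

# Run on the assembly from step 0 with the same cutoff as -m, if the file is present
if [ -f 909_A.fasta ]; then
    count_contigs_min_length 909_A.fasta 1500
fi
```

Contigs below this count threshold are not binned, so a very fragmented assembly may leave a large share of its sequence outside the bins.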
3. Extract contig names from each bin for visualization with gbtools
Now we want to see which contigs were grouped together into separate bins by MetaBat. For the visualization in gbtools we need a list of contigs that belong to each bin. Run the following command on each of your metabat bins to create this list.
grep ">" 909_A.metabat.1.fa | sed 's/>//g' > 909_A.metabat.1.contigNames
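If MetaBat produced many bins, the command above can be wrapped in a loop. This is a minimal sketch, assuming that the bin files are numbered as in 909_A.metabat.1.fa; extract_bin_contig_names is a hypothetical helper name. The pattern only matches numbered bins, so a file of unbinned contigs is skipped.

```shell
# Write one .contigNames file for every numbered bin sharing a given prefix.
# extract_bin_contig_names is a hypothetical helper, not part of MetaBat or gbtools.
extract_bin_contig_names() {
    prefix="$1"
    for bin in "$prefix".[0-9]*.fa; do
        # If nothing matches, the pattern stays literal: skip it
        [ -e "$bin" ] || continue
        # Keep the FASTA header lines and drop the leading ">"
        grep '^>' "$bin" | sed 's/^>//' > "${bin%.fa}.contigNames"
    done
}

# Assumes the output identifier used with -o in step 2
extract_bin_contig_names 909_A.metabat
```

Each bin then has a matching list, for example 909_A.metabat.2.contigNames for 909_A.metabat.2.fa.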
Before you plot anything, you need to delete the # at the beginning of your covstats file:
nano 909_A.covstats.txt
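Instead of editing the file by hand, the # can also be removed on the command line. This is a minimal sketch, assuming the # is the very first character of the first line; strip_header_hash is a hypothetical helper name, and it keeps a backup copy with the extension .bak.

```shell
# Remove a leading "#" from the first line of a covstats file.
# strip_header_hash is a hypothetical helper; the original is kept as <file>.bak.
strip_header_hash() {
    covstats="$1"
    cp "$covstats" "$covstats.bak"
    # Only line 1 is changed; all other lines are copied unchanged
    sed '1s/^#//' "$covstats.bak" > "$covstats"
}

# Apply to the covstats file from the assembly session, if it is present
if [ -f 909_A.covstats.txt ]; then
    strip_header_hash 909_A.covstats.txt
fi
```

You can check the result with head -n 1 909_A.covstats.txt before loading the file into R.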
See gbtools for details
Log in to RStudio (or, alternatively, start R)
Then install required libraries:
install.packages("/home/ransorge/teaching/R_packages/Rcpp_0.12.15.tar.gz",repos=NULL,type="source")
install.packages("/home/ransorge/teaching/R_packages/plyr_1.8.4.tar.gz",repos=NULL,type="source")
install.packages("/home/ransorge/teaching/R_packages/sp_1.2-7.tar.gz",repos=NULL,type="source")
install.packages("/home/ransorge/programs/genome-bin-tools/R_source_package/gbtools_2.5.8.tar.gz",repos=NULL,type="source")
Now you are ready to plot your genome:
Open gbtools.Rmd in RStudio
or, if working in the R console:
Load the required libraries:
library(plyr)
library(sp)
library(gbtools)
Load coverage-GC file
d <- gbt(covstats="/home/ransorge/teaching/909_A.covstats.txt")
Plot coverage-GC:
plot(d)
Import and visualize bins:
bin5.contigNames <- scan(file="/home/ransorge/teaching/909_A.metabat.1.contigNames",what=character())
d.bin5 <- gbtbin(shortlist=bin5.contigNames,x=d,slice=NA)
points(d.bin5,col="green",slice=1)
Save plot as picture
png("PATH-TO/bin.909_A.png")
plot(d)
points(d.bin5,col="green",slice=1)
dev.off()
OPTIONAL: manual binning with gbtools
Import the data into R or RStudio as before:
d <- gbt(covstats="/home/ransorge/teaching/909_A.covstats.txt")
plot(d)
To select a bin, run the following command and then click around the region of your choice in the coverage-GC plot:
bin1 <- choosebin(d, slice=1, save=TRUE, file="bin1.909_A.list")
Example of how a chosen bin can look:

Further options to improve binning are described in the gbtools wiki:
Use Phyla-AMPHORA to assign marker genes
Extract 16S rRNA information