
Binning Marmics2018

Rebecca edited this page Feb 12, 2018 · 37 revisions

During the assembly session we produced a mapping file with all reads aligned to our assembly. We also created a file that contains our coverage statistics.

Visualization of your assembly

Visualizing your assembly helps you understand how complex your community is and gives you an overview of the basic parameters that can help you tease apart your genomes of interest.

1. Coverage-GC plot

To visualize the assembly in a coverage-GC plot, we will use gblite. You need the coverage statistics file that you produced earlier: 909_A.covstats.txt

2. Contig connectivity

Visualize the assembly in Bandage to see which contigs are connected to each other. Read pair information is used to connect two contigs to each other. Open the program Bandage and import the file assembly-graph.fastg, then click "Draw graph". The input file is produced automatically by the SPAdes assembler.

Example for library 909_A: assembly graph of library 909_A

Automated binning with MetaBat

MetaBat uses tetranucleotide frequencies and abundance information (as inferred from read coverage of each contig) to automatically group your contigs into separate bins.

0. Create links to your input files

ln -s path-to/909_A.bam ./909_A.bam

ln -s path-to/scaffolds.fasta ./909_A.fasta

1. Create the depth file

This file provides coverage information of each contig to the binning tool. To create this file run the following 3 commands:

samtools sort 909_A.bam -o 909_A.sort.bam

samtools index 909_A.sort.bam

jgi_summarize_bam_contig_depths --outputDepth 909_A.depth.txt 909_A.sort.bam

--outputDepth specifies the output file with coverage information needed for running metabat in step 2
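Before moving on, a quick look at the depth file can confirm that it was written as expected. The following is a minimal sketch, assuming the table is tab-separated with one header line and the contig length in the second column. The function name count_long_contigs is hypothetical, and 1500 is only an example threshold.

```shell
# Count the contigs in a depth table that reach a minimum length.
# Assumption: tab-separated table, one header line, contig length in column 2.
count_long_contigs() {
    depth_file=$1
    min_len=$2
    awk -F '\t' -v min="$min_len" \
        'NR > 1 && $2 >= min { n++ } END { print n + 0 }' "$depth_file"
}

# Example call, only run if the depth file from this step exists
if [ -f 909_A.depth.txt ]; then
    count_long_contigs 909_A.depth.txt 1500
fi
```

If the printed number is very small, most of the assembly consists of short contigs and only a few of them are likely to end up in bins.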

2. Run MetaBat

To run the binning tool, using the depth file 909_A.depth.txt that you have created in step 1, run:

metabat2 -i 909_A.fasta -a 909_A.depth.txt -o 909_A.metabat -t 10 -m 1500 -v --unbinned

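-i your input assembly file

-a the depth file created in step 1

-o your output identifier (prefix of the bin files)

-t number of CPUs

-m minimum contig length; shorter contigs are not binned

-v verbose mode

--unbinned also write the contigs that were not assigned to any bin to a separate file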
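Once MetaBat has finished, it helps to get a rough size overview of the bins. The following is a minimal sketch, assuming the bins are plain FASTA files named 909_A.metabat.*.fa as in the next step. The function name fasta_summary is hypothetical. Any other file matching the pattern, such as a file of unbinned contigs, is listed as well.

```shell
# Print "<file> <number of contigs> <total bases>" for one FASTA file.
fasta_summary() {
    awk -v name="$1" '
        /^>/ { n++; next }          # header line: count one contig
        { bases += length($0) }     # sequence line: add its length
        END { printf "%s\t%d\t%d\n", name, n, bases }
    ' "$1"
}

# Summarize every bin; the loop does nothing if no file matches
for bin in 909_A.metabat.*.fa; do
    [ -e "$bin" ] || continue
    fasta_summary "$bin"
done
```

Bins with a total size far below or far above what you expect for a single genome are good candidates for a closer look in the visualization steps.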

3. Extract contig names from the created bins for visualization with gbtools

Now we want to see which contigs were grouped together into separate bins by MetaBat. For the visualization in gbtools we need a list of contigs that belong to each bin. Run the following command on each of your metabat bins to create this list.

grep ">" 909_A.metabat.1.fa | sed 's/>//g' > 909_A.metabat.1.contigNames
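To avoid typing the command once per bin, it can be wrapped in a loop. The following is a minimal sketch, assuming all bins follow the naming pattern 909_A.metabat.*.fa. The function name extract_contig_names is hypothetical.

```shell
# Write the contig names of one bin FASTA to <bin>.contigNames
extract_contig_names() {
    bin_fasta=$1
    out=${bin_fasta%.fa}.contigNames       # 909_A.metabat.1.fa -> 909_A.metabat.1.contigNames
    grep '^>' "$bin_fasta" | sed 's/^>//' > "$out"
}

# Process every bin; the loop does nothing if no file matches
for bin in 909_A.metabat.*.fa; do
    [ -e "$bin" ] || continue
    extract_contig_names "$bin"
done
```

Anchoring the pattern with ^ restricts the match to header lines, and the output name is derived from the input name so that each bin gets its own list.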

Visualization of bin with gbtools

Before you do anything else, you need to delete the # at the beginning of your covstats file:

nano 909_A.covstats.txt
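If you prefer not to edit the file by hand, the same change can be made non-interactively. The following is a minimal sketch, assuming the # is the very first character of the first line. The function name strip_leading_hash and the output file name are hypothetical. The original file is left untouched.

```shell
# Copy a covstats file, dropping a leading "#" from the first line only.
strip_leading_hash() {
    sed '1s/^#//' "$1" > "$2"
}

# Example call, only run if the covstats file exists
if [ -f 909_A.covstats.txt ]; then
    strip_leading_hash 909_A.covstats.txt 909_A.covstats.nohash.txt
fi
```

If you use the cleaned copy, either rename it to 909_A.covstats.txt or adjust the covstats path when loading the file in R.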

See gbtools for details

Start R, either in RStudio or in the R console.

Then install required libraries:

install.packages("/home/ransorge/teaching/R_packages/Rcpp_0.12.15.tar.gz",repos=NULL,type="source")

install.packages("/home/ransorge/teaching/R_packages/plyr_1.8.4.tar.gz",repos=NULL,type="source")

install.packages("/home/ransorge/teaching/R_packages/sp_1.2-7.tar.gz",repos=NULL,type="source")

install.packages("/home/ransorge/programs/genome-bin-tools/R_source_package/gbtools_2.5.8.tar.gz",repos=NULL,type="source")

Now you are ready to plot your genome:

Open gbtools.Rmd in RStudio or, if you are working in the R console, run the commands below.

Load the required libraries:

library(plyr)

library(sp)

library(gbtools)

Load the coverage-GC file:

d <- gbt(covstats="/home/ransorge/teaching/909_A.covstats.txt")

Plot coverage-GC:

plot(d)

Import and visualize bins:

bin5.contigNames <- scan(file="/home/ransorge/teaching/909_A.metabat.1.contigNames",what=character())

d.bin5 <- gbtbin(shortlist=bin5.contigNames,x=d,slice=NA)

points(d.bin5,col="green",slice=1)

Save the plot as a picture:

png("PATH-TO/bin.909_A.png")

plot(d)

points(d.bin5,col="green",slice=1)

dev.off()

OPTIONAL: manual binning with gbtools

Import the data into R or RStudio as before:

d <- gbt(covstats="/home/ransorge/teaching/909_A.covstats.txt")

plot(d)

To select a bin, run the following command and click around the region of your choice in the coverage-GC plot:

bin1 <- choosebin(d, slice=1, save=TRUE, file="bin1.909_A.list")

Example of how a chosen bin can look:

Further options to improve binning are described in the gbtools wiki: use Phyla-AMPHORA to assign marker genes, and extract 16S rRNA information.

Other binning resources

Binning and visualization

Blobology

Differential coverage binning

gbtools

Tools for automated binning

MetaBat

MetaWatt

GroopM

ESOM
