Skip to content

BovinePangenomeConsortium/workflows

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

72 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Bovine Pangenome Consortium workflows

This is the home for pipelines related to the Bovine Pangenome Consortium. This is a work in progress, but reach out if you have any questions!

Assessing assembly quality

We need to get a rough idea of how well assembled these genomes are, but for many samples we only have the assembly itself, not the raw data. As such, we have to rely on some proxy metrics. Some should be self-explanatory, but otherwise here is a short description

  • NG50: N50 measure for contiguity assuming a genome size of 3 GB
  • autosome/X/Y single/duplicated/missing copy: How many cetartiodactyla USCOs do we find in single/duplicated/missing states across the different chromosomes
    • we do this to avoid penalising male haplotype-resolved assemblies which should be missing many X chromosome-specific USCOs
  • SNPs/InDels: calculated from minimap2 -c --cs | paftools.js call using the ARS-UCD2.0 reference genome
  • autosome/X/Y/MT covered: the fraction of each chromosome covered by those minimap2 alignments
    • we do this as a proxy to identify the sex of the assembly where not easily available and identify obvious misassemblies

We can also estimate the average gap-compressed identity of the alignments through cut -f 1,6,13 *.alignment.paf

Pangenome processing

Reference pangenome building

Downstream reference pangenomics

Other pangenomic analyses

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages