PanPhlAn download pangenome 3_0
Pangenomes are built for species for which at least 2 reference genomes are available. These files are available on this DropBox. They can also be downloaded with the panphlan_download_pangenome.py script.
Example:
panphlan_download_pangenome.py -i Eubacterium_rectale
- -i INPUT_NAME: the name of a species
The tar.bz2 archive is downloaded if available and uncompressed at the location given by the --output argument. If none is provided, the pangenome folder is created in the current directory.
The retrieved folder contains the pangenome contigs in a multi-FASTA .fna file, the bowtie2 indexes, an annotation .tsv file mapping gene families (UniRef) to GO, KO, KEGG, Pfam, eggNOG, etc., and a pangenome .tsv file containing all the information needed to map the genes to the sequences.
The columns of this last file are, in order: UniRef90 cluster ID, gene ID, genome ID, contig ID, start position, stop position.
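To make the column layout concrete, here is a minimal Python sketch that reads the pangenome .tsv file into named records. It assumes the file is tab-separated, has no header line, and stores start and stop as integers; none of this is confirmed above, so adjust it if your file differs. The field names and the helper functions read_pangenome_rows and genes_per_family are illustrative and not part of PanPhlAn.

```python
import csv
from collections import namedtuple

# Column order follows the description above; the field names are illustrative.
PangenomeRow = namedtuple(
    "PangenomeRow",
    ["uniref90", "gene_id", "genome_id", "contig_id", "start", "stop"],
)


def read_pangenome_rows(lines):
    """Yield one PangenomeRow per non-empty tab-separated line.

    Assumes six columns and no header line (an assumption, not a documented fact).
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {fields}")
        uniref90, gene_id, genome_id, contig_id, start, stop = fields
        yield PangenomeRow(
            uniref90, gene_id, genome_id, contig_id, int(start), int(stop)
        )


def genes_per_family(rows):
    """Group gene IDs by UniRef90 cluster ID, preserving file order."""
    families = {}
    for row in rows:
        families.setdefault(row.uniref90, []).append(row.gene_id)
    return families
```

Both functions accept any iterable of lines, so an open file handle on the pangenome .tsv file can be passed directly to read_pangenome_rows.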
Some pangenomes in the database may contain duplicated sequences, which raises an error during the mapping step:
[E::sam_hrecs_update_hashes] Duplicate entry "XXX" in sam header
In these cases, check for duplication by comparing the output of
wc -l [species_name]_pangenome_contigs.fna
with that of
sort [species_name]_pangenome_contigs.fna | uniq | wc -l
The second count should be roughly half of the first. If so, keep only the first half of the .fna file, for example with
head -[half of the lines in the fna file] [species_name]_pangenome_contigs.fna > new_contigs.fna
and then regenerate the indexes using bowtie2-build.
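The check and fix described above can also be sketched in Python. Note that this is a swapped-in variant rather than the head-based procedure: instead of cutting the file in half, it keeps the first record seen for each contig name, which does not rely on the duplicates sitting exactly in the second half. The function names count_lines and drop_duplicate_records are hypothetical, and the contig name is assumed to be the first whitespace-delimited token of the header line. The indexes still need to be regenerated with bowtie2-build afterwards.

```python
def count_lines(lines):
    """Return (total, unique) line counts.

    Mirrors comparing `wc -l` with `sort | uniq | wc -l`.
    """
    total = 0
    unique = set()
    for line in lines:
        total += 1
        unique.add(line.rstrip("\n"))
    return total, len(unique)


def drop_duplicate_records(lines):
    """Yield FASTA lines, keeping only the first record for each contig name."""
    seen = set()
    keep = False  # lines before the first header are dropped
    for line in lines:
        if line.startswith(">"):
            # Assumption: the name is the first token after '>'.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            keep = name not in seen
            seen.add(name)
        if keep:
            yield line
```

Both functions take any iterable of lines, so they can be applied to an open handle on the .fna file, with the output of drop_duplicate_records written to a new file such as new_contigs.fna.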
usage: panphlan_download_pangenome.py [-h] [-i INPUT_NAME] [-o OUTPUT] [-v]
optional arguments:
-h, --help show this help message and exit
-i INPUT_NAME, --input_name INPUT_NAME
-o OUTPUT, --output OUTPUT
-v, --verbose Show progress information
PanPhlAn is a project of the Computational Metagenomics Lab at CIBIO, University of Trento, Italy.