PanPhlAn download pangenome 3_0
Pangenomes are built for species for which at least 2 reference genomes are available. These files are available on this DropBox. They can also be easily downloaded using the panphlan_download_pangenome.py script.
Example:
panphlan_download_pangenome.py -i Eubacterium_rectale
- -i INPUT_NAME: the name of a species
The tar.bz2 archive is downloaded, if available, and extracted at the location given by the --output argument. If none is provided, the pangenome folder will be created in the current directory.
The retrieved folder contains the pangenome contigs in a multi-FASTA .fna file, the bowtie2 indexes, an annotation .tsv file mapping gene families (UniRef) to GO, KO, KEGG, Pfam, eggNOG, etc., and a pangenome .tsv file containing all the information needed to map the genes to the sequences.
The columns of this last file are, in order: UniRef90 cluster ID, gene ID, genome ID, contig ID, start position, stop position.
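As an illustration of the column order described above, here is a minimal Python sketch that reads such a file. It assumes the file is tab-separated, has no header row, and stores positions as integers. The names PangenomeRecord and read_pangenome_tsv, and the sample row, are hypothetical and not part of PanPhlAn.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class PangenomeRecord(NamedTuple):
    """One row of the pangenome .tsv file (field names are illustrative)."""
    uniref90_id: str
    gene_id: str
    genome_id: str
    contig_id: str
    start: int
    stop: int


def read_pangenome_tsv(handle: TextIO) -> Iterator[PangenomeRecord]:
    """Yield one record per non-empty line of a tab-separated pangenome file."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        family, gene, genome, contig, start, stop = row[:6]
        yield PangenomeRecord(family, gene, genome, contig, int(start), int(stop))


# Made-up row, used only to show the column order (assumption: no header line).
example = io.StringIO("UniRef90_X\tgene_1\tgenome_A\tcontig_7\t120\t980\n")
records = list(read_pangenome_tsv(example))
print(records[0].contig_id, records[0].start, records[0].stop)
```

For a real file, pass an open file handle instead of the in-memory example, for instance open("[species_name]_pangenome.tsv", newline="") with the actual file name from the downloaded folder.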
Some pangenomes in the database may contain duplicated sequences, which causes an error during the mapping step:
[E::sam_hrecs_update_hashes] Duplicate entry "XXX" in sam header
In these cases, check for duplication by comparing the output of wc -l [species_name]_pangenome_contigs.fna with the output of sort [species_name]_pangenome_contigs.fna | uniq | wc -l, which should be roughly half of the first number.
Then cut the .fna file in half (for example with head -[half of the lines in the fna file] [species_name]_pangenome_contigs.fna > new_contigs.fna) and regenerate the indexes using bowtie2-build.
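As an alternative to cutting the file in half with head, the sketch below keeps only the first occurrence of each contig name. This is a different technique from the one described above, offered for cases where the duplicates are not exactly the second half of the file. It assumes the contig name is the first whitespace-delimited word of each FASTA header. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (header line, list of sequence lines) for each FASTA record."""
    header = None
    sequence: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, sequence
            header, sequence = line, []
        elif header is not None and line:
            sequence.append(line)
    if header is not None:
        yield header, sequence


def drop_duplicate_contigs(lines: Iterable[str]) -> List[str]:
    """Return FASTA lines with only the first record kept for each contig name."""
    seen = set()
    kept: List[str] = []
    for header, sequence in read_fasta(lines):
        words = header[1:].split()
        name = words[0] if words else ""  # assumption: name is the first word
        if name in seen:
            continue  # a record with this name was already written
        seen.add(name)
        kept.append(header)
        kept.extend(sequence)
    return kept


# Tiny made-up input in which every contig appears twice.
example = [">c1", "ACGT", ">c2", "GGCC", ">c1", "ACGT", ">c2", "GGCC"]
deduplicated = drop_duplicate_contigs(example)
print("\n".join(deduplicated))
```

To apply it to a real file, read the lines of [species_name]_pangenome_contigs.fna, write the returned lines to a new .fna file, and rebuild the indexes with bowtie2-build as described above.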
usage: panphlan_download_pangenome.py [-h] -i INPUT_NAME -o OUTPUT [-v] [--retry RETRY] [--wait WAIT]
optional arguments:
-h, --help show this help message and exit
-v, --verbose Show progress information
--retry RETRY Number of retry in pangenome download. Default is 5
--wait WAIT Number of second spend waiting between download retries. Default 60
required arguments:
-i INPUT_NAME, --input_name INPUT_NAME
Name of species to download
-o OUTPUT, --output OUTPUT
output location
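For scripted use, the options listed above can be assembled programmatically. The following is a minimal sketch, assuming panphlan_download_pangenome.py is on the PATH. The wrapper functions and the output directory name "pangenomes" are hypothetical; only the flags come from the usage text.

```python
import subprocess
from typing import List


def build_download_command(species: str, output: str, retry: int = 5,
                           wait: int = 60, verbose: bool = False) -> List[str]:
    """Build the argument list using only the flags shown in the usage text."""
    command = [
        "panphlan_download_pangenome.py",
        "-i", species,
        "-o", output,
        "--retry", str(retry),
        "--wait", str(wait),
    ]
    if verbose:
        command.append("-v")
    return command


def download_pangenome(species: str, output: str, **options) -> None:
    """Run the download script; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_download_command(species, output, **options), check=True)


# Only builds and prints the command; call download_pangenome() to run it.
command = build_download_command("Eubacterium_rectale", "pangenomes", verbose=True)
print(" ".join(command))
```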
PanPhlAn is a project of the Computational Metagenomics Lab at CIBIO, University of Trento, Italy.