Releases: shenwei356/taxonkit
TaxonKit v0.13.0
Changes
- TaxonKit v0.13.0
  - taxonkit reformat:
    - add a new placeholder {K} for rank kingdom. #64
    - do not panic for invalid TaxIds, e.g., the column name, when using -I/--taxid-field.
  - taxonkit create-taxdump:
    - fix merged.dmp and delnodes.dmp. Thanks to @apcamargo! gtdb-taxdump/issues/2.
    - fix bug of handling non-GTDB data when using -A/--field-accession and no rank names given: the column name of the accession column would be treated as one of the ranks, which messed up all the ranks.
    - fix the default option value of --field-accession-re, which wrongly removed prefixes like Sp_. #65
  - taxonkit list:
    - fix warning message of merged taxids.
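To illustrate what a rank placeholder such as {K} does in a format string, here is a minimal Python sketch. It is not TaxonKit's implementation: only the pairing of {K} with kingdom comes from the note above, while the other placeholder letters, the function name `fill_format`, and the sample lineage are assumptions made for the example.

```python
import re

# Hypothetical placeholder table. Only "K" -> kingdom is taken from the
# release note; the remaining letters are assumed for illustration.
PLACEHOLDER_RANKS = {"K": "kingdom", "p": "phylum", "g": "genus", "s": "species"}


def fill_format(fmt, lineage):
    """Replace each {X} in fmt with the taxon name at the matching rank.

    Ranks missing from the lineage become empty strings; placeholders that
    are not in the table are left untouched.
    """
    def substitute(match):
        rank = PLACEHOLDER_RANKS.get(match.group(1))
        if rank is None:
            return match.group(0)
        return lineage.get(rank, "")

    return re.sub(r"\{(\w)\}", substitute, fmt)


# Sample lineage, written by hand for the example.
lineage = {
    "kingdom": "Metazoa",
    "phylum": "Chordata",
    "genus": "Homo",
    "species": "Homo sapiens",
}
print(fill_format("{K};{p};{g};{s}", lineage))
```

A lineage without a kingdom entry simply yields an empty first field in this sketch.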
TaxonKit v0.12.0
Changes
- TaxonKit v0.12.0
  - taxonkit create-taxdump:
    - accepts arbitrary ranks. #60
    - better handling of taxa with the same names.
    - many flags changed.
TaxonKit v0.12.0-alpha
Changes
- taxonkit create-taxdump:
  - accepts arbitrary ranks. #60
  - better handling of taxa with the same names.
  - many flags changed.
TaxonKit v0.11.1
Changes
- TaxonKit v0.11.1
  - taxonkit create-taxdump: fix bug of missing Class rank, contributed by @apcamargo. The flag --gtdb was not affected. #57
TaxonKit v0.11.0
- TaxonKit v0.11.0
  - new command taxonkit create-taxdump: Create NCBI-style taxdump files for custom taxonomy, e.g., GTDB and ICTV. #56
v0.11.0-alpha
Changes
- new command taxonkit create-taxdump: Create NCBI-style taxdump files for custom taxonomy, e.g., GTDB. #56
Usage:
Create NCBI-style taxdump files for custom taxonomy, e.g., GTDB
Input format:
  0. For GTDB taxonomy file, just use --gtdb
  1. The input file should be tab-delimited
  2. At least one column is needed, please specify the field index:
     1) Kingdom/Superkingdom/Domain, -K/--field-kingdom
     2) Phylum,                      -P/--field-phylum
     3) Class,                       -C/--field-class
     4) Order,                       -O/--field-order
     5) Family,                      -F/--field-family
     6) Genus,                       -G/--field-genus
     7) Species (needed),            -S/--field-species
     8) Subspecies,                  -T/--field-subspecies
        For GTDB, we use the assembly accession (without version number).
  3. The column containing the genome/assembly accession is recommended to
     generate TaxId mapping file (taxid.map, id -> taxid).
     -A/--field-accession,    field containing genome/assembly accession
     --field-accession-re,    regular expression to extract the accession
Attention:
  1. Names should be distinct in taxa of different rank.
     But for those missing some taxon nodes, using names of parent nodes is allowed:
       GB_GCA_018897955.1 d__Archaea;p__EX4484-52;c__EX4484-52;o__EX4484-52;f__LFW-46;g__LFW-46;s__LFW-46 sp018897155
     It can also detect duplicate names with different ranks, e.g.,
     the Class and Genus have the same name B47-G6, and the Order and Family between them have different names.
     In this case, we reassign a new TaxId by increasing the TaxId until it is distinct.
       GB_GCA_003663585.1 d__Archaea;p__Thermoplasmatota;c__B47-G6;o__B47-G6B;f__47-G6;g__B47-G6;s__B47-G6 sp003663585
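The two cases in the note above (a name inherited from the parent node versus the same name reused at a non-adjacent rank) can be told apart with a short Python sketch. This is an illustration under stated assumptions, not TaxonKit's code: the function names are made up, and the CRC32 hash in `assign_taxid` is a stand-in, since the text only says that a TaxId is increased until it is distinct.

```python
import zlib

# GTDB rank prefixes mapped to the default rank names from the help text.
PREFIX_TO_RANK = {
    "d": "superkingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_lineage(lineage):
    """Split 'd__A;p__B;...' into a list of (rank, name) pairs."""
    pairs = []
    for part in lineage.split(";"):
        prefix, _, name = part.partition("__")
        pairs.append((PREFIX_TO_RANK[prefix], name))
    return pairs


def classify_repeats(pairs):
    """Return (inherited, duplicated) lists of (rank, name) pairs.

    inherited:  the name equals that of the directly preceding rank.
    duplicated: the name was seen before, with other names in between.
    """
    inherited, duplicated = [], []
    last_index = {}
    for i, (rank, name) in enumerate(pairs):
        if name in last_index:
            target = inherited if last_index[name] == i - 1 else duplicated
            target.append((rank, name))
        last_index[name] = i
    return inherited, duplicated


def assign_taxid(name, used):
    """Hypothetical TaxId: hash the name, then increase it until unused."""
    taxid = zlib.crc32(name.encode("utf-8"))  # assumed hash, for illustration only
    while taxid in used:
        taxid += 1
    used.add(taxid)
    return taxid


first = parse_lineage("d__Archaea;p__EX4484-52;c__EX4484-52;o__EX4484-52;"
                      "f__LFW-46;g__LFW-46;s__LFW-46 sp018897155")
second = parse_lineage("d__Archaea;p__Thermoplasmatota;c__B47-G6;o__B47-G6B;"
                       "f__47-G6;g__B47-G6;s__B47-G6 sp003663585")
print(classify_repeats(first))
print(classify_repeats(second))
```

In this sketch, calling `assign_taxid` twice with the same name and the same `used` set returns two consecutive integers, which mirrors the reassignment rule described above.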
Usage:
taxonkit create-taxdump [flags]
Flags:
  -A, --field-accession int         field index of assembly accession (genome ID), for outputting taxid.map
      --field-accession-re string   regular expression to extract assembly accession (default "^\\w\\w_(.+)$")
  -C, --field-class int             field index of class
  -F, --field-family int            field index of family
  -G, --field-genus int             field index of genus
  -K, --field-kingdom int           field index of kingdom
  -O, --field-order int             field index of order
  -P, --field-phylum int            field index of phylum
  -S, --field-species int           field index of species (needed)
  -T, --field-subspecies int        field index of subspecies
      --force                       overwrite existing output directory
      --gtdb                        input files are GTDB taxonomy file
      --gtdb-re-subs string         regular expression to extract assembly accession as the subspecies (default "^\\w\\w_GC[AF]_(.+)\\.\\d+$")
  -h, --help                        help for create-taxdump
      --line-chunk-size int         number of lines to process for each thread, and 4 threads is fast enough. (default 5000)
      --null strings                null value of taxa (default [,NULL,NA])
  -x, --old-taxdump-dir string      taxdump directory of older version
      --out-dir string              output directory
      --rank-names strings          names of the 8 ranks, order matters (default [superkingdom,phylum,class,order,family,genus,species,no rank])
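To see what the two default regular expressions capture, they can be tried in Python, whose `re` syntax accepts both patterns as written (with the doubled backslashes of the help text reduced to single ones). This is a sketch only: the helper `extract` and its fall-back of returning the input unchanged on a failed match are assumptions, and `Sp_001` is a made-up identifier.

```python
import re

# Defaults copied from the flag list above.
ACCESSION_RE = re.compile(r"^\w\w_(.+)$")
GTDB_SUBSPECIES_RE = re.compile(r"^\w\w_GC[AF]_(.+)\.\d+$")


def extract(pattern, text):
    """Return the first capture group, or text unchanged if nothing matches.

    The fall-back is an assumption of this sketch, not documented behaviour.
    """
    match = pattern.match(text)
    return match.group(1) if match else text


# Accession taken from the GTDB example line in the help text.
print(extract(ACCESSION_RE, "GB_GCA_018897955.1"))        # GCA_018897955.1
print(extract(GTDB_SUBSPECIES_RE, "GB_GCA_018897955.1"))  # 018897955

# Any two-character prefix followed by "_" matches the first pattern,
# so a hypothetical identifier such as "Sp_001" loses its prefix as well.
print(extract(ACCESSION_RE, "Sp_001"))                    # 001
```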
TaxonKit v0.10.1
Changes
- TaxonKit v0.10.1
  - taxonkit cami2-filter: fix option --show-rank which did not work in v0.10.0.
TaxonKit v0.10.0
Changes
- TaxonKit v0.10.0
  - new command taxonkit cami2-filter: Remove taxa of given TaxIds and their descendants in CAMI metagenomic profile
  - taxonkit reformat: fix panic for deleted taxid using -F/--fill-miss-rank. #55
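The idea behind removing taxa and their descendants from a CAMI profile can be sketched in Python by checking the TAXPATH column, which lists the TaxIds from the root down to each taxon. This is a simplified illustration, not the cami2-filter implementation: the function name `filter_profile` and the sample rows are made up, and the sketch does not renormalise the remaining percentages.

```python
def filter_profile(lines, excluded_taxids):
    """Drop rows whose TAXPATH contains any of the excluded TaxIds."""
    excluded = set(excluded_taxids)
    kept = []
    for line in lines:
        # Header (@...), comment (#...), and blank lines pass through unchanged.
        if not line.strip() or line.startswith(("@", "#")):
            kept.append(line)
            continue
        # Columns: TAXID, RANK, TAXPATH, TAXPATHSN, PERCENTAGE
        taxpath = line.split("\t")[2].split("|")
        if excluded.isdisjoint(taxpath):
            kept.append(line)
    return kept


# Hand-written sample profile, for illustration only.
profile = [
    "@SampleID:demo",
    "@Version:0.9.1",
    "@Ranks:superkingdom|phylum",
    "@@TAXID\tRANK\tTAXPATH\tTAXPATHSN\tPERCENTAGE",
    "2\tsuperkingdom\t2\tBacteria\t60.0",
    "1224\tphylum\t2|1224\tBacteria|Proteobacteria\t60.0",
    "2157\tsuperkingdom\t2157\tArchaea\t40.0",
]
filtered = filter_profile(profile, ["2"])
print("\n".join(filtered))
```

Excluding TaxId 2 removes both the Bacteria row and its descendant row, because 2 appears in the TAXPATH of each.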
TaxonKit v0.9.0
Changes
- TaxonKit v0.9.0
  - new command taxonkit profile2cami: converting metagenomic profile table to CAMI format