6. Modification

jaclew edited this page Nov 21, 2023 · 2 revisions

Step-by-Step Guide to Modifying a FlexTaxD Database

You can modify a FlexTaxD database to incorporate improved taxonomic resolution or an alternative taxonomy such as GTDB. This guide covers two approaches: merging in a second database, and importing taxonomy files.

Modifying Through Database Merge

This process is useful when you want to integrate taxonomic frameworks from two different sources, like NCBI and GTDB.

Steps for Database Merge:

  1. Copy Original Database: Create a copy of the original NCBI database which will serve as the base for modifications.

    cp ncbi.fdb ncbi_gtdbBacAr.fdb
  2. Clean Original Database: Remove non-essential nodes to prevent conflicts with GTDB taxonomy.

    flextaxd --db ncbi_gtdbBacAr.fdb --clean
  3. Merge GTDB Bacteria: Integrate GTDB Bacteria taxonomy, replacing corresponding nodes in the NCBI taxonomy.

    flextaxd --db ncbi_gtdbBacAr.fdb --mod_database gtdb_bac120.fdb --parent Bacteria --replace
  4. Merge GTDB Archaea: Integrate GTDB Archaea taxonomy in a similar manner.

    flextaxd --db ncbi_gtdbBacAr.fdb --mod_database gtdb_ar53.fdb --parent Archaea --replace
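
The four steps above can also be wrapped in a small shell function so they always run in the same order and stop at the first failure. This is only a sketch: `run`, `merge_gtdb`, and the `DRY_RUN` variable are hypothetical names introduced here, not part of FlexTaxD, and the flextaxd flags are the ones shown in the steps above. With `DRY_RUN` left at its default of 1, the commands are only printed so they can be reviewed first.

```shell
# Hypothetical helper (not part of FlexTaxD): print the command when
# DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the four merge steps; stops at the first failure.
# Arguments: base database, output database, GTDB bacteria db, GTDB archaea db.
merge_gtdb() {
    local base_db="$1"
    local out_db="$2"
    local bac_db="$3"
    local ar_db="$4"
    run cp "$base_db" "$out_db" &&
    run flextaxd --db "$out_db" --clean &&
    run flextaxd --db "$out_db" --mod_database "$bac_db" --parent Bacteria --replace &&
    run flextaxd --db "$out_db" --mod_database "$ar_db" --parent Archaea --replace
}

# Prints the four commands; set DRY_RUN=0 to actually run them.
merge_gtdb ncbi.fdb ncbi_gtdbBacAr.fdb gtdb_bac120.fdb gtdb_ar53.fdb
```

Setting `DRY_RUN=0` in the environment before calling `merge_gtdb` executes the steps instead of printing them.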

Modifying Using Taxonomy Files

This method is handy when expanding or updating specific branches of the taxonomy based on new data or refined classifications.

Steps for Taxonomy File Modification:

  1. Copy Database for Modification: As with the merge, start by copying the original database.

    cp francisellaceae.fdb francisellaceae_tularensis.fdb
  2. Import New Taxonomy: Replace the current taxonomy for "Francisella tularensis" with new data.

    • GTDB example:

      flextaxd --db francisellaceae_tularensis.fdb --mod_file ftd.tree2tax.tul.tsv --genomeid2taxid genomes_map.tul.tsv --parent "Francisella tularensis" --replace
    • FTD/CanSNP example:

      flextaxd --db francisellaceae_tularensis.fdb --mod_file ftd.tree2tax.tul.tsv --genomeid2taxid genomes_map.tul.tsv --parent "Francisellaceae_Francisella_tularensis_GCF_000008985.1" --replace
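
Before running the import, it can help to confirm that the database copy and both input files exist and are non-empty. A minimal sketch, assuming a hypothetical helper named `require_files` (not part of FlexTaxD) and the filenames from the example above:

```shell
# Hypothetical pre-flight check (not part of FlexTaxD): returns non-zero and
# names each missing or empty file on stderr.
require_files() {
    local missing=0
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage sketch: only run the import when all inputs are present.
# require_files francisellaceae_tularensis.fdb ftd.tree2tax.tul.tsv genomes_map.tul.tsv &&
#     flextaxd --db francisellaceae_tularensis.fdb --mod_file ftd.tree2tax.tul.tsv \
#         --genomeid2taxid genomes_map.tul.tsv --parent "Francisella tularensis" --replace
```

The check only looks at file presence and size; it does not validate the contents of the taxonomy or genome map files.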

Visualizing Modifications

To confirm the changes, visualize the database before and after modification.

  • Before Modification:

    flextaxd --db francisellaceae.fdb --vis_type tree --vis_node Francisellaceae --vis_depth 0 --vis_label_size 8

[Figure: francisellaceae]

  • After Modification:

    flextaxd --db francisellaceae_tularensis.fdb --vis_type tree --vis_node Francisellaceae --vis_depth 0 --vis_label_size 8

[Figure: francisellaceae_tularensis]

Comparing the two trees should confirm that the modified database now includes the expanded taxonomy of "Francisella tularensis".
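
Since the before and after commands differ only in the database name, they can be generated from one template. The sketch below assumes a hypothetical helper `vis_cmd` (not part of FlexTaxD) and reuses the visualization flags and values from the commands above; it prints one command per database rather than running them.

```shell
# Hypothetical helper (not part of FlexTaxD): print the visualization command
# for a given database ($1) and root node ($2).
vis_cmd() {
    printf 'flextaxd --db %s --vis_type tree --vis_node %s --vis_depth 0 --vis_label_size 8\n' "$1" "$2"
}

# Print the command for the original and the modified database.
for db in francisellaceae.fdb francisellaceae_tularensis.fdb; do
    vis_cmd "$db" Francisellaceae
done
```

The printed lines can be copied into a terminal once the database names have been checked.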

Before running these commands, check that the paths and filenames match your own files and directory structure, and adjust the commands if your setup differs.