TTT stands for Trivial Tangle Traverser. This tool generates "not terrible" traversals through repetitive genomic tangles that are reasonably consistent with coverage and read alignments.
For help, run ./TTT.py --help
Requires Python ≥ 3.7 and the dataclasses, pulp, ahocorasick, networkx, statistics, and logging Python libraries (dataclasses, statistics, and logging are part of the standard library).
Slides explaining algorithmic details
UNDER CONSTRUCTION!
./TTT.py --graph assembly.gfa --alignment reads.gaf --output results_dir --boundary-nodes boundary_nodes.tsv --quality-threshold 20

Will TTT help with a gap in my scaffold?
Generally there are three main reasons for gaps in a scaffold:
- Lack of coverage
TTT searches for the "best" path in the assembly graph that traverses the gap. If there is no such path because of a coverage gap, nothing can be done.
Scaffold <utig4-1497[N100000N:scaffold]<utig4-340 — nothing can be done
- Long homozygous nodes
Such gaps arise when the reads are shorter than the homozygous nodes. The typical structure is a sequence of "bubbles" of similar length interlaced with long homozygous nodes. TTT can be run on such tangles. However, if these structures were left unresolved in the assembly graph (especially when the homozygous nodes are longer than ~100 kbp, homopolymer-compressed), the read alignments usually carry no information that helps traverse the region, so the result is essentially a random guess.
Scaffolds <utig4-1225<utig4-1224[N5000N:ambig_bubble]>utig4-1511<utig4-1513 and <utig4-1226<utig4-1224[N5000N:ambig_bubble]>utig4-1511<utig4-1512. Because of the long homozygous nodes utig4-1224 and utig4-1511, there are no long reads connecting utig4-1228/utig4-1227 with utig4-1225/utig4-1226 or utig4-1512/utig4-1513. TTT will make a random guess, but so can you.
- Complex repeats
TTT was designed for such cases. However, there can be no more than two haplotypes in the tangle (so rDNA tangles connecting multiple chromosomes are usually unresolvable). Also, TTT does not scaffold, so for two-haplotype tangles you need to know how to pair incoming and outgoing nodes.
Gap caused by repeat array
Gap caused by large duplication of homozygous region, present in one of the haplotypes
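When triaging gaps, it can help to pull the gap markers out of a scaffold path string. The following is a minimal sketch, not part of TTT. It assumes that paths look like the examples above: oriented nodes written as `>name` or `<name`, and gap markers of the form `[N<length>N:<type>]`. The names `parse_scaffold_path` and `gaps` are hypothetical.

```python
import re

# One token is either an oriented node (">name" or "<name")
# or a gap marker such as "[N5000N:ambig_bubble]".
TOKEN = re.compile(r"([<>])([^<>\[\]]+)|\[N(\d+)N:([^\]]+)\]")


def parse_scaffold_path(path):
    """Split a path string into ("node", orientation, name)
    and ("gap", length, gap_type) tuples, in path order."""
    items = []
    pos = 0
    for match in TOKEN.finditer(path):
        if match.start() != pos:
            # Some characters between tokens were not recognised.
            raise ValueError(f"unexpected text at offset {pos}: {path[pos:match.start()]!r}")
        pos = match.end()
        if match.group(1):
            items.append(("node", match.group(1), match.group(2)))
        else:
            items.append(("gap", int(match.group(3)), match.group(4)))
    if pos != len(path):
        raise ValueError(f"unexpected trailing text: {path[pos:]!r}")
    return items


def gaps(path):
    """Return only the (length, gap_type) pairs found in a path string."""
    return [(item[1], item[2]) for item in parse_scaffold_path(path) if item[0] == "gap"]


if __name__ == "__main__":
    print(gaps("<utig4-1225<utig4-1224[N5000N:ambig_bubble]>utig4-1511<utig4-1513"))
```

The gap type alone does not prove the cause of a gap; it is only a starting point for inspecting the region in the graph.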
--graph: Path to the GFA file with the graph structure
--alignment: Path to a file with GraphAligner alignment
Instead of these two options, one can use --verkko-output <verkko output directory>. In this case, verkko's internal files for the HiFi graph, coverage (ONT), and ONT alignments are used.
- --outdir: Output directory
- --boundary-nodes <boundary_nodes_file>: used to locate the tangle. boundary_nodes_file should contain tab-separated pairs of incoming and outgoing boundary nodes, one pair per line. The boundary nodes should be non-repetitive and, for 'diploid' tangles, heterozygous. They should completely separate the tangle from the rest of the graph: after their removal, there should be no path in the remaining graph between tangle nodes and any non-tangle node.
Example
For this tangle, a decent choice of boundary nodes would be:
utig1-10326 utig1-2575
utig1-10327 utig1-2574
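The separation requirement on boundary nodes can be checked before running TTT. The sketch below is an illustration under simplifying assumptions, not TTT's own validation: it treats the graph as an undirected adjacency dictionary and ignores node orientation, and all function names are hypothetical. It parses tab-separated boundary pairs and tests whether any tangle node can still reach a non-tangle node once the boundary nodes are removed.

```python
from collections import deque


def parse_boundary_pairs(text):
    """Parse tab-separated (incoming, outgoing) node pairs, one pair per line."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"expected two tab-separated node names, got: {line!r}")
        pairs.append((fields[0], fields[1]))
    return pairs


def reachable_without_boundary(adjacency, boundary, seed):
    """Breadth-first search from seed that never enters a boundary node."""
    blocked = set(boundary)
    if seed in blocked:
        raise ValueError("seed must not be a boundary node")
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in blocked and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def separates_tangle(adjacency, boundary, tangle_nodes):
    """True if, with boundary nodes removed, tangle nodes reach only tangle nodes."""
    tangle = set(tangle_nodes)
    return all(
        reachable_without_boundary(adjacency, boundary, node) <= tangle
        for node in tangle
    )


if __name__ == "__main__":
    # Toy undirected graph: a - in1 - t1 - t2 - out1 - b, plus t1 - t3 - t2.
    toy = {
        "a": ["in1"], "in1": ["a", "t1"],
        "t1": ["in1", "t2", "t3"], "t3": ["t1", "t2"],
        "t2": ["t1", "t3", "out1"], "out1": ["t2", "b"], "b": ["out1"],
    }
    pairs = parse_boundary_pairs("in1\tout1\n")
    boundary = [node for pair in pairs for node in pair]
    print(separates_tangle(toy, boundary, ["t1", "t2", "t3"]))
```

A real check would build the adjacency from the GFA link lines. Since the tool already depends on networkx, the same search could equally be done on a networkx graph.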
TTT writes two files to <outdir>: traversal.multiplicities.csv with the estimated multiplicities of tangle nodes (can be used with Bandage), and traversal.gaf with the resulting path. If the graph .gfa file contained node sequences, it also writes traversal.hpc.fasta with a patch sequence. However, when TTT is combined with verkko, this patch is homopolymer-compressed, since verkko's graph is based on homopolymer-compressed sequences. To get the non-hpc sequence, you need to rerun verkko, providing traversal.gaf with the --path option; see verkko's manual for details.
In verkko up to and including v2.2.1, the coverage of short nodes in tangles in the final graph (assembly.homopolymer-compressed.gfa) is deeply flawed. To get an updated coverage file, we suggest running these additional scripts:
./verkko_coverage_fix/utig4_to_utig1.py <assembly_folder> > utig42utig1.gaf
./verkko_coverage_fix/utig4_coverage_updater.py utig42utig1.gaf <assembly_folder>/assembly.homopolymer-compressed.noseq.gfa <assembly_folder>/2-processGraph/unitig-unrolled-hifi-resolved.ont-coverage.csv > utig4_upt.ont-coverage.csv
and then pass utig4_upt.ont-coverage.csv as --coverage-file to the main script.
Alternatively, you can look up how utig4- nodes map to the utig1- graph in utig42utig1.gaf and run TTT.py on the same tangle in the HiFi-only graph (2-processGraph/unitig-unrolled-hifi-resolved.gfa within the verkko output directory). This usually gives better results and does not require realigning ONT reads to the graph.
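The two coverage-fix commands above can be chained from Python if that is more convenient than the shell. This is a minimal sketch: the script paths, file names, and argument order are copied from the commands above, while the wrapper functions `coverage_fix_commands` and `run_coverage_fix` are hypothetical and not part of TTT. It assumes it is run from the repository root.

```python
import subprocess
from pathlib import Path


def coverage_fix_commands(assembly_folder,
                          mapping="utig42utig1.gaf",
                          updated="utig4_upt.ont-coverage.csv"):
    """Return (command, stdout_file) pairs for the two coverage-fix steps."""
    folder = Path(assembly_folder)
    step1 = ["./verkko_coverage_fix/utig4_to_utig1.py", str(folder)]
    step2 = [
        "./verkko_coverage_fix/utig4_coverage_updater.py",
        mapping,
        str(folder / "assembly.homopolymer-compressed.noseq.gfa"),
        str(folder / "2-processGraph" / "unitig-unrolled-hifi-resolved.ont-coverage.csv"),
    ]
    return [(step1, mapping), (step2, updated)]


def run_coverage_fix(assembly_folder):
    """Run both steps in order, redirecting each script's stdout to its output file."""
    for command, out_path in coverage_fix_commands(assembly_folder):
        with open(out_path, "w") as out:
            # check=True stops the chain if a step fails.
            subprocess.run(command, stdout=out, check=True)
```

After `run_coverage_fix("<assembly_folder>")` finishes, utig4_upt.ont-coverage.csv can be passed as --coverage-file as described above.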



