[mergeMSTs] Problems with mst and query #24

Open
@eseiler

Hey there,

While using the mergeMSTs branch, I ran into some trouble with mst and query.

mst

mantis mst doesn't seem to work.

It tries to load the eqclass_rr.cls files:

mantis/src/mst.cc

Lines 33 to 34 in 7406e8f

eqclass_files =
    mantis::fs::GetFilesExt(prefix.c_str(), mantis::EQCLASS_FILE);

This will later lead to a segmentation fault because the files do not exist.
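A guard right after the file lookup could turn the crash into a readable error. The following is only a minimal sketch: `requireFiles` is a hypothetical helper, not part of mantis, and the file list is assumed to be a `std::vector<std::string>`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (NOT part of mantis): reject an empty file list
// up front instead of failing later when the data is accessed.
inline const std::vector<std::string>& requireFiles(
    const std::vector<std::string>& files, const std::string& what) {
  if (files.empty()) {
    throw std::runtime_error("No " + what + " files found under the given prefix");
  }
  return files;
}
```

In mst.cc, such a check could wrap the result of `GetFilesExt` before the files are opened, assuming that an exception (or an explicit exit) is acceptable at that point.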

mantis build will always delete eqclass_rr.cls files at the end:

mantis/src/mst.cc

Lines 729 to 737 in 7406e8f

if (opt.remove_colorClasses && !opt.keep_colorclasses) {
  for (auto &f : mantis::fs::GetFilesExt(opt.prefix.c_str(), mantis::EQCLASS_FILE)) {
    std::cerr << f.c_str() << "\n";
    if (std::remove(f.c_str()) != 0) {
      std::cerr << "Unable to delete file " << f << "\n";
      std::exit(1);
    }
  }
}

mantis build has no option to toggle this behavior.
Changing qopt.remove_colorClasses = true; to qopt.remove_colorClasses = false; in the following line fixes the issue:

qopt.prefix = bopt.out;
qopt.numThreads = bopt.numthreads;
qopt.remove_colorClasses = true;

query

The default non-bulk query only works if the eqclass_rr.cls files are present and -1 is used:

mantis query -1 -k 20 -p index/ reads.fasta

To have eqclass_rr.cls files, the above fix is needed, and mst must have been run with -k.

Alternatively, bulk mode (-b) works without the eqclass_rr.cls files, so mst can also be run with -d.

mantis query -b -k 20 -p index/ reads.fasta

The problem in non-bulk query seems to be that findSamples is called for every query sequence:

mantis/src/mstQuery.cc

Lines 492 to 498 in 7406e8f

while (ipfile >> read) {
  mstQuery.reset();
  mstQuery.parseKmers(numOfQueries, read, indexK);
  mstQuery.findSamples(cdbg, cache_lru, &rs, queryStats, 1);
  output_results(mstQuery, opfile, sampleNames, queryStats, 1);
  numOfQueries++;
}

The function then accesses cdbg.get_current_cqf()->keybits():

uint64_t ksize{cdbg.get_current_cqf()->keybits()}, numBlocks{cdbg.get_numBlocks()};

This works fine for the first query, but for the second one there is no CQF to access because it has been replaced with an invalid one:

cdbg.replaceCQFInMemory(invalid);
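The pattern can be reproduced in isolation with a toy model. This is a sketch under stated assumptions, not the mantis code: `ToyFilter`, `ToyIndex`, and `findSamplesToy` are hypothetical stand-ins, and the "invalid" CQF is modeled as a null pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Toy stand-ins (NOT the mantis classes).
struct ToyFilter {
  uint64_t keybits;
};

struct ToyIndex {
  std::shared_ptr<ToyFilter> current;  // plays the role of the current CQF
  void replaceInMemory(std::shared_ptr<ToyFilter> f) { current = std::move(f); }
};

// Mirrors the reported call pattern: read keybits from the current filter,
// then swap in an invalid (null) one before returning.
// Returns false instead of crashing when no filter is loaded.
bool findSamplesToy(ToyIndex& idx, uint64_t& ksizeOut) {
  if (!idx.current) {
    return false;  // a second call ends up here
  }
  ksizeOut = idx.current->keybits;
  idx.replaceInMemory(nullptr);
  return true;
}
```

Calling `findSamplesToy` twice on the same `ToyIndex` succeeds the first time and fails the second time, which matches the behavior seen when the loop invokes findSamples once per read.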

I tried loading block 0 at the beginning of findSamples and just passing the keybits as an extra parameter.
But then an out-of-bounds access occurs at:

allQueries[q][numSamples]++;
