
Feature request: change the option "--top_hits_only" to "--top_N_hits_only" #592

@tao-bioinfo

Using --top_hits_only with usearch_global is risky because reference databases contain taxonomic mis-annotations.

For example, if hit A has 99.124% identity and hit B has 99.123%, --top_hits_only keeps only hit A. However, I have frequently found that A is taxonomically mislabelled while B appears to be correct.

Currently, my workaround is to set a low identity threshold such as --id 0.6 to obtain as many hits as possible, and then select the top N hits per query myself. The remaining hits are not needed.
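A minimal sketch of that post-filtering step, for illustration only. It assumes the hits were written to a tab-separated file whose first three columns are query label, target label, and percent identity (a blast6-like layout). The function names `read_hits` and `top_n_hits` and the sample data are hypothetical, not part of any existing tool.

```python
import csv
import io
from collections import defaultdict


def read_hits(handle):
    """Yield (query, target, identity) from a tab-separated hit table.

    Assumption: columns 1-3 are query, target, percent identity.
    """
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) >= 3:
            yield fields[0], fields[1], float(fields[2])


def top_n_hits(hits, n):
    """Keep the n highest-identity hits for each query."""
    by_query = defaultdict(list)
    for query, target, identity in hits:
        by_query[query].append((target, identity))
    selected = {}
    for query, query_hits in by_query.items():
        # Stable sort: hits with equal identity keep their input order.
        query_hits.sort(key=lambda hit: hit[1], reverse=True)
        selected[query] = query_hits[:n]
    return selected


# Hypothetical sample data mirroring the example above.
example = "q1\tA\t99.124\nq1\tB\t99.123\nq1\tC\t97.5\nq2\tD\t88.0\n"
selected = top_n_hits(read_hits(io.StringIO(example)), n=2)
for query, query_hits in selected.items():
    print(query, query_hits)
```

With n=2, both A and B are retained for q1, so a mislabelled top hit can be cross-checked against the runner-up. The drawback is that every hit above the low threshold still has to be computed and written out first.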

It would be helpful to have an option such as --top_N_hits_only N, with the existing --top_hits_only being equivalent to --top_N_hits_only 1.
