
Recommended threshold for phrog_model_score filtering in VPF-PLM annotations #14

@xhuang495


Thank you for providing this excellent tool for viral protein annotation. I am currently using VPF-PLM to annotate viral ORFs predicted from metagenomic data, and the model is performing well overall.

However, I have a question regarding how to properly interpret and filter the phrog_model_score values when assigning functional categories.

For many proteins, VPF-PLM assigns accurate PHROG categories with high confidence (e.g., phrog_model_score > 0.9).
However, I also found a considerable number of cases where both of the following hold:

- The protein is independently annotated by other tools (e.g., PHANOTATE/pharokka HMM hits) as a tail, tail fiber, or tail spike protein.
- VPF-PLM assigns the same category (phrog_model = tail), but the phrog_model_score is relatively low (e.g., ~0.3–0.5).
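
A minimal sketch of how such concordant but low-scoring cases could be flagged. It assumes the VPF-PLM output and the independent annotation have already been merged into one record per protein. The `protein_id` and `external_category` fields, the toy records, and the 0.3–0.5 window are assumptions made for illustration, not part of the VPF-PLM output format.

```python
# Toy records for illustration only. Field names other than phrog_model and
# phrog_model_score are hypothetical, as are the values.
records = [
    {"protein_id": "orf_001", "phrog_model": "tail", "phrog_model_score": 0.95, "external_category": "tail"},
    {"protein_id": "orf_002", "phrog_model": "tail", "phrog_model_score": 0.41, "external_category": "tail"},
    {"protein_id": "orf_003", "phrog_model": "lysis", "phrog_model_score": 0.35, "external_category": "tail"},
]


def concordant_low_score(rows, low=0.3, high=0.5):
    """Return rows where both annotations agree and the score lies in [low, high]."""
    return [
        r for r in rows
        if r["phrog_model"] == r["external_category"]
        and low <= r["phrog_model_score"] <= high
    ]


flagged = concordant_low_score(records)
print([r["protein_id"] for r in flagged])
```

Counting how many proteins land in this group, per category, would show whether the low scores are specific to tail proteins or a general pattern.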

This raises the question of how users should set a reasonable score threshold for confidence filtering.

Questions

1. Is there a recommended cutoff for accepting PHROG functional predictions based on phrog_model_score?
   For example, is 0.7 or 0.9 suggested for high-confidence functional assignment?

2. How should one interpret cases where VPF-PLM assigns the correct PHROG class (e.g., tail), but with a moderate score (e.g., ~0.4)?
   Should these be treated as valid predictions with lower confidence, or should they be filtered out?
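
To make the second option concrete, here is a minimal sketch of keeping every prediction and attaching a coarse confidence label instead of discarding low scores. The cutoffs 0.9 and 0.7 are simply the values mentioned in the question, used as placeholders, not recommended thresholds. The function name `confidence_tier` is hypothetical.

```python
def confidence_tier(score, high=0.9, medium=0.7):
    """Map a phrog_model_score to a coarse label.

    The default cutoffs are placeholders taken from the question above,
    not values recommended by the VPF-PLM authors.
    """
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


# Example scores for illustration only.
scores = [0.95, 0.72, 0.40]
tiers = [confidence_tier(s) for s in scores]
print(tiers)
```

With labels like these, a moderate-score tail prediction (~0.4) would be kept as "low" rather than dropped, and downstream analyses could choose which tiers to include.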

Thank you very much for your time and for developing VPF-PLM.
