Recommended threshold for phrog_model_score filtering in VPF-PLM annotations #14

Thank you for providing this excellent tool for viral protein annotation. I am currently using VPF-PLM to annotate viral ORFs predicted from metagenomic data, and the model is performing well overall.

However, I have a question regarding how to properly interpret and filter the phrog_model_score values when assigning functional categories.

For many proteins, VPF-PLM assigns the expected PHROG category with high confidence (e.g., phrog_model_score > 0.9).
However, I also found a considerable number of cases where both of the following hold:

The protein is independently annotated by other tools (e.g., PHANOTATE/pharokka HMM hits) as a tail, tail fiber, or tail spike protein.

VPF-PLM assigns the same category (phrog_model = tail), but the phrog_model_score is relatively low (e.g., ~0.3–0.5).

This raises the question of how users should set a reasonable score threshold for confidence filtering.
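To make the question concrete, here is a minimal sketch of the kind of filtering I have in mind. It assumes the predictions are available as a tab-separated table with `phrog_model` and `phrog_model_score` columns; the `protein_id` column name, the toy rows, the external labels, and both cutoffs are placeholders of mine, not values taken from VPF-PLM or its documentation.

```python
import csv
import io

# Hypothetical cutoffs used only to illustrate the question; not recommended values.
HIGH_CUTOFF = 0.9
LOW_CUTOFF = 0.3


def confidence_tier(score, high=HIGH_CUTOFF, low=LOW_CUTOFF):
    """Bin a phrog_model_score into a coarse confidence tier."""
    if score >= high:
        return "high"
    if score >= low:
        return "moderate"
    return "low"


def tier_predictions(rows, external):
    """Attach a tier and an agreement flag to each prediction row.

    rows: dicts with 'protein_id', 'phrog_model', 'phrog_model_score'
          ('protein_id' is an assumed column name).
    external: dict mapping protein_id -> category assigned by another tool.
    """
    out = []
    for row in rows:
        score = float(row["phrog_model_score"])
        ext = external.get(row["protein_id"])
        out.append({
            "protein_id": row["protein_id"],
            "phrog_model": row["phrog_model"],
            "score": score,
            "tier": confidence_tier(score),
            "agrees_with_external": ext is not None and ext == row["phrog_model"],
        })
    return out


# Toy table standing in for VPF-PLM output (made-up values).
toy_tsv = (
    "protein_id\tphrog_model\tphrog_model_score\n"
    "orf_1\ttail\t0.95\n"
    "orf_2\ttail\t0.41\n"
    "orf_3\tlysis\t0.12\n"
)
# Toy external labels (e.g., from HMM hits), already mapped to the same category names.
external_labels = {"orf_1": "tail", "orf_2": "tail"}

rows = list(csv.DictReader(io.StringIO(toy_tsv), delimiter="\t"))
results = tier_predictions(rows, external_labels)
for r in results:
    print(r["protein_id"], r["phrog_model"], r["score"], r["tier"], r["agrees_with_external"])
```

In this toy example, `orf_2` is the case I am unsure about: it lands in the "moderate" tier, yet its category matches the independent annotation.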

Questions

1. Is there a recommended cutoff for accepting PHROG functional predictions based on phrog_model_score?
   For example, is 0.7 or 0.9 suggested for high-confidence functional assignment?

2. How should one interpret cases where VPF-PLM assigns the correct PHROG class (e.g., tail), but with a moderate score (e.g., ~0.4)?
   Should these be treated as valid predictions with lower confidence, or should they be filtered out?

Thank you very much for your time and for developing VPF-PLM.

