Description
Thank you for providing this excellent tool for viral protein annotation. I am currently using VPF-PLM to annotate viral ORFs predicted from metagenomic data, and the model is performing well overall.
However, I have a question regarding how to properly interpret and filter the phrog_model_score values when assigning functional categories.
For many proteins, VPF-PLM assigns accurate PHROG categories with high confidence (e.g., phrog_model_score > 0.9).
However, I also found a considerable number of cases where:
The protein is independently annotated by other tools (e.g., PHANOTATE/pharokka HMM hits) as a tail / tail fiber / tail spike protein.
VPF-PLM assigns the same category (phrog_model = tail), but the phrog_model_score is relatively low (e.g., ~0.3–0.5).
This raises the question of how users should set a reasonable score threshold for confidence filtering.
Questions
Is there a recommended cutoff for accepting PHROG functional predictions based on phrog_model_score?
For example, is 0.7 or 0.9 suggested for high-confidence functional assignment?
How should one interpret cases where VPF-PLM assigns the correct PHROG class (e.g., tail), but with a moderate score (e.g., ~0.4)?
Should these be treated as valid predictions with lower confidence, or should they be filtered out?
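To make the question concrete, here is a minimal sketch of the kind of tiered filtering I have in mind. The column names phrog_model and phrog_model_score are taken from the output described above; the tab-separated layout, the protein_id column, the cutoff values, and the helper names are all assumptions for illustration, not recommended settings or part of VPF-PLM.

```python
import csv
import io

# Hypothetical cutoffs for illustration only; the right values are what this issue asks about.
HIGH_CUTOFF = 0.9
LOW_CUTOFF = 0.3


def confidence_tier(score, high=HIGH_CUTOFF, low=LOW_CUTOFF):
    """Map a phrog_model_score to a coarse confidence label."""
    if score >= high:
        return "high"
    if score >= low:
        return "moderate"
    return "low"


def tier_predictions(handle, external=None):
    """Read a tab-separated prediction table and label each row.

    `external` is an optional dict of protein_id -> category from another
    tool (e.g. pharokka), used to flag rows where both tools agree.
    """
    external = external or {}
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        score = float(row["phrog_model_score"])
        row["tier"] = confidence_tier(score)
        row["external_agrees"] = external.get(row["protein_id"]) == row["phrog_model"]
        rows.append(row)
    return rows


# Made-up rows; the assumed column layout may differ from the real output file.
example = io.StringIO(
    "protein_id\tphrog_model\tphrog_model_score\n"
    "orf_1\ttail\t0.95\n"
    "orf_2\ttail\t0.41\n"
    "orf_3\tlysis\t0.12\n"
)
for r in tier_predictions(example, external={"orf_2": "tail"}):
    print(r["protein_id"], r["phrog_model"], r["tier"], r["external_agrees"])
```

With a scheme like this, a moderate-score tail prediction that agrees with an independent annotation (orf_2 above) would be kept but flagged, rather than discarded by a single hard cutoff.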
Thank you very much for your time and for developing VPF-PLM.