@tibvdm tibvdm commented Dec 3, 2025

Remove duplicate taxa before calculating the LCA. This will also remove the duplicate taxa when reporting all taxa.


Copilot AI left a comment

Pull request overview

This PR adds deduplication of taxa before calculating the Lowest Common Ancestor (LCA) and when reporting all taxa in the pept2data endpoint. The change ensures that duplicate taxon IDs from multiple proteins are removed, which should improve both the accuracy of LCA calculations and the clarity of reported results.

Key Changes:

  • Added itertools::Itertools import to enable the unique() method
  • Applied .unique() to the taxa iterator before collecting into a vector, removing duplicate taxon IDs while preserving the order of first occurrences
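
As an illustration of the order-preserving deduplication described above, here is a minimal sketch that uses only the standard library. It is not the code from this PR, which relies on itertools' `unique()`. The function name `dedup_taxa` and the `u32` taxon ID type are assumptions made for the example.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not from the PR): drop repeated taxon IDs,
/// keeping only the first occurrence of each and preserving input order.
fn dedup_taxa(taxa: impl IntoIterator<Item = u32>) -> Vec<u32> {
    let mut seen = HashSet::new();
    taxa.into_iter()
        // `HashSet::insert` returns false if the ID was already present,
        // so repeated IDs are filtered out here.
        .filter(|id| seen.insert(*id))
        .collect()
}
```

Calling `dedup_taxa(vec![9606, 562, 9606, 10090, 562])` would yield `[9606, 562, 10090]`, so a taxon that appears for several proteins is passed on only once.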

@pverscha pverscha merged commit 6fa195b into develop Dec 3, 2025
7 checks passed
@pverscha pverscha deleted the fix/report-taxa-duplicates branch December 3, 2025 10:42
3 participants