fix(pd-converter): Remove QuanInfo column after cleaning up PD data #106
Motivation and Context
https://groups.google.com/g/msstats/c/8OrKxfxMxOo - The PD converter does not remove the QuanInfo column from the input data frame. QuanInfo is a PD-specific column and is not referenced anywhere else in the code. However, this column causes problems when summarizing multiple PSMs:
Workflow:
1. Multiple PSMs are summarized with max. If duplicate PSMs have equal max intensities, we cannot determine which PSM to use to summarize a feature.
2. When QuanInfo is included, the code may mistake two duplicate PSMs as being associated with different features, leading to duplicate PSMs remaining in the input data.
3. proteinSummarization/dataProcess crashes when there are multiple PSMs of the same feature.
I've determined the best solution is to remove QuanInfo from the input data frame during conversion.
Changes
Remove the QuanInfo column from the PD input table after cleanup, since it is no longer needed.
Testing
Fixed existing unit tests
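For reviewers who want to see the failure mode from Motivation and Context in isolation, here is a minimal sketch in plain Python. It is not the converter's code: the rows, the column names other than QuanInfo, the QuanInfo values, and the helpers drop_column and deduplicate are all hypothetical. It assumes the two duplicate PSMs differ only in QuanInfo.

```python
# Hypothetical PSM rows: identical except for QuanInfo.
# All column names except QuanInfo, and all values, are made up for illustration.
rows = [
    {"Protein": "P1", "Peptide": "PEPTIDEK", "Charge": 2,
     "Run": "r1", "Intensity": 100.0, "QuanInfo": "A"},
    {"Protein": "P1", "Peptide": "PEPTIDEK", "Charge": 2,
     "Run": "r1", "Intensity": 100.0, "QuanInfo": "B"},
]


def drop_column(rows, name):
    """Return copies of the rows without the given column."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]


def deduplicate(rows):
    """Keep the first occurrence of each fully identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # every remaining column is part of the key
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# With QuanInfo present, the two rows look distinct and both survive.
with_quaninfo = deduplicate(rows)

# With QuanInfo removed first, the rows are identical and collapse to one.
without_quaninfo = deduplicate(drop_column(rows, "QuanInfo"))

print(len(with_quaninfo))
print(len(without_quaninfo))
```

With the extra column in place both rows survive deduplication, which is the situation where a max summary finds two PSMs with equal intensity for one feature. Dropping the column first leaves a single row.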
Checklist Before Requesting a Review