Description
When using Enzeptional for enzyme sequence optimization with the XGBoost model, the predicted Kcat values in the output include both low positive numbers (e.g., 0.499) and negative values (e.g., -0.874), which appear physically invalid for Kcat measurements.
From the framework documentation and previous papers, I understand that:
- The training process uses logarithmic transformation on Kcat data to improve linearity
- The scaler.pkl file is applied to reverse transformations during prediction
- However, it's unclear whether the final scores from SequenceScorer are already in the original scale or if manual conversion (e.g., 10^x) is required
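To make the ambiguity concrete, here is a minimal sketch of a log-then-standardize pipeline and its inverse. It is an illustration under assumptions, not Enzeptional's actual code: the log base (10), the standardization step, the sample values, and the helper names `forward` and `inverse` are all hypothetical, and the real contents of scaler.pkl may differ.

```python
import math
import statistics

# Hypothetical Kcat values in 1/s, for illustration only (not real data).
kcat = [0.13, 3.2, 45.0, 800.0]

# Assumed step 1: log10 transform of the raw target.
log_kcat = [math.log10(v) for v in kcat]

# Assumed step 2: standardize the log values (zero mean, unit variance).
mean = statistics.fmean(log_kcat)
std = statistics.pstdev(log_kcat)


def forward(value):
    """Raw Kcat -> scaled log10 score (what a model might be trained on)."""
    return (math.log10(value) - mean) / std


def inverse(score):
    """Scaled log10 score -> raw Kcat (undo scaling, then undo the log)."""
    return 10 ** (score * std + mean)


for v in kcat:
    s = forward(v)
    print(f"kcat={v:8.2f}  scaled score={s:+.3f}  recovered={inverse(s):8.2f}")
```

In this sketch a negative score is not an error: any Kcat below 1 has a negative log10, and after standardization any value below the training mean is negative too. Whether the scores reported by SequenceScorer correspond to the scaled score, the plain log value, or the recovered value is exactly what needs confirming.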
Questions
- Scale Interpretation: Are the Kcat values in the optimization output already converted back to the original (non-logarithmic) scale, or do they require additional transformation?
- Negative Values: How should negative Kcat values be interpreted? Do they indicate:
  - Invalid sequences?
  - Pipeline errors?
  - Artifacts of the model prediction range?
  - Values below a certain detection threshold?
- Expected Value Range: What is the expected valid range for Kcat predictions, and how should outliers or physically impossible values be handled?
Context
Example output values observed:
- Positive but low: 0.499, 0.123, 0.876
- Negative: -0.874, -1.234, -0.567
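For reference, this is what the values above would become if they were log-scale scores. This is only a sketch of the two hypotheses I am considering (base 10 and natural log); the helper `back_transform` is my own and not part of GT4SD, and I do not know which hypothesis, if either, is correct.

```python
import math

# Scores copied from the optimization output listed above.
observed = [0.499, 0.123, 0.876, -0.874, -1.234, -0.567]


def back_transform(score, base=10.0):
    """Undo a log transform of the given base (the base is an assumption)."""
    return base ** score


# Hypothesis A: scores are log10(Kcat).
as_log10 = [back_transform(s) for s in observed]
# Hypothesis B: scores are ln(Kcat).
as_ln = [back_transform(s, math.e) for s in observed]

for s, a, b in zip(observed, as_log10, as_ln):
    print(f"score={s:+.3f}  if log10: {a:.3f}  if ln: {b:.3f}")
```

Under either hypothesis every back-transformed value is positive, and the negative scores map to Kcat values below 1. If the scores are instead already on the original scale, the negative values remain unexplained.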
Additional Information
- Using Enzeptional via GT4SD library
- XGBoost model for Kcat prediction
- Following the example from: examples/enzeptional/example_enzeptional.py
Any clarification on the proper interpretation of these output values would be greatly appreciated for correct analysis of optimization results.