Hi, it's very clever that PiML's scored testing doesn't require the actual model object.
Context:
I have the input features, the target response data, and the corresponding model predictions, all for a particular dataset and task.
The problem is that the features (one text, the rest non-text) are transformed into learned embeddings inside the original model: the text feature via Hugging Face fine-tuning, and the non-text features via an FT-Transformer. The original model then brings both sets of embeddings together to make predictions via a fusion MLP (three components in total).
I am not concerned about the text feature, and I figured I can pass in the tuned n-dimensional representation of the text feature as n columns (because the predicted probabilities factor in the full input X), and then apply scored testing to all the original non-text features as-is; a minimal sketch of this assembly follows.
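For concreteness, here is a minimal sketch of the assembly I have in mind. All names, shapes, and the random stand-in data below are hypothetical placeholders for the real artifacts; the actual scored-test call is left to the PiML docs.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_samples, n_dims = 1000, 8

# Hypothetical stand-ins for the real artifacts described above.
text_emb = rng.normal(size=(n_samples, n_dims))    # tuned text embedding
df_tabular = pd.DataFrame({                        # raw (untransformed) non-text features
    "age": rng.integers(18, 90, size=n_samples),
    "balance": rng.normal(5000.0, 2000.0, size=n_samples),
})
y = rng.integers(0, 2, size=n_samples)             # target response
prob = rng.uniform(size=n_samples)                 # model's P(y=1 | full X)

# Spread the n-dimensional text embedding into n named columns and keep
# the non-text features in their original form.
df_text = pd.DataFrame(text_emb, columns=[f"text_emb_{i}" for i in range(n_dims)])
X_scored = pd.concat([df_text, df_tabular], axis=1)

# (X_scored, y, prob) is now the (features, target, score) triple that
# scored testing consumes; check the PiML docs for the exact function
# names and signatures before running the tests.
print(X_scored.shape, y.shape, prob.shape)
```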
1. Not all of the features start off as embeddings, but the final predictions and probabilities are computed from embeddings, and my non-text features are also treated as embeddings by the original model. Am I at risk of misleading results?
2. I would sincerely appreciate it if you could share your thoughts on how to apply scored testing here, and PiML in general.
@ajzhanghk @ZebinYang @simoncos @CnBDM-Su