Closed
Labels: api, design discussion (Discussing design issues), enhancement (New feature or request)
Description
This follows a discussion on Slack and @ablaom's suggestion to open an issue to brainstorm ideas.
The problem
Consider a machine learning model with p-dimensional features X. Suppose that for each feature j the analyst also has access to external information Z(j), and that a model could use this information to improve predictive performance. A concrete example is a categorical Z(j), which partitions the features into groups of related features.
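To make the categorical case concrete, here is a minimal, language-agnostic sketch (written in Python, not MLJ code) of how one label per feature induces a partition of the feature indices. The function name `groups_from_labels` and the example labels are hypothetical and chosen only for illustration; indices are 0-based.

```python
from collections import defaultdict


def groups_from_labels(labels):
    """Map each distinct label Z(j) to the list of feature indices j that carry it."""
    groups = defaultdict(list)
    for j, label in enumerate(labels):
        groups[label].append(j)
    return dict(groups)


# Hypothetical side information for p = 5 features (one label per feature).
z = ["clinical", "gene", "gene", "clinical", "methylation"]
partition = groups_from_labels(z)
print(partition)
# {'clinical': [0, 3], 'gene': [1, 2], 'methylation': [4]}
```

Every feature lands in exactly one group, so a model that understands grouping only needs either the label vector `z` or the derived `partition`; which of the two an API should accept is one of the design questions here.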
How could the MLJ API account for that?
Example use cases for grouping structure
- Such information is quite frequently used in the context of supervised methods with linear predictors, with the most common method being the (sparse) Group Lasso. This is also the use case I am interested in.
- Grouping comes up less often in the supervised non-linear setting. One exception is https://link.springer.com/article/10.1186/s12859-017-1993-1, which studies Random Forests in which the probability that a feature enters the candidate set for a split depends on the group it belongs to (so features from some groups are prioritized).
- Grouping structure comes up frequently in the unsupervised setting. For example, https://www.embopress.org/doi/full/10.15252/msb.20178124 describes a method that is popular in the computational biology community. In the machine learning community, such problems sometimes appear under the name multi-view learning (e.g., https://arxiv.org/pdf/1604.04939.pdf).
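For the Group Lasso use case, the sketch below shows why the model needs the grouping at fit time: both the penalty and its proximal step operate on whole groups of coefficients. This is an illustration in plain Python, not MLJ code and not a full solver. It assumes the common convention of weighting each group by the square root of its size and covers only the non-sparse Group Lasso penalty; the function names and example numbers are hypothetical.

```python
import math


def group_lasso_penalty(beta, groups, lam):
    """Return lam * sum over groups g of sqrt(|g|) * ||beta_g||_2 (assumed weighting)."""
    total = 0.0
    for idx in groups.values():
        norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        total += math.sqrt(len(idx)) * norm
    return lam * total


def block_soft_threshold(beta, groups, lam, step=1.0):
    """Proximal step for the penalty above: shrink each group's Euclidean norm
    by step * lam * sqrt(|g|), setting the whole group to zero if the norm is smaller."""
    out = list(beta)
    for idx in groups.values():
        norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        thresh = step * lam * math.sqrt(len(idx))
        scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
        for j in idx:
            out[j] = scale * beta[j]
    return out


# Hypothetical coefficients for p = 4 features in two groups of size 2.
groups = {"a": [0, 1], "b": [2, 3]}
beta = [3.0, 4.0, 0.1, -0.1]
shrunk = block_soft_threshold(beta, groups, lam=1.0)
print(shrunk)  # group "a" is shrunk toward zero, group "b" is zeroed as a block
```

The point for the API discussion is that `groups` is neither a hyperparameter in the usual sense nor part of X or y: it is metadata about the columns of X that has to reach the model alongside the data.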