
Generalizing the stacks API to enable more customization #199

@mattwarkentin

Description

Hi @simonpcouch,

I love the {stacks} package, and over the last several months I have thought quite a bit about whether there is room to broaden the API to make it more flexible. It seems to me that the current API is opinionated about a few things (sketched in code just after this list):

  1. The stacking model must be Ridge/LASSO/ElasticNet
  2. The resampling is done via bootstrapping
  3. The training routine is tune::tune_grid()
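For concreteness, the current behaviour corresponds roughly to the call below (I'm reproducing the signature from memory, so the exact argument names and defaults may be slightly off):

# Approximately what blend_predictions() pins down today: a non-negative
# glmnet meta-learner tuned over `penalty`/`mixture` with tune::tune_grid()
# on `times` bootstrap resamples of the data stack.
blend_predictions(
  data_stack,
  penalty = 10^(-6:-1),
  mixture = 1,
  non_negative = TRUE,
  control = tune::control_grid(),
  times = 25
)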

I am wondering if there is interest in a function one level lower than blend_predictions() that is more flexible about the three design decisions described above. Most importantly, a more general API for stacking would let users take advantage of the huge breadth of models available through parsnip et al. for stacking predictions (e.g., random forest, XGBoost, etc.). In theory, any model that supports the relevant mode (regression or classification) would be a candidate for the stacking model.

Without considering the implementation too much, I imagine some function, call it stack_predictions() (I don't have a better name off the top of my head), that looks something like this:

stack_predictions(
  data_stack,
  model = parsnip::linear_reg(engine = "glmnet", penalty = tune(), mixture = tune()),
  fn = tune::tune_grid,
  resamples = rsample::bootstraps(times = 25),
  control = tune::control_grid(),
  ... # passed on to `fn` (metric, grid, param_info)
)

What do you think? This way the user could control the stacking more finely, and blend_predictions() would become a special case of stack_predictions(), potentially calling it internally (a sketch of that wrapper follows the example below). So if you wanted to stack with a random forest, tune with {finetune}, and use 100 Monte Carlo resamples, you could do something like:

stacks() |>
  add_candidates(wflow_set_results) |>
  stack_predictions(
    model = parsnip::rand_forest(...),
    fn = finetune::tune_race_anova,
    resamples = rsample::mc_cv(times = 100)
  )
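To make the "special case" point concrete, here is a minimal sketch of how blend_predictions() might reduce to the proposed stack_predictions(). The argument names are borrowed from the current signature, and everything here is illustrative rather than a concrete implementation plan:

# Illustrative only: blend_predictions() as a thin wrapper that preserves
# today's defaults while delegating to the proposed stack_predictions().
blend_predictions <- function(data_stack,
                              penalty = 10^(-6:-1),
                              mixture = 1,
                              times = 25,
                              control = tune::control_grid(),
                              ...) {
  stack_predictions(
    data_stack,
    model = parsnip::linear_reg(engine = "glmnet", penalty = tune(), mixture = tune()),
    fn = tune::tune_grid,
    resamples = rsample::bootstraps(times = times),
    control = control,
    # grid of penalty/mixture values, forwarded to `fn` along with `...`
    grid = expand.grid(penalty = penalty, mixture = mixture),
    ...
  )
}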

I have mulled this over a few times and figured it was worth going full stream of consciousness and laying it all out for you. Happy to chat and think this through more thoroughly, and, as always, I'm happy to contribute rather than just request features.

While I'm thinking about it: in order to support more stacking models, there needs to be a way to define what a "non-zero stacking coefficient" means for models where coefficients don't really exist (e.g., random forests). Perhaps for tree-based models a candidate is "non-zero" if its predictions are used for a split in any tree; this needs some more thought.
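One rough heuristic, purely as a sketch: for a forest-based meta-learner, treat a candidate as "non-zero" if it was used in at least one split, which (for a {ranger} fit with importance = "impurity") shows up as positive importance. The helper name below is hypothetical:

# Rough sketch, assuming the stacking model is a ranger forest fit with
# importance = "impurity": a candidate counts as "retained" if its column
# carries positive importance, i.e. it was used in at least one split.
candidate_retained <- function(rf_fit, candidate_cols) {
  imp <- ranger::importance(rf_fit)
  candidate_cols %in% names(imp[imp > 0])
}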
