
Generalizing the stacks API to enable more customization #199

Open · mattwarkentin opened this issue May 3, 2023 · 4 comments
Labels: feature (a feature request or enhancement)

Comments

@mattwarkentin commented May 3, 2023

Hi @simonpcouch,

I love the {stacks} package, and over the last several months I have been thinking about whether there is room to broaden the API to make it more flexible. It seems to me that the current API is opinionated about a few things:

  1. The stacking model must be Ridge/LASSO/ElasticNet
  2. The resampling is done via bootstrapping
  3. The training routine is tune::tune_grid()

I am wondering if there is interest in a function one level lower than blend_predictions() that is more flexible on the three design considerations described above. Most importantly, a more general API for stacking would let users take advantage of the huge breadth of models available through parsnip et al. for stacking predictions (e.g., random forest, XGBoost, etc.). In theory, any model that supports the outcome's mode (regression or classification) would be a candidate for the stacking model.
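For reference, the current high-level flow looks roughly like the following minimal sketch, where `wflow_set_results` stands in for a tuned workflow set:

library(stacks)

# Current high-level API: blend_predictions() fits a regularized
# (glmnet) meta-learner over bootstrap resamples of the data stack
stacks() |>
  add_candidates(wflow_set_results) |>
  blend_predictions() |>
  fit_members()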

Without actually considering the implementation too much, I imagine some function, let's call it stack_predictions() (I don't have a better name off the top of my head), that looks something like:

stack_predictions(
  data_stack,
  model = parsnip::linear_reg(engine = "glmnet", penalty = tune(), mixture = tune()),
  fn = tune::tune_grid,
  resamples = rsample::bootstraps(times = 25),
  control = tune::control_grid(),
  ... # passed on to `fn` (metric, grid, param_info)
)

What do you think? This way the user can control the stacking more finely, and blend_predictions() would be a special case of stack_predictions() that could potentially call this function internally. Then, if you wanted to stack with a random forest, tune with {finetune}, and use 100 Monte Carlo resamples, you could do something like:

stacks() |>
  add_candidates(wflow_set_results) |>
  stack_predictions(
    model = parsnip::rand_forest(...),
    fn = finetune::tune_race_anova,
    resamples = rsample::mc_cv(times = 100)
  )

I have thought about this a few times and figured it was worth going full stream of consciousness and laying it all out for you. Happy to chat and think this through more thoroughly, and, as always, I'm happy to contribute and not just request features.

While I'm thinking about it: in order to support more stacking models, there needs to be a way to define what it means to have a "non-zero stacking coefficient" for models for which coefficients don't really exist (e.g., random forest). Perhaps for tree-based models, a candidate is "non-zero" if its predictions are used for a split in any tree; a rough sketch of that idea follows below, but this requires some more thinking.
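As a purely hypothetical sketch of that heuristic (not part of the stacks API; `meta_fit` and `candidate_cols` are made-up names for a fitted {ranger} meta-learner and the candidate prediction columns):

library(ranger)

# Hypothetical heuristic: treat a candidate's stacking "coefficient" as
# non-zero if at least one tree in the forest splits on its predictions
candidate_is_used <- function(meta_fit, candidate_cols) {
  # Gather every split variable used across all trees in the forest
  split_vars <- unique(unlist(
    lapply(seq_len(meta_fit$num.trees), function(i) {
      ranger::treeInfo(meta_fit, tree = i)$splitvarName
    })
  ))
  candidate_cols %in% split_vars
}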

@simonpcouch (Collaborator) commented

Just wanted to drop a note that I've seen this and appreciate the thorough issue description! Related to #54. We've been benchmarking some variants on this generalization and still have some work to do before we'd feel confident moving forward with an implementation.

simonpcouch added the feature label on Oct 31, 2023

@jrosell commented Feb 15, 2024

Another interesting post on building ensembles manually:
https://www.mm218.dev/posts/2021/01/model-averaging/
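The kind of manual model averaging the post covers amounts to something like this sketch, assuming two fitted regression workflows `fit_a` and `fit_b` and new data `new_df` (all names illustrative):

# Equal-weight average of two members' predictions
preds_a <- predict(fit_a, new_df)$.pred
preds_b <- predict(fit_b, new_df)$.pred
ensemble_pred <- (preds_a + preds_b) / 2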

@JavOrraca commented May 30, 2024

Seconding this feature request! stacks is beautifully fast, but I'd love a native way to build a stacked ensemble from a trained workflowsets object using finetune::tune_race_anova() and finetune::control_race(). 🙏
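Under the hypothetical stack_predictions() sketched above, that might look something like the following (illustrative only, not an existing API):

stacks() |>
  add_candidates(wflow_set_results) |>
  stack_predictions(
    model = parsnip::rand_forest(mode = "regression"),
    fn = finetune::tune_race_anova,
    control = finetune::control_race()
  )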
