Simpler stacking of model specs or workflows without considering metrics #685

brunocarlin · 2022-03-17T18:30:38Z

I don't know the best way to do it, but the basic idea would be to define a set of model specs as a named list, plus a final spec for the model that combines all of their predictions.

```r
library(tidymodels)

data(ames)

set.seed(4595)
data_split <- initial_split(ames, strata = "Sale_Price", prop = 0.75)

ames_train <- training(data_split)
ames_test  <- testing(data_split)

# Base learners
rf_spec <- rand_forest(mode = "regression")

boost_spec <-
  boost_tree() |>
  set_mode("regression")

# Final learner that combines the base predictions
final_spec <-
  linear_reg(penalty = 0.001, mixture = 0.5) |>
  set_engine("glmnet")
```

Then there would be a function that receives the list of base specs and the final model:

```r
list_base_specs <- list(random_forest = rf_spec, xgboost = boost_spec)

# simpler_stacking() is the proposed function; it does not exist yet
simple_stack <- simpler_stacking(base_components = list_base_specs, final_component = final_spec)
```

This final object would be the 'spec' responsible for keeping track of the fits of all the base models, so you could add it to a workflow or simply fit it directly:

```r
simple_stack |> fit(Sale_Price ~ ., data = ames_train)
```

This simpler approach to stacking would be especially nice for 'embedding' sparse variable spaces with complex models and then getting regularized predictions from a simpler model as the final_component. It may be better to force all the components to be workflows, but that would make it impossible to work with censored models... Alternatively, we could add recipes support for survival data to solve that problem. I am very open to helping with PRs, but I don't know whether you like the idea or how best to implement it.
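A minimal sketch of the proposed behaviour, written outside tidymodels itself. It assumes that "combining the predictions" means classic out-of-fold stacking: each base spec is fit on the analysis set of every fold and predicts that fold's assessment set, and the final spec is then fit on those out-of-fold predictions. `simpler_stacking_fit()` and the `simple_stack` class are hypothetical names, not an existing tidymodels API, and the `lm`/`rpart` engines stand in for ranger, xgboost and glmnet so the example runs quickly on `mtcars`. No yardstick metric is computed at any point, which is what the request is about.

```r
library(parsnip)
library(rsample)

# HYPOTHETICAL helper, not part of tidymodels: metric-free stacking sketch.
# Out-of-fold predictions from each base spec become the predictors of
# the final spec.
simpler_stacking_fit <- function(base_components, final_component,
                                 formula, data, v = 5) {
  outcome <- all.vars(formula)[1]
  folds <- vfold_cv(data, v = v)

  # One column per base spec, one row per training observation
  oof <- matrix(NA_real_, nrow = nrow(data), ncol = length(base_components),
                dimnames = list(NULL, names(base_components)))
  for (split in folds$splits) {
    held_out <- complement(split)  # row indices of the assessment set
    for (nm in names(base_components)) {
      base_fit <- fit(base_components[[nm]], formula, data = analysis(split))
      oof[held_out, nm] <- predict(base_fit, new_data = assessment(split))$.pred
    }
  }

  # Fit the final spec on the out-of-fold predictions
  meta_data <- as.data.frame(oof)
  meta_data[[outcome]] <- data[[outcome]]
  final_fit <- fit(final_component, as.formula(paste(outcome, "~ .")),
                   data = meta_data)

  # Refit every base spec on all the data for use at prediction time
  base_fits <- lapply(base_components, fit, formula = formula, data = data)

  structure(list(base_fits = base_fits, final_fit = final_fit),
            class = "simple_stack")
}

# Predict by feeding the base-model predictions into the final model
predict.simple_stack <- function(object, new_data, ...) {
  base_preds <- as.data.frame(lapply(object$base_fits, function(f) {
    predict(f, new_data = new_data)$.pred
  }))
  predict(object$final_fit, new_data = base_preds)
}

# Usage: two base specs and a linear final spec
set.seed(123)
stack <- simpler_stacking_fit(
  base_components = list(linear = linear_reg(),
                         tree = decision_tree(mode = "regression")),
  final_component = linear_reg(),
  formula = mpg ~ ., data = mtcars
)
preds <- predict(stack, new_data = head(mtcars))
preds
```

Because the object only stores parsnip fits, the same pattern would in principle accept censored regression specs as long as their `predict()` returns a single numeric column; that part is untested here.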


EmilHvitfeldt · 2022-03-17T18:47:43Z

I have two quick comments:

1. Have you looked at the {stacks} package? https://stacks.tidymodels.org/ It provides a nice interface for ensemble methods.
2. We are planning to add proper workflow support for {censored} models, but we haven't gotten to that part yet. There are ways to get around some of the current limitations, as outlined in "workflows::add_formula doesn't work with censored" (censored#159).

brunocarlin · 2022-03-17T18:52:20Z

Yes, I have. The problem is that stacks checks for error metrics on the base workflows, and those don't exist yet for survival methods. There also aren't currently any methods to blend_predictions() on a censored regression.
I will look into that workaround; it may solve the problem by itself. Nice! If I can use workflow_map(), I can make the underlying code far less complex!

EmilHvitfeldt · 2022-03-17T18:58:44Z

Hopefully you can get it working! At first glance, I don't think we will implement this feature, since it tries to deal with a problem that will eventually be fixed once {censored} becomes mature enough and gets full support from the other core packages.

brunocarlin · 2022-03-17T19:46:46Z

Yeah, I still fail at check_metrics(). Would fixing this probably involve changing the check_metrics() function in tune and creating a censored metric in yardstick?

```r
library(tidymodels)
#> Warning: package 'tidymodels' was built under R version 4.1.3
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
#> Warning: package 'broom' was built under R version 4.1.3
#> Warning: package 'dials' was built under R version 4.1.3
#> Warning: package 'dplyr' was built under R version 4.1.3
#> Warning: package 'infer' was built under R version 4.1.3
#> Warning: package 'modeldata' was built under R version 4.1.3
#> Warning: package 'parsnip' was built under R version 4.1.3
#> Warning: package 'recipes' was built under R version 4.1.3
#> Warning: package 'rsample' was built under R version 4.1.3
#> Warning: package 'tidyr' was built under R version 4.1.3
#> Warning: package 'tune' was built under R version 4.1.3
#> Warning: package 'workflows' was built under R version 4.1.3
#> Warning: package 'yardstick' was built under R version 4.1.3
library(censored)
#> Loading required package: survival
#> Warning: package 'survival' was built under R version 4.1.3
library(stacks)
#> Warning: package 'stacks' was built under R version 4.1.3

ph_spec <-
  proportional_hazards() %>%
  set_engine("survival") %>%
  set_mode("censored regression")

ph_workflow <- workflow() %>%
  add_variables(outcomes = c(time, status),
                predictors = everything()) %>%
  add_model(ph_spec, formula = Surv(time, status) ~ .)

ph_workflow |> tune_grid(resamples = vfold_cv(lung))
#> Error in `check_metrics()`:
#> ! Unknown `mode` for parsnip model.
```

<sup>Created on 2022-03-17 by the reprex package (v2.0.1)</sup>

simonpcouch · 2024-04-04T14:00:55Z

Closing in favor of tidymodels/stacks#199 and tidymodels/stacks#152. tidymodels/stacks#199 and a model specification from the bespoke extension package would do the trick for you, I think. Depending on your application, modeltime.ensemble could be another helpful tool.

github-actions · 2024-04-19T00:51:12Z

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

simonpcouch closed this as completed Apr 4, 2024

github-actions bot locked and limited conversation to collaborators Apr 19, 2024
