Simpler stacking of model specs or workflows without considering metrics #685

Closed
brunocarlin opened this issue Mar 17, 2022 · 6 comments

@brunocarlin

I don't know the best way to do it, but the basic idea would be to define a set of model specs as a named list, plus a final spec: the model that combines all the base predictions.

library(tidymodels)

data(ames)

set.seed(4595)
data_split <- initial_split(ames, strata = "Sale_Price", prop = 0.75)

ames_train <- training(data_split)
ames_test  <- testing(data_split)


rf_spec <- rand_forest(mode = "regression")


boost_spec <- 
  boost_tree() |> 
  set_mode('regression')

final_spec <- 
  linear_reg(penalty = 0.001, mixture = 0.5) %>% 
  set_engine("glmnet")

Then there would be a function that receives the list of base specs and the finalizing model:

list_base_specs <- list(random_forest = rf_spec, xgboost = boost_spec)

simple_stack <- simpler_stacking(base_components = list_base_specs, final_component = final_spec)

This final model would be the 'spec' responsible for keeping track of the fits of all base models, so you could add it to a workflow or use it in a simple model fit:

simple_stack |> fit(Sale_Price ~ ., data = ames_train)

This simpler approach to stacking would be especially nice for 'embedding' sparse variable spaces with complex models and getting regularized predictions with simpler models as the final_component. It may be better to force all components to be workflows, but that would make it impossible to work with censored. Alternatively, we could add recipes for survival data to solve that problem. I am very open to helping with PRs, but I don't know whether you like the idea or how best to implement it.
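
For readers following the proposal, here is a minimal sketch of what such a helper might do at fit time. It is not an existing parsnip or stacks API: `simple_stack_fit()` and `simple_stack_predict()` are hypothetical names, and the example uses `mtcars` with `lm` and `rpart` engines only to keep dependencies small. It also assumes a regression outcome and, unlike {stacks}, trains the final model on in-sample base predictions, which can be overly optimistic.

```r
library(parsnip)

# Hypothetical helper (not part of parsnip or stacks): fit every base spec,
# then fit the final spec on the base models' predictions.
simple_stack_fit <- function(base_components, final_component, formula, data) {
  outcome <- all.vars(formula)[1]
  # Fit each base spec on the same formula and data
  base_fits <- lapply(base_components, function(spec) fit(spec, formula, data = data))
  # One column of predictions per base model, named after the list entries
  meta <- as.data.frame(
    lapply(base_fits, function(f) predict(f, new_data = data)$.pred)
  )
  meta[[outcome]] <- data[[outcome]]
  final_formula <- reformulate(names(base_fits), response = outcome)
  final_fit <- fit(final_component, final_formula, data = meta)
  list(base_fits = base_fits, final_fit = final_fit)
}

# Hypothetical companion: predict with the base fits, then with the final fit
simple_stack_predict <- function(stack, new_data) {
  meta <- as.data.frame(
    lapply(stack$base_fits, function(f) predict(f, new_data = new_data)$.pred)
  )
  predict(stack$final_fit, new_data = meta)
}

# Illustrative usage on a small built-in data set
base_specs <- list(
  linear = linear_reg(),
  tree   = decision_tree(mode = "regression")
)
stack <- simple_stack_fit(base_specs, linear_reg(), mpg ~ ., mtcars)
preds <- simple_stack_predict(stack, mtcars)
```

Returning a plain list keeps the sketch short; a real implementation would more likely register a proper model spec so the result can be dropped into a workflow, as the proposal describes.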

@EmilHvitfeldt
Member

I have two quick comments:

  1. Have you looked at the {stacks} package? https://stacks.tidymodels.org/ It provides a nice interface for ensemble methods.
  2. We are planning to add proper workflow support for {censored} models, but we haven't gotten to that part yet. There are ways to get around some of the current limitations, as outlined in censored#159 ("workflows::add_formula doesn't work with censored").

@brunocarlin
Author

brunocarlin commented Mar 17, 2022

  1. Yes, I have. The problem is that stacks checks for error metrics on the base workflows, and those don't exist yet for survival methods. There is also currently no method to blend_predictions on a censored regression.
  2. I will look into that. That workaround may solve the problem by itself, nice! If I can use workflow_map, the underlying code gets way less complex.

@EmilHvitfeldt
Member

Hopefully you can get it working! At first glance, I don't think we will implement this feature, since it tries to deal with a problem that will eventually be fixed once {censored} becomes mature enough and gets full support from the other core packages.

@brunocarlin
Author

Yeah, I still fail at check_metrics. Fixing this would probably involve changing the check_metrics function in tune and creating a censored metric in yardstick?

library(tidymodels)
#> Warning: package 'tidymodels' was built under R version 4.1.3
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
#> Warning: package 'broom' was built under R version 4.1.3
#> Warning: package 'dials' was built under R version 4.1.3
#> Warning: package 'dplyr' was built under R version 4.1.3
#> Warning: package 'infer' was built under R version 4.1.3
#> Warning: package 'modeldata' was built under R version 4.1.3
#> Warning: package 'parsnip' was built under R version 4.1.3
#> Warning: package 'recipes' was built under R version 4.1.3
#> Warning: package 'rsample' was built under R version 4.1.3
#> Warning: package 'tidyr' was built under R version 4.1.3
#> Warning: package 'tune' was built under R version 4.1.3
#> Warning: package 'workflows' was built under R version 4.1.3
#> Warning: package 'yardstick' was built under R version 4.1.3
library(censored)
#> Loading required package: survival
#> Warning: package 'survival' was built under R version 4.1.3
library(stacks)
#> Warning: package 'stacks' was built under R version 4.1.3

ph_spec <- 
  proportional_hazards() %>%
  set_engine("survival") %>% 
  set_mode("censored regression")

ph_workflow <- workflow() %>%
  add_variables(outcomes = c(time, status),
                predictors = everything()) %>% 
  add_model(ph_spec, formula = Surv(time, status) ~ .) 

ph_workflow |> tune_grid(resamples = vfold_cv(lung))
#> Error in `check_metrics()`:
#> ! Unknown `mode` for parsnip model.

Created on 2022-03-17 by the reprex package (v2.0.1)
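
Until tune accepts the "censored regression" mode, one possible way around the `check_metrics()` error is to resample by hand and score each fold with a survival-specific metric. The sketch below is only an illustration under stated assumptions: it uses the survival package directly rather than parsnip, tune, or yardstick, the three predictors and the five folds are arbitrary choices, and rows with missing values are dropped for simplicity.

```r
library(survival)

set.seed(1)
# Keep a few complete columns of the lung data (arbitrary illustrative subset)
dat <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])

# Assign each row to one of k folds at random
k <- 5
fold_id <- sample(rep(seq_len(k), length.out = nrow(dat)))

cv_concordance <- vapply(seq_len(k), function(i) {
  train <- dat[fold_id != i, ]
  test  <- dat[fold_id == i, ]
  fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = train)
  # Linear predictor on the held-out fold
  test$lp <- predict(fit, newdata = test, type = "lp")
  # A higher linear predictor means higher hazard (shorter survival),
  # hence reverse = TRUE
  concordance(Surv(time, status) ~ lp, data = test, reverse = TRUE)$concordance
}, numeric(1))

# Average held-out concordance across folds
mean(cv_concordance)
```

This gives up the convenience of `tune_grid()` and `workflow_map()`, but it needs no metric support from tune or yardstick.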

@simonpcouch
Contributor

Closing in favor of tidymodels/stacks#199 and tidymodels/stacks#152. tidymodels/stacks#199 and a model specification from the bespoke extension package would do the trick for you, I think. Depending on your application, modeltime.ensemble could be another helpful tool.

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 19, 2024