I tried to stack 3 models. The first 2 can stack, but adding the last, a cubist, gave me this error: Error in { : task 1 failed - "data.frame_ is unknown.". #106

Closed
amcmahon17 opened this issue Mar 18, 2022 · 3 comments

amcmahon17 commented Mar 18, 2022

I tried to stack 3 models. The first 2 stack fine, but adding the last one, a Cubist model, gave me this error:

Error in { : task 1 failed - "data.frame_ is unknown."

Also:

Warning message:
The ... are not used in this function but one or more objects were passed: 'parallel_over'

library(chemometrics)
library(tidymodels)
library(stacks)

library(doParallel)

ctrl_grid <- control_stack_grid()

cl <- makeCluster(detectCores())
registerDoParallel(cl)

data("NIR")
# Combine the response columns with the NIR spectra into one data frame
specsampledata <- bind_cols(NIR$yGlcEtOH, NIR$xNIR)
regmetrics <- metric_set(yardstick::rmse, yardstick::rsq, yardstick::mae)
set.seed(565)

specsampledata_split <- initial_split(specsampledata, prop = .5, strata = Ethanol)
specsampledata_train_data <- training(specsampledata_split)
specsampledata_test_data  <- testing(specsampledata_split)

train_data_cv <- vfold_cv(specsampledata_train_data, repeats = 3, strata = Ethanol)

xgb_spec <- boost_tree(
  trees = tune(),
  tree_depth = tune(), min_n = tune(),
  loss_reduction = tune(),
  sample_size = tune(), mtry = tune(),
  learn_rate = tune()
) %>%
  set_engine("xgboost") %>%
  set_mode("regression")

xgb_rec <-
  recipe(Ethanol ~ ., data = specsampledata_train_data) %>%
  step_corr(all_numeric_predictors())

xgb_workflow <-
  workflow() %>%
  add_model(xgb_spec) %>%
  add_recipe(xgb_rec)

xgb_grid <- grid_latin_hypercube(
  tree_depth(),
  min_n(),
  trees(),
  loss_reduction(),
  sample_size = sample_prop(),
  finalize(mtry(), specsampledata_train_data ),
  learn_rate(),
  size = 30
)

xgb_grid_results <- xgb_workflow %>%
  tune_grid(resamples = train_data_cv,
            grid = xgb_grid,
            metrics = regmetrics,
            control = ctrl_grid,
            parallel_over = "resamples"
  )

# Load baguette before building the spec so the bag_mars() engine is registered
library(baguette)

bagmars_spec <- bag_mars(
  num_terms = tune(),
  prod_degree = tune()                      ## highest interaction degree
) %>%
  set_mode("regression")

bagmars_grid <- grid_latin_hypercube(
  finalize(num_terms(), specsampledata_train_data ),
  prod_degree(),
  size = 30
)

bagmars_rec <- 
  recipe(Ethanol ~ ., data = specsampledata_train_data ) 


bagmars_workflow <- 
  workflow() %>% add_model(bagmars_spec) %>% add_recipe(bagmars_rec)

bagmars_tune_grid_results <- bagmars_workflow %>%
  tune_grid(resamples = train_data_cv,
            grid = bagmars_grid,
            metrics = regmetrics,
            control = ctrl_grid,
            parallel_over = "resamples"
  )

bagmars_tune_grid_results %>% collect_metrics()

library(rules)

cubist_spec <-
  cubist_rules(
    committees = tune(),
    neighbors = tune(),
    max_rules = tune()
  )

cubist_rec <-
  recipe(Ethanol ~ ., data = specsampledata_train_data)

cubist_workflow <-
  workflow() %>%
  add_model(cubist_spec) %>%
  add_recipe(cubist_rec)

cubist_grid <- grid_latin_hypercube(
  max_rules(),
  committees(),
  neighbors(),
  size = 30
)

cubist_tune_grid_results <- cubist_workflow %>%
  tune_grid(resamples = train_data_cv,
            grid = cubist_grid,
            metrics = regmetrics,
            control = ctrl_grid,
            parallel_over = "resamples"
  )

stackedmodel <-
  stacks() %>%
  add_candidates(bagmars_tune_grid_results) %>%
  add_candidates(cubist_tune_grid_results) %>%
  add_candidates(xgb_grid_results) %>%
  blend_predictions(penalty = c(.5, 1), metric = metric_set(rmse)) %>%
  fit_members()

bind_cols(specsampledata_test_data, stackedmodel %>% predict(specsampledata_test_data)) %>%
  select(Ethanol, .pred) %>%
  regmetrics(truth = Ethanol, estimate = .pred)

# The cluster was created with makeCluster(), so shut it down with stopCluster();
# stopImplicitCluster() takes no arguments
stopCluster(cl)

amcmahon17 commented Mar 18, 2022

Quick update: I tried this again and it worked. I think stacks is blameless. It's the parallel steps that seem to generate the errors, and only sometimes. Perhaps my machine is the problem.
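
A side note on the warning quoted in the original post: it appears because `parallel_over` was passed directly to `tune_grid()`, which does not take it as an argument. The option belongs to the control object. The sketch below assumes, per the {stacks} documentation, that `control_stack_grid()` is a wrapper around `control_grid(save_pred = TRUE, save_workflow = TRUE)`, so spelling those two settings out allows `parallel_over` to be set in the same call. Treat it as one possible arrangement rather than the required one.

```r
library(tune)

# Assumed equivalent of control_stack_grid(), with parallel_over added
ctrl_grid <- control_grid(
  save_pred = TRUE,        # stacks needs the out-of-sample predictions
  save_workflow = TRUE,    # stacks needs the workflow to refit members
  parallel_over = "resamples"
)

# tune_grid() then receives the option through `control` only, e.g.
# tune_grid(cubist_workflow, resamples = train_data_cv, grid = cubist_grid,
#           metrics = regmetrics, control = ctrl_grid)
```

With this in place, the `parallel_over = "resamples"` argument in each `tune_grid()` call can be dropped.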

@simonpcouch
Collaborator

Thanks for the issue! :)

I will indeed close this issue, but I do think {stacks} ought to fail more gracefully here. This is related to #105: {stacks} fails to point out that a candidate failed to train (or had some other issue before stacking) and instead supplies an unrelated, uninformative error. I'll work a change into the next release aimed at erroring informatively in this situation.
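
Until then, one way to spot a problem candidate before stacking is to look at the `.notes` list column that {tune} attaches to tuning results, where warnings and errors caught during tuning are logged per resample. The helper below is a hypothetical sketch (`count_tuning_notes()` is not part of {tune} or {stacks}), and the mock object merely stands in for a real `tune_grid()` result so the snippet runs with base R alone. It only helps when the failure was actually logged during tuning.

```r
# Hypothetical helper: number of logged notes per resample in a tuning result
count_tuning_notes <- function(tune_results) {
  stopifnot(".notes" %in% names(tune_results))
  vapply(tune_results$.notes, nrow, integer(1))
}

# Stand-in for a tune_grid() result: one clean resample, one with a note
mock_results <- data.frame(id = c("Fold01", "Fold02"))
mock_results$.notes <- list(
  data.frame(note = character(0)),
  data.frame(note = "example note")
)

# Nonzero counts flag resamples worth inspecting before add_candidates()
count_tuning_notes(mock_results)
```

On a real object the call would be, for example, `count_tuning_notes(cubist_tune_grid_results)`.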


github-actions bot commented Apr 2, 2022

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 2, 2022