group_bootstraps() errors when it gets original dataset in a resample #356
Comments
This makes sense to me! It's worth noting that rsample::bootstraps(mtcars[1, ])
#> # Bootstrap sampling
#> # A tibble: 25 × 2
#> splits id
#> <list> <chr>
#> 1 <split [1/0]> Bootstrap01
#> 2 <split [1/0]> Bootstrap02
#> 3 <split [1/0]> Bootstrap03
#> 4 <split [1/0]> Bootstrap04
#> 5 <split [1/0]> Bootstrap05
#> 6 <split [1/0]> Bootstrap06
#> 7 <split [1/0]> Bootstrap07
#> 8 <split [1/0]> Bootstrap08
#> 9 <split [1/0]> Bootstrap09
#> 10 <split [1/0]> Bootstrap10
#> # … with 15 more rows
#> # ℹ Use `print(n = ...)` to see more rows
Created on 2022-08-08 by the reprex package (v2.0.1)
So there's an argument here for just not checking at all. But of course, in practice, drawing every observation (and so leaving nothing for assessment) is going to happen much less frequently when sampling observations rather than groups, so I do think either a warning or an argument would be useful. We might also consider warning when there's 0 assessment data in a regular bootstrap rset, because that will basically not work with
For this function the error is in a pretty high-level spot, so I don't think this would be a hard change either: Lines 233 to 244 in 86f56df
@juliasilge what do you think?
My inclination is to move from an error to a warning on
Thanks for the issue @tjmahr 😄
Thanks for adding group_bootstraps()! One quick problem from my testing: Currently, group_bootstraps() throws an error when it resamples the original dataset. This happens easily when the number of resamples is larger than the number of group combinations:
Here, 25 resamples of 4 groups hit the problem.
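To put a rough number on how easily this happens, here is a back-of-the-envelope calculation in base R. It assumes each resample draws exactly 4 groups with replacement, each group being equally likely; that is a simplification for illustration and may not match how group_bootstraps() picks groups internally.

```r
# Assumption: each resample draws n_groups group labels with replacement,
# all labels equally likely.
n_groups <- 4
n_resamples <- 25

# A resample reproduces the original data when all draws are distinct:
# n! favourable orderings out of n^n possible draws.
p_full <- factorial(n_groups) / n_groups^n_groups

# Chance that at least one of the resamples reproduces the original data.
p_any <- 1 - (1 - p_full)^n_resamples

p_full  # 0.09375
p_any   # roughly 0.91
```

Under these assumptions, a run of 25 resamples on 4 groups would hit the error about nine times out of ten.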
But I don't think an error/dead end is the correct design here. There are bootstrap workflows where it is fine to resample the original data. For example, suppose you just want to fit a bunch of models on resampled groups:
The assessment split data is never used in this workflow. Requiring non-empty assessment splits also means we lose the ability to average over resamples that just happen to be the apparent dataset, so it's kind of a bias thing too.
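A workflow of the kind described above can be sketched in base R. This is not the original reprex and not rsample's implementation: `resample_groups()` is a hypothetical helper, and the grouping column (the 4 combinations of `vs` and `am` in `mtcars`) and the model formula are chosen only for illustration.

```r
set.seed(356)

# Illustrative grouping: the 4 combinations of vs and am.
cars <- mtcars
cars$grp <- paste(cars$vs, cars$am, sep = "_")

# Hypothetical helper (not part of rsample): draw group labels with
# replacement and stack the rows of every drawn group.
resample_groups <- function(data, group) {
  ids <- unique(data[[group]])
  drawn <- sample(ids, size = length(ids), replace = TRUE)
  pieces <- lapply(drawn, function(id) data[data[[group]] == id, , drop = FALSE])
  do.call(rbind, pieces)
}

# Fit the same model on 25 group-resampled analysis sets.
fits <- lapply(seq_len(25), function(i) {
  lm(mpg ~ wt, data = resample_groups(cars, "grp"))
})

# Summarise the slope across resamples; only analysis data is touched.
slopes <- vapply(fits, function(fit) coef(fit)[["wt"]], numeric(1))
summary(slopes)
```

Nothing here ever looks at held-out rows, so a resample that happens to contain all four groups is as usable as any other.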
Two possible suggestions:
group_bootstraps(..., allow_empty_assessment = TRUE).
The second option is more dangerous because it means users will hit preventable errors when they try to use assessment(), so I like the first one better.
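One way an opt-in argument could behave is sketched below. `check_assessment_sizes()` is a hypothetical function written for this illustration, not rsample code; only the argument name `allow_empty_assessment` comes from the suggestion above, and warning by default is an assumption.

```r
# Hypothetical check (not rsample's code): warn about empty assessment sets
# unless the caller has explicitly opted in to allowing them.
check_assessment_sizes <- function(n_assess, allow_empty_assessment = FALSE) {
  n_empty <- sum(n_assess == 0)
  if (n_empty > 0 && !allow_empty_assessment) {
    warning(n_empty, " resample(s) have an empty assessment set.", call. = FALSE)
  }
  invisible(n_empty)
}

# Assessment-set row counts for four made-up resamples.
sizes <- c(12, 0, 7, 0)

# Opting in: no warning is raised.
n_empty <- check_assessment_sizes(sizes, allow_empty_assessment = TRUE)

# Default: the warning is raised; capture its message here.
msg <- tryCatch(
  check_assessment_sizes(sizes),
  warning = function(w) conditionMessage(w)
)
```

With this shape, users who only need analysis sets can silence the check explicitly, while everyone else is told up front that some assessment sets are empty.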
Created on 2022-08-08 by the reprex package (v2.0.1)