Modify strata_check defaults from vfold_cv #110
Hi Michael (@skinnider), may I ask whether you have an update on the issue you raised? I'm working on cross-validation with stratified sampling and have exactly the same question. I've implemented my own 'version' of Let me know |
@skinnider, I work with Camille and we've put together the PR #132 above to address this. If this is still an issue for you then feel free to give it a try and let us know if it works in your case. |
I'd rather try to solve this by lowering the threshold a bit (as opposed to adding another argument). With |
Let's try lowering this default for the next release. |
The PR in #149 lowers the threshold so that your example would successfully stratify and no longer give a warning:
library(tidyverse)
library(rsample)
X <- matrix(rnorm(140 * 100), ncol = 100, nrow = 140)
y <- rep(letters[1:7], each = 20)
df <- tibble(X) %>%
  mutate(label = y)
vfold_cv(df, v = 3, strata = label)
#> # 3-fold cross-validation using stratification
#> # A tibble: 3 x 2
#>   splits          id
#>   <named list>    <chr>
#> 1 <split [91/49]> Fold1
#> 2 <split [91/49]> Fold2
#> 3 <split [98/42]> Fold3
Created on 2020-05-07 by the reprex package (v0.3.0) |
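For readers following the threshold discussion, here is a minimal base-R sketch of why seven equally sized classes fall below a pool value of 0.15. It is an illustration under stated assumptions, not the rsample source: the helper `single_stratum()` is hypothetical, it assumes the rule is "collapse to one stratum when every class proportion is below `pool`" (as the original report describes), and the alternative value 0.10 is chosen only for illustration.

```r
# Hypothetical re-creation of the pooling check discussed in this thread;
# this is NOT the rsample implementation.
y <- rep(letters[1:7], each = 20)

# Proportion of observations in each class: 20 / 140 = 1 / 7 for every class.
pcts <- table(y) / length(y)

# Assumed rule: if every class proportion is below `pool`,
# stratification collapses to a single stratum.
single_stratum <- function(pcts, pool) all(pcts < pool)

single_stratum(pcts, pool = 0.15)  # TRUE: 1/7 (about 0.143) is below 0.15
single_stratum(pcts, pool = 0.10)  # FALSE: 1/7 is above 0.10 (illustrative value)
```

Under this assumed rule, any threshold at or below 1/7 would leave the seven classes as separate strata for this dataset.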
I have a dataset with seven classes, and 20 observations in each class. I want to do three-fold cross-validation on this dataset, using stratified sampling, because when using unstratified sampling there is a chance that a particular fold will not contain any positive examples for a given class.
However, when I try to do this with vfold_cv I get the following warning:
Created on 2019-09-12 by the reprex package (v0.3.0)
It seems this is related to the check_strata function, specifically the default value pool = 0.15. In the context of check_strata, pcts is a vector of length 7 where every value is equal to 1 / 7, and so the function returns a single stratum:
This wouldn't be an issue if I could just change the default value of pool from vfold_cv, but at present it doesn't seem like I can. Is it possible to pass the ellipsis from vfold_cv -> vfold_splits -> make_strata?
sessionInfo():