Validation split as a 3-way split #403
#' validation_data <- validation(ames_split)
#' test_data <- testing(ames_split)
initial_validation_split <- function(data,
                                     prop = c(0.6, 0.2),
This is analogous to the single-element vector specifying the proportion for training in `initial_split()`, here taking two of the three proportions (training and validation).
#' @return An `initial_validation_split` object that can be used with the
#' `training()`, `validation()`, and `testing()` functions to extract the data
#' in each split.
#'
Maybe this should wait until we have a function to extract a validation split set object from this...
Can you add a details section describing that this is going to be the preferred method for this splitting strategy, and show the two-step code using the `initial_split()`/`validation_split()` sequence compared to what we suggest doing now?
Yes, that sounds good! Once we have the function to make the `rset`, I'll update the docs here and for `validation_split()`, showing the code for a 3-way split vs. two consecutive binary splits.
Co-authored-by: Max Kuhn <mxkuhn@gmail.com>
Add `validation_set()` for 3-way split approach
this matters for the tests on `reshuffle_rset()`
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
This PR starts introducing functionality to work with a split into training/validation/testing as a single 3-way split, as opposed to two consecutive binary splits (testing vs. not-testing, and then not-testing into training and validation). Closes #369.
As an illustration of where this is going:
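A minimal sketch of the intended workflow, assuming an rsample version that includes this PR. The built-in `mtcars` data and the name `car_split` are stand-ins chosen for the example; the function names and the `prop = c(0.6, 0.2)` default come from the diff above.

```r
library(rsample)

set.seed(403)

# 60% training, 20% validation; the remaining rows form the test set
car_split <- initial_validation_split(mtcars, prop = c(0.6, 0.2))

# Extract each of the three pieces from the same split object
train_data <- training(car_split)
validation_data <- validation(car_split)
test_data <- testing(car_split)

# The three pieces together account for every row of the original data
nrow(train_data) + nrow(validation_data) + nrow(test_data) == nrow(mtcars)
```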
I'll comment on the code for specific things I'd like feedback on.
In addition to that: Do we need methods for `obj_sum()` and `type_sum()` for `initial_validation_split` objects? I assume they are only needed for usage with pillar / within tibbles and we don't do that for initial splits?
Edit: Don't merge this yet, I'm collecting changes in this branch.