Validation split as a 3-way split #403

Merged
merged 23 commits into from
Dec 20, 2022
@hfrick (Member) commented Dec 13, 2022

This PR is the start of introducing functionality for working with a split into training/validation/testing as a single 3-way split, as opposed to two consecutive binary splits (testing vs. not-testing, and then not-testing into training and validation). Closes #369.

As an illustration of where this is going:

```r
library(rsample)
data(ames, package = "modeldata")

# in this PR
first_split <- initial_validation_split(ames)

# in the next PR
rset_for_tuning <- validation_set(first_split)

# access to individual subsets

# in this PR
ames_train <- training(first_split)
ames_val <- validation(first_split)
ames_test <- testing(first_split)

# in the next PR
ames_analysis <- analysis(rset_for_tuning$splits[[1]])
ames_assessment <- assessment(rset_for_tuning$splits[[1]])
```
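To make the "one 3-way split" idea concrete, here is a minimal base-R sketch. It uses a hypothetical helper, `three_way_split()`, which is not rsample's implementation. The helper assigns every row to exactly one of training, validation, or testing in a single random draw: `prop` gives the training and validation shares, and testing takes the remainder. Rounding down with `floor()` is an assumption made only for this sketch.

```r
# Hypothetical helper for illustration only; not part of rsample.
# Assigns each row of `data` to exactly one of three subsets in one draw.
three_way_split <- function(data, prop = c(0.6, 0.2)) {
  # Two proportions (training, validation); testing gets the remainder
  stopifnot(is.numeric(prop), length(prop) == 2, all(prop > 0), sum(prop) < 1)
  n <- nrow(data)
  n_train <- floor(n * prop[[1]])
  n_val <- floor(n * prop[[2]])
  n_test <- n - n_train - n_val
  # Build a label vector with the target sizes, then shuffle it once
  labels <- rep(c("train", "val", "test"), times = c(n_train, n_val, n_test))
  labels <- factor(sample(labels), levels = c("train", "val", "test"))
  # Split the row indices by label, so each row lands in exactly one subset
  split(seq_len(n), labels)
}

set.seed(1)
df <- data.frame(x = 1:100)
rows <- three_way_split(df)
df_train <- df[rows$train, , drop = FALSE]
df_val <- df[rows$val, , drop = FALSE]
df_test <- df[rows$test, , drop = FALSE]
```

Because all three subsets come from one shuffle, they are disjoint by construction, and no second split of the not-testing rows is needed.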

I'll comment on the code for specific things I'd like feedback on.

In addition to that: do we need obj_sum() and type_sum() methods for initial_validation_split objects? I assume they are only needed for use with pillar, i.e., when the objects are stored in tibbles, and we don't do that for initial splits?

Edit: Don't merge this yet; I'm collecting changes in this branch.

```r
#' validation_data <- validation(ames_split)
#' test_data <- testing(ames_split)
initial_validation_split <- function(data,
                                     prop = c(0.6, 0.2),
```
Member Author:

This is analogous to the single-element `prop` in initial_split(), which specifies the proportion for training; here, `prop` takes two of the three proportions (training and validation), and the testing proportion is the remainder.
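A small sketch of that convention, using a hypothetical helper `expand_prop()` (illustrative only, not rsample's argument checking), shows how the two supplied proportions determine the third:

```r
# Illustrative only: expand a two-element `prop` into all three shares.
expand_prop <- function(prop = c(0.6, 0.2)) {
  # Both shares must be positive and leave something for testing
  stopifnot(is.numeric(prop), length(prop) == 2, all(prop > 0), sum(prop) < 1)
  c(training = prop[[1]], validation = prop[[2]], testing = 1 - sum(prop))
}

shares <- expand_prop(c(0.6, 0.2))
# By contrast, initial_split()'s single `prop` fixes only the training share,
# and its testing share is 1 - prop.
```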

@hfrick hfrick requested a review from topepo December 13, 2022 12:04
R/initial_validation_split.R
```r
#' @return An `initial_validation_split` object that can be used with the
#' `training()`, `validation()`, and `testing()` functions to extract the data
#' in each split.
#'
```
Member:

Maybe this should wait until we have a function to extract a validation split set object from this...

Can you add a details section explaining that this is going to be the preferred method for this splitting strategy, and show the two-step initial_split()/validation_split() code alongside what we suggest doing now?

Member Author:

Yes, that sounds good! Once we have the function to make the rset, I'll update the docs here and for validation_split() to show the code for a 3-way split vs. two consecutive binary splits.
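In that comparison, the main arithmetic difference is that the two-step route needs a rescaled inner proportion. The first split keeps training plus validation, and the second split's `prop` is relative to that remainder, not to the full data. The sketch below uses a hypothetical helper, `two_step_props()`. It assumes the resulting values would be passed as the `prop` arguments of `initial_split()` and then of `validation_split()` applied to the training portion of the first split.

```r
# Hypothetical helper: convert 3-way proportions into the `prop` values for
# two consecutive binary splits (testing vs. not-testing, then
# not-testing into training vs. validation).
two_step_props <- function(prop = c(0.6, 0.2)) {
  stopifnot(is.numeric(prop), length(prop) == 2, all(prop > 0), sum(prop) < 1)
  outer <- sum(prop)          # share of all rows that is not testing
  inner <- prop[[1]] / outer  # share of the not-testing rows used for training
  c(outer = outer, inner = inner)
}

p <- two_step_props(c(0.6, 0.2))  # approximately outer = 0.8, inner = 0.75
```

With the 3-way split, `prop = c(0.6, 0.2)` can be stated directly instead.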

hfrick and others added 2 commits December 13, 2022 16:04
@hfrick hfrick requested a review from topepo December 13, 2022 16:36
@hfrick hfrick merged commit b1d98fa into main Dec 20, 2022
@hfrick hfrick deleted the validation-split-as-3 branch December 20, 2022 11:17

github-actions bot commented Jan 4, 2023

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jan 4, 2023
Successfully merging this pull request may close these issues.

three-way initial splits as as_validation_split()
2 participants