Validation split as a 3-way split #403

hfrick · 2022-12-13T11:09:29Z

This PR starts introducing functionality for working with a split into training/validation/testing as a single 3-way split, as opposed to two consecutive binary splits (testing vs. not-testing, and then not-testing into training and validation). Closes #369.

As an illustration of where this is going:

library(rsample)
data(ames, package = "modeldata")

# in this PR
first_split <- initial_validation_split(ames)

# in the next PR
rset_for_tuning <- validation_set(first_split)

# access to individual subsets 

# in this PR
ames_train <- training(first_split)
ames_val <- validation(first_split)
ames_test <- testing(first_split) 

# in the next PR
ames_analysis <- analysis(rset_for_tuning$splits[[1]])
ames_assessment <- assessment(rset_for_tuning$splits[[1]])

I'll comment on the code for specific things I'd like feedback on.

In addition to that: Do we need methods for obj_sum() and type_sum() for initial_validation_split objects? I assume they are only needed for usage with pillar / within tibbles and we don't do that for initial splits?

Edit: Don't merge this yet, I'm collecting changes in this branch.

hfrick · 2022-12-13T12:01:45Z

R/initial_validation_split.R

+#' validation_data <- validation(ames_split)
+#' test_data <- testing(ames_split)
+initial_validation_split <- function(data,
+                                     prop = c(0.6, 0.2),


This is analogous to the single-element vector in initial_split() that specifies the training proportion. Here it takes two of the three proportions (training and validation), and the remainder goes to testing.
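To make the `prop` semantics concrete, here is a minimal base-R sketch of how two proportions determine three subset sizes. `split_sizes()` is a hypothetical helper written for illustration and is not part of rsample; the use of `floor()` for rounding is an assumption and may differ from what `initial_validation_split()` actually does.

```r
# Hypothetical helper (not part of rsample): derive the three subset sizes
# from the two proportions passed as `prop`.
split_sizes <- function(n, prop = c(0.6, 0.2)) {
  # Two proportions are given; the third (testing) is whatever remains.
  stopifnot(length(prop) == 2, all(prop > 0), sum(prop) < 1)

  n_train <- floor(n * prop[1])  # assumption: sizes are rounded down
  n_val <- floor(n * prop[2])

  c(
    training = n_train,
    validation = n_val,
    testing = n - n_train - n_val
  )
}

sizes <- split_sizes(1000)
print(sizes)
```

Because testing takes the remainder, the three sizes always add up to `n`, whatever rounding is applied to the first two.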

topepo · 2022-12-13T15:23:07Z

R/initial_validation_split.R

+#' @return An `initial_validation_split` object that can be used with the
+#' `training()`, `validation()`, and `testing()` functions to extract the data
+#' in each split.
+#'


Maybe this should wait until we have a function to extract a validation ~~split~~ set object from this...

Can you add a details section describing that this is going to be the preferred method for this splitting strategy, and show the two-step code using the initial_split()/validation_split() sequence compared to what we suggest doing now?

yes, that sounds good! Once we have the function to make the rset, I'll update the docs here and for validation_split() showing the code for a 3-way split vs two consecutive binary splits.
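As a rough sketch of the comparison discussed here (not the final documentation), the two approaches could look like the following. It assumes an rsample version that contains both `validation_split()` and the `initial_validation_split()` / `validation_set()` functions added in this branch. The proportions and the seed are arbitrary choices for illustration, and newer rsample versions may emit a deprecation warning for `validation_split()`.

```r
library(rsample)
data(ames, package = "modeldata")

set.seed(403)  # arbitrary seed, only for reproducibility of the sketch

# Two consecutive binary splits:
# testing vs. not-testing, then not-testing into training and validation.
two_step <- initial_split(ames, prop = 0.8)
ames_not_testing <- training(two_step)
val_rset_two_step <- validation_split(ames_not_testing, prop = 0.75)

# One 3-way split: 60% training, 20% validation, remainder testing.
three_way <- initial_validation_split(ames, prop = c(0.6, 0.2))
val_rset_three_way <- validation_set(three_way)

# With the 3-way split, each subset is directly accessible.
ames_train <- training(three_way)
ames_val <- validation(three_way)
ames_test <- testing(three_way)
```

In the two-step version the proportions compose (0.8 * 0.75 = 0.6 for training), whereas the 3-way version states the training and validation proportions directly.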

github-actions · 2023-01-04T01:40:42Z

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

hfrick added 5 commits December 13, 2022 11:05

add initial_validation_split()

c474f4d

fix reference to tuning functions in docs

394394a

add to pkgdown index

0023e00

add NEWS bullet

ca978b6

remove reference to validation_set() for now

ee5cf33

hfrick requested a review from topepo December 13, 2022 12:04

topepo reviewed Dec 13, 2022

hfrick and others added 2 commits December 13, 2022 16:04

Apply suggestions from code review

2cc685d

Co-authored-by: Max Kuhn <mxkuhn@gmail.com>

suggestions from code review

e8f5940

hfrick requested a review from topepo December 13, 2022 16:36

hfrick and others added 16 commits December 13, 2022 18:55

add validation_set()

2976804

add references to validation_set() in other docs

5cdb692

Update R/validation_set.R

41f13d3

Co-authored-by: Max Kuhn <mxkuhn@gmail.com>

update snapshot

e8e5f87

update arg name

b36706c

add to pkgdown index

734ee53

tmp change for this non-standard PR

5e8c8fd

gha fun

8b093e2

fix documentation

0c7aa32

remove unnecessary test

ea3e4f6

undo temporary changes to gha

4d73d73

Merge pull request #404 from tidymodels/validation_set

9f7650f

Add `validation_set()` for 3-way split approach

add compat for vctrs and dplyr

7de422d

move so that original order is preserved

6f8417e

this matters for the tests on `reshuffle_rset()`

fix reshuffle_rset() for validation_set

14724a4

update NEWS

b1ec5f0

hfrick merged commit b1d98fa into main Dec 20, 2022

hfrick deleted the validation-split-as-3 branch December 20, 2022 11:17

hfrick mentioned this pull request Dec 20, 2022

time based validation sets #374

Closed

github-actions bot locked and limited conversation to collaborators Jan 4, 2023
