Odd error when combining nested_cv and sliding_period #459

Closed
pietrofranceschi opened this issue Nov 2, 2023 · 3 comments · Fixed by #461


@pietrofranceschi

The problem

I'm having trouble combining sliding_period() with nested_cv(), in particular when I specify the origin argument together with lookback (even when lookback is set to its default value).

The error is the following:

Error in if (grepl("^bootstraps", deparse(outer_cl))) { : 
  the condition has length > 1

Here is a reproducible example:

library(tidyverse)
library(rsample)
library(modeldata)

data(Chicago)


## sliding period alone: it works fine!!
outside <-  sliding_period(Chicago, 
                           index = date, 
                           period = "month", 
                           lookback = 0, 
                           origin = Chicago$date[1])


## Now the same in a nested structure: it works fine!
test_1 <- nested_cv(Chicago, 
                   outside = sliding_period(index = date, 
                                            period = "month", 
                                            origin = Chicago$date[1]),  ## here I'm only using the origin
                   inside = vfold_cv(v = 4)
)


## Now the same structure with the lookback argument added: Error!
test_2 <- nested_cv(Chicago, 
                    outside = sliding_period(index = date, 
                                             period = "month", 
                                             origin = Chicago$date[1],
                                             lookback = 0),
                    inside = vfold_cv(v = 4)
)
#> Error in if (grepl("^bootstraps", deparse(outer_cl))) {: the condition has length > 1


## And finally with lookback but without origin: it works fine!
test_3 <- nested_cv(Chicago, 
                    outside = sliding_period(index = date, 
                                             period = "month", 
                                             lookback = 0),
                    inside = vfold_cv(v = 4)
)
Created on 2023-11-02 with [reprex v2.0.2](https://reprex.tidyverse.org/)

 Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16)
#>  os       Fedora Linux 38 (Workstation Edition)
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Rome
#>  date     2023-11-02
#>  pandoc   2.19.2 @ /usr/libexec/rstudio/bin/pandoc/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.1)
#>  codetools     0.2-19  2023-02-01 [2] CRAN (R 4.3.1)
#>  colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.1)
#>  digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.1)
#>  dplyr       * 1.1.3   2023-09-03 [1] CRAN (R 4.3.1)
#>  evaluate      0.22    2023-09-29 [1] CRAN (R 4.3.1)
#>  fansi         1.0.5   2023-10-08 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.1)
#>  forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.1)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.1)
#>  furrr         0.3.1   2022-08-15 [1] CRAN (R 4.3.1)
#>  future        1.33.0  2023-07-01 [1] CRAN (R 4.3.1)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.1)
#>  ggplot2     * 3.4.4   2023-10-12 [1] CRAN (R 4.3.1)
#>  globals       0.16.2  2022-11-21 [1] CRAN (R 4.3.1)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.1)
#>  gtable        0.3.4   2023-08-21 [1] CRAN (R 4.3.1)
#>  hms           1.1.3   2023-03-21 [1] CRAN (R 4.3.1)
#>  htmltools     0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1)
#>  knitr         1.45    2023-10-30 [1] CRAN (R 4.3.1)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.1)
#>  listenv       0.9.0   2022-12-16 [1] CRAN (R 4.3.1)
#>  lubridate   * 1.9.3   2023-09-27 [1] CRAN (R 4.3.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.1)
#>  modeldata   * 1.2.0   2023-08-09 [1] CRAN (R 4.3.1)
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.1)
#>  parallelly    1.36.0  2023-05-26 [1] CRAN (R 4.3.1)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.1)
#>  purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.3.1)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.1)
#>  readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.3.1)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.3.1)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.1)
#>  rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.3.1)
#>  rsample     * 1.2.0   2023-08-23 [1] CRAN (R 4.3.1)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.1)
#>  scales        1.2.1   2022-08-20 [1] CRAN (R 4.3.1)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.1)
#>  slider        0.3.1   2023-10-12 [1] CRAN (R 4.3.1)
#>  stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.1)
#>  stringr     * 1.5.0   2022-12-02 [1] CRAN (R 4.3.1)
#>  tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.1)
#>  tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.1)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.1)
#>  tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.1)
#>  timechange    0.2.0   2023-01-11 [1] CRAN (R 4.3.1)
#>  tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.3.1)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.1)
#>  vctrs         0.6.4   2023-10-12 [1] CRAN (R 4.3.1)
#>  warp          0.2.0   2020-10-21 [1] CRAN (R 4.3.1)
#>  withr         2.5.1   2023-09-26 [1] CRAN (R 4.3.1)
#>  xfun          0.40    2023-08-09 [1] CRAN (R 4.3.1)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.1)
#> 
#>  [1] /home/franceschp/R/x86_64-redhat-linux-gnu-library/4.3
#>  [2] /usr/lib64/R/library
#>  [3] /usr/share/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
mikemahoney218 added a commit that referenced this issue Nov 2, 2023
deparse1() was added in 4.0.0, so copied the internals for back-compatibility.
@mikemahoney218
Member

mikemahoney218 commented Nov 2, 2023

This is a really funny bug -- the issue is that nested_cv() is checking to see if you're passing bootstraps() to outside, which will trigger a warning. To check that, it deparses the value of outside using deparse(), which returns a character vector, then uses grepl() to see if bootstraps is anywhere in the actual text of the arguments you passed.

The issue is that deparse() will break long calls into multiple elements (once the text passes its width.cutoff, 60 bytes by default), which causes the grepl() call to return a vector of multiple TRUE/FALSE values. Since R 4.2.0, passing a condition of length > 1 to if() is an error rather than a warning, so the if() statement fails. So the triggering issue is that your argument to outside is too long. You can actually sneak around this by using partial argument matching, which gets the call to outside under the limit:

library(tidyverse)
library(rsample)
library(modeldata)

data(Chicago)
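nested_cv(Chicago, 
          # `i` and `p` partially match `index` and `period`, which keeps
          # the deparsed call short enough to stay on one line
          outside = sliding_period(i = date, 
                                   p = "month", 
                                   origin = Chicago$date[1]),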
          inside = vfold_cv(v = 4)
)
#> # Nested resampling:
#> #  outer: Sliding period resampling
#> #  inner: 4-fold cross-validation
#> # A tibble: 187 × 3
#>    splits          id       inner_resamples
#>    <list>          <chr>    <list>         
#>  1 <split [10/28]> Slice001 <vfold [4 × 2]>
#>  2 <split [28/31]> Slice002 <vfold [4 × 2]>
#>  3 <split [31/30]> Slice003 <vfold [4 × 2]>
#>  4 <split [30/31]> Slice004 <vfold [4 × 2]>
#>  5 <split [31/30]> Slice005 <vfold [4 × 2]>
#>  6 <split [30/31]> Slice006 <vfold [4 × 2]>
#>  7 <split [31/31]> Slice007 <vfold [4 × 2]>
#>  8 <split [31/30]> Slice008 <vfold [4 × 2]>
#>  9 <split [30/31]> Slice009 <vfold [4 × 2]>
#> 10 <split [31/30]> Slice010 <vfold [4 × 2]>
#> # ℹ 177 more rows

Created on 2023-11-02 with reprex v2.0.2
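To see the mechanism described above in isolation, here is a minimal base-R sketch. The check_outer() helper is hypothetical: it only mimics the grepl("^bootstraps", deparse(outer_cl)) test quoted in the error message and is not rsample's actual source. The calls are quote()d, so sliding_period() and Chicago never need to exist.

```r
# Hypothetical stand-in for the check inside nested_cv(): deparse an
# unevaluated call and test whether its text starts with "bootstraps".
check_outer <- function(outer_cl) {
  if (grepl("^bootstraps", deparse(outer_cl))) "bootstraps" else "other"
}

short_call <- quote(vfold_cv(v = 4))
boot_call <- quote(bootstraps(times = 5))
long_call <- quote(sliding_period(index = date, period = "month",
                                  origin = Chicago$date[1], lookback = 0))

# Short calls deparse to a single string...
length(deparse(short_call))
# ...but deparse() splits this longer call into several strings once the
# text passes width.cutoff (60 bytes by default).
length(deparse(long_call))

# grepl() is vectorised: one TRUE/FALSE per deparsed string.
grepl("^bootstraps", deparse(long_call))

check_outer(short_call)
check_outer(boot_call)
# In R >= 4.2.0 this errors with "the condition has length > 1".
try(check_outer(long_call))
```

Dropping the lookback = 0 argument (as in test_1 and test_3 of the original report) keeps the deparsed text on one line, which is consistent with those variants working.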

To be clear, this is a bug and I opened #461 to fix it. I just think it's very funny that this bug winds up being "the argument to outside used too many characters". Who would have thought that was a thing that could happen?

Thanks for the excellent, excellent reprex -- made it so I could immediately see what was wrong, and I've stolen your example for #461 as a test to make sure this gets & stays fixed.

hfrick pushed a commit that referenced this issue Nov 3, 2023
…459) (#461)

* Paste the outputs of deparse() to guarantee a length-1 vector (#459)

deparse1() was added in 4.0.0, so copied the internals for back-compatibility.

* Style test
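The commit message says the fix pastes the outputs of deparse() together to guarantee a length-1 vector, copying the internals of deparse1() because that function only exists from R 4.0.0 onwards. A minimal sketch of that idea follows; the helper name deparse_one() is hypothetical, and the helper actually added in #461 may differ in name and details.

```r
# Hypothetical helper mirroring base R's deparse1() (R >= 4.0.0):
# deparse with a wide cutoff, then collapse every piece into one string.
deparse_one <- function(expr, collapse = " ", width.cutoff = 500L, ...) {
  paste(deparse(expr, width.cutoff = width.cutoff, ...), collapse = collapse)
}

long_call <- quote(sliding_period(index = date, period = "month",
                                  origin = Chicago$date[1], lookback = 0))

txt <- deparse_one(long_call)
length(txt)  # always a single string, however long the call is

# grepl() now returns a single TRUE/FALSE, so if() is safe again.
if (grepl("^bootstraps", txt)) "bootstraps" else "other"
```

Collapsing with paste() is what guarantees the length-1 result; the wide width.cutoff just avoids inserting extra line breaks into the text in the first place.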
@hfrick
Member

hfrick commented Nov 3, 2023

To echo Mike here: thanks a lot for the bug report with the excellent reprex! The fix is merged into the dev version 👍


This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Nov 18, 2023