Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fct_inseq() changes values without warning (but it has to order levels only) #221

Closed
GegznaV opened this issue Nov 19, 2019 · 3 comments
Closed
Labels
bug an unexpected problem or unintended behavior tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day

Comments

@GegznaV
Copy link

GegznaV commented Nov 19, 2019

The actual behavior of function fct_inseq() is not documented well (see here) and the behavior of this function violates some principles of the Tidyverse:

  1. I expect the function to change the order of levels only. It must not modify the labels of the levels and must not modify the values of a variable (see Ex. A-1, C-1 and read the comments there).
  2. In some situations, the function fails, but it should not (see Ex. B-1).
  3. If something is changed in a variable, I expect at least a warning and not silent modifications.
  4. What I really expected, is the output similar to the result of gtools::mixedsort() (see Ex. A-2, B-2, C-2). Can this be implemented in forcats?
  5. I expect all peculiarities of the function to be well documented with a section of representative examples.
  6. Shouldn't there be a unit test that checks if all levels are included (despite the order of levels)?

The examples of fct_inseq() (A-1, B-1, C-1 shows actual and A-2, B-2, C-2 shows expected behavior of fct_inseq()):

my_fct_a <- factor(c("010", "001", "005"))

# (Ex. A-1): 
# 1. values are converted to NA;
# 2. leading zeros were dropped;
# 3. no warning was issued.
forcats::fct_inseq(my_fct_a)
#> [1] <NA> <NA> <NA>
#> Levels: 1 5 10

# (Ex. A-2): 
# The expected result for my_fct_a
forcats::fct_relabel(my_fct_a, gtools::mixedsort)
#> [1] 010 001 005
#> Levels: 001 005 010


my_fct_b <- factor(c("10", "001", "1", "23",  "05", "01"))

# (Ex. B-1):
# 1. Original values are different, but function fails with error
forcats::fct_inseq(my_fct_b)
#> Error in `levels<-`(`*tmp*`, value = as.character(levels)): factor level [2] is duplicated

# (Ex. B-2)
# The expected result for my_fct_b
forcats::fct_relabel(my_fct_b, gtools::mixedsort)
#> [1] 10  001 05  23  1   01 
#> Levels: 001 01 1 05 10 23


my_fct_c <- factor(c("A10", 9, "A1", "B1",  6, 5, "A05", "01"))

# (Ex. C-1)
# 1) The levels of the result are different than the original values
# 2) some values are SILENTLY converted to NAs
forcats::fct_inseq(my_fct_c)
#> [1] <NA> 9    <NA> <NA> 6    5    <NA> <NA>
#> Levels: 1 5 6 9

# (Ex. C-2)
# The expected result for my_fct_c 
forcats::fct_relabel(my_fct_c, gtools::mixedsort)
#> [1] A10 9   A05 B1  6   5   A1  01 
#> Levels: 01 5 6 9 A1 A05 A10 B1

# (Ex. C-3)
# 1) mixed sorting (C-2) gives different result than regular sorting (C-3)
forcats::fct_relabel(my_fct_c, sort)
#> [1] A10 9   A1  B1  6   5   A05 01 
#> Levels: 01 5 6 9 A05 A1 A10 B1

Created on 2019-11-19 by the reprex package (v0.3.0)

For unit testing:

x <- factor(c("A10", 9, "A1", "B1",  6, 5, "A05", "01"))
testthat::expect_silent({x_after  <- forcats::fct_inseq(x)})
testthat::expect_true(all(levels(x_after) %in% levels(x)))
testthat::expect_true(all(unique(x_after) %in% unique(x)))
Session info ```r > sessioninfo::session_info("forcats") - Session info ------------------------------------------------------------------------------------------------- setting value version R version 3.6.1 (2019-07-05) os Windows 10 x64 system x86_64, mingw32 ui RStudio language (EN) collate English_United States.1252 ctype English_United States.1252 tz Europe/Helsinki date 2019-11-19
  • Packages -----------------------------------------------------------------------------------------------------
    package * version date lib source
    assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
    backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.1)
    cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.1)
    crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
    digest 0.6.22 2019-10-21 [1] CRAN (R 3.6.1)
    ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.1)
    fansi 0.4.0 2018-10-05 [1] CRAN (R 3.6.0)
    forcats 0.4.0 2019-02-17 [1] CRAN (R 3.6.0)
    glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.1)
    magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
    pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.0)
    pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1)
    rlang 0.4.1.9000 2019-11-17 [1] Github (r-lib/rlang@88deeb2)
    tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
    utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.0)
    vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.6.1)
    zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0)
</details>
@batpigandme batpigandme added the tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day label Jan 21, 2020
@hadley
Copy link
Member

hadley commented Jan 21, 2020

The implementation of fct_inseq() looks rather different to fct_infreq() and fct_inorder(). To fix the problems above, I'd suggest that it needs to be rewritten to use lvls_reorder(), which will require a different strategy for non-numeric inputs (since the current strategy loses the connection between input and output levels).

Providing a natural sort is outside the scope of forcats, because it would have to pull in another dependency.

@hadley hadley closed this as completed Jan 21, 2020
@hadley hadley added the bug an unexpected problem or unintended behavior label Jan 21, 2020
@hadley hadley reopened this Jan 21, 2020
@robinsones
Copy link
Contributor

robinsones commented Jan 31, 2020

@hadley after talking it over with @njtierney, I'm thinking that fct_inseq() should throw an error when given levels that have leading 0s or letters, since it's not obvious how those should be sorted (or trivial to do it right) and that would be better than the current behavior of changing the labels and values sometimes. What do you think? I wasn't sure if that was what you meant by your last comment about natural sort being outside the scope of forcats

@hadley hadley closed this as completed in 874315b Feb 28, 2020
@hadley
Copy link
Member

hadley commented Feb 28, 2020

@robinsones unfortunately my response comes a bit late, but my intuition that this function should be using lvls_reorder() seems right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day
Projects
None yet
Development

No branches or pull requests

4 participants