-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add vec_proxy_missing() #1928
Add vec_proxy_missing() #1928
Conversation
Ok, I know I passionately advocated for this yesterday at conf, but after starting thinking about this again after a long break, I'm embarrassingly reconsidering the need for this (for my use case, at least). I realized that for my application, it may actually be more theoretically consistent for comparisons with missing values (including missing reasons) to result in missing values. In base R, we have: 1 == NA => NA similarly: 1 + NA => NA In interlacer, I say "operations on interlaced values are applied to the value channel". This way, when adding two interlaced vectors, only the values are added in the vector and missing values get automatically "masked out". This relies on: 1 + na("reason") => NA Therefore, equality should also only apply to the value channel, just like for missing values without reasons: 1 == na("reason") => NA But what I was asking for in this PR was the ability to compare missing reasons: na("reason1") == na("reason2") => false But this is actually a different type of comparison, and breaks the interlacer rule if we allow it! (that is, breaks the rule that operations apply on values, masking out missingness). Instead, we should think of it as a new kind of comparison that compares values AND missing values (rather than comparing values and masks out missing). So I think what I actually need is to define a new form of the na("reason1") %==% na("reason2") => false This operator would also make sense with unlabelled missingness (raw 1 %==% NA => false Anyway, tldr I need to think about this a little more… but I think we'll probably be able to close the PR! (if you have any theory input/perspective on this I'd really appreciate it but no pressure; I know y'all are super busy) Sorry for the flip flop on this! |
The interlacer rule is "operations are applied to the value channel" and missing values are automatically masked out. the original behavior did not follow this in order to allow for comparsion of missing reasons. Instead, for missing reason comparsions we will need to define new equality operators that compare both values and missing reason channels. See discussion here for more info: r-lib/vctrs#1928
The interlacer rule is "operations are applied to the value channel" and missing values are automatically masked out. the original behavior did not follow this in order to allow for comparsion of missing reasons. Instead, for missing reason comparsions we will need to define new equality operators that compare both values and missing reason channels. See discussion here for more info: r-lib/vctrs#1928
The interlacer rule is "operations are applied to the value channel" and missing values are automatically masked out. the original behavior did not follow this in order to allow for comparsion of missing reasons. Instead, for missing reason comparsions we will need to define new equality operators that compare both values and missing reason channels. See discussion here for more info: r-lib/vctrs#1928
The latest version of interlacer (v0.4.0) now has the behavior I describe above, which so far seems to be working well... I'm closing this issue now! |
I think it makes sense that all the vctrs ops operate on the underlying value, but then you could have special It's kind of like how in ivs you can do |
Good parallel! (and ivs looks super cool btw) The other nice part of the infix library(interlacer)
a <- c(1, 2, NA, 3)
b <- c(1, 2, NA, NA)
a == b
#> [1] TRUE TRUE NA NA
a %==% b
#> [1] TRUE TRUE TRUE FALSE
# This becomes handy if I want to see if the vectors are identical
# in values and in missing pattern:
all(a %==% b)
#> [1] FALSE
all(a %==% a)
#> [1] TRUE
# By contrast, if I naively use base `==` I get `NA`
all(a == b)
#> [1] NA
all(a == a)
#> [1] NA
# To get the same behavior as all(a %==% b) requires a bit of gymnastics:
all(ifelse(is.na(a == b), is.na(a) & is.na(b), a == b))
#> [1] FALSE Created on 2024-09-05 with reprex v2.0.2 As far as I'm aware, there aren't any other convenient ways to make this basic kind of value & missingness pattern comparison in base or rlang or vctrs... Right? Sure, there's |
a <- c(1, 2, NA, 3)
b <- c(1, 2, NA, NA)
a == b
#> [1] TRUE TRUE NA NA
vctrs::vec_equal(a, b)
#> [1] TRUE TRUE NA NA
vctrs::vec_equal(a, b, na_equal = TRUE)
#> [1] TRUE TRUE TRUE FALSE Created on 2024-09-05 with reprex v2.0.2 |
Ah nice! So I can simplify / generalize the definition of
(Note With this definition the operator becomes meaningful for all vctrs record types as a "record equal" (or "deep equal"?) operator, as opposed to the usual "value equal". Cool. For ivs, for example, this would allow you to compare intervals with one or more bounds missing. Interval vectors essentially can have 3 types of structural missingness: missing start, missing end, and missing both. So you'd have:
Compared to behavior of regular
This makes sense because when we test Of course right now in ivs when either the start or end value of the input is NA you force both to be NA (to avoid needing to deal with the complexity of incomplete intervals, which is totally valid). Still, it's interesting to think about applications of this operator outside of interlacer... If this has more broad application, it might be worth considering for inclusion into vctrs. Anyway, thanks for thinking this thru with me! I'll continue to stew, and pls don't hesitate to tag me if you see any more applications or parallels on anything else you're working on... |
vec_proxy_missing()
adds a new proxy that allows developers to extend theis.na()
,na.omit()
etc. behavior of their vctrs (as I describe in (#1925). It's useful/needed when vctrs contain special values (or data structures) that represent different types of missingness in their data. In these cases, you need a separate proxy forequal
andmissing
. It also lets you redefine the meaning ofcomplete
.I wanted to take an experimental shot at the implementation here because it didn't seem to hard, and I wanted to play around with the resulting API. I'm extremely satisfied with the result -- this is exactly what I need for implementing a generic vctrs-backed Result<> type in the library I'm working on, and I think could have similar applications elsewhere (like
haven::labelled_spss
, or declared)I've added a few tests and documentation updates, but it's only a first-pass. I would be happy to do a deeper & more thorough dive if this is considered for inclusion into vctrs.
Thank you for your consideration!