Add keep argument to joins #330
The basic idea of how to solve this is that we can use the i. and x. column prefixes that data.table uses in j to refer to the join columns from each table:
library(data.table)
df1 <- data.table(a = c("a", "b", "c"), b = 1:3)
df2 <- data.table(a = c("a", "b"), c = 1:2)
# left_join with keep = TRUE
df2[df1, .(a.x = i.a, b = i.b, a.y = x.a, c = x.c), on = .(a), allow.cartesian = TRUE]
#>       a.x     b    a.y     c
#>    <char> <int> <char> <int>
#> 1:      a     1      a     1
#> 2:      b     2      b     2
#> 3:      c     3   <NA>    NA
In the case of left joins, the same selection syntax is also useful without keep = TRUE:
# left_join with keep = FALSE
df2[df1, .(a = i.a, b = i.b, c = x.c), on = .(a), allow.cartesian = TRUE]
#>         a     b     c
#>    <char> <int> <int>
#> 1:      a     1     1
#> 2:      b     2     2
#> 3:      c     3    NA
For inner/right joins we'll only need to use the selection syntax if keep = TRUE. I'll just use an example of a right join here, but we can use the exact same selection syntax for inner joins.
# right_join with keep = TRUE
df1[df2, .(a.x = x.a, b = x.b, a.y = i.a, c = i.c), on = .(a), allow.cartesian = TRUE]
#>       a.x     b    a.y     c
#>    <char> <int> <char> <int>
#> 1:      a     1      a     1
#> 2:      b     2      b     2
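To illustrate the inner join case mentioned above, here is a minimal sketch. It assumes data.table's nomatch = NULL argument is used to drop unmatched rows. df3 is a hypothetical table (not one of the tables in the examples above), chosen so that the key "d" has no match in df1 and the inner join actually drops a row.

```r
library(data.table)

df1 <- data.table(a = c("a", "b", "c"), b = 1:3)
# Hypothetical table for illustration: key "d" has no match in df1
df3 <- data.table(a = c("a", "b", "d"), c = 1:3)

# inner_join with keep = TRUE
# Same selection syntax as the right join; nomatch = NULL drops the
# rows of df3 that have no match in df1
inner_keep <- df1[df3, .(a.x = x.a, b = x.b, a.y = i.a, c = i.c),
                  on = .(a), nomatch = NULL]
inner_keep
```

Only the rows for keys "a" and "b" should remain, with both join columns kept as a.x and a.y.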
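I don't think we can implement this in a single join when every key from both tables has to be kept, so the example below does it in two steps:
library(data.table)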
df1 <- data.table(a = c("a", "b", "c"), b = 1:3)
df2 <- data.table(a = c("a", "b"), c = 1:2)
unique_keys_df <- unique(rbindlist(list(
  df1[, .(a)],
  df2[, .(a)]
)))
# right_join with keep = TRUE
# Note: a__keep__ comes from df2.
## Join column is preserved from unique_keys_df for join in the next step
step_df <- df2[unique_keys_df, .(a__keep__ = x.a, c = x.c, a = i.a), on = .(a)]
# Another right_join onto step_df
# Don't need join cols from step_df,
## but need to rename join cols originally from df2.
## I appended these join cols with __keep__ to show which ones are needed
df1[step_df, .(a.x = x.a, b = x.b, a.y = a__keep__, c = i.c), on = .(a)]
#>       a.x     b    a.y     c
#>    <char> <int> <char> <int>
#> 1:      a     1      a     1
#> 2:      b     2      b     2
#> 3:      c     3   <NA>    NA
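In the example above every key of df2 also exists in df1, so the result looks the same as the left join. As a rough check of the two-step approach, the sketch below reruns it with a hypothetical table df3 (not from the original examples) that contains a key "d" missing from df1. Under that assumption the row for "d" should come back with NA in a.x and b.

```r
library(data.table)

df1 <- data.table(a = c("a", "b", "c"), b = 1:3)
# Hypothetical table for illustration: key "d" is missing from df1
df3 <- data.table(a = c("a", "b", "d"), c = 1:3)

# All keys that appear in either table
unique_keys_df <- unique(rbindlist(list(
  df1[, .(a)],
  df3[, .(a)]
)))

# Step 1: right join df3 onto the full key set, keeping df3's own
# join column under a temporary name
step_df <- df3[unique_keys_df, .(a__keep__ = x.a, c = x.c, a = i.a), on = .(a)]

# Step 2: right join df1 onto step_df and rename the kept join columns
full_keep <- df1[step_df, .(a.x = x.a, b = x.b, a.y = a__keep__, c = i.c),
                 on = .(a)]
full_keep
```

The key "c" exists only in df1, so its a.y and c are NA; the key "d" exists only in df3, so its a.x and b are NA.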