Task cbind breaks when task's backend has primary_key different to ..row_id #961

Closed
sebffischer opened this issue Aug 31, 2023 · 3 comments · Fixed by #1079

sebffischer commented Aug 31, 2023

library(mlr3verse)
#> Loading required package: mlr3
library(data.table)

d = data.table(
  x = factor(letters[1:10]),
  y = rnorm(10),
  my_key = 1:10
)

backend = as_data_backend(d, primary_key = "my_key")

task = as_task_regr(backend, target = "y")

learner = as_learner(ppl("robustify") %>>% lrn("regr.rpart"))

learner$train(task)
#> Error: All backends to rbind must have the primary_key 'my_key'
#> This happened PipeOp encode's $train()

Created on 2023-08-31 with reprex v2.0.2

mb706 commented Aug 31, 2023

Probably an issue with Task$cbind().

@sebffischer sebffischer transferred this issue from mlr-org/mlr3pipelines Aug 31, 2023
@sebffischer

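When a data.frame is passed to Task$cbind(), as_data_backend.data.frame() is called, which automatically sets the primary key to ..row_id. The resulting backend therefore does not share the primary key of the task's backend if that key has a different name (here, my_key).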
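
To illustrate the default described above, here is a minimal sketch, assuming a standard mlr3 installation. The column names `a` and `my_key` and the variable names are only illustrative.

```r
library(mlr3)

# No primary_key given: a key column named "..row_id" is created
b_default = as_data_backend(data.frame(a = 1:3))
b_default$primary_key
#> [1] "..row_id"

# Explicit primary_key: the named integer column is used instead
b_custom = as_data_backend(data.frame(a = 1:3, my_key = 1:3), primary_key = "my_key")
b_custom$primary_key
#> [1] "my_key"
```

The two backends end up with different primary key names, which is the mismatch that surfaces when they are combined.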

@sebffischer sebffischer changed the title PipeOpFeatureUnion does not work when underlying backend has primary_key different to ..row_id Task cbind breaks when task's backend has primary_key different to ..row_id Aug 31, 2023

sebffischer commented Aug 31, 2023

We could handle both cases:

  1. A data.frame is passed to $cbind(): we create the primary key column under the name of the task's existing primary_key.
  2. A backend is passed to $cbind(): we can use DataBackendRename if the primary keys don't match and the primary key of the task's backend is not a column name in the backend passed to $cbind().
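
Case 1 can be sketched by hand from user code. This is only an illustration of the idea under stated assumptions, not the change that was eventually merged: the column `z` and the names `new_cols` and `new_backend` are hypothetical, and the backend is built explicitly so that its primary key carries the same name as the task's.

```r
library(mlr3)
library(data.table)

# Task whose backend uses a custom primary key, as in the reprex
d = data.table(x = runif(10), y = rnorm(10), my_key = 1:10)
task = as_task_regr(as_data_backend(d, primary_key = "my_key"), target = "y")

# Name of the primary key used by the task's backend
pk = task$backend$primary_key

# New column to bind (illustrative), plus a key column under the same name
new_cols = data.table(z = runif(10))
set(new_cols, j = pk, value = task$row_ids)

# Build the backend explicitly so its primary key matches the task's
new_backend = as_data_backend(new_cols, primary_key = pk)
new_backend$primary_key
#> [1] "my_key"

# Bind the new backend to the task
task$cbind(new_backend)
```

Because both backends now use the key name `my_key`, no `..row_id` column is introduced on either side.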
