prediction for rank-deficient fit results in .pred_res outcome name #985

Closed
EmilHvitfeldt opened this issue Jul 3, 2023 · 1 comment · Fixed by #987
Labels: bug (an unexpected problem or unintended behavior)

Comments

@EmilHvitfeldt
Member

As the title says: when you predict from a rank-deficient fit in R 4.3.0 or later, the output of predict.lm() carries an extra attribute, attr(*, "non-estim"). That attribute trips up the prediction coming from {parsnip}, which ends up with a column named .pred_res instead of .pred, and that in turn causes errors in most of the {tune} functions. Should be a fairly easy fix.

I found it when working on tidymodels/workshops#105.

R 4.2.0

data <- data.frame(
  y = c(1,2,3,4), 
  x1 = c(1,1,2,3), 
  x2 = c(3,4,5,2), 
  x3 = c(4,2,6,0), 
  x4 = c(2,1,3,0)
)
data2 <- data.frame(
  x1 = c(3,2,1,3),
  x2 = c(3,2,1,4),
  x3 = c(3,4,5,1),
  x4 = c(0,0,2,3)
)

lm(y ~ ., data = data) |> 
  predict(data2)
#> Warning in predict.lm(lm(y ~ ., data = data), data2): prediction from a
#> rank-deficient fit may be misleading
#>          1          2          3          4 
#>  3.8888889  1.7777778 -0.3333333  4.8888889

library(parsnip)

linear_reg() |>
  fit(y ~ ., data = data) |>
  predict(new_data = data2)
#> Warning in predict.lm(object = object$fit, newdata = new_data, type =
#> "response"): prediction from a rank-deficient fit may be misleading
#> # A tibble: 4 × 1
#>    .pred
#>    <dbl>
#> 1  3.89 
#> 2  1.78 
#> 3 -0.333
#> 4  4.89

R 4.3.0

data <- data.frame(
  y = c(1,2,3,4), 
  x1 = c(1,1,2,3), 
  x2 = c(3,4,5,2), 
  x3 = c(4,2,6,0), 
  x4 = c(2,1,3,0)
)
data2 <- data.frame(
  x1 = c(3,2,1,3),
  x2 = c(3,2,1,4),
  x3 = c(3,4,5,1),
  x4 = c(0,0,2,3)
)

lm(y ~ ., data = data) |> 
  predict(data2)
#> Warning in predict.lm(lm(y ~ ., data = data), data2): prediction from
#> rank-deficient fit; attr(*, "non-estim") has doubtful cases
#>          1          2          3          4 
#>  3.8888889  1.7777778 -0.3333333  4.8888889 
#> attr(,"non-estim")
#> 1 2 3 4 
#> 1 2 3 4

library(parsnip)

linear_reg() |>
  fit(y ~ ., data = data) |>
  predict(new_data = data2)
#> Warning in predict.lm(object = object$fit, newdata = new_data, type =
#> "response"): prediction from rank-deficient fit; attr(*, "non-estim") has
#> doubtful cases
#> # A tibble: 4 × 1
#>   .pred_res
#>       <dbl>
#> 1     3.89 
#> 2     1.78 
#> 3    -0.333
#> 4     4.89

Created on 2023-07-02 with reprex v2.0.2
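
For anyone wondering where the `res` suffix could come from, here is a minimal base-R sketch of one plausible mechanism. It is an assumption about what happens downstream of predict.lm(), not a copy of {parsnip}'s internals or of the fix in #987: the variable name `res` and the `is.vector()` / `as.data.frame()` path are illustrative. The values are the rounded predictions from the reprex above.

```r
# Mimic what predict.lm() returns in R >= 4.3.0 for a rank-deficient fit:
# a named numeric vector with an additional "non-estim" attribute.
res <- c(`1` = 3.89, `2` = 1.78, `3` = -0.333, `4` = 4.89)
attr(res, "non-estim") <- c(`1` = 1L, `2` = 2L, `3` = 3L, `4` = 4L)

# is.vector() is FALSE as soon as an object has any attribute other than
# names, so code that branches on is.vector() stops treating `res` as a
# plain vector.
is_plain_before <- is.vector(res)

# Falling back to as.data.frame() names the single column after the
# variable that was passed in, here "res".
col_name <- names(as.data.frame(res))

# Hypothetical workaround (an assumption, not the actual patch): drop the
# attribute before any further processing.
attr(res, "non-estim") <- NULL
is_plain_after <- is.vector(res)

print(is_plain_before)
print(col_name)
print(is_plain_after)
```

If the downstream code then prefixes non-`.pred` column names with `.pred_`, a column called `res` would become `.pred_res`, which matches the output seen under R 4.3.0.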

github-actions bot commented Aug 2, 2023

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 2, 2023