Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add flexsurvspline support for survival_reg model #213

Merged
merged 15 commits into from
Nov 9, 2022

Conversation

mattwarkentin
Copy link
Contributor

Hi @hfrick,

This PR is a start for addressing #115.

I have marked the PR as a draft because I think some changes to {parsnip} are required before this PR should be merged into {censored}.

I look forward to your feedback and hopefully getting this integrated into censored eventually.

@hfrick
Copy link
Member

hfrick commented Sep 2, 2022

Hi @mattwarkentin , thanks for the PR!

Which changes in parsnip are you referring to? Let me know when you want feedback on the PR! (Now?)

@mattwarkentin
Copy link
Contributor Author

mattwarkentin commented Sep 2, 2022

I am somewhat new to contributing to parsnip-adjacent packages, but I thought that an update to the parsnip documentation would be important for showing the flexsurvspline engine and tunable arguments in the documentation (i.e., num_knots, survival_link).

Basically flexsurvspline versions of:

But maybe I'm wrong.

Sure, happy to receive feedback on the PR any time!

Copy link
Member

@hfrick hfrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good already! Can you add tests though, please? Re parsnip changes: yes, the documentation updates need to happen in parsnip, like you highlighted already 👍

Next steps (at least what I can think of right now)

  • add unit tests to censored
  • PR to parsnip with documentation and making the engine args tunable (I left a more detailed comment on this on the code directly)
  • Update vignette/articles/examples.Rmd
  • Update README.Rmd
  • todo for myself: add tests on case weights
  • update NEWS, including PR number and contributor github name

R/survival_reg-data.R Outdated Show resolved Hide resolved
R/survival_reg-data.R Show resolved Hide resolved
@mattwarkentin
Copy link
Contributor Author

Sorry for dropping the ball on this. I was moving across the country for a new job and just totally got sidetracked. Want me to get this back on track, @hfrick?

@hfrick
Copy link
Member

hfrick commented Oct 25, 2022

No worries, there is no rush! If you want, that'd be great! The main things are the unit tests and the documentation over in parsnip.

@mattwarkentin
Copy link
Contributor Author

mattwarkentin commented Oct 27, 2022

For me:

  • add unit tests to censored
  • PR to parsnip with documentation and making the engine args tunable (I left a more detailed comment on this on the code directly)
  • Update vignette/articles/examples.Rmd
  • Update README.Rmd
  • update NEWS, including PR number and contributor github name

@mattwarkentin
Copy link
Contributor Author

mattwarkentin commented Oct 27, 2022

Okay @hfrick, we are getting close. The one thing I'm stumbling on is that there surely needs to be somewhere where we connect the fact that OUR parameter is called num_knots, while flexsurv::flexsurvspline() uses k.

I thought maybe I handled this when I made the PR to dials, but it doesn't look like it. Where do we make that mapping?

This seems like something handled by parsnip::set_model_arg(), but decided to remove that above...how do we do this??

@hfrick
Copy link
Member

hfrick commented Oct 28, 2022

Awesome!

If the argument you want to make tunable is a main argument, ie an argument directly to survival_reg(), you use set_model_arg() like you initially tried. Since k is an engine-specific argument, it's more subtle. This happens in the tunable() method for survival_reg() (which sits in parsnip) - you already modified the right things, the last missing bit was just to use the name of the arg as it is used in flexsurv. Then you have your link to dials and the tidymodels machinery can work. The changes you made in dials were to create the parameter object (tidymodels uses that to get possible parameter values for tuning). With the change in parsnip you link that to the engine argument and the machinery is connected.

Thanks for adding the tests! Could you update them so that they make use of the spline functionality? Then we know that this aspect also works! For survival probability and hazard, we don't need to test against rms if we test against flexsurv.

Re predictions of the linear predictor: I noticed in the test example that flexsurv returns negative values and censored then tries to log those, resulting in NaN. For flexsurvreg(), predictions by flexsurv are exp(x * beta), which is why censored logs them before returning them as predictions of type linear_pred (so that it's x * beta). How is that with flexsurvspline()? What exactly does it return, does it make sense to log here? See reprex below.

The main branch has changed a lot since this branch got started but we are not seeing any merge conflicts - maybbe that's because it's a draft PR so just as a heads up that we might still encounter some.

library(flexsurv)
#> Loading required package: survival

spline_fit <- flexsurvspline(
  Surv(time, status) ~ age + sex,
  data = lung
)
predict(spline_fit, lung[1:5,], type = "linear")
#> # A tibble: 5 × 2
#>   .time .pred_link
#>   <dbl>      <dbl>
#> 1     0      -7.63
#> 2     0      -7.72
#> 3     0      -7.92
#> 4     0      -7.90
#> 5     0      -7.85

non_spline_fit <- flexsurvreg(
  Surv(time, status) ~ age + sex,
  data = lung, dist = "weibull"
)
predict(non_spline_fit, lung[1:5,], type = "linear")
#> # A tibble: 5 × 2
#>   .time .pred_link
#>   <dbl>      <dbl>
#> 1     0       314.
#> 2     0       338.
#> 3     0       392.
#> 4     0       387.
#> 5     0       373.

Created on 2022-10-28 with reprex v2.0.2

@mattwarkentin
Copy link
Contributor Author

Could you update them so that they make use of the spline functionality? Then we know that this aspect also works! For survival probability and hazard, we don't need to test against rms if we test against flexsurv.

Tests are updated and removed the rms comparison.

How is that with flexsurvspline()?

A flexsurvspline model is just a flexsurvreg model with a different distribution so the predictions are made the same way with predict.flexsurvreg() which ultimately relies on the machinery of summary.flexsurvreg(). So unless something is wrong, they should be returning comparable statistics, just on a different scale. Am I misunderstanding?

@mattwarkentin
Copy link
Contributor Author

mattwarkentin commented Oct 31, 2022

For flexsurvreg(), predictions by flexsurv are exp(x * beta), which is why censored logs them before returning them as predictions of type linear_pred (so that it's x * beta).

Are we sure this is the case? Maybe we should loop in @chjackson and get his input.

@chjackson
Copy link

Yes in flexsurv, predict(...,type="linear") returns the "fitted values of the location parameter" - understood as being on the natural scale, not logged. Perhaps I should disambiguate this doc.

@hfrick
Copy link
Member

hfrick commented Nov 2, 2022

Thanks @chjackson! So just to be super clear: this applies to the predict methods for both the models from flexsurvreg() and flexsurvspline()?

Where I'm coming from with this question: For the existing engine, which uses flexsurvreg(), censored logs the prediction values returned from flexsurv because it standardizes across engines and most other engines return on that scale. So now I'm asking if censored needs to do the same for this new flexsurvspline engine?

@chjackson
Copy link

Both flexsurvspline and flexsurvreg return objects of class "flexsurvreg", so the same predict method will be used.

Is there a confusion here since flexsurv models can be based on different parametric survival distributions? The meaning of "location parameter" depends on the distribution. Sometimes this parameter is defined to be positive (such as rate and scale parameters in the exponential, Weibull or gamma), and sometimes it is unrestricted (such as meanlog in the log-normal, and the gamma0 parameter in flexsurvspline). So the "natural scale" of the parameter could either be positive or unrestricted. The "transformed scale" used for estimation is the log scale for positive parameters, and the same as the natural scale for unrestricted parameters.

I guess flexsurv conflicts here with user expectations that predict(...,type="linear") methods should always return quantities on the unrestricted scale?

@chjackson
Copy link

FYI if this is helpful, for a parameter named "parname" in a fitted flexsurv model x, the function x$dlist$transforms[["parname"]]() transforms a parameter value from its natural scale to the unrestricted scale. This is nearly always either log or the identity transformation, depending on the model. x$dlist$location contains the name of the location parameter.

@mattwarkentin
Copy link
Contributor Author

@hfrick You may wish to use the functions in x$dlist$inv.transforms to get location parameters on the unrestricted scale. As Chris mentioned, it is often either identity() or log().

@chjackson
Copy link

No it's x$dlist$transforms to go from the natural scale to the unrestricted scale, and x$dlist$inv.transforms to go the other way.

@mattwarkentin
Copy link
Contributor Author

Oops, my mistake. Please ignore my previous comment.

@hfrick hfrick marked this pull request as ready for review November 9, 2022 15:38
@hfrick
Copy link
Member

hfrick commented Nov 9, 2022

@mattwarkentin @chjackson Thank you both; that's really helpful to know! I've now changed the transformation for the link/linear predictor in this commit 31fd6b9 - could you confirm that this is correct now?

Other than that, I think this PR is ready!

@chjackson
Copy link

Can't see anything wrong with the transformation procedure there

@mattwarkentin
Copy link
Contributor Author

LGTM!

@hfrick
Copy link
Member

hfrick commented Nov 9, 2022

Great, thanks so much both!

@hfrick hfrick merged commit 04478f1 into tidymodels:main Nov 9, 2022
@github-actions
Copy link

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Nov 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants