-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Keep IPCW results in the list column format predicted by the predict() methods #937
Conversation
(could be `NULL` but not `FALSE`)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that it just expects the output of predict()
, that seems sensible. I fell down a rabbit hole a little bit while looking at this code so this review also contains elements that are likely out of scope for this PR. I wanted to jot them down though so we don't loose it.
Specifically on the unnesting/this PR:
The .filter_eval_time()
functions doesn't get used anymore now which means infinite time points can get through to the prodlim function which returns the probabilities of being censored (which makes it error). I supposed add_graf_weights_vec()
would be a good place to doctor with the eval_time
values.
Generally:
We should look a consistent handling of "non-standard" values for eval_time
, i.e. -Inf
, generally < 0
, Inf
(and NA
). Most predictions from censored are less restrictive than what's outlined in .filter_eval_time()
which I think is nice. survival and prodLim work (at least in parts) with negative values for time so I don't think it's such a big philosophical stretch to get to infinite values. Regardless, if we decide to restrict possible values for eval_time
rather than pad results appropriately, we should at least warn when we do that.
The other thing I noticed is that we are slightly modifying the censoring probabilities in two places:
predict.censoring_model_reverse_km()
adds an epsilon to the probabilities, the amount is data-derived, not set via an argument
and then, after that:
trunc_probs()
prevents very small probs, ie adds enough to make them at leasttrunc
So depending how the data is inside of predict.censoring_model_reverse_km()
we might be changing it to more than trunc
, so we might want to make the amount an argument to predict.censoring_model_reverse_km()
to give us control of the final/total amount.
Co-authored-by: Hannah Frick <hfrick@users.noreply.github.com>
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue. |
The current code unnests the data. This is a problem if we also are using static predictions (like the predicted event time).
This PR keeps the list format and adds the required columns to each tibble within the list.