add helper for bridging causal fits #955

base: main

Conversation
```r
if (rlang::is_missing(data) || !is.data.frame(data)) {
  abort("`data` must be the data supplied as the data argument to `fit()`.")
}
```
The only check this PR makes on the input data is that it's a data frame. This leaves a window for folks to supply different data to `fit()` and `weight_propensity()`, probably resulting in uninformative errors at `weight_propensity()`. We could implement some functionality to "fingerprint" the training data, noting dims/column names or some other coarse set of identifying features; a full hash would be too expensive to justify. Note that this is not an issue for the `tune_results` method (and its use of the workflow method), which accesses the training data via the underlying split and does not take a `data` argument.
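A hypothetical sketch of what such a coarse fingerprint could look like (all function names here are illustrative, not part of this PR):

```r
# Record cheap identifying features of the training data at fit() time.
fingerprint_data <- function(data) {
  list(
    n_rows = nrow(data),
    n_cols = ncol(data),
    col_names = colnames(data)
  )
}

# Compare the data supplied to weight_propensity() against the stored
# fingerprint, erroring informatively on a mismatch.
check_fingerprint <- function(data, fingerprint) {
  same <-
    nrow(data) == fingerprint$n_rows &&
    ncol(data) == fingerprint$n_cols &&
    identical(colnames(data), fingerprint$col_names)

  if (!same) {
    rlang::abort(
      "`data` doesn't appear to be the data supplied as `data` to `fit()`."
    )
  }

  invisible(data)
}
```

This would catch accidental swaps of similar-but-different data frames at low cost, though it couldn't detect row-level edits the way a full hash would.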
```r
#' @export
weight_propensity.model_fit <- function(object,
                                        wt_fn,
                                        .treated = object$lvl[2],
```
We need to pass `.treated` explicitly because the predicted probabilities depend on the treatment level and must match the argument supplied as `.treated` to propensity. This argument is roughly analogous to our `event_level` argument, where the event level is either `"first"` or `"second"` (rather than the actual level of the factor). propensity not only parameterizes the argument differently, but its default is the second level, while ours is the first.

I've tried out a few different interfaces to this argument and don't feel strongly about how we can best handle it. We could alternatively add an `event_level` argument with the usual parameterization and then translate it to the right `.treated` level when interfacing with propensity. We could also ask that propensity change the default/parameterization, though there is some information loss in the multi-level setting, and we need to translate `"first"`/`"second"` back to the level anyway at `predict()` (see L81-82).

Note that the current form of this argument is not checked/tested, pending a decision on how we want it to feel.
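For concreteness, the `event_level`-style alternative could translate to the factor level with a tiny helper like this (an illustrative sketch, not code from this PR; it assumes `object$lvl` holds the outcome levels of the parsnip `model_fit`):

```r
# Translate a yardstick-style event_level ("first"/"second") into the
# actual factor level that propensity's .treated argument expects.
event_level_to_treated <- function(object, event_level = c("first", "second")) {
  event_level <- rlang::arg_match(event_level)
  if (event_level == "first") {
    object$lvl[1]
  } else {
    object$lvl[2]
  }
}
```

This keeps the familiar `"first"`/`"second"` parameterization at the user-facing boundary while still handing propensity the concrete level it needs.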
I think that's worth discussing with Lucy and Malcolm
@LucyMcGowan and @malcolmbarrett:

We're currently working on a set of PRs to better accommodate causal workflows in tidymodels via a helper, `weight_propensity()`, that bridges the propensity and outcome models during resampling. Some description of the bigger picture is here. The interface feels something like this:
```r
fit_resamples(
  propensity_workflow,
  resamples = bootstraps(data),
  control = control_resamples(extract = identity)
) %>%
  weight_propensity(wt_ate, ...) %>%
  fit_resamples(
    outcome_workflow,
    resamples = .
  )
```
where the second argument to `weight_propensity()` is a propensity weighting function and further arguments are passed to that function; the helper handles the arguments `.propensity` and `.exposure` internally. The result of `weight_propensity()` in the above case is what the output of `bootstraps(data %>% mutate(.wts = wt_ate(...)))` would look like, where `.propensity` and `.exposure` are handled internally for the user.

There's surely lots to digest here, but do you have opinions on how we should open up the interface to the `.treated` argument? Feel free to give me a holler if you'd appreciate additional context. :)
This is all awesome!

A few comments:

- We noticed that in propensity we also use the `.treated` terminology but now think that's a poor idea because not everything is a treatment. So, we're going to change that to `.exposed`, and I think it should probably be that here, too.
- One issue with that language, and with the default value for what is currently `.treated`, is that it only applies to binary and multiclass variables. For continuous variables, there won't be a default level. If this really becomes an issue to make it fit nicely: we're actually moving away from PS models for continuous exposures because of some mathematical issues with them.
- As for the default value, we do pick the second level because we assume 0 is unexposed and 1 is exposed, but that's mostly to do with what a common logistic model spec looks like. I'm going back to propensity soon and would be happy to work with you all to make this all consistent.
```r
#' [workflow][workflows::workflow()], or
#' tuning results (`?tune::fit_resamples`) object. If a tuning result, the
#' object must have been generated with the control argument
#' (`?tune::control_resamples`) `extract = identity`.
```
Another option would be to instead require `save_pred = TRUE`, but we couldn't make use of `weight_propensity.workflow` in that case. This approach is a bit more DRY.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Most of this interface/bridge now feels buttery-smooth, well done!
```r
#' @param wt_fn A function used to calculate the propensity weights. The first
#' argument gives the predicted probability of exposure, the true value for
#' which is provided in the second argument. See `?propensity::wt_ate()` for
#' an example.
```
I didn't find that second sentence the easiest to read, with the "the true value for which" construction. Is the following correct?

```diff
 #' @param wt_fn A function used to calculate the propensity weights. The first
-#' argument gives the predicted probability of exposure, the true value for
-#' which is provided in the second argument. See `?propensity::wt_ate()` for
+#' argument gives the predicted probability of exposure, the second argument
+#' gives the true value of exposure. See `?propensity::wt_ate()` for
 #' an example.
```
```r
# TODO: I'm not sure we have a way to identify `y` via a model
# spec fitted with `fit_xy()`---this will error in that case.
```
When we started out with the "censored regression" mode, we required models to be fit via the formula interface, i.e. `fit()`, and had `fit_xy()` throw an error saying to use `fit()` for that mode.

In that spirit, you could add an error here to point people towards `fit()`.
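A rough sketch of what that guardrail could look like inside the method. Note the field checked here (`object$preproc$y_var`) is an assumption about where a parsnip `model_fit` records the outcome name when fitted via the formula interface; the real location may differ:

```r
# Assumed check: models fitted via fit_xy() may not record the outcome
# name, so point users back to the formula interface instead of failing
# later with an uninformative error.
if (is.null(object$preproc$y_var)) {
  rlang::abort(
    c(
      "`weight_propensity()` requires a model fitted with `fit()`.",
      "i" = "Please refit using the formula interface rather than `fit_xy()`."
    )
  )
}
```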
PR 1/3 to address tidymodels/tune#652. This PR implements the generic and the `model_fit` method. Note that this method isn't dispatched to from the `workflow` method (and thus the `tune_results` method).