add helper for bridging causal fits #955

base: main

Conversation
```r
if (rlang::is_missing(data) || !is.data.frame(data)) {
  abort("`data` must be the data supplied as the data argument to `fit()`.")
}
```
The only check this PR makes on the input data is that it's a data frame. This leaves a window for folks to supply different data to `fit()` and `weight_propensity()`, probably resulting in uninformative errors at `weight_propensity()`. We could implement some functionality to "fingerprint" the training data, noting dims/column names or some other coarse set of identifying features; a full hash would be too expensive to justify. Note that this is not an issue for the `tune_results` method (and its use of the workflow method), which accesses the training data via the underlying split and does not take a `data` argument.
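A hypothetical sketch of what such a coarse fingerprint could look like (all function names here are illustrative, not part of this PR):

```r
# Record cheap identifying features of the training data at fit() time.
fingerprint_data <- function(data) {
  list(
    n_rows = nrow(data),
    n_cols = ncol(data),
    col_names = colnames(data)
  )
}

# Compare the data supplied to weight_propensity() against the stored
# fingerprint, erroring informatively on a mismatch.
check_fingerprint <- function(data, fingerprint) {
  same <-
    nrow(data) == fingerprint$n_rows &&
    ncol(data) == fingerprint$n_cols &&
    identical(colnames(data), fingerprint$col_names)

  if (!same) {
    rlang::abort(
      "`data` doesn't appear to be the data supplied as `data` to `fit()`."
    )
  }

  invisible(data)
}
```

This would catch accidental swaps of similar-but-different data frames at low cost, though it couldn't detect row-level edits the way a full hash would.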
```r
#' @export
weight_propensity.model_fit <- function(object,
                                        wt_fn,
                                        .treated = object$lvl[2],
```
We need to pass `.treated` explicitly because the predicted probabilities depend on the treatment level and must match the argument supplied as `.treated` to propensity. This argument is roughly analogous to our `event_level` argument, where the event level is either `"first"` or `"second"` (rather than the actual level of the factor). propensity not only parameterizes the argument differently, but its default is the second level, while ours is the first.

I've tried out a few different interfaces to this argument and don't feel strongly about how we can best handle it. We could alternatively add an `event_level` argument with the usual parameterization and then translate it to the right `.treated` level when interfacing with propensity. We could also ask that propensity change the default/parameterization, though there is some information loss in the multi-level setting, and we need to translate `"first"`/`"second"` back to the level anyway at `predict()` (see L81-82).

Note that the current form of this argument is not checked/tested, pending a decision on how we want it to feel.
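For concreteness, the `event_level`-style alternative could translate to the factor level with a tiny helper like this (an illustrative sketch, not code from this PR; it assumes `object$lvl` holds the outcome levels of the parsnip `model_fit`):

```r
# Translate a yardstick-style event_level ("first"/"second") into the
# actual factor level that propensity's .treated argument expects.
event_level_to_treated <- function(object, event_level = c("first", "second")) {
  event_level <- rlang::arg_match(event_level)
  if (event_level == "first") {
    object$lvl[1]
  } else {
    object$lvl[2]
  }
}
```

This keeps the familiar `"first"`/`"second"` parameterization at the user-facing boundary while still handing propensity the concrete level it needs.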
I think that's worth discussing with Lucy and Malcolm
@LucyMcGowan and @malcolmbarrett:

We're currently working on a set of PRs to better accommodate causal workflows in tidymodels via a helper, `weight_propensity()`, that bridges the propensity and outcome models during resampling. Some description of the bigger picture is here. The interface feels something like this:
```r
fit_resamples(
  propensity_workflow,
  resamples = bootstraps(data),
  control = control_resamples(extract = identity)
) %>%
  weight_propensity(wt_ate, ...) %>%
  fit_resamples(
    outcome_workflow,
    resamples = .
  )
```
where the second argument to `weight_propensity()` is a propensity weighting function and further arguments are passed to that function; the helper handles the arguments `.propensity` and `.exposure` internally. The result of `weight_propensity()` in the above case is what the output of `bootstraps(data %>% mutate(.wts = wt_ate(...)))` would look like, where `.propensity` and `.exposure` are handled internally for the user.

There's surely lots to digest here, but do you have opinions on how we should open up the interface to the `.treated` argument? Feel free to give me a holler if you'd appreciate additional context. :)
This is all awesome!

A few comments:

- We noticed that in propensity we also use the `.treated` terminology but now think that's a poor idea because not everything is a treatment. So, we're going to change that to `.exposed`, and I think it should probably be that here, too.
- One issue with that language, and with the default value for what is currently `.treated`, is that it only applies to binary and multiclass variables. For continuous variables, there won't be a default level. If this really becomes an issue to make it fit nicely: we're actually moving away from PS models for continuous exposures because of some mathematical issues with them.
- As for the default value, we do pick the second level because we assume 0 is unexposed and 1 is exposed, but that's mostly to do with what a common logistic model spec looks like. I'm going back to propensity soon and would be happy to work with you all to make this all consistent.
```r
#' [workflow][workflows::workflow()], or
#' tuning results (`?tune::fit_resamples`) object. If a tuning result, the
#' object must have been generated with the control argument
#' (`?tune::control_resamples`) `extract = identity`.
```
Another option would be to instead require `save_pred = TRUE`, but we couldn't make use of `weight_propensity.workflow` in that case. This approach is a bit more DRY.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Most of this interface/bridge now feels buttery-smooth, well done!
```r
#' @param wt_fn A function used to calculate the propensity weights. The first
#' argument gives the predicted probability of exposure, the true value for
#' which is provided in the second argument. See `?propensity::wt_ate()` for
#' an example.
```
I didn't find that second sentence the easiest to read, with the "the true value for which" construction. Is the following correct?

```diff
 #' @param wt_fn A function used to calculate the propensity weights. The first
-#' argument gives the predicted probability of exposure, the true value for
-#' which is provided in the second argument. See `?propensity::wt_ate()` for
+#' argument gives the predicted probability of exposure, the second argument
+#' gives the true value of exposure. See `?propensity::wt_ate()` for
 #' an example.
```
```r
# TODO: I'm not sure we have a way to identify `y` via a model
# spec fitted with `fit_xy()`---this will error in that case.
```
When we started out with the "censored regression" mode, we required models to be fit via the formula interface, i.e. `fit()`, and had `fit_xy()` throw an error saying to use `fit()` for that mode.

In that spirit, you could add an error here to point people towards `fit()`.
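A rough sketch of what that guardrail could look like inside the method. Note the field checked here (`object$preproc$y_var`) is an assumption about where a parsnip `model_fit` records the outcome name when fitted via the formula interface; the real location may differ:

```r
# Assumed check: models fitted via fit_xy() may not record the outcome
# name, so point users back to the formula interface instead of failing
# later with an uninformative error.
if (is.null(object$preproc$y_var)) {
  rlang::abort(
    c(
      "`weight_propensity()` requires a model fitted with `fit()`.",
      "i" = "Please refit using the formula interface rather than `fit_xy()`."
    )
  )
}
```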
PR 1/3 to address tidymodels/tune#652. This PR implements the generic and the `model_fit` method. Note that this method isn't dispatched to from the `workflow` method (and thus the `tune_results` method).