
Creating Parsnip Model Functions #832

Closed
jonthegeek opened this issue Oct 27, 2022 · 6 comments

Comments

@jonthegeek (Contributor)

I'm trying to grok how to add a parsnip-friendly model function to a package. I'm specifically working to make our parsnip-friendly implementation in {tidybert} work, but I'd like to move from the "change this argument and see what happens" approach I'm taking right now toward "do the thing because that's clearly the thing to do."

I'm trying to understand how I can actually make our parameters tunable. I'm getting an error similar to epochs in #815, which I think was mostly fixed via this baguette PR. I'm not sure I grok what happened, or whether my issue is similar.

The (most) important tidybert code is in parsnip.R.

# remotes::install_github("macmillancontentscience/tidybert")
library(tidybert)
library(tidymodels)
library(modeldata)
tidymodels_prefer()

data(tate_text)
tate_text <- tate_text %>% 
  # We'll just work with the top 6 artists.
  dplyr::filter(
    artist %in% c(
      "Schütte, Thomas", "Zaatari, Akram", "Beuys, Joseph", "Ferrari, León", 
      "Kossoff, Leon", "Tillmans, Wolfgang"
    )
  ) %>% 
  dplyr::mutate(
    artist = as.factor(as.character(artist))
  )
set.seed(4242)
tate_folds <- rsample::vfold_cv(tate_text, v = 2)

tidybert_spec <- bert(
  mode = "classification",
  epochs = 1
) %>% 
  parsnip::set_engine(
    "tidybert", 
    bert_type = tune(),
    n_tokens = tune()
  )
tidybert_workflow <- workflows::workflow(
  artist ~ title,
  tidybert_spec
)

grid <- tidybert_workflow %>% 
  workflows::extract_parameter_set_dials() %>% 
  update(
    bert_type = bert_type(
      c("bert_small_uncased", "bert_tiny_uncased")
    ),
    n_tokens = n_tokens(c(4, 5))
  ) %>% 
  dials::grid_latin_hypercube(size = 1)

torch::torch_manual_seed(4242)
tidybert_result <- tidybert_workflow %>% 
  tune::tune_grid(
    resamples = tate_folds,
    grid = grid
  )
#> ! Fold1: preprocessor 1/1, model 1/1: The following arguments cannot be manually modified and were removed: be...
#> ! Fold2: preprocessor 1/1, model 1/1: The following arguments cannot be manually modified and were removed: be...
tidybert_result$.notes[[1]]$note
#> [1] "The following arguments cannot be manually modified and were removed: bert_type, n_tokens."

Created on 2022-10-27 with reprex v2.0.2

@jonthegeek (Contributor, Author)

Oh, one of the things I don't grok is the model_info.R file in {baguette}. Where are those variables used? Or were they just for internal logging of things as you worked?

@simonpcouch (Contributor)

simonpcouch commented Oct 28, 2022

re:

I'm getting an error similar to epochs in 815, which I think was mostly fixed via baguette#64. I'm not sure I grok what happened, or whether my issue is similar.

and

Oh, one of the things I don't grok is the model_info.R file in {baguette}. Where are those variables used? Or were they just for internal logging of things as you worked?

My sense is that the fix in the relevant commit doesn't apply here. baguette is a parsnip extension with a good few tricks up its sleeves—I may recommend a more straightforward extension like bonsai or discrim to help with pattern matching. :)


My 6-months-into-this-gig understanding:

The inconsistency I'm seeing here is that parsnip::set_model_arg is being used for registering engine arguments, rather than main arguments to the bert() model. If you add the arguments you register with set_model_arg as arguments to the actual bert() model definition, and then supply those tune() arguments to bert() rather than set_engine(), you'll get tuning "for free."
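For illustration only, a minimal sketch of what promoting those engine arguments to main arguments could look like. The constructor details and argument set here are assumptions, not tidybert's actual code:

```r
# Sketch only (assumed structure, not tidybert's real definition):
# bert_type and n_tokens become main arguments of the bert() spec.
bert <- function(mode = "unknown", engine = "tidybert",
                 epochs = NULL, bert_type = NULL, n_tokens = NULL) {
  # Main arguments are captured as quosures, per parsnip convention.
  args <- list(
    epochs = rlang::enquo(epochs),
    bert_type = rlang::enquo(bert_type),
    n_tokens = rlang::enquo(n_tokens)
  )
  parsnip::new_model_spec(
    "bert",
    args = args,
    eng_args = NULL,
    mode = mode,
    method = NULL,
    engine = engine
  )
}

# Each main argument is then registered so parsnip knows how to
# translate it for the engine (repeated for n_tokens):
parsnip::set_model_arg(
  model = "bert",
  eng = "tidybert",
  parsnip = "bert_type",
  original = "bert_type",
  func = list(pkg = "tidybert", fun = "bert_type"),
  has_submodel = FALSE
)
```

With that in place, `bert(mode = "classification", bert_type = tune(), n_tokens = tune())` should pick up tuning "for free," with no tune() arguments passed to set_engine().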

If you feel those arguments are indeed engine-specific, then you need not (i.e. should not) register them with parsnip::set_model_arg(), and can tune them by passing tune() to set_engine() as you've done here. Registering those arguments as tunable, then, would happen in a tunable() method. See this PR for an example of making an engine argument tunable, as well as Hannah's explanation of how those pieces connect at the top of this comment. Since the bert() model definition lives in your package, you can define the tunable method there—no PR to parsnip needed. (Another consideration here, though: it's okay for some engines not to support some main model arguments. See penalty and the lm engine for linear_reg(), for instance.)
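If the arguments stay engine-specific, a tunable() method along these lines is one way to declare them. This is a sketch assuming the spec class is "bert" and that the dials-style functions bert_type() and n_tokens() from the reprex live in tidybert; the column layout follows the generics::tunable() convention:

```r
# Sketch only: declare the tidybert engine arguments as tunable.
# tunable() methods return a tibble that tells tune where each
# parameter's dials function lives and which part of the spec it
# belongs to.
tunable.bert <- function(x, ...) {
  tibble::tibble(
    name = c("bert_type", "n_tokens"),
    call_info = list(
      list(pkg = "tidybert", fun = "bert_type"),
      list(pkg = "tidybert", fun = "n_tokens")
    ),
    source = "model_spec",
    component = "bert",
    component_id = "engine"
  )
}
```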

A PR is incoming to y'all in a moment that demonstrates my first suggestion. :)

[EDIT: corrected misleading info about tunable engine arguments]

@jonthegeek (Contributor, Author)

Yeah, I was thinking I'd probably move them up to model args, but wanted to grok what was happening before doing that. Thanks for the clarifications!

The reason I kept the engine separate is that conceivably we COULD integrate something like {text} that fits BERT models with a different engine... but getting it to work the way we defined things takes precedence, so we'll start there!

Thanks for the help!

@simonpcouch (Contributor)

Will go ahead and close. Feel free to holler if you feel this issue ought to remain open!

@jonthegeek (Contributor, Author)

Will do! I don't work on this package as part of my day job anymore, but I still plan to dig into it before TOO long.

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 24, 2022