Set up correct OHE labels for subsets that use default model labels #236
✅ Deploy Preview for silly-keller-664934 ready!
This looks nicer to me!
Codecov Report
Additional details and impacted files

@@          Coverage Diff          @@
##          master   #236   +/-   ##
=====================================
  Coverage   87.2%   87.2%
=====================================
  Files         28      28
  Lines       1961    1962    +1
=====================================
+ Hits        1710    1711    +1
  Misses       251     251
LGTM
Fixes #234
#229 introduced a bug whereby the new columns added to the labels file were in a different order than the labels on the model. This PR fixes that by setting up the correct one-hot encoded labels in the `preprocess_labels` validator rather than in `instantiate_model`. Using `use_default_model_labels`, we know whether the labels file should contain columns (with all zeroes) for species that are not present in the labels but are on the base model. Using a `pd.Categorical` before `get_dummies` allows us to generate these columns.

Running `zamba train --config tests/assets/sample_train_config.yaml` now works; the labels file has three species that are present in zamba, and training produces a model that outputs the full set of 32 labels.
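
For reviewers unfamiliar with the `pd.Categorical` trick, here is a minimal standalone sketch of the idea, not the code in this PR. The species list, the file names, and the `species` column prefix are made up for illustration; the real model has 32 labels.

```python
import pandas as pd

# Hypothetical stand-in for the species list carried by the base model.
model_species = ["bird", "elephant", "leopard", "monkey"]

# Labels file that only contains a subset of the model's species.
labels = pd.DataFrame(
    {
        "filepath": ["a.mp4", "b.mp4", "c.mp4"],
        "label": ["leopard", "bird", "leopard"],
    }
)

# Declaring the full category list up front makes get_dummies emit one column
# per model species, in the model's order, including species with no rows.
labels["label"] = pd.Categorical(labels["label"], categories=model_species)
one_hot = pd.get_dummies(labels, columns=["label"], prefix="species")

print(one_hot.columns.tolist())
```

Without the `pd.Categorical` step, `get_dummies` would only create columns for the species that actually appear in the labels file, sorted alphabetically, which is how the column order can drift from the order on the model.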