
SLEP006 on Sample Properties #16

Merged (16 commits) on Jun 29, 2020

Conversation

@jnothman (Member) commented Mar 7, 2019

No description provided.

@adrinjalali (Member)

and (somehow) move slep004 to rejected?

validation
* (maybe in scope) passing sample properties (e.g. `sample_weight`) to some
scorers and not others in a multi-metric cross-validation setup
* (likely out of scope) passing sample properties to non-fit methods, for
Member

is this particularly "harder" to implement than the others?

Member Author

Well, firstly, the use cases for it will need further definition; we don't currently pass around anything like weights to predict or transform methods. But yes, it is hard in part because we have fused methods like `fit_transform`.
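
To make the fused-method point concrete, here is a toy sketch (not from the SLEP; the class name and the batch prop are invented for illustration) of how fit_transform receives fit-time and transform-time props in a single call and has to split them itself:

import numpy as np

class WeightedCentererSketch:
    """Toy transformer: fit consumes sample_weight, transform consumes a batch id."""

    def fit(self, X, y=None, sample_weight=None):
        w = np.ones(len(X)) if sample_weight is None else np.asarray(sample_weight)
        self.mean_ = np.average(X, axis=0, weights=w)
        return self

    def transform(self, X, batch=None):
        # batch is unused here; it only marks that transform-time props exist
        return X - self.mean_

    def fit_transform(self, X, y=None, **props):
        # The fused method gets fit props and transform props in one bag,
        # so the estimator (or a meta-estimator) must split them.
        fit_props = {k: v for k, v in props.items() if k == "sample_weight"}
        transform_props = {k: v for k, v in props.items() if k == "batch"}
        return self.fit(X, y, **fit_props).transform(X, **transform_props)

X = np.arange(6, dtype=float).reshape(3, 2)
Xt = WeightedCentererSketch().fit_transform(
    X, sample_weight=[1.0, 2.0, 1.0], batch=[0, 0, 1])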

Member

Yeah, I guess the sample_weight use cases may be less frequent than things which take gender or race into account. There, predict may involve some postprocessing on the output of another estimator based on these sample properties.

@adrinjalali (Member)

Could I somehow be of any help here, @jnothman?

@SSaishruthi

Hi,

Is there anywhere I can help to move this SLEP forward, so that we can accelerate the AIF360 and scikit-learn integration?

The issue has been referenced above.

@animeshsingh

@animeshsingh

@adrinjalali @jnothman any updates on this would be useful; this is something we need in the context of the AIF360 work we are doing.

@jnothman (Member Author) commented Jul 1, 2019 via email

@jnothman (Member Author) commented Aug 5, 2019

I consider each of the solutions here a family of solutions, rather than an entirely specific syntax. The way forward involves defining a possible syntax for each, then coding up each of the test cases for each solution.

@jnothman (Member Author) commented Aug 5, 2019

Apparently I was pushing to the wrong remote...


* we are able to consider kwargs to `fit` that are not sample-aligned, so that
we can add further functionality (some that have been proposed:
`with_warm_start`, `feature_names_in`, `feature_meta`).
Member

Seems like `feature_names_in_` would be handled through NamedArray?

Benefits:

* This will not need to affect legacy estimators, since no props will be
passed.
Member

We can also think about the cases where sample_weight is not passed now, but only because we've forgotten to pass it along (not sure how many of those we have left, though).

* The implementation changes in meta-estimators may be easy to provide via a
helper or two (perhaps even `call_with_props(method, target, props)`).
* Easy to reconfigure what props an estimator gets in a grid search.
* Could make use of existing `**fit_params` syntax rather than introducing new
Member

this is kinda orthogonal to this solution, isn't it? I still like the idea of passing things that are not sample aligned.

Member Author

Yes, it's orthogonal, but the benefit of reusing **fit_params syntax is that legacy estimators inheriting from BaseEstimator will automatically work; no need to require that they accept props.

Also, because Solution 4 requires estimators to explicitly declare that a fit parameter is a sample prop, it's possible (at least in theory) for other fit parameters to not be sample-aligned, and to be passed by some other mechanism...
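
A loose sketch of that point (the helper name here is invented; the `call_with_props(method, target, props)` helper floated in the diff above may look different): filtering props against a method's signature is one way a meta-estimator could keep forwarding props to legacy estimators without requiring them to accept a new props argument.

import inspect

import numpy as np
from sklearn.linear_model import LogisticRegression

def filter_props_for(method, props):
    # hypothetical helper: forward only the props the target method accepts
    accepted = inspect.signature(method).parameters
    return {k: v for k, v in props.items() if k in accepted}

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
props = {"sample_weight": np.ones(4), "groups": np.array([0, 0, 1, 1])}

est = LogisticRegression()
# sample_weight is forwarded because LogisticRegression.fit accepts it;
# groups is dropped because fit's signature does not mention it
est.fit(X, y, **filter_props_for(est.fit, props))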

from sklearn.metrics import accuracy_score, make_scorer

# Case A:
weighted_acc = make_scorer(accuracy_score, request_props=['sample_weight'])
Member

I still don't think having the same argument in the __init__ of all estimators would be a bad idea.

We could also call it sample_props instead; it'd probably be more intuitive for users. Or even required_sample_props.
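
For concreteness, a rough sketch of the Case A behaviour (illustrative only; RequestingScorer is not scikit-learn API, and the actual make_scorer/request_props spelling is whatever the SLEP settles on): a scorer that declares the props it wants, and receives just those at score time.

from sklearn.metrics import accuracy_score

class RequestingScorer:
    def __init__(self, metric, request_props=()):
        self.metric = metric
        self.request_props = set(request_props)

    def __call__(self, estimator, X, y, **props):
        # keep only the props this scorer asked for, e.g. sample_weight
        wanted = {k: v for k, v in props.items() if k in self.request_props}
        return self.metric(y, estimator.predict(X), **wanted)

weighted_acc = RequestingScorer(accuracy_score, request_props=["sample_weight"])
# A CV loop that can read weighted_acc.request_props knows to slice and pass
# sample_weight for the test fold, and to omit props the scorer did not request.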

@adrinjalali (Member)

Awesome, I really like solution 4.

@NicolasHug (Member)

What's the status of this? Does it need more reviews? LMK if I can help in any way

@jnothman (Member Author) commented Oct 7, 2019

> What's the status of this? Does it need more reviews? LMK if I can help in any way

It needs to go from conceptual approaches to example code of each use case... I'm unlikely to find time this month.

@hermidalc left a comment

There are also important use cases for needing to pass test sample metadata to transform calls. One use case where I need it is batch effect correction methods, where I need to know which batch each test sample is in, along with the fitted train parameters, to perform the test transform.

I've looked over the BaseSearchCV and Pipeline codebase, and a possible issue with using multiple **kwargs arguments like **transform_params is how we keep them separate from the existing **predict_params.

This might be one argument for using a single kwarg such as sample_prop across the board.
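
As a toy illustration of the batch-correction case (invented class, not a proposed API), here is a transformer whose transform needs per-sample batch labels for the test data as well as the fitted training parameters:

import numpy as np

class BatchCentererSketch:
    """Toy batch correction: transform must know each sample's batch."""

    def fit(self, X, y=None, batch=None):
        batch = np.zeros(len(X), dtype=int) if batch is None else np.asarray(batch)
        self.batch_means_ = {b: X[batch == b].mean(axis=0) for b in np.unique(batch)}
        self.global_mean_ = X.mean(axis=0)
        return self

    def transform(self, X, batch=None):
        # test samples carry their own batch labels; unseen batches fall
        # back to the global training mean
        batch = np.zeros(len(X), dtype=int) if batch is None else np.asarray(batch)
        offsets = np.stack([self.batch_means_.get(b, self.global_mean_) for b in batch])
        return X - offsets

X_train = np.arange(8, dtype=float).reshape(4, 2)
X_test = np.ones((2, 2))
trs = BatchCentererSketch().fit(X_train, batch=[0, 0, 1, 1])
X_test_corrected = trs.transform(X_test, batch=[1, 2])  # batch 2 unseen at fit time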

@hermidalc commented Oct 22, 2019

Also, I forgot to mention in relation to my comment above about needing to pass test metadata: this is also related to your bullet point on needing to pass test sample_weight to scorers during CV. Looking at https://github.com/scikit-learn/scikit-learn/blob/5c9f0906102e4677b045744a24228b6c57a6c471/sklearn/model_selection/_validation.py#L490-L493, the information is there: the code that splits **fit_params for the train indices should also be done for the test indices, and the result then passed to _score. Either way, we need to pass test sample metadata through here so that the scorer can pass it to all the predict-like methods.

Though again, what do we call these: **predict_params or **transform_params? Unlike **fit_params, the appropriate name isn't obvious. Some test sample metadata might be used in transform only and some in predict only, but all predict-like methods will need to pass them through.
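
A loose sketch of the kind of index-based splitting being suggested (names invented; this is not the scikit-learn internals linked above): every sample-aligned prop gets sliced once for the train fold and once for the test fold, with the test slice handed to the scorer.

import numpy as np

def split_sample_props(props, train_idx, test_idx):
    # slice each sample-aligned prop for fit (train fold) and for the
    # scorer's predict/transform/score calls (test fold)
    train = {k: np.asarray(v)[train_idx] for k, v in props.items()}
    test = {k: np.asarray(v)[test_idx] for k, v in props.items()}
    return train, test

props = {"sample_weight": np.linspace(0.5, 1.5, 6)}
train_props, test_props = split_sample_props(props, np.arange(4), np.arange(4, 6))
# train_props feeds estimator.fit(...); test_props feeds the scorer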

@jnothman (Member Author)

I'm keen to push this towards a vote soon, so that we can consider @adrinjalali's PR for v0.25 (2021Q2). It needs some review. Then I can review Successive Halving, and the world will be a better place. Are you with me, @adrinjalali and @hermidalc?

@jnothman marked this pull request as ready for review June 18, 2020 06:08
@adrinjalali (Member)

Yeah I'm down.

Comment on lines +356 to +360
Having considered the above solutions, we propose:

TODO

* which solution?
Member

Are we leaning toward solution 4 and using prop?

Member

I think we're leaning towards solution 4, and I personally prefer a **kwargs kind of thing rather than a dict.


I've been using an implementation based on @jnothman's solution 3 in my own sklearn extensions and code for a long time. I also personally prefer **kwargs over a dict. One thing that I've found to be crucial is that, whatever framework/API is implemented, it needs to support specifying which properties you want to pass to each of fit/transform/predict/score, etc. Sometimes you only want a property to go to fit, sometimes to both fit and transform, sometimes only to predict, etc.
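
A small sketch of the per-method routing described here (the request structure and router are invented for illustration, not the proposed API):

# which props each method should receive, declared per estimator/step
request = {
    "fit": {"sample_weight", "batch"},
    "transform": {"batch"},
    "predict": set(),
    "score": {"sample_weight"},
}

def props_for(method_name, props, request):
    # pick out only the props requested for this particular method
    wanted = request.get(method_name, set())
    return {k: v for k, v in props.items() if k in wanted}

props = {"sample_weight": [1.0, 2.0], "batch": [0, 1], "race": ["a", "b"]}
fit_props = props_for("fit", props, request)               # sample_weight and batch
transform_props = props_for("transform", props, request)   # batch only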

Member

The implementation in scikit-learn/scikit-learn#16079 supports this.

Member Author

The syntax I've got in this proposal for Soln 4 doesn't currently cover the "send to transform only" case. Adrin has adopted something more like:

trs = MyTrs().set_props_request(
    None).set_props_request(
    {'fit': {'new_param': 'my_sw'}})
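
Reading that mapping: for fit, the step's parameter new_param should be fed from the user-supplied prop named my_sw. A toy sketch of how a meta-estimator might consume such a per-method request (MyTrs and set_props_request come from the snippet above; the routing function is invented):

def route_props(step_request, method, user_props):
    # step_request looks like {'fit': {'new_param': 'my_sw'}}:
    # destination parameter name -> name of the user-supplied prop (aliasing)
    mapping = (step_request or {}).get(method, {})
    return {dest: user_props[src] for dest, src in mapping.items() if src in user_props}

user_props = {"my_sw": [1.0, 0.5, 2.0]}
fit_kwargs = route_props({"fit": {"new_param": "my_sw"}}, "fit", user_props)
# fit_kwargs == {'new_param': [1.0, 0.5, 2.0]}; a pipeline would then call
# trs.fit(X, y, **fit_kwargs)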


My apologies, I've not had a use case yet where you need to send a param only to transform, but I've had all the others. Though I would say, and you guys know this better than I do, that it's maybe a good idea to handle use cases not yet encountered... I'm not sure; up to you. Intuitively I don't see a case where a param goes to transform and not to fit, but who knows.

BTW does this Solution 4 handle fit_transform properly? This is something I've had to handle extending from @jnothman's Solution 3

@amueller (Member) Jun 20, 2020

I totally forgot how this is made easier by having keyword-only arguments! It might have actually been one of the initial motivations lol.
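
One reading of the keyword-only point (a sketch, not from the thread): declaring sample props as keyword-only keeps them out of the positional (X, y) part of the signature and makes them trivially discoverable by introspection, which helps fused methods and routing code tell data apart from props.

import inspect

class EstimatorSketch:
    def fit(self, X, y=None, *, sample_weight=None):
        # sample props are keyword-only, so they can never collide with (X, y)
        return self

kwonly = [
    name
    for name, param in inspect.signature(EstimatorSketch.fit).parameters.items()
    if param.kind is inspect.Parameter.KEYWORD_ONLY
]
# kwonly == ['sample_weight']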

@jnothman (Member Author) left a comment

It might be nice for this to be merged and rendered, and available for more incremental improvements soon. Not sure what the right time/process for merging is.

@adrinjalali (Member)

We haven't passed it, but we kinda sorta agreed that it can be merged in "draft" status if the author and another maintainer are happy with it. So I'm happy to merge and work on it in separate issues.

@jnothman (Member Author) commented Jun 29, 2020 via email
