
Add sample-weight interface point? #177

Closed
ablaom opened this issue Jul 1, 2019 · 7 comments
Labels
enhancement New feature or request

Comments

ablaom (Member) commented Jul 1, 2019

Some supervised and unsupervised algorithms allow one to weight individual observations, but currently there is no obvious interface for passing this information to these algorithms.

ablaom added the enhancement (New feature or request) label on Jul 1, 2019
fkiraly (Collaborator) commented Jul 2, 2019

Dare I say ... tasks?

Of course, one can also just add it to signatures wherever an X appears and check whether the row lengths agree.

ablaom (Member, Author) commented Jul 19, 2019

I'm thinking:

  • add a model trait supports_sample_weights to MLJBase, with default value false
  • when true, MLJBase.fit gets the extended signature fit(model, verbosity, X, y, w)
  • when constructing a machine for such models, w is optional: machine(model, X, y, w), or machine(model, X, y) as before (for uniform weights).

As far as I can tell this breaks nothing. Note that the length of mach.args for a machine is not constrained (it can be 1, 2, 3, or more), so no problems there.

If people want to discuss contriving sample-weight support for non-supporting models (oversampling, and so forth), please open a new thread.

A test case already exists: the SVM models at https://github.com/alan-turing-institute/MLJModels.jl/blob/master/src/LIBSVM.jl support weights (currently passed as a hyperparameter).
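The three bullet points above might be sketched roughly as follows. This is an illustrative sketch only: apart from the names supports_sample_weights, fit, and machine, which the comment itself proposes, everything here (the Model type, SVMClassifier, the bodies) is hypothetical and not the actual MLJBase source.

```julia
# Hypothetical sketch of the proposed trait-based interface.

abstract type Model end

# Model trait with default value false:
supports_sample_weights(::Type{<:Model}) = false

struct SVMClassifier <: Model end  # made-up stand-in for a real model

# A weight-supporting model opts in:
supports_sample_weights(::Type{SVMClassifier}) = true

# Extended fit signature, available only for weight-supporting models:
function fit(model::SVMClassifier, verbosity::Int, X, y, w)
    verbosity > 0 && @info "training with per-observation weights"
    # ... train using X, y and the weights w ...
    return nothing
end

# Machine construction: w is optional, and supplying it to a
# non-supporting model is an error.
function machine(model::Model, X, y, w...)
    if length(w) == 1 && !supports_sample_weights(typeof(model))
        error("$(typeof(model)) does not support sample weights")
    end
    return (model, X, y, w...)  # stand-in for a real Machine object
end
```

Because the trait defaults to false, existing models need no changes; only models opting in define the five-argument fit method.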

fkiraly (Collaborator) commented Aug 1, 2019

Small comment: I think the default/baseline case for the weighted version should be to ignore the weights. That way every learner could take weights as additional input, which should make building learners easier and avoid an interface case distinction.
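One way to realize this suggestion (a sketch under assumed names, not the interface that was actually adopted) is a single generic fallback method: models that use weights define the five-argument fit themselves, and every other model falls back to the unweighted method, silently discarding w.

```julia
# Hypothetical sketch: a generic weighted `fit` that ignores the weights.

abstract type Model end

struct ConstantRegressor <: Model end  # made-up weight-oblivious model

# The model implementer only writes the unweighted method:
fit(model::ConstantRegressor, verbosity::Int, X, y) = sum(y) / length(y)

# Generic fallback: every model accepts the weighted signature,
# with `w` ignored unless a more specific method is defined.
fit(model::Model, verbosity::Int, X, y, w) = fit(model, verbosity, X, y)
```

With this fallback in place, callers can always pass weights and never need a case distinction; Julia's dispatch picks the specialized weighted method when one exists.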

fkiraly (Collaborator) commented Aug 1, 2019

If you want oversampling (and so forth), I'd consider these reduction strategies, hence wrappers (or more generally, first-order compositors).

ablaom (Member, Author) commented Aug 1, 2019

@fkiraly I like your suggestion to make the fit signature the same for all cases (w gets ignored when not used), but this would break all existing model implementations.

ablaom (Member, Author) commented Aug 14, 2019

Evaluation (as opposed to training) now supports per-observation weights: see #206.

ablaom (Member, Author) commented Apr 29, 2020

Resolved a while ago.

ablaom closed this as completed on Apr 29, 2020