Re-organizing the MLJ stack #317
Comments
It is wonderful to see this modularisation of the stack ❤️ Thank you for the brainstorm, and for the initiative. I would like to take this opportunity and share my own experience regarding similar modularisation in GeoStats.jl. In my case, I decided to put everything in The downside of modularising a project that way is that The diagram that was brainstormed above makes a lot of sense to me. I only point out that some of these modules do not work standalone. For example, Regardless of the final modularisation, we will be able to collaborate much more now that the implementations and names are self-contained in smaller packages as opposed to the umbrella
As another data point, I've been training However, I can get the component distributions to satisfy the MLJ interface. Then I use a custom sample-fit-combine approach to construct the ensemble. See this SampleFitCombine.jl package (unregistered) for details and examples. I'd love to be able to do this using
Thanks for your query. I'm guessing that MLJ's Happy to discuss extensions. Contributions welcome. See https://alan-turing-institute.github.io/MLJ.jl/dev/homogeneous_ensembles/ (manual) and https://alan-turing-institute.github.io/MLJTutorials/pub/getting-started/ensembles-2.html (tutorial)
Yes, the homogeneous ensembles provide some of the functionality. I think it could be more general, as per For example, see the test case which partitions the training data by clustering the predicted training data according to I'm happy to try and make contributions, but at this point I think it is better to flesh out what it would look like first.
@JockLawrie Continuing the EnsembleModel discussion at #363
Returning to the proposed re-organization: Based on the discussion here and elsewhere, I think the following is not too controversial: Pulling the composite model API down to MLJBase makes sense because implementers of the MLJ model interface may want to include versions of their models wrapped in pre-transformations of the inputs and target, for example (to provide versions that handle mixed data types) - or to get creative in other ways. So let's start with that:
Related discussion: #417
With auto-merge in place, the pains of rolling out breaking changes in the stack are greatly mitigated and I think the time is ripe for increasing modularisation. If nothing else, we could all benefit from a greater distribution of testing, as this is becoming increasingly painful.
Here are some suggestions to get the discussion rolling.
New repos
MLJManual: for the MLJ manual, modelled on the present MLJTutorials. Generation of documentation is a serious slowdown for testing in the present MLJ repo (Move the MLJ manual to new repo MLJManual #316)
MLJResampling: for the general resampling algorithm (evaluate!) and in-house resampling strategies (Holdout, CV, etc).
MLJTuning: for the general tuning algorithm and in-house tuning strategies (Improve the tuning strategy interface #315).
MLJComposition: for the machine and composite model functionality (the two are currently integrated).
I suggest the raw interface for tuning strategies and resampling strategies live in MLJBase. Then out-of-house implementations of these strategies need only import MLJBase (and import MLJTuning for testing only).
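To make the suggested split concrete, here is a minimal sketch of how it might look for resampling strategies. Everything named here is hypothetical and for illustration only: `HypotheticalBase` stands in for the base package, `ThirdPartyStrategies` for an out-of-house package, and `ResamplingStrategy`, `train_test_pairs` and `LastFraction` are assumed names, not a statement of the actual MLJBase API.

```julia
# Stand-in for the base package: only the raw interface lives here.
module HypotheticalBase
    export ResamplingStrategy, train_test_pairs

    # Supertype that every resampling strategy subtypes (assumed name).
    abstract type ResamplingStrategy end

    # Generic function that strategies extend; no methods are defined here.
    function train_test_pairs end
end

# Stand-in for an out-of-house package: it imports the base module only.
module ThirdPartyStrategies
    import ..HypotheticalBase
    export LastFraction

    # Hypothetical strategy: hold out the last `fraction_test` of the rows.
    struct LastFraction <: HypotheticalBase.ResamplingStrategy
        fraction_test::Float64
    end

    # Implement the interface by adding a method to the base function.
    function HypotheticalBase.train_test_pairs(s::LastFraction, rows)
        n_test = round(Int, s.fraction_test * length(rows))
        n_train = length(rows) - n_test
        return [(rows[1:n_train], rows[n_train+1:end])]
    end
end

using .HypotheticalBase, .ThirdPartyStrategies

# A single (train, test) pair of row indices.
splits = train_test_pairs(LastFraction(0.25), 1:8)
```

In this arrangement the general algorithm (the role `evaluate!` plays) could live in a separate package and dispatch on the abstract type, so the strategy package would need that package for testing only.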
Dependencies
A question remains how best to handle a composite model that one wants to add to the registry. If it is defined in an external package, then that package can have MLJComposition as a dependency. If, however, it is defined somewhere in MLJModels, then I guess MLJModels would have to add MLJComposition to its hard dependencies, which would be unfortunate.
Thoughts on any of this, anyone?
cc: @DilumAluthge