[Feature] Hook to implement early stopping #1223
I agree that a …
Hi @thomashlvt, I think this would be an excellent feature, especially if it's completely optional. I will have a look tomorrow and point you to the relevant places to consider. There are some challenges to this, though, which essentially mean your stopping criterion won't fully reflect the final performance.
The early-stopping criterion would only be able to consider individual model performance and metrics, with no information about how strong an ensemble created at that point would be. Creating an ensemble after each model evaluation would be too costly. If you think this would be enough for your feature idea, here are some extra questions so I can give you the information you need!
We are working on a contribution guide which should be here soon and should help!
Hi @thomashlvt, after looking at how one might implement this and talking to some colleagues, it appears it's not as easy as it seems. Autosklearn relies on a library called SMAC to perform the model search. The two ways to currently stop SMAC are:
I agree this use case seems to be something we should support, so if you were very keen to get this feature working we would be very appreciative! We'd also be happy to help you work through this. I've written about a possible solution below.

Solution

There is a mechanism within autosklearn to pass callbacks to SMAC with the argument … Therefore, the solution I see would be to have SMAC look at the return values of the callbacks and use this in some way to tell SMAC to either continue or stop.
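The mechanism described above can be sketched in plain Python. This is purely illustrative, not SMAC's actual API: an optimization loop consults the return values of its callbacks after each evaluation and stops as soon as any callback returns `False`.

```python
# Illustrative sketch (not SMAC's actual API) of the proposed mechanism:
# the optimization loop checks the return values of callbacks after each
# evaluation, and any callback returning False stops the search.

def run_optimization(evaluate, candidates, callbacks):
    """Evaluate candidates in order; stop early if any callback returns False."""
    results = []
    for config in candidates:
        result = evaluate(config)
        results.append(result)
        # A callback returning False (as opposed to None) signals "stop".
        if any(cb(result) is False for cb in callbacks):
            break
    return results
```

A user's early-stopping criterion would then just be a callback that returns `False` once it is satisfied, with the time budget still acting as the hard upper bound.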
Hi @eddiebergman - I'm happy you're also excited about this feature! Are you familiar with the internal code of autosklearn? Have you contributed to open-source GitHub projects before?

Solution

I can start working on a PR in the SMAC3 repository. What interface do you suggest?
I think interrupting the loop itself can be easily done by setting the …

Thanks!
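The flag-based interruption suggested above could look like the following sketch. The names here are made up for illustration; they are not SMAC internals. The loop checks a stop flag before each iteration so that a callback, or any other code, can request a clean exit.

```python
# Illustrative sketch of interrupting the optimization loop via a flag:
# the loop checks the flag before each iteration, so anything holding a
# reference to the optimizer can request a clean, early exit.

class Optimizer:
    def __init__(self):
        self._stop = False

    def request_stop(self):
        """Ask the run loop to exit before its next iteration."""
        self._stop = True

    def run(self, steps):
        """Execute steps until exhausted or until the stop flag is set."""
        completed = 0
        for step in steps:
            if self._stop:
                break
            step()
            completed += 1
        return completed
```

The advantage of checking the flag at iteration boundaries is that the currently running evaluation is never killed mid-way, so results remain consistent.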
I have created a PR in SMAC :)
Ello again @thomashlvt, glad to see you could get it done so quickly! I think the boolean method you implemented should be fine, but I am not familiar with the full development of SMAC. You should be able to work off that branch quite safely, without any API changes, to test an early-stopping mechanism with AutoSklearn. Some more pointers about what I think would make an excellent PR!

Implementation

I'm not sure anything would actually need to be put into AutoSklearn, as the callbacks are already forwarded. If you do, however, it would make sense to put it into the AutoML class.

Unit Tests

For creating unit tests, you could put them in test/test_automl.py
Documentation

We would need some documentation on this functionality.
Please let me know if there's any more information you need! As a side note, I think this feature will actually make testing some other components of autosklearn easier, being able to interrupt it as required :)
Hi there! Thanks for the additional pointers, I'm looking forward to trying to make this work :) Given the change in SMAC, I indeed think no additional changes are needed to make this work, besides actually implementing the callback. Would you suggest also putting this callback into the actual codebase and exposing it in the API somehow, or only testing it and documenting it in the examples?
I think, for clarity's sake, it might make sense to expose a specific parameter, something like … The last decision is then whether it should be an argument to …
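Whatever name the exposed parameter ends up with, the callback a user passes could look like the sketch below. The signature and the `Result` stand-in are assumptions for illustration only; the real interface would be whatever SMAC's callback support defines.

```python
from collections import namedtuple

# Stand-in for whatever result object SMAC would hand to the callback;
# purely for illustration, not SMAC's actual result type.
Result = namedtuple("Result", ["cost"])

def stop_when_good_enough(result, threshold=0.05):
    """Return False to ask the optimizer to stop, None to continue."""
    if result.cost <= threshold:
        return False  # good enough: stop the search early
    return None       # keep optimizing
```

A user would pass a function like this via the exposed parameter, and the search would end as soon as it returns `False`.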
Wouldn't it maybe suffice to rename the current argument and improve its documentation?
That's a much better idea! I think having a dedicated doc section in SMAC that we can link to would make it a lot easier: a simple description of the callback here, one or two use cases, and then simply refer to the SMAC documentation for more details.
Raising this in a new issue so we can clearly state what needs to be done on our end. |
Short Question Description
I would like to implement a hook that lets the user implement their own stopping strategy. Is this interesting to you? How would I go about implementing the hook myself?
Context Information
First off, I really like the project and I'm very much impressed with what you have accomplished. The autoML engine works extremely well for our use cases.
Different datasets require different training lengths. In some cases, I noticed that the autoML engine finds the optimal configuration in a matter of seconds, whereas for others, longer training times do benefit performance. Without knowing the dataset in advance, it is hard to choose the training time: we are trying to minimize computation time while keeping the same model performance.
This could be done by providing a hook to the user that is called after every new model is trained. I would then compare the new model's performance with the best one so far, and make a heuristic decision based on this whether to continue training. The `time_left_for_this_task` would still be the maximum training time; the hook would thus implement some sort of early-stopping strategy.

Similar Work
I did not find a similar example/tutorial in the documentation, nor a similar GitHub Issue.
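The heuristic described in the issue body, comparing each newly trained model against the best one so far, could be sketched as a simple patience-based stopper. All names here are illustrative and not part of auto-sklearn's API; the rule is assumed to be "stop after `patience` consecutive models bring no improvement".

```python
# A minimal patience-based early-stopping heuristic: track the best score
# seen so far and stop once `patience` models in a row fail to improve on
# it by at least `min_delta`. Illustrative only, not auto-sklearn's API.

class PatienceStopper:
    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_score = float("-inf")
        self.stale = 0  # consecutive models without improvement

    def __call__(self, score):
        """Return False to stop the search, None to continue."""
        if score > self.best_score + self.min_delta:
            self.best_score = score
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                return False  # no improvement for `patience` models: stop
        return None
```

Plugged into a callback mechanism like the one discussed above, this would cut off runs that have plateaued while still letting `time_left_for_this_task` cap the worst case.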