Create example for custom dask client #998
I stumbled on this issue and saw that it was opened shortly after I started looking for this, what a lucky coincidence. I would be particularly interested in an example that uses dask to run SMAC on a SLURM-based cluster. From looking at dask, this seems like an option, but I don't know how to use it: https://jobqueue.dask.org/en/latest/generated/dask_jobqueue.SLURMCluster.html
Hi @FlorianPommerening,
Hey @benjamc, it's great to see progress in this direction. I would suggest also adding an example that does not require a custom client but uses a standard client, and shows how to connect manually spawned workers (in case someone doesn't have a SLURM cluster but still wants to do similar things). As a starting point, one could look at this example in Auto-sklearn, which can easily be adapted for SMAC.
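A minimal sketch of the manual-worker setup suggested above, assuming SMAC 2.x (whose facades accept a `dask_client` argument) and a scheduler started by hand with `dask scheduler`, with workers attached via `dask worker tcp://<host>:8786`. The helper names `scheduler_address`, `connect`, and `run_smac` are hypothetical, and the host name in the usage comment is a placeholder. This is not the Auto-sklearn example mentioned above, only an illustration of the same idea.

```python
"""Sketch: run SMAC against manually spawned dask workers (assumes SMAC 2.x)."""


def scheduler_address(host: str, port: int = 8786) -> str:
    """Build the address of a hand-started scheduler (8786 is dask's default port)."""
    return f"tcp://{host}:{port}"


def connect(address: str, n_workers: int, timeout: float = 60.0):
    """Connect a standard client and block until the workers have registered."""
    from dask.distributed import Client  # imported lazily; needs `distributed`

    client = Client(address)
    client.wait_for_workers(n_workers=n_workers, timeout=timeout)
    return client


def run_smac(client, scenario, target_function):
    """Hand the existing client to SMAC instead of letting it create its own."""
    from smac import HyperparameterOptimizationFacade  # needs `smac` >= 2.0

    smac = HyperparameterOptimizationFacade(
        scenario, target_function, dask_client=client
    )
    return smac.optimize()


# Usage (placeholder host; scenario and target function are defined elsewhere):
#   client = connect(scheduler_address("node01"), n_workers=2)
#   incumbent = run_smac(client, scenario, target_function)
```

Waiting for the workers before starting the optimization is a design choice of this sketch, so that trials are not submitted before any worker has registered.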
Thanks a lot @benjamc, that was super quick.
Unfortunately, the example doesn't work on our cluster. I changed the name of the queue and increased the number of trials to 1000 and then ran the process on the login node of our cluster. I can see worker jobs spawning on the cluster but they don't seem to be doing anything. The work is all done on the login node instead ( When I look into the logs in
(edit: simplified long log since it is no longer relevant, see below.)
Ok, I figured out that the nanny was not connecting to the workers because I had to specify the "interface" parameter. Otherwise, the public IP of the login node was used, which does not accept connections. I no longer see the error, but the work still seems to be done exclusively on the login node.
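For reference, a minimal sketch of passing the "interface" parameter to `dask_jobqueue.SLURMCluster`, so that workers communicate over a cluster-internal network instead of the login node's public IP. The helper names `slurm_cluster_kwargs` and `make_slurm_client` are hypothetical, and the queue name, interface name, and resource values in the usage comment are placeholders that differ per cluster.

```python
"""Sketch: SLURMCluster bound to an internal network interface."""


def slurm_cluster_kwargs(queue, interface, cores=1, memory="4 GB",
                         walltime="00:30:00"):
    """Collect SLURMCluster keyword arguments; all defaults are assumptions."""
    return {
        "queue": queue,          # SLURM partition to submit worker jobs to
        "interface": interface,  # network interface the workers should bind to
        "cores": cores,
        "processes": 1,          # one worker process per job
        "memory": memory,
        "walltime": walltime,
    }


def make_slurm_client(kwargs, n_jobs):
    """Start the cluster, request n_jobs worker jobs, and return a client."""
    from dask.distributed import Client     # needs `distributed`
    from dask_jobqueue import SLURMCluster  # needs `dask-jobqueue`

    cluster = SLURMCluster(**kwargs)
    cluster.scale(jobs=n_jobs)
    return Client(cluster)


# Usage (placeholder queue and interface names):
#   client = make_slurm_client(slurm_cluster_kwargs("short", "ib0"), n_jobs=4)
#   # then pass `client` to the SMAC facade via `dask_client=client`
```

Which interface name is correct depends on the cluster; listing the interfaces on a compute node (for example with `ip addr`) is one way to find a candidate.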
I managed to get it to work, but I had to make additional changes:
Maybe some of those points are worth adding to the example.
Hi,
I could not reproduce the problem in the third point (the one about retries of the intensifier) but the second one (sleep before optimize) is reproducible for me. The script I used is available here: https://ai.dmi.unibas.ch/_experiments/pommeren/innosuisse/mwe/
I tried cutting the example down to the essentials, but it still optimizes a model that relies on Gurobi and some instances that I unfortunately cannot share. If you have a simpler model that you want me to try instead, I can see if the error still occurs there. So far, if I execute the code as is, everything works fine, but if I remove the
The first warning about
Hi Florian, the warning is only informational: it says that we use the number of workers specified in the dask client and
Thanks. This seems to help but it is somewhat complicated to test. I'll open new issues for the two problems as you suggested by email.
Thanks for the issues, I will close this one then. :)