-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dask] Add scheduler address to dask config. #7581
Conversation
56d5ff5
to
b913c2f
Compare
76ed1cd
to
cff0733
Compare
7de4a15
to
c51eabe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't review this :) Can a dask person take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good except for the comment about how the RabitTracker constructor fallback is handled. Just don't want calls in an except block that can cause another exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Dask things in here generally seem fine to me. A couple of comments about how the config system is being used.
# 2. Guessed by xgboost `get_host_ip` function | ||
# 3. From dask scheduler | ||
# We try 1 and 3 if 1 is available, otherwise 2 and 3. | ||
valid_config = ["scheduler_address"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When extending the Dask config system we typically create a new YAML with default values in and extend the Dask config with it.
See how we do it in dask-ctl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for sharing, let me look into that.
for k in dconfig: | ||
if k not in valid_config: | ||
raise ValueError(f"Unknown configuration: {k}") | ||
host_ip: Optional[str] = dconfig.get("scheduler_address", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why this dconfig
object is being passed around instead of accessing dask.config.get("xgboost.scheduler_address")
directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't access it on the worker side for some reason (key not found). I read the doc from https://docs.dask.org/en/stable/configuration.html , it's not quite clear to me how to make sure everything is synced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also read this section https://docs.dask.org/en/stable/configuration.html#downstream-libraries . In XGBoost dask is loaded lazily so having dask in init might be less desirable due to the size of dependency. So I went on and read this:
However, downstream libraries may choose alternative solutions, such as isolating their configuration within their library, rather than using the global dask.config system. All functions in the dask.config module also work with parameters, and do not need to mutate global state.
So far using this parameter passing approach seems to be the easiest way to get 1 parameter in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. Generally the Dask config should be available on the scheduler/workers but that can be deployment dependent.
- Add user configuration. - Bring back to the logic of using scheduler address from dask. This was removed when we were trying to support GKE, now we bring it back and let xgboost try it if direct guess or host IP from user config failed.
[backport] [dask] Add scheduler address to dask config. (dmlc#7581)
See added doc for use case.