Dask arrays without distributed? #6652
Hi, I'd like to run the XGBRegressor with dask arrays (from an xarray zarr dataset) on a Ray cluster, using dask.config.set(scheduler=ray_dask_get) as the scheduler for dask. However, when running the fit call I just get this error message:

TypeError: Not supported type for data.<class 'dask.array.core.Array'>

Is there a simple way to achieve this?
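The exact snippet was lost from the issue body; below is a minimal sketch of the setup described, assuming ray_dask_get comes from ray.util.dask, with a hypothetical zarr store and variable names:

```python
import dask
import xarray as xr
from ray.util.dask import ray_dask_get
from xgboost import XGBRegressor

# Route dask computations through the Ray scheduler, as in the report.
dask.config.set(scheduler=ray_dask_get)

# Hypothetical zarr store and variable names.
ds = xr.open_zarr("data.zarr")
X = ds["features"].data  # a dask.array.core.Array
y = ds["target"].data

model = XGBRegressor()
model.fit(X, y)  # raises: TypeError: Not supported type for data.
```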
Comments

No, currently we do not support training with a Dask array unless you set up a distributed cluster.
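For reference, a minimal sketch of the supported path with a distributed cluster, using dask.distributed and the xgboost.dask estimator (cluster size and data here are placeholders):

```python
import dask.array as da
from dask.distributed import Client, LocalCluster
from xgboost.dask import DaskXGBRegressor

# Set up a distributed cluster; any dask.distributed cluster works.
cluster = LocalCluster(n_workers=2)
client = Client(cluster)

X = da.random.random((100_000, 20), chunks=(10_000, 20))
y = da.random.random(100_000, chunks=10_000)

model = DaskXGBRegressor()
model.client = client  # attach the distributed client
model.fit(X, y)
```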
You will have to load the dataset into memory if using the regular (non-dask) xgboost interface.
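That is, materialise the dask arrays as in-memory numpy arrays before fitting; a sketch:

```python
import dask.array as da
from xgboost import XGBRegressor

X = da.random.random((100_000, 20), chunks=(10_000, 20))
y = da.random.random(100_000, chunks=10_000)

# .compute() pulls the whole array into local memory as numpy arrays,
# which the regular XGBRegressor accepts.
model = XGBRegressor()
model.fit(X.compute(), y.compute())
```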
Thanks for your answers and making this a feature request :)
We can probably integrate it with the internal data iterator.
I looked into this issue again. When using the sklearn interface, loading the whole dataset is required. The only way to avoid loading the whole dataset is using external memory in XGBoost; for a quick start see https://github.com/dmlc/xgboost/blob/master/demo/guide-python/external_memory.py . I wrote a simple wrapper for dask arrays during the development of this new interface. The idea is to get the chunks of data by computing one block of the dask array at a time.

But the external memory support is still experimental until #7214 can be merged. Closing this issue as the interface is here now.
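A sketch of such a wrapper, assuming a recent xgboost exposing the DataIter interface used in the demo above and dask arrays chunked along rows only (the class and parameter names are illustrative, not the wrapper referenced in the comment):

```python
import dask.array as da
import xgboost


class DaskArrayIter(xgboost.DataIter):
    """Feed dask arrays to XGBoost one row-chunk at a time."""

    def __init__(self, X: da.Array, y: da.Array) -> None:
        self._X = X
        self._y = y
        self._n_chunks = X.numblocks[0]  # assumes chunking along rows only
        self._it = 0
        super().__init__(cache_prefix="./cache")

    def next(self, input_data) -> int:
        if self._it == self._n_chunks:
            return 0  # no more chunks: end of this pass over the data
        # Materialise a single block instead of the whole array.
        input_data(
            data=self._X.blocks[self._it].compute(),
            label=self._y.blocks[self._it].compute(),
        )
        self._it += 1
        return 1

    def reset(self) -> None:
        self._it = 0  # rewind for the next pass


X = da.random.random((100_000, 20), chunks=(10_000, 20))
y = da.random.random(100_000, chunks=10_000)

# A DMatrix built from a DataIter caches the chunks on disk (external memory).
Xy = xgboost.DMatrix(DaskArrayIter(X, y))
booster = xgboost.train({"tree_method": "hist"}, Xy, num_boost_round=10)
```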