Fix dask prediction. #4941
Conversation
I am coming up against this assertion on some training runs also.

Em... I can't reproduce what you said. Will keep looking.

Let me create some benchmark scripts for prediction.

That's the reason I added so many assertions. Here are 3 assertions you might fail:
Here was the problem. I had the following:

```python
partition_size = 1000
X = da.from_array(data.X_train, partition_size)
```

The reason this fails is that the dataset had 2000 features, more than the partition size. When you specify a single value for the partition size, it actually sets the chunk size for all dimensions to that value, not just the first, so the feature axis gets split as well. This fixes it:

```python
partition_size = 1000
X = da.from_array(data.X_train, (partition_size, data.X_train.shape[1]))
```

I think you can reproduce this bug in your demo (https://github.com/dmlc/xgboost/blob/master/demo/dask/gpu_training.py) by setting a single chunk size smaller than the number of features. I found this unintuitive; how can we make it better? We could coerce the data into the correct dimensions and log a warning instead, or just fail with a better message. The demos will need updating also. Maybe something like:
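To make the "coerce and warn" suggestion concrete, here is a minimal sketch of what such a coercion could look like. The helper name `coerce_row_chunks` is hypothetical, not part of xgboost or dask; it only illustrates the idea of treating a scalar chunk size as a row-wise partition size while keeping the other axes whole:

```python
import warnings

def coerce_row_chunks(chunks, shape):
    """Hypothetical helper: interpret a scalar ``chunks`` as the chunk
    size for the first axis only, keeping every other axis in a single
    chunk, and warn about the reinterpretation instead of silently
    splitting the feature axis."""
    if isinstance(chunks, int):
        warnings.warn(
            "A single chunk size was given for a %d-dimensional array; "
            "applying it to the first axis only. Pass a tuple to chunk "
            "other axes explicitly." % len(shape)
        )
        # Chunk only the sample axis; each remaining axis stays whole.
        return (chunks,) + tuple(shape[1:])
    return tuple(chunks)

# A (5000, 2000) matrix with partition_size = 1000 keeps all 2000
# feature columns together in each row block.
print(coerce_row_chunks(1000, (5000, 2000)))
```

The coerced tuple could then be passed straight to `da.from_array(data.X_train, chunks)`, so existing demos that pass a scalar keep working with a visible warning.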
Codecov Report

```
@@            Coverage Diff            @@
##             master    #4941   +/-   ##
=========================================
  Coverage          ?   71.05%
=========================================
  Files             ?       11
  Lines             ?     2301
  Branches          ?        0
=========================================
  Hits              ?     1635
  Misses            ?      666
  Partials          ?        0
=========================================
```

Continue to review the full report at Codecov.
No description provided.