[BLOCKING] /root/repo/xgboost/src/objective/regression_obj.cu:65: Check failed: preds.Size() == info.labels_.Size() (904 vs. 3616) labels are not correctly providedpreds.size=904, label.size=3616 #4163

Closed
pseudotensor opened this issue Feb 19, 2019 · 16 comments · Fixed by #4165

Comments

@pseudotensor
Contributor

1dac5e2 on fresh build off dmlc

failxgbregression.zip

@hcho3 This seems to be a major regression after the recent fixes. It seems all regression problems have the issue.

@pseudotensor
Contributor Author

I get this on all or most (not sure) regression problems

@hcho3
Collaborator

hcho3 commented Feb 19, 2019

Can you try running the same thing with 0ff84d9? #4147 might have introduced the bug.

@trivialfis
Member

@pseudotensor It might be my fault in #4147, but your script ran fine for me:

...
[535]	validation_0-rmse:0.14538
[536]	validation_0-rmse:0.145462
[537]	validation_0-rmse:0.145495
[538]	validation_0-rmse:0.145489
[539]	validation_0-rmse:0.145514
[540]	validation_0-rmse:0.145533
[541]	validation_0-rmse:0.145551
[542]	validation_0-rmse:0.145574
[543]	validation_0-rmse:0.145422
[544]	validation_0-rmse:0.145423
[545]	validation_0-rmse:0.145428
Stopping. Best iteration:
[445]	validation_0-rmse:0.145298

@trivialfis changed the title from "/root/repo/xgboost/src/objective/regression_obj.cu:65: Check failed: preds.Size() == info.labels_.Size() (904 vs. 3616) labels are not correctly providedpreds.size=904, label.size=3616" to "[BLOCKING] /root/repo/xgboost/src/objective/regression_obj.cu:65: Check failed: preds.Size() == info.labels_.Size() (904 vs. 3616) labels are not correctly providedpreds.size=904, label.size=3616" on Feb 19, 2019
@trivialfis
Member

@hcho3 If it is confirmed that that PR is to blame, I will submit a PR to revert it and reopen the closed bug, since I can't reproduce the failure.

@pseudotensor
Contributor Author

pseudotensor commented Feb 19, 2019

OK, I will check a different hash.

@hcho3
Collaborator

hcho3 commented Feb 19, 2019

@pseudotensor @trivialfis I actually reproduced the bug on my Linux machine with the latest master. On the other hand, with commit hash 0ff84d9, the example runs fine.

@hcho3
Collaborator

hcho3 commented Feb 19, 2019

Let's see if there is a quick fix for this

@hcho3
Collaborator

hcho3 commented Feb 19, 2019

Okay, found the issue: somehow the variable num in

xgboost/src/data/data.cc

Lines 140 to 141 in c8c472f

void MetaInfo::SetInfo(
const char* key, const void* dptr, DataType dtype, size_t stride, size_t num) {

is not correct. In the given example, num should have been 904, but it is now being set to 3616.
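The two numbers in the error are related by a factor of four, which is what one would expect if a byte count were passed where an element count was required. The sketch below checks that arithmetic. It assumes the labels are stored as 32-bit floats, which the thread does not state explicitly; the constant names are made up for illustration.

```python
import ctypes

NUM_ROWS = 904        # value `num` should have had, per the thread
REPORTED_NUM = 3616   # value `num` actually had, per the thread

# Assumption: each label is a 32-bit float.
item_size = ctypes.sizeof(ctypes.c_float)

# Length of the same buffer when measured in bytes instead of elements.
byte_length = NUM_ROWS * item_size

print(item_size, byte_length, byte_length == REPORTED_NUM)
```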

@trivialfis
Member

@hcho3 Thanks! It seems my master branch was one commit behind. Sorry about that. Looking into this.

@hcho3
Collaborator

hcho3 commented Feb 19, 2019

@trivialfis I actually have a one-liner fix. Trying to fix the test test_np_view now.

@trivialfis
Member

trivialfis commented Feb 19, 2019

@hcho3 Yep. The length passed from the Python call should be divided by data.dtype.itemsize:

length = len(data.base)

@hcho3
Collaborator

hcho3 commented Feb 19, 2019

@trivialfis It turns out that len(data.base) may mean different things. Sometimes it shows the number of bytes, and sometimes it shows the number of elements. Rather than dealing with NumPy internals, I'd prefer to simply make copies.
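A minimal sketch of the ambiguity described above, using only standard NumPy calls. For a sliced view, `base` is the parent array, so `len()` counts elements along its first axis. For an array wrapping a raw buffer, `base` is the buffer (or a memoryview of it), so `len()` counts bytes. The helper `to_owned_float32` is hypothetical and only illustrates the "simply make copies" approach; it is not the code from the actual fix.

```python
import numpy as np

parent = np.arange(8, dtype=np.float32)

# Case 1: a sliced view. `base` is the parent ndarray,
# so len(base) is a number of elements.
view = parent[2:6]
len_base_view = len(view.base)

# Case 2: an array built on a raw bytes buffer. `base` is the buffer
# (or a memoryview of it), so len(base) is a number of bytes.
raw = parent.tobytes()
from_buffer = np.frombuffer(raw, dtype=np.float32)
len_base_buffer = len(from_buffer.base)

print(len_base_view, len_base_buffer)


def to_owned_float32(data):
    """Hypothetical helper: always return a fresh array that owns its memory."""
    return np.array(data, dtype=np.float32, copy=True)


owned = to_owned_float32(view)
# The copy has no base, so its length is unambiguous: use owned.size.
print(owned.base is None, owned.size)
```

Dividing `len(data.base)` by `data.dtype.itemsize` would give the right count in the second case but the wrong one in the first, which is why copying sidesteps the question.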

@trivialfis
Member

@hcho3 Okay, I will revert it tomorrow. Really need some sleep now. It's 6AM at my place...

@hcho3
Collaborator

hcho3 commented Feb 19, 2019

@trivialfis I'll take care of it. Good night

hcho3 added a commit to hcho3/xgboost that referenced this issue Feb 19, 2019
@hcho3
Collaborator

hcho3 commented Feb 19, 2019

@pseudotensor Fix is available at #4165. Thanks for reporting the bug.

@pseudotensor
Contributor Author

Thanks!

hcho3 added a commit that referenced this issue Feb 20, 2019
* Revert "Accept numpy array view. (#4147)"

This reverts commit a985a99.

* Fix #4163: always copy sliced data

* Remove print() from the test; check shape equality

* Check if 'base' attribute exists

* Fix lint

* Address reviewer comment

* Fix lint
@lock (bot) locked as resolved and limited conversation to collaborators on May 21, 2019