Objective.CoxRegressionGPair test failed because the label sorting is not stable #7754

Closed
hmchen-github opened this issue Mar 25, 2022 · 0 comments · Fixed by #7756

We built xgboost with our own toolchain (clang) and found that the Objective.CoxRegressionGPair test sometimes fails as follows:

xgboost/tests/cpp/helpers.cc:86
The difference between gpair[i].GetGrad() and out_grad[i] is 0.081465564668178558, which exceeds 0.01, where
gpair[i].GetGrad() evaluates to 0.081465564668178558,
out_grad[i] evaluates to 0, and
0.01 evaluates to 0.01.
Unexpected grad for pred=0.10000000149011612 label=-2 weight=1

xgboost/tests/cpp/helpers.cc:89
The difference between gpair[i].GetHess() and out_hess[i] is 0.074828922748565674, which exceeds 0.01, where
gpair[i].GetHess() evaluates to 0.074828922748565674,
out_hess[i] evaluates to 0, and
0.01 evaluates to 0.01.
Unexpected hess for pred=0.10000000149011612 label=-2 weight=1

I traced the issue to LabelAbsSort. It uses XGBOOST_PARALLEL_SORT, which does not preserve the relative order of equal elements. In the test above, the labels are { 0, -2, -2, 2, 3, 5, -10, 100}. The sort compares labels by absolute value, so -2, -2 and 2 are ties. The sorted label index is supposed to be {0, 1, 2, 3, 4, 5, 6, 7}. However, it is sometimes {0, 2, 3, 1, 4, 5, 6, 7}, because the original order of -2, -2 and 2 is not preserved by the sort.

After switching to XGBOOST_PARALLEL_STABLE_SORT, the test failure no longer reproduces.
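
To illustrate why a stable sort gives a deterministic index order here, below is a minimal standalone sketch. It is not the XGBoost implementation: StableAbsOrder and Demo are hypothetical names, and plain std::stable_sort stands in for XGBOOST_PARALLEL_STABLE_SORT. It assumes the sort key is the absolute value of the label, as the name LabelAbsSort suggests.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not XGBoost's LabelAbsSort): returns the indices of
// `labels` ordered by |label|. std::stable_sort guarantees that labels with
// equal absolute value keep their original relative order.
std::vector<std::size_t> StableAbsOrder(const std::vector<float>& labels) {
  std::vector<std::size_t> order(labels.size());
  std::iota(order.begin(), order.end(), 0);  // 0, 1, 2, ...
  std::stable_sort(order.begin(), order.end(),
                   [&labels](std::size_t a, std::size_t b) {
                     return std::abs(labels[a]) < std::abs(labels[b]);
                   });
  return order;
}

// Checks the label vector from the failing test: with a stable sort the
// tied entries -2, -2 and 2 (indices 1, 2, 3) stay in their original order.
inline void Demo() {
  const std::vector<float> labels{0.0f, -2.0f, -2.0f, 2.0f,
                                  3.0f, 5.0f, -10.0f, 100.0f};
  const std::vector<std::size_t> expected{0, 1, 2, 3, 4, 5, 6, 7};
  assert(StableAbsOrder(labels) == expected);
}
```

With std::sort in place of std::stable_sort, the relative order of indices 1, 2 and 3 would be unspecified, which matches the occasional {0, 2, 3, 1, 4, 5, 6, 7} result seen above.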

hmchen-github added a commit to hmchen-github/xgboost that referenced this issue Mar 25, 2022
XGBOOST_PARALLEL_SORT cannot preserve the original order of
equal elements. It causes flaky failure for the cox regression
unittest. Use stable sort to make sure the result is deterministic.

Fix dmlc#7754