xgboost/tests/cpp/helpers.cc:86
The difference between gpair[i].GetGrad() and out_grad[i] is 0.081465564668178558, which exceeds 0.01, where
gpair[i].GetGrad() evaluates to 0.081465564668178558,
out_grad[i] evaluates to 0, and
0.01 evaluates to 0.01.
Unexpected grad for pred=0.10000000149011612 label=-2 weight=1
xgboost/tests/cpp/helpers.cc:89
The difference between gpair[i].GetHess() and out_hess[i] is 0.074828922748565674, which exceeds 0.01, where
gpair[i].GetHess() evaluates to 0.074828922748565674,
out_hess[i] evaluates to 0, and
0.01 evaluates to 0.01.
Unexpected hess for pred=0.10000000149011612 label=-2 weight=1
I found that the issue is caused by LabelAbsSort. It uses XGBOOST_PARALLEL_SORT, which does not preserve the relative order of equal elements. In the test above, the labels are { 0, -2, -2, 2, 3, 5, -10, 100}, so the sorted label index is expected to be {0, 1, 2, 3, 4, 5, 6, 7}. However, it is sometimes {0, 2, 3, 1, 4, 5, 6, 7}, because the original order of -2, -2 and 2 (all of which have absolute value 2) is not preserved after sorting.
After switching to XGBOOST_PARALLEL_STABLE_SORT, the test failure no longer reproduces.
XGBOOST_PARALLEL_SORT does not preserve the original order of
equal elements. This causes flaky failures in the Cox regression
unit test. Use a stable sort to make sure the result is deterministic.
Fix dmlc#7754
We built xgboost with our own toolchain (clang) and found that the Objective.CoxRegressionGPair test sometimes fails with the errors shown above.