Re-enable recovery test after rabit allows duplicated allreduce calls with the same signature #5031
Restarted the aborted tests. |
Codecov Report
@@ Coverage Diff @@
##             master    #5031    +/- ##
=========================================
  Coverage        ?   71.52%
=========================================
  Files           ?       11
  Lines           ?     2311
  Branches        ?        0
=========================================
  Hits            ?     1653
  Misses          ?      658
  Partials        ?        0
|
@trivialfis Can we rerun the tests? The failure looks a bit weird. |
The error message in the R tests is
I thought the R package was using mock Rabit? (Distributed training is not supported for XGBoost-R) |
I see, that’s probably not related to the mock but to the allgather interface merged in master. |
depends on dmlc/rabit#129 |
Will look into this once our Windows worker is back. Sorry for being slow here. |
@trivialfis Sorry for the delay, I was spending time with my family over the Thanksgiving holiday. The blocking issue now is inability to run the compiled C++ tests ( |
@trivialfis I found the reason why |
@hcho3. That's great! |
@chenqin Could you rebase onto the master branch? Since you used your own master branch for this PR, I can't do that for you. |
@chenqin Please help with the rebase; otherwise the Jenkins Windows tests cannot pass. |
[2019-12-15T01:52:26.359Z] unable to load shared object '/workspace/xgboost/xgboost.Rcheck/xgboost/libs/xgboost.so':
[2019-12-15T01:52:26.359Z] /workspace/xgboost/xgboost.Rcheck/xgboost/libs/xgboost.so: undefined symbol: _ZN5rabit6engine9AllgatherEPvmmmmPKciS3_ |
@trivialfis @hcho3 @CodingCat happy new year |
@chenqin You too |
@CodingCat Could you please share the rabit bug you discovered? My next target would be merging existing PRs. No more new bug fixes from me unless critical. |
Why would we stop fixing bugs while still merging new functionality?
That seems counter-intuitive for the final steps of releasing a new version.
|
I mean I won't focus on new bug fixes myself; others can continue that work. And the PRs I merge will mainly be bug-fix PRs. |
Clarified, sorry for the ambiguity. |
Closes dmlc#5031.
Thanks @trivialfis for helping get the re-enable rabit test PR in. Closing this PR.
Re-enable rabit tests after fix
dmlc/rabit#128 merged
Rabit assumes that each bootstrap allreduce call has a unique signature; in #5012 we found that this constraint was not satisfied.
The goal of this PR is to loosen that constraint and allow the last write to overwrite earlier results with the same signature during the bootstrap phase. Here is an example: https://github.com/dmlc/xgboost/pull/4990/files#diff-5fabd019d3f1af7572baa4a6301cf076R36
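To illustrate the relaxed constraint, here is a minimal Python sketch of a result cache keyed by call signature. It is not rabit's actual implementation (rabit is written in C++); the class `BootstrapCache`, its methods, the `allow_overwrite` flag, and the signature string format are all hypothetical names chosen for this illustration. Strict mode models the old assumption that signatures are unique, while the default mode lets the last write win.

```python
from typing import Dict, List


class BootstrapCache:
    """Hypothetical cache of bootstrap allreduce results, keyed by signature.

    This is an illustration only; it does not mirror rabit's real classes.
    """

    def __init__(self, allow_overwrite: bool = True) -> None:
        self._results: Dict[str, List[float]] = {}
        self._allow_overwrite = allow_overwrite

    def store(self, signature: str, result: List[float]) -> None:
        # Strict mode models the old assumption: each signature appears once.
        if not self._allow_overwrite and signature in self._results:
            raise KeyError(f"duplicate bootstrap signature: {signature}")
        # Relaxed mode: the last write replaces any earlier result.
        self._results[signature] = list(result)

    def load(self, signature: str) -> List[float]:
        return self._results[signature]


if __name__ == "__main__":
    # The signature string below is an assumed format, not rabit's.
    sig = "learner.cc::InitAllreduce::42"

    relaxed = BootstrapCache()
    relaxed.store(sig, [1.0, 2.0])
    relaxed.store(sig, [3.0, 4.0])  # same signature, overwrites the first
    print(relaxed.load(sig))

    strict = BootstrapCache(allow_overwrite=False)
    strict.store(sig, [1.0, 2.0])
    try:
        strict.store(sig, [3.0, 4.0])
    except KeyError as err:
        print("strict mode rejected the duplicate:", err)
```

In this sketch the relaxed cache simply keeps the most recent result for a repeated signature, which is the behavior the description above asks for during the bootstrap phase.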