[coll] Increase timeout for allgather test. #9777

Merged on Nov 8, 2023 (2 commits)

Conversation

trivialfis
Member

https://buildkite.com/xgboost/xgboost-ci/builds/3904#018baca3-2d1c-44f8-b22d-f8d724d46f32

I can reproduce the error by increasing the size of the input data, but I couldn't reproduce the hang. From the log, it looks like the tracker is refusing to shut down. I will work on canceling the tracker in the future.

@trivialfis
Member Author

trivialfis commented Nov 8, 2023

Interestingly, it doesn't hang on GitHub Actions: https://github.com/dmlc/xgboost/actions/runs/6798508681/job/18482801229?pr=9777

@trivialfis
Member Author

cc @rongou.

@trivialfis trivialfis merged commit 6fd4a30 into dmlc:master Nov 8, 2023
26 checks passed
@trivialfis trivialfis deleted the rabit-allgather-timeout branch November 8, 2023 21:26