TL/MLX5: fix context create hang #887

Merged


Sergei-Lebedev
Contributor

What

Fix a hang in TL MLX5 context creation.

How?

If no IB devices are found, PD_OWNER_RANK doesn't start the service bcast (sbcast), so the other ranks hang in sbcast waiting for PD_OWNER_RANK.
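
The failure mode described above can be illustrated with a toy model. This is a sketch only, not UCC code: `ctx_create_model`, the `ctx_status_t` values, the `owner_always_bcasts` flag, and `PD_OWNER_RANK` being rank 0 are all hypothetical names and assumptions made for illustration. In particular, modeling the fix as "the owner always posts the bcast and broadcasts its status" is an assumption about the general shape of such a fix, not a description of the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status codes for this sketch; not UCC's real enums. */
typedef enum {
    CTX_OK,        /* context created */
    CTX_NO_DEVICE, /* rank learned that no IB device is available */
    CTX_HANG       /* rank would block forever in the service bcast */
} ctx_status_t;

/* Assumption for the sketch: the PD owner is rank 0. */
#define PD_OWNER_RANK 0

/*
 * Toy model of what one rank observes during context create.
 *   owner_has_ib_dev    - did PD_OWNER_RANK find an IB device?
 *   owner_always_bcasts - hypothetical switch: does the owner post the
 *                         service bcast even when it found no device?
 */
ctx_status_t ctx_create_model(int rank, bool owner_has_ib_dev,
                              bool owner_always_bcasts)
{
    assert(rank >= 0);

    /* The bcast only exists if the root (the owner) actually posts it. */
    bool owner_posts_bcast = owner_has_ib_dev || owner_always_bcasts;

    if (rank == PD_OWNER_RANK) {
        /* The owner never waits on itself; it just reports its result. */
        return owner_has_ib_dev ? CTX_OK : CTX_NO_DEVICE;
    }

    if (!owner_posts_bcast) {
        /* Non-owner ranks wait on a bcast whose root never joined. */
        return CTX_HANG;
    }

    /* The bcast completes and carries the owner's status to everyone. */
    return owner_has_ib_dev ? CTX_OK : CTX_NO_DEVICE;
}
```

With `owner_always_bcasts` set to `false` and no IB device on the owner, every non-owner rank in the model ends up in `CTX_HANG`, which mirrors the hang reported here. With the flag set to `true`, those ranks receive `CTX_NO_DEVICE` instead and can fail cleanly.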

Collaborator

@samnordmann samnordmann left a comment

Is it valid to do sbcast if ppn=1?

@samnordmann
Collaborator

> Is it valid to do sbcast if ppn=1?

Never mind, I missed the part that handles it.

@Sergei-Lebedev Sergei-Lebedev merged commit 91a7560 into openucx:master Dec 7, 2023
10 of 11 checks passed
@Sergei-Lebedev Sergei-Lebedev deleted the topic/fix_mlx5_ctx_create_hang branch December 7, 2023 07:47
B-a-S pushed a commit to B-a-S/ucc that referenced this pull request Jan 4, 2024
janjust pushed a commit to janjust/ucc that referenced this pull request Jan 31, 2024
3 participants