TL/NCCL: make team init non blocking #772

Merged on Jun 12, 2023 (3 commits)

shimmybalsam commented

What

Make NCCL team creation non-blocking.

Why?

Team creation in every UCC TL is supposed to be non-blocking, but TL/NCCL was using an older NCCL API that blocks.

How?

Replaced the blocking ncclCommInitRank() with ncclCommInitRankConfig(), which supports non-blocking initialization.
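
The post/test split that non-blocking team creation relies on can be sketched as follows. This is a minimal, self-contained illustration and not the TL/NCCL code from this PR: `sketch_team_t`, `sketch_create_post()`, `sketch_create_test()` and the poll callback are hypothetical names. In an NCCL-backed version, the callback would presumably wrap ncclCommGetAsyncError() on a communicator created by ncclCommInitRankConfig() with the config's `blocking` field set to 0, and report "in progress" for as long as NCCL reports ncclInProgress.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes: done, still initializing, failed. */
typedef enum {
    SKETCH_OK         = 0,
    SKETCH_INPROGRESS = 1,
    SKETCH_ERROR      = -1
} sketch_status_t;

/* Callback that checks the backend exactly once and never waits. */
typedef sketch_status_t (*sketch_poll_fn)(void *backend);

typedef struct {
    sketch_poll_fn  poll;    /* how to query the backend */
    void           *backend; /* opaque backend handle */
    sketch_status_t state;   /* last observed state */
} sketch_team_t;

/* Start team creation and return immediately. */
static sketch_status_t sketch_create_post(sketch_team_t *team,
                                          sketch_poll_fn poll, void *backend)
{
    if (team == NULL || poll == NULL) {
        return SKETCH_ERROR;
    }
    team->poll    = poll;
    team->backend = backend;
    team->state   = SKETCH_INPROGRESS;
    return SKETCH_OK;
}

/* Poll once. The caller re-invokes this while it returns SKETCH_INPROGRESS. */
static sketch_status_t sketch_create_test(sketch_team_t *team)
{
    assert(team != NULL);
    if (team->state == SKETCH_INPROGRESS) {
        team->state = team->poll(team->backend);
    }
    return team->state;
}

/* Fake backend for demonstration: stays in progress for *remaining polls. */
static sketch_status_t sketch_fake_poll(void *backend)
{
    int *remaining = (int *)backend;
    if (*remaining > 0) {
        (*remaining)--;
        return SKETCH_INPROGRESS;
    }
    return SKETCH_OK;
}
```

The caller is free to progress other work between calls to `sketch_create_test()`, which is the point of avoiding a blocking initialization call.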

shimmybalsam force-pushed the nccl_team_create_nb branch 2 times, most recently from 079f631 to fc6ca00 (May 9, 2023 09:07)

marsaev commented May 10, 2023

Note that the non-blocking flag affects all NCCL APIs, including communication calls and communicator destruction. In particular, ncclCommFinalize (https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/comms.html#ncclcommfinalize) has to be used when releasing or destroying a communicator.
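
The teardown ordering this implies (finalize first, wait for it to complete, only then destroy) can be sketched as a small step function. This is an illustrative model rather than the code in this PR: the `td_*` names and the callback table are hypothetical. In an NCCL-backed version the three callbacks would presumably map to ncclCommFinalize(), ncclCommGetAsyncError() and ncclCommDestroy().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical teardown phases. */
typedef enum { TD_ACTIVE, TD_FINALIZING, TD_DESTROYED, TD_FAILED } td_phase_t;

/* Backend operations; each returns without waiting. */
typedef struct {
    int (*finalize)(void *comm); /* start flushing; 0 on success */
    int (*poll)(void *comm);     /* 0 = done, 1 = in progress, <0 = error */
    int (*destroy)(void *comm);  /* free resources; 0 on success */
} td_ops_t;

typedef struct {
    td_phase_t      phase;
    const td_ops_t *ops;
    void           *comm;
} td_ctx_t;

/* Advance teardown by at most one transition; call repeatedly until
 * the result is TD_DESTROYED or TD_FAILED. */
static td_phase_t td_step(td_ctx_t *ctx)
{
    assert(ctx != NULL && ctx->ops != NULL);
    switch (ctx->phase) {
    case TD_ACTIVE:
        ctx->phase = (ctx->ops->finalize(ctx->comm) == 0) ? TD_FINALIZING
                                                          : TD_FAILED;
        break;
    case TD_FINALIZING: {
        int rc = ctx->ops->poll(ctx->comm);
        if (rc < 0) {
            ctx->phase = TD_FAILED;
        } else if (rc == 0) {
            /* Finalization finished, so destroying is now safe. */
            ctx->phase = (ctx->ops->destroy(ctx->comm) == 0) ? TD_DESTROYED
                                                             : TD_FAILED;
        }
        break;
    }
    default:
        break; /* terminal phases stay as they are */
    }
    return ctx->phase;
}

/* Fake communicator that records whether destroy ran too early. */
typedef struct {
    int polls_left;
    int finalized;
    int destroyed;
    int destroyed_early;
} td_fake_comm_t;

static int td_fake_finalize(void *c)
{
    ((td_fake_comm_t *)c)->finalized = 1;
    return 0;
}

static int td_fake_poll(void *c)
{
    td_fake_comm_t *f = (td_fake_comm_t *)c;
    if (f->polls_left > 0) {
        f->polls_left--;
        return 1;
    }
    return 0;
}

static int td_fake_destroy(void *c)
{
    td_fake_comm_t *f = (td_fake_comm_t *)c;
    if (!f->finalized || f->polls_left > 0) {
        f->destroyed_early = 1;
    }
    f->destroyed = 1;
    return 0;
}

static const td_ops_t td_fake_ops = {
    td_fake_finalize, td_fake_poll, td_fake_destroy
};
```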

Six review threads on src/components/tl/nccl/tl_nccl_team.c (all resolved; four on outdated revisions)
shimmybalsam force-pushed the nccl_team_create_nb branch from 283fd56 to 6e98b6e (May 24, 2023 11:58)
shimmybalsam force-pushed the nccl_team_create_nb branch 2 times, most recently from 8542cdb to 6558226 (May 25, 2023 09:46)

marsaev commented May 31, 2023

Just to clarify - this will make UCC configured and built with NCCL >= 2.14.3 incompatible with NCCL < 2.14.3 at runtime, right? Not complaining, just something worth mentioning in the docs.

shimmybalsam commented

> Just to clarify - this will make UCC configured and built with NCCL >= 2.14.3 incompatible with NCCL < 2.14.3 at runtime, right? Not complaining, just something worth mentioning in the docs.

@marsaev Yes, since NCCL supports ncclCommInitRankConfig() only from 2.14.3 onward. I will make sure to add this to the docs.
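
A version gate for this requirement could look roughly like the sketch below. The helper names are hypothetical, and the integer encoding is an assumption based on how nccl.h's NCCL_VERSION macro is understood to pack major, minor and patch numbers; check it against the installed header before relying on it.

```c
#include <assert.h>

/* Encode major.minor.patch as a single integer. Assumption: this mirrors
 * the NCCL_VERSION(X, Y, Z) macro, which uses a different scale for
 * releases up to 2.8. */
static int sketch_nccl_version(int major, int minor, int patch)
{
    assert(major >= 0 && minor >= 0 && patch >= 0);
    if (major <= 2 && minor <= 8) {
        return major * 1000 + minor * 100 + patch;
    }
    return major * 10000 + minor * 100 + patch;
}

/* Non-zero when the given version code is at least 2.14.3, the first
 * release named in this thread as providing ncclCommInitRankConfig(). */
static int sketch_has_nonblocking_init(int version_code)
{
    return version_code >= sketch_nccl_version(2, 14, 3);
}
```

At build time the same comparison would typically be made against NCCL_VERSION_CODE in a preprocessor guard. At runtime the code to test could come from ncclGetVersion(), so that a library older than the one UCC was built against is detected early.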

shimmybalsam force-pushed the nccl_team_create_nb branch from 6558226 to e957b14 (June 4, 2023 09:10)
shimmybalsam force-pushed the nccl_team_create_nb branch from e957b14 to c558fe3 (June 12, 2023 11:47)
Sergei-Lebedev merged commit 5429d05 into openucx:master on Jun 12, 2023
janjust pushed a commit to janjust/ucc that referenced this pull request Jan 31, 2024
* TL/NCCL: make team init non blocking

* TL/NCCL: support by version and nb finalize

* REVIEW: code review fixes