TL/NCCL: make team init non blocking #772
Note that the non-blocking flag affects all NCCL APIs, including communication calls and communicator destroy. In particular, ncclCommFinalize (https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/comms.html#ncclcommfinalize) has to be used when releasing or destroying the communicator.
Just to clarify: this will make UCC configured and built with NCCL >= 2.14.3 incompatible with NCCL < 2.14.3 at runtime, right? Not complaining, just something worth mentioning in the docs.
@marsaev Yes, since NCCL supports ncclCommInitRankConfig only from 2.14.3 onward. I will make sure to add this to the docs.
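To illustrate the version constraint discussed above, here is a minimal sketch of a runtime check. It does not include nccl.h: `VERSION_CODE` is a local macro assumed to mirror the encoding of NCCL's `NCCL_VERSION(X, Y, Z)`, and `nb_init_supported` is a hypothetical helper, not part of UCC or NCCL. In real code the runtime value would come from `ncclGetVersion()`.

```c
/* Local copy of the version encoding, assumed to match NCCL_VERSION(X, Y, Z)
 * in nccl.h; reproduced here so the sketch builds without NCCL installed. */
#define VERSION_CODE(x, y, z)                                  \
    (((x) <= 2 && (y) <= 8) ? (x) * 1000 + (y) * 100 + (z)     \
                            : (x) * 10000 + (y) * 100 + (z))

/* First NCCL release that provides ncclCommInitRankConfig(). */
#define MIN_NB_INIT_VERSION VERSION_CODE(2, 14, 3)

/* Hypothetical helper: returns 1 if the NCCL library found at runtime is new
 * enough for non-blocking communicator init, 0 otherwise.
 * runtime_version is the integer that ncclGetVersion() would report. */
static int nb_init_supported(int runtime_version)
{
    return runtime_version >= MIN_NB_INIT_VERSION;
}
```

A check like this could be run once at TL initialization so that a mismatch between the build-time and runtime NCCL versions is reported early instead of failing at symbol lookup.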
* TL/NCCL: make team init non blocking
* TL/NCCL: support by version and nb finalize
* REVIEW: code review fixes
What
Make NCCL team creation non-blocking.
Why ?
Team creation in all UCC TLs is supposed to be non-blocking, but TL/NCCL was using an older NCCL API that blocks.
How ?
Replaced the blocking ncclCommInitRank() with the non-blocking ncclCommInitRankConfig().
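The control flow this change enables can be sketched without NCCL. The code below is only a stand-in model, not the PR's implementation: every `sketch_*` name is hypothetical. In real code, `sketch_init_start` would correspond to calling `ncclCommInitRankConfig()` with `config.blocking = 0`, and `sketch_init_test` to querying `ncclCommGetAsyncError()` until it stops reporting `ncclInProgress`.

```c
/* Hypothetical stand-in for the two ncclResult_t values that matter here. */
typedef enum { SKETCH_OK = 0, SKETCH_IN_PROGRESS = 1 } sketch_status_t;

/* Hypothetical stand-in for a communicator whose init completes later. */
typedef struct {
    int polls_left; /* simulated number of polls before init finishes */
} sketch_comm_t;

/* Start initialization and return immediately instead of blocking. */
static sketch_status_t sketch_init_start(sketch_comm_t *comm, int polls_needed)
{
    comm->polls_left = polls_needed;
    return (polls_needed > 0) ? SKETCH_IN_PROGRESS : SKETCH_OK;
}

/* Query the init state without blocking; each call advances the simulation. */
static sketch_status_t sketch_init_test(sketch_comm_t *comm)
{
    if (comm->polls_left > 0) {
        comm->polls_left--;
    }
    return (comm->polls_left > 0) ? SKETCH_IN_PROGRESS : SKETCH_OK;
}

/* Caller-side progress loop; returns how many times the state was queried. */
static int sketch_progress_until_done(sketch_comm_t *comm)
{
    int polls = 1;
    while (sketch_init_test(comm) == SKETCH_IN_PROGRESS) {
        polls++; /* a real caller would progress other work here */
    }
    return polls;
}
```

The point of the pattern is that the team-create call returns while initialization is still in progress, so the caller can keep progressing other teams or tasks and test for completion later.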