Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TL/NCCL make ncclGroupEnd nb #798

Merged

Conversation

shimmybalsam
Copy link
Collaborator

@shimmybalsam shimmybalsam commented Jun 21, 2023

What

check nccl status after using ncclGroupEnd , as well as for ncclSend/ncclRecv if they are called outside of ncclGroupStart/End.

Why ?

To fit nb change from using ncclCommInitRankConfig from PR #772

@shimmybalsam shimmybalsam force-pushed the nccl_nb_allgatherv_bug_fix branch 7 times, most recently from 6239705 to 90bbc66 Compare June 25, 2023 12:06
@shimmybalsam
Copy link
Collaborator Author

bot:retest

@manjugv
Copy link
Contributor

manjugv commented Jun 26, 2023

Can we abstract it better than having ifdef's all around the code?

src/components/tl/nccl/tl_nccl.h Show resolved Hide resolved
src/components/tl/nccl/allgatherv/allgatherv.c Outdated Show resolved Hide resolved
src/components/tl/nccl/tl_nccl.h Outdated Show resolved Hide resolved
src/components/tl/nccl/tl_nccl_context.c Outdated Show resolved Hide resolved
src/components/tl/nccl/tl_nccl_coll.c Outdated Show resolved Hide resolved
src/components/tl/nccl/tl_nccl_context.c Show resolved Hide resolved
@shimmybalsam shimmybalsam requested a review from manjugv June 27, 2023 14:20
@shimmybalsam
Copy link
Collaborator Author

Can we abstract it better than having ifdef's all around the code?

@manjugv I abstracted/removed most of the branching, let me know what you think, thanks.

@shimmybalsam shimmybalsam force-pushed the nccl_nb_allgatherv_bug_fix branch from 2a87e41 to 7184609 Compare June 27, 2023 15:26
@shimmybalsam shimmybalsam requested a review from samnordmann June 27, 2023 15:26
@shimmybalsam shimmybalsam force-pushed the nccl_nb_allgatherv_bug_fix branch from aff7789 to 16e8da7 Compare July 5, 2023 14:25
@shimmybalsam
Copy link
Collaborator Author

bot:retest

src/components/tl/nccl/tl_nccl.h Show resolved Hide resolved
src/components/tl/nccl/tl_nccl.c Outdated Show resolved Hide resolved
src/components/tl/nccl/tl_nccl.c Outdated Show resolved Hide resolved
@shimmybalsam shimmybalsam force-pushed the nccl_nb_allgatherv_bug_fix branch from 16e8da7 to 4808c81 Compare July 11, 2023 11:35
@shimmybalsam shimmybalsam force-pushed the nccl_nb_allgatherv_bug_fix branch from 4808c81 to 0e6b945 Compare July 18, 2023 14:33
@Sergei-Lebedev Sergei-Lebedev enabled auto-merge (squash) July 18, 2023 18:17
@Sergei-Lebedev Sergei-Lebedev merged commit 2a9ed2a into openucx:master Jul 19, 2023
jeniaka pushed a commit to jeniaka/ucc that referenced this pull request Aug 14, 2023
* TL/NCCL: make ncclGroupEnd nb

* REVIEW: review fixes

* TL/NCCL: default blocking and configurable

* REVIEW: second review fixes
janjust pushed a commit to janjust/ucc that referenced this pull request Jan 31, 2024
* TL/NCCL: make ncclGroupEnd nb

* REVIEW: review fixes

* TL/NCCL: default blocking and configurable

* REVIEW: second review fixes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants