[BugFix][TIR] Fix multi-grouped multi-warp allreduce #15399
Merged
PR #15327 and #15373 introduced a multi-warp allreduce implementation. At the time, I tested numerical correctness with the workload "take a matrix of ones as input and compute the sum over each row". Both PRs passed this numerical test, but I did not realize that the test is incomplete and cannot guarantee correctness.

The previous implementation has a bug that can be exposed by changing the input matrix from ones to random floating-point numbers.
Therefore, this PR fixes the issue and adds numerical tests for multi-warp allreduce to test_allreduce_cuda.py. By removing some of the redundant tests in that file, we hope to reduce testing time a bit while still guaranteeing correctness. Sorry for not testing the implementation thoroughly before.
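As a small illustration of why the all-ones workload is an incomplete test (this is a hypothetical sketch, not the actual code in test_allreduce_cuda.py): a broken reduction can still produce correct row sums when every element is 1, while random inputs expose it immediately.

```python
import numpy as np

def buggy_row_sum(x):
    # A deliberately broken "allreduce": it sums only the first half of
    # each row and doubles the result. For an all-ones matrix this
    # coincidentally equals the true row sum, masking the bug.
    half = x.shape[1] // 2
    return 2 * x[:, :half].sum(axis=1)

ones = np.ones((4, 8))
# Passes: every row sum is 8, and the buggy kernel also returns 2 * 4 = 8.
assert np.allclose(buggy_row_sum(ones), ones.sum(axis=1))

rng = np.random.default_rng(0)
rand = rng.standard_normal((4, 8))
# Fails the comparison: random inputs reveal that the wrong elements
# were reduced, which the all-ones test could not detect.
assert not np.allclose(buggy_row_sum(rand), rand.sum(axis=1))
```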