Contiguous pages support in Reduce Scatter read/write #12477

avoraTT · 2024-09-10T20:22:30Z

Ticket

Optimize sharded tensor address generators #12223

Problem description

Currently, the read/write chunk functions used in reduce scatter process pages (tiles) one at a time. The optimization is to instead process n contiguous pages at once, up to the end of the current row, where n is bounded by the dimensions of the shard, the tensor slice, and the worker slice.

What's changed

The read/write chunk functions now use the optimized sharded tensor address generators, which return the number of contiguous pages remaining until the end of the shard. With this count, reads and writes can proceed contiguously until the end of the row.
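
As a rough illustration of the idea (not the kernel code from this PR), the sketch below plans transfers over a row-major region and issues one transfer per contiguous run instead of one per page. The names `contig_pages_to_row_end` and `plan_chunk_transfers` are hypothetical, and the single `row_width` limit is a simplifying assumption standing in for the combined shard, tensor slice, and worker slice bounds.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the address generator's answer: how many pages,
// starting at column `col`, are contiguous before the current row ends.
// Assumes col < row_width.
inline uint32_t contig_pages_to_row_end(uint32_t col, uint32_t row_width) {
    return row_width - col;
}

// Walks `num_pages` pages starting at column `start_col` of a row-major
// region that is `row_width` pages wide, and returns the size of each
// transfer that would be issued. One vector entry is one read/write call.
inline std::vector<uint32_t> plan_chunk_transfers(
    uint32_t start_col, uint32_t row_width, uint32_t num_pages) {
    std::vector<uint32_t> transfers;
    uint32_t col = start_col;
    uint32_t remaining = num_pages;
    while (remaining > 0) {
        // Take as many pages as are both still needed and contiguous.
        const uint32_t n = std::min(remaining, contig_pages_to_row_end(col, row_width));
        transfers.push_back(n);
        remaining -= n;
        col = (col + n) % row_width;  // wraps to column 0 at the end of a row
    }
    return transfers;
}
```

For example, 7 pages starting at column 2 of a 4-page-wide row become three transfers of 2, 4, and 1 pages, rather than seven single-page transfers.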

Checklist

post-commit: https://github.com/tenstorrent/tt-metal/actions/runs/10800177371
t3k-frequent: https://github.com/tenstorrent/tt-metal/actions/runs/10800188520
t3k-perf: https://github.com/tenstorrent/tt-metal/actions/runs/10800211437 (same failure as main)
t3k-demo: https://github.com/tenstorrent/tt-metal/actions/runs/10800204981 (same failure as main)
t3k-nightly: https://github.com/tenstorrent/tt-metal/actions/runs/10800231521

SeanNijjar

Please make sure to run nightly!

SeanNijjar · 2024-09-10T20:41:38Z

ttnn/cpp/ttnn/operations/ccl/shared_with_host/hetergeneous_data_structs.hpp

@@ -127,24 +127,30 @@ inline void advance_worker_global_page_interleaved (

    coord_t const &tensor_shape, // full tensor shape

-    bool &last_page_of_worker
+    bool &last_page_of_worker,
+    const uint32_t stride=1


Since this appears to be called in only one place, can we remove the default value and keep last_page_of_worker last? It's a bit of a nitpick, but since last_page_of_worker is purely for debug, it would be nice to keep it separate.
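
A minimal sketch of the suggested argument order, using a reduced, hypothetical function rather than the real `advance_worker_global_page_interleaved` signature: `stride` has no default, so the single caller states it explicitly, and the debug-only flag stays last. The `coord_t` here is a simplified stand-in, and the body assumes a stride never crosses the end of a row.

```cpp
#include <cstdint>

// Simplified stand-in for the real coord_t; only what this sketch needs.
struct coord_t {
    uint32_t x;
    uint32_t y;
};

// Hypothetical, reduced version of the function under review.
inline void advance_page_sketch(
    coord_t &offset,              // current page offset inside the slice
    coord_t const &slice_shape,   // slice extent in pages
    const uint32_t stride,        // pages consumed by the last transfer (no default)
    bool &last_page_of_worker) {  // debug-only output, kept as the final argument
    offset.x += stride;
    if (offset.x >= slice_shape.x) {  // reached the end of the row: wrap
        offset.x = 0;
        offset.y += 1;
    }
    last_page_of_worker = offset.y >= slice_shape.y;
}
```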

avoraTT force-pushed the avora/contig_optim branch from fdcdc8d to 7abf851 on September 10, 2024 20:23

avoraTT marked this pull request as ready for review September 10, 2024 20:38

avoraTT requested review from SeanNijjar and cfjchu as code owners September 10, 2024 20:38

avoraTT self-assigned this Sep 10, 2024

SeanNijjar approved these changes Sep 10, 2024

avoraTT force-pushed the avora/contig_optim branch from a08c1dc to 698c505 on September 12, 2024 16:42

avoraTT force-pushed the avora/contig_optim branch from 698c505 to 77b3fee on September 13, 2024 14:03

avoraTT added 2 commits September 13, 2024 12:16

#0: Integrating optimization to read a contiguous chunk of pages in reduce scatter read/write wrapped functions.

213756f

#0: Clean up args for advance_worker_global_page_interleaved.

0f9654d

avoraTT force-pushed the avora/contig_optim branch from 77b3fee to 0f9654d on September 13, 2024 16:16

avoraTT merged commit 1688b17 into main Sep 13, 2024
6 checks passed

avoraTT deleted the avora/contig_optim branch September 13, 2024 16:17
