
#12544: support wide channels (> 256) in maxpool #12625

Merged 1 commit from asarje/wide-maxpool into main on Sep 13, 2024

Conversation

mywoodstock (Contributor)

Ticket

#12544

Problem description

Maxpool was limited to at most 8 tiles per reduction (the DST register limit), which caps the supported channel width at 256.

What's changed

Added support for wide channels (> 256) by processing the channel dimension in blocks (in_nblocks_c), so each per-block reduction stays within the DST register limit; see the sketch below.
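For context, a minimal sketch of the blocking idea, using names that appear in the diff (in_ntiles_c, in_nblocks_c). TILE_WIDTH and MAX_TILES_PER_REDUCTION are illustrative names here, and the arithmetic is an assumption rather than a copy of the production code:

#include <cstdint>

// Sketch only: split the channel tiles into blocks so that each reduction
// fits within the 8-tile DST register limit that previously capped maxpool
// at 256 channels (8 tiles x 32-wide tiles).
constexpr uint32_t TILE_WIDTH = 32;               // width of one channel tile
constexpr uint32_t MAX_TILES_PER_REDUCTION = 8;   // DST register limit

inline uint32_t num_channel_blocks(uint32_t in_channels) {
    const uint32_t in_ntiles_c = (in_channels + TILE_WIDTH - 1) / TILE_WIDTH;
    return (in_ntiles_c + MAX_TILES_PER_REDUCTION - 1) / MAX_TILES_PER_REDUCTION;
}

// Example: 512 channels (VGG) -> 16 channel tiles -> 2 blocks of 8 tiles each;
// 256 channels or fewer -> 1 block, i.e. the previous single-shot behaviour.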

Checklist

  • Post commit CI passes
  • Blackhole Post commit (if applicable)
  • Model regression CI testing passes (if applicable)
  • Device performance regression CI testing passes (if applicable)
  • New/Existing tests provide coverage for changes

@@ -63,10 +63,13 @@ MaxPool2D::MultiCore::cached_program_t max_pool_2d_multi_core_sharded_with_halo_
 uint32_t in_ntiles_hw = (uint32_t)std::ceil((float)kernel_size_hw_padded / tt::constants::TILE_HEIGHT);
 uint32_t in_ntiles_c = (uint32_t)std::ceil((float)input_shape[3] / tt::constants::TILE_WIDTH);
 uint32_t out_ntiles_c = (uint32_t)std::ceil((float)output_shape[3] / tt::constants::TILE_WIDTH);
-uint32_t MAX_SMALL_KERNEL_SIZE_HW = 16;
+const uint32_t MAX_SMALL_KERNEL_SIZE_HW = 16;
Contributor:
nit: I prefer constexpr for this type of consts
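For illustration, the form the reviewer has in mind (an editorial sketch, not part of the diff):

// constexpr guarantees compile-time evaluation and lets the value be used
// in other constant expressions.
constexpr uint32_t MAX_SMALL_KERNEL_SIZE_HW = 16;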

constexpr bool is_partial_tile = in_c < 32;
static_assert((!is_partial_tile || (in_c == 16)), "Partial tile must have c_dim 16");
constexpr uint32_t num_faces_in_tile = is_partial_tile ? 1 : 2;
constexpr uint32_t num_out_rows = 1;

tilizeA_B_reduce_init<true>(in_cb_id, in_scalar_cb_id, in_ntiles_hwc, out_cb_id, num_faces_in_tile, window_size_hw);
uint32_t in_ntiles_hwc_block = in_ntiles_hwc / in_nblocks_c;
Contributor:
maybe add constexpr here
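A sketch of that suggestion; it assumes in_ntiles_hwc and in_nblocks_c are themselves compile-time constants in the kernel, which the diff does not show:

// Valid only if both operands are constexpr (e.g. derived from compile-time kernel args).
constexpr uint32_t in_ntiles_hwc_block = in_ntiles_hwc / in_nblocks_c;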

#include "compute_kernel_api/tilize.h"
#include "compute_kernel_api/reduce.h"
#include "compute_kernel_api/pack_untilize.h"
// #include "tools/profiler/kernel_profiler.hpp"

#define DEBUG_PRINT 0

#if DEBUG_PRINT == 1
Contributor:
feels like these should be in some common place for kernel debug utils

#endif
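A hedged sketch of what such a shared debug utility could look like; the header name and macro are hypothetical, not an existing file in the repo:

// kernel_debug_utils.hpp (hypothetical common header)
#pragma once

#ifndef DEBUG_PRINT
#define DEBUG_PRINT 0
#endif

#if DEBUG_PRINT == 1
    // Debug-only statements compile to real code...
    #define KERNEL_DEBUG(stmt) stmt
#else
    // ...and disappear entirely in normal builds.
    #define KERNEL_DEBUG(stmt)
#endif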

-template<uint32_t in_ntiles_hw, uint32_t in_ntiles_c, uint32_t out_ntiles_c, uint32_t nblocks, bool is_partial_tile, uint32_t split_reader, uint32_t unpA_face_r_dim>
+template<uint32_t in_ntiles_hw, uint32_t in_ntiles_c, uint32_t out_ntiles_c, bool is_partial_tile, uint32_t split_reader, uint32_t unpA_face_r_dim, uint32_t in_nblocks_c>
inline void reduce_h_fused(
Contributor:
"fused" in reduce_h_fused means tilize/untilize on the fly right?

Contributor Author:
yes
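For readers outside the kernel code, a rough sketch of the fused structure implied by the signatures in this diff -- tilize, reduce over the kernel window, and pack-untilize happen per channel block inside one loop. The body is illustrative comments only, not the real kernel:

// Illustrative skeleton (assumption: simplified; real calls and arguments omitted).
template <uint32_t in_ntiles_hwc_block, uint32_t in_nblocks_c>
inline void reduce_h_fused_sketch() {
    for (uint32_t block = 0; block < in_nblocks_c; ++block) {
        // 1. tilize the next block of input rows together with the scalar operand
        // 2. reduce the kernel-window rows of this block down to one output row
        // 3. pack-untilize the reduced tiles straight into the output circular buffer
    }
}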

@@ -113,7 +113,7 @@ def run_max_pool(
 # interleaved_mem_config = ttnn.L1_MEMORY_CONFIG
 # output = ttnn.to_memory_config(output, interleaved_mem_config)
 output_host = output.cpu()
-output_pytorch_padded = ttnn.to_torch(output_host)
+output_pytorch_padded = torch.Tensor(ttnn.to_torch(output_host))
Member:
why is this needed? ttnn.to_torch returns torch.Tensor

Contributor Author:
It doesn't really return a pure torch tensor -- the result still needs ttnn to subsequently read/manipulate it. This was the recommended way to convert it into a pure torch tensor.

Contributor:
Yeah, can confirm. If you don't do it, torch functions will crash with a segmentation fault.

Member:
🫠 okay, good to know

## wide for vgg
[1, 256, 56, 56],
[1, 512, 28, 28],
[1, 512, 14, 14],
Contributor:
Is it possible to test it on multiple batches too?
Also, what about the case where W and H are not equal to each other?
The case where W or H is equal to 1 is interesting too.

Contributor Author:
These are specific to the shapes used in models -- the generic testing will all be in sweep tests.

Contributor (@dmakoviichuk-tt, Sep 13, 2024):
But if it works for other cases, why not demonstrate it here?
The description says that support for wide channels was added; it doesn't say support was added only for VGG or some other specific CNN.
Given that, I would expect to see other wide-channel examples here.

@mywoodstock mywoodstock merged commit f1f1d37 into main Sep 13, 2024
6 checks passed
@mywoodstock mywoodstock deleted the asarje/wide-maxpool branch September 13, 2024 17:21