
[Convolution] Packing + objectFifo, initial support #789

Merged

merged 5 commits into nod-ai:main on Oct 9, 2024

Conversation

newling
Contributor

@newling newling commented Sep 19, 2024

This PR switches all numerical convolution tests to use the objectFifo pipeline. With respect to the new tiling strategy:

  1. A single column is currently used. Targeting multiple columns results in `error: 'aie.memtile_dma' op could not find and assign a valid BD id`. This will be investigated as follow-up work: Multicore convolution #821

  2. There is no longer any interleaving of compute and L2->L1 data movement, which means that Failure running conv2d i32 with objectFifo #619 becomes low priority / obsolete

  3. L3->L2 and L2->L3 still use padding, but L2->L1 and L1->L2 use packing.

  4. Channel-first convolution is completely unsupported; we expect high-level transforms to convert to channel-last before reaching our backend.

  5. Vectorization is not currently enabled, due to issues with alignment. See follow-up task: Numerics issue with vectorized conv2d #820. This is functionally OK for now, as peano can scalarize code for all data types.
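Point 4 assumes that high-level transforms convert channel-first (NCHW) layouts to channel-last (NHWC) before the backend sees them. That layout permutation can be sketched in plain Python (a hypothetical illustration only, not code from this repo):

```python
# Hypothetical sketch of NCHW -> NHWC layout conversion, using nested
# lists so it runs with the standard library alone.

def nchw_to_nhwc(t):
    """Permute a 4-D nested list from [N][C][H][W] to [N][H][W][C]."""
    n, c = len(t), len(t[0])
    h, w = len(t[0][0]), len(t[0][0][0])
    return [[[[t[i][k][j][l] for k in range(c)]  # channels become innermost
              for l in range(w)]
             for j in range(h)]
            for i in range(n)]

# A 1x2x2x3 tensor: two channels, each a 2x3 image.
x = [[[[1, 2, 3], [4, 5, 6]],
      [[7, 8, 9], [10, 11, 12]]]]
y = nchw_to_nhwc(x)  # shape 1x2x3x2; y[0][0][0] pairs the two channels
```

The same index permutation is what a `linalg.transpose` (or an equivalent pack) would express at the IR level.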

@newling newling changed the title [WIP][Convolution] Packing + objectFifo [Convolution] Packing + objectFifo, initial support Sep 19, 2024
@yzhang93
Contributor

yzhang93 commented Sep 20, 2024

I'll have a thorough review tomorrow. Just a high-level question: does this new strategy also work with the AIR pipeline?

Thanks. No, it doesn't get through compilation with the AIR pipeline.

@newling
Contributor Author

newling commented Sep 23, 2024

@erwei-xilinx @yzhang93 I am making changes to the tiling strategy today, so please don't review this in detail yet. I will post IR once I have channel-last vectorizing correctly.

@newling newling changed the title [Convolution] Packing + objectFifo, initial support [WIP][Convolution] Packing + objectFifo, initial support Sep 23, 2024
@newling
Contributor Author

newling commented Sep 26, 2024

Reporting the current status of the checked-in code, mostly to keep track for myself.

The vectorization looks good and compilation to vmfb is successful: vector.contract lowers to aievec.matmul, with no discontiguous transfer_reads or transfer_writes. Good.

But the results are numerically incorrect. The values are all quite close to the expected ones (no zeros).

If I make the input tensor all 1s (with the kernel still containing random values), the results are numerically correct.

If I comment out the AMDAIEVectorization pass, the results are numerically correct (with both the input tensor and kernel containing random values).

The AIEVec passes look like they're doing the correct thing; I can't see any problem with aievec.matmul, or with the lowering to LLVM.

So what's going on? Is this a problem with peano? What experiments can I run to test this hypothesis?

Or is the problem somewhere else, and it's just interacting badly with AMDAIEVectorization?
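One concrete experiment is to compare the kernel's output against a scalar reference computed on the host. A minimal sketch in plain Python (the 4x8-by-8x4 tile shapes match the inner-loop contraction in this PR; everything else is a hypothetical illustration, not code from the repo):

```python
# Hypothetical host-side cross-check: perform the same accumulation the
# kernel's inner contraction does (C += A @ B), then compare element-wise
# against the device output to localize a miscompile.

def matmul_acc(a, b, c):
    """c += a @ b for nested-list matrices; returns c."""
    m, k, n = len(a), len(a[0]), len(b[0])
    for i in range(m):
        for j in range(n):
            acc = c[i][j]
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c

# 4x8 times 8x4 accumulating into 4x4, as in the vector.contract tiles.
a = [[1.0] * 8 for _ in range(4)]
b = [[2.0] * 4 for _ in range(8)]
c = [[0.0] * 4 for _ in range(4)]
matmul_acc(a, b, c)  # each entry sums 1.0 * 2.0 over 8 terms -> 16.0
```

Running the same comparison with random inputs (rather than constants) is what distinguishes a layout/indexing bug from a precision effect.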

@makslevental
Collaborator

> So what's going on? Is this a problem with peano? What experiments can I run to test this hypothesis?

It's possible but I feel like it's unlikely - they run a large test suite (somewhere) nightly which (I'm assuming) checks NA.

@newling
Contributor Author

newling commented Sep 26, 2024

> So what's going on? Is this a problem with peano? What experiments can I run to test this hypothesis?

> It's possible but I feel like it's unlikely - they run a large test suite (somewhere) nightly which (I'm assuming) checks NA.

OK. The difference does appear to be at vectorization or below. Replace this:

```mlir
scf.for %arg4 = %c0 to %c3 step %c1 {
  scf.for %arg5 = %c0 to %c3 step %c1 {
    %collapse_shape = memref.collapse_shape %reinterpret_cast [[0, 1, 2, 3], [4]] : memref<1x1x4x1x4xf32, 2 : i32> into memref<4x4xf32, 2 : i32>
    %25 = vector.transfer_read %reinterpret_cast_8[%c0, %arg4, %c0, %arg5, %c0], %cst {in_bounds = [true, true]} : memref<1x3x1x6x8xbf16, 2 : i32>, vector<4x8xbf16>
    %26 = vector.transfer_read %reinterpret_cast_9[%arg4, %arg5, %c0, %c0, %c0, %c0], %cst {in_bounds = [true, true]} : memref<3x3x1x1x8x4xbf16, 2 : i32>, vector<8x4xbf16>
    %27 = vector.transfer_read %collapse_shape[%c0, %c0], %cst_0 {in_bounds = [true, true]} : memref<4x4xf32, 2 : i32>, vector<4x4xf32>
    %28 = arith.extf %25 : vector<4x8xbf16> to vector<4x8xf32>
    %29 = arith.extf %26 : vector<8x4xbf16> to vector<8x4xf32>
    %30 = vector.contract {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %28, %29, %27 : vector<4x8xf32>, vector<8x4xf32> into vector<4x4xf32>
    vector.transfer_write %30, %collapse_shape[%c0, %c0] {in_bounds = [true, true]} : vector<4x4xf32>, memref<4x4xf32, 2 : i32>
  }
}
```

with this:

```mlir
scf.for %arg4 = %c0 to %c3 step %c1 {
  scf.for %arg5 = %c0 to %c3 step %c1 {
    %subview = memref.subview %reinterpret_cast_7[0, %arg4, 0, %arg5, 0] [1, 1, 1, 4, 8] [1, 1, 1, 1, 1] : memref<1x3x1x6x8xbf16, 2 : i32> to memref<1x1x1x4x8xbf16, strided<[144, 48, 48, 8, 1], offset: ?>, 2 : i32>
    %collapse_shape = memref.collapse_shape %subview [[0, 1, 2, 3], [4]] : memref<1x1x1x4x8xbf16, strided<[144, 48, 48, 8, 1], offset: ?>, 2 : i32> into memref<4x8xbf16, strided<[8, 1], offset: ?>, 2 : i32>
    %subview_9 = memref.subview %reinterpret_cast_8[%arg4, %arg5, 0, 0, 0, 0] [1, 1, 1, 1, 8, 4] [1, 1, 1, 1, 1, 1] : memref<3x3x1x1x8x4xbf16, 2 : i32> to memref<1x1x1x1x8x4xbf16, strided<[96, 32, 32, 32, 4, 1], offset: ?>, 2 : i32>
    %collapse_shape_10 = memref.collapse_shape %subview_9 [[0, 1, 2, 3, 4], [5]] : memref<1x1x1x1x8x4xbf16, strided<[96, 32, 32, 32, 4, 1], offset: ?>, 2 : i32> into memref<8x4xbf16, strided<[4, 1], offset: ?>, 2 : i32>
    %collapse_shape_11 = memref.collapse_shape %reinterpret_cast [[0, 1, 2, 3], [4]] : memref<1x1x4x1x4xf32, 2 : i32> into memref<4x4xf32, 2 : i32>
    linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], iterator_types = ["parallel", "parallel", "reduction"]} ins(%collapse_shape, %collapse_shape_10 : memref<4x8xbf16, strided<[8, 1], offset: ?>, 2 : i32>, memref<8x4xbf16, strided<[4, 1], offset: ?>, 2 : i32>) outs(%collapse_shape_11 : memref<4x4xf32, 2 : i32>) {
    ^bb0(%in: bf16, %in_12: bf16, %out: f32):
      %25 = arith.extf %in : bf16 to f32
      %26 = arith.extf %in_12 : bf16 to f32
      %27 = arith.mulf %25, %26 : f32
      %28 = arith.addf %out, %27 : f32
      linalg.yield %28 : f32
    }
  }
}
```

And the numerics are fixed. Maybe it's time for me to try the simulator you mentioned @makslevental.
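The earlier all-ones observation fits this picture: values like 1.0 are exactly representable in bf16, so rounding differences vanish with constant inputs. As a hypothesis-testing aid for separating precision effects from genuine miscompiles, bf16 rounding can be emulated on the host; a minimal truncation-based sketch in Python (not code from this repo, and real hardware may round-to-nearest rather than truncate):

```python
import struct

def to_bf16(x):
    """Round a float to bfloat16 precision by truncating the low 16 bits
    of its float32 encoding (bf16 keeps the sign, the 8 exponent bits,
    and the top 7 mantissa bits of float32)."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

# 1.0 survives unchanged, so an all-ones input accumulates no rounding
# error; random-looking values lose mantissa bits.
assert to_bf16(1.0) == 1.0
assert to_bf16(1.2345) != 1.2345
```

Feeding `to_bf16`-rounded inputs to a host-side scalar reference gives an expected answer that accounts for input quantization; any remaining mismatch against the device points at the lowering rather than at bf16 itself.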

@yzhang93
Contributor

@newling Can we have a full review of the current state of convolution ops in the next AIE sync? CC @MaheshRavishankar

@makslevental
Collaborator

> And the numerics are fixed. Maybe it's time for me to try the simulator you mentioned @makslevental.

I made progress on this yesterday but I'm still blocked by some issue about MEMTAB in the ELF file. But I'm hoping to be able to land the PR (along with the new test suite) this week before I go back to HAL stuff next week.

@newling
Contributor Author

newling commented Sep 26, 2024

> And the numerics are fixed. Maybe it's time for me to try the simulator you mentioned @makslevental.

> I made progress on this yesterday but I'm still blocked by some issue about MEMTAB in the ELF file. But I'm hoping to be able to land the PR (along with the new test suite) this week before I go back to HAL stuff next week.

Ok, thanks, I'll keep an eye out for that.

@newling newling changed the title [WIP][Convolution] Packing + objectFifo, initial support [Convolution] Packing + objectFifo, initial support Oct 3, 2024
@newling newling requested a review from yzhang93 October 3, 2024 15:27
@newling
Contributor Author

newling commented Oct 4, 2024

@yzhang93 this is ready for review again if you'd like to take a look

@newling newling force-pushed the packing_for_convolution branch 2 times, most recently from 2bed2df to b4a86cf on October 7, 2024 at 17:55
@newling newling merged commit c84cca0 into nod-ai:main Oct 9, 2024
6 checks passed