Failure running conv2d i32 with objectFifo #619

newling · 2024-07-31T15:13:41Z

Checkpoints in IR:

IR Dump Before AssignTargetDevicesPass
IR Dump Before FoldMemRefAliasOps
IR Dump Before AMDAIENormalizeLoopBounds
IR Dump Before AMDAIEInsertCores

Core dumped in AMDAIEInsertCores.

first_run_conv.txt

newling · 2024-08-02T23:17:06Z

@jtuyls something new introduced here. Our conv tiling creates interleaved L2->L1 copies with compute:

...
 scf.forall (%arg2, %arg3) in (2, 2) {   // the threads
    scf.for %arg4 = %c0 to %c16 step %c1 {
        %7 = amdaie.dma_cpy_nd ...
        ...  
        %13 = vector.contract ... 
     }
}

This is new compared to current matmul, where amdaie.dma_cpy_nd is not in scf.for.

Question: should the insert-cores op should turn this into code like

 scf.forall (%arg2, %arg3) in (2, 2) {
    scf.for %arg4 = %c0 to %c16 step %c1 {
        ...
        %7 = amdaie.dma_cpy_nd
        ...
     }
     ... 
     amd.core(...) {
        scf.for %arg4 = %c0 to %c16 step %c1 {
         vector.contract etc.
        }
     }
}

i.e. should we duplicate loop structure for dma_cpy_nps and the core ? (with the consumer/producer stuff inserted as usual too)

newling · 2024-08-03T22:23:59Z

Update: probably need some sort of loop subsumption. Hoisting the dma_cpy_nd ops out of the scf.for loops

This PR switches all numerical convolution tests to use the objectFifo pipeline. With respect to the new tiling strategy: 1) A single **column** is currently used. Targeting multiple columns results in ` error: 'aie.memtile_dma' op could not find and assign a valid BD id`. This will will be investigated as follow-up work: #821 2) There is no longer interleaving of compute and L2->L1 data movement, which means #619 becomes low priority / obsolete 3) L3->L2, L2->L3 still uses padding. But L2->L1, L1->L2 uses packing. 4) Channel-first convolution is completely unsupported, we expect high level transforms to convert to channel last before reaching our backend. 5) Vectorization is not currently enabled, due to issues with alignment. See follow-up task #820. This is functionally ok for now, as peano can scalarize code for all data types.

newling mentioned this issue Jul 31, 2024

[ObjectFifo][AMDAIEInsertCores] Fail gracefully when there is no thread mapping #623

Closed

newling mentioned this issue Aug 4, 2024

[LoopSubsumption] Make pass fail when no device attribute available #643

Merged

newling mentioned this issue Sep 19, 2024

[Convolution] Packing + objectFifo, initial support #789

Merged

newling closed this as not planned Won't fix, can't repro, duplicate, stale Oct 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failure running conv2d i32 with objectFifo #619

Failure running conv2d i32 with objectFifo #619

newling commented Jul 31, 2024

newling commented Aug 2, 2024

newling commented Aug 3, 2024

Failure running conv2d i32 with objectFifo #619

Failure running conv2d i32 with objectFifo #619

Comments

newling commented Jul 31, 2024

newling commented Aug 2, 2024

newling commented Aug 3, 2024