Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure running conv2d i32 with objectFifo #619

Closed
newling opened this issue Jul 31, 2024 · 2 comments
Closed

Failure running conv2d i32 with objectFifo #619

newling opened this issue Jul 31, 2024 · 2 comments

Comments

@newling
Copy link
Contributor

newling commented Jul 31, 2024

Checkpoints in IR:

IR Dump Before AssignTargetDevicesPass
IR Dump Before FoldMemRefAliasOps
IR Dump Before AMDAIENormalizeLoopBounds
IR Dump Before AMDAIEInsertCores

Core dumped in AMDAIEInsertCores.

first_run_conv.txt

@newling
Copy link
Contributor Author

newling commented Aug 2, 2024

@jtuyls something new introduced here. Our conv tiling creates interleaved L2->L1 copies with compute:

...
 scf.forall (%arg2, %arg3) in (2, 2) {   // the threads
    scf.for %arg4 = %c0 to %c16 step %c1 {
        %7 = amdaie.dma_cpy_nd ...
        ...  
        %13 = vector.contract ... 
     }
}

This is new compared to current matmul, where amdaie.dma_cpy_nd is not in scf.for.

Question: should the insert-cores op should turn this into code like

 scf.forall (%arg2, %arg3) in (2, 2) {
    scf.for %arg4 = %c0 to %c16 step %c1 {
        ...
        %7 = amdaie.dma_cpy_nd
        ...
     }
     ... 
     amd.core(...) {
        scf.for %arg4 = %c0 to %c16 step %c1 {
         vector.contract etc.
        }
     }
}

i.e. should we duplicate loop structure for dma_cpy_nps and the core ? (with the consumer/producer stuff inserted as usual too)

@newling
Copy link
Contributor Author

newling commented Aug 3, 2024

Update: probably need some sort of loop subsumption. Hoisting the dma_cpy_nd ops out of the scf.for loops

@newling newling closed this as not planned Won't fix, can't repro, duplicate, stale Oct 3, 2024
newling added a commit that referenced this issue Oct 9, 2024
This PR switches all numerical convolution tests to use the objectFifo
pipeline. With respect to the new tiling strategy:

1) A single **column** is currently used. Targeting multiple columns
results in ` error: 'aie.memtile_dma' op could not find and assign a
valid BD id`. This will will be investigated as follow-up work:
#821

2) There is no longer interleaving of compute and L2->L1 data movement,
which means #619 becomes
low priority / obsolete

3) L3->L2, L2->L3 still uses padding. But L2->L1, L1->L2 uses packing. 

4) Channel-first convolution is completely unsupported, we expect high
level transforms to convert to channel last before reaching our backend.

5) Vectorization is not currently enabled, due to issues with alignment.
See follow-up task #820.
This is functionally ok for now, as peano can scalarize code for all
data types.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant