Do you have a mock-up of how this would look for a few use cases?
-
Here's the current proposal:

Defining a simple BD on the NPU shim

    // A reusable chain holding one BD, parameterized over its buffer.
    aie.bd_chain @my_bd(%buf: memref<1xi32>) {
      aie.dma_bd(%buf : memref<1xi32>, 0, 1)
      aie.end
    }
    func.func @sequence(%arg0: memref<1xi32>, %arg1: memref<1xi32>) {
      // Start the chain on %tile00 (MM2S, channel 0) and request a token.
      %tkn1 = aiex.npu.start_bds @my_bd(%arg0) on (%tile00, MM2S, 0) {issue_token = 1}
      // Wait on the token issued for this chain.
      aiex.npu.wait(%tkn1)
    }

Chain of BDs on the shim

    aie.bd_chain @my_chain(%bufA: memref<1xi32>, %bufB: memref<1xi32>) {
    ^bd0:
      aie.dma_bd(%bufA : memref<1xi32>, 0, 1, [<size = 1, stride = 2>, <size = 3, stride = 4>])
      aie.next_bd ^bd1
    ^bd1:
      aie.dma_bd(%bufB : memref<1xi32>, 0, 1, [<size = 1, stride = 2>])
      aie.end
    }
    func.func @sequence(%arg0: memref<1xi32>, %arg1: memref<1xi32>) {
      %tkn1 = aiex.npu.start_bds @my_chain(%arg0, %arg1) on (%tile00, MM2S, 0) {issue_token = 1}
      aiex.npu.wait(%tkn1)
    }

Example Use of new BD Chains on

    aie.bd_chain @ping_pong_pattern(%ping: memref<1xi32>, %pong: memref<1xi32>, %acq_lock: index, %rel_lock: index) {
    ^bb1:
      // Both BDs use the lock pair passed in as chain arguments.
      aie.use_lock(%acq_lock, AcquireGreaterEqual, 1)
      aie.dma_bd(%ping : memref<1xi32>, 0, 1)
      aie.use_lock(%rel_lock, Release, 1)
      aie.next_bd ^bb2
    ^bb2:
      aie.use_lock(%acq_lock, AcquireGreaterEqual, 1)
      aie.dma_bd(%pong : memref<1xi32>, 0, 1)
      aie.use_lock(%rel_lock, Release, 1)
      // Loop back to the first BD to form the ping-pong cycle.
      aie.next_bd ^bb1
    }
    aie.mem(%tile02) {
      aie.dma_start_task @ping_pong_pattern (%fifo_output_buff_0, %fifo_output_buff_1, %fifo_output_cons_lock, %fifo_output_prod_lock) on (MM2S, 0)
    }

Edit: Updated the last example to use a new syntax using |
-
Next steps:
-
We currently have some operations, such as `npu_writebd`, `npu_push_queue` and `npu_dma_memcpy_nd`, that are very thin wrappers around register writes and use integer values for buffer descriptor (BD) IDs.

I think there is an opportunity here to do something slightly smarter and use SSA values with a new BD type instead. This SSA value would be returned from a configure operation, for example, or from a dedicated allocation operation.

Why? It probably almost never matters what the actual integer value of the BD ID is. What does matter is that you reference the same BD across operations: for example, you want to make sure that the BD you push to the task queue with `npu_push_queue` is the one you just configured with `npu_writebd`.

Integers fail to make this easy; the compiler will happily emit code that pushes a BD ID to the task queue that was never configured.

With SSA values, we can make it impossible to push a BD to the queue that has not been configured.
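
To make the argument above concrete, here is a small Python toy model. It is only an analogy, not mlir-aie code: `ShimDma`, `BdHandle`, `configure_bd` and `push_handle` are hypothetical names invented for this sketch, and Python cannot enforce the guarantee the way SSA def-use chains in MLIR would. The integer-style methods mirror the shape of `npu_writebd` and `npu_push_queue`; the handle-style methods mirror the proposed "configure returns a BD value" design.

```python
class BdHandle:
    """Opaque stand-in for an SSA value of the proposed BD type (hypothetical)."""

    def __init__(self, owner: "ShimDma", bd_id: int) -> None:
        self.owner = owner
        self.bd_id = bd_id


class ShimDma:
    """Toy DMA with a BD table and a task queue (hypothetical model)."""

    def __init__(self, num_bds: int = 16) -> None:
        self.num_bds = num_bds
        self.configured: set[int] = set()
        self.queue: list[int] = []

    # Integer-ID style: nothing ties the pushed ID to an earlier write.
    def write_bd(self, bd_id: int) -> None:
        self.configured.add(bd_id)

    def push_queue(self, bd_id: int) -> None:
        self.queue.append(bd_id)

    # Handle style: a handle only comes into existence by configuring a BD.
    def configure_bd(self) -> BdHandle:
        free = next(
            (i for i in range(self.num_bds) if i not in self.configured), None
        )
        if free is None:
            raise RuntimeError("no free BD IDs left")
        self.configured.add(free)
        return BdHandle(self, free)

    def push_handle(self, bd: BdHandle) -> None:
        if bd.owner is not self:
            raise ValueError("handle belongs to a different DMA")
        self.queue.append(bd.bd_id)

    def unconfigured_pushes(self) -> list[int]:
        """IDs that were queued without ever being configured."""
        return [i for i in self.queue if i not in self.configured]


# Integer style: configure BD 0 but push BD 1 by mistake; nothing complains.
int_style = ShimDma()
int_style.write_bd(0)
int_style.push_queue(1)

# Handle style: the pushed BD is by construction the configured one.
handle_style = ShimDma()
handle_style.push_handle(handle_style.configure_bd())
```

In this model the integer-style mistake is only visible after the fact via `unconfigured_pushes()`, while the handle-style API leaves no way to name a BD that was not configured first. The concrete integer ID is chosen internally, which matches the observation that its value rarely matters.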
Pros
Cons
`next_bd`: currently, you can set this to a BD that you have not configured yet. When using SSA values, you would have to configure BDs in reverse order, so that an SSA value is available to reference the next BD. In the same context, we would have to think about how to handle cycles.

To do
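
The `next_bd` ordering concern listed under Cons can be illustrated with a short Python sketch. This is an analogy under stated assumptions, not the dialect's behavior: the frozen dataclass `Bd` is a hypothetical stand-in for an immutable SSA value, and `chain_names` and `try_close_cycle` are helper names made up for the example.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass(frozen=True)
class Bd:
    """Immutable stand-in for a configured BD value (hypothetical)."""

    name: str
    next_bd: Optional["Bd"] = None


def chain_names(head: Bd) -> list[str]:
    """Follow next_bd links from head and collect the BD names."""
    names = []
    bd: Optional[Bd] = head
    while bd is not None:
        names.append(bd.name)
        bd = bd.next_bd
    return names


def try_close_cycle(last: Bd, first: Bd) -> bool:
    """Attempt to point an existing BD back at an earlier one."""
    try:
        last.next_bd = first  # values are immutable once defined
    except FrozenInstanceError:
        return False
    return True


# The chain A -> B -> C has to be defined back to front, because each
# BD needs its successor to exist already.
c = Bd("C")
b = Bd("B", next_bd=c)
a = Bd("A", next_bd=b)

print(chain_names(a))         # ['A', 'B', 'C']
print(try_close_cycle(c, a))  # False: C cannot be re-pointed at A
```

A ping-pong pattern needs exactly the back edge that `try_close_cycle` fails to create, which is why cycles need a dedicated answer in any value-based design.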