Implement the DSP primitive. #239
For chips that have these capabilities, a DSP implementation has been added in the form of all the primitives described in the Gowin documentation (UG287-1.3.3E_Gowin Digital Signal Processing (DSP) User Guide), namely:

- PADD9
- PADD18
- MULT9X9
- MULT18X18
- MULT36X36
- MULTALU18X18
- MULTALU36X18
- MULTADDALU18X18
- ALU54D

The most complex but also the most useful is the MULTADDALU18X18 primitive: it makes it easy to build a typical FIR filter, and all connections between these primitives in the chain are implemented by direct fixed wires with minimal delay.

MULT36X36 primitives are not combined into chains, but they serve a different purpose: this primitive can be found in Linux SoCs.

Added examples (in the examples/himbaechel directory) that are based on the tiny RISC-V and demonstrate calculations using UART. Only the TXD pin is used (it can be found in the specific .CST file for each board), so on the host computer side only GND and RXD need to be connected. Port speed 115200, no parity, 8 data bits, 1 stop bit, linefeed only.

Picocom launch example:

``` shell
picocom -l --imap lfcrlf -b 115200 /dev/ttyU0
```

The source code for the RISC-V test programs is provided along with the assembly instructions, but the programs are not built during the compilation of the examples because that would require additional compilers.

Implemented the combination of primitives into chains using the CASO-CASI and SO(A, B)-SI(A, B) wires, as well as SBO-SBI for PADD.

Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
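To make the FIR use case above concrete, here is a minimal behavioral sketch in Python. It is not code from this PR and does not model the real timing or register pipeline. It assumes that one MULTADDALU18X18 stage can be treated as `A0*B0 + A1*B1 + CASI` on signed 18-bit operands with a 54-bit result, and that the result of one stage feeds the next stage's cascade input (the CASO-CASI link). The names `multaddalu_stage` and `fir_cascade` are hypothetical.

```python
def to_signed(value, bits):
    """Wrap an integer into the signed two's-complement range of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def multaddalu_stage(a0, b0, a1, b1, casi=0):
    """Assumed behavior of one stage: A0*B0 + A1*B1 + CASI, wrapped to 54 bits."""
    for operand in (a0, b0, a1, b1):
        # Assumption: operands are signed 18-bit values.
        assert -(1 << 17) <= operand < (1 << 17), "operand must fit in 18 signed bits"
    return to_signed(a0 * b0 + a1 * b1 + casi, 54)


def fir_cascade(samples, taps):
    """Compute FIR outputs by chaining stages, two taps per stage."""
    taps = list(taps)
    if len(taps) % 2:
        taps.append(0)  # pad so that every stage gets a pair of taps
    history = [0] * len(taps)  # delay line, newest sample first
    outputs = []
    for x in samples:
        history = [x] + history[:-1]
        acc = 0  # the first stage in the chain has no cascade input
        for k in range(0, len(taps), 2):
            # The running sum plays the role of the CASO -> CASI connection.
            acc = multaddalu_stage(history[k], taps[k],
                                   history[k + 1], taps[k + 1], casi=acc)
        outputs.append(acc)
    return outputs
```

Feeding a unit impulse through `fir_cascade` returns the tap values themselves, which is a quick sanity check that the chaining order is right.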
This is not a simple thing, and the number of combinations of these building blocks is quite large, so I foresee the need to correct detected errors easily. For that reason the code contains some repeated pieces.

Non-optimality: as can be seen from the set of primitives, Gowin provides already indivisible combinations (for example Mult and Alu) that are well packaged and connected internally. However, there is one exception: pre-adders. PADD18 and PADD9 exist as separate primitives and that's how we currently code them, but a pre-adder is actually a small piece inside the DSP block. Gowin packs them together with other primitives; we currently don't. So keep in mind that when using pre-adders, additional delay may occur and more DSP blocks will definitely be occupied compared to Gowin.

We will solve this issue in the next version of the DSP with a more sophisticated packing algorithm, once we are convinced that the primitives function correctly in principle.
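As an illustration of why packing pre-adders next to a multiplier matters, here is a small behavioral sketch in Python of the usual pre-adder use case, a symmetric FIR in which two samples that share a coefficient are added before a single multiplication. It is not code from this PR. It assumes that PADD18 can be modelled as a plain 18-bit wrapping addition, and the names `padd18` and `symmetric_fir_output` are hypothetical.

```python
def padd18(a, b):
    """Assumed pre-adder behavior: A + B wrapped to a signed 18-bit result."""
    total = (a + b) & ((1 << 18) - 1)
    return total - (1 << 18) if total >> 17 else total


def symmetric_fir_output(window, half_taps):
    """One output of an even-length symmetric FIR.

    window[k] and window[-1 - k] share the coefficient half_taps[k], so each
    pair needs one pre-addition and one multiplication instead of two
    multiplications.
    """
    assert len(window) == 2 * len(half_taps)
    return sum(padd18(window[k], window[-1 - k]) * h
               for k, h in enumerate(half_taps))
```

In this model every `padd18` result goes straight into a multiplier, which is the connection that stays inside one DSP block when the pre-adder is packed with the multiplier and has to be routed between blocks when it is not.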
Super exciting! Hopefully I can find some time to test and review soon.
bels['MULT36X36'] = Bel() # entire DSP mult36x36

return bels
Do the DSP blocks just not have any modes or flags at all? Just fixed function blocks with inputs and outputs? Hm interesting!
# B: 0-17 0-17 0-17 |
# 18-35 | 0-17 0-17
# The ALU54D outputs turned out to be the easiest to find.
# outputs
Would be cool to add some docs about their layout and connections at some point.
Agree. Only a few of the diagrams in the DSP documentation shed any light on the implementation, and the diagram they provide for MULT36X36 is not one of them.
I need a good diagramming program.
'OPCDDYN_INV_7': 447,
'OPCDDYN_INV_8': 448,
'OPCDDYN_INV_9': 449,
'OPCDDYN_INV_10': 450,
As usual, I'm not super happy with hardcoded constants so anything we can do to document where they came from would be great. Of course even better would be to obtain them programmatically but that's of course not always possible.
Another fun challenge is going to be to teach Yosys about these cells. What's a bit weird to me is why they seem to map $mul cells and leave $macc cells alone, while I would think you want to map $macc cells to, well, multiply-accumulate cells. But maybe the idea is that the MUL18 is the fundamental building block and the rest are just macro cells? But I don't think our ALU pass knows about PADD and ALU54 cells, so for stuff like FIR filters, generating some MULTALU cells sounds more effective? wdyt @gatecat ?