Implement the DSP primitive. #239

Merged
merged 4 commits into from
Mar 31, 2024

Collaborator

@yrabbit yrabbit commented Mar 28, 2024

For chips that have DSP blocks, this PR adds DSP support in the form of all the primitives described in the Gowin documentation (UG287-1.3.3E, Gowin Digital Signal Processing (DSP) User Guide), namely:

  • PADD9
  • PADD18
  • MULT9X9
  • MULT18X18
  • MULT36X36
  • MULTALU18X18
  • MULTALU36X18
  • MULTADDALU18X18
  • ALU54D

The most complex but also the most useful primitive is MULTADDALU18X18: it makes it easy to build a typical FIR filter, and all connections between these primitives in a chain are implemented with direct fixed wires with minimal delay.
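
As a rough behavioral illustration of why this primitive suits FIR filters, the Python sketch below models each stage as the sum of two products plus a cascade input and chains the stages together. The function names `multaddalu_stage` and `fir_output` are hypothetical, and the model assumes a two-products-plus-cascade mode; it ignores bit widths, signedness, and pipeline registers, so it is not a description of the actual hardware configuration.

```python
from typing import Sequence


def multaddalu_stage(a0: int, b0: int, a1: int, b1: int, casi: int = 0) -> int:
    """Hypothetical model of one stage: two products summed with the cascade input."""
    return a0 * b0 + a1 * b1 + casi


def fir_output(samples: Sequence[int], coeffs: Sequence[int]) -> int:
    """Compute one FIR output by chaining stages, two taps per stage."""
    if len(samples) != len(coeffs):
        raise ValueError("samples and coeffs must have the same length")
    # Pad to an even number of taps so every stage gets two products.
    xs = list(samples) + [0] * (len(samples) % 2)
    hs = list(coeffs) + [0] * (len(coeffs) % 2)
    cascade = 0  # plays the role of the CASO -> CASI link between stages
    for i in range(0, len(xs), 2):
        cascade = multaddalu_stage(xs[i], hs[i], xs[i + 1], hs[i + 1], cascade)
    return cascade
```

In this model a filter with N taps needs about N/2 chained stages, and the running sum only ever travels from one stage to the next, which is the path the fixed cascade wires provide.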

MULT36X36 primitives are not combined into chains, but they serve a different purpose: this primitive can be found in Linux SoCs.

Added examples (in the examples/himbaechel directory), based on the tiny RISC-V core, that demonstrate the calculations over UART. Only the TXD pin is used (see the board-specific .CST file), so on the host computer side only GND and RXD are needed. Port settings: 115200 baud, no parity, 8 data bits, 1 stop bit, linefeed-only line endings.

Picocom launch example:

picocom -l --imap lfcrlf -b 115200 /dev/ttyU0
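
Because the examples send linefeed-only line endings, the `--imap lfcrlf` option asks picocom to turn each incoming LF into CR+LF. The Python sketch below shows that translation, plus the byte rate implied by the 115200 8N1 settings. Both helper names are hypothetical and exist only for illustration.

```python
def map_lf_to_crlf(data: bytes) -> bytes:
    """Replace every LF byte with CR+LF, as a terminal needs for linefeed-only output."""
    return data.replace(b"\n", b"\r\n")


def bytes_per_second(baud: int, data_bits: int = 8, parity_bits: int = 0,
                     stop_bits: int = 1) -> float:
    """Throughput of an asynchronous serial line: one start bit plus the frame bits."""
    bits_per_frame = 1 + data_bits + parity_bits + stop_bits
    return baud / bits_per_frame
```

With the settings above, each byte occupies 10 bit times on the wire, so the link carries at most 11520 bytes per second.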

The source code for the RISC-V test programs is provided along with instructions for building them, but they are not built when the examples are compiled, because that would require additional compilers.

Implemented combining primitives into chains through the CASO-CASI and SO(A, B)-SI(A, B) wires, as well as SBO-SBI for PADD.

Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
@yrabbit
Collaborator Author

yrabbit commented Mar 28, 2024

This is not a simple feature, and the number of combinations of these building blocks is quite large, so I expect to need to fix errors easily as they are found. For that reason the code contains some repetition.

Non-optimality: as can be seen from the set of primitives, Gowin provides combinations that are already indivisible (for example Mult and Alu), well packed, and connected internally. However, there is one exception: pre-adders. PADD18 and PADD9 exist as separate primitives, and that is how we currently implement them, but a pre-adder is actually a small piece inside the DSP block. Gowin packs them together with other primitives; we currently do not.

So keep in mind that when pre-adders are used, additional delay may occur and more DSP blocks will definitely be occupied than with Gowin.

We will solve this issue in the next version of the DSP support with a more sophisticated packing algorithm, once we are convinced that the primitives work correctly in principle.

@pepijndevos
Member

Super exciting! Hopefully I can find some time to test and review soon.


bels['MULT36X36'] = Bel() # entire DSP mult36x36

return bels
Member

Do the DSP blocks just not have any modes or flags at all? Just fixed function blocks with inputs and outputs? Hm interesting!

Collaborator Author

For the most part this is true: they are separate blocks with their own inputs and outputs. We glue them together, like a 4-d multiplier, to get MULT36X36.
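
To make the "glue them together" idea concrete, here is a minimal Python sketch of the textbook decomposition of a 36x36 product into four 18x18 partial products. It is an arithmetic illustration only: `mult18x18` and `mult36x36` are hypothetical stand-ins, the operands are treated as unsigned, and nothing here is taken from how the blocks are actually wired on the chip.

```python
MASK18 = (1 << 18) - 1


def mult18x18(a: int, b: int) -> int:
    """Stand-in for one unsigned 18x18 multiplier block."""
    assert 0 <= a <= MASK18 and 0 <= b <= MASK18
    return a * b


def mult36x36(a: int, b: int) -> int:
    """Unsigned 36x36 product assembled from four 18x18 partial products."""
    a_lo, a_hi = a & MASK18, (a >> 18) & MASK18
    b_lo, b_hi = b & MASK18, (b >> 18) & MASK18
    return (
        mult18x18(a_lo, b_lo)
        + ((mult18x18(a_lo, b_hi) + mult18x18(a_hi, b_lo)) << 18)
        + (mult18x18(a_hi, b_hi) << 36)
    )
```

The two cross terms share the same weight (a shift by 18 bits), which is why four multipliers plus adders are enough for the full 72-bit result.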

[Image: dsp]

# B: 0-17 0-17 0-17 |
# 18-35 | 0-17 0-17
# The ALU54D outputs turned out to be the easiest to find.
# outputs
Member

Would be cool to add some docs about their layout and connections at some point.

Collaborator Author

Agreed. Only a few of the diagrams in the DSP documentation shed any light on the implementation, and the diagram provided for MULT36X36 is not one of them.
I need a good diagramming program.

'OPCDDYN_INV_7': 447,
'OPCDDYN_INV_8': 448,
'OPCDDYN_INV_9': 449,
'OPCDDYN_INV_10': 450,
Member

As usual, I'm not super happy with hardcoded constants, so anything we can do to document where they came from would be great. Even better would be to obtain them programmatically, but that's not always possible.
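
One small step in that direction, sketched below in Python, is to generate runs of consecutive entries from a single documented base value instead of listing each one. The helper name `opcddyn_inv_flags` is hypothetical, and the base of 440 is an assumption inferred only from the four entries quoted above; it says nothing about where the numbers originally come from or whether the pattern holds outside that range.

```python
def opcddyn_inv_flags(first: int, last: int, base: int) -> dict[str, int]:
    """Build 'OPCDDYN_INV_<n>' -> value entries, assuming value = base + n."""
    return {f"OPCDDYN_INV_{n}": base + n for n in range(first, last + 1)}


# Reproduces only the four entries quoted in the review snippet.
flags = opcddyn_inv_flags(first=7, last=10, base=440)
```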

@pepijndevos
Member

Another fun challenge is going to be to teach Yosys about these cells.
For reference, this seems to be how ecp5 handles it https://github.com/YosysHQ/yosys/blob/b9d3bffda5abcbc5356936a7192c4a3c2b427c3e/techlibs/ecp5/synth_ecp5.cc#L298-L302

What's a bit weird to me is that they seem to map $mul cells and leave $macc cells alone, while I would think you want to map $macc cells to, well, multiply-accumulate cells. But maybe the idea is that the MUL18 is the fundamental building block and the rest is just macro cells?

But I don't think our ALU pass knows about PADD and ALU54 cells, so for stuff like FIR filters, generating MULTALU cells sounds more effective? wdyt @gatecat?

@pepijndevos pepijndevos merged commit 91807b0 into YosysHQ:master Mar 31, 2024
12 of 14 checks passed
@yrabbit yrabbit deleted the dsp branch August 21, 2024 23:48