Synthesis with -nowidelut gives drastically better results #4798

Open
t-wallet opened this issue Dec 5, 2024 · 1 comment
Labels
pending-verification This issue is pending verification and/or reproduction

Comments

@t-wallet

t-wallet commented Dec 5, 2024

Version

0.45+139

On which OS did this happen?

Linux

Reproduction Steps

While synthesizing a SHA3 design for the Colorlight 5A-75B board (Lattice ECP5 FPGA), I noticed that LUT usage was far higher than when I synthesized the same design with Vivado for a Xilinx board I own. It turns out that synthesizing with -nowidelut drastically reduces resource usage and significantly improves timing as well.

The problem was in the theta step of the algorithm:

module sha3theta (
  input  wire [4:0][4:0][63:0] i_state,
  output wire [4:0][4:0][63:0] o_state
);

  // Column parity: XOR of the five lanes that share column index i.
  wire [4:0][63:0] sum_sheet;

  genvar i;
  generate
  for (i = 0; i < 5; i++) begin
    assign sum_sheet[i] =
      i_state[0][i] ^
      i_state[1][i] ^
      i_state[2][i] ^
      i_state[3][i] ^
      i_state[4][i];
  end
  endgenerate

  genvar row, col;
  generate
  for (row = 0; row < 5; row++) begin
    for (col = 0; col < 5; col++) begin
      // NOTE: Verilog's % keeps the sign of the dividend, so (col - 1) % 5
      // evaluates to -1 when col == 0.
      assign o_state[row][col] =
        i_state[row][col] ^
        sum_sheet[(col - 1) % 5] ^
        // Next column's parity, rotated left by one bit.
        {sum_sheet[(col + 1) % 5][62:0], sum_sheet[(col + 1) % 5][63]};
    end
  end
  endgenerate

endmodule
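
One thing that may be worth ruling out before comparing the two runs is the `(col - 1) % 5` index. In Verilog the remainder takes the sign of the dividend, so for `col == 0` the expression is -1, which is outside the declared `[4:0]` range of `sum_sheet`. Below is a minimal sketch of the same logic with an index that cannot go negative. The module name `sha3theta_wrapped`, the block labels, and the `PREV`/`NEXT` localparams are made up for illustration. The cell counts in this issue were obtained with the original expression, so this variant may well give different numbers.

```systemverilog
// Sketch only: same structure as sha3theta, but the wrap-around index is
// written so that it always stays within 0..4. Names are hypothetical.
module sha3theta_wrapped (
  input  wire [4:0][4:0][63:0] i_state,
  output wire [4:0][4:0][63:0] o_state
);

  // Column parity: XOR of the five lanes that share column index i.
  wire [4:0][63:0] sum_sheet;

  genvar i, row, col;
  generate
    for (i = 0; i < 5; i++) begin : g_parity
      assign sum_sheet[i] =
        i_state[0][i] ^ i_state[1][i] ^ i_state[2][i] ^
        i_state[3][i] ^ i_state[4][i];
    end

    for (row = 0; row < 5; row++) begin : g_row
      for (col = 0; col < 5; col++) begin : g_col
        // (col + 4) % 5 is congruent to col - 1 modulo 5 and never negative.
        localparam integer PREV = (col + 4) % 5;
        localparam integer NEXT = (col + 1) % 5;

        assign o_state[row][col] =
          i_state[row][col] ^
          sum_sheet[PREV] ^
          {sum_sheet[NEXT][62:0], sum_sheet[NEXT][63]};  // rotate left by 1
      end
    end
  endgenerate

endmodule
```

If both variants synthesize to the same cell counts, the index is not the cause and can be ignored for the purposes of this report.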

Synthesizing this module with -nowidelut gives the following resource usage:

Number of cells: 1972
     LUT4 1972

And without the flag:

Number of cells: 9512
     L6MUX21 1188
     LUT4 5656
     PFUMX 2668

Expected Behavior

I would expect the wide muxes on the ECP5 to improve timing, if nothing else.

Actual Behavior

Synthesizing the full design with -nowidelut improved timing: Fmax was 90 MHz, versus 60 MHz without the flag. I verified that this module was on the critical path.
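
The Fmax figures above come from the full design. For anyone who wants to time the theta module on its own, here is a rough sketch of a harness that puts registers on both sides of it, so that the only long register-to-register path runs through `sha3theta`. This is an assumption about how one might measure it, not what was used for the numbers above. The names `sha3theta_bench`, `i_bit`, `o_bit`, `state_in`, `state_cap` and `shift_out` are all hypothetical.

```systemverilog
// Hypothetical timing harness, not part of the original report.
// Assumes the sha3theta module from this issue is read in alongside it.
module sha3theta_bench (
  input  wire clk,
  input  wire i_bit,  // serial data in, keeps the pin count small
  output wire o_bit   // serial data out
);

  reg  [1599:0] state_in;   // launch registers feeding theta
  wire [1599:0] state_out;  // combinational theta result
  reg  [1599:0] state_cap;  // capture registers after theta
  reg  [1599:0] shift_out;  // folds every captured bit into the output

  // The packed 5x5x64 ports are 1600 bits wide, so flat vectors connect
  // to them directly.
  sha3theta dut (
    .i_state(state_in),
    .o_state(state_out)
  );

  always @(posedge clk) begin
    // Shift new data in one bit per clock.
    state_in  <= {state_in[1598:0], i_bit};
    // The only path through theta: state_in -> state_cap.
    state_cap <= state_out;
    // XOR each captured bit into a shift chain (one 2-input XOR per bit),
    // so no theta output is left unobserved and optimized away.
    shift_out <= {shift_out[1598:0], 1'b0} ^ state_cap;
  end

  assign o_bit = shift_out[1599];

endmodule
```

The shift chains only add a single 2-input XOR between registers, so the path reported as critical should be the one through theta. Since nextpnr results vary with the placement seed, comparing the two synthesis options over several seeds is probably more meaningful than a single run.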

@t-wallet t-wallet added the pending-verification This issue is pending verification and/or reproduction label Dec 5, 2024
@Ravenslofty
Collaborator

A situation like this occurs when ABC9 has a mismatch between predicted and actual delay.

Without -nowidelut, ABC9 predicts 1.225ns of delay, while with -nowidelut ABC9 predicts 1.457ns of delay.

For fun, I set the ABC9 delay target to 10ns, and got a solution using 1920 LUT4s.

Out of curiosity, what's your testing methodology here? Did you just run the design once under nextpnr?
