Skip to content

MuellerDominik/fpga-division

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

fpga-division

Division Algorithms in FPGAs

Algorithms

The following two algorithms are being examined:

Repeated subtraction (Rept. -)

C style pseudocode:

uint<N>_t numerator = num;        // input
uint<N>_t denominator = den;      // input

uint<N>_t quotient = 0;

while (numerator >= denominator) {
  numerator -= denominator;
  ++quotient;
}

uint<N>_t quo = quotient;         // output
uint<N>_t rmn = numerator;        // output

SRT division

The implemented algorithm is documented in a .ppt about computer arithmetic from the UCCS.

C style pseudocode:

uint<2*N + 1>_t numerator = num;  // input
uint<N>_t denominator = den;      // input

uint<N>_t quotient = 0;
uint<N>_t remainder = 0;
uint<log2(N-1)>_t k;
uint<N>_t ap = 0;
uint<N>_t an = 0;

k = leading_zeros(denominator);
denominator <<= k;
numerator <<= k;

for (int i = 0; i < N; ++i) {
  switch (top_3_bits(numerator)) {
    case 0:
    case 7:
      ap <<= 1;
      an <<= 1;
      numerator <<= 1;
      break;
    case 4:
    case 5:
    case 6:
      ap <<= 1;
      an = (an << 1) | 1;
      numerator <<= 1;
      numerator += (denominator << N);
      break;
    default:  // case 1, 2, 3
      ap = (ap << 1) | 1;
      an <<= 1;
      numerator <<= 1;
      numerator -= (denominator << N);
      break;
  }
}

quotient = ap - an;
remainder = numerator >> N;

if (msb(numerator) == 1) {
  remainder += denominator;
  --quotient;
}

uint<N>_t quo = quotient;         // output
uint<N>_t rmn = remainder >> k;   // output

Notes:

  • ap can be integrated into the N LSBs of the numerator to save N flip-flops (see VHDL code)
  • numerator is named p in the VHDL code (named (P,A) in the .ppt)
  • num is named A and den is named B in the .ppt

Performance (ATTL)

The performance of the algorithms is measured in termns of area, timing, throughput and latency (ATTL).

Iterative Approach

10 bits 12 bits N bits
Rept. - SRT Rept. - SRT Rept. - SRT
Area (#ff) a b c d ? ?
Timing (MHz) x y x y x y
Throughput (bits/clk) 2.3 1 2.4 1 ? 1
Avg. Latency (#clk) 4.3 *) 10 5.0 *) 12 ? N

*) Determined by Simulation
a, b, c, d, x, y: TBD

Unrolled Approach

10 bits 12 bits N bits
Rept. - SRT Rept. - SRT Rept. - SRT
Area (#ff) a b c d ? ?
Timing (MHz) u v w x ? ?
Throughput (bits/clk) 2.3 * M M 2.4 * M M ? M
Avg. Latency (#clk) 4.3 / M 10 / M 5.0 / M 12 / M ? N / M

M: Iterations per Cycle
a, b, c, d, u, v, w, x: TBD

License

Copyright © 2019 Dominik Müller and Nico Canzani

This project is licensed under the terms of the Apache License 2.0 - see the LICENSE file for details