Division Algorithms in FPGAs
The following two algorithms are being examined:
uint<N>_t numerator = num; // input
uint<N>_t denominator = den; // input
uint<N>_t quotient = 0;
while (numerator >= denominator) {
numerator -= denominator;
++quotient;
}
uint<N>_t quo = quotient; // output
uint<N>_t rmn = numerator; // output
The implemented algorithm is documented in a .ppt
about computer arithmetic from the UCCS.
uint<2*N + 1>_t numerator = num; // input
uint<N>_t denominator = den; // input
uint<N>_t quotient = 0;
uint<N>_t remainder = 0;
uint<log2(N-1)>_t k;
uint<N>_t ap = 0;
uint<N>_t an = 0;
k = leading_zeros(denominator);
denominator <<= k;
numerator <<= k;
for (int i = 0; i < N; ++i) {
switch (top_3_bits(numerator)) {
case 0:
case 7:
ap <<= 1;
an <<= 1;
numerator <<= 1;
break;
case 4:
case 5:
case 6:
ap <<= 1;
an = (an << 1) | 1;
numerator <<= 1;
numerator += (denominator << N);
break;
default: // case 1, 2, 3
ap = (ap << 1) | 1;
an <<= 1;
numerator <<= 1;
numerator -= (denominator << N);
break;
}
}
quotient = ap - an;
remainder = numerator >> N;
if (msb(numerator) == 1) {
remainder += denominator;
--quotient;
}
uint<N>_t quo = quotient; // output
uint<N>_t rmn = remainder >> k; // output
Notes:
ap
can be integrated into the N LSBs of thenumerator
to save N flip-flops (see VHDL code)numerator
is namedp
in the VHDL code (named(P,A)
in the.ppt
)num
is namedA
andden
is namedB
in the.ppt
The performance of the algorithms is measured in termns of area, timing, throughput and latency (ATTL).
10 bits | 12 bits | N bits | ||||
---|---|---|---|---|---|---|
Rept. - | SRT | Rept. - | SRT | Rept. - | SRT | |
Area (#ff) | a | b | c | d | ? | ? |
Timing (MHz) | x | y | x | y | x | y |
Throughput (bits/clk) | 2.3 | 1 | 2.4 | 1 | ? | 1 |
Avg. Latency (#clk) | 4.3 *) | 10 | 5.0 *) | 12 | ? | N |
*) Determined by Simulation
a, b, c, d, x, y: TBD
10 bits | 12 bits | N bits | ||||
---|---|---|---|---|---|---|
Rept. - | SRT | Rept. - | SRT | Rept. - | SRT | |
Area (#ff) | a | b | c | d | ? | ? |
Timing (MHz) | u | v | w | x | ? | ? |
Throughput (bits/clk) | 2.3 * M | M | 2.4 * M | M | ? | M |
Avg. Latency (#clk) | 4.3 / M | 10 / M | 5.0 / M | 12 / M | ? | N / M |
M: Iterations per Cycle
a, b, c, d, u, v, w, x: TBD
Copyright © 2019 Dominik Müller and Nico Canzani
This project is licensed under the terms of the Apache License 2.0 - see the LICENSE file for details