Post-Synthesis leakage power minimization procedure for Synopsys' PrimeTime, submitted to the low-power contest of the 'Synthesis and Optimization of Digital Systems' course at PoliTO.
The power consumption of a CMOS digital circuit results from static and dynamic contributions. The dynamic power is dissipated when the circuit is active, that is, whenever transitions occur on a net due to input stimuli, and is further divided into switching and internal power:
- The switching power is associated with the charging and discharging of cells' capacitive loads, which include net and gate capacitances.
- The internal power is dissipated within the cell boundaries, not only by charging and discharging capacitances internal to the cell, but also by the short-circuit current that flows when the pull-up and pull-down networks conduct simultaneously during a transition.
Conversely, the static power is dissipated when the circuit is inactive and arises from transistor non-idealities, primarily the reverse pn-junction current, the thin-oxide gate current, and the sub-threshold current; accordingly, it is also known as leakage power.
Since leakage power approaches dynamic power in deep-submicron technologies, its minimization has been investigated at different levels of the VLSI design flow. While dealing with non-idealities typically pertains to transistor engineering done by the silicon vendor, the exponential dependence of the sub-threshold current on the transistor's threshold voltage (Vt) makes Vt sizing an effective optimization technique. Silicon vendors offer technologies where gates are available in two or more threshold voltage groups, each with a different trade-off between timing and leakage characteristics; it is then up to CAD tool vendors to implement algorithms that leverage these libraries to meet timing and power requirements.
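To make the exponential dependence concrete, the sub-threshold current of a MOS transistor is commonly approximated by the textbook expression below. The symbols $I_0$ (a technology-dependent scale current), $n$ (the sub-threshold slope factor), and $\phi_T$ (the thermal voltage) are introduced here for illustration and are not taken from the design kit:

$$
I_{sub} \approx I_0 \, e^{\frac{V_{GS} - V_t}{n \, \phi_T}} \left( 1 - e^{-\frac{V_{DS}}{\phi_T}} \right), \qquad \phi_T = \frac{kT}{q}
$$

With $\phi_T \approx 26$ mV at room temperature and an assumed slope factor $n = 1.5$, raising $V_t$ by 100 mV scales the current by $e^{-0.1 / 0.039} \approx 1/13$. The actual ratio between Vt groups depends on the library characterization, so this figure is only an order-of-magnitude illustration.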
Given a gate-level netlist synthesized with cells at the lowest Vt option, the problem is to implement a procedure to run in Synopsys' PrimeTime to perform multi-Vt cell assignment, such that:
- The leakage power is minimized.
- The worst slack is not negative.
- The cells retain their footprint.
- The optimization lasts no longer than 3 minutes.
The design kit comprises a technology library, ST CMOS 65 nm in nominal conditions, with 3 Vt options (LVT, SVT, and HVT) and state-dependent cell leakage power characterization data. The effectiveness of the optimization is measured on two ISCAS85 benchmark circuits, c1908 and c5315, in terms of the leakage power saving S.
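A definition of the saving that is consistent with the percentage thresholds, stated here as an assumption rather than quoted from the contest rules, is the relative reduction of leakage power. The names $P_{start}$ and $P_{end}$ are chosen for illustration and denote the total leakage power before and after the optimization:

$$
S = \frac{P_{start} - P_{end}}{P_{start}} \cdot 100\%
$$

For instance, under this definition a design that leaks 1.0 µW before and 0.15 µW after the optimization has S = 85%.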
The thresholds to meet are the following:
Benchmark | Clock Period (ns) | S Threshold |
---|---|---|
c1908 | 2.0 | > 85% |
c1908 | 1.5 | > 55% |
c1908 | 1.0 | > 15% |
c5315 | 2.0 | > 90% |
c5315 | 1.5 | > 70% |
c5315 | 1.0 | > 25% |
Vt sizing is an inherently discrete optimization problem that has been proven NP-hard; finding an optimal solution would be computationally prohibitive even for modest benchmark designs. Hence, I searched the literature for heuristics known to achieve good results under timing constraints. The solution strategy proposed in [1] revolves around a cost function that is globally aware of the entire circuit. The ideal cell to select for a Vt increment is one that yields the highest leakage saving while consuming the least amount of the total available slack.
Ignoring the timing recovery through cell upsizing proposed by the authors, which would violate contest rules, their Vt selection algorithm is arranged as a greedy optimization. First, cells are ranked according to the cost function above, where the worst slack reduction is computed by trial swapping one cell at a time to the next higher Vt. Those cells for which this operation doesn't lead to a timing violation are swappable candidates to be processed in the order of least cost: each cell is swapped again to the higher Vt option and the change is accepted if it doesn't cause a timing violation. Finally, the ranking operation is repeated and the loop continues so long as it is possible to identify swappable candidates.
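The greedy loop described above can be sketched on a toy timing model. This is an illustration in Python, not the Tcl procedure that runs in PrimeTime: the `Cell` class, the path-based `worst_slack` model, and all numeric values are hypothetical, and the cost expression (worst slack reduction divided by leakage saving) is an assumed form that only follows the verbal description, not necessarily the exact function of [1].

```python
from dataclasses import dataclass

VT_ORDER = ("LVT", "SVT", "HVT")  # from lowest to highest threshold voltage


@dataclass
class Cell:
    delay: dict    # Vt option -> cell delay in ns (toy values)
    leakage: dict  # Vt option -> leakage power (arbitrary units)
    vt: str = "LVT"


def worst_slack(cells, paths, period):
    """Toy timing model: clock period minus the delay of the slowest path."""
    return period - max(sum(cells[c].delay[cells[c].vt] for c in p) for p in paths)


def next_vt(vt):
    """Next higher Vt option, or None if the cell is already at the highest."""
    i = VT_ORDER.index(vt)
    return VT_ORDER[i + 1] if i + 1 < len(VT_ORDER) else None


def rank_candidates(cells, paths, period):
    """Trial swap each cell one step up and rank the swappable ones by cost."""
    base = worst_slack(cells, paths, period)
    ranked = []
    for name, cell in cells.items():
        higher = next_vt(cell.vt)
        if higher is None:
            continue
        old, cell.vt = cell.vt, higher           # trial swap
        trial = worst_slack(cells, paths, period)
        cell.vt = old                            # revert the trial
        if trial < 0:
            continue                             # would violate timing: not a candidate
        saving = cell.leakage[old] - cell.leakage[higher]
        ranked.append(((base - trial) / saving, name))  # assumed cost form
    return [name for _, name in sorted(ranked)]  # least cost first


def greedy_vt_assignment(cells, paths, period):
    """Repeat ranking and swapping until no swappable candidate is left."""
    while True:
        candidates = rank_candidates(cells, paths, period)
        if not candidates:
            return
        for name in candidates:
            cell = cells[name]
            old, cell.vt = cell.vt, next_vt(cell.vt)
            if worst_slack(cells, paths, period) < 0:
                cell.vt = old                    # reject: earlier swaps used up the slack


def make_cell():
    return Cell(delay={"LVT": 0.2, "SVT": 0.3, "HVT": 0.4},
                leakage={"LVT": 10, "SVT": 4, "HVT": 1})


# Toy circuit: cells a and b share one path, cell c sits on a shorter one.
cells = {"a": make_cell(), "b": make_cell(), "c": make_cell()}
paths = [["a", "b"], ["c"]]
greedy_vt_assignment(cells, paths, period=0.75)
total_leakage = sum(c.leakage[c.vt] for c in cells.values())
```

In this toy circuit the cell off the critical path reaches HVT without consuming any worst slack, while the two cells on the critical path cannot both reach HVT within the 0.75 ns period. The sketch also makes the cost of the ranking step visible: every candidate requires a trial swap, a slack evaluation, and a revert.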
Preliminary implementations of this heuristic achieved high-quality results, satisfying all contest target thresholds, but violated the runtime constraint. The submitted solution is optimized for speed with hand-crafted techniques, rather than relying on ECO predictions through `estimate_eco`:
- Prior to any optimization, `initLeakagePowerLut` analyzes each cell in the design not already at the highest Vt option. It compiles in a dictionary the leakage power saving that would result from any subsequent Vt increment, say from LVT to SVT and from SVT to HVT for a cell originally at LVT. This operation has a crucial impact on performance. Every time a cell is swapped to a higher Vt alternative with the `size_cell` command, leakage power annotations are invalidated and accessing any leakage power attribute triggers an expensive power update, with full switching activity propagation. As a consequence, `initLeakagePowerLut` is devised to limit power updates to the number of available Vt options.
- The global cost function proposed in [1] is expensive to compute. In the worst case, the ranking operation involves trial swapping each design cell to the next higher Vt, one at a time. Not only does this mean resizing each cell twice, but the computation of the worst slack reduction also triggers an incremental timing update for each "what if" scenario. The issue is mitigated by executing `localOptimization {select_pct derate_pct}`, a faster optimization loop that ranks design cells with a cost function that rewards large slack availability in the worst timing path through the cell, rather than a global slack metric. The arguments control the swapping speed: the former sets the percentage of cells not already at the highest Vt option to swap wholesale. Should this cause a timing violation, the latter dictates the percentage of cells from the previous attempt to retry swapping.
- Ranking operations implicitly trigger timing updates when necessary. However, it seems that these timing computations are less accurate and could result in exiting the optimization loop with a negative slack. For this reason, when Vt changes are finalized, the worst slack evaluation is performed only after explicitly triggering a full timing update.
- `globalOptimization {start_time_ms max_runtime_ms}`, based on [1], might still exceed the contest's runtime limit. To address this, the optimization procedure is made "time aware". Each iteration is timed in milliseconds since the epoch, and an exponential moving average helps predict the duration of the next iteration. Should the next iteration be predicted to complete after `start_time_ms + max_runtime_ms`, the optimization stops early.
- Technology-specific details are collected in the `::STcmos65` namespace. In particular, the dictionary `VT_LUT` is accessed through the helper procedure `getVtAlternative` to generate library cells' full names for performing Vt swaps.
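The "time aware" stopping rule from the list above can be sketched as follows. This is an illustration in Python, not the Tcl code of `globalOptimization`: the function `run_time_aware`, the `step` callback, the injectable `now_ms` clock, and the smoothing factor `alpha = 0.5` are all assumptions made for the example, since the text does not state how the moving average is weighted.

```python
import time


def run_time_aware(step, start_time_ms, max_runtime_ms, alpha=0.5, now_ms=None):
    """Call step() repeatedly until it reports no more work or the next
    iteration is predicted to end past the deadline. Returns the iteration count."""
    if now_ms is None:
        def now_ms():
            return time.time() * 1000.0  # milliseconds since the epoch

    deadline_ms = start_time_ms + max_runtime_ms
    ema_ms = None  # exponential moving average of the iteration duration
    iterations = 0
    while True:
        t0 = now_ms()
        # Predict when the next iteration would finish and stop early if too late.
        if ema_ms is not None and t0 + ema_ms > deadline_ms:
            break
        more_work = step()
        elapsed = now_ms() - t0
        # The first sample seeds the average; later samples are blended in.
        ema_ms = elapsed if ema_ms is None else alpha * elapsed + (1 - alpha) * ema_ms
        iterations += 1
        if not more_work:
            break
    return iterations


class FakeClock:
    """Deterministic clock so that the example does not depend on wall time."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t


clock = FakeClock()


def fake_iteration():
    clock.t += 40.0  # pretend that each optimization pass takes 40 ms
    return True      # and that there is always more work to do


n = run_time_aware(fake_iteration, start_time_ms=0.0, max_runtime_ms=100.0,
                   now_ms=clock.now)
```

With a 100 ms budget and 40 ms passes, the loop runs two passes and declines a third that would be predicted to end at 120 ms. Note that the first pass always runs, because no duration estimate exists yet; a real procedure may want a conservative initial estimate instead.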
Runs under various server workloads showed that the proposed solution meets the contest's thresholds. The table below summarizes typical results on the contest benchmarks and compares them with Synopsys' leakage power minimization ECO flow, invoked as `fix_eco_power -pattern_priority {HS65_LH HS65_LS HS65_LL}`. The missing data in the highest-effort scenarios is due to unexpected terminations of that flow with timing violations, despite attempts to enforce a zero setup margin configuration; for those rows, the table reports the resulting negative slack instead.
Benchmark | Clock Period (ns) | S Threshold | `fix_eco_power` S / Time (s) | `multiVth` S / Time (s) |
---|---|---|---|---|
c1908 | 2.0 | > 85% | 87.591 % / 3.603 | 95.483 % / 23.181 |
c1908 | 1.5 | > 55% | slack: -0.006782 | 65.964 % / 27.731 |
c1908 | 1.0 | > 15% | slack: -0.000680 | 16.674 % / 40.222 |
c5315 | 2.0 | > 90% | 74.568 % / 8.114 | 97.730 % / 67.213 |
c5315 | 1.5 | > 70% | 58.652 % / 8.348 | 82.333 % / 125.87 |
c5315 | 1.0 | > 25% | slack: -0.003441 | 25.416 % / 117.641 |
[1] M. Rahman and C. Sechen, "Post-synthesis leakage power minimization," 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, 2012, pp. 99-104, doi: 10.1109/DATE.2012.6176440.
[2] Synopsys, Inc., "PrimeTime User Guide."
[3] Synopsys, Inc., "Power Compiler User Guide."