RTL Power Estimation methods and strategies
Sampath VP
ASIC/FPGA Design Professional | SoC Architecture | Crafting Cutting-Edge Solutions in VLSI Design
RTL power estimation techniques
RTL power estimation techniques require the following inputs: (i) a design description in Verilog, VHDL or another hardware description language, (ii) an RTL simulation trace in a format such as Value Change Dump (VCD) or Fast Signal Database (FSDB), and (iii) power-characterized libraries (e.g. standard cell power libraries) to correctly estimate the power consumption of a design.
It is not feasible to use these techniques for system-level architectural exploration and optimization, because such techniques and flows are targeted specifically at RTL design teams. A good ESL (electronic system level) power estimation methodology, on the other hand, should allow an ESL designer, architect or modeler to quickly and accurately gauge the effect of various high-level modifications on the power consumption of the design without having to move completely to an RTL power estimation flow. Such a methodology significantly reduces the complexity of the power estimation process at the system level. This approach to power analysis can be called a mixed probabilistic approach.
During this process, the activities of the input/output ports and some intermediate signals are imported from the system-level simulation dumps. Using this information, a faster probabilistic power estimation is performed at RTL, which contributes to the overall speedup. The quality of the overall results depends on the accuracy of the probabilistic power estimation. Furthermore, an accurate mapping of the system-level variables to the corresponding RTL signals also contributes to the accuracy of such a power analysis methodology. Some high-level synthesis tools generate such a mapping file, which can be used to further improve the power estimation accuracy.
Figure 2. Estimating power in Spyglass
The design files are the RTL code that describes the design. The tool analyzes the RTL code and translates it into gate-level information for power analysis. A power model is needed to estimate the leakage and internal power dissipated by each type of cell; this is provided by the power models in the .lib files. Several file formats are available for modeling switching activity: FSDB (Fast Signal Database), VCD (Value Change Dump) and SAIF (Switching Activity Interchange Format). The switching activity of a design refers to how often the different nets change signal level. This information is dumped from an RTL or netlist simulation tool and used to estimate the dynamic power consumption of the circuit. VCD is an event-based format that logs every value change made by each signal and the time at which the change occurred.
The Switching Activity Interchange Format (SAIF) file logs the average activity of each signal in a simulation. Fast Signal Database (FSDB) is an event-based format, similar to VCD, which logs each toggle of every signal. Its representation is binary, while VCD is ASCII, which makes FSDB the more compact representation with smaller file sizes.
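To make the difference between an event-based dump and an averaged activity figure concrete, here is a minimal sketch that reduces a tiny VCD text to per-signal toggle counts, which is the kind of summary a SAIF-style file carries. The sample VCD, the signal names and the `count_toggles` helper are illustrative assumptions, and the parser handles only scalar (1-bit) signals; it is not how any particular tool reads VCD.

```python
# Hypothetical sample: a hand-written VCD fragment with two 1-bit signals.
SAMPLE_VCD = """\
$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$var wire 1 % en $end
$upscope $end
$enddefinitions $end
#0
0!
0%
#5
1!
#10
0!
1%
#15
1!
#20
0!
"""


def count_toggles(vcd_text):
    """Return {signal_name: toggle_count} for the scalar signals in a VCD string."""
    names = {}    # VCD identifier code -> signal name
    last = {}     # VCD identifier code -> last value seen
    toggles = {}  # signal name -> number of value changes
    in_header = True
    for raw in vcd_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_header:
            tokens = line.split()
            if tokens[0] == "$var":
                # Layout: $var <type> <width> <id_code> <name> $end
                names[tokens[3]] = tokens[4]
                toggles[tokens[4]] = 0
            elif tokens[0] == "$enddefinitions":
                in_header = False
            continue
        # Scalar value change: one value character followed by the id code.
        if line[0] in "01xXzZ" and line[1:] in names:
            value, code = line[0], line[1:]
            if code in last and last[code] != value:
                toggles[names[code]] += 1
            last[code] = value
    return toggles


print(count_toggles(SAMPLE_VCD))
```

The first value seen for a signal is treated as its initial state rather than as a toggle, so only genuine changes are counted.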
The activity files are dumped from RTL simulation in Questasim. Based on the RTL code and the firmware of the given scenario, the FSDB dumper writes activity files that log the inputs and outputs of modules as well as the registers, wires and signals specified in the code. The SDC files set parameters that affect power, such as the definition of the clocks in the design and the input transition times, which affect the internal current consumption. The output capacitance load also needs to be set for every external output, as it affects the switching power. UPF files describe the power intent of the design. The file format consists of a standard syntax for describing power supplies, power switches, level shifters, isolation, memory retention and power states.
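The dependence of switching power on output load can be illustrated with the textbook approximation P = 0.5 · C · Vdd² · toggle rate for a single net. The sketch below is only that approximation, with made-up example values; it is not the internal model of Spyglass or Primetime, and the function name is hypothetical.

```python
import math


def switching_power_watts(load_capacitance_f, supply_voltage_v, toggle_rate_hz):
    """Textbook switching power of one net: 0.5 * C * Vdd^2 * toggle rate."""
    return 0.5 * load_capacitance_f * supply_voltage_v ** 2 * toggle_rate_hz


# Assumed example values: 10 fF load, 1.0 V supply, 100 million toggles per second.
base = switching_power_watts(10e-15, 1.0, 100e6)
doubled_load = switching_power_watts(20e-15, 1.0, 100e6)

print(f"base load:    {base * 1e6:.2f} uW")
print(f"doubled load: {doubled_load * 1e6:.2f} uW")
assert math.isclose(doubled_load, 2 * base)
```

Because the relation is linear in capacitance, an output whose load is left unset or set too low in the constraints under-reports switching power proportionally.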
The internal RTL generator is used to generate RTL code with a reconfigurable, clusterized architecture, initially without any clock gating. Spyglass Power has a feature that graphs the activity in the design, allowing us to see what the activity is for a particular block even without doing any power computations. In one case, we found a power bug that affected 96% of our programmable core: the Spyglass Power activity report showed that a cluster of the design that should have been idle was active and drawing power when it should not have been. We removed these power bugs by manually adding clock-gating cells at the cluster level and then ran the final clock-gating analysis. Spyglass can also produce many other reports, such as what is clocked and what is not; this helps guide the development of the micro-architecture.
Spyglass has a feature to extract calibration data from the reference synthesis and back end characteristics. The parameters generated by the calibration are:
Cell sizing, Vt-mix, Clock tree and Capacitance model
It generates files which show the percentage of cell allocation with drive strength in the design. These parameters affect combinatorial and sequential leakage as well as dynamic power. The Synopsys Design Constraints (SDC) files are used to specify constraints on the power, timing and area of the design, including input transition times, fanout, load capacitance and clock definitions. These constraints are used in power estimation because they tell the synthesis tool how the RTL design is to be synthesized. Synopsys provides a high-level guide for determining the root cause of power deviations, which is used as a starting point for correlating RTL and netlist.
The RTL estimation in Spyglass Power does not support time-based power analysis. In the power estimation that follows, the average power over the chosen time window is therefore used for both RTL and netlist analysis. The switching activity for each scenario needs to be extracted from a simulation tool: the simulation is run in Questasim and the switching activity is exported with an FSDB dumper. Codelink and exploration of the nets in the RTL simulation were used for debugging and to verify the right time to start the estimation, based on the time window set in the Primetime time-based power analysis.
The firmware of every scenario starts with a system reset and initialization of the chip. This can include setting registers, activating peripherals and configuring clock gating. The initialization sequence generates activity that is not relevant to the scenario and should be excluded from the estimation.
Once the FSDB files for both RTL and netlist had been created, power analysis in Primetime was performed as a reference for correlation. Correlating the RTL and netlist estimates is an iterative process. Spyglass has a feature to extract calibration data based on the layout of the design. With the calibration, Spyglass is able to get statistics on cell size and drive strength, Vt mix and clock tree properties such as fanout, ICGC fanout and wire load.
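The effect of choosing the time window, and of excluding the initialization sequence from it, can be sketched with a small helper that averages a piecewise-constant power waveform over a window. The waveform values, the units (ns and mW) and the `average_power` function are assumptions for illustration; real numbers would come from the time-based analysis of the reference run.

```python
def average_power(samples, t_start, t_end):
    """Time-weighted average of a piecewise-constant power waveform.

    samples: list of (time, power) pairs sorted by time; each power value
    holds until the next sample, and the last one holds until t_end.
    """
    if t_end <= t_start:
        raise ValueError("t_end must be greater than t_start")
    energy = 0.0
    for i, (t, p) in enumerate(samples):
        seg_start = max(t, t_start)
        next_t = samples[i + 1][0] if i + 1 < len(samples) else t_end
        seg_end = min(next_t, t_end)
        if seg_end > seg_start:
            energy += p * (seg_end - seg_start)
    return energy / (t_end - t_start)


# Hypothetical trace: a 100 ns initialization burst at 5 mW, then the scenario.
trace = [(0, 5.0), (100, 1.0), (200, 3.0)]

print(average_power(trace, 0, 300))    # window includes initialization
print(average_power(trace, 100, 300))  # window starts after initialization
```

With these made-up numbers, including the initialization burst raises the average from 2.0 mW to 3.0 mW, which is why the RTL and netlist runs need the same window before their averages are compared.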
Primetime PX for power analysis
Set up the power model
set power_enable_analysis true
set power_analysis_mode averaged
Link the library: .db file
Link the netlist: .v (generated after APR)
Read the SDC file
Read the SPEF file (generated from APR)
Read the SAIF file (generated from post-sim)
check_power
report_power -verbose -hierarchy
Table 1. Primetime PX for power analysis
Root cause of power deviations
· Sequential leakage
· Combinatorial leakage
· Dynamic power (inspect FSDB file, time-based activity graph)
· Internal power
· Switching power
· Clock power
· Memory power
Sequential leakage: Compare the number of registers and latches with the reference. Ensure that the design and the reference use the same type of flip-flops, and check the SDC to see whether the design uses scan flops. In a multi-threshold-voltage design, consider changing the Vt mix extracted by calibration. Debug in detail the hierarchies with the largest deviations, locate the set of registers at the root cause and identify the mismatch (Vt, library, corner).
Combinatorial leakage: Compare the area and the number of instances in the RTL run and the reference run. Debug at the hierarchical level to find the largest mismatch. Analyze cell type, Vt, library and library corner. Debug the Spyglass and Primetime optimization settings.
Dynamic power: Ensure that activity file annotation is as close to 100% as possible. To improve annotation, find which signals are not annotated and change the simulation settings. Ensure that the reference and the RTL estimation are given the same vectors with the same time window for each scenario. Inspect the FSDB file and the time-based activity graph. Ensure that the RTL simulation does not have a lot of X's, especially on clocks, resets and enables.
Internal power: Change the slew parameter to the average pin slew of the cells in the design. Ensure that the clock-gating threshold is similar.
Switching power: Check the wireload models from the calibration SPEF files.
Clock power: Adjust the clock tree from calibration. Ensure that the clock frequency is the same and that sequential power matches; otherwise, fix the sequential power. Adjust the capacitance model from calibration.
Memory power: Start with leakage, which should be almost identical; check the libraries and instances provided. If the dynamic power figures diverge, check the vectors and their annotation for the memories. Ensure cycle-based propagation on the memory ports. Check memory access rates.
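The annotation check listed under dynamic power (annotation as close to 100% as possible, then finding which signals are missing) can be sketched as a simple set comparison. The net names and the `annotation_coverage` helper are hypothetical; in practice the two lists would be taken from the design database and the tool's annotation report.

```python
def annotation_coverage(design_nets, annotated_nets):
    """Return (coverage_percent, sorted list of design nets without activity)."""
    design = set(design_nets)
    annotated = design & set(annotated_nets)  # ignore nets unknown to the design
    missing = sorted(design - annotated)
    percent = 100.0 * len(annotated) / len(design) if design else 0.0
    return percent, missing


# Hypothetical example: one design net has no activity in the dump.
design = ["clk", "rst_n", "en", "data"]
annotated = ["clk", "en", "data", "tb_only_signal"]

percent, missing = annotation_coverage(design, annotated)
print(f"annotation: {percent:.1f}%  missing: {missing}")
```

Listing the missing nets, rather than only the percentage, points directly at the simulation settings or hierarchy mapping that need to change.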
Methodology
The goal of the methodology is to provide a fast and simple way of correlating a design against a reference, and to use the correlation data as a reference for similar designs in the same technology. The result is a general methodology for RTL estimation that can be used in other designs as well. The work consists of:
· Create synthesis and layout for the default design
· Establish a reference in Primetime PX
· Estimate power in Spyglass Power with the Synopsys standard flow
· Improve correlation between RTL and netlist
· Develop a better methodology for correlation
· Implement low-power RTL techniques
· Run synthesis, layout, Primetime and Spyglass estimations of these designs
· Verify correlation for these designs
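The correlation steps in the list above amount to comparing the RTL estimate with the reference per power category and working on the largest deviation first. A minimal sketch of that comparison follows; the category names, the milliwatt figures and the `deviation_report` helper are invented for illustration and are not results from any real design.

```python
def deviation_report(rtl_mw, reference_mw):
    """Return (category, rtl, reference, deviation_percent) rows,
    sorted by absolute deviation, largest first."""
    rows = []
    for category, ref in reference_mw.items():
        est = rtl_mw.get(category, 0.0)
        if ref == 0:
            deviation = 0.0 if est == 0 else float("inf")
        else:
            deviation = 100.0 * (est - ref) / ref
        rows.append((category, est, ref, deviation))
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return rows


# Hypothetical numbers in mW: the RTL estimate versus the netlist reference.
rtl = {"leakage": 2.5, "internal": 12.0, "switching": 7.0}
reference = {"leakage": 2.0, "internal": 10.0, "switching": 8.0}

for category, est, ref, dev in deviation_report(rtl, reference):
    print(f"{category:10s} rtl={est:6.2f}  ref={ref:6.2f}  dev={dev:+6.1f}%")
```

Re-running the same report after each calibration or settings change gives a simple record of whether the iteration is converging.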
Simulate the different scenarios and generate switching activity data at both gate level and RTL, then create the reference power estimation in Primetime PX. This could have been done by checking registers and debugging in a simulation tool. Designers usually have to build a mental image of how data is propagated and used over the simulation run. As designs get more and more complex, there is a need to facilitate this reasoning process and to automate the debugging. Behavior analysis automatically infers the design's temporal behavior using the information in the HDL source and the simulation results. Given an analysis scope, the temporal behavior of the design is extracted from the design's logical model and the simulation data.
The analysis procedure is divided into logic extraction and timing activity analysis.
In gate-level simulation, timing-check violations on synchronizer flops can inject X's into the design. Instead of changing the stimulus to avoid this, what is typically done is either to disable timing checks on those synchronizers or to edit the SDF file and zero out the timing check values (setup, hold, etc.), which in effect also disables the timing checks on those instances. VCS supports disabling timing checks on a per-instance basis or for the entire design. With Verdi, we can easily trace the X's back to their source and find these synchronizers; otherwise, just ask the designers.
When the same RTL simulation verification environment is reused for gate-level simulation, the following recommendations are often made. Each is followed by a comment:
· Add some delay when driving the inputs, since setup and hold times come into the picture. The exact same stimulus should work in both RTL and gate-level simulation. If the stimulus is driven onto a synchronous interface, there is no way to violate setup or hold time. If it is driven onto an asynchronous interface, the problem is already taken care of by disabling the timing checks on the synchronizers.
· Initialize all uninitialized flops. Assuming all the necessary initialization of the chip is done correctly in the RTL simulation, nothing extra should be needed for the gate-level simulation. Uninitialized flops in the netlist do not matter if they do not cause any problems.
· Disable the timing checks of all synchronizer flops. This is the approach described above.
· Initialize all memories and registers of the DUT just before the DUT comes out of reset. Again, the initialization routine used for RTL should work for the gate-level simulation.
The time-based analysis is chosen since it gives a good overview of the power consumption over time in each scenario. This feature is also used to debug and verify behavior and power consumption within the scenarios, and to analyze the power consumption of different modules with regard to clock gating, activity and leakage power.
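The X-related checks discussed earlier (no X's on clocks, resets and enables, and flops initialized before the DUT leaves reset) can be spot-checked on a VCD dump with a few lines of scripting. The sketch below lists scalar signals that are still unknown at a given time. The sample VCD, the signal names and the `signals_unknown_at` helper are assumptions, and signals with no recorded value are treated as X. It does not replace tracing X's to their source in a debugger such as Verdi.

```python
# Hypothetical dump: q is never driven to a known value after reset release.
SAMPLE_VCD = """\
$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$var wire 1 % rst_n $end
$var reg 1 & q $end
$upscope $end
$enddefinitions $end
#0
0!
0%
x&
#10
1!
1%
#20
0!
"""


def signals_unknown_at(vcd_text, t_check):
    """Return sorted names of scalar signals whose value is x at time t_check."""
    names = {}   # VCD identifier code -> signal name
    value = {}   # VCD identifier code -> current value, lower-cased
    in_header = True
    for raw in vcd_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_header:
            tokens = line.split()
            if tokens[0] == "$var":
                names[tokens[3]] = tokens[4]
                value[tokens[3]] = "x"  # assumption: unknown until first assigned
            elif tokens[0] == "$enddefinitions":
                in_header = False
            continue
        if line[0] == "#":
            # Timestamp line: stop once we are past the time of interest.
            if int(line[1:]) > t_check:
                break
        elif line[0] in "01xXzZ" and line[1:] in names:
            value[line[1:]] = line[0].lower()
    return sorted(names[code] for code, v in value.items() if v == "x")


print(signals_unknown_at(SAMPLE_VCD, 15))
```

Running the check at a time just after reset release gives a short list of candidates to initialize or to trace further.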