
RTL Power Estimation methods and strategies

Sampath VP

ASIC/FPGA Design Professional | SoC Architecture | Crafting Cutting-Edge Solutions in VLSI Design

Published: October 31, 2019

RTL power estimation techniques

RTL power estimation techniques require the following inputs: (i) a design description in Verilog, VHDL, or another hardware description language; (ii) an RTL simulation trace in a format such as Value Change Dump (VCD) or Fast Signal Database (FSDB); and (iii) power-characterized libraries (e.g. standard cell power libraries) to correctly estimate the power consumption of a design.

It is not feasible to use these techniques for system-level architectural exploration and optimization, because such techniques and flows are targeted specifically at RTL design teams. A good ESL (electronic system level) power estimation methodology, on the other hand, should allow an ESL designer, architect, or modeler to quickly and accurately gauge the effect of various high-level modifications on the power consumption of the design without having to move completely to an RTL power estimation flow. Such a methodology significantly reduces the complexity of the power estimation process at the system level. This approach to power analysis can be called a mixed probabilistic approach.

During this process, the activities of the input and output ports and of some intermediate signals are imported from the system-level simulation dumps. Using this information, a faster probabilistic power estimation is done at RTL, which contributes to the overall speedup. Improving the accuracy of the probabilistic power estimation improves the overall results. An accurate mapping of the system-level variables to the corresponding RTL signals also contributes to the accuracy of such a power analysis methodology. Some high-level synthesis tools generate such a mapping file, which can be used to further improve the power estimation accuracy.

Figure 2. Estimating power in Spyglass

The design files are the RTL code that describes the design. The tool analyzes the RTL code and translates it to gate-level information for power analysis. A power model is needed to estimate the leakage and internal power dissipated by each type of cell; this is provided by the power models in the lib files. Several file formats are available to model switching activity: FSDB (Fast Signal Database), VCD (Value Change Dump), and SAIF (Switching Activity Interchange Format). The switching activity of a design refers to how often the different nets change signal level. This information is dumped from an RTL or netlist simulation tool and used to estimate the dynamic power consumption of the circuit. VCD is an event-based format that logs every value change made by each signal and the time at which the change occurred.

The Switching Activity Interchange Format (SAIF) file logs the average activity of each signal in a simulation. Fast Signal Database (FSDB) is an event-based format, similar to VCD, which logs each toggle of every signal. Its representation is binary, while VCD is ASCII, so FSDB is more compact and gives smaller file sizes.
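The difference between an event-based dump and an averaged activity figure can be illustrated with a small sketch. The code below is not how any commercial tool reads VCD; it is a minimal parser for scalar signals only, run on a hand-written fragment with two hypothetical signals (`clk` and `en`), that reduces the logged value changes to a per-signal toggle count.

```python
# A tiny hand-written VCD fragment. The signal names "clk" and "en" and the
# identifier codes "!" and "%" are made up for illustration.
SAMPLE_VCD = """\
$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$var wire 1 % en $end
$upscope $end
$enddefinitions $end
#0
0!
0%
#5
1!
#10
0!
1%
#15
1!
#20
0!
"""


def count_toggles(vcd_text):
    """Return ({signal_name: toggle_count}, last_timestamp) for scalar signals."""
    names = {}    # VCD identifier code -> signal name
    last = {}     # VCD identifier code -> last value seen
    toggles = {}  # signal name -> number of value changes
    now = 0
    for raw in vcd_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("$var"):
            # Layout: $var <type> <width> <id code> <name> $end
            parts = line.split()
            names[parts[3]] = parts[4]
        elif line.startswith("#"):
            now = int(line[1:])
        elif line[0] in "01xXzZ" and line[1:] in names:
            value, code = line[0], line[1:]
            name = names[code]
            toggles.setdefault(name, 0)
            # The first value of a signal is its initial state, not a toggle.
            if code in last and last[code] != value:
                toggles[name] += 1
            last[code] = value
    return toggles, now


if __name__ == "__main__":
    counts, end_time = count_toggles(SAMPLE_VCD)
    for name, n in counts.items():
        print(name, "toggles:", n, "rate per ns:", n / end_time)
```

For the sample fragment this gives 4 toggles on `clk` and 1 on `en` over 20 ns. The event list keeps the timing of every change, while the count divided by the duration is the kind of averaged figure an activity summary carries.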

The activity files are dumped from RTL simulation in Questasim. Based on the RTL code and the firmware of the given scenario, the FSDB dumper writes activity files that log the inputs and outputs of modules as well as the registers, wires, and signals specified in the code. The SDC files set parameters that affect power, such as the definition of the clocks in the design and the input transition times, which affect the internal current consumption. The output capacitance load also needs to be set for every external output, as it affects the switching power. UPF files describe the power intent of the design. The file format consists of a standard syntax for describing power supplies, power switches, level shifters, isolation, memory retention, and power states.
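To see why the output load matters for switching power, the usual first-order relation can be worked through in a few lines. This is a sketch using the textbook approximation that each signal transition dissipates 0.5 x C x V^2. The function name and the example values (10 fF load, 1.0 V supply, 100 million toggles per second) are illustrative assumptions, not figures from this design or from any tool.

```python
def switching_power_watts(load_capacitance_f, supply_voltage_v, toggle_rate_hz):
    """First-order switching power of one net.

    Each transition (rising or falling) dissipates about 0.5 * C * V^2,
    so the average power is that energy times the number of transitions
    per second.
    """
    return 0.5 * load_capacitance_f * supply_voltage_v ** 2 * toggle_rate_hz


if __name__ == "__main__":
    # Assumed example values: 10 fF load, 1.0 V supply, 100e6 toggles/s.
    base = switching_power_watts(10e-15, 1.0, 100e6)
    doubled_load = switching_power_watts(20e-15, 1.0, 100e6)
    print("base load   :", base, "W")
    print("doubled load:", doubled_load, "W")
```

With these assumed values the net dissipates about 0.5 microwatts, and doubling the load capacitance doubles the result, which is why an unset or wrong output load skews the switching power estimate directly.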

The internal RTL generator is used to generate RTL code with a reconfigurable clusterized architecture, initially without any clock gating. Spyglass Power has a feature that graphs the activity in the design, allowing us to see what the activity is for a particular block even without doing any power computations. In one case, we found a power bug that affected 96% of our programmable core: the Spyglass Power activity report showed that a cluster of the design that should have been idle was active and drawing power. We removed these power bugs by manually adding clock-gating cells at the cluster level and then did a final analysis for clock gating. We can extract many different reports with Spyglass, such as what is clocked and what is not clocked; this helps guide us in developing the micro-architecture.

Spyglass has a feature to extract calibration data from the reference synthesis and back end characteristics. The parameters generated by the calibration are:

Cell sizing, Vt-mix, Clock tree and Capacitance model

It generates files that show the percentage of cell allocation by drive strength in the design. These parameters affect combinatorial and sequential leakage and dynamic power.

The Synopsys Design Constraints (SDC) files are used to specify constraints regarding the power, timing, and area of the design. This includes input transition times, fanout, load capacitance, and clock definitions. These constraints are used to estimate power, since they tell the synthesis tool how the RTL design is to be synthesized.

Synopsys provides a high-level guide for determining the root cause of power deviations. This guide is used as a starting point for correlation between RTL and netlist. The RTL estimation in Spyglass Power does not support time-based power analysis, so in the further power estimation, the average power over the chosen time window is used for both RTL and netlist analysis.

The switching activity for each scenario needs to be extracted from a simulation tool. Here the simulation was run in Questasim and the switching activity exported with an FSDB dumper. Codelink and exploration of the nets in the RTL simulation were used for debugging and to verify the right time to start the estimation, based on the time window set in the Primetime time-based power analysis. The firmware of all the scenarios starts with a system reset and initialization of the chip. This could include setting registers, activating peripherals, and clock gating. The initialization sequence generates activity that is not relevant to the scenario and should be excluded from the estimation.

When the FSDB files for both RTL and netlist had been created, power analysis in Primetime was performed as a reference for correlation. Correlating the RTL and netlist estimates is an iterative process. Spyglass has a feature to extract calibration data based on the layout of the design. With the calibration, Spyglass is able to get statistics on cell size and drive strength, Vt mix, and clock tree properties such as fanout, ICGC fan-out, and wire load.
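The effect of choosing a time window that skips the initialization sequence can be sketched as follows. The event list, signal names, and window boundaries are made up for illustration; the sketch only shows the idea of counting activity inside the window and averaging over the window length, not how Spyglass or Primetime apply a window internally.

```python
def windowed_toggle_rates(toggle_events, window_start, window_end):
    """Average toggle rate per signal inside [window_start, window_end).

    toggle_events is an iterable of (time, signal_name) pairs, one per toggle.
    Events outside the window, such as reset and initialization activity,
    are ignored.
    """
    if window_end <= window_start:
        raise ValueError("window_end must be greater than window_start")
    counts = {}
    for time, signal in toggle_events:
        if window_start <= time < window_end:
            counts[signal] = counts.get(signal, 0) + 1
    duration = window_end - window_start
    return {signal: n / duration for signal, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical trace: "cfg_reg" toggles only during initialization
    # (before t=100), "data" toggles during the scenario itself.
    events = [
        (10, "cfg_reg"), (20, "cfg_reg"),
        (150, "data"), (160, "data"), (170, "data"),
        (250, "data"),
    ]
    print(windowed_toggle_rates(events, 100, 200))
```

In this made-up trace the configuration register contributes nothing once the window starts at t=100, and the toggle at t=250 is also dropped because it falls after the window ends. Using the same window for the RTL and netlist runs keeps the two averages comparable.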

Primetime PX for power analysis

Set up the power model:

set power_enable_analysis true

set power_analysis_mode averaged

Link the library (.db file)

Link the netlist (.v file, generated after APR)

Read the SDC file

Read the SPEF file (generated from APR)

Read the SAIF file (generated from post-sim)

check_power

report_power -verbose -hierarchy

Table 1. Primetime PX for power analysis

Root cause of power deviations.

Sequential leakage root analysis

Combinatorial Leakage

Dynamic Power

Inspect fsdb file, time based, activity graph

Internal power

Switching power

Clock power

Memory power

Sequential leakage root analysis: Compare the number of registers and latches with the reference. Ensure that the design and the reference use the same type of flip-flops, and check the SDC to see whether the design uses scan flops. In a multi-threshold-voltage design, consider changing the Vt mix extracted by calibration. Debug in detail the hierarchies with the largest deviations, locate the set of registers at the root cause, and identify the mismatch (Vt, library, corner).

Combinatorial leakage: Compare the area and the number of instances in the RTL run and the reference run. Debug at the hierarchical level to find the largest mismatch. Analyze cell type, Vt, library, and library corner. Debug the Spyglass and Primetime optimization settings.
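Finding the hierarchies with the largest deviations is a simple ranking once the per-hierarchy numbers from both runs are available. The sketch below assumes the figures have already been extracted from the two reports into plain dictionaries; the block names and power values are invented for illustration.

```python
def rank_deviations(rtl_estimate, reference):
    """Rank hierarchies by absolute deviation of the RTL estimate from the reference.

    Both arguments map a hierarchy name to a power figure in the same unit.
    Returns a list of (name, estimate, reference, percent_deviation) tuples,
    largest absolute deviation first.
    """
    rows = []
    for name, ref in reference.items():
        est = rtl_estimate.get(name, 0.0)  # missing in the RTL run counts as 0
        percent = 100.0 * (est - ref) / ref if ref else float("inf")
        rows.append((name, est, ref, percent))
    rows.sort(key=lambda row: abs(row[1] - row[2]), reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical leakage figures in microwatts.
    reference = {"core": 120.0, "mem_ctrl": 40.0, "periph": 10.0}
    rtl_estimate = {"core": 150.0, "mem_ctrl": 38.0, "periph": 10.5}
    for name, est, ref, percent in rank_deviations(rtl_estimate, reference):
        print(f"{name:10s} est={est:7.1f} ref={ref:7.1f} dev={percent:+.1f}%")
```

Sorting by absolute rather than relative deviation puts the blocks that move the total the most at the top, which is usually where debugging Vt, library, and corner mismatches pays off first.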

Dynamic power: Ensure that activity file annotation is as close to 100% as possible. To improve annotation, find which signals are not annotated and change the simulation settings. Ensure that the reference and the RTL estimation are given the same vectors with the same time window for each scenario. Inspect the FSDB file, the time-based analysis, and the activity graph. Ensure that the RTL simulation does not have a lot of Xs, especially on clocks, resets, and enables.

Internal power: Change the slew parameter to the average pin slew of the cells in the design. Ensure that the clock gating threshold is similar.
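The annotation check described for dynamic power amounts to comparing two sets of signal names. The sketch below assumes both lists are already available as plain strings (for example, exported from the tool reports); the hierarchical names are hypothetical.

```python
def annotation_coverage(design_signals, annotated_signals):
    """Return (coverage_percent, sorted list of unannotated design signals)."""
    design = set(design_signals)
    # Only count annotations that match a signal actually present in the design.
    annotated = design & set(annotated_signals)
    missing = sorted(design - annotated)
    coverage = 100.0 * len(annotated) / len(design) if design else 100.0
    return coverage, missing


if __name__ == "__main__":
    # Hypothetical signal lists.
    design = ["top.clk", "top.rst_n", "top.u_core.en", "top.u_core.data"]
    annotated = ["top.clk", "top.rst_n", "top.u_core.data"]
    coverage, missing = annotation_coverage(design, annotated)
    print(f"annotation coverage: {coverage:.1f}%")
    print("not annotated:", missing)
```

The list of missing names is the useful part: it shows which signals the simulation dump needs to include, or which names differ between the simulation and the design hierarchy.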

Switching power: Check the wireload models from calibration SPEF files.

Clock power: Adjust the clock tree from calibration. Ensure that the clock signal has the same frequency and that sequential power matches; otherwise, fix sequential power first. Adjust the capacitance model from calibration.

Memory power: Start with leakage, which should be almost identical; check the libraries and instances provided. If the dynamic power figures diverge, check the vectors and their annotation for the memories. Ensure cycle-based propagation on the memory ports. Check memory access rates.

Methodology

The goal of the methodology is to provide a fast and simple way of correlating a design against a reference, and to use these correlation data as a reference for similar designs in the same technology. The result is a general methodology for RTL estimation that could be used in other designs as well. The work will consist of:

· Create synthesis and layout for default design

· Establish reference in Primetime PX

· Estimate power in Spyglass Power with Synopsys standard flow

· Improve correlation between RTL and Netlist

· Develop better methodology for correlation

· Implement low power RTL techniques

· Synthesis, layout, Primetime, Spyglass estimations of these designs

· Verify correlation for these designs

Simulate the different scenarios and generate switching activity data at both gate level and RTL, and then create the reference power estimation in Primetime PX. Verifying the behavior in each scenario could be done by checking registers and debugging in a simulation tool. Designers usually have to build an image of how data is propagated and used over the simulation run. As designs get more and more complex, there is a need to facilitate this reasoning process and automate the debugging. Behavior analysis automatically infers the design’s temporal behavior using the information in the HDL source and the simulation results. Given an analysis scope, we extract the temporal behavior of the design from the design’s logical model and the simulation data.

The analysis procedure is divided into logic extraction and timing activity analysis.

When the same RTL simulation verification environment is reused for gate sims, four steps are commonly suggested: add some delay when driving the inputs, since setup and hold times come into the picture; initialize all uninitialized flops; disable the timing checks of all sync flops; and initialize all memory and registers of the DUT just before the DUT comes out of reset.

On adding delay when driving the inputs: the same exact stimulus should work in both RTL and gate sim. If we are driving the stimulus onto a synchronous interface, there is no way we can violate setup or hold time. If we are driving the stimulus onto an asynchronous interface, the problem is already taken care of by disabling the timing checks on the synchronizers.

On initializing all uninitialized flops: assuming all the necessary initialization of the chip is done correctly in the RTL simulation, nothing extra should be needed for the gate sim. Uninitialized flops in the netlist do not matter as long as they do not cause any problems.

On disabling the timing checks of all sync flops: instead of changing the stimulus to avoid timing violations on the synchronizers, what is typically done is to either disable timing checks on those synchronizers or edit the SDF file and zero out the timing check values (setup, hold, etc.), which in effect also disables the timing checks on those instances. VCS supports disabling timing checks on a per-instance basis or for the entire design. If we have Verdi, we can easily trace the X's back to their source and find these synchronizers, or we can just ask the designers.

On initializing all memory and registers of the DUT just before the DUT comes out of reset: again, the initialization routine used for RTL should work for the gate sim.

The time-based analysis is chosen since it gives a good overview of the power consumption over time in each scenario. This feature is also used to debug and verify behavior and power consumption within the scenarios, and to analyze the power consumption of different modules with regard to clock gating, activity, and leakage power.
