The Role of Clock Gating

Perhaps you've heard the term "clock gating" and you're wondering how it works, or maybe you know what clock gating is and you're wondering how to best implement it. Either way, this post is for you.

Why Power Matters

I can't help but laugh when I watch a movie where the main characters are shrunk down to the size of grains of sand and have to fight off ants the size of T-Rexes. It's not that I'm amused by giant ants; it's the unquestioned assumption that our bodies would behave the same at such a small scale that gets to me. I know, it's just a movie, but it still makes me cringe. Things just wouldn't scale that way. If I've got my physics right, our mass would scale with the cube of our size, while our surface area, and our strength along with it, would scale with the square. As a result, we'd be super strong for our weight, but it wouldn't matter, since with so much surface area relative to our volume we'd freeze to death almost instantly from heat loss. Plenty of other factors come into play, and I'm sure I'd get them wrong if I tried, but you get the idea.

But what does "Honey, I Shrunk the Kids" have to do with clock gating? Well, I began designing silicon in the 90s. At that time, the only thing that mattered was performance. Since then, transistors have shrunk a bit (a lot, actually), just like Rick Moranis. And, like our shrunken heroes, their properties don't all scale by the same factor. One that's getting out of control is power, or more precisely, power density. According to "The Dark Silicon Problem and What it Means for CPU Designers", heat generation per unit of silicon area is "somewhere between the inside of a nuclear reactor and the surface of a star," and that was in 2013. Power is now a first-order concern. In fact, we find ourselves in a new situation where we have more transistors available to us than we can afford to power on at the same time. In a very real sense, the best way to get more performance is now to save more power.

I was motivated to write this post today by a LinkedIn notification I received this morning, letting me know that I had been quoted in a post by Brian Bailey of Semiconductor Engineering entitled "Taking Power More Seriously". Bailey provides an excellent high-level overview of the myriad challenges of designing for power. One of those challenges is implementing fine-grained clock gating. As a developer of EDA tools that, among other things, automate clock gating, I felt it was timely to dive deeper into the topic.

What is clock gating?

Several factors contribute to a circuit's power consumption. The logic gates have static, or leakage, power that is roughly constant as long as a voltage is applied to them, and they have dynamic, or switching, power resulting from toggling wires. Flip-flops are rather power-hungry, accounting for perhaps 20% of total power. Clocks can consume even more, perhaps 40%! Global clocks go everywhere, and they toggle twice each cycle. As we'll see, clock gating avoids toggling the clock when clock pulses are not needed. This reduces the power consumed by clock distribution and flip-flops, and it can even reduce the dynamic power of logic gates.
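For readers who want the arithmetic behind those percentages, the usual first-order model of switching power is shown here. The symbols (activity factor α, switched capacitance C, supply voltage V_dd, clock frequency f, enable fraction e, and gating overhead P_gate) are standard textbook notation introduced for illustration; they are not taken from the article.

```latex
\begin{aligned}
P_{\text{dyn}} &= \alpha \, C \, V_{dd}^{2} \, f \\
P_{\text{clk}} &= C_{\text{clk}} \, V_{dd}^{2} \, f
  && (\alpha = 1 \text{ for a free-running clock}) \\
P_{\text{clk,gated}} &\approx e \, C_{\text{clk}} \, V_{dd}^{2} \, f + P_{\text{gate}}
  && (\text{clock enabled in a fraction } e \text{ of cycles})
\end{aligned}
```

Here α counts rising (charging) transitions per cycle. A free-running clock rises once every cycle, so its activity factor is 1, while typical data nets switch far less often; combined with the clock's enormous fan-out, that is why clocks weigh so heavily in the budget. If a gated clock branch is enabled in only a fraction e of cycles, its switching power scales roughly by e: with e = 0.1, the branch burns about a tenth of its ungated clock power, plus the overhead of the gating logic itself.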

Even in a busy circuit, when you look closer, most of the logic is not doing meaningful work most of the time. In the trace of a WARP-V CPU core below, for example, the CPU is executing instructions nearly every cycle, but the logic computing branch targets isn't busy; it is only needed for branch instructions. Likewise, floating-point logic is only needed for floating-point instructions, and so on. Most signal values in the trace are gray, indicating that they aren't used.

CPU waveform showing clock gating opportunity
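To put rough numbers on the idea that most logic sits idle even in a busy core, here is a toy Python tally. The instruction classes, the units each one needs, and the 10-instruction trace are all made up for illustration (they are not taken from WARP-V or its waveform), and the model assumes one instruction per cycle with no stalls.

```python
from collections import Counter

# Hypothetical mapping from instruction class to the units it keeps busy.
UNITS_NEEDED = {
    "alu":    {"decode", "regfile", "alu"},
    "branch": {"decode", "regfile", "alu", "branch_target"},
    "load":   {"decode", "regfile", "alu", "mem"},
    "fp":     {"decode", "regfile", "fpu"},
}


def utilization(trace):
    """Return the fraction of cycles in which each unit does meaningful work.

    Assumes one instruction per cycle (no stalls or bubbles).
    """
    busy = Counter()
    for instr in trace:
        busy.update(UNITS_NEEDED[instr])
    all_units = set().union(*UNITS_NEEDED.values())
    return {unit: busy[unit] / len(trace) for unit in sorted(all_units)}


# A made-up instruction stream, for illustration only.
trace = ["alu", "load", "alu", "branch", "alu", "alu", "load", "fp", "alu", "branch"]
for unit, frac in utilization(trace).items():
    print(f"{unit:14s} {frac:4.0%}")
```

Even in this tiny example, the front end is busy every cycle while the branch-target and floating-point logic do meaningful work in only a small fraction of cycles; those idle cycles are the clock-gating opportunity.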

As previously noted, a significant portion of overall power is consumed by driving clock signals to flip-flops so the flip-flops can propagate their input values to their outputs for the next cycle of execution. If most of these flip-flop input signals are meaningless, there's no need to propagate them, and we're wasting a lot of power.

Clock gating cuts out clock pulses that aren't needed. (Circuits may also be designed to depend on the absence of a clock pulse, but let's not confuse matters with that case.) The circuit below shows two clock gating blocks (in blue) that cut out unneeded clock pulses and only pulse the clock when a meaningful computation is being performed.

Illustration of clock gating
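The article doesn't say what is inside the blue clock gating blocks. A common implementation is an integrated clock-gating (ICG) cell: a latch that is transparent while the clock is low, followed by an AND gate. The Python sketch here is a toy discrete-time model, assuming that kind of cell; the time resolution, the enable waveform, and all function names are hypothetical. It shows why the latch matters: with a bare AND gate, an enable that changes while the clock is high produces runt pulses and spurious mid-cycle rising edges.

```python
PERIOD = 4       # time steps per clock cycle (hypothetical resolution)
HIGH_STEPS = 2   # the clock is high for the first half of each cycle


def clock(t):
    return 1 if t % PERIOD < HIGH_STEPS else 0


def and_gate_only(enable):
    """Naive gating: gated clock = clk AND enable at every time step."""
    return [clock(t) & en for t, en in enumerate(enable)]


def latch_based_icg(enable):
    """ICG-style gating: a latch follows enable while clk is low and holds
    it while clk is high; the held value is ANDed with clk."""
    q = 0
    gclk = []
    for t, en in enumerate(enable):
        if clock(t) == 0:
            q = en                    # transparent phase: follow the enable
        gclk.append(clock(t) & q)     # high phase: use the held value
    return gclk


def pulses(wave):
    """Return (start_step, length) for every run of 1s in a waveform."""
    result, start = [], None
    for t, v in enumerate(wave + [0]):
        if v and start is None:
            start = t
        elif not v and start is not None:
            result.append((start, t - start))
            start = None
    return result


def is_glitch_free(wave):
    """Every pulse must start on a clock rising edge and last a full high phase."""
    return all(s % PERIOD == 0 and n == HIGH_STEPS for s, n in pulses(wave))


# Enable produced by logic clocked on the rising edge, so it changes while clk is high.
enable = [1 if 5 <= t < 13 else 0 for t in range(20)]
naive = and_gate_only(enable)
icg = latch_based_icg(enable)
print("AND only :", pulses(naive), "glitch-free:", is_glitch_free(naive))
print("latch ICG:", pulses(icg), "glitch-free:", is_glitch_free(icg))
```

In this model the AND-only version emits one-step pulses at steps 5 and 12, while the latch-based version delivers only full, edge-aligned pulses for the cycles whose enable was settled before the rising edge.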

In addition to reducing clock distribution and flip-flop power, clock gating also guarantees that flip-flop outputs are not wiggling when there are no clock pulses. This reduces downstream dynamic power consumption. In all, clock gating can save a considerable amount of power relative to an ungated circuit.
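To make both savings concrete (fewer clock pulses, and flip-flop outputs that stay quiet when nothing meaningful is happening), here is a toy cycle-level model of a single 32-bit register bank. It is a sketch under loud assumptions: the 20% valid rate, the random values on invalid cycles, and the use of output bit toggles as a proxy for downstream switching are all hypothetical, not measurements from the WARP-V trace.

```python
import random


def simulate(valid, data, gated):
    """Cycle-level model of one 32-bit register bank (hypothetical).

    valid[i] says whether data[i] is meaningful in cycle i. Without gating,
    the register is clocked and captures data every cycle; with gating, it
    gets a clock pulse (and captures) only when valid[i] is true.
    Returns (clock_pulses, output_bit_toggles).
    """
    q = 0
    pulses = toggles = 0
    for v, d in zip(valid, data):
        if gated and not v:
            continue                       # no pulse: q holds, outputs stay quiet
        pulses += 1
        toggles += bin(q ^ d).count("1")   # output bits that flip on this pulse
        q = d
    return pulses, toggles


rng = random.Random(0)                                 # fixed seed for repeatability
cycles = 1000
valid = [rng.random() < 0.2 for _ in range(cycles)]    # assumed 20% of cycles are meaningful
data = [rng.getrandbits(32) for _ in range(cycles)]    # arbitrary values when not valid

for gated in (False, True):
    pulses, toggles = simulate(valid, data, gated)
    print(f"gated={gated!s:<5}  clock pulses={pulses:4d}  output bit toggles={toggles}")
```

The clock-pulse count drops in proportion to the valid rate. And because Hamming distance obeys the triangle inequality, skipping the meaningless intermediate values can never increase the number of output bit flips that downstream logic sees.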

Implementing Clock Gating

A prerequisite for clock gating is knowing when signals are meaningful and when they are not. This is one aspect of the higher-level awareness inherent in a Transaction-Level Verilog (TL-Verilog) model. The logic of a "transaction" is expressed under the condition that indicates its validity. Since a single condition can apply to all the logic along the path a transaction follows, the overhead of expressing validity is minimal.

Validity is not just about clock gating. It helps to separate the wheat from the chaff, so to speak. The earlier CPU waveform, for example, is from a TL-Verilog model. Debugging gets easier because the majority of signal values have been automatically identified as meaningless and filtered out. And we know they are meaningless because automatic checking ensures that these values are not inadvertently consumed by meaningful computations.
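The article doesn't describe how TL-Verilog tooling performs this check, and the following is not its implementation. It is a minimal, simulation-time Python sketch of the idea, assuming each value carries a valid flag: any computation performed under a true validity condition raises an error if it reads an invalid operand. The `Sig` type, the `compute` helper, and the example values are hypothetical.

```python
from dataclasses import dataclass


class InvalidConsumption(Exception):
    """Raised when a meaningful computation reads a meaningless value."""


@dataclass(frozen=True)
class Sig:
    value: int
    valid: bool


def compute(when_valid, fn, *operands):
    """Evaluate fn over operands under the condition when_valid.

    If the result is meaningful (when_valid is true), every operand must be
    meaningful too. Otherwise the result is simply marked invalid and its
    operands are not checked.
    """
    if when_valid:
        bad = [i for i, op in enumerate(operands) if not op.valid]
        if bad:
            raise InvalidConsumption(f"operand(s) {bad} are not valid")
    return Sig(fn(*(op.value for op in operands)), bool(when_valid))


# A branch-target adder that should only consume valid inputs.
pc = Sig(0x100, valid=True)
offset = Sig(0x20, valid=True)
stale = Sig(0xDEAD, valid=False)      # left over from an earlier cycle

target = compute(True, lambda a, b: a + b, pc, offset)
print(hex(target.value), target.valid)

try:
    compute(True, lambda a, b: a + b, pc, stale)
except InvalidConsumption as err:
    print("caught:", err)
```

Computing garbage under a false condition is allowed; only a meaningful result built from meaningless inputs is flagged.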

With this awareness in our model, default fine-grained clock gating comes for free. The valid conditions are used by default to enable our clock pulses.
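As a rough illustration of valid conditions enabling the clock by default, here is a toy Python model of a three-stage pipeline. It is not TL-Verilog and not what the TL-Verilog tools generate; the stage count, the field names, and `run_pipeline` are hypothetical. What it shows is the point about minimal overhead: one valid bit per stage, carried along with the transaction, serves as the clock enable for every data register in that stage, so the gating costs one signal per stage rather than one per register.

```python
def run_pipeline(arrivals, num_stages=3):
    """Toy valid-gated pipeline.

    arrivals[c] is a transaction (a dict of field values) entering stage 0 in
    cycle c, or None for an idle cycle. Each stage has one valid bit, and that
    single bit is the clock enable for every data register in the stage.
    Returns (enabled_cycles_per_stage, final_register_contents_per_stage).
    """
    valid = [False] * num_stages
    regs = [{} for _ in range(num_stages)]
    enabled = [0] * num_stages
    for txn in arrivals:
        # Update later stages first so each one sees its predecessor's old state.
        for s in reversed(range(num_stages)):
            src_valid = (txn is not None) if s == 0 else valid[s - 1]
            src_regs = txn if s == 0 else regs[s - 1]
            if src_valid:                  # clock enable = valid
                regs[s] = dict(src_regs)   # every field captured under one enable
                enabled[s] += 1
            valid[s] = src_valid           # the valid bit itself is clocked every cycle
    return enabled, regs


# Two transactions separated by idle cycles (values are illustrative).
arrivals = [{"pc": 0x100, "op": "add"}, None, None,
            {"pc": 0x104, "op": "beq"}, None, None, None, None]
enabled, regs = run_pipeline(arrivals)
print("enabled cycles per stage:", enabled, "of", len(arrivals))
print("last stage holds:", regs[-1])
```

Registers in idle stages simply hold their stale contents, which is harmless because nothing downstream treats them as meaningful while the stage's valid bit is low.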

The full implications of having clock gating in place from the start may not be readily apparent. I've never been on a project that met its goals for clock gating. We always went to silicon with plenty of opportunity left on the table. This is because power savings are always the last thing to be implemented. Functionality has to come first. Without it, verification can't make progress, and verification is always the long pole. Logic designers can't afford to give clock gating any real focus until they have worked through their functional bug backlogs, which doesn't happen until the end is in sight. By that point, many units have already been successfully implemented without full clock gating. The project is undoubtedly behind schedule, and adding clock gating would mean reworking the implementation, including addressing new timing and area pressure. Worse yet, it would bring with it a whole new flood of functional bugs. As a result, well, let's just say we're heating the planet faster than necessary. Getting clock gating into the model from the start, at no cost, completely flips the script.

Conclusion

Power is now a first-order design constraint, and clock gating is an important part of an overall power strategy. Register-transfer-level (RTL) modeling does not lend itself to the successful use of clock gating. A transaction-level design can have clock gating in place from the start, having a shift-left effect on the project schedule and resulting in lower-power silicon (and, indirectly, higher performance and lower area as well). If you are planning to produce competitive silicon, it's important to have a robust clock-gating methodology in place from the start.

Comments

Emmanuel U. O.

SoC/ASIC Design Engineer at Intel Corporation

7 months ago

Why is in-rush current not discussed/considered when discussing clock-gated designs, as in power-gated designs? Both may lead to simultaneous switching when a switch (power) or a gate (clock) is enabled. I am just curious to hear your thoughts…

Sumanto Kar

Assistant Project Manager, FOSSEE, IITB | M. Tech. Student IEOR, IITB

2 years ago

Great post Steve Hoover !!!

Kannan Moudgalya

Indian Institute of Technology, Bombay

2 years ago

Unrelated, but another way to remove the heat is implemented in the Dojo project; see https://youtu.be/DSw3IwsgNnc, a talk given by Ganesh Venkataramanan.

Kannan Moudgalya

Indian Institute of Technology, Bombay

2 years ago

Enjoyed the post, Steve Hoover. Reminds me of optimising compilers that don’t evaluate the OR instruction when one of the inputs is 1 and the AND instruction when one of the inputs is 0.
