Power Optimization Techniques in Digital IC Design - 3

"There has certainly come to you a Messenger from among yourselves. Grievous to him is what you suffer; [he is] concerned over you [i.e., your guidance] and to the believers is kind and merciful." Quran[9:128]
#boycottFrance

In the previous 2 articles, we discussed sources of power dissipation and presented some of the techniques to reduce it. In this article, we will see more techniques to reduce the activity factor.

Chain vs tree logic

Fig.1

Consider the two 4-input adder implementations in Fig.1. For the chain implementation on the left, assume all the inputs arrive at the same time. At time 0 the 1st adder starts computing A + B, and the 2nd adder starts computing Result_1 + C. The problem is that Result_1 is not yet valid, because the 1st adder needs its propagation delay to produce A + B. So the 2nd adder computes a false result, and its output switches unnecessarily, until the 1st adder finishes. The same issue occurs at the 3rd adder.

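To reduce this extra switching, we use the tree implementation on the right side of Fig.1. There, A + B and C + D are computed in parallel, so only the final adder has to wait for intermediate results.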

Reference [1] reports a layout simulation in which the chain implementation consumes about 1.5 times as much power as the tree implementation.

Note that the chain implementation is also worse for timing, since its total propagation delay is 3*T_prop compared to 2*T_prop for the tree implementation.
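A rough way to see the difference is a unit-delay, word-level simulation. The sketch below is only an illustration and is not the layout simulation from [1]: it assumes every adder has the same one-step delay, counts any change of an adder's output word as one transition, and uses hypothetical names (CHAIN, TREE, count_transitions) and arbitrary input values.

```python
# Each netlist maps an adder's output name to the names of its two inputs.
CHAIN = {"R1": ("A", "B"), "R2": ("R1", "C"), "R3": ("R2", "D")}
TREE = {"R1": ("A", "B"), "R2": ("C", "D"), "R3": ("R1", "R2")}


def settle(netlist, inputs):
    """Return the steady-state value of every node for fixed inputs."""
    values = dict(inputs)
    for name in netlist:
        values[name] = 0
    # One pass per adder is enough for these small acyclic netlists.
    for _ in range(len(netlist)):
        for name, (a, b) in netlist.items():
            values[name] = values[a] + values[b]
    return values


def count_transitions(netlist, old_inputs, new_inputs):
    """Count adder-output changes after all inputs switch at time 0."""
    values = settle(netlist, old_inputs)
    values.update(new_inputs)
    transitions = 0
    for _ in range(len(netlist) + 1):
        # Every adder sees the values from the previous time step (unit delay).
        nxt = {name: values[a] + values[b] for name, (a, b) in netlist.items()}
        transitions += sum(1 for name in nxt if nxt[name] != values[name])
        values.update(nxt)
    return transitions


old = {"A": 1, "B": 2, "C": 3, "D": 4}
new = {"A": 5, "B": 6, "C": 7, "D": 8}
chain_transitions = count_transitions(CHAIN, old, new)
tree_transitions = count_transitions(TREE, old, new)
print(chain_transitions, tree_transitions)  # prints: 6 3
```

In this simplified model the chain produces intermediate false results at the 2nd and 3rd adders, while the tree's final adder switches only once. Real adders glitch at the bit level, so the absolute numbers should not be read as a power estimate.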

Resource sharing

Resource sharing is a technique used to decrease the area of the design. Here we look at whether it reduces or increases the power.

Fig.2

Fig.2 shows two adder implementations. The one on the left uses two adders while the one on the right uses a single adder that performs the same operation. The question now is: Which implementation consumes more power?

The one on the right has a single adder, so it has half the switched capacitance of the one on the left. However, the implementation on the right consumes more power, for two reasons:

  • To sustain the same data rate, the implementation on the right must perform 2 calculations in each sampling period, which doubles the switching rate. So the benefit of halving the capacitance is cancelled by doubling the rate.
  • The activity factor of the time-multiplexed adder is higher than that of the two-adder implementation, as shown in [1]. This is because the inputs to the shared adder alternate between 2 different sources, so consecutive inputs are uncorrelated, which increases the activity factor.
Fig.3: Activity factor of the time-multiplexed (TM) implementation compared with the two-adder implementation. [1]

This shows that time-multiplexed designs may increase the dynamic power consumption.
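The loss of correlation can be illustrated by counting bit flips on an input bus. The following is a minimal sketch with made-up data, not the experiment from [1]: stream_x and stream_y are two hypothetical, slowly changing 8-bit sequences, and toggles is a helper introduced here for illustration.

```python
def toggles(words):
    """Total number of bit flips between consecutive words on a bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


# Two hypothetical 8-bit streams; each one changes slowly from word to word.
stream_x = list(range(0, 8))      # 0, 1, ..., 7
stream_y = list(range(240, 248))  # 240, 241, ..., 247

# Dedicated hardware: each input bus carries only one stream.
dedicated = toggles(stream_x) + toggles(stream_y)

# Time-multiplexed hardware: one bus alternates between the two streams.
shared_bus = [word for pair in zip(stream_x, stream_y) for word in pair]
multiplexed = toggles(shared_bus)

print(dedicated, multiplexed)  # prints: 22 71
```

Each stream on its own flips only a few low-order bits per step, but interleaving them forces the upper bits to flip on every access, so the shared bus toggles far more for the same amount of data.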

Glitches

Glitches are undesired transitions in digital circuits. They increase the switching activity and therefore the dynamic power consumption of the IC. According to reference [2], glitch power dissipation ranges from 20% to 70% of the total dynamic power dissipation.

Fig.4

Fig.4 shows how a glitch may occur. Because the two inputs arrive with different delays, the NAND output briefly switches to a false value (0) before returning to the correct value (1). You can watch the tutorial in [3] for a more detailed explanation of glitches.
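The mechanism can be reproduced with a tiny time-step model. This is a sketch under assumed waveforms (they are not taken from Fig.4): input A rises at step 1, and input B falls either one step later or at the same step.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 0 if (a and b) else 1


def transitions(trace):
    """Number of value changes in a waveform (one list entry per time step)."""
    return sum(1 for p, q in zip(trace, trace[1:]) if p != q)


a = [0, 1, 1, 1]          # A rises at step 1
b_late = [1, 1, 0, 0]     # B falls one step after A rises (skewed arrival)
b_aligned = [1, 0, 0, 0]  # B falls at the same step as A rises

out_skewed = [nand(x, y) for x, y in zip(a, b_late)]
out_aligned = [nand(x, y) for x, y in zip(a, b_aligned)]

print(out_skewed, transitions(out_skewed))    # prints: [1, 0, 1, 1] 2
print(out_aligned, transitions(out_aligned))  # prints: [1, 1, 1, 1] 0
```

The steady-state output is 1 before and after the input change in both cases; only the skewed arrival produces the two extra transitions.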

There are some techniques to reduce glitching:

  • Balance the gate delays using buffers.
  • Balance the gate delays using different Vth values for the transistors [4].
  • Include all the minterms when realizing the circuit [3].
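As an illustration of the last point, the sketch below assumes it refers to the classic fix of adding a redundant covering term. The function f = A·B + A'·C, the one-step inverter delay, and the helper name simulate are all assumptions made for this example and are not taken from [3].

```python
def simulate(with_redundant_term):
    """Evaluate f = A*B + (not A)*C while A falls and B = C = 1."""
    a = [1, 1, 0, 0]  # A falls at step 2
    b = c = 1
    out = []
    for t in range(len(a)):
        # The inverter output lags A by one time step.
        not_a = 1 - a[t - 1] if t > 0 else 1 - a[0]
        f = (a[t] & b) | (not_a & c)
        if with_redundant_term:
            f |= b & c  # extra term B*C covers the transition of A
        out.append(f)
    return out


print(simulate(False))  # prints: [1, 1, 0, 1]
print(simulate(True))   # prints: [1, 1, 1, 1]
```

Without the extra term the output drops to 0 for one step while the inverter catches up; with it the output stays at 1 and the glitch disappears, at the cost of one more gate.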

Gate Reordering

Fig.5

Changing the order of the gates in a circuit can reduce dynamic power.

  • One example is shown in Fig.5. The glitchy signal in the left circuit affects both MUXs. If we change the order of the MUXs as shown on the right, the glitchy signal affects only the last MUX, while the first MUX is safe from those glitches. So there are fewer transitions overall.
Fig.6
  • Another example is shown in Fig.6. Signal A is highly active and causes transitions on both gates. If we reorder the circuit so that A feeds only the 2nd gate, the 1st gate no longer switches with A and the activity factor is reduced.
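The effect of moving the active signal can be checked by counting node transitions. The sketch below assumes two cascaded 2-input AND gates and made-up waveforms (A toggles every step, B and C stay at 1); the actual gate types in Fig.6 may differ.

```python
def transitions(trace):
    """Number of value changes in a waveform (one list entry per time step)."""
    return sum(1 for p, q in zip(trace, trace[1:]) if p != q)


def and_gate(x, y):
    """Element-wise AND of two waveforms."""
    return [p & q for p, q in zip(x, y)]


a = [0, 1] * 4  # highly active signal
b = [1] * 8     # quiet signals
c = [1] * 8

# Original order: (A AND B) AND C, so A drives the first gate.
n1 = and_gate(a, b)
out = and_gate(n1, c)
original = transitions(n1) + transitions(out)

# Reordered: (B AND C) AND A, so A drives only the last gate.
n1_reordered = and_gate(b, c)
out_reordered = and_gate(n1_reordered, a)
reordered = transitions(n1_reordered) + transitions(out_reordered)

print(original, reordered)  # prints: 14 7
```

Both versions compute the same output; the reordered one simply keeps the internal node quiet.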

Precomputation logic

Fig.7 [1]

In this method, we prevent inputs that do not affect the output from reaching the combinational logic downstream. As an example, consider the comparator in Fig.7. If the MSBs of A and B differ, we already know the result of the comparison, and the remaining bits do not need to reach the comparator. So we use an XOR to compare the MSBs, and its output gates the clock of the registers that hold the remaining bits: those registers are clocked only when the MSBs are equal. This saves power in both the flip-flops and the comparator. It is one example of where clock gating can be used.
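A behavioural model makes the idea concrete. The sketch below is an assumption-laden illustration, not the circuit from [1]: it models an unsigned A > B comparison, and the class name PrecomputedComparator and its fields are hypothetical.

```python
class PrecomputedComparator:
    """Unsigned A > B comparator whose low-bit registers load only when needed."""

    def __init__(self, width):
        self.width = width
        self.low_mask = (1 << (width - 1)) - 1
        self.a_msb = self.b_msb = 0
        self.a_low = self.b_low = 0
        self.low_loads = 0  # how often the low-bit registers were clocked

    def clock(self, a, b):
        # The MSB registers are always clocked.
        self.a_msb = (a >> (self.width - 1)) & 1
        self.b_msb = (b >> (self.width - 1)) & 1
        msbs_differ = self.a_msb ^ self.b_msb  # the precomputation XOR
        if not msbs_differ:
            # Clock enabled: the remaining bits are needed for the result.
            self.a_low = a & self.low_mask
            self.b_low = b & self.low_mask
            self.low_loads += 1
        if msbs_differ:
            return self.a_msb == 1  # the MSBs alone decide the comparison
        return self.a_low > self.b_low


cmp4 = PrecomputedComparator(width=4)
results_ok = all(cmp4.clock(a, b) == (a > b)
                 for a in range(16) for b in range(16))
print(results_ok, cmp4.low_loads)  # prints: True 128
```

Over all 256 pairs of 4-bit inputs, the low-bit registers are clocked only 128 times, and the result is still correct in every case. For uniformly random inputs that means the lower bits stay idle about half of the time.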


References

[1] https://www.springer.com/gp/book/9780792395768

[2] https://www.researchgate.net/publication/220091511_CMOS_Leakage_and_Glitch_Minimization_for_Power-Performance_Tradeoff

[3] https://www.youtube.com/watch?v=WIS9MZ0NqFY

[4] https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1084&context=ecetr

[5] https://www.youtube.com/watch?v=t-PyfAI-fX4&t=1529s
