
Timing in Sequential Circuits

Neeraj Kumar Cheryala

Engineer at Qualcomm

Published: February 28, 2021

In the previous article (https://www.dhirubhai.net/posts/neeraj-cheryala-362385b7_circuits-digitaldesign-digitalelectronics-activity-6769886130558136320-kfRc) of this series, "A Primer on Timing and Verification in Digital Circuits", we dealt with timing in Combinational circuits and glitches. In this article, we're going to look at timing in more realistic circuits, i.e. Sequential circuits. These circuits are used to construct Finite State Machines (FSMs), which are basic building blocks of digital circuitry. Practically all circuits in digital systems are a blend of Combinational and Sequential logic. Sequential logic elements sample and store an output of the combinational logic, and this stored value is fed back to the combinational logic in the next clock cycle. This keeps the state machine moving from one state to the next until it reaches the desired state and produces the required output. Figure 1 illustrates the basic idea of a sequential logic circuit containing both combinational and sequential logic elements. 'Y' is an internal state output and 'Z' is the final external output.

Figure 1: Components of a Sequential Circuit (Abstract view)

Input Timing Constraints

Now that we have understood the significance of Sequential circuits, let's dive into their timing constraints. Assuming you know how a D (Delay) flip-flop works, its input 'D', output 'Q' and clock all have timing requirements. The input 'D' must be stable when it is sampled by the triggering (rising or falling) edge of the clock. To sample it correctly, the circuit needs to satisfy two timing constraints called Setup time and Hold time. Together, these make up the Aperture time (Ta). It is similar to the aperture of a camera, which must stay open long enough to capture a photo properly. Setup time and hold time are defined as follows:

Setup Time (Tsetup): It's simply the amount of time before the clock edge for which the data (input 'D') must be stable (i.e. it must not change).

Hold Time (Thold): It's simply the amount of time after the clock edge for which the data (input 'D') must be stable.

Hence the input needs to be stable for some time before and after the triggering clock edge so that it gets latched correctly. The aperture time is simply the window around the clock edge during which the data must be stable (Ta = Tsetup + Thold). Figure 2 illustrates these timing constraints for the input of a D flip-flop. Observe that the clock edge itself takes a finite time to rise, as discussed in the previous article.
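The aperture window can be expressed as a small check. This is only a minimal sketch: the function name, the timestamps and the 60 ps / 70 ps sample values are assumptions chosen for illustration, and a transition exactly on the window boundary is treated as acceptable.

```python
def aperture_violations(clock_edge, d_transitions, t_setup, t_hold):
    """Return the transitions of D that fall inside the aperture window.

    The window spans from t_setup before the clock edge to t_hold after it.
    All times are in the same unit (picoseconds here).
    """
    window_start = clock_edge - t_setup
    window_end = clock_edge + t_hold
    return [t for t in d_transitions if window_start < t < window_end]


# Hypothetical scenario: triggering edge at 1000 ps, D toggles three times.
violations = aperture_violations(1000, [200, 960, 1500], t_setup=60, t_hold=70)
print(violations)  # [960] -> D changed 40 ps before the edge, inside Tsetup
```

An empty list means D was stable for the whole aperture time Ta = Tsetup + Thold.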

Figure 2: Input Timing Constraints for a D flip-flop.

So what happens when input timing is violated? If 'D' is changing when it is sampled, the circuit can become metastable. This is when the flip-flop output is stuck somewhere between '1' and '0'. The output eventually settles to a valid value, but which value it settles to is non-deterministic and depends on the circuit characteristics and the manufacturing of the device. Since Metastability is a lower-level circuit issue, we abstract it away and do not discuss it further in this article. Figure 3 is an example of this non-deterministic convergence of the output of a NAND based RS latch when Metastability occurs.

Figure 3: Metastability in NAND based RS latch (Source: W. J. Dally, Lecture notes for EE108A, Lecture 13: Metastability and Synchronization Failure (When Good Flip-Flops go Bad) 11/9/2005.)

Output Timing

In this section, we're going to discuss the timing of the output. These parameters are delays from the triggering clock edge to the output transitions of a flip-flop.

Contamination Delay Clock-to-Q (Tccq): It's simply the earliest time after the triggering clock edge that 'Q' (output) starts to change as per the input to the flip-flop. It is very similar to the contamination delay in a combinational circuit which is discussed in the previous article of this series.

Propagation Delay Clock-to-Q (Tpcq): It's simply the latest time after the triggering clock edge that 'Q' stops changing i.e. the maximum amount of time from clock edge to when the output becomes stable. It is very similar to the propagation delay in a combinational circuit.

Figure 4: Tpcq and Tccq for a D flip-flop.
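The two clock-to-Q delays split the time after a clock edge into three regions. A minimal sketch of that idea follows; the function name and the 30 ps / 50 ps sample values are illustrative assumptions.

```python
def q_status(t_after_edge, t_ccq, t_pcq):
    """Classify output Q at a given time after the triggering clock edge."""
    if t_after_edge < t_ccq:
        return "old value"        # Q cannot have started changing yet
    if t_after_edge < t_pcq:
        return "may be changing"  # Q is somewhere between old and new
    return "new value"            # Q is guaranteed to be stable again


# Sample values in picoseconds: Tccq = 30, Tpcq = 50.
for t in (10, 40, 50):
    print(t, q_status(t, t_ccq=30, t_pcq=50))
```

Downstream logic can rely on the old value only before Tccq and on the new value only after Tpcq.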

A sequential system in the true sense looks like Figure 5. We have a flip-flop (R1) which samples some external inputs. The sampled inputs are then fed to a combinational circuit to determine the next state. The output of the combinational circuit is then sampled by another flip-flop (R2) on the output side. The overall system may consist of multiple flip-flops with combinational logic between them. The cycle time or clock period (Tc) of the system is determined by the maximum combinational logic delay between flip-flops in the whole system. This is often referred to as the Critical Path Delay. Hence from Figure 5, we observe that one must meet the timing requirements of both R1 and R2 to determine the cycle time.

Figure 5: Sequential Circuit Model

Basically, we need to ensure correct input timing on R2. To be specific, the input D2 must be stable from at least Tsetup before the triggering clock edge until at least Thold after it, as discussed under the input timing constraints. This means that the delay between the two flip-flops R1 and R2 has both a minimum and a maximum. If the combinational logic is too fast, we may get a Thold violation in R2, because a transition in Q1 then reaches D2 less than Thold after the clock edge. It may seem odd to require that the combinational logic not be too fast, but we must ensure this because of non-idealities in the circuit. On the other hand, if the combinational logic is so slow that the clock cycle time cannot accommodate the combinational delay, we may get a Tsetup violation in R2, because the transition in Q1 does not reach D2 at least Tsetup before the next triggering clock edge of R2. Hence we want the combinational logic to be neither too fast nor too slow. If it is too slow, we can increase the clock cycle time to accommodate the Tsetup time of R2. But if it is too fast, we have to slow it down. Figure 6 illustrates these violations.

Figure 6: Setup time and hold time violations in the example sequential circuit.

Setup Time Constraint

As we have discussed in the previous section, safe timing depends on the maximum delay from R1 to R2. The input to R2 must be stable at least Tsetup before the clock edge. From the timing diagram in Figure 7, the clock cycle time must satisfy Tc > Tpcq + Tpd + Tsetup. This means that Tpcq and Tsetup are effectively spent on latching and buffering the data. The useful work is only done during Tpd, in the combinational logic that determines the next state. The rest is called Sequencing Overhead: the amount of time lost each cycle to the timing requirements of the sequencing elements. Whenever we reduce the clock cycle time, the whole reduction comes out of the time available for useful work, since Tpcq and Tsetup stay constant. This is one reason why we cannot use arbitrarily high clock frequencies.

Figure 7: Timing diagram illustrating the setup time constraint.

We have seen that the clock period is determined by the critical path, since it has the longest propagation delay. So the critical path determines the minimum clock period (i.e., the maximum clock frequency). If the critical path is too long, the design will run slowly. But if it is too short, each cycle will do very little useful work, since most of the cycle will be spent on sequencing overhead.
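The trade-off between critical path length and sequencing overhead can be sketched numerically. The function names are illustrative, and the 50 ps (Tpcq) and 60 ps (Tsetup) figures are sample values, so treat this as a sketch of the setup constraint rather than a sign-off calculation.

```python
def min_clock_period(t_pcq, t_pd, t_setup):
    """Lower bound on Tc from the setup constraint Tc > Tpcq + Tpd + Tsetup."""
    return t_pcq + t_pd + t_setup


def useful_fraction(t_pcq, t_pd, t_setup):
    """Share of the minimum cycle spent in combinational logic (useful work)."""
    return t_pd / min_clock_period(t_pcq, t_pd, t_setup)


# Sample values in picoseconds; only the critical path delay Tpd varies.
for t_pd in (35, 105, 500):
    t_c = min_clock_period(50, t_pd, 60)
    share = useful_fraction(50, t_pd, 60)
    print(f"Tpd={t_pd} ps  Tc>{t_c} ps  useful work={share:.0%}")
```

With these numbers, a 35 ps critical path spends only about a quarter of the cycle on useful work, while a 500 ps path spends about 82% of it there but runs at a much lower clock frequency.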

Hold Time Constraint

As we have seen before, safe timing also depends on the minimum delay from R1 to R2. The input to R2 must be stable for at least Thold after the clock edge. From the timing diagram in Figure 8, Tccq(R1) + Tcd > Thold(R2), i.e. Tcd > Thold - Tccq. This means that the combinational logic needs a minimum delay. So designers usually try to make Thold equal to Tccq, or keep the difference very small, so that little or no extra combinational delay is required. Notice that the clock period Tc does not appear in this constraint, which tells us that a hold time violation cannot be fixed by changing the clock period. Hence, in practice, it may be very hard to fix Thold violations after manufacturing. We may need to modify the circuit if that happens.

Figure 8: Timing diagram illustrating the hold time constraint.
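The hold constraint can be turned around to give the contamination delay the combinational logic must exceed. A minimal sketch, with a hypothetical function name and sample values in picoseconds:

```python
def required_min_tcd(t_hold, t_ccq):
    """Contamination delay the logic must exceed: Tcd > Thold - Tccq.

    A result of 0 means the flip-flop's own clock-to-Q contamination delay
    already covers the hold time, so no minimum logic delay is needed.
    Note that the clock period Tc is not an input at all.
    """
    return max(0, t_hold - t_ccq)


print(required_min_tcd(t_hold=70, t_ccq=30))  # 40 -> logic needs Tcd > 40 ps
print(required_min_tcd(t_hold=30, t_ccq=30))  # 0  -> the Thold = Tccq case
```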

Example: Timing Analysis

Let's consider an example sequential circuit as shown in Figure 9 whose timing characteristics are given below:

Figure 9: Example Sequential circuit for Timing analysis

Tccq = 30 psec

Tpcq = 50 psec

Tsetup = 60 psec

Thold = 70 psec

Tpd = 35 psec (per gate)

Tcd = 25 psec (per gate)

We need to check the setup time and hold time constraints to decide if the design satisfies the safe timing. Then we can derive the required minimum clock cycle time i.e. maximum clock frequency. Firstly, the propagation delay and the contamination delay of the combinational logic are calculated as follows:

Tpd = 3 * 35 psec = 105 psec (three gates on the longest path)

Tcd = 25 psec (one gate on the shortest path)
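Finding Tpd and Tcd amounts to finding the longest and shortest paths through the gates. The sketch below uses a hypothetical netlist (the real topology of Figure 9 is not reproduced here) with one three-gate path and one single-gate path, which is the shape the numbers above imply.

```python
# Hypothetical netlist: each gate maps to the nodes that drive it.
# "IN" stands for the output of the launching flip-flop R1.
FANIN = {
    "g1": ["IN"],
    "g2": ["g1"],
    "g3": ["g2", "IN"],  # g3 is also driven directly by IN: the short path
}
T_PD_GATE = 35  # propagation delay per gate, psec
T_CD_GATE = 25  # contamination delay per gate, psec


def path_delay(node, per_gate, pick):
    """Delay from IN to the output of `node`.

    pick=max follows the longest path, pick=min follows the shortest path.
    """
    if node == "IN":
        return 0
    return per_gate + pick(path_delay(src, per_gate, pick) for src in FANIN[node])


t_pd = path_delay("g3", T_PD_GATE, max)  # longest path: g1 -> g2 -> g3
t_cd = path_delay("g3", T_CD_GATE, min)  # shortest path: g3 only
print(t_pd, t_cd)  # 105 25
```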

Let's now check the setup time constraints:

We have, Tc > Tpcq + Tpd + Tsetup

Tc > 50 + 105 + 60

Tc > 215 psec (Practically, we can add some Tmargin to this)

Hence maximum clock frequency is f(max) = 1/Tc

f(max) = 4.65 GHz

Checking the hold time constraint, we get Tccq + Tcd = 55 psec, which is less than Thold (70 psec). So this circuit fails to satisfy safe timing. We can fix this hold time violation by adding buffer gates to the short paths, as shown in Figure 10. This increases the minimum combinational delay. We will now analyse the modified circuit.
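The checks for the original circuit can be reproduced in a few lines. This is a sketch using the values given for Figure 9; the function and key names are illustrative, and a negative hold slack indicates a hold time violation.

```python
def check_timing(t_ccq, t_pcq, t_setup, t_hold, t_pd, t_cd):
    """Evaluate the setup and hold constraints (all times in psec)."""
    t_c_min = t_pcq + t_pd + t_setup           # Tc > Tpcq + Tpd + Tsetup
    return {
        "t_c_min_ps": t_c_min,
        "f_max_ghz": 1000.0 / t_c_min,         # 1 / psec = THz, so x1000 for GHz
        "hold_slack_ps": t_ccq + t_cd - t_hold,  # must be > 0 to meet hold
    }


original = check_timing(t_ccq=30, t_pcq=50, t_setup=60, t_hold=70,
                        t_pd=3 * 35, t_cd=25)
print(original["t_c_min_ps"])           # 215
print(round(original["f_max_ghz"], 2))  # 4.65
print(original["hold_slack_ps"])        # -15 -> hold violation
```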

Figure 10: Additional buffer gates to satisfy the hold time constraint
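How many buffers a short path needs follows directly from the hold constraint. The sketch below assumes each added buffer contributes the same 25 psec contamination delay as the other gates; the function name is hypothetical.

```python
def buffers_needed(t_ccq, t_cd, t_hold, t_cd_buffer):
    """Smallest number of buffers n such that
    Tccq + Tcd + n * t_cd_buffer > Thold (strict inequality)."""
    shortfall = t_hold - (t_ccq + t_cd)
    if shortfall < 0:
        return 0  # the hold constraint is already met
    return int(shortfall // t_cd_buffer) + 1


# Short path of the example: Tccq = 30, Tcd = 25, Thold = 70 (psec).
print(buffers_needed(30, 25, 70, t_cd_buffer=25))  # 1
```

One buffer raises the short-path contamination delay to 2 * 25 = 50 psec, which is the value used in the analysis of the modified circuit.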

Tpd = 3*35 psec

= 105 psec

Tcd = 2*25 psec

= 50 psec

Let's now check the setup time constraints:

We have, Tc > Tpcq + Tpd + Tsetup

Tc > 50 + 105 + 60

Tc > 215 psec (Practically, we can add some Tmargin to this)

Hence maximum clock frequency is f(max) = 1/Tc

f(max) = 4.65 GHz

We observe that the clock frequency has not changed, since we have not modified the propagation delay of the combinational logic. Checking the hold time constraint, we get Tccq + Tcd = 80 psec, which is greater than Thold (70 psec). Hence the modified circuit satisfies both the setup time and hold time constraints, at the cost of extra circuitry and power.

Practical circuits involve many other parameters that must be considered before signing off the design, one of which is Clock Skew. Tmargin, which was ignored in the above calculations, includes clock skew. Clock skew reflects the fact that clocks have delay too: the clock does not reach all parts of the chip at the same time. Hence Clock Skew can be defined as the maximum difference between the arrival times of the same clock edge at two points in the circuit. A flip-flop near the clock source may see the clock edge much sooner than one that is farther from the clock source. This is illustrated in Figure 11. Figure 12 is the nicest diagram I could find on the net illustrating how clock skew varies along the horizontal and vertical axes of the Alpha 21264 chip (one of the fastest processors of its time).

Figure 11: Clock skew between point A and point B in the circuit.

Figure 12: Spatial distribution of clock skew in Alpha 21264 (Source: P. E. Gronowski et al., "High-performance Microprocessor Design," JSSC '98.)

Now that we have understood the significance of Clock Skew, let's revisit the setup time constraint, this time taking clock skew into account. We consider the worst-case skew when checking the safe timing requirements. Taking the same basic sequential circuit shown previously in Figure 5, the worst case for setup occurs when the clock arrives at R2 before it arrives at R1. This leaves the least time for the combinational logic in between, because the output of the combinational logic, i.e. D2, is sampled early. That may lead to a setup time violation unless we increase the clock period. This scenario is illustrated by the timing diagram in Figure 13. The modified setup time constraint is Tc > Tpcq + Tpd + Tsetup + Tskew. Hence Tsetup is effectively increased, i.e. Tsetup(effective) = Tsetup + Tskew.

Figure 13: Modified setup time constraint due to clock skew.

Let's now revisit the hold time constraint with clock skew taken into account. The worst-case scenario happens when the clock arrives at R2 after it arrives at R1. The new value launched by R1 then has a head start, so D2 must stay stable for longer. This effectively increases the hold time and therefore the minimum delay required of the combinational logic. The modified hold time constraint is Tcd + Tccq > Thold + Tskew, so Thold(effective) = Thold + Tskew. This is illustrated in the timing diagram below:

Figure 14: Modified hold time constraint due to clock skew.
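Both skew-adjusted constraints can be sketched together. The example reuses the numbers of the modified circuit (Tccq = 30, Tpcq = 50, Tsetup = 60, Thold = 70, Tpd = 105, Tcd = 50, all in psec); the 5 psec and 20 psec skew values are hypothetical and serve only as illustration.

```python
def min_clock_period_with_skew(t_pcq, t_pd, t_setup, t_skew):
    """Setup with skew: Tc > Tpcq + Tpd + Tsetup + Tskew."""
    return t_pcq + t_pd + t_setup + t_skew


def hold_ok_with_skew(t_ccq, t_cd, t_hold, t_skew):
    """Hold with skew: Tccq + Tcd > Thold + Tskew."""
    return t_ccq + t_cd > t_hold + t_skew


def hold_skew_budget(t_ccq, t_cd, t_hold):
    """Skew must stay strictly below this value for the hold check to pass."""
    return t_ccq + t_cd - t_hold


print(min_clock_period_with_skew(50, 105, 60, t_skew=5))  # 220
print(hold_ok_with_skew(30, 50, 70, t_skew=5))            # True
print(hold_ok_with_skew(30, 50, 70, t_skew=20))           # False
print(hold_skew_budget(30, 50, 70))                       # 10
```

Under these assumptions the buffered circuit tolerates less than 10 psec of skew before the hold violation returns, while every picosecond of skew adds directly to the minimum clock period.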

We observe that clock skew effectively increases both the setup time and the hold time. This increases the sequencing overhead, which in turn decreases the useful work done per cycle. Hence designers must keep skew to a minimum. This requires an intelligent clock network across the chip. Basically, the goal is to make the clock arrive at all locations at roughly the same time. There has been a lot of research in clock tree synthesis over the years. Although clock trees are not covered in this article, Figure 15 may give you an idea about them. We have a clock source, and we try to balance the network across the circuit in different ways so that the clock arrives at all points of the circuit at the same time.

Figure 15: Practical clock tree synthesis (Source: Abdelhadi, Ameer, et al. "Timing-driven variation-aware nonuniform clock mesh synthesis." GLSVLSI’10.)

So that should be enough for this article. I hope you have got something new to take away from this article on timing in digital circuits. The forthcoming article in the series discusses Functional Verification of digital circuits using Verilog testbenches. Remember that it's always important to understand how technology works and also to analyse how it applies to practical scenarios. For instance, designing an efficient clock tree is crucial to minimizing the sequencing overhead in practical circuits, as discussed in this article. So stay tuned for more articles in the series. Don't forget to post your feedback in the comment section. Have fun!!

Ron Davison

Circuit design engineering consultant. Precision analog, SMPS, EMI /EMC, controls, systems integration, & mentoring.

2 years ago

Good morn'n read. Brought back some good ol' logic memories. For my senior project for the uC, digital, & timing class, I wrote a Mathcad program that spit out the delay when you put in the input delays for the logic devices chosen. A negative time was an instant fail signal: your design needed more work and you missed something. Got an A+. The requirements were just for the parts chosen. But this was a universal solution for timing constraint closure!
