Notorious Two to smash tapeout

The behaviour of any semiconductor device or product depends on two external parameters: power supply and temperature.

Many times these two are the main culprits behind correlation issues between simulation and silicon in the test setup. The issue lies either in the simulation or in the test setup. If it is in the test setup, it is not a major problem: the impact on schedule is small, as the test board can be re-spun in a short period. However, if the issue is in the simulation, it can be catastrophic, with a huge impact on schedule and cost.

In this article, I have compiled examples of what could cause correlation issues between the test setup and simulation.

Test Setup:

Power Supply:

The power supply voltage is provided either by a circuit on the test board or by external equipment. Generally, physical separation between where the voltage is generated and where it is applied is unavoidable, and this introduces parasitics in the path. These parasitics cause a voltage drop that can become unstable or unpredictable, depending on the current drawn by the chip or the temperature of the board. To manage this, there is generally a sense path (with no current flowing in it) that starts at the same point where the power supply is forced at the pin. This sense path reads the voltage at the pin, and the forward path is then adjusted so that the right voltage is applied to the pin at all times. A decoupling capacitor (de-cap) is connected at the pin to damp transients, which relaxes the response time required of the feedback path to correct the voltage level. If there is any correlation issue, one should check the power supply and ground voltages (with respect to system/equipment ground) at the pin of the chip. Sometimes a transient voltage is the culprit; in that case, an oscilloscope should be used to observe the voltage.
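To make the force/sense idea concrete, here is a minimal Python sketch. It assumes a purely resistive parasitic path and a simple proportional correction loop; the function names, the loop gain and all numeric values are illustrative assumptions, not taken from any real tester.

```python
def pin_voltage(v_source, i_load, r_path):
    """Voltage reaching the pin after the IR drop across the force path."""
    return v_source - i_load * r_path


def regulate_with_sense(v_target, i_load, r_path, steps=20, gain=0.5):
    """Nudge the source voltage until the sensed pin voltage matches the target.

    The sense line carries no current, so it is assumed to read the pin
    voltage without any drop of its own.
    """
    v_source = v_target
    for _ in range(steps):
        error = v_target - pin_voltage(v_source, i_load, r_path)
        v_source += gain * error  # proportional correction (assumed loop)
    return v_source


V_TARGET = 1.8  # volts wanted at the pin (assumed)
I_LOAD = 0.5    # amps drawn by the chip (assumed)
R_PATH = 0.2    # ohms of parasitic resistance in the force path (assumed)

print("Pin voltage without sensing:", pin_voltage(V_TARGET, I_LOAD, R_PATH))
v_src = regulate_with_sense(V_TARGET, I_LOAD, R_PATH)
print("Source voltage with sensing:", round(v_src, 4))
print("Pin voltage with sensing:", round(pin_voltage(v_src, I_LOAD, R_PATH), 4))
```

Without the sense path the pin sits below the target by the IR drop; with it, the source is raised until the pin itself reads the target value.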

Temperature:

A similar feedback loop exists in the temperature forcer. Once the part is turned on, it starts dissipating power, which raises the temperature of the die and hence of its external surface. This new temperature is sensed by the feedback path, and the forward path (probably air flow) is adjusted to bring the surface temperature back to the right value. The equilibrium die temperature is then determined by the thermal resistance of the chip. One should move the temperature force point and check whether the temperature changes. If it changes across the surface (though this is very unlikely), there is a temperature gradient inside the chip, and the temperature force point should be placed at the location of highest surface temperature.

Die Temperature = Surface (or ambient) temperature + Power Dissipation * Thermal resistance of the package (degrees C per Watt)
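As a quick worked example of this relation, the sketch below plugs in assumed numbers (85 degrees C surface temperature, 2 W dissipation, 10 degrees C per Watt package thermal resistance). The function name and the values are illustrative only.

```python
def die_temperature(t_surface_c, power_w, theta_c_per_w):
    """Die temperature = surface temperature + power * thermal resistance."""
    return t_surface_c + power_w * theta_c_per_w


# Assumed example values, not from any datasheet.
t_die = die_temperature(t_surface_c=85.0, power_w=2.0, theta_c_per_w=10.0)
print("Estimated die temperature (C):", t_die)
```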

If there is any correlation issue, one should verify that the feedback loop in the temperature forcer is working properly and that the ambient temperature has been set correctly. If there is no issue in the temperature forcer, one should re-check the thermal resistance of the package.

In any product, there is a component of current that increases with die temperature. This is generally static leakage current. If this component is a large percentage of the overall current, there can be positive feedback in die temperature, which may cause thermal runaway. More current means more power, more power means higher temperature, and higher temperature means more current again. This loop may continue until the product is reduced to ashes.
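The loop described above can be sketched as a fixed-point iteration. The model below is an assumption for illustration: leakage power is taken to double for every 10 degrees C rise in die temperature (a rough rule of thumb, not a characterised figure), and dynamic power is held constant. All names and numbers are hypothetical.

```python
def settle_die_temperature(t_ambient_c, theta_c_per_w, p_dynamic_w,
                           p_leak_ref_w, t_ref_c=25.0, doubling_c=10.0,
                           t_limit_c=250.0, max_iter=1000, tol=1e-6):
    """Iterate temperature -> leakage -> power -> temperature.

    Returns the settled die temperature, or None if the loop exceeds
    t_limit_c (treated as runaway) or fails to settle within max_iter.
    """
    t_die = t_ambient_c
    for _ in range(max_iter):
        # Assumed leakage model: doubles every `doubling_c` degrees.
        p_leak = p_leak_ref_w * 2.0 ** ((t_die - t_ref_c) / doubling_c)
        t_next = t_ambient_c + theta_c_per_w * (p_dynamic_w + p_leak)
        if t_next > t_limit_c:
            return None  # temperature keeps climbing: runaway
        if abs(t_next - t_die) < tol:
            return t_next  # loop has settled
        t_die = t_next
    return None


# Small leakage share and low thermal resistance: the loop settles.
stable = settle_die_temperature(25.0, 10.0, 1.0, 0.01)
# Larger leakage share and higher thermal resistance: the loop diverges.
runaway = settle_die_temperature(25.0, 40.0, 1.0, 0.1)
print("Stable case settles at (C):", stable)
print("Runaway case result:", runaway)
```

In this toy model, a lower thermal resistance or a smaller leakage share is what keeps the loop gain below one, so that the temperature settles instead of climbing.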

Generally, a heat sink is placed on top of the chip to reduce the possibility of thermal runaway.

Simulation:

The same issues apply inside the chip as well.

Power Supply:

The VDD and ground lines have finite resistance, and a finite current (DC + transient) flows through them. Unfortunately, the feedback technique used in the test setup can't be employed inside the chip. So it is certain that the voltage on the VDD and ground lines inside the chip will not be the same as the voltage applied at the pin.

One of the well-known adverse effects is power/ground bounce. The power/ground pins are connected to the pads through bond wires. Because a bond wire has finite inductance, the fast-changing current drawn by the output driver at the pin generates a high voltage (L*di/dt) on the chip side of the bond wire. This slows down the output driver and causes a correlation issue between simulation and silicon. Hence, the simulation should include a proper equivalent electrical circuit between pad and pin, and between pins.
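To get a feel for the magnitude of L*di/dt, here is a small Python sketch. The bond wire inductance, the current step, the edge time and the number of drivers switching together through one shared bond wire are all assumed values chosen for illustration.

```python
def bounce_voltage(inductance_h, delta_i_a, delta_t_s, n_drivers=1):
    """Approximate bounce as L * di/dt, with di/dt taken as a linear ramp.

    n_drivers models several output drivers switching at the same time
    through one shared bond wire (an assumption of this sketch).
    """
    return inductance_h * n_drivers * delta_i_a / delta_t_s


# Assumed values: 2 nH bond wire, 20 mA per driver, 1 ns edge, 8 drivers.
v_bounce = bounce_voltage(2e-9, 20e-3, 1e-9, n_drivers=8)
print("Estimated ground bounce (V):", round(v_bounce, 3))
```

Even with these modest assumed numbers the bounce is a sizeable fraction of a low-voltage supply, which is why the pad-to-pin equivalent circuit belongs in the simulation.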

A bigger problem could be in the input buffer. It shares the same power and ground lines as the output driver. The input signal is referenced to external ground, but internal ground is not the same as external ground because of the (L*di/dt) noise voltage caused by the output driver. There is a possibility that the input signal is wrongly interpreted, i.e., a logic high could be treated as a low and vice versa.

There is another common mistake. Suppose, for example, that one power or ground line is taken from a pad and more than one block (say Block A and Block B) is connected to that line. In this case, one could forget to include the current drawn by the other block, say Block B, while running the simulation for Block A. The simulation would then not represent the correct voltage reaching Block A, and this might cause an issue in silicon.
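The effect of leaving out Block B can be illustrated with a simple DC sketch. It assumes a shared resistive segment from the pad followed by a private branch to Block A; the topology, names and values are hypothetical.

```python
def block_a_supply(v_pad, i_a, i_b, r_shared, r_branch_a):
    """DC supply voltage seen by Block A.

    The shared segment carries the current of both blocks; the private
    branch carries only Block A's current (assumed topology).
    """
    return v_pad - r_shared * (i_a + i_b) - r_branch_a * i_a


V_PAD = 1.2       # volts at the pad (assumed)
I_A = 0.010       # amps drawn by Block A (assumed)
I_B = 0.090       # amps drawn by Block B (assumed)
R_SHARED = 1.0    # ohms in the shared segment (assumed)
R_BRANCH_A = 0.5  # ohms in Block A's private branch (assumed)

v_a_alone = block_a_supply(V_PAD, I_A, 0.0, R_SHARED, R_BRANCH_A)
v_a_real = block_a_supply(V_PAD, I_A, I_B, R_SHARED, R_BRANCH_A)
print("Block A supply, Block B ignored (V):", round(v_a_alone, 4))
print("Block A supply, Block B included (V):", round(v_a_real, 4))
```

The gap between the two printed values is the drop that a Block A simulation misses when Block B's current is not modelled.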

If there is a power-on-reset (POR) circuit internal to the chip, one has to be extra cautious about the VDD/GND voltage while running simulations for the POR block. POR is a very critical block in the chip. If the POR circuit does not work as expected, the chip is almost dead and worthless for extracting any data.

One should be aware that in the VDD/GND path from pin to pad there is a bond wire, which inserts inductance into the circuit. So the voltage on the VDD/GND line on the chip side depends on the rate of change of current through that line. The point I am trying to make is that a bypass capacitor on the VDD pin at board level is not going to kill switching noise on the chip side. One more point: at power-up there is a large surge current through VDD/GND to charge the full chip capacitance between the VDD and GND lines. Hence the voltage profile on the VDD and GND lines at power-up can be very unpredictable, and not much can be done inside the chip to tame it. This could trigger multiple reset signals from the POR block. I think the POR block should be designed so that, among these multiple reset signals, the last one has sufficient width to reset the whole chip.
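One way to check this last-pulse condition on simulation output is sketched below. It assumes the reset signal has been sampled into a list of times and a matching list of active/inactive flags; the helper names, the sample data and the minimum width are hypothetical.

```python
def reset_pulses(times, reset_active):
    """Return (start, end) pairs for each interval where reset is active."""
    pulses = []
    start = None
    for t, active in zip(times, reset_active):
        if active and start is None:
            start = t                   # reset asserted
        elif not active and start is not None:
            pulses.append((start, t))   # reset released
            start = None
    if start is not None:
        # Reset still active at the end of the record.
        pulses.append((start, times[-1]))
    return pulses


def last_reset_is_wide_enough(times, reset_active, min_width):
    """True if the final reset pulse lasts at least min_width."""
    pulses = reset_pulses(times, reset_active)
    if not pulses:
        return False
    start, end = pulses[-1]
    return (end - start) >= min_width


# Assumed sample data: time in microseconds, 1 = reset active.
times = list(range(12))
reset = [1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0]
print("Pulses found:", reset_pulses(times, reset))
print("Last pulse OK for 4 us minimum:",
      last_reset_is_wide_enough(times, reset, 4))
```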

Die Temperature:

If the product is low power, die temperature may not be a big issue, as there may not be much difference between die and ambient temperature.

But if it is a high-power part, it might become an issue, particularly if the part is an industrial part, where ambient temperature could be 105C. In this case, one has to ensure that the die temperature does not go above 125C, as most library cells are characterised for a maximum temperature of 125C. To estimate die temperature accurately, both the maximum power dissipation inside the chip and the maximum thermal resistance of the package must be known accurately. However, I believe it is very difficult to get accurate values of these variables through simulation. So enough margin should be kept in the design; otherwise it could lead to failure of the chip.
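The 105C ambient and 125C die limit leave only 20C of headroom, and the power this allows follows from the die temperature relation given earlier. The sketch below works this out for an assumed package thermal resistance of 25 degrees C per Watt, and again with that resistance assumed to be 20% worse than nominal; both figures are illustrative, not from a datasheet.

```python
def max_power_w(t_die_max_c, t_ambient_c, theta_c_per_w):
    """Largest power that keeps the die at or below t_die_max_c."""
    return (t_die_max_c - t_ambient_c) / theta_c_per_w


THETA_NOMINAL = 25.0  # degrees C per Watt (assumed)
THETA_MARGIN = 1.2    # assume thermal resistance may be 20% worse

nominal = max_power_w(125.0, 105.0, THETA_NOMINAL)
derated = max_power_w(125.0, 105.0, THETA_NOMINAL * THETA_MARGIN)
print("Power budget, nominal thermal resistance (W):", round(nominal, 3))
print("Power budget, with 20% margin (W):", round(derated, 3))
```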

The other issue is the temperature gradient in a high-power part. Power dissipation may not be uniform across the die, and hence temperature may not be the same across the die. As delay in a digital circuit is a function of temperature, and setup/hold margin is a function of the difference in delay between two paths, this could become a very big issue if a timing path extends from a low-temperature area to a high-temperature area. I am not sure, but I believe that temperature sensors are placed across the die in high-power parts to mitigate this issue. It may be unreliable to infer die temperature just by keeping the ambient temperature accurate, as the thermal resistance of the chip may not be accurate enough and may even vary from part to part.

One more issue with high-power parts is in the High Temperature Operating Life (HTOL) qualification test. Generally, the oven temperature is elevated so that a long life of years in the field is emulated in thousands of hours in the test setup. However, unless the part is designed for a temperature higher than 125C (which is the temperature in real usage in the field), there is not much scope to elevate the oven temperature. I am not sure whether a design team would agree to design the part for the HTOL temperature, as it would over-design the part for the normal application temperature. And if the part is not designed for the HTOL temperature, the time taken to perform HTOL is too long to be practical and hence out of the question. I am not sure what is done in the industry to tackle this challenge. I would appreciate it if any reader could throw some light on this.
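To see why the lack of temperature headroom hurts, here is a rough sketch that assumes the Arrhenius model commonly used for temperature acceleration. The activation energy of 0.7 eV and the 55C comparison case are assumptions for illustration only; real qualification plans use mechanism-specific values.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(t_use_c, t_stress_c, ea_ev=0.7):
    """Arrhenius temperature acceleration factor (assumed model and Ea)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K)
                    * (1.0 / t_use_k - 1.0 / t_stress_k))


# A part used at 55C and stressed at 125C gets a large acceleration.
af_cool_use = acceleration_factor(55.0, 125.0)
# A part that already runs at 125C in the field gets none at 125C.
af_hot_use = acceleration_factor(125.0, 125.0)

print("Acceleration, 55C use vs 125C stress:", round(af_cool_use, 1))
print("Acceleration, 125C use vs 125C stress:", af_hot_use)
print("Field hours emulated by 1000 stress hours (hot use):",
      1000 * af_hot_use)
```

With no temperature gap between use and stress, the factor is 1, so every field hour costs one oven hour under this model.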

