The Timing analysis should answer the following questions:
In a top-down approach, Timing Analysis is used for verification purposes. Timing Analysis must say, given the direct environment of the chip (i.e. timing constraints on the interface), if the chip will be able to work properly.
In a bottom-up approach, Timing Analysis is used for characterization purposes.
As timing performance of a chip under design is one of the main concerns facing designers, it must be controlled and refined at each stage of the design flow.
In a classical top-down methodology, timing constraints are set at system-level, and synthesis and place-and-route (PR) tools are timing-driven. A first Timing Analysis run is done after synthesis, and then after floorplanning and placement. In those cases delays are only estimated, not taking into account the parasitics induced by global routing.
The final sign-off Timing Analysis and characterization is done after global routing, on a netlist back-annotated with extracted parasitics (a post-layout netlist).
Since synthesis and PR tools are timing-driven, timing characterizations of the building blocks are also now needed. Those building blocks are sometimes large third-party IPs, with fixed timing characterizations. In such cases, timing constraints are also set by those blocks, and the methodology acquires bottom-up aspects.
Signal Transition | A transition is a change in the state of a signal. A rising transition occurs when the signal's voltage swings from a low level to a high level (from 0V to VMAX). A falling transition occurs when the signal's voltage goes from a high level to a low level (from VMAX to 0V). In Avertec methodology, a signal transition is also referred to as a timing event. |
Threshold | The delay threshold is the voltage ratio where a signal is considered as having changed state. Typically, this ratio is 50%. The threshold is also the measurement point for delay calculation. |
Delay | A delay is defined between two signal transitions linked by a causality relation (the first transition implying the second). The value of a delay is the time elapsed between the instant the first transition crosses the threshold and the instant the second transition crosses the threshold. As a result of this definition, it is possible to have negative delays (especially with a long input slope). |
Delays can be measured either on the direct output of the gate, or on any node of the RC interconnect network. Signal propagation through the RC interconnect network causes additional delay.
The transition of a signal is modeled by its slope:
Slope | A slope is defined between two thresholds: a high threshold (VTH HIGH) and a low threshold (VTH LOW). The value of the slope is the elapsed time between the instant of the signal's transition crossing VTH LOW (VTH HIGH) and the instant of the signal's transition crossing VTH HIGH (VTH LOW). |
Typically, VTH LOW varies from 5% to 40% of VMAX, and VTH HIGH varies from 60% to 95% of VMAX. A single value defined between two thresholds is a very reductive way to model slopes, as it gives no information about the shape of the slope. The most basic approach is to assume that the slope is linear. In Avertec methodology, the shape of the slope is assumed to be a hyperbolic tangent.
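The threshold-based definitions of delay and slope given above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the waveform is taken to be a symmetric hyperbolic tangent centered on the 50% point, v(t) = VMAX/2 * (1 + tanh((t - t_mid) / tau)), which is not necessarily the exact formulation used in Avertec tools. The names `t_mid`, `tau`, `crossing_time`, `slope` and `delay` are hypothetical, and times are in arbitrary units.

```python
import math


def crossing_time(t_mid, tau, ratio):
    """Time at which a rising tanh-shaped transition crosses ratio * VMAX.

    Assumed waveform (not taken from the source document):
        v(t) = VMAX / 2 * (1 + tanh((t - t_mid) / tau))
    Solving v(t) = ratio * VMAX gives t = t_mid + tau * atanh(2 * ratio - 1).
    """
    return t_mid + tau * math.atanh(2.0 * ratio - 1.0)


def slope(tau, vth_low=0.2, vth_high=0.8):
    """Time elapsed between the VTH LOW and VTH HIGH crossings."""
    return crossing_time(0.0, tau, vth_high) - crossing_time(0.0, tau, vth_low)


def delay(t_mid_in, tau_in, t_mid_out, tau_out, threshold=0.5):
    """Delay between two causally related transitions, measured at the threshold."""
    t_in = crossing_time(t_mid_in, tau_in, threshold)
    t_out = crossing_time(t_mid_out, tau_out, threshold)
    return t_out - t_in


if __name__ == "__main__":
    print("slope (20%-80%):", slope(tau=1.0))
    # A slow input whose 50% crossing comes after the output's 50% crossing
    # yields a negative delay, as noted in the definition of Delay.
    print("delay:", delay(t_mid_in=100.0, tau_in=80.0, t_mid_out=90.0, tau_out=10.0))
```

With a 50% threshold the delay reduces to the difference between the two midpoints, which is why a very slow input can produce a negative value.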
The delays and slopes of a given gate depend on three different kinds of factors:
Internal Factors |
|
Global External Factors |
|
Local External Factors |
|
When dealing with simple gates, delays are most often calculated by electrical simulation (SPICE simulation). The operating mode for calculating delays characterizing a gate is as follows:
As an example, let's consider the following gate, and its associated truth table:
The four identified causality relations and associated delays are reported below. The state of the other input that conditions the causality relation is given between brackets.
delay0: I0 rising -> O falling (I1 = 0)
delay1: I0 falling -> O rising (I1 = 0)
delay2: I1 rising -> O falling (I0 = 0)
delay3: I1 falling -> O rising (I0 = 0)
Four successive electrical simulations are then necessary to completely characterize the gate.
The same kind of delay calculations can be done on more complex designs. For example, let us consider the following design.
We can deduce from the connectivity of the gates, and from their truth tables, causality relations between the transitions on inputs A, B, C, D and the transitions on the output I. All the possible causality relations, and the delay associated with each, are given below. The pattern conditioning each relation is given between brackets.
delay0: A rising -> I rising (B = 0, C = 0, D = 1)
delay1: A falling -> I falling (B = 0, C = 0, D = 1)
delay2: B rising -> I rising (A = 0, C = 0, D = 1)
delay3: B falling -> I falling (A = 0, C = 0, D = 1)
delay4: C rising -> I rising (A = 0, B = 0, D = 1)
delay5: C falling -> I falling (A = 0, B = 0, D = 1)
delay6: D rising -> I falling (A = 0, B = 0, C = 0)
delay7: D falling -> I rising (A = 0, B = 0, C = 0)
The calculation of delay0, between A rising and I rising, is illustrated below. A rising implies E falling if B = 0, which sets the value of input B. E falling implies G rising, which in turn implies I rising if H = 0. H = 0 if F = 1 and D = 1, which sets the value of input D. F = 1 if C = 0, which sets the value of input C.
The pattern conditioning A rising -> I rising is then B = 0, C = 0 and D = 1.
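The search for causality relations and their conditioning patterns can be reproduced by brute force on a small design. The sketch below assumes gate functions inferred from the walk-through above, since the schematic itself is not reproduced here: E = NOR(A, B), G = NOT(E), F = NOT(C), H = NAND(F, D) and I = OR(G, H). These gate types, and the function names, are assumptions made for illustration.

```python
from itertools import product

INPUTS = ("A", "B", "C", "D")


def circuit(A, B, C, D):
    """Output I of the example design, with ASSUMED gate functions."""
    e = int(not (A or B))    # E = NOR(A, B)
    g = int(not e)           # G = NOT(E)
    f = int(not C)           # F = NOT(C)
    h = int(not (f and D))   # H = NAND(F, D)
    return int(bool(g or h))  # I = OR(G, H)


def causality_relations():
    """List (input, input transition, output transition, pattern) tuples."""
    relations = []
    for name in INPUTS:
        others = [n for n in INPUTS if n != name]
        for values in product((0, 1), repeat=len(others)):
            pattern = dict(zip(others, values))
            low = circuit(**pattern, **{name: 0})
            high = circuit(**pattern, **{name: 1})
            if low == high:
                continue  # this pattern masks the input: no causality
            same = high > low  # True when the output follows the input
            relations.append(
                (name, "rising", "rising" if same else "falling", pattern))
            relations.append(
                (name, "falling", "falling" if same else "rising", pattern))
    return relations


if __name__ == "__main__":
    for name, in_dir, out_dir, pattern in causality_relations():
        print(f"{name} {in_dir} -> I {out_dir} {pattern}")
```

Under these assumptions the enumeration finds the same eight relations as the list above. It evaluates every input pattern, so it is only practical for very small designs.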
Although it is quite simple, the above circuit has required eight simulations of the full design to completely characterize it.
Actually, for a design of n inputs and m outputs, there may exist up to 2n x 2m causality relations between input and output transitions. This can lead to a maximum of 2n x 2m electrical simulations to calculate all the delays associated with those relations, i.e. to characterize the design.
Furthermore, a causality relation is not easy to identify, and the setting of the pattern conditioning it is a very complex task.
Apart from very regular designs, such as memories, where causality relations are quite simple to establish, and where simulation can be aggressively optimized, these severe drawbacks make electrical simulation impossible to apply to designs exceeding a thousand transistors.
Static Timing Analysis arose from two observations.
The first observation was that, causality being a transitive relation, a global causality relation (from an input pin to an output pin) could be decomposed into elementary (gate) causality relations. In the example above, the causality relation A rising -> I rising can be decomposed into A rising -> E falling -> G rising -> I rising. A typical timing representation of such a causality relation is given by a timing diagram, as illustrated below.
The second observation was that, as a first approximation, delays associated with elementary causality relations could be added to get the delay of the global causality relation. From this statement we can see that it is possible to calculate (by electrical simulation) the delays associated with a gate only once, and thus achieve significant gains in calculation complexity: the delay of a global causality relation can be calculated by just adding elementary delays.
This statement assumes that delays are independent of their local environment. We have already seen that this is not really the case, and so this leads to some inaccuracy in the delay calculation. We will now see how to refine the delay modeling to reach an accuracy close to that obtained by electrical simulation.
The previous observations allow us to model designs using weighted graphs, where a node is a signal transition, and an arc is a causality relation. The arcs are weighted by the delay of the causality relation. The graph of a simple gate (a NOR) has the following appearance:
The graph of a gate-level design such as the one below is made by connecting the gates' graphs.
Thus, the graph of the design described above has the following appearance:
This graph is known as a causality graph. A global causality relation is represented here by what is called a path in graph theory terminology.
A graph representation allows us to apply well-known efficient algorithms, such as path searching. In a quite straightforward manner (complexity O(n)), by just following the arcs, we can identify all the timing paths of the design (the eight global causality relations described above).
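As an illustration, the causality graph of the example design can be built and searched with a few lines of Python. This is a minimal sketch, not the representation used by any particular tool: nodes are (signal, transition) pairs, arcs carry a delay, and each gate contributes arcs from its input transitions to its output transitions. The gate types are the ones inferred from the example (assumptions), and all delay values, in picoseconds, are hypothetical.

```python
# (output, inputs, inverting?, delay to output rising, delay to output falling)
# Gate types are inferred from the example; delays in ps are HYPOTHETICAL.
GATES = [
    ("E", ("A", "B"), True, 40, 30),   # assumed NOR
    ("G", ("E",), True, 25, 20),       # assumed inverter
    ("F", ("C",), True, 25, 20),       # assumed inverter
    ("H", ("F", "D"), True, 35, 45),   # assumed NAND
    ("I", ("G", "H"), False, 50, 55),  # assumed OR
]


def build_graph(gates):
    """Map each node (signal, transition) to its outgoing arcs (node, delay)."""
    graph = {}
    for out, ins, inverting, d_rise, d_fall in gates:
        for sig in ins:
            for in_dir in ("rising", "falling"):
                if inverting:
                    out_dir = "falling" if in_dir == "rising" else "rising"
                else:
                    out_dir = in_dir
                arc_delay = d_rise if out_dir == "rising" else d_fall
                graph.setdefault((sig, in_dir), []).append(
                    ((out, out_dir), arc_delay))
    return graph


def timing_paths(graph, start):
    """Follow the arcs depth-first; return (nodes, summed delay) per path."""
    arcs = graph.get(start, [])
    if not arcs:  # no outgoing arc: the path ends on this node
        return [([start], 0)]
    paths = []
    for nxt, arc_delay in arcs:
        for nodes, total in timing_paths(graph, nxt):
            paths.append(([start] + nodes, total + arc_delay))
    return paths


if __name__ == "__main__":
    graph = build_graph(GATES)
    for pin in "ABCD":
        for direction in ("rising", "falling"):
            for nodes, total in timing_paths(graph, (pin, direction)):
                chain = " -> ".join(f"{s} {d}" for s, d in nodes)
                print(f"{chain}: {total} ps")
```

The path delay is obtained by summing the arc delays, as described in the second observation above. The sketch uses a single fixed delay per arc; it ignores the dependence on input slope and output load.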
As stated in chapter 1.2.3, gate delays depend on internal factors, global external factors and local external factors. Until 90nm, internal factors don't change for a given chip, and global external factors don't change for a given timing analysis run. The only variable factors are the local external factors, i.e. the input slope and the output load of the gate.
When calculating path delays, we sum gate delays. As a first approximation, a gate delay can be modeled by a simple value. Experience has shown that this is very unrealistic, since the local external factors can vary a lot from one instance of a gate to another. This has led to a more wide-ranging approach to gate characterization: gate delays are given for a set of input slopes and a set of output loads.
The most common way to describe this set of delays is a lookup table. A common lookup table is a 2D matrix, whose axes are the input slope and the output capacitance. The following figure illustrates a typical lookup table.
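The sketch below shows one common way of reading a delay from such a table for an (input slope, output load) pair that falls between the characterized points: bilinear interpolation. The interpolation scheme is an assumption; the text does not state which one a given tool uses. The axis values, delay values and function names are hypothetical.

```python
from bisect import bisect_right

# HYPOTHETICAL characterization data for one timing arc of one gate.
SLOPES = [20.0, 100.0, 400.0]   # input slope axis, in ps
LOADS = [1.0, 5.0, 20.0]        # output capacitance axis, in fF
DELAYS = [                      # DELAYS[i][j] for SLOPES[i], LOADS[j], in ps
    [15.0, 35.0, 110.0],
    [25.0, 47.0, 125.0],
    [50.0, 75.0, 160.0],
]


def _bracket(axis, x):
    """Index i of the axis segment [axis[i], axis[i + 1]] used for x.

    Values outside the table are clamped to the first or last segment,
    which means the result is then linearly extrapolated.
    """
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup_delay(slope, load):
    """Bilinear interpolation of the delay table."""
    i = _bracket(SLOPES, slope)
    j = _bracket(LOADS, load)
    u = (slope - SLOPES[i]) / (SLOPES[i + 1] - SLOPES[i])
    v = (load - LOADS[j]) / (LOADS[j + 1] - LOADS[j])
    return ((1 - u) * (1 - v) * DELAYS[i][j]
            + u * (1 - v) * DELAYS[i + 1][j]
            + (1 - u) * v * DELAYS[i][j + 1]
            + u * v * DELAYS[i + 1][j + 1])


if __name__ == "__main__":
    print(lookup_delay(60.0, 3.0))   # between the first four table points
    print(lookup_delay(100.0, 5.0))  # exactly on a table point
```

A similar table, indexed the same way, is typically used for the output slope, so that the slope can be propagated to the next gate of the path.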
Lookup table characterizations are most often provided with the gate-library itself. Since they are given for a limited range of PVT, it is often necessary to re-characterize them.
In 90nm and below, other factors may also change: local power supply due to IR-drop, instance-dependent parameters (stress effect, proximity effect). This limits the accuracy of a lookup-table-based characterization.
In terms of timing, designs are made of combinational elements, and of sequential (clocked) elements.
What we call combinational elements are elements (logic gates) that just propagate signals, independently of any clock.
Sequential elements are clocked elements. In most cases, they have a memorizing behavior controlled by clock signals (latches, flip-flops). In order to operate correctly, these elements must respect timing constraints (typically the setting of the data to memorize relative to the clock signals).
A kind of clocked element is the dynamic logic stage (precharged logic). It must also respect timing constraints.
The main purpose of the timing analysis process is:
In the following sections, we will first study the timing behavior of sequential elements such as latches, flip-flops and dynamic logic gates.
We will then discuss the constraints sequential elements set on the interface of the design (setup and hold times, access times, frequency).
Then we will study how to integrate those elements in such a way that the design can operate correctly.
Below is the schematic of a simple latch:
The following timing diagram describes the timing behavior of the latch.
When CK is high, the latch is said to be in transparent mode, i.e. the value on the input DIN is observable on the output DOUT, after the delay Ttransparent, also referred to as transparency.
When CK goes from high to low (the latch closes), the value of DIN is memorized in the latch. DIN must be stable at the time CK falls. Actually, to ensure the stabilization of the memory loop, DIN must not only be stable at the time CK falls, but also for a certain amount of time before CK falls, and for a certain amount of time after CK falls. These times are referred to as setup time and hold time respectively.
When CK is low, the latch is said to be in memorizing mode. The value observable on DOUT is the value memorized when the latch is closed.
When CK goes from low to high, the latch returns to transparent mode, and a new value on the input DIN becomes observable on the output DOUT after the delay taccess, also referred to as access time.
A latch is characterized by four intrinsic values: the transparency, setup, hold and access times.
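The setup and hold requirements around the closing edge of CK can be written as two slack computations. The sketch below is only an illustration: the function and parameter names are hypothetical, and it assumes the stability window of DIN is known as a pair of instants expressed in the same time unit as the setup and hold times.

```python
def latch_slacks(t_close, stable_from, stable_to, t_setup, t_hold):
    """Setup and hold slacks of a latch around the falling edge of CK.

    t_close      instant at which CK falls (the latch closes)
    stable_from  instant from which DIN is stable
    stable_to    instant until which DIN remains stable
    A negative slack means the corresponding constraint is violated.
    """
    setup_slack = (t_close - t_setup) - stable_from
    hold_slack = stable_to - (t_close + t_hold)
    return setup_slack, hold_slack


def latch_constraints_met(t_close, stable_from, stable_to, t_setup, t_hold):
    """True when DIN is stable during the whole setup/hold window."""
    setup_slack, hold_slack = latch_slacks(
        t_close, stable_from, stable_to, t_setup, t_hold)
    return setup_slack >= 0 and hold_slack >= 0


if __name__ == "__main__":
    # Hypothetical values, in ps.
    print(latch_slacks(1000, 900, 1100, t_setup=60, t_hold=30))
    print(latch_constraints_met(1000, 960, 1100, t_setup=60, t_hold=30))
```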
A typical flip-flop is made of two latches in series, where the clocks are inverted.
The following timing diagram describes the timing behavior of the flip-flop.
When CK is high (transp1):
When CK goes from high to low (transp1 -> memo1):
When CK is low (memo1):
When CK goes from low to high (memo1 -> transp2):
Below is a typical implementation of Dynamic CMOS logic (precharge-evaluate logic).
During the precharge phase, the output node of the dynamic CMOS stage is precharged to a high logic level. When the clock signal rises at the beginning of the evaluation phase, there are two possibilities: the output node of the dynamic CMOS stage is either discharged to a low level through the NMOS circuitry (falling transition), or it remains high. Regardless of the input voltages applied to the dynamic CMOS stage, it is not possible for the output node to make a rising transition during the evaluation phase. Consequently, the input configuration must have been set before the evaluation phase and must remain stable during it, otherwise an unwanted conducting path may appear through the NMOS circuitry, leading to an erroneous low-level state of the output node.
Let's consider the following design made up of two flip-flops:
The following timing diagram illustrates the correct operating mode of the design: the value v2 stored in FF0 becomes accessible on B on the first falling edge of CK, then v2 propagates through the combinational block, finally v2 is stored by FF1 on the second falling edge of CK.
The design operates correctly because period - tsetup(FF1) > taccess(FF0) + tcomb. Otherwise, as illustrated in the timing diagram below, if period - tsetup(FF1) < taccess(FF0) + tcomb, the second falling edge of CK occurs before the value v2 stored in FF0 has propagated through the combinational block. The value stored by FF1 is then v1, the value stored by FF0 in the preceding phase.
From these observations, we can deduce that there exists a minimum period (and a maximum frequency) allowing the design to operate correctly.
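The bound on the period follows directly from the inequality above: the period must exceed taccess(FF0) + tcomb + tsetup(FF1). A small worked example, with hypothetical values in picoseconds and hypothetical function names:

```python
def min_period(t_access_ff0, t_comb, t_setup_ff1):
    """Lower bound on the clock period for the FF0 -> FF1 transfer.

    Derived from: period - tsetup(FF1) > taccess(FF0) + tcomb.
    The period must be strictly greater than the returned value.
    """
    return t_access_ff0 + t_comb + t_setup_ff1


def max_frequency_mhz(period_ps):
    """Frequency in MHz corresponding to a period given in ps."""
    return 1e6 / period_ps


if __name__ == "__main__":
    bound = min_period(t_access_ff0=120, t_comb=700, t_setup_ff1=80)
    print("period must exceed", bound, "ps")
    print("frequency must stay below", max_frequency_mhz(bound), "MHz")
```

In a real design the bound is taken over all pairs of communicating memory elements, the longest path setting the minimum period.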
Synchronous designs are based upon the communication between memory elements, such as latches or flip-flops, this communication being controlled by the clock signal. Therefore, a single clock signal is connected to a large number of memory elements in the design, and it is very difficult to ensure that the clock signal will propagate homogeneously (with the same delay) towards every memory element, even by inserting clock-tree buffers. This phenomenon is known as clock skew. The following diagram presents asymmetric clock buffering, leading to skew between the two flip-flops.
The communication between the two flip-flops, taking into account the skew, is illustrated in the following timing diagram.
If taccess + tcomb > skew, the design will operate correctly.
Otherwise, if taccess + tcomb < skew, the design will not work. Note that this timing error is independent of the period.
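The skew condition can be checked numerically as follows. This is a sketch under the assumption that the skew is the difference between the clock propagation delays towards the receiving and the emitting flip-flop; the names and the values (in ps) are hypothetical.

```python
def clock_skew(t_clk_to_emitter, t_clk_to_receiver):
    """Skew seen by the transfer: how much later the receiver is clocked."""
    return t_clk_to_receiver - t_clk_to_emitter


def survives_skew(t_access, t_comb, skew):
    """Condition from the text: taccess + tcomb > skew.

    The clock period does not appear: slowing the clock down
    cannot repair a failure of this check.
    """
    return t_access + t_comb > skew


if __name__ == "__main__":
    skew = clock_skew(t_clk_to_emitter=50, t_clk_to_receiver=400)
    print("skew:", skew, "ps")
    print("long data path :", survives_skew(120, 700, skew))
    print("short data path:", survives_skew(120, 100, skew))
```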
When a flip-flop input is connected to an input pin, either directly or through a combinational path, whether the setup/hold constraints are met depends on the stability window of the input signal itself, and on the propagation delays of the input and clock signals towards the flip-flop.
The input signal's stability window may occur too soon or too late, relative to the clock signal, for the setup/hold constraints of the flip-flop to be met.
Since any design is ultimately integrated into a higher-level design, it is necessary to provide information on the constraints that apply to its input pins, i.e. the timing windows in which input signals must be stable for the constraints of the internal sequential elements to be met. The higher-level design can then be built in such a way that the stability windows are correctly set on the inputs of the design it integrates.
The constraints are obtained by calculating global setup and hold times.
Let's consider the following design, where I and CK are input pins.
The diagram below illustrates the calculation of global setup/hold times.
global_setup = setup + tcomb_I - tcomb_CK
global_hold = hold + tcomb_CK - tcomb_I
Another useful piece of information is the access time, which tells the designer when the data on an output pin is available, relative to a clock edge. In the following design, O is an output pin.
The global access time is illustrated in the timing diagram below.
global_access = tcomb_CK + access + tcomb_O
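The three interface formulas (global setup, global hold and global access) translate directly into code. The following worked example uses hypothetical values in picoseconds; the function names are illustrative only.

```python
def global_setup(setup, tcomb_i, tcomb_ck):
    """Setup time seen on input pin I, relative to input pin CK."""
    return setup + tcomb_i - tcomb_ck


def global_hold(hold, tcomb_i, tcomb_ck):
    """Hold time seen on input pin I, relative to input pin CK."""
    return hold + tcomb_ck - tcomb_i


def global_access(access, tcomb_ck, tcomb_o):
    """Time after the CK edge at which data is available on output pin O."""
    return tcomb_ck + access + tcomb_o


if __name__ == "__main__":
    # HYPOTHETICAL characterization values, in ps.
    print("global setup :", global_setup(setup=80, tcomb_i=200, tcomb_ck=150))
    print("global hold  :", global_hold(hold=30, tcomb_i=200, tcomb_ck=150))
    print("global access:", global_access(access=120, tcomb_ck=150, tcomb_o=90))
```

With these values the global hold time is negative: the data path is longer than the clock path, so the input may change shortly before the clock edge at the pins and still be held long enough at the flip-flop.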