















## For example



## Things that affect timing

- Factors
  - Device characteristics (Vth, Ids, etc.)
  - Interconnect characteristics (RCL)
  - Coupling
  - IR drop, power noise
  - Temperature
  - Clock skew
  - Modeling errors
- Variability and uncertainties
  - Process variations (including measurement uncertainties)
  - Environmental variations (temperature map, power map, etc.)
  - Pattern variations (e.g., functional vs. structural)

Slide # 12 Wang@UCSB (for private use)


### Commonly-asked questions

- What causes a speed path to be missed by timing analysis tools?
  - What do I miss after pre-silicon analysis? What is binning based on?
- How should variations be modeled in order to support timing analysis?
  - How to build an effective statistical timing model?
- Where do the variation models come from?
  - What models can a fab provide?
- What are the important variations to consider in analyzing timing?
  - Which is the dominating factor: Leff or Vth variation?

## Understanding chip variability

- Result of interactions among
  - Process variability and uncertainties
  - Design variability
  - Modeling uncertainties
  - Variability in assumptions employed in tools for fast approximation
  - Variability and uncertainties in test and measurements
- To understand chip variability, we need to decompose the sources of variation and minimize their interactions
  - To analyze and control variations separately

## General problem formulations in statistical domain

Throughout this tutorial, we will learn how to statistically analyze variability. We will often face one of the following four categories of analysis:

- Statistical characterization
- Statistical modeling
- Worst-case corner analysis
- Statistical analysis














## Topics to cover

- Brief discussion (only if we have time, slides not included)
  - Timed ATPG
  - Timing diagnosis



We begin with a discussion of modeling process variations






### For example

- Measure the thickness of the transistor gate dielectric at the 100nm technology generation
  - Suppose the gate dielectric is 2nm thick
  - Process tolerance is $\pm 5\%$ = 0.1nm
- P/T = 10% = $(6\sigma_{measurement})$ / 0.1nm
  - $\Rightarrow \sigma_{measurement}$ = 0.0017nm
  - An atomic step on silicon is about 0.15nm!
- Direct measurement of some process parameters can be difficult
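The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the function name `measurement_sigma` is made up for illustration, and it assumes the precision-to-tolerance definition used on the slide, P/T = 6σ / tolerance.

```python
def measurement_sigma(tolerance_nm: float, pt_ratio: float) -> float:
    """Largest measurement sigma allowed by a precision-to-tolerance ratio.

    Assumes P/T = 6 * sigma_measurement / tolerance, as in the example above.
    """
    return pt_ratio * tolerance_nm / 6.0


thickness_nm = 2.0
tolerance_nm = 0.05 * thickness_nm          # 5% of 2 nm = 0.1 nm
sigma_nm = measurement_sigma(tolerance_nm, pt_ratio=0.10)
atomic_step_nm = 0.15                        # value quoted on the slide

print(f"required sigma_measurement = {sigma_nm:.4f} nm")
print(f"an atomic step is about {atomic_step_nm / sigma_nm:.0f}x larger")
```

The required precision is roughly two orders of magnitude below one atomic step, which is why direct measurement is impractical here.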


## Model-based measurement

- Each measurement method is based on a model that relates observed signals to values of variables being measured
- Model-based measurement alleviates the high precision requirement for measuring some process parameters directly
- Depending on the model and the algorithm used to extract values from the observed signals, various amounts of error can be introduced










## Channel length

- $L = L_m - \Delta L$
  - $L_m$: drawn channel length
  - $\Delta L$: difference between drawn and actual channel length
  - The objective is to measure $\Delta L$
- Measuring $\Delta L$ is more complicated
  - Use the channel resistance method ($R_m$), by
    - Calculating $A = 1 / (\mu C_{ox} W (V_{gs} - V_{th}))$ at various $V_{gs}$ values
    - Intersecting the different lines in the $R_m$ vs. $L_m$ plot
  - Use the intersection point to obtain $\Delta L$
- See Handbook of Silicon Semiconductor Metrology
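The intersection idea can be sketched on synthetic data. The sketch assumes the simplest form of the method, $R_m = A(V_{gs})\,(L_m - \Delta L)$ with no series resistance, so every line crosses the $L_m$ axis at $\Delta L$. All numeric values and the helper names `fit_line` and `estimate_delta_l` are hypothetical.

```python
from itertools import combinations


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_delta_l(drawn_lengths, resistance_by_vgs):
    """Average the L_m coordinate where the fitted R_m-vs-L_m lines intersect."""
    lines = [fit_line(drawn_lengths, r) for r in resistance_by_vgs.values()]
    crossings = [(b2 - b1) / (a1 - a2)
                 for (a1, b1), (a2, b2) in combinations(lines, 2)]
    return sum(crossings) / len(crossings)


# Synthetic "measurements" (hypothetical values, arbitrary units).
TRUE_DELTA_L = 0.05                      # um
VTH = 0.5                                # V
MU_COX_W = 1.0e-4                        # stands in for mu * C_ox * W
drawn = [0.25, 0.35, 0.50, 0.70, 1.00]   # L_m in um

resistance = {}
for vgs in (1.0, 1.5, 2.0):
    a = 1.0 / (MU_COX_W * (vgs - VTH))   # A at this V_gs
    resistance[vgs] = [a * (lm - TRUE_DELTA_L) for lm in drawn]

delta_l = estimate_delta_l(drawn, resistance)
print(f"estimated delta_L = {delta_l:.4f} um")
```

With real data the lines only approximately share one crossing point, so averaging (or a joint fit) is one possible way to combine them.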

## To summarize ...

- MOSFET device model
  - $I_{ds} = 0$ for $V_{gs} - V_{th} < 0$
  - $I_{ds} = \frac{\mu C_{ox} W}{L - \Delta L} (V_{gs} - V_{th} - 0.5 V_{ds}) V_{ds}$ (linear region)
  - $I_{ds} = \frac{\mu C_{ox} W}{2 (L - \Delta L)} (V_{gs} - V_{th})^2$ (saturation region)
- Parameter space $P = \{W, \Delta L, V_{th}, \mu, C_{ox}\}$
  - They may not be directly measurable
  - They are to be inferred from measurements of $I_{ds}$, $V_{gs}$, and $V_{ds}$
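As a minimal sketch, the three-region model can be written as one function of the parameter space $P$. The function name `ids`, the argument names, and the parameter values are illustrative assumptions; the switch between the linear and saturation expressions at $V_{ds} = V_{gs} - V_{th}$ is the usual textbook condition and is not stated on the slide.

```python
def ids(vgs, vds, w, l_drawn, delta_l, vth, mu, cox):
    """Drain current of the simple MOSFET model above (assumes vds >= 0)."""
    vov = vgs - vth                          # overdrive voltage
    if vov <= 0.0:
        return 0.0                           # device is off
    k = mu * cox * w / (l_drawn - delta_l)
    if vds < vov:
        return k * (vov - 0.5 * vds) * vds   # linear region
    return 0.5 * k * vov ** 2                # saturation region


# Hypothetical parameter values in arbitrary units.
params = dict(w=1.0, l_drawn=0.25, delta_l=0.05, vth=0.5, mu=1.0, cox=1.0e-4)
i_lin = ids(1.5, 0.2, **params)
i_sat = ids(1.5, 1.5, **params)
print(f"linear: {i_lin:.3e}, saturation: {i_sat:.3e}")
```

Because $\mu$, $C_{ox}$, $W$ and $L - \Delta L$ enter only through the product `k`, current measurements alone cannot separate them. This is one reason the parameters have to be inferred through a model.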



- i(v): the current-voltage measurements
- This is a typical non-linear least-squares analysis
- Parameters in P may NOT be independent
  - Previously, we assumed that they were independent
- For a complex model M, local minimization is done for each selected subset of parameters in P
- Derived P values are subject to error $\epsilon_p$
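To make the non-linear least-squares step concrete, here is a small Gauss-Newton sketch that fits two quantities to synthetic saturation-region data. It is not the extraction flow of any particular tool. The lumped gain `k` (standing in for $\mu C_{ox} W / (L - \Delta L)$), the starting guess, and the noise-free data are all assumptions made for illustration.

```python
def model(k, vth, vgs):
    """Saturation current; k stands in for mu * C_ox * W / (L - delta_L)."""
    return 0.5 * k * (vgs - vth) ** 2


def gauss_newton_fit(vgs_list, i_list, k, vth, iterations=20):
    """Minimize sum_i (i_meas - model)^2 over (k, vth) with Gauss-Newton steps."""
    for _ in range(iterations):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for v, i_meas in zip(vgs_list, i_list):
            r = i_meas - model(k, vth, v)    # residual
            j1 = 0.5 * (v - vth) ** 2        # d(model)/d(k)
            j2 = -k * (v - vth)              # d(model)/d(vth)
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        det = a11 * a22 - a12 * a12          # 2x2 normal equations
        k += (a22 * g1 - a12 * g2) / det
        vth += (a11 * g2 - a12 * g1) / det
    return k, vth


# Synthetic, noise-free measurements generated from known values.
TRUE_K, TRUE_VTH = 2.0e-4, 0.5
vgs_points = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
currents = [model(TRUE_K, TRUE_VTH, v) for v in vgs_points]

k_fit, vth_fit = gauss_newton_fit(vgs_points, currents, k=1.0e-4, vth=0.3)
print(f"k = {k_fit:.3e}, vth = {vth_fit:.3f}")
```

With measurement noise the recovered values carry an error, which plays the role of $\epsilon_p$ in the bullets above.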

## Statistical Characterization of P

- If we treat each variable in P as a random variable, we measure their means and sigmas
  - These random variables can be correlated!
  - This increases the difficulty of measurement
- One simple approach is to measure many devices individually
  - Because $\epsilon_p$ is unknown, the statistics of P can become questionable
  - Moreover, a complex model such as BSIM-3 has hundreds of parameters, many of which are hard to extract by measuring capacitance, current, and voltage
  - These increase the difficulty of variation extraction


(Boning & Nassif 99)

- Observe that parameters are highly correlated
- The error $\epsilon_p$ is not independent of the parameters
  - The parameters are eventually used to characterize the performance of a device
  - The error $\epsilon_p$ will be propagated into error in this performance characterization

The fact that the error $\epsilon_p$ is not independent of the parameters increases the error in performance characterization



- Simple concept
  - Two Normal variations: $A = N(\mu_1, \sigma_1)$, $B = N(\mu_2, \sigma_2)$
  - Let $f = A + B$
  - $\sigma(f) = (\sigma_1^2 + \sigma_2^2)^{1/2}$ if A and B are totally independent
  - $\sigma(f) = \sigma_1 + \sigma_2$ if A and B are 100% correlated
- If $\varepsilon_p$ is independent of the parameters, we have
  - Performance $z = f(P + \varepsilon_p)$
- This concept is general
  - We will come back to this again in the section on statistical timing analysis
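Both cases above are special cases of the standard formula $\sigma(f)^2 = \sigma_1^2 + \sigma_2^2 + 2\rho\,\sigma_1\sigma_2$, where $\rho$ is the correlation coefficient. A short sketch (the function name `sigma_of_sum` and the numbers are illustrative only):

```python
import math


def sigma_of_sum(sigma1, sigma2, rho):
    """Sigma of f = A + B for jointly Normal A, B with correlation rho."""
    return math.sqrt(sigma1 ** 2 + sigma2 ** 2 + 2.0 * rho * sigma1 * sigma2)


independent = sigma_of_sum(3.0, 4.0, rho=0.0)       # root-sum-square
fully_correlated = sigma_of_sum(3.0, 4.0, rho=1.0)  # plain sum
partial = sigma_of_sum(3.0, 4.0, rho=0.5)

print(independent, fully_correlated, partial)
```

Any positive correlation pushes the result above the root-sum-square value, so wrongly assuming independence underestimates the spread.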







Break 5 minutes for questions

Continue on: Modeling process variations

Next, we will focus on variation sources

## Process variations (Boning & Nassif 99)

- Process variations can be classified as
  - Variation in geometry
  - Variation in material
  - Variation in electrical property
- They can also be classified as
  - Device variation
  - Interconnect variation





## Device/geometry (Boning & Nassif 99)

- Film thickness variation
  - Gate oxide thickness is critical
  - Usually well-controlled
- Lateral dimension (length, width)
  - Typically due to photolithography proximity effects
  - Systematic pattern dependent
  - Due to mask, lens, or photo system deviations
  - Due to plasma etch dependencies
  - Can have wafer scale dependency, or depend on layout density and aspect ratio (L/W)
- MOSFETs are sensitive to
  - channel length L, $t_{ox}$, and some W
- L variation has received attention due to its direct impact on output current characteristics (discussed later)

## Device/material (Boning & Nassif 99)

- Doping variation
  - Due to dose, energy, angle, or other ion implant dependencies
  - Affect junction depth and dopant profiles
  - Hence, affect effective channel length L<sub>eff</sub>
  - Also affect V<sub>th</sub>
  - Not layout dependent
- Variation in deposition and anneal processes
  - Suffer substantial wafer-to-wafer and within-wafer variations
  - May result in large device-to-device random variation
  - Impact contact and line resistance

## Device/Electrical (Boning & Nassif 99)

- Vth variation
  - Often due to oxide thickness, geometry variations, and other sources
  - It is characterized separately because of its importance
- Discrete dopant variation
  - Random placement and concentration fluctuation due to discrete location of dopant atoms in the channel and S/D
  - Studies show that it is not a severe problem for logic but may affect SRAM, which contains a large number of devices that should be well matched
  - Also causes Vth variation
- Leakage current
  - Sub-threshold leakage currents can vary significantly

## Interconnect/geometry (Boning & Nassif 99)

- Line width and space
  - Mainly photolithography and etch dependencies
  - Directly induce line resistance variation
  - Also cause capacitance variation within layer and across layers
  - Affect signal integrity analysis
- Metal thickness
  - Is usually well controlled in conventional processes
  - Can have wafer-to-wafer and within-wafer variations
  - Copper polishing process can result in thickness loss of 10-20% depending on the patterns
- Dielectric thickness
  - Can have substantial variations
  - At wafer level, typically on the order of 5%
  - Within-die can have pattern-dependent variation due to processes such as CMP
- Contact and via size
  - Affected by etch process and systematic layer thickness variation
  - Directly impact contact and via resistance

### Interconnect/material (Boning & Nassif 99)

- Contact and via resistance
  - Sensitive to etch and clean processes
  - Substantial wafer-to-wafer variation
- Metal resistivity
  - Usually well controlled, and varies wafer to wafer
- Dielectric constant
  - Depends on the deposition process
  - Is usually well controlled
  - Pattern-dependent variation may be important for low-K dielectrics in interconnect

## Studying variations

- Variations have been there for a long time
  - People have studied process variations for a long time
  - Historically, analog designs are much more sensitive to process variations than logic
    - E.g., mismatch issue in two devices
    - See "Statistical modeling of device mismatch", C. Michael and M. Ismail, IEEE Journal of Solid-State Circuits, Volume 27, Issue 2, Feb. 1992
- The studies of process variations
  - Primarily for the control of process quality
  - Diagnose unusual equipment disturbances
  - Diagnose unusual environmental fluctuations

## Studying variations

[Figure: disturbances in process parameters P; disturbances in geometry and material parameters D, with $D = G(P)$ (physical filtering, design of experiments); variations in electrical properties Y, with $Y = F(D)$ (statistical filtering, process characterization, $F^{-1}$)]

- P are the independent sources of variations
- G can be studied through design of experiments
- Parameters in D can be correlated
- Usually easier to observe Y
- F is studied through (statistical) process characterization
  - Here "filtering" corresponds to the diagnosis process to relate causes of variations


















## Variation trends

|                  | Impact on delay | Impact on power | Trend                    |
|------------------|-----------------|-----------------|--------------------------|
| L<sub>eff</sub>  | Large           | Large Large     |                          |
| W                | Small           | Small           | Decreasing               |
| V<sub>th</sub>   | Small           | Medium          | Increasing<br>Increasing |
| Interconnect     | Small           | Low             |                          |
| Other            | Variable        | Variable        | Flat                     |

Source: N. Hakim, ICCAD04; N. Menezes, VTS05
























## Study : Gate CD variability on delay

- See M. Orshansky et al., 2002 TCAD, 2004 TSM
- Highlights
  - Study Lgate variability in 0.18µm technology
  - Development of test chips
  - Consider density and orientation
  - Consider impact on clock tree, cell delay, path delay, and circuit delay
  - Consider sampling resolution, sampling location, as well as optical proximity correction
- Conclude
  - CD variability is pattern dependent (density and orientation)
  - Intra-die CD variation is largely systematic
  - Cell delays vary as much as 17% among different locations
  - Clock skew varies as much as 8% of clock cycle (74ps)
  - Circuit delay degrades as much as 20%
  - Mask-level spatial gate OPC should be employed
  - OPC that takes spatial gate information into account performs better than the traditional OPC approach

### Study : variability on clock skew

- Source: [IEDM'98] S. R. Nassif, Within-Chip Variability Analysis
- Highlights
  - Based on 0.25µm technology
  - Study intra-die variability
  - Channel length variability $\pm 0.035\ \mu m$
  - Wire width variability $\pm 0.25\ \mu m$
  - Wire widths for worst-case skew: 48.9 ps
  - Channel lengths for worst-case skew: 171.5 ps

[Figure panels: Channel lengths; Wire widths]

### Study : Pattern-dependent variation on delay

- Source: V. Mehrotra et al., DAC 2000, 172-175
- Highlights
  - Study delay variation in both aluminum and copper (0.60 $\mu m$ metal and ILD thickness)
  - Study clock skew in 0.25 µm technology
  - Study pattern-dependent effects such as density on ILD thickness, and dishing and erosion in CMP
- Conclude
  - Models for systematic variations are required for accurate simulation of circuit performance
  - Interconnect CMP variation can increase bus delay by more than 30% even in copper technology
  - Clock skew is not strongly impacted by interconnect CMP variation
  - Variation in device gate length can significantly alter path delays, with an increase in maximum skew of about 50ps



- Variation in Vth
  - M. Niewczas, IEEE ICMTS, 1997
    - Focus on test structures to study Vth
  - T. Tanaka et al., IEDM 2000
    - Focus on variation in dopant profile
- Variation in gate line edge roughness
  - S. Xiong et al., IEEE Tran. Semi Manu. 2004
  - A. Asenov et al., IEEE Tran. Elec. Device, 2003
  - Roughness is not an issue today
  - May affect leakage current due to short channel effect as technology scales
- Circuit sensitivity to interconnect variation
  - Z. Lin et al., IEEE Tran. on Semi Manu. 1998
  - Interconnect is hard to characterize and model
  - Develop a model for interconnect variation
- Sub-wavelength lithography
  - A. Kahng and Y. C. Pati, DAC 1999
  - Conclude the importance of OPC and the need for more effective OPC algorithms
[Figure: "Myth" diagram relating observed chip variability to design & optimization, timing analysis, RC extraction, delay calculation, clock, noise analysis, power grid, SPICE, TCAD, macro-modeling, DFM, tester & test generation, test delivery, and many others]

















## Timing Macro-modeling

- Objective: Create reduced models at transistor level, gate level, or cell level to support fast timing simulation
  - Treat SPICE simulation as golden
  - At transistor level, support path-based timing analysis
  - At gate/cell level, support full-chip analysis
  - Focus on timing/delay characteristics
  - Usually >100x faster than SPICE

## Brief History - cell modeling

- 1µ: delay = f(C)
  - Capacitance load is the dominating factor to decide delay
  - Lumped capacitance model (from other gates)
  - Ignore slew
  - Device dominates delay, ignore interconnect R
- 1µ - .5µ: delay = f(C, input slew, lumped RC)
  - Slew considered
  - Lumped RC model at gate output
- < .5µ: delay = f(C, input slew, RC) + g(distributed RC)
  - Interconnect delay addressed with distributed RC
  - Parasitic (RC) extraction is needed
  - Interconnect loading on gates studied

## Two basic approaches

- K factor model
  - Similar to tabular approach
  - For each load and slew, find delay value
  - Lumped output capacitance cannot model load accurately
    - "Modeling the 'Effective Capacitance' for RC Interconnect of CMOS Gates", Qian, Pullela, Pillage, TCAD Dec 1994 (>100 citations)
      - Map complex RC load into effective capacitance
    - Later, R. Arunachalam, F. Dartu, L. Pileggi, ICCD '97 developed a method to map RCL load into effective capacitance
- Switch resistor model
  - Empirically fit the resistor value for each load
  - Store resistor values, rather than delay values
  - More accurate when load is not purely capacitive

## Table driven approach

- Advantages:
  - Much faster STA than using complex equations
- Disadvantages:
  - Require large amount of memory
    - Usually (slew vs. load) is stored from a 5x5 up to a 9x9 table
  - Temperature/Voltage
    - The method of applying a degrading factor $\Delta$ is inaccurate
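A table-driven lookup can be sketched as bilinear interpolation in a (slew, load) table. The 3x3 table, its units, and the helper names below are hypothetical. Real cell libraries store one such table per timing arc and may interpolate differently.

```python
from bisect import bisect_right


def _bracket(axis, value):
    """Index i of the table interval [axis[i], axis[i + 1]] used for value."""
    i = bisect_right(axis, value) - 1
    return max(0, min(i, len(axis) - 2))     # clamp to the first/last interval


def lookup_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a delay table indexed as table[slew][load].

    Values outside the table range are extrapolated linearly from the
    nearest interval.
    """
    i = _bracket(slews, slew)
    j = _bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    low = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    high = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return low * (1 - ts) + high * ts


# Hypothetical 3x3 characterization data: slew in ns, load in fF, delay in ps.
SLEWS = [0.05, 0.20, 0.80]
LOADS = [1.0, 4.0, 16.0]
DELAYS = [[10.0, 20.0, 40.0],
          [15.0, 25.0, 45.0],
          [30.0, 40.0, 60.0]]

delay_mid = lookup_delay(SLEWS, LOADS, DELAYS, slew=0.125, load=2.5)
print(f"interpolated delay = {delay_mid:.2f} ps")
```

Each lookup costs a couple of binary searches and a handful of multiplications, which is the speed advantage. The memory cost comes from storing a table for every arc of every cell, and again for each temperature/voltage condition if derating factors are to be avoided.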


## Various enhancements

- F. Dartu, N. Menezes, J. Qian and L. Pileggi, DAC '94
  - Replace switch with piecewise linear voltage source (in a switch resistor model)
  - Empirical gate delay model proposed for complex RC loading (impedance)
  - Address 2<sup>nd</sup>-order effect
- Hayes and White 1997, 10th IEEE ASIC conference
  - Demonstrates that applying a Voltage/Temp multiplicative degrading factor is inaccurate
  - For example, we characterize cells at 1v
    - If 1.1v, we just multiply by a $\Delta$ (before 97)
  - Proposes additive correction factor: if 1.1v, we add a $\Delta$
- A. Korshak, J. C. Lee, 2001 ISQED
  - Use a current-resistor-capacitance model to match I, R, C to known timing data
- Shao et al., 2003, ISPD
  - Second-order circuit model - not dependent on load!
  - Gate can be independently pre-characterized
## Low-level macro-modeling

- Fully mathematical analysis of gate structure
  - High complexity
  - Based on actual device equations
- Table driven/Empirical equation
  - Similar to STA cell modeling
  - Extensive pre-simulation required
- Divide switching behavior into several regions
  - Model different regions with different equations
- Map CMOS gates to circuit primitives
  - Usually map to inverters
  - Macro-modeling other structures with the primitives



## Interconnect RC (capacitance extraction)

- 2D extraction
  - Consider area overlap between 2 layers (area C), side wall in the same layer (side C), and side wall to the adjacent layers (fringing C)
  - The relationships relating geometry to C are characterized by the fab
  - Commonly used approach (can be implemented as a rule-based tool)
  - Practical for worst-case STA, even though it is not accurate
- 2.5D extraction
  - Consider more layers and, within a layer, the distance between wires
  - Pre-characterize unit region based on possible patterns and develop library
  - Commonly used for high-performance designs
- 3D extraction
  - Most accurate but expensive
  - Boundary element method (BEM), finite element method, Monte Carlo method
  - Often applied at package or in characterization of patterns in 2.5D method
- Not many people worry about RC extraction with variations today
  - Further studies are required in this area

## Block-based vs. path-based

[Figure: full-chip or large modules go through STA, which reports critical paths, worst-case timings, and timing violations; for each path in a set of critical paths, a transistor-level netlist is extracted and path-based STA reports path delays and worst-case timings]

- STA or block-based STA
  - Usually rely on cell models
  - The goal is to filter out critical paths for further analysis and optimization
- Path-based STA
  - Usually rely on transistor-level timing analysis
  - Try to achieve SPICE accuracy
  - Do it on a path-by-path basis
  - Then, worst timing can be simply max(path delay, path delay, ..., path delay)
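The difference between the two styles can be sketched on a toy timing graph with fixed edge delays. The graph, the delay values, and the function names are hypothetical. A real block-based STA would take its edge delays from cell models, and a path-based one from transistor-level analysis.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def block_based_arrivals(edges):
    """Propagate worst-case arrival times through a timing DAG, node by node.

    edges maps (source, sink) -> delay; nodes without fan-in start at time 0.
    """
    preds = {}
    for (src, dst), delay in edges.items():
        preds.setdefault(dst, []).append((src, delay))
        preds.setdefault(src, [])
    order = TopologicalSorter({n: [s for s, _ in p] for n, p in preds.items()})
    arrival = {}
    for node in order.static_order():        # predecessors come first
        fan_in = preds[node]
        arrival[node] = max((arrival[s] + d for s, d in fan_in), default=0.0)
    return arrival


def path_delay(edges, path):
    """Delay of one explicitly enumerated path."""
    return sum(edges[(a, b)] for a, b in zip(path, path[1:]))


EDGES = {("a", "g1"): 1.0, ("b", "g1"): 1.5, ("g1", "g2"): 2.0,
         ("b", "g2"): 0.5, ("g2", "out"): 1.0}
PATHS = [["a", "g1", "g2", "out"], ["b", "g1", "g2", "out"], ["b", "g2", "out"]]

arrival = block_based_arrivals(EDGES)
worst_path = max(path_delay(EDGES, p) for p in PATHS)
print(arrival["out"], worst_path)
```

With fixed delays both views agree on the worst-case timing. The block-based pass visits each edge once, while the path-based view needs the paths to be enumerated or pre-selected, which is why it is usually applied only to a set of critical paths.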



Break 5 minutes for questions

Next, we will switch topic to Statistical timing analysis











- Chang et al.
  - Presented at ICCAD '03 also
- "First-order Incremental Block-Based Statistical Timing Analysis", Visweswariah et al.
  - Won Best Paper Award at DAC '04
- Message at DAC05:
  - Statistical timing analysis is a hot topic!



- Delays are represented as CDF, rather than PDF
  - CDF can be characterized as piece-wise linear
    - 3 points, 5 points, 7 points
- Reconvergent fanouts are handled by
  - Delay subtraction
  - Mean and variance moment matching
- Three key conclusions
  - CDF is easier to handle, more efficient
    - We have verified this claim independently
  - Handling re-convergent fanouts is not a critical issue
    - We also have verified this claim independently
  - The accuracies of using 3, 5, and 7 points are similar, but the run-times are proportionally longer

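One reason a CDF representation is convenient: for independent arrival times, the CDF of the latest arrival is the product of the individual CDFs, $F_{\max}(t) = F_A(t)\,F_B(t)$. The sketch below applies this to piecewise-linear CDFs stored as a few (time, probability) points. It illustrates the idea only and is not the algorithm of the ICCAD03 paper. The data, the independence assumption, and the re-sampling at the union of breakpoints are all assumptions made here.

```python
def cdf_value(points, t):
    """Evaluate a piecewise-linear CDF given as sorted (time, probability) points."""
    if t <= points[0][0]:
        return points[0][1]
    if t >= points[-1][0]:
        return points[-1][1]
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return p0 + (p1 - p0) * (t - t0) / (t1 - t0)


def cdf_of_max(cdf_a, cdf_b):
    """CDF of max(A, B) for independent A and B, sampled at all breakpoints.

    The exact product is piecewise quadratic; keeping only the breakpoints
    makes the result piecewise linear again, which is an approximation.
    """
    times = sorted({t for t, _ in cdf_a} | {t for t, _ in cdf_b})
    return [(t, cdf_value(cdf_a, t) * cdf_value(cdf_b, t)) for t in times]


# Hypothetical 3-point CDFs for two arrival times (time units arbitrary).
ARRIVAL_A = [(1.0, 0.0), (2.0, 0.5), (3.0, 1.0)]
ARRIVAL_B = [(1.5, 0.0), (2.5, 0.5), (3.5, 1.0)]

latest = cdf_of_max(ARRIVAL_A, ARRIVAL_B)
for t, p in latest:
    print(f"t = {t:.1f}  P(max <= t) = {p:.3f}")
```

Re-convergent fanouts break the independence assumption, which is where delay subtraction and moment matching come in.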













## Hard case

- They propose a heuristic to handle more complicated re-convergence situations
  - Keep a dependency list for every node (re-convergent sources)
  - Keep reducing the list to 1 node so that the simple case formulation can be applied (the mean/variance matching)
  - More like the super-gate idea

## Performance impact

Performance impact based on points

| Circuit | 5 points CDF | 7 points CDF |
|---------|--------------|--------------|
| C432    | 2.0          | 4.0          |
| C499    | 2.7          | 4.7          |
| C880    | 2.5          | 4.5          |
| C1908   | 3.3          | 5.3          |
| C2670   | 2.5          | 4.2          |
| C3540   | 2.1          | 3.7          |
| C6288   | 2.5          | 4.5          |
| C7552   | 2.7          | 4.7          |

## Accuracy impact

| Circuit | 7 points Error % | 5 points Error % | 3 points Error % |
|---------|------------------|------------------|------------------|
| C432    | 0.61             | 1.8              | -2.2             |
| C499    | 0.57             | 1.76             | -2.4             |
| C880    | 0.44             | 1.7              | -2.54            |
| C1908   | 0.27             | 1.65             | -2.63            |
| C2670   | 0.31             | 1.6              | -2.74            |
| C3540   | 0.55             | 1.81             | -2.15            |
| C6288   | 0.79             | 1.69             | -1.38            |
| C7552   | 0.69             | 1.84             | -1.98            |

Based on 99% point of delay value. Source: ICCAD03 paper

## Accuracy in general

- Handling re-convergent fan-outs seems to be unnecessary if our focus is on the worst-case bound
- Without handling re-convergent fan-outs, we can save 10 to 33% of run time

## IBM: Parameterized Block-Based SSTA (DAC04)

- Path-based analysis
  - Select a set of paths first and analyze those paths only (guard-band)
  - The problem is simpler (n x n correlation matrix)
- Block-based analysis
  - Like breadth-first search (level-by-level analysis)
  - Analyze the timing graph
- Unlike the EPA approach, they define a *canonical delay* form and propagate this form through the circuit
  - EPA propagates *probabilistic events*
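A first-order canonical form can be sketched as a mean plus sensitivities to shared unit-Normal sources plus one independent random term. The class and function names below are illustrative, and the numbers are made up. The tightness probability is the standard result for two jointly Normal quantities and is the usual starting point for approximating max in this style of analysis.

```python
import math
from dataclasses import dataclass


@dataclass
class Canonical:
    """a0 + sum_i a[i] * dX_i + r * dR, with unit-Normal dX_i (shared) and dR."""
    a0: float
    a: tuple      # sensitivities to the shared (global) sources of variation
    r: float      # sensitivity to an independent random source

    def sigma(self):
        return math.sqrt(sum(x * x for x in self.a) + self.r ** 2)


def add(p, q):
    """Sum of two forms: shared terms add, independent terms add in quadrature."""
    return Canonical(p.a0 + q.a0,
                     tuple(x + y for x, y in zip(p.a, q.a)),
                     math.hypot(p.r, q.r))


def covariance(p, q):
    """Only the shared sources contribute to the covariance."""
    return sum(x * y for x, y in zip(p.a, q.a))


def tightness_probability(p, q):
    """P(p > q); assumes p - q has non-zero variance."""
    theta = math.sqrt(p.sigma() ** 2 + q.sigma() ** 2 - 2.0 * covariance(p, q))
    z = (p.a0 - q.a0) / theta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical arrival time and gate delay sharing two global sources.
arrival = Canonical(10.0, (1.0, 0.5), 0.5)
gate = Canonical(9.0, (0.8, 0.2), 1.0)

total = add(arrival, gate)
print(f"mean = {total.a0}, sigma = {total.sigma():.3f}")
print(f"P(arrival > gate) = {tightness_probability(arrival, gate):.3f}")
```

Because correlation is carried by the shared sensitivities, adding forms along a path and comparing forms at a merge point both keep track of it automatically. The price is the Gaussian assumption.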







## Computational overhead

- Run time overhead
  - about 20% on batch operation
  - about 50% on the actual arrival time propagation
- Memory overhead
  - about 100%, depending on the number of sources of variation and complexity of the models
- Capacity
  - able to analyze 2M+ gate ASIC chips on 64-bit machines

## Comparison experiments

- In order to compare the two approaches
  - We implemented (to the best of our knowledge) PWL and canonical methods for SSTA
  - We also implemented just STA
  - Apply with our $0.25\mu m$ cell library
- Comparison at $3\sigma$ worst-case delay point
- Comparison at mean delay point
  - Use Monte-Carlo analysis output as golden answer
- We artificially make pin-to-pin variations from $\pm k\%$ to $\pm 5k\%$
  - To assess the situations when variations increase

### Comparison: 3-sigma error vs Monte-Carlo

| Circuit | STA    | PWL3  | PWL5  | PWL7 | Canonical |
|---------|--------|-------|-------|------|-----------|
| C499    | 5.97%  | .03%  | .47%  | .81% | .04%      |
| C880    | 6.69%  | .40%  | .12%  | .42% | .01%      |
| C2670   | 6.80%  | .55%  | .05%  | .37% | .05%      |
| C6288   | 11.19% | 2.09% | 1.21% | .69% | .01%      |

5x variance

| Circuit | STA    | PWL3  | PWL5  | PWL7  | Canonical |
|---------|--------|-------|-------|-------|-----------|
| C499    | 23.32% | .16%  | 1.91% | 3.15% | .04%      |
| C880    | 30.36% | 2.57% | .26%  | 1.14% | .03%      |
| C2670   | 29.8%  | 3.06% | .71%  | .79%  | .28%      |
| C6288   | 48.45% | 8.92% | 1.21% | .69%  | .35%      |

### Comparison at mean delay point

5x variation increase

| Circuit | PWL3   | PWL5  | PWL7  | Canonical |
|---------|--------|-------|-------|-----------|
| C499    | 6.62%  | 4.76% | 3.42% | .23%      |
| C880    | 7.97%  | 4.87% | 3.44% | .01%      |
| C2670   | 8.64%  | 5.93% | 4.27% | .37%      |
| C6288   | 12.28% | 8.3%  | 6.03% | .39%      |

### Run-time comparison

- For the two larger circuits (seconds):

| Circuit | PWL3 | PWL5 | PWL7 | Canonical |
|---------|------|------|------|-----------|
| C2670   | .31  | .53  | .86  | .33       |
| C6288   | .73  | 1.27 | 2.03 | 44.1      |

### Summary

- PWL pros:
  - Very fast
  - Can support arbitrary distribution (non-Gaussian)
  - Variable accuracy
- PWL cons:
  - Correlations cause a lot of difficulty
  - Spatial correlations may be hard to model and handle
  - Mean delay calculation may be inaccurate
- Canonical pros:
  - Reasonably fast
  - Accurate
  - Naturally handles all sorts of correlations well (if model is available)
- Canonical cons:
  - Can be slow due to correlation handling
  - Assumes Gaussian distributions

## Some SSTA works at DAC 05

- Hongliang Chang et al.
  - Canonical representation for non-linear, non-Gaussian parameters
- Yaping Zhan et al.
  - Correlation-aware, non-Gaussian distributions
- Lizheng Zhang et al.
  - Correlation-preserved, non-Gaussian distribution with quadratic timing model
- Aseem Agarwal et al.
  - Statistical gate sizing with SSTA
- Vishal Khandelwal et al.
  - Taylor-expansion polynomial-representation based SSTA






















- Targets the stages after static timing analysis, before tape-out
- What the tool does: given a 2-timeframe pattern, estimate its delay distribution as (mean, σ) based on a given timing model
  - Benjamin Lee et al., VTS05, ITC05
- Among many challenges, one difficulty lies in the fact that a pattern may sensitize different sets of paths on different dies
  - Hazards may be present on one die but not another
  - Overall delay distribution becomes multi-modal
- Let's look at the Monte Carlo simulation results ...





- This method works fine if there is no hazard

## Run times of Pattern-based STA (seconds)

| Method      | C880 | C2670 | C6288  | Ind32 |
|-------------|------|-------|--------|-------|
| Monte Carlo | 4993 | 25382 | 78830  | 8015  |
| PB-STA      | 9.56 | 63.31 | 530.83 | 23.61 |

PB-STA is only 2-6 times slower than fixed-delay





















## General thinking

- Cell characterization usually assumes single input switching
  - Multiple input switching (MIS) can cause large delay shifts from the characterized values
  - MIS effect depends on signal alignment
- The probability of signals aligning close to each other diminishes after passing through a few stages of gates
  - Therefore, most MIS effects occur at the gates closer to the launching latches
- MIS affects short paths more severely than long paths
  - Need to check hold time violation (minimum delay) more carefully with MIS than setup time violation
  - Speed-up amount is greater than slowdown amount

## General approach - filtering

- Because MIS may not occur often, we usually take a filtering approach to rule out gates or cells where MIS cannot happen
  - For the remaining gates and cells, we assume the worst
- Filtering methods
  - Filtering based on timing windows from STA
    - If the timing windows of two signals do not overlap at all, we say that MIS cannot happen for these two signals
    - We need to pursue an iterative algorithm until STA results converge, because if timing windows do overlap, we need to change the gate's output delay and propagate the change to all downstream gates whose delays are affected
  - Filtering based on logic constraints
    - This is a typical ATPG problem
- Adding statistical process variations in the analysis
  - See A. Agarwal, F. Dartu, D. Blaauw, DAC 04, pages 658-663
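The timing-window filter can be sketched as a pairwise overlap test on the (earliest, latest) arrival windows at each gate's inputs. The gate names, window values, and the optional `margin` parameter are hypothetical. The iterative re-run of STA after delays change is not shown.

```python
def windows_overlap(window_a, window_b, margin=0.0):
    """True if two (earliest, latest) arrival windows come within margin of each other."""
    (a_early, a_late), (b_early, b_late) = window_a, window_b
    return a_early <= b_late + margin and b_early <= a_late + margin


def gates_needing_mis_analysis(input_windows, margin=0.0):
    """Keep only gates where at least one pair of input windows can align."""
    kept = []
    for gate, windows in input_windows.items():
        pairs = ((windows[i], windows[j])
                 for i in range(len(windows))
                 for j in range(i + 1, len(windows)))
        if any(windows_overlap(a, b, margin) for a, b in pairs):
            kept.append(gate)
    return kept


# Hypothetical arrival windows (ns) at the inputs of two gates.
WINDOWS = {
    "g1": [(0.0, 1.0), (0.8, 1.5)],   # windows overlap: MIS is possible
    "g2": [(0.0, 1.0), (2.0, 3.0)],   # windows disjoint: MIS is ruled out
}

print(gates_needing_mis_analysis(WINDOWS))
```

Gates that survive the filter are then treated pessimistically, as described above.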









## Other models

- T. Sakurai, TED 1993
  - Derives closed-form equations to model the waveform of an RC line
- J. Qian, S. Pullela, L. Pillage, TCAD 1994
  - Derive new model for effective capacitance, because others have ±10% error, and optimism is generally unacceptable
  - Introduce π-model to separate the capacitive element into 2 elements, one before and one after the resistor
- H. Kawaguchi, T. Sakurai, ASP-DAC 1998
  - n-line coupling capacitance equations without victim and aggressor relationship
- A. Kahng, S. Muddu, and D. Vidhani, ASIC/SOC 1999
  - Extend $\pi$-model by separating the resistive element into 2 elements, one before the $\pi$, and one in the $\pi$
  - Done to reduce the over-pessimism and over-optimism of SF







- So the idea is: (1) extract the power map, (2) run STA with the map

### Power grid analysis

- Model power grid as an RLC network
  - Circuit abstracted into time-varying piecewise-linear current sources
  - Simulate circuit with the ideal power grid to obtain current profile
- Modified Nodal Analysis (MNA) used to solve for power grid node voltages
- Converts the problem into solving a sparse, symmetric positive-definite linear system
  - $G x(t) + C\, \partial x(t)/\partial t = b(t)$
  - G: conductance matrix
  - C: admittance matrix due to C, L
  - x(t): time-varying vector of voltages at nodes
  - b(t): time-varying current sources
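A common way to integrate $G x + C\,\partial x/\partial t = b(t)$ is backward Euler, which turns each time step into one linear solve: $(G + C/h)\,x_{n+1} = b_{n+1} + (C/h)\,x_n$. The sketch below does this for a tiny two-node RC rail. It is RC only (no inductance), and the element values, the step current, and the function names are made up for illustration. A real grid would need a sparse solver instead of the 2x2 formula.

```python
def solve_2x2(m, rhs):
    """Solve a 2x2 linear system with Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [(d * rhs[0] - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]


def simulate_grid(g, c_diag, source, x0, h, steps):
    """Backward Euler for G x + C dx/dt = b(t) with a diagonal C matrix."""
    lhs = [[g[0][0] + c_diag[0] / h, g[0][1]],
           [g[1][0], g[1][1] + c_diag[1] / h]]
    x = list(x0)
    history = [tuple(x)]
    for n in range(1, steps + 1):
        b = source(n * h)
        rhs = [b[k] + c_diag[k] / h * x[k] for k in range(2)]
        x = solve_2x2(lhs, rhs)
        history.append(tuple(x))
    return history


# Hypothetical rail: VDD pad --R1-- node 1 --R2-- node 2, a decap at each node.
VDD, R1, R2 = 1.0, 0.1, 0.2          # V, ohm, ohm
CAP = [1.0e-9, 1.0e-9]               # F
I1, I2 = 0.5, 0.25                   # A drawn at nodes 1 and 2 for t > 0

G = [[1.0 / R1 + 1.0 / R2, -1.0 / R2],
     [-1.0 / R2, 1.0 / R2]]


def current_sources(t):
    """Right-hand side b(t): pad injection minus the currents drawn by the circuit."""
    return [VDD / R1 - I1, -I2]


trace = simulate_grid(G, CAP, current_sources, x0=[VDD, VDD], h=1.0e-11, steps=2000)
v1, v2 = trace[-1]
print(f"settled voltages: node1 = {v1:.4f} V, node2 = {v2:.4f} V")
```

The matrix $G + C/h$ is fixed for a constant step size, so in practice it is factored once and reused at every step.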


### IR drop and dI/dt noise

#### IR drop

- Usually refers to the decrease/increase in power/ground rail voltage due to resistance of devices between the rail and a node of interest
- Common practice is to budget a maximum tolerable static voltage drop per rail
- Static IR drop can be calculated from extracted parasitics and average power consumption (DC analysis)
- Dynamic IR drop requires vector-based analysis
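For a single rail fed from one end, the static (DC) drop follows directly from Ohm's law: each segment carries the sum of the average currents drawn downstream of it. The sketch below uses made-up resistances, currents, and budget, and the function name is hypothetical. A real grid is a mesh and needs a linear solve.

```python
def static_ir_drop(vdd, segment_resistances, tap_currents):
    """Node voltages along a single rail fed from one end (DC analysis).

    segment_resistances[k] is the resistance of the segment feeding tap k,
    tap_currents[k] is the average current drawn at tap k.
    """
    voltages = []
    v = vdd
    for k, r in enumerate(segment_resistances):
        downstream = sum(tap_currents[k:])    # current through segment k
        v -= r * downstream
        voltages.append(v)
    return voltages


VDD = 1.0                                     # V
rail = static_ir_drop(VDD, [0.1, 0.2], [0.5, 0.25])
BUDGET = 0.10                                 # hypothetical max tolerable drop, V
violations = [k for k, v in enumerate(rail) if VDD - v > BUDGET]

print(rail, violations)
```

Taps that exceed the budget are the ones where the grid would need to be strengthened.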

#### dI/dt noise

- Inductive dI/dt noise used to occur mostly on the package
- On-chip interconnect impedance is no longer negligible due to higher frequencies
- Change in current (dI)
  - Simultaneous switching causes a big current swing

### Various studies

- H. Kriplani, F. N. Najm, I. N. Hajj, IEEE TCAD '95
  - Linear-time algorithm: finds upper-bound estimate of current waveforms at all contact points
- H. H. Chen and David Ling, DAC '97 (cited by 111)
  - Describes models used for power bus / switching circuits / decoupling capacitors
- H. H. Chen and J. S. Neely, IEEE Transactions on Components, Packaging and Manufacturing Technology, Aug 1998
  - Analyze IR drop and inductive dI/dt noise
    - Note: worst-case dI/dt noise and worst-case IR drop do not occur at the same time
  - Power-supply distribution model
  - Switching-circuit model
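
The observation that worst-case inductive noise and worst-case IR drop do not coincide can be illustrated with a single smooth current pulse. This is a sketch under stated assumptions, not the model used in the cited paper: the raised-cosine pulse shape, the lumped R and L, and the function name `supply_noise_peaks` are all hypothetical.

```python
import math


def supply_noise_peaks(r_ohm, l_henry, i_peak, period, n=1000):
    """Sample one raised-cosine current pulse and locate the noise peaks.

    Returns (peak IR drop, its time, peak L*dI/dt, its time, peak of the sum).
    """
    h = period / n
    peak_ir = peak_l = peak_total = 0.0
    t_ir = t_l = 0.0
    for k in range(n + 1):
        t = k * h
        theta = 2.0 * math.pi * t / period
        current = 0.5 * i_peak * (1.0 - math.cos(theta))       # pulse I(t)
        didt = i_peak * math.pi / period * math.sin(theta)     # exact dI/dt
        v_ir = r_ohm * current
        v_l = l_henry * didt
        if v_ir > peak_ir:
            peak_ir, t_ir = v_ir, t
        if v_l > peak_l:
            peak_l, t_l = v_l, t
        peak_total = max(peak_total, v_ir + v_l)
    return peak_ir, t_ir, peak_l, t_l, peak_total


# Hypothetical values: 50 milliohm, 10 pH, 1 A peak, 200 ps pulse.
peak_ir, t_ir, peak_l, t_l, peak_total = supply_noise_peaks(0.05, 10e-12, 1.0, 200e-12)
print(t_l, t_ir, peak_total, peak_ir + peak_l)
```

For this pulse the inductive term peaks on the rising edge while the resistive term peaks at maximum current, so the worst combined noise is smaller than the sum of the two individual worst cases.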


## Various studies

- Yi-Min Jiang, K.-T. Cheng, An-Chang Deng, ISLPED '98
  - Genetic-algorithm approach to generate patterns
  - Estimate IR drop and dI/dt noise based on charge/discharge current cell library
- Yi-Min Jiang, K.-T. Cheng, DAC '99
  - Statistical model derived by simulating characterization patterns
    - Use GA search to find patterns (last paper)
    - Find average voltage for each cell for each pattern; the average voltages form a distribution
- A. Dharchoudhury, et al., DAC '98 (based on PowerPC)
  - Describes methodology for power supply design/analysis
  - IR-drop analysis is discussed
    - Transistor level is infeasible
    - OTS blocks (standard cells) macro-modeled as current source
    - Each block has an IR-drop budget (voltage drop)
    - If budget violated, power grid that supplies block is augmented
- P. Larsson, IEEE Custom Int. Circuits Conf. 1999
  - Describes noise suppression techniques
  - Makes some predictions for the future based on process parameters

## Various studies

- Sani Nassif, Joseph Kozhaya, DAC 2000 (fast simulation)
  - PDE-like multi-grid method for simulation of power grid (computation-wise, not macro-modeling)
  - Circuit abstracted as time-varying current sources
  - Grid-reduction technique
- M. Zhao, et al., DAC 2000 (hierarchical analysis)
  - Difficulties in power network analysis
    - Network is huge, typically 1-100 million nodes
      - Sparse linear system solution methods: conjugate gradient
    - Network is nonlinear due to switching devices
      - Solution: simulate individual blocks without power network, then simulate power network using time-variant current profiles
  - Speed-up proposed:
    - Macro-model local power grids
- J. Saxena, K. Butler, V. Jayaram, et al., ITC 2003
  - Structural tests have a lot of switching activity
    - Worst-case scenario for IR drop
  - Analyzed chips: increased switching activity with structural test induced IR drop, which caused failures
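
Conjugate gradient, named in the Zhao et al. entry as a sparse solution method, only needs matrix-vector products, which is why it suits very large grids. The following is a minimal unpreconditioned sketch; the row-dictionary storage, the function name, and the 4-node example grid are illustrative assumptions.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # residual norm small enough
            break
        ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 4-node rail: 10 S segments, ideal 1.0 V pad at node 0,
# 10 mA drawn at every node. Each row stores only its nonzero entries.
rows = [
    {0: 20.0, 1: -10.0},
    {0: -10.0, 1: 20.0, 2: -10.0},
    {1: -10.0, 2: 20.0, 3: -10.0},
    {2: -10.0, 3: 10.0},
]
rhs = [10.0 * 1.0 - 0.01, -0.01, -0.01, -0.01]


def grid_matvec(v):
    return [sum(a * v[j] for j, a in row.items()) for row in rows]


node_voltages = conjugate_gradient(grid_matvec, rhs)
print(node_voltages)
```

Production solvers add a preconditioner; this sketch omits one to keep the iteration itself visible.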

## Various studies

- D. Kouroussis, Rubil Ahmadi, Farid Najm, DAC 2004
  - Abstract circuit in terms of current constraints (peak current constraint)
  - Use an upper/lower bound of supply variation
  - Extract critical paths
  - Verify that voltages of critical paths are within bounds
  - Solve for max. delay of paths given current constraints

- Jing Wang, et al., VTS '05
  - Power region model
    - Assume supply voltage within a region is uniform
    - On-chip L di/dt drop is neglected
  - Switching model
    - Triangle/trapezoid current model
    - Gates see constant average Vdd




### Study: correlating structural test to functional test

- Motivations
  - Examine the correlation between the frequencies measured using various structural tests and functional testing
  - Investigate structural testing as an option for speed binning
    - Reduce tester cost for speed binning
  - Reduce the cost of testing delay defects




## Structural Testing

- Structural testing provides an attractive complementary/alternative solution
  - Relaxed speed and accuracy requirements on the external pins
  - Number of high-performance tester channels is minimized
  - Low-cost testers can be used
  - Easier debugging
  - Can achieve high fault coverage

## **Previous Work**

- Earlier studies showed poor correlation due to the lack of coverage of paths around memories (Belete et al., ITC 2001)
- Cory et al., IEEE Design & Test, Sept.-Oct. 2003, found a linear relationship between the frequencies of the functional and latch-to-latch path delay tests.
- We could not duplicate the D&T 2003 result for high-performance designs (>1 GHz).


## Types of Structural Tests

- At-speed memory BIST test
- Transition tests:
  - Simple transition tests: transition tests w/o going through memories.
  - Complex transition tests: transition tests going through memories.
- Path delay tests:
  - Simple path delay tests: latch-to-latch path delay tests.
  - Complex path delay tests: path delay tests involving memories or cycle-stealing paths.


## Chip Used for Experimentation

- MPC7455 microprocessor implementing the PowerPC™ instruction set architecture

| Frequency | # of Logic Transistors | # of Latches | # of Stuck-at Faults |
|-----------|------------------------|--------------|----------------------|
| 1 GHz+    | 6.8M                   | 123k         | 6.2M                 |

## Structural Tests Used

- Simple transition tests: 13K with 70% fault coverage
- Complex transition tests: 12K with 78% fault coverage
- Path delay tests: top 2490 critical timing paths
  - Latch-to-latch paths: 1463
  - Memory paths: 91
  - Cycle-stealing paths: 231
  - Misc. paths, like clock or pre-charge paths: 700


## Path Delay Test Coverage

| Path type      | # of paths | Coverage | # of path tests | Test efficiency |
|----------------|------------|----------|-----------------|-----------------|
| Latch to latch | 1463       | 60%      | 878             | 96.7%           |
| Memory         | 91         | 95%      | 86              | 100%            |
| Cycle stealing | 231        | 63%      | 146             | 100%            |
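
As a quick consistency check on the table, the number of path tests matches the path count multiplied by the percentage column, reading that percentage as the fraction of paths for which a test was generated (an interpretation, since the slide does not define it). The variable names below are illustrative.

```python
# (number of targeted paths, fraction covered) taken from the table above
paths = {
    "Latch to latch": (1463, 0.60),
    "Memory": (91, 0.95),
    "Cycle stealing": (231, 0.63),
}

# Tested paths per category, rounded to whole paths
tests = {name: round(count * cov) for name, (count, cov) in paths.items()}

# Aggregate coverage over the three listed path types
overall = sum(tests.values()) / sum(count for count, _ in paths.values())

print(tests)
print(f"overall coverage: {overall:.1%}")
```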












## **Trend Analysis**

- Complex transition test provided the closest match to Fmax (on average) both at probe and at final.
- Simple path test was faster than Fmax
  - 19.44% faster during packaged test
  - 9.28% faster during probe test
- Complex path test (compared to Fmax) was
  - 3% faster during packaged test
  - 8% slower during probe test
- ABIST test frequencies were relatively lower (by 2%) at probe than at packaged test

## **Result Analysis**

- Possible explanations for the performance difference between the probe and package tests:
  - Wafer data collected from newer and faster parts relative to the ones used in the initial package test experiment
  - Electrical environment differences
  - Difference in cooling between wafer-probe and package tests.

## Potential Test Escapes

- We analyzed the speed-limiting paths of several dies where the frequencies of structural tests were noticeably lower than Fmax
- In 88% of the complex transition test cases, the speed limiting paths were associated with complex memory transaction scenarios.
- That coincided with chips that passed functional tests but were failing in system tests associated with the same memory transactions. Investigation is ongoing.
- Analysis of fail data of other structural tests led to the identification of test-only paths.




## Speed Binning Results

Corresponding average frequency was used for each type of structural test as the cut-off frequency.

| Test type          | Under | Over | GB   |
|--------------------|-------|------|------|
| Complex Transition | 4.4%  | 6.6% | 2.2% |
| Simple Transition  | 3.2%  | 6.1% | 2.2% |
| ABIST              | 3.9%  | 5.4% | 2.2% |
| Complex Path       | 1.9%  | 4.8% | 2.2% |
| Simple Path        | 5.8%  | 7.3% | 6.4% |

## **Guard Band Effects**

- Cut-off frequencies = average functional and structural frequencies
- Under-G: additional parts which go into the slow bin due to guard bands

| Test type          | Under | Over | GB   | Under-G |
|--------------------|-------|------|------|---------|
| Func               | 0%    | 0%   | 3%   | 18.3%   |
| Func               | 0%    | 0%   | 5%   | 32.6%   |
| Complex Transition | 4.4%  | 6.6% | 2.2% | 16.7%   |
| Simple Transition  | 3.2%  | 6.1% | 2.2% | 20.4%   |
| ABIST              | 3.9%  | 5.4% | 2.2% | 22.6%   |
| Complex Path       | 1.9%  | 4.8% | 2.2% | 17.0%   |
| Simple Path        | 5.8%  | 7.3% | 6.4% | 36.9%   |
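
One way to read the columns, sketched below under explicit assumptions: a part is truly fast if its Fmax is at or above the average Fmax, and binned fast if its structural frequency is at or above the average structural frequency. "Under" counts fast parts binned slow, "Over" counts slow parts binned fast, and "Under-G" counts parts that pass the plain structural cut-off but fail once it is raised by the guard band. These definitions, the function name, and the eight-part data set are illustrative assumptions, not taken from the study.

```python
def binning_stats(fmax, tmax, guard_band=0.0):
    """Two-bin speed binning with average frequencies as cut-offs."""
    n = len(fmax)
    f_cut = sum(fmax) / n                 # functional cut-off
    t_cut = sum(tmax) / n                 # structural cut-off
    t_cut_gb = t_cut * (1.0 + guard_band)  # guard-banded structural cut-off

    under = sum(1 for f, t in zip(fmax, tmax) if f >= f_cut and t < t_cut) / n
    over = sum(1 for f, t in zip(fmax, tmax) if f < f_cut and t >= t_cut) / n
    under_g = sum(1 for t in tmax if t_cut <= t < t_cut_gb) / n
    return {
        "under": under,
        "over": over,
        "accuracy": 1.0 - under - over,
        "under_g": under_g,
    }


# Invented measurements in GHz for eight parts (not data from the study).
fmax = [1.00, 1.02, 1.04, 1.06, 1.08, 1.10, 1.12, 1.14]
tmax = [1.10, 1.13, 1.18, 1.15, 1.21, 1.16, 1.22, 1.25]
stats = binning_stats(fmax, tmax, guard_band=0.02)
print(stats)
```

Raising the guard band trades over-binning risk for yield in the fast bin, which is the effect the Under-G column quantifies.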


## Summary

- Correlations between functional frequency and structural test frequencies are encouraging
- Complex transition tests give the best correlation to the functional frequencies
- Almost all the structural tests performed reasonably well in speed binning the parts
- The results clearly demonstrate the importance of including structural delay paths going through the memory arrays
- The data also suggest that some test escapes can be screened by structural tests

## Timing Correlation of Pre-silicon & Post-silicon

Two Studies

1. Correlating pre-silicon critical paths to post-silicon speed paths
   - How many pre-silicon paths need to be tested in order to cover the top 10 speed paths?
2. Correlating structural testing frequency Tmax to functional testing frequency Fmax
   - Which structurally-tested paths can be used for speed binning (deciding fast vs. slow)?

## Break

- 5 minutes for questions
- Next, we will continue the topic on other studies related to speed binning












2. Issue of structural testing for speed binning
   - For high-performance designs, correlation between Tmax and Fmax is not high enough

| Structural test | Correlation to Fmax |
|-----------------|---------------------|
| ABIST           | 0.87                |
| Simple AC       | 0.81                |
| Complex AC      | 0.76                |
| Simple Path     | 0.83                |
| Complex Path    | 0.82                |




## Properties of most correlated paths to Fmax

| Path # | Type    | Block | Ratio | Corr. |
|--------|---------|-------|-------|-------|
| 1174   | Complex | A     | 1.61  | 0.90  |
| 1092   | Complex | A     | 1.11  | 0.89  |
| 2161   | Complex | A     | 1.11  | 0.89  |
| 3105   | Latch   | V     | 1.57  | 0.87  |
| 1817   | Complex | E     | 1.39  | 0.87  |

- Ratio = average speedup relative to Fmax
- Individual path correlation to Fmax is higher than that of the whole path delay test set applied together.
  - The most correlated path is 1.6x faster than Fmax
- Less-correlated but slower paths mask out these more highly correlated paths
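
The masking effect can be sketched as follows, assuming the frequency reported for a whole path test set on a chip is the minimum over its paths, since the set passes only when every path passes. The two paths and all measurements below are invented for illustration; `pearson` and `speedup_ratio` are hypothetical helper names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def speedup_ratio(path_freq, fmax):
    """Average of path frequency divided by Fmax across chips."""
    return sum(p / f for p, f in zip(path_freq, fmax)) / len(fmax)


# Invented per-chip frequencies in GHz (not data from the study).
fmax = [1.00, 1.05, 1.10, 1.15, 1.20]
path_a = [1.6 * f for f in fmax]            # tracks Fmax closely but is fast
path_b = [1.30, 1.15, 1.35, 1.20, 1.32]     # slower and only loosely related

# The test set fails at the slowest path, so path_b decides its frequency.
set_tmax = [min(a, b) for a, b in zip(path_a, path_b)]

corr_a = pearson(path_a, fmax)
corr_set = pearson(set_tmax, fmax)
ratio_a = speedup_ratio(path_a, fmax)
print(corr_a, corr_set, ratio_a)
```

Because the fast, well-correlated path never limits the set frequency, its information is lost unless it is measured individually.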




| Test         | Acc.  | Under | Over  | GB    |
|--------------|-------|-------|-------|-------|
| ABIST        | 86.9% | 8.6%  | 4.5%  | 1.9%  |
| Simple AC    | 81.8% | 13.2% | 7%    | 2.3%  |
| Complex AC   | 77.4% | 11.1% | 11.5% | 2.86% |
| Simple Path  | 79.1% | 13.9% | 7%    | 2.86% |
| Complex Path | 82.2% | 9.5%  | 8.3%  | 2.5%  |

| Path # | Acc.  | Under | Over | GB    |
|--------|-------|-------|------|-------|
| 1174   | 91%   | 4.5%  | 4.5% | 4.34% |
| 1092   | 89.7% | 4.1%  | 6.2% | 5.2%  |
| 2161   | 89.3% | 4.9%  | 5.8% | 4.9%  |
| 3105   | 86.9% | 8.2%  | 4.9% | 3%    |
| 0100   |       |       |      |       |

## Summary

- Post-silicon path delay tests can provide a wealth of information
  - Path ranking correlation metrics
  - Structural speed binning

Thank you

Reference: http://mtv.ece.ucsb.edu/TTEP/


## Acknowledgement

Many people have directly and indirectly helped in the making of this tutorial. Special thanks to Noel Menezes at Intel SCL and Sani Nassif at IBM Austin Research for their invaluable insights regarding many current issues in timing analysis, DSM timing effects, variation modeling and process characterization. Benjamin Lee and Leonard Lee from UCSB helped to survey many papers in the areas of timing modeling, crosstalk, and power noise. Benjamin also did the experiments comparing the two SSTA methods. Special thanks to Nagib Hakim at Intel Santa Clara for his insight on binning with respect to inter-die and intra-die variations. Special thanks to Jing Zeng at Freescale and Benjamin Lee for their work on the speed binning experiments. Thanks to T M Mak at Intel Santa Clara and Praveen Parvarthala at Intel AZ for their valuable input on functional speed binning methodology. Thanks to Wei-Ping Shi at Texas A&M University for his help in understanding RC extraction issues.

And thanks to the many others who have directly or indirectly helped ...
