















### For example



# Things that affect timing

- Factors
	- Device characteristics (Vth, Ids, etc.)
	- Interconnect characteristics (RCL)
	- Coupling
	- IR drop, power noise
	- Temperature
	- Clock skew
	- Modeling errors
- Variability and uncertainties
	- Process variations (including measurement uncertainties)
	- Environmental variations (temperature map, power map, etc.)
	- Pattern variations (e.g., functional vs. structural)

Slide # 12 Wang@UCSB (for private use)

### Commonly-asked questions

- What causes a speed path to be missed by timing analysis tools?
	- What do I miss after pre-silicon analysis?
	- What is binning based on?
- How should variations be modeled in order to support timing analysis?
	- How to build an effective statistical timing model?
- Where do the variation models come from?
	- What models can a fab provide?
- What are the important variations to be considered in analyzing timing?
	- Which is the dominating factor? Leff or Vth variation?


#### Understanding chip variability

- Result of interactions among
	- Process variability and uncertainties
	- Design variability
	- Modeling uncertainties
	- Variability in assumptions employed in tools for fast approximation
	- Variability and uncertainties in test and measurements
- To understand chip variability, we need to decompose the sources of variations and minimize their interactions
	- To analyze and control variations separately

#### General problem formulations in statistical domain

- Throughout this tutorial, we will learn how to statistically analyze variability
- We will often face one of the following 4 categories of analysis
	- Statistical characterization
	- Statistical modeling
	- Worst-case corner analysis
	- Statistical analysis













Topics to cover




Break 5 minutes for questions

We begin with discussion on modeling of process variations






### For example

- Measure the thickness of the transistor gate dielectric at the 100nm technology generation
	- Suppose the gate dielectric is 2nm thick
	- Process tolerance is $\pm 5\%$ = 0.1nm
	- $P/T = 10\% = (6\,\sigma_{\text{measurement}}) / 0.1\,\text{nm}$
	- $\Rightarrow \sigma_{\text{measurement}} = 0.0017$nm
	- An atomic step on silicon is about 0.15nm!
- Direct measurement on some process parameters can be difficult
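
The arithmetic in this example can be checked with a short Python sketch. The function name `required_measurement_sigma` and its parameter names are illustrative and not part of the slides; P/T is taken as $6\sigma_{\text{measurement}}$ divided by the tolerance, as in the example.

```python
def required_measurement_sigma(nominal, tolerance_fraction, pt_ratio):
    """Return the measurement sigma implied by a precision-to-tolerance ratio.

    P/T is taken as (6 * sigma_measurement) / tolerance, as in the example.
    """
    tolerance = nominal * tolerance_fraction  # 2 nm * 5% = 0.1 nm
    return pt_ratio * tolerance / 6.0


sigma = required_measurement_sigma(nominal=2.0, tolerance_fraction=0.05, pt_ratio=0.10)
print(f"sigma_measurement = {sigma:.4f} nm")
```

The result is roughly two orders of magnitude smaller than the 0.15nm atomic step, which is the point the example makes about direct measurement.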


# Model-based measurement

- Each measurement method is based on a model that relates observed signals to values of variables being measured
- Model-based measurement alleviates the high precision requirement for measuring some process parameters directly
- Depending on the model and the algorithm used to extract values from the observed signals, various amounts of error can be introduced










# Channel length


- $L = L_m - \Delta L$
	- $L_m$: drawn channel length
	- $\Delta L$: difference between drawn and actual
	- The objective is to measure $\Delta L$
- Measuring $\Delta L$ is more complicated
	- Use the channel resistance method ($R_m$), by
	- Calculating $A = 1 / (\mu C_{ox} W (V_{gs} - V_{th}))$
	- At various $V_{gs}$ values
	- Intersect the different lines in the $R_m$ vs. $L_m$ plot
	- Use the intersection point to obtain $\Delta L$
- See Handbook of Silicon Semiconductor Metrology
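
A minimal sketch of the line-intersection step, assuming the idealized relation $R_m = A\,(L_m - \Delta L)$ with no series resistance, so that the lines for different $V_{gs}$ all cross at $L_m = \Delta L$. The data are synthetic, and every name and number below (`fit_line`, `extract_delta_l`, `K`, `V_TH`, the drawn lengths) is hypothetical.

```python
from itertools import combinations
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def extract_delta_l(drawn_lengths, resistance_by_vgs):
    """Estimate delta-L as the mean pairwise crossing of the R_m vs. L_m lines."""
    lines = [fit_line(drawn_lengths, rs) for rs in resistance_by_vgs.values()]
    crossings = []
    for (s1, b1), (s2, b2) in combinations(lines, 2):
        crossings.append((b2 - b1) / (s1 - s2))  # L_m where the two lines meet
    return mean(crossings)


# Synthetic data (hypothetical values): R_m = (L_m - delta_L) / (K * (V_gs - V_th))
TRUE_DELTA_L = 0.03  # um
K = 2.0e-3           # stands in for mu * C_ox * W, arbitrary units
V_TH = 0.5           # V
drawn = [0.18, 0.25, 0.35, 0.50]
data = {
    vgs: [(lm - TRUE_DELTA_L) / (K * (vgs - V_TH)) for lm in drawn]
    for vgs in (1.0, 1.4, 1.8)
}
delta_l = extract_delta_l(drawn, data)
print(f"extracted delta_L = {delta_l:.4f} um")
```

With measured rather than synthetic data the pairwise crossings would not coincide exactly, and their spread gives a rough indication of the extraction error.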

# To summarize …

- MOSFET device model (here $L$ denotes the drawn channel length)
	- $I_{ds} = 0$ for $V_{gs} - V_{th} < 0$
	- $I_{ds} = \dfrac{\mu C_{ox} W}{L - \Delta L}\,(V_{gs} - V_{th} - 0.5\,V_{ds})\,V_{ds}$ (linear region)
	- $I_{ds} = \dfrac{\mu C_{ox} W}{2(L - \Delta L)}\,(V_{gs} - V_{th})^2$ (saturation region)
- Parameter space $P = \{W, \Delta L, V_{th}, \mu, C_{ox}\}$
	- They may not be directly measurable
	- They are to be inferred from measurements of $I_{ds}$, $V_{gs}$, and $V_{ds}$
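
The device model above can be written as a small Python function. This is a sketch: the function name and argument names are illustrative, and the boundary between the linear and saturation regions ($V_{ds} < V_{gs} - V_{th}$) is the usual textbook condition, assumed here because the slide does not state it.

```python
def drain_current(vgs, vds, w, l_drawn, delta_l, vth, mu, cox):
    """Simple MOSFET model with effective length (l_drawn - delta_l)."""
    overdrive = vgs - vth
    if overdrive <= 0.0:
        return 0.0  # cut-off
    k = mu * cox * w / (l_drawn - delta_l)
    if vds < overdrive:
        # Linear region
        return k * (overdrive - 0.5 * vds) * vds
    # Saturation region
    return 0.5 * k * overdrive ** 2


# Hypothetical parameter values, chosen only to exercise the function.
params = dict(w=1.0, l_drawn=0.18, delta_l=0.03, vth=0.5, mu=1.0, cox=1.0)
print(drain_current(vgs=1.5, vds=0.2, **params))
print(drain_current(vgs=1.5, vds=1.8, **params))
```

The two branches agree at $V_{ds} = V_{gs} - V_{th}$, so the model is continuous across the region boundary.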



- $i(v)$: the current-voltage measurements
	- This is a typical non-linear least-squares analysis
- Parameters in $P$ may NOT be independent
	- Previously, we assumed that they are independent
- For complex $M$, local minimization is done for each selected subset of parameters in $P$
- Derived $P$ values are subject to error $\varepsilon_p$
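
For reference, a non-linear least-squares extraction of this kind is commonly written as below. This is the generic textbook form, not a formula recovered from the slide; the index $k$, the estimate $\hat{P}$, and the pairing $(v_k, i_k)$ of applied voltages and measured currents are notation assumed here, with $M$ the device model.

$$
\hat{P} = \arg\min_{P} \sum_{k} \bigl( i_k - M(v_k; P) \bigr)^2
$$

The residual of this fit is one contribution to the error $\varepsilon_p$ in the derived parameter values.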




- If we treat each variable in $P$ as a random variable, we measure their means and sigmas
	- These random variables can be correlated!
	- This increases the difficulty of measurement
- One simple approach is to measure many devices individually
	- Because $\varepsilon_p$ is unknown, the statistics of $P$ can become questionable
	- Moreover, a complex model such as BSIM-3 has hundreds of parameters, many of which are hard to extract by measuring capacitance, current, and voltage
	- These increase the difficulty of variation extraction


- (Boning & Nassif 99)
- Observe that parameters are highly correlated
- The error $\varepsilon_p$ is not independent of the parameters
	- The parameters are eventually used to characterize the performance of a device
	- The error $\varepsilon_p$ will be propagated into error in this performance characterization
- The fact that the error $\varepsilon_p$ is not independent of the parameters increases the error in performance characterization




- Simple concept
	- Two Normal variations: $A = N(\mu_1, \sigma_1)$, $B = N(\mu_2, \sigma_2)$
	- Let $f = A + B$
	- $\sigma(f) = (\sigma_1^2 + \sigma_2^2)^{1/2}$ if $A$, $B$ are totally independent
	- $\sigma(f) = \sigma_1 + \sigma_2$ if $A$, $B$ are 100% correlated
- If $\varepsilon_p$ is independent of the parameters, we have
	- Performance $\mathbf{z} = f(P + \varepsilon_p)$
- If $\varepsilon_p$ is not independent of the parameters
	- Performance $\mathbf{z'} = f(P + \alpha P + \varepsilon_{\text{random}})$
	- Where $\varepsilon_{\text{random}}$ is independent of the parameters
	- Then, we should have variance($\mathbf{z'}$) > variance($\mathbf{z}$)
- This concept is general
	- We will come back to this again in the section on statistical timing analysis
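
The two cases of the simple concept are the endpoints of the general formula $\sigma(f)^2 = \sigma_1^2 + \sigma_2^2 + 2\rho\,\sigma_1\sigma_2$ for a sum of two jointly Normal variables. A small Python sketch follows; the correlation coefficient `rho` and the function name are introduced here for illustration.

```python
import math


def sigma_of_sum(sigma1, sigma2, rho):
    """Standard deviation of A + B for correlation coefficient rho."""
    return math.sqrt(sigma1 ** 2 + sigma2 ** 2 + 2.0 * rho * sigma1 * sigma2)


independent = sigma_of_sum(3.0, 4.0, rho=0.0)  # root-sum-square of the sigmas
correlated = sigma_of_sum(3.0, 4.0, rho=1.0)   # plain sum of the sigmas
print(independent, correlated)
```

Any positive correlation therefore gives a larger spread than the independent case, which is the effect described for $\mathbf{z'}$ versus $\mathbf{z}$.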









# Process variations (Boning & Nassif 99)

- Process variations can be classified as
	- Variation in geometry
	- Variation in material
	- Variation in electrical property
- They can also be classified as
	- Device variation
	- Interconnect variation






#### Device/geometry (Boning & Nassif 99)

- Film thickness variation
	- Gate oxide thickness is critical
	- Usually well-controlled
- Lateral dimension (length, width)
	- Typically due to photolithography proximity effects
		- Systematic, pattern dependent
	- Due to mask, lens, or photo system deviations
		- Not layout dependent
	- Due to plasma etch dependencies
		- Can have wafer-scale dependency, or depend on layout density and aspect ratio (L/W)
- MOSFETs are sensitive to
	- Channel length $L$, $t_{ox}$, and, to some extent, $W$
	- $L$ variation has received attention due to its direct impact on output current characteristics (discussed later)

#### Device/material (Boning & Nassif 99)

- Doping variation
	- Due to dose, energy, angle, or other ion implant dependencies
	- Affects junction depth and dopant profiles
	- Hence, affects effective channel length $L_{eff}$
	- Also affects $V_{th}$
- Variation in deposition and anneal processes
	- Suffer substantial wafer-to-wafer and within-wafer variations
	- May result in large device-to-device random variation
	- Impact contact and line resistance


#### Device/Electrical (Boning & Nassif 99)

- Vth variation
	- Often due to oxide thickness, geometry variations, and other sources
	- It is characterized separately because of its importance
- Discrete dopant variation
	- Random placement and concentration fluctuation due to discrete location of dopant atoms in the channel and S/D
	- Studies show that it is not a severe problem for logic, but it may affect SRAM containing a large number of devices that should be well matched
	- Also causes Vth variation
- Leakage current
	- Sub-threshold leakage currents can vary significantly


#### Interconnect/geometry (Boning & Nassif 99)

- Line width and space
	- Mainly photolithography and etch dependencies
	- Directly induce line resistance variation
	- Also cause capacitance variation within layer and across layers
	- Affect signal integrity analysis
- Metal thickness
	- Is usually well controlled in conventional process
	- Can have wafer-to-wafer and within-wafer variations
	- Copper polishing process can result in thickness loss of 10-20% depending on the patterns
- Dielectric thickness
	- Can have substantial variations
	- At wafer level, typically on the order of 5%
	- Within-die can have pattern-dependent variation due to processes such as CMP
- Contact and via size
	- Affected by etch process and systematic layer thickness variation
	- Directly impact contact and via resistance




- Contact and via resistance
	- Sensitive to etch and clean processes
	- Substantial wafer-to-wafer variation
- Metal resistivity
	- Usually well controlled; varies wafer to wafer
- Dielectric constant
	- Depends on the deposition process
	- Is usually well controlled
	- Pattern-dependent variation may be important for low-K dielectrics in interconnect



- Variations have been there for a long time
	- People have studied process variations for a long time
	- Historically, analog designs are much more sensitive to process variations than logic
		- E.g., mismatch issue in two devices
		- See *Statistical modeling of device mismatch*, Michael, C.; Ismail, M.; IEEE Journal of Solid-State Circuits, Volume 27, Issue 2, Feb. 1992
- The studies of process variations
	- Primarily for the control of process quality
	- Diagnose unusual equipment disturbances
	- Diagnose unusual environmental fluctuations














































# Study : Gate CD variability on delay

- See M. Orshansky et al., 2002 TCAD, 2004 TSM
- Highlights
	- Study Lgate variability in 0.18$\mu$m technology
	- Development of test chips
		- Consider density and orientation
	- Consider impact on clock tree, cell delay, path delay, and circuit delay
	- Consider sampling resolution, sampling location, as well as optical proximity correction
- Conclude
	- CD variability is pattern dependent (density and orientation)
	- Intra-die CD variation is largely systematic
	- Cell delays vary as much as 17% among different locations
	- Clock skew varies as much as 8% of clock cycle (74ps)
	- Circuit delay degrades as much as 20%
	- Mask-level spatial gate OPC should be employed
		- OPC that takes spatial gate information into account performs better than the traditional OPC approach

### Study : variability on clock skew

- Source: [IEDM'98] S.R. Nassif, Within-Chip Variability Analysis
- Highlights
	- Based on 0.25µm technology
	- Study intra-die variability
	- Channel length variability ±0.035 µm
	- Wire width variability ±0.25 µm
	- Worst-case skew from wire-width variation: 48.9 ps
	- Worst-case skew from channel-length variation: 171.5 ps

*[Figure panels: Channel lengths; Wire widths]*


#### Study : Pattern-dependent variation on delay

- Source: V. Mehrotra et al., DAC 2000, 172-175
- Highlights
	- Study delay variation in both aluminum and copper (0.60 $\mu$m metal and ILD thickness)
	- Study clock skew in 0.25 $\mu$m technology
	- Study pattern-dependent effects such as density to ILD thickness, dishing and erosion in CMP
- Conclude
	- Models for systematic variations are required for accurate simulation of circuit performance
	- Interconnect CMP variation can increase bus delay by more than 30% even in copper technology
	- Clock skew is not strongly impacted by interconnect CMP variation
	- Variation in device gate length can significantly alter path delays, with an increase in maximum skew of about 50ps


#### Other studies

- Variation in Vth
	- M. Niewczas, IEEE ICMTS, 1997
		- Focus on test structures to study Vth
	- T. Tanaka et al., IEDM 2000
		- Focus on variation in dopant profile
- Variation in gate line edge roughness
	- S. Xiong et al., IEEE Tran. Semi Manu. 2004
	- A. Asenov et al., IEEE Tran. Elec. Device, 2003
	- Roughness is not an issue today
	- May affect leakage current due to short channel effect as technology scales
- Circuit sensitivity to interconnect variation
	- Z. Lin et al., IEEE Tran. on Semi Manu. 1998
	- Interconnect is hard to characterize and model
- Sub-wavelength lithography
	- A. Kahng and Y.C. Pati, DAC 1999
	- Conclude the importance of OPC and the need for more effective OPC algorithms
- And many others …

Myth

*[Figure; labels as extracted: Design & optimization, RC extraction analysis, Delay calculation, Noise, Power grid, clock, Test delivery, TCAD, Timing analysis, SPICE, Observed chip variability, Test generation, Tester &]*
















# Timing Macro-modeling

- Objective: Creating reduced models at transistor level, gate level, or cell level to support fast timing simulation
	- Treat SPICE simulation as golden
	- At transistor level, support path-based timing analysis
	- At gate/cell level, support full-chip analysis
	- Usually >100x faster than SPICE


## Brief History – cell modeling

- $1\mu$: delay = f (C)
	- Capacitance load is the dominating factor to decide delay
	- Lumped capacitance model (from other gates)
	- Ignore slew
	- Devices dominate delay; ignore interconnect R
- $1\mu$ - $.5\mu$: delay = f (C, input slew, lumped RC)
	- Slew considered
	- Lumped RC model at gate output
- $\lt .5\mu$: delay = f (C, input slew, RC) + g (distributed RC)
	- Interconnect delay addressed with distributed RC
	- Parasitic (RC) extraction is needed
	- Interconnect loading on gates studied


#### Two basic approaches

- K factor model
	- Similar to tabular approach
	- For each load and slew, find delay value
	- Lumped output capacitance cannot model load accurately
		- "Modeling the 'Effective Capacitance' for RC Interconnect of CMOS Gates", Qian, Pullela, Pillage, TCAD, Dec 1994 (>100 citations)
		- Map complex RC load into effective capacitance
		- Later, R. Arunachalam, F. Dartu, L. Pileggi, ICCD '97, developed a method to map RCL load into effective capacitance
- Switch resistor model
	- Empirically fit the resistor value for each load
		- Store resistor values, rather than delay values
	- More accurate when load is not purely capacitive

*[Figure: circuit with labels Re and Cl]*


#### Table driven approach

- Advantages:
	- Much faster STA than using complex equations
- Disadvantages:
	- Require large amount of memory
		- Usually (slew vs. load) is stored in a table from 5x5 up to 9x9
	- Temperature/Voltage
		- The method of applying a degrading factor ∆ is inaccurate
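
A table-driven delay lookup can be sketched as below. Bilinear interpolation between the four surrounding table entries is one common choice and is assumed here; the slide does not say which interpolation scheme is used. The table values, axis units, and function names are all hypothetical.

```python
from bisect import bisect_right


def _bracket(axis, value):
    """Index i of the axis interval [axis[i], axis[i+1]] used for value.

    Values outside the table are clamped to the first or last interval,
    which makes the lookup extrapolate linearly there.
    """
    i = bisect_right(axis, value) - 1
    return max(0, min(i, len(axis) - 2))


def lookup_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a (slew x load) delay table."""
    i = _bracket(slews, slew)
    j = _bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    d00, d01 = table[i][j], table[i][j + 1]
    d10, d11 = table[i + 1][j], table[i + 1][j + 1]
    return (d00 * (1 - ts) * (1 - tl) + d01 * (1 - ts) * tl
            + d10 * ts * (1 - tl) + d11 * ts * tl)


# Hypothetical 3x3 characterization table: rows are input slews (ns),
# columns are output loads (fF), entries are delays (ps).
SLEWS = [0.1, 0.2, 0.4]
LOADS = [1.0, 2.0, 4.0]
TABLE = [
    [35.0, 45.0, 65.0],
    [40.0, 50.0, 70.0],
    [50.0, 60.0, 80.0],
]
delay = lookup_delay(SLEWS, LOADS, TABLE, slew=0.3, load=3.0)
print(f"delay = {delay:.1f} ps")
```

A 5x5 to 9x9 table per timing arc, multiplied over every cell and arc in a library, is where the memory cost noted above comes from.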


#### Various enhancements

- F. Dartu, N. Menezes, J. Qian and L. Pileggi, DAC '94
	- Replace switch with piecewise linear voltage source (in a switch resistor model)
	- Empirical gate delay model proposed for complex RC loading (impedance)
	- Address 2nd-order effect
- Hayes and White, 1997, 10th IEEE ASIC conference
	- Demonstrates that applying a voltage/temperature multiplicative degrading factor is inaccurate
	- For example, we characterize cells at 1v
	- If 1.1v, we just multiply by a ∆ (before 97)
	- Proposes additive correction factor: if 1.1v, we add a ∆
- A. Korshak, J.C. Lee, 2001 ISQED
	- Use a current-resistor-capacitance model to match I, R, C to known timing data
- Shao et al., 2003, ISPD
	- Second-order circuit model - not dependent on load!
	- Gate can be independently pre-characterized

### Low-level macro-modeling

- Fully mathematical analysis of gate structure
	- High complexity
	- Based on actual device equations
- Table driven/Empirical equation
	- Similar to STA cell modeling
	- Extensive pre-simulation required
	- Divide switching behavior into several regions; model different regions with different equations
- Map CMOS gates to circuit primitives
	- Usually map to inverters
	- Macro-modeling other structures with the primitives




#### Interconnect RC (capacitance extraction)

- 2D extraction
	- Consider area overlap between 2 layers (area C), side wall in the same layer (side C), and side wall to the adjacent layers (fringing C)
	- The relationships relating geometry to C are characterized by the fab
	- Commonly used approach (can be implemented as a rule-based tool)
	- Practical for worst-case STA, even though it is not accurate
- 2.5D extraction
	- Consider more layers and, within a layer, the distance between wires
	- Pre-characterize unit region based on possible patterns and develop library
	- Commonly used for high-performance designs
- 3D extraction
	- Most accurate but expensive
	- Boundary element method (BEM), finite element method, Monte Carlo method
	- Often applied at package or in characterization of patterns in the 2.5D method
- Not many people worry about RC extraction with variations today
	- Further studies are required in this area




Next, we will switch topic to Statistical timing analysis
















- "Statistical Timing Analysis Considering Spatial Correlations Using a Single PERT-like Traversal", Chang et al.
	- Presented at ICCAD'03 also
- "First-order Incremental Block-Based Statistical Timing Analysis", Visweswariah et al.
	- Won Best Paper Award at DAC '04
- Message at DAC05:
	- Statistical timing analysis is a hot topic!





- Three key concepts
	- Delays are represented as CDF, rather than PDF
	- CDF can be characterized as piece-wise linear
		- 3 points, 5 points, 7 points
	- Reconvergent fanouts are handled by
		- Delay subtraction
		- Mean and variance moment matching
- Three key conclusions
	- CDF is easier to handle, more efficient
		- We have verified this claim independently
	- Handling re-convergent fanouts is not a critical issue
		- We also have verified this claim independently
	- The accuracies of using 3, 5, and 7 points are similar, but the run-times are proportionally longer
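
One reason a CDF representation is convenient is that the latest of two independent arrival times has CDF $F_{\max}(t) = F_A(t)\,F_B(t)$. The sketch below illustrates this with piece-wise linear CDFs stored as a few `(time, probability)` points. It is an illustration under the independence assumption only; it is not the algorithm of the paper summarized above, and it does not reduce the result back to a fixed 3, 5, or 7 points.

```python
def cdf_at(points, t):
    """Evaluate a piece-wise linear CDF given as sorted (time, prob) points."""
    if t <= points[0][0]:
        return points[0][1]
    if t >= points[-1][0]:
        return points[-1][1]
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return p0 + (p1 - p0) * (t - t0) / (t1 - t0)
    return points[-1][1]


def max_cdf(a, b):
    """CDF of max(A, B) for independent A and B.

    The exact product of two linear pieces is quadratic; sampling it at the
    union of the breakpoints gives a piece-wise linear approximation.
    """
    times = sorted({t for t, _ in a} | {t for t, _ in b})
    return [(t, cdf_at(a, t) * cdf_at(b, t)) for t in times]


# Two hypothetical 3-point arrival-time CDFs (time in ns).
arrival_a = [(1.0, 0.0), (2.0, 0.5), (3.0, 1.0)]
arrival_b = [(2.0, 0.0), (3.0, 0.5), (4.0, 1.0)]
latest = max_cdf(arrival_a, arrival_b)
print(latest)
```

With a PDF representation the same operation needs an integration, which is part of why working directly with CDFs is cheaper.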














### Hard case

- They propose a heuristic to handle more complicated re-convergence situations
	- Keep a dependency list for every node (re-convergent sources)
	- Keep reducing the list to 1 node so that the simple case formulation can be applied (the mean/variance matching)
	- More like the super-gate idea


### Performance impact




Source: ICCAD03 paper



### Accuracy in general

- Handling re-convergent fan-outs seems to be unnecessary if our focus is on the worst-case bound
- Without handling re-convergent fan-outs, we can save from 10 to 33% of run time

Source: ICCAD03 paper

#### IBM: Parameterized Block-Based SSTA (DAC04)

- Path-based analysis
	- Select a set of paths first and analyze those paths only (guard-band)
	- The problem is simpler (n×n correlation matrix)
- Block-based analysis
	- Like breadth-first search (level-by-level analysis)
	- Analyze the timing graph
	- Unlike the EPA approach, they define a *canonical delay* form and propagate this form through the circuit
		- EPA propagates *Probabilistic Events*
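
A minimal sketch of what propagating a canonical form can look like, assuming the first-order form $a_0 + \sum_i a_i X_i + a_r R$, where the $X_i$ are unit-normal variation sources shared across the chip and $R$ is a unit-normal source private to each delay. The class and function names are illustrative. Only the sum operation is shown; the max operation, which needs an approximation to stay in canonical form, is omitted.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Canonical:
    """First-order delay: mean + sum(sens[i] * X_i) + rand * R."""
    mean: float
    sens: tuple   # sensitivities to the shared sources X_i
    rand: float   # sensitivity to the private source R

    def sigma(self):
        return math.sqrt(sum(a * a for a in self.sens) + self.rand ** 2)


def add(a, b):
    """Sum of two canonical delays (e.g. arrival time plus edge delay)."""
    sens = tuple(x + y for x, y in zip(a.sens, b.sens))
    # Private random parts are independent, so they add in quadrature.
    return Canonical(a.mean + b.mean, sens, math.hypot(a.rand, b.rand))


def covariance(a, b):
    """Covariance of two canonical delays; only shared sources contribute."""
    return sum(x * y for x, y in zip(a.sens, b.sens))


arrival = Canonical(mean=100.0, sens=(3.0, 0.0), rand=4.0)
edge = Canonical(mean=20.0, sens=(1.0, 2.0), rand=3.0)
total = add(arrival, edge)
print(total.mean, total.sens, round(total.sigma(), 3))
```

Because every delay is expressed in the same shared sources, correlation between any two timing quantities falls out of the sensitivities, with no separate correlation matrix to maintain.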






# Computational overhead

- Run time overhead
	- About 20% on batch operation
	- About 50% on the actual arrival time propagation
- Memory overhead
	- About 100%, depending on the number of sources of variation and complexity of the models
- Capacity
	- Able to analyze 2M+ gate ASIC chips on 64-bit machines


#### Comparison experiments

- In order to compare the two approaches
	- We implemented (to the best of our knowledge) PWL and canonical methods for SSTA
	- We also implemented just STA
	- Apply with our 0.25$\mu$m cell library
	- Comparison at 3$\sigma$ worst-case delay point
	- Comparison at mean delay point
	- Use Monte-Carlo analysis output as golden answer
- We artificially vary pin-to-pin variations from $\pm k\%$ to $\pm 5k\%$
	- To assess the situations when variations increase

# Comparison

#### 3-sigma error vs Monte-Carlo



5x variance






# Run-time comparison

• For the two larger circuits (seconds):




### Summary

- PWL pros:
	- Very fast
	- Can support arbitrary distribution (non-Gaussian)
	- Variable accuracy
- PWL cons:
	- Correlations cause a lot of difficulty; spatial correlations may be hard to model and handle
	- Mean delay calculation may be inaccurate
- Canonical pros:
	- Reasonably fast
	- Accurate
	- Naturally handles all sorts of correlations well (if model is available)
- Canonical cons:
	- Can be slow due to correlation handling
	- Assumes Gaussian distributions

# Some SSTA works at DAC 05

- Hongliang Chang et al.
	- Canonical representation for non-linear, non-Gaussian parameters
- Yaping Zhan et al.
	- Correlation-aware, non-Gaussian distributions
- Lizheng Zhang et al.
	- Correlation-preserved, non-Gaussian distribution with quadratic timing model
- Aseem Agarwal et al.
	- Statistical gate sizing with SSTA
- Vishal Khandelwal et al.
	- Taylor-expansion polynomial-representation based SSTA















- The simplified SSTA was applied to (in the DAC 05 paper)
	- A large microprocessor block (> 100K cells)
	- Based on 90nm technology
	- Analyze 492 most critical paths
- Error in computing the standard deviation of the margin is on average only 0.19% of path delay
- Only a few paths show up as the most critical paths on 600 samples
- Ordering among paths, decided by a fixed-value STA, is not altered much by either random or systematic variations
	- Random variations die out
	- Systematic variations make paths within a block track each other well







- Target on stages after Static Timing Analysis, before tape-out
- What the tool does: Given a 2-timeframe pattern, estimate its delay distribution as **(mean, σ)** based on a given timing model
	- Benjamin Lee et al., VTS05, ITC05
- Among many challenges, one difficulty lies in the fact that a pattern may sensitize different sets of paths on different dies
	- **Hazards** may be present on one die but not another
	- Overall delay distribution becomes multi-modal
- Let's look at the Monte Carlo simulation results …










# Run times of Pattern-based STA (seconds)



PB-STA is only **2-6** times slower than fixed-delay


























# General approach - filtering

- Because MIS may not occur often, we usually take a filtering approach to rule out gates or cells for which MIS cannot happen
	- For the remaining gates and cells, we assume the worst
- Filtering methods
	- Filtering based on timing windows from STA
		- If the timing windows of two signals do not overlap at all, we say that MIS cannot happen for these two signals
		- We need to pursue an iterative algorithm until STA results converge, because if timing windows do overlap, we need to change the gate's output delay and propagate the change to all downstream gates whose delays are affected
	- Filtering based on logic constraints
		- This is a typical ATPG problem
- Adding statistical process variations in the analysis
	- See Agarwal, A.; Dartu, F.; Blaauw, D.; DAC 04, pages 658-663
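
The timing-window filter can be sketched as a simple interval-overlap test. In the hypothetical data layout below, each gate maps to a list of `(earliest, latest)` arrival windows, one per input, as they might come out of STA. The sketch shows a single filtering pass only; the iteration to convergence described above is not included.

```python
from itertools import combinations


def windows_overlap(w1, w2):
    """True if two (earliest, latest) arrival windows share any instant."""
    return w1[0] <= w2[1] and w2[0] <= w1[1]


def mis_candidates(input_windows):
    """Return the gates that survive the timing-window filter."""
    return [
        gate
        for gate, windows in input_windows.items()
        if any(windows_overlap(a, b) for a, b in combinations(windows, 2))
    ]


# Hypothetical arrival windows in ns.
windows = {
    "g1": [(0.0, 1.0), (2.0, 3.0)],  # disjoint: MIS ruled out
    "g2": [(0.0, 1.5), (1.0, 2.0)],  # overlapping: keep and assume the worst
}
print(mis_candidates(windows))
```

Gates returned by `mis_candidates` would then have their delays adjusted, which shifts the downstream windows and is what forces the iteration.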








# Other models

- T. Sakurai, TED 1993
	- Derives closed-form equations to model the waveform of an RC line
- J. Qian, S. Pullela, L. Pillage, TCAD 1994
	- Derive new model for effective capacitance, because others have ±10% error, and optimism is generally unacceptable
	- Introduce $\pi$-model to separate the capacitive element into 2 elements, one before and one after the resistor
- H. Kawaguchi, T. Sakurai, ASP-DAC 1998
	- n-line coupling capacitance equations without victim and aggressor relationship
- A. Kahng, S. Muddu, and D. Vidhani, ASIC/SOC 1999
	- Extend $\pi$-model by separating the resistive element into 2 elements, one before the $\pi$, and one in the $\pi$
	- Done to reduce the over-pessimism and over-optimism of SF












- abstracted to worst-case bounds of voltages
- So the idea is (1) extract power map (2) STA with the map

#### Power grid analysis

- Model power grid as an RLC network
	- Circuit abstracted into time-varying piecewise-linear current sources
	- Simulate circuit with the ideal power grid to obtain current profile
- Modified Nodal Analysis (MNA) used to solve for power grid node voltages
- Converts the problem into solving a sparse, symmetric-positive-definite linear system
	- $G\,x(t) + C\,\partial x(t)/\partial t = b(t)$
	- $G$: conductance matrix
	- $C$: admittance matrix due to C, L
	- $x(t)$: time-varying vector of voltages at nodes
	- $b(t)$: time-varying current sources
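
A minimal sketch of how $G\,x + C\,\partial x/\partial t = b$ can be stepped in time, assuming backward-Euler discretization: $(G + C/h)\,x_{n+1} = b(t_{n+1}) + (C/h)\,x_n$. The slides do not say which integration scheme is used, so this is one possible choice. The two-node RC grid, the constant current draw, and all names are hypothetical; $x$ is taken here as the voltage drop below the ideal supply, so the supply pad acts as the reference. A small dense solver stands in for the sparse solvers a real tool would use.

```python
def solve(a, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def backward_euler(g, c, b_of_t, x0, h, steps):
    """Integrate G x + C dx/dt = b(t) with fixed step h; return the final x."""
    n = len(x0)
    lhs = [[g[i][j] + c[i][j] / h for j in range(n)] for i in range(n)]
    x, t = list(x0), 0.0
    for _ in range(steps):
        t += h
        b = b_of_t(t)
        rhs = [b[i] + sum(c[i][j] * x[j] for j in range(n)) / h for i in range(n)]
        x = solve(lhs, rhs)
    return x


# Pad --1 ohm-- node 0 --1 ohm-- node 1, 1 nF to ground at each node,
# and a constant 0.1 A drawn at node 1.
G = [[2.0, -1.0], [-1.0, 1.0]]
C = [[1e-9, 0.0], [0.0, 1e-9]]
drops = backward_euler(G, C, lambda t: [0.0, 0.1], [0.0, 0.0], h=1e-10, steps=1000)
print([round(v, 4) for v in drops])
```

After the transient settles, the drops approach the static IR-drop solution of $G\,x = b$, here 0.1 V at node 0 and 0.2 V at node 1.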


## IR drop and dI/dt noise

- IR drop
	- Usually refers to decrease/increase in power/ground rail voltage due to resistance of devices between rail and a node of interest
	- Common practice is to budget a maximum tolerable static voltage drop per rail
	- Static IR drop can be calculated from extracted parasitics / average power consumption (DC analysis)
	- Dynamic IR drop requires vector-based analysis
- dI/dt noise
	- Inductive dI/dt noise used to occur mostly on package
	- On-chip interconnect's impedance is no longer ignorable due to higher frequencies
	- Change in current (dI)
		- Simultaneous switching, big current swing

#### Various studies

- H. Kriplani, F.N. Najm, I.N. Hajj, IEEE TCAD '95
	- Linear time algorithm: finds upper-bound estimate of current waveforms at all contact points
- H.H. Chen and David Ling, DAC '97 (cited by 111)
	- Describes models used for power bus / switching circuits / decoupling capacitors
- H.H. Chen and J.S. Neely, IEEE Transactions on Components, Packaging and Manufacturing Technology, Aug 1998
	- Analyze IR drop and inductive dI/dt noise
	- Notes: worst-case dI noise and worst-case IR drop do not occur at the same time
	- Power-supply distribution model
	- Switching-circuit model

#### Various studies

- Yi-Min Jiang, K-T Cheng, An-Chang Deng, ISLPED 98
	- Genetic-algorithm approach to generate patterns
	- Estimate IR drop and dI noise based on charge/discharge current cell library
- Yi-Min Jiang, K-T Cheng, DAC '99
	- Statistical model derived by simulating characterization patterns
		- Use GA search to find patterns (last paper)
		- Find average voltage for each cell for each pattern - average voltages form distribution
- A. Dharchoudhury, et al., DAC 98 (based on PowerPC)
	- Describes methodology for power supply design/analysis
	- IR-drop analysis is discussed
		- Transistor level is infeasible
		- OTS blocks (standard cells) macro-modeled as current source
		- Each block has an IR-drop budget (voltage drop)
		- If budget violated, power grid that supplies block is augmented
- P. Larsson, IEEE Custom Int. Circuits Conf 1999
	- Describes noise suppression techniques
	- Makes some predictions for the future based on process parameters

# Various studies

- Sani Nassif, Joseph Kozhaya, DAC 2000 (fast simulation)
	- PDE-like multi-grid method for simulation of power grid (computation-wise, not macro-modeling)
	- Circuit abstracted as time-varying current sources
	- Grid-reduction technique
- M. Zhao, et al., DAC 2000 (hierarchical analysis)
	- Difficulties in power network analysis:
		- Network is huge, typically 1-100 million nodes
			- Sparse linear system solution methods: conjugate gradient
		- Network is nonlinear due to switching devices
			- Solution: simulate individual blocks without power network, then simulate power network using time-variant current profiles
	- Speed-up proposed:
		- Macro-model local power grids
- J. Saxena, K. Butler, V. Jayaram, et al., ITC 2003
	- Structural tests have a lot of switching activity
		- Worst-case scenario for IR drop
	- In the chips analyzed, increased switching activity during structural test induced IR drop that caused failures

#### Various studies

- D. Kouroussis, Rubil Ahmadi, Farid Najm, DAC 2004 <sup>¾</sup>Abstract circuit in terms of current constraints (peak current constraint)
	- ¾Use a upper/lower bound of supply variation
	- $\triangleright$  Extract critical paths
	- $\triangleright$  Verify that voltage of critical paths are within bounds
	- ¾Solve for max. delay of paths given current constraints

- Jing Wang, et al, VTS '05
	- Power region model
		- Assume supply voltage within a region is uniform
		- On-chip Ldi/dt drop is neglected
	- Switching model
		- Triangle/trapezoid current model
		- Gates see constant average Vdd
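
One way to read the power region and switching models together is sketched below. This is an assumption about how the pieces combine, not the formulation of the VTS '05 paper: each switching event is a trapezoidal current pulse (a triangle when the flat time is zero), the charge of all pulses in a region is averaged over one clock period, and the resulting average current through a lumped grid resistance gives the constant average Vdd that the gates see. The function names and the lumped resistance `r_grid` are hypothetical.

```python
def trapezoid_charge(i_peak, t_rise, t_flat, t_fall):
    """Charge (area) under a trapezoidal current pulse; t_flat = 0 gives a triangle."""
    return i_peak * (0.5 * t_rise + t_flat + 0.5 * t_fall)


def region_average_vdd(vdd_nominal, r_grid, pulses, period):
    """Average supply voltage of one region, ignoring L di/dt.

    pulses is a list of (i_peak, t_rise, t_flat, t_fall) tuples, one per
    switching event inside the region during one clock period.
    """
    total_charge = sum(trapezoid_charge(*pulse) for pulse in pulses)
    i_avg = total_charge / period        # average current drawn by the region
    return vdd_nominal - r_grid * i_avg  # resistive (IR) drop only


# Hypothetical numbers: one 1 mA triangular pulse, 100 ps rise and fall,
# 1 ns clock period, 10 ohm lumped grid resistance, 1.2 V nominal supply.
vdd_seen = region_average_vdd(1.2, 10.0, [(1e-3, 1e-10, 0.0, 1e-10)], 1e-9)
```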




#### Study: correlating structure test to functional test

- Motivations
	- Examine the correlation between the frequencies measured using various structural tests and functional tests
	- Investigate structural testing as an option for speed binning
		- Reduce tester cost for speed binning
	- Reduce the cost of testing delay defects


# Structural Testing

- Structural testing provides an attractive complementary/alternative solution
	- Relaxed speed and accuracy requirements on the external pins
	- Number of high-performance tester channels is minimized
	- Low-cost testers can be used
	- Easier debugging
	- Can achieve high fault coverage

# Previous Work

- Earlier studies showed poor correlation due to the lack of coverage of paths around memories (Belete et al, ITC 2001).
- Cory et al, IEEE Design & Test, 9-10/2003, found a linear relationship between the frequencies of the functional and latch-to-latch path delay tests.
- We could not duplicate the D&T 2003 result for high-performance designs (>1 GHz).

 $\overline{a}$ UCSB (for pri

# Types of Structural Tests

- At-speed memory BIST test
- Transition tests:
	- Simple transition tests: transition tests w/o going through memories.
	- Complex transition tests: transition tests going through memories.
- Path delay tests:
	- Simple path delay tests: latch-to-latch path delay tests.
	- Complex path delay tests: path delay tests involving memories or cycle-stealing paths.


# Chip Used for Experimentation

- MPC7455 microprocessor implementing the PowerPC™ instruction set architecture



# Structural Tests Used

- Simple transition tests: 13K with 70% fault coverage
- Complex transition tests: 12K with 78% fault coverage
- Path delay tests: top 2490 critical timing paths
	- Latch-to-latch paths: 1463
	- Memory paths: 91
	- Cycle-stealing paths: 231
	- Misc. paths, like clock or pre-charge paths: 700



# Path Delay Test Coverage















# Trend Analysis

- Complex transition test provided the closest match to Fmax (on average) both at probe and at final.
- Simple path test was faster than Fmax
	- 19.44% faster during packaged test
	- 9.28% faster during probe test
- Complex path test (compared to Fmax) was
	- 3% faster during packaged test
	- 8% slower during probe test
- ABIST test frequencies were relatively lower (by 2%) at probe than at packaged test
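
The percentages above compare a structural test frequency with the functional Fmax of the same part. A minimal sketch of that arithmetic follows, assuming the figure is the mean over parts of the per-part relative difference; the slides do not state the exact averaging used, and the frequencies below are made up for illustration.

```python
def percent_faster(f_structural, f_max):
    """Relative difference of a structural test frequency to Fmax, in percent.

    Positive means the structural test passes at a higher frequency than the
    functional test; negative means it is slower.
    """
    return (f_structural - f_max) / f_max * 100.0


def average_percent_faster(structural, functional):
    """Mean of the per-part relative differences over a set of parts."""
    diffs = [percent_faster(s, f) for s, f in zip(structural, functional)]
    return sum(diffs) / len(diffs)


# Hypothetical part: Fmax of 1000 MHz, simple path test passing up to 1194.4 MHz.
example = percent_faster(1194.4, 1000.0)
```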

# Result Analysis

- Possible explanations for the performance difference between the probe and package tests:
	- Wafer data collected from newer and faster parts relative to the ones used in the initial package test experiment
	- Electrical environment differences
	- Difference in cooling between wafer-probe and package tests

# Potential Test Escapes

- We analyzed the speed-limiting paths of several dies where the frequencies of structural tests were noticeably slower than Fmax.
- In 88% of the complex transition test cases, the speed-limiting paths were associated with complex memory transaction scenarios.
- That coincided with chips that passed functional tests but were failing in system tests associated with the same memory transactions. Investigation is ongoing.
- Analysis of fail data of other structural tests led to the identification of test-only paths.




# Speed Binning Results

- For each type of structural test, the corresponding average frequency was used as the cutoff frequency.




# Guard Band Effects

- Cut-off frequencies = average functional and structural frequencies
- Under-G: additional parts that go into the slow bin due to guard bands
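
The binning experiment can be sketched as follows. This is a hedged illustration, not the procedure used in the study: it assumes two bins (fast and slow), the average frequency as cutoff for both the functional and the structural test, and a guard band modeled as a fractional increase of the structural cutoff. The data and the name `speed_bin` are hypothetical.

```python
from statistics import mean


def speed_bin(fmax, tmax, guard_band=0.0):
    """Compare structural binning against functional binning.

    fmax and tmax hold the functional and structural frequencies of the same
    parts. Returns how many parts agree, how many escape (structurally fast
    but functionally slow), and how many are under-binned (structurally slow
    but functionally fast).
    """
    f_cut = mean(fmax)
    t_cut = mean(tmax) * (1.0 + guard_band)  # assumed guard band definition
    counts = {"agree": 0, "escape": 0, "under": 0}
    for f, t in zip(fmax, tmax):
        functional_fast = f >= f_cut
        structural_fast = t >= t_cut
        if functional_fast == structural_fast:
            counts["agree"] += 1
        elif structural_fast:
            counts["escape"] += 1
        else:
            counts["under"] += 1
    return counts


# Hypothetical frequencies in MHz for five parts.
fmax = [900, 950, 1000, 1050, 1100]
tmax = [980, 1040, 1000, 1050, 1030]
no_guard = speed_bin(fmax, tmax)
with_guard = speed_bin(fmax, tmax, guard_band=0.025)
```

With this made-up data the guard band removes the one escape but pushes one more functionally fast part into the slow bin, which is the trade-off the Under-G count captures.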




# Summary

- **Correlation between functional frequency and structural test frequencies is encouraging**
- **Complex transition tests give the best correlation to the functional frequencies**
- **Almost all the structural tests performed reasonably well in speed binning the parts**
- **The results clearly demonstrate the importance of including structural delay path going through the memory arrays**
- **The data also suggests that some test escapes can be screened by structural tests**


# Break

- 5 minutes for questions
- Next, we will continue the topic on other studies related to speed binning

# Timing Correlation of Pre-silicon & Post-silicon

Two studies:

1. Correlating pre-silicon critical paths to post-silicon speed paths
	- How many pre-silicon paths need to be tested in order to cover the top 10 speed paths?
2. Correlating structural testing frequency Tmax to functional testing frequency Fmax
	- Which structurally-tested paths can be used for speed binning (deciding fast vs. slow)?
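
The question in the first study can be phrased as a small computation. The sketch below assumes the pre-silicon tool produces a ranked list of path identifiers (most critical first) and that silicon measurement yields the set of top speed paths; it returns how deep into the pre-silicon ranking one must test to cover them all. The path names and the function `paths_needed` are hypothetical.

```python
def paths_needed(pre_ranked, post_top):
    """Smallest N such that the top-N pre-silicon paths cover all of post_top.

    Returns None when some silicon speed path is absent from the pre-silicon
    ranking, so no N can cover the set.
    """
    targets = set(post_top)
    if not targets:
        return 0
    position = {path: rank for rank, path in enumerate(pre_ranked)}
    if any(path not in position for path in targets):
        return None
    return max(position[path] for path in targets) + 1


# Hypothetical ranking: the two silicon speed paths sit at ranks 2 and 4.
pre_silicon = ["p3", "p1", "p7", "p2", "p9"]
needed = paths_needed(pre_silicon, ["p1", "p2"])
```

A value of `needed` much larger than the number of silicon speed paths indicates poor agreement between the pre-silicon ranking and silicon.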












- 2. Issue of structural testing for speed binning
	- For high performance designs, correlation between Tmax and Fmax is not high enough






# Properties of most correlated paths to Fmax



- Ratio = Avg. Speedup relative to Fmax
- The correlation of an individual path to Fmax is higher than that of the whole path delay test set applied together.
- Most correlated path is 1.6x faster than Fmax
- Less correlated but slower paths mask out these more highly correlated paths







**Summary** 

- Post-silicon path delay tests can provide a wealth of information
	- Path ranking correlation metrics
	- Structural speed binning

Thank you

Reference: http://mtv.ece.ucsb.edu/TTEP/


# Acknowledgement

• Many people have directly and indirectly helped the making of this tutorial. Research for their invaluable insights regarding many current issues in timing analysis, DSM timing effects, variation modeling and process cha

• And thanks to many others who have directly or indirectly helped …
