Data mining and learning can be divided into several fields of study. The algorithms used in different areas are dramatically different, so to understand each field we need to study it separately. See the section below on how to become familiar with various data mining and learning techniques.
(still under construction ...) Last update:
09/27/2006 02:02:37 PM
In our group, we focus on applying various data mining and learning techniques to test- and verification-related applications. For example, we currently have two projects funded by the Semiconductor Research Corporation (SRC):
- Task ID 1173.001 - Statistical Timing Simulation Tools and Methodologies for Speed Test and Performance Validation (see "How can I begin to work on this research")
- Task ID 1360.001 - Simulation Data Mining for Functional Test Pattern Generation
Project 1360.001 just started on Dec 1, 2005. You can find the original proposal and related information here (username: src05, password: src051578).
This page was created to help people understand what mining and learning techniques are available and what potential these techniques have for applications in design and manufacturing. We divide the techniques into four categories: Association rule analysis, Computational learning, Machine learning, and Statistical learning, and discuss their applications separately. You can click on those links to reach the individual discussion pages and materials.
Historical perspective of how I got to this point:
If you wonder how I started to learn the subject of data mining and learning and began the research of applying those techniques to test and verification, here is a brief description of the path I followed:
- At the beginning (in 2002), I read the book "The Elements of Statistical Learning" by T. Hastie, R. Tibshirani, and J. H. Friedman (find this book at Amazon). I still consider it the most fundamental book for beginning to learn the subject.
- Then, I started to read the book "An Introduction to Support Vector Machines" by Nello Cristianini and John Shawe-Taylor (find this book at Amazon). SVMs and kernel methods have been very popular in various applications such as pattern recognition and image processing.
- At the 2004 Design Automation Conference, I wrote a paper using SVMs to analyze critical timing paths. I used the very nice SVM software Torch3 for the implementation. After that work, I realized that a technique such as SVM is not enough for solving most of the problems raised in test applications. For most problems in test, having a good learning model M from the data is not enough; often, to do meaningful work, we also need to compute the inverse model M⁻¹(). Computing the inverse of an SVM is difficult. Hence, at this point, I switched my focus to look for other techniques and applications.
- Because we need to invert a learned model, I began to wonder why we should bother with a sophisticated learning algorithm like SVM when we can use a simple one like the basic iterative algorithm in Rosenblatt's Perceptron (see Chap. 2 of An Introduction to Support Vector Machines). This inspired me to devise a path learning algorithm and write a paper for ICCAD 2004.
- After the two papers at DAC and ICCAD, I realized the limitations of the approach. In these two papers, what I did was formulate the problem so that an established learning technique could somehow be applied, and I realized that this approach would take me only so far. In order to take this line of research to another level, I needed to truly understand the essence of all learning techniques and become a researcher in data mining and learning as well. While continuing to look for inspiration, I found the excellent book "Information Theory, Inference, and Learning Algorithms" by David J. MacKay. Unlike the previous two books, which focus on statistical learning, this book covers the principles of learning broadly, especially the Bayesian perspective on learning.
- In parallel, at the beginning of 2005, my student Charles and I wrote a paper on learning simulation data for functional test pattern generation. This paper received some attention from the verification community and was accepted to IEEE Transactions on Computers.
- Building on the work in that paper, Charles and I began a sequence of studies in 2005. We then realized that for that line of research, what we needed most was to understand the fields of Association Rule Mining and Computational Learning. Therefore, in 2005, we focused our study on these two areas.
- In 2005, another major study I did was to understand the field of manufacturing and process modeling. I spent the entire summer studying those subjects and, at the end, compiled a tutorial (username: ttep, password: ttep1578).
- After that study, I realized which questions are the most important to answer in order to overcome the issues of process variations. I then began to formulate the new problems to be solved along that line of research, which I now call "statistical testing."
- At the end of 2005, we made significant progress in our research and wrote two papers. The first paper concerns the learning of Boolean functions; this work builds on what we learned from Computational learning theory and Association rule analysis. Ben wrote the second paper (which later appeared in DAC 06); it concerns the learning of spatial delay correlations due to process variations and applies Bayesian learning in the context of statistical timing analysis.
- At this point, we realized that there are many problems in test and verification that should be formulated as learning problems, and we are still in the process of doing so. We believe that data mining and statistical learning should be an essential part of research on the development of future design and manufacturing methodologies and tools. On the new SRC project web page (username: src, password: src1578), you can find the directions we are heading in.
How to become familiar with various data mining and learning techniques:
There is no shortcut to becoming familiar with a field. However, to learn the various algorithms in data mining and learning, it is important to run them yourself. In many books and in the public domain, people have published code implementing various algorithms. You need to download that code and try it out. The best way to pursue the study is to
- Read the important articles or books on the subject
- Understand the key algorithms and how they work, not necessarily why they work
- Search the Web for public code implementing those algorithms
- Try out those algorithms on some data sets to get a sense of why they work
For example, if you want to study Association rule analysis, obviously the Apriori algorithm is the place to start. You can find implementations of Apriori as stand-alone code or as Matlab code on a web site devoted to Association rule mining research.
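To see what such an implementation is doing before you download one, here is a minimal sketch of Apriori in Python; the toy transactions and the minimum-support count are made up for illustration and are not taken from any particular package.

from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori: return all itemsets contained in at least
    min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Frequent 1-itemsets
    items = {i for t in transactions for i in t}
    frequent = [{frozenset([i]) for i in items
                 if support(frozenset([i])) >= min_support}]

    k = 1
    while frequent[-1]:
        # Candidate (k+1)-itemsets: unions of frequent k-itemsets
        candidates = {a | b for a in frequent[-1] for b in frequent[-1]
                      if len(a | b) == k + 1}
        # Prune candidates with an infrequent k-subset, then count support
        next_level = {c for c in candidates
                      if all(frozenset(s) in frequent[-1]
                             for s in combinations(c, k))
                      and support(c) >= min_support}
        frequent.append(next_level)
        k += 1

    return [s for level in frequent for s in level]

# Toy example: items are single letters, minimum support is 2 transactions
data = [{'a', 'b', 'c'}, {'a', 'c'}, {'a', 'd'}, {'b', 'c', 'e'}]
for itemset in apriori(data, min_support=2):
    print(sorted(itemset))

The sketch follows the textbook structure of Apriori: grow candidate itemsets one item at a time and prune any candidate with an infrequent subset before counting its support in the data.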
I have mentioned above that if you are interested in SVMs, you should try out Torch.
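If you only want to see what an SVM is optimizing before installing a package such as Torch, the following from-scratch sketch (it does not use Torch's API) trains a linear SVM by stochastic subgradient descent on the regularized hinge loss; the toy data and hyperparameters are illustrative only.

import random

def train_linear_svm(points, labels, lam=0.01, epochs=200):
    """Linear SVM via stochastic subgradient descent on the
    regularized hinge loss (Pegasos-style updates)."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    t = 0
    for _ in range(epochs):
        order = list(range(len(points)))
        random.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)       # decreasing step size
            x, y = points[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Shrink the weights (regularization), then push on margin violations
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b

# Toy, linearly separable data with labels +1 / -1
X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -1.0), (-3.0, -2.0)]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
for x, yi in zip(X, y):
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    print(yi, round(score, 2))

A real package adds kernels, working-set optimization, and careful numerics, but the regularized hinge-loss objective it solves is the same.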
If you want to try out general statistical learning algorithms, such as Principal Component Analysis or Independent Component Analysis, you can use the R software package. Many algorithms are implemented as add-on packages to the R framework; you just need to search for them.
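R is the recommendation above; purely as an illustration of what PCA actually computes, here is a minimal sketch in Python (assuming only numpy is available), with made-up toy data.

import numpy as np

def pca(data, n_components=2):
    """Minimal PCA: project data onto the top principal components.

    data: (n_samples, n_features) array.
    Returns (projected data, components, explained variance).
    """
    # Center each feature at zero mean
    centered = data - data.mean(axis=0)
    # Covariance matrix of the features
    cov = np.cov(centered, rowvar=False)
    # Eigen-decomposition; eigh is appropriate for symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the eigenvectors with the largest eigenvalues (most variance)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    return centered @ components, components, eigvals[order]

# Toy data: 100 samples in 3 dimensions, first two features correlated
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
data = np.hstack([x, 2 * x + 0.1 * rng.normal(size=(100, 1)),
                  rng.normal(size=(100, 1))])
projected, components, variance = pca(data, n_components=2)
print(projected.shape, variance)

An R function such as prcomp performs the same centering and decomposition with better numerics and more options.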
If you are interested in kernel machines and pattern recognition, this is a popular area, and people have compiled a lot of information on this web site.