Data mining and learning can be divided into several fields of study. The algorithms used in different areas are dramatically different, so to understand each field we need to study it separately. See the section below on how to become familiar with various data mining and learning techniques.
(still under construction ...) Last update:
09/27/2006 02:02:37 PM
In our group, we focus on applying various data mining and learning techniques to test- and verification-related applications. For example, we currently have two projects funded by the Semiconductor Research Corporation (SRC):
- Task ID 1173.001 - Statistical Timing Simulation Tools and Methodologies for Speed Test and Performance Validation (see "How can I begin to work on this research")
- Task ID 1360.001 - Simulation Data Mining for Functional Test Pattern Generation
Project 1360.001 just started on Dec 1, 2005. You can find the original proposal and related information here (username: src05, password: src051578).
This page was created to help people understand what mining and learning techniques are available and what potential these techniques have for applications in design and manufacturing. We divide the techniques into four categories: Association rule analysis, Computational learning, Machine learning, and Statistical learning, and discuss their applications separately. You can click on those links to reach the individual discussion pages and materials.
Historical perspective of how I got to this point:
If you wonder how I started to learn the subject of data mining and learning and began the research of applying those techniques to test and verification, here is a brief description of the path I followed:
- At the beginning (in 2002), I read the book "The Elements of Statistical Learning" by T. Hastie, R. Tibshirani, and J. H. Friedman (find this book at Amazon). I still consider it the most fundamental book for beginning to learn the subject.
- Then, I started to read the book "An Introduction to Support Vector Machines" by Nello Cristianini and John Shawe-Taylor (find this book at Amazon). SVMs and kernel methods have been very popular in various applications such as pattern recognition and image processing.
- At the 2004 Design Automation Conference, I wrote a paper using SVMs to analyze critical timing paths. I used the very nice SVM software Torch3 for the implementation. After that work, I realized that a technique such as SVM is not enough for solving most of the problems raised in test applications. For most problems in test, having a good learning model M from the data is not enough; often, to do meaningful work, we also need to compute the inverse model M⁻¹(). Computing the inverse of an SVM is difficult. Hence, at this point, I switched my focus to look for other techniques and applications.
- Because we need to invert a learned model, I began to wonder why we should bother with a sophisticated learning algorithm like SVM when we can use a simple one like the basic iterative algorithm in Rosenblatt's Perceptron (see Chap. 2 of An Introduction to Support Vector Machines). This inspired me to devise a path learning algorithm and write a paper for ICCAD 2004.
- After the two papers at DAC and ICCAD, I realized the limitations of the approach. In these two papers, what I did was formulate the problem so that an established learning technique could somehow be applied, and I realized that this approach would take me only so far. In order to take this line of research to another level, I needed to truly understand the essence of all learning techniques and become a researcher in data mining and learning as well. While continuing to look for inspiration, I found the excellent book "Information Theory, Inference, and Learning Algorithms" by David J. MacKay. Unlike the previous two books, which focus on statistical learning, this book covers the principles of learning broadly, especially the Bayesian perspective on learning.
- In parallel, at the beginning of 2005, my student Charles and I wrote a paper on learning simulation data for functional test pattern generation. This paper received some attention from the verification community and was accepted to IEEE Transactions on Computers.
- Building on the work in that paper, Charles and I began a sequence of studies in 2005. We then realized that for that line of research, what we needed most was to understand the fields of Association Rule Mining and Computational Learning. Therefore, in 2005, we focused our study on these two areas.
- In 2005, another major study I did was to understand the field of manufacturing and process modeling. I spent the entire summer studying those subjects and, at the end, compiled a tutorial (username: ttep, password: ttep1578).
- After that study, I realized which questions are the most important to answer in order to overcome the issues of process variations. I then began to formulate the new problems to be solved along that line of research, which I now call "statistical testing."
- At the end of 2005, we made significant progress in our research and wrote two papers. The first paper concerns the learning of Boolean functions; this work builds on what we learned from Computational learning theory and Association rule analysis. Ben wrote the second paper (which later appeared in DAC 06); it concerns the learning of spatial delay correlations due to process variations and applies Bayesian learning in the context of statistical timing analysis.
- At this point, we realized that there are many problems in test and verification that should be formulated as learning problems, and we are still in the process of doing so. We believe that data mining and statistical learning should be an essential part of research on the development of future design and manufacturing methodologies and tools. On the new SRC project web page (username: src, password: src1578), you can find the directions we are heading in.
How to become familiar with various data mining and learning techniques:
There is no shortcut to becoming familiar with a field. However, to learn the various algorithms in data mining and learning, it is important to run them yourself. In many books and in the public domain, people have published code implementing various algorithms. You need to download that code and try it out. The best way to pursue the study is to
- Read the important articles or books on the subject
- Understand the key algorithms and how they work, not necessarily why they work
- Search the Web for public code implementing those algorithms
- Try out those algorithms on some data sets to get a sense of why they work
For example, if you want to study Association rule analysis, obviously the Apriori algorithm is the place to start. You can find implementations of Apriori as stand-alone code or as Matlab code on a web site devoted to Association rule mining research.
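To see what such an implementation is doing before you download one, here is a minimal sketch of Apriori in Python; the toy transactions and the minimum-support count are made up for illustration and are not taken from any particular package.

from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori: return all itemsets contained in at least
    min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Frequent 1-itemsets
    items = {i for t in transactions for i in t}
    frequent = [{frozenset([i]) for i in items
                 if support(frozenset([i])) >= min_support}]

    k = 1
    while frequent[-1]:
        # Candidate (k+1)-itemsets: unions of frequent k-itemsets
        candidates = {a | b for a in frequent[-1] for b in frequent[-1]
                      if len(a | b) == k + 1}
        # Prune candidates with an infrequent k-subset, then count support
        next_level = {c for c in candidates
                      if all(frozenset(s) in frequent[-1]
                             for s in combinations(c, k))
                      and support(c) >= min_support}
        frequent.append(next_level)
        k += 1

    return [s for level in frequent for s in level]

# Toy example: items are single letters, minimum support is 2 transactions
data = [{'a', 'b', 'c'}, {'a', 'c'}, {'a', 'd'}, {'b', 'c', 'e'}]
for itemset in apriori(data, min_support=2):
    print(sorted(itemset))

The sketch follows the textbook structure of Apriori: grow candidate itemsets one item at a time and prune any candidate with an infrequent subset before counting its support in the data.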
I have mentioned above that if you are interested in SVMs, you should try out Torch.
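If you only want to see what an SVM is optimizing before installing a package such as Torch, the following from-scratch sketch (it does not use Torch's API) trains a linear SVM by stochastic subgradient descent on the regularized hinge loss; the toy data and hyperparameters are illustrative only.

import random

def train_linear_svm(points, labels, lam=0.01, epochs=200):
    """Linear SVM via stochastic subgradient descent on the
    regularized hinge loss (Pegasos-style updates)."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    t = 0
    for _ in range(epochs):
        order = list(range(len(points)))
        random.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)       # decreasing step size
            x, y = points[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Shrink the weights (regularization), then push on margin violations
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b

# Toy, linearly separable data with labels +1 / -1
X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -1.0), (-3.0, -2.0)]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
for x, yi in zip(X, y):
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    print(yi, round(score, 2))

A real package adds kernels, working-set optimization, and careful numerics, but the regularized hinge-loss objective it solves is the same.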
If you want to try out general statistical learning algorithms, such as Principal Component Analysis or Independent Component Analysis, you can use the R software package. Many algorithms are implemented as add-on packages to the R framework; you just need to search for them.
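R is the recommendation above; purely as an illustration of what PCA actually computes, here is a minimal sketch in Python (assuming only numpy is available), with made-up toy data.

import numpy as np

def pca(data, n_components=2):
    """Minimal PCA: project data onto the top principal components.

    data: (n_samples, n_features) array.
    Returns (projected data, components, explained variance).
    """
    # Center each feature at zero mean
    centered = data - data.mean(axis=0)
    # Covariance matrix of the features
    cov = np.cov(centered, rowvar=False)
    # Eigen-decomposition; eigh is appropriate for symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the eigenvectors with the largest eigenvalues (most variance)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    return centered @ components, components, eigvals[order]

# Toy data: 100 samples in 3 dimensions, first two features correlated
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
data = np.hstack([x, 2 * x + 0.1 * rng.normal(size=(100, 1)),
                  rng.normal(size=(100, 1))])
projected, components, variance = pca(data, n_components=2)
print(projected.shape, variance)

An R function such as prcomp performs the same centering and decomposition with better numerics and more options.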
If you are interested in kernel machines and pattern recognition, this is a popular area, and people have compiled a lot of information on this web site.