How Neural Networks Are Used in Data Mining
Introduction
In
the past several years, information technology and the World Wide Web
have
created lots of innovations in the area of business. More businesses
and
organizations are collecting high quality data on a large scale. The
huge
amount of data can be a gold mine for business management. It is
therefore
increasingly important to analyze the data. However, processing such large
volumes of data accurately and in a timely manner with traditional methods is a
difficult task.
The ability to analyze and utilize massive data lags far behind the
capability
of gathering and storing it. This gives rise to new challenges for
businesses
and researchers in the extraction of useful information [1].
Data Mining
Data mining is defined as the process of employing one or more computer learning
techniques to automatically analyze and extract knowledge from data contained
within a database. The purpose of data mining is to identify trends and
patterns in data. Ray Kurzweil, “the father of voice-recognition software”,
stated that 98% of all human knowledge is “pattern recognition” [2].
Data
mining is different from traditional statistical analysis. It is aimed
at
finding unsuspected relationships which are of interest or value to the
database owners. Data mining does not compete
with traditional statistical methods in basic statistical tasks.
However, it
offers better solutions in advanced problems than traditional
statistical
methods. Data mining methods and algorithms extract useful regularities
from
large data archives, either directly in the form of knowledge or
indirectly as
functions that allow predicting, classifying or representing
regularities in
the distribution of the data [1].
Neural Networks – Definition and History
A
neural network is a parallel computing system of several interconnected
processor
nodes. The input to individual network nodes is typically restricted to numeric
values falling in the closed range [0,1]. Because of this, categorical data
must be transformed prior to network training [2].
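The transformation described above can be sketched as follows. This is a minimal illustration, not the method of reference [2]: the function names `min_max_scale` and `one_hot`, and the sample attributes, are assumptions made for the example.

```python
def min_max_scale(values):
    """Rescale numeric values linearly into the closed range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Turn categorical values into 0/1 indicator vectors, one slot per category."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]


ages = [20, 35, 50]                      # hypothetical numeric attribute
card_types = ["gold", "basic", "gold"]   # hypothetical categorical attribute

scaled_ages = min_max_scale(ages)
encoded_types = one_hot(card_types)
```

Each category becomes its own input node, so every value presented to the network is a number between 0 and 1.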
In 1982, John Hopfield published a paper showing how neural networks could be
used for computational purposes. In 1984, Teuvo Kohonen introduced a new
algorithm he called a self-organizing feature map, which allowed neural networks
to be used for unsupervised learning. This opened a new branch of neural network
research where no “correct” answer is required to learn or train a network. In
1986, Rumelhart, Hinton and Williams wrote a paper on the back-propagation
method, which set off a flurry of activity in the late 1980s and 1990s.
Neural
networks are used extensively in the business world as predictive
models. In
particular, the financial services industry widely uses neural networks
to
model fraud in credit cards and monetary transactions [3].
How Neural Networks Work
Neural
networks attempt to mimic the neurons of a human brain, with each node
described
as a processing element (PE). Neural networks learn from experience and
are useful
in detecting unknown relationships between a set of input data and an
outcome.
Like other approaches, neural networks detect patterns in data,
generalize
relationships found in the data, and predict outcomes. Neural networks
have
been especially noted for their ability to predict complex processes.
A
processing element processes data by summarizing and transforming it
using a
series of mathematical functions. One PE is limited in ability, but
when
connected to form a system, the neurons or PEs create an intelligent
model. PEs
are interconnected in any number of ways and they can be retrained over
several, hundreds, or thousands of iterations to more closely fit the
data they
are trying to model.
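A single processing element can be sketched in a few lines. The sigmoid used here is an assumption for illustration; the text only says that a PE summarizes and transforms its inputs with mathematical functions.

```python
import math


def sigmoid(x):
    """Squash any real number into the open range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def processing_element(inputs, weights):
    """Summarize the inputs as a weighted sum, then transform the sum."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(weighted_sum)


# Two inputs whose weighted contributions cancel out give a sum of 0.
output = processing_element([1.0, 1.0], [0.5, -0.5])
```

On its own this unit can only draw a single soft boundary through its inputs, which is why the text stresses connecting many PEs into a system.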
Processing elements, or PEs, are linked to inputs and outputs. The process of
training a network involves modifying the strength, or weight, of connections
from the inputs to the output. Increases or decreases in the strength of a
connection are based on its importance for producing the proper outcome. A
connection’s strength depends on a weight it receives during a trial-and-error
process. This process uses a mathematical model for adjusting the weights,
called a learning rule.
Training
repeatedly, or iteratively, exposes a neural network to examples of
historical
data. PEs summarize and transform data, and the connections between PEs
receive
different weights. That is, a network tries various formulas for
predicting the
output variable for each example.
Training
continues until a neural network produces outcome values that match the
known
outcome values within a specified accuracy level, or until it satisfies
some
other stopping criteria.
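The training loop described above can be sketched for a single PE. Several details are assumptions not taken from the source: the delta rule as the learning rule, the sigmoid transfer function, a constant bias input, and the logical OR data set used as "historical data".

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, inputs):
    x = list(inputs) + [1.0]  # constant bias input (an assumption)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train(examples, learning_rate=0.5, tolerance=0.2, max_epochs=5000):
    """Adjust weights until every output seen in a pass is within `tolerance`
    of its target, or until `max_epochs` passes have been made."""
    weights = [0.0] * (len(examples[0][0]) + 1)
    for epoch in range(1, max_epochs + 1):
        worst_error = 0.0
        for inputs, target in examples:
            x = list(inputs) + [1.0]
            output = predict(weights, inputs)
            error = target - output
            # Delta rule: change each weight in proportion to the error,
            # the slope of the sigmoid, and the input on that connection.
            step = learning_rate * error * output * (1.0 - output)
            weights = [w + step * xi for w, xi in zip(weights, x)]
            worst_error = max(worst_error, abs(error))
        if worst_error < tolerance:
            break  # specified accuracy level reached
    return weights, epoch


# Hypothetical training examples: (inputs, known outcome) for logical OR.
or_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, epochs_used = train(or_examples)
```

The two exits from the loop correspond to the two stopping conditions in the text: the accuracy level (`tolerance`) and a fallback criterion (`max_epochs`).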
Figure
1 demonstrates a neural network. Each of the processing units takes
many inputs
and generates an output that is a nonlinear function of the weighted
sum of the
inputs. The weights assigned to each of the inputs are obtained during
a
training process (often back-propagation) in which outputs generated by
the
nets are compared with target outputs. The answers you want the network
to
produce are compared with generated outputs, and the deviation between
them is
used as feedback to adjust weights.
The process of readjusting weights
is important to increasing a model’s accuracy. Notice there are also
four
“hidden nodes”, or middle layer nodes, in Figure 1. These four hidden
nodes are
associated with the weighting process. The number of hidden nodes can
be
adjusted and there can be multiple levels of hidden nodes. The number
of
inputs, hidden nodes, outputs, and the weighting algorithms for the
connections
between nodes determine the complexity of a neural network, its
accuracy, and
the time it takes to create the neural network model. Because the
configuration
of hidden nodes and weights is so critical to neural networks, there
are many
approaches for finding the right number of hidden nodes and readjusting
weights
[3].
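A small feed-forward network with four hidden nodes, trained by back-propagation, might look like the sketch below. It is not the network of Figure 1: the two inputs, single output, sigmoid units, bias inputs, learning rate, and XOR data are all assumptions chosen to keep the example short.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Return the hidden activations and the network output for one input."""
    xb = list(x) + [1.0]  # constant bias input
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output


def mean_squared_error(data, w_hidden, w_out):
    return sum((t - forward(x, w_hidden, w_out)[1]) ** 2 for x, t in data) / len(data)


def train(data, n_hidden=4, learning_rate=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, target in data:
            hidden, output = forward(x, w_hidden, w_out)
            # Deviation between target and generated output, times sigmoid slope.
            delta_out = (target - output) * output * (1.0 - output)
            # Pass the deviation back to each hidden node through its output weight.
            delta_hidden = [delta_out * w_out[j] * h * (1.0 - h)
                            for j, h in enumerate(hidden)]
            w_out = [w + learning_rate * delta_out * v
                     for w, v in zip(w_out, hidden + [1.0])]
            xb = list(x) + [1.0]
            w_hidden = [[w + learning_rate * d * v for w, v in zip(row, xb)]
                        for row, d in zip(w_hidden, delta_hidden)]
    return w_hidden, w_out


xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w_h0, w_o0 = train(xor_data, epochs=0)       # untrained weights, same seed
initial_error = mean_squared_error(xor_data, w_h0, w_o0)
w_h, w_o = train(xor_data, epochs=2000)
final_error = mean_squared_error(xor_data, w_h, w_o)
```

Changing `n_hidden`, `learning_rate`, or `epochs` shows the trade-off the text describes between network complexity, accuracy, and training time.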

Different Types of Neural Networks
The above example
is what is referred to as a feed-forward
network, which is commonly used with supervised
learning studies. Feed-forward networks are very popular due to
their
relative simplicity and stability.
It is possible to perform unsupervised learning
with neural
networks as well. The process is similar, but no output is specified
during
training. In contrast to supervised learning, an unsupervised network
is not
given the desired response, but organizes the data in a way it sees
fit. Such
self-organizing networks divide input examples into clusters depending
on
similarity, each cluster representing an unlabeled category. Kohonen learning is a well-known method
in self-organizing neural networks [3].
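The clustering behaviour can be illustrated with a simplified winner-take-all sketch. It is not a full Kohonen map, which also updates the neighbours of the winning unit; the two-unit layout, starting weights, and sample points are assumptions for the example.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_unit(sample, weights):
    """Index of the unit whose weight vector is nearest to the sample."""
    return min(range(len(weights)), key=lambda i: squared_distance(sample, weights[i]))


def train_units(samples, weights, learning_rate=0.5, epochs=20):
    """Move only the winning unit toward each sample; no target output is given."""
    weights = [list(w) for w in weights]
    for _ in range(epochs):
        for sample in samples:
            winner = closest_unit(sample, weights)
            weights[winner] = [w + learning_rate * (s - w)
                               for w, s in zip(weights[winner], sample)]
    return weights


samples = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
           (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
units = train_units(samples, weights=[[0.4, 0.4], [0.6, 0.6]])
clusters = [closest_unit(s, units) for s in samples]
```

The cluster indices carry no label of their own; they only record which examples the network considers similar.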
Strengths and Weaknesses of Neural Networks
The
greatest strength of neural networks is their ability to accurately
predict
outcomes of complex problems. In accuracy tests against other approaches,
neural networks consistently score very high [4].
There are some drawbacks to neural
networks. First, they have been criticized as being useful for
prediction, but
not always in understanding a model. It is true that early
implementations of
neural networks were criticized as “black box” prediction engines;
however,
with the new tools on the market today, this criticism is debatable.
Secondly,
neural networks are susceptible to over-training. If a network with a
large
capacity for learning is trained using too few data examples to support
that
capacity, the network first sets about learning the general trends of
the data.
This is desirable, but then the network continues to learn very
specific
features of the training data, which is usually undesirable. Such
networks are
said to have memorized their training data, and lack the ability to
generalize.
Commercial-grade neural networks today have effectively eliminated
overtraining
through “bootstrapping holdout (test) samples”, and by monitoring test
versus
training errors [3].
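Monitoring test versus training error can be sketched as a simple early-stopping check. The function name, the `patience` parameter, and the error values are assumptions for illustration and do not describe any particular commercial product.

```python
def best_stopping_epoch(holdout_errors, patience=2):
    """Return the epoch with the lowest holdout error, ending the scan once the
    error has failed to improve for `patience` epochs in a row."""
    best_epoch, best_error, stale = 0, holdout_errors[0], 0
    for epoch, error in enumerate(holdout_errors[1:], start=1):
        if error < best_error:
            best_epoch, best_error, stale = epoch, error, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch


# Hypothetical error curves: training error keeps falling, while holdout
# error turns upward once the network starts memorizing the training data.
training_errors = [0.40, 0.30, 0.22, 0.15, 0.10, 0.06, 0.03]
holdout_errors = [0.42, 0.33, 0.27, 0.25, 0.28, 0.31, 0.35]

stop_at = best_stopping_epoch(holdout_errors)
```

Training error alone would suggest continuing to the last epoch; the holdout curve shows that the weights from the earlier epoch generalize better.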
Another
issue with neural networks is training speed. Neural networks require
many
passes to build. This means that creating the most accurate models can
be very
time consuming [5].
Examples of Neural Networks – Credit Card Analysis
Neural networks are a preferred technique for estimation tasks, which
are common in financial markets and manufacturing. Neural networks are
used in
many applications today. IBM, SAS, SPSS, HNC, Angoss, RightPoint,
Thinking
Machines, and Neo Vista are a few of the vendors working with neural
network
products.
HNC’s Falcon is used to detect fraud in the financial market, to the
point where a sizable portion of all credit cards in America
has been analyzed by HNC [3].
Currently monitoring more than 450 million
payment card accounts worldwide, Falcon Fraud Manager detects fraud
with pinpoint accuracy via proven neural network models and other
predictive
technologies. The patented profiling technology and broad-based
consortium
models deliver reliable fraud scoring, at the lowest false
positives
in the industry, resulting in significantly reduced fraud
losses
[6].
In 2002, Oak Ridge National Laboratory developed a data mining approach to
predict bankruptcy in personal credit card accounts. Although the
transaction
patterns are time-series data streams, there are both numeric and
character
data attributes for each account, resulting in a number of more complex
factors
to be considered in communicating the knowledge content within this
data. In
the data mining phase of this work, decision trees have been used to
partition
the accounts within the database into groups of bad, bankrupt, or
delinquent,
based on attributes signifying their current status. The transactions
for the
accounts within each group are then used to train a partially recurrent
neural
network to recognize the behavior pattern in the running balance. When
tested
against a new set of accounts, the system does indicate a predictive
power for
bankruptcy [5].
Example of Neural Network – WEBSOM – Document Queries
Kohonen describes a system for exploring very large databases of text documents
called WEBSOM. The system employs self-organizing maps to locate documents that
most closely satisfy a document query. Unlike traditional queries, which use
methods such as keywords or bibliographic information, WEBSOM allows the user
to write a short description of the document he is looking for.
The
WEBSOM system actually consists of two separate self-organizing maps.
The first
map is called the fingerprint map,
because it is used to produce a fingerprint (i.e. description) of the
words in
a document and how well they are related. First, articles (a, an, the) are
discarded, as well as words that occur very infrequently. Next, the text is
broken up
broken up
into overlapping triplets of consecutive words, and each triplet is fed
into
the fingerprint map. When this process is complete, the vectors in each
neuron
serve as a fingerprint or histogram of the document. An important
property of
this approach is that triplets containing semantically similar words
will map
to nearby areas of the map because they tend to appear in similar
triplets.
This means that even if a query does not use exactly the same terms a
document
uses to describe a concept, the system can still properly retrieve the
relevant
document.
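The preprocessing step described above (dropping articles and rare words, then forming overlapping triplets) can be sketched as follows. The function name and the `min_count` threshold are assumptions, and word frequency is counted within the single text here, whereas a real system would presumably count across the whole document collection.

```python
from collections import Counter

ARTICLES = {"a", "an", "the"}


def word_triplets(text, min_count=1):
    """Drop articles and rare words, then return overlapping word triplets."""
    words = [w for w in text.lower().split() if w not in ARTICLES]
    counts = Counter(words)
    kept = [w for w in words if counts[w] >= min_count]
    # Slide a window of three consecutive words one position at a time.
    return [tuple(kept[i:i + 3]) for i in range(len(kept) - 2)]


triplets = word_triplets("The network learns a pattern in the data")
```

Each triplet would then be converted to a numeric vector and fed to the fingerprint map; that encoding is not shown, and punctuation handling is omitted for brevity.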
Once
the fingerprint of a document has been generated by the first map, it
is given
as training data to the second map, which is called the document
map. That is, the data vector x given to the
document map describes the entire content of the
fingerprint map. This process is repeated for each document in the
database,
after which the document map contains the accumulated fingerprints of
all the
documents.
The user can then write a short
description of the type of document he is looking for. A fingerprint of
the
description is then produced in the manner described previously. The
document
map is then searched to find the set of documents that most closely
matches the
query [6].
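The final matching step can be approximated by ranking stored fingerprints by their distance to the query fingerprint. This is a simplification: it compares vectors directly instead of searching a trained document map, and the document names and three-element fingerprints are hypothetical.

```python
import math


def closest_documents(query, fingerprints, top_n=1):
    """Rank documents by Euclidean distance between fingerprint vectors."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    ranked = sorted(fingerprints, key=lambda name: distance(query, fingerprints[name]))
    return ranked[:top_n]


# Hypothetical fingerprints; real ones would come from the fingerprint map.
fingerprints = {
    "doc_fraud": [0.9, 0.1, 0.0],
    "doc_stocks": [0.1, 0.8, 0.1],
    "doc_text": [0.0, 0.2, 0.8],
}
query_fingerprint = [0.8, 0.2, 0.0]

matches = closest_documents(query_fingerprint, fingerprints, top_n=2)
```

Because the comparison is between whole fingerprints, a query can match a document without sharing its exact wording.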
Example of Neural Network – Analyzing Insider Stock Trading Data
Two
data mining techniques were compared for their ability to improve the
prediction
of abnormal returns using insider stock trading data. The two were
neural
networks (NN) and Multivariate Adaptive Regression Splines (MARS). In
the
comparison, both analyzed abnormal stock market returns from the same
343
companies over the identical 4-1/2 year
period (1/93-6/97). The major findings were:
1) Both NN and MARS generally identified the same industries that had the most
predictive abnormal stock returns.
2) Both found that predictions further in the future (12 and 9 months ahead)
were more accurate than predictions closer to the trading date (6 and 3 months
ahead).
3) Both obtained better predictive accuracy using four – rather than two –
months of back aggregated stock data.
4) NN identified a substantially greater percentage of stocks in the group with
the highest explained variance than did MARS.
5) Data from small and midsize companies led to higher predictive accuracy than
data from large size (S&P 500) companies using NN, but not MARS.
The findings illustrate that the very complex interaction between insider
trading data and abnormal stock returns can be systematically analyzed using
non-linear techniques. Of the two assessed, NN led to comparatively more
accurate predictions than did MARS [7].
References:
- Mo Wang, S.J. Rees, S.Y. Liao, “Building an online purchasing behavior analytical system with neural network”, in Zanasi, Brebbia and Melli (eds.), Data Mining III, WIT Press, 2002.
- Richard Roiger and Michael Geatz, Data Mining: A Tutorial-Based Primer, Addison-Wesley, 2003.
- Robert Groth, Data Mining: Building Competitive Advantage, Prentice Hall, 2000.
- Alex Berson, Stephen Smith, Kurt Thearling, Building Data Mining Applications for CRM, McGraw-Hill, 1999.
- Usama Fayyad, Georges Grinstein, Andreas Wierse, Information Visualization in Data Mining and Knowledge Discovery, Morgan Kaufmann Publishers, 2002.
- “Falcon Fraud Manager”, 15 November 2003, <http://www.fairisaac.com>.
- Alan M. Safer, “A comparison of two data mining techniques to predict abnormal stock market returns”, Intelligent Data Analysis 7 (2003) 3–13, IOS Press.