How Neural Networks are used in Data Mining


Introduction
In the past several years, information technology and the World Wide Web have driven many innovations in business. More businesses and organizations are collecting high-quality data on a large scale. This huge amount of data can be a gold mine for business management, so analyzing it is increasingly important. However, processing and analyzing such large volumes of data accurately and in a timely manner with traditional methods is difficult. The ability to analyze and utilize massive data lags far behind the capability of gathering and storing it. This gives rise to new challenges for businesses and researchers in the extraction of useful information [1].
 
Data Mining
Data mining is defined as the process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data contained within a database. The purpose of data mining is to identify trends and patterns in data. Ray Kurzweil, “the father of voice-recognition software,” stated that 98% of all human knowledge is “pattern recognition” [2].

Data mining is different from traditional statistical analysis. It is aimed at finding unsuspected relationships which are of interest or value to the database owners. Data mining does not compete with traditional statistical methods in basic statistical tasks. However, it offers better solutions in advanced problems than traditional statistical methods. Data mining methods and algorithms extract useful regularities from large data archives, either directly in the form of knowledge or indirectly as functions that allow predicting, classifying or representing regularities in the distribution of the data [1].

Neural Networks – Definition and History

A neural network is a parallel computing system of several interconnected processor nodes. The input to individual network nodes is restricted to numeric values falling in the closed range [0,1]. Because of this, categorical data must be transformed prior to network training [2].
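The transformation described above can be sketched as follows. This is a minimal illustration, not a method taken from the cited text: the helper names `min_max_scale` and `one_hot`, and the sample ages and colors, are assumptions chosen for the example. Numeric values are rescaled linearly onto [0,1], and each categorical value becomes a vector of zeros with a single one.

```python
def min_max_scale(values):
    """Rescale numeric values linearly onto the closed range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value, categories):
    """Encode one categorical value as a list of 0.0/1.0 indicators."""
    return [1.0 if value == c else 0.0 for c in categories]


# Hypothetical sample data, used only for illustration.
ages = [20, 35, 50]
colors = ["red", "green", "blue"]

scaled_ages = min_max_scale(ages)   # [0.0, 0.5, 1.0]
encoded = one_hot("green", colors)  # [0.0, 1.0, 0.0]
```

One-hot encoding is only one possible choice; it avoids implying an order among categories, at the cost of one input node per category.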
In 1982, John Hopfield published a paper showing how neural networks could be used for computational purposes. In 1984, Teuvo Kohonen introduced a new algorithm he called a self-organizing feature map, which allowed neural networks to be used for unsupervised learning. This opened a new branch of neural network research where no “correct” answer is required to learn or train a network. In 1986, Rumelhart, Hinton and Williams wrote a paper on the back-propagation method, which set off a flurry of activity in the late 1980s and 1990s.
Neural networks are used extensively in the business world as predictive models. In particular, the financial services industry widely uses neural networks to model fraud in credit cards and monetary transactions [3].
 
How Neural Networks Work
Neural networks attempt to mimic the neurons in a human brain, with each node described as a processing element (PE). Neural networks learn from experience and are useful in detecting unknown relationships between a set of input data and an outcome. Like other approaches, neural networks detect patterns in data, generalize relationships found in the data, and predict outcomes. Neural networks have been especially noted for their ability to predict complex processes.
A processing element processes data by summarizing and transforming it using a series of mathematical functions. One PE is limited in ability, but when connected to form a system, the neurons or PEs create an intelligent model. PEs are interconnected in any number of ways and they can be retrained over several, hundreds, or thousands of iterations to more closely fit the data they are trying to model.

Processing elements, or PEs, are linked to inputs and outputs. The process of training a network involves modifying the strength, or weight, of connections from the inputs to the output. Increases or decreases in the strength of a connection are based on its importance for producing the proper outcome. A connection’s strength depends on a weight it receives during a trial-and-error process. This process uses a mathematical model for adjusting the weights, and is called a learning rule.

Training repeatedly, or iteratively, exposes a neural network to examples of historical data. PEs summarize and transform data, and the connections between PEs receive different weights. That is, a network tries various formulas for predicting the output variable for each example.

Training continues until a neural network produces outcome values that match the known outcome values within a specified accuracy level, or until it satisfies some other stopping criteria.
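The training process described above can be sketched for a single PE. This is a minimal illustration under stated assumptions rather than the method of any particular product: the learning rule is the classic delta rule with a sigmoid output, the stopping criteria are an error tolerance and an epoch limit, and the function names, the tolerance of 0.2, and the logical-OR examples are all chosen for the example.

```python
import math


def predict(weights, inputs):
    """One PE: a sigmoid applied to the weighted sum of its inputs."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-total))


def train(examples, tolerance=0.2, max_epochs=10000, rate=0.5):
    """Adjust weights example by example until every output is within
    `tolerance` of its known outcome, or until `max_epochs` is reached."""
    weights = [0.0] * len(examples[0][0])
    worst = float("inf")
    for epoch in range(1, max_epochs + 1):
        for inputs, target in examples:
            out = predict(weights, inputs)
            # Delta rule: the error scaled by the slope of the sigmoid.
            delta = (target - out) * out * (1.0 - out)
            weights = [w + rate * delta * x for w, x in zip(weights, inputs)]
        # Stopping criterion: largest remaining deviation on the examples.
        worst = max(abs(t - predict(weights, x)) for x, t in examples)
        if worst <= tolerance:
            return weights, epoch, worst
    return weights, max_epochs, worst


# Hypothetical historical data: logical OR. The leading 1.0 in each
# input list is a constant bias input.
examples = [([1.0, 0.0, 0.0], 0.0), ([1.0, 0.0, 1.0], 1.0),
            ([1.0, 1.0, 0.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]
weights, epochs_used, worst_error = train(examples)
```

The epoch limit matters in practice: a single PE cannot fit every data set, so a tolerance alone could leave the loop running indefinitely.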

Figure 1 demonstrates a neural network. Each of the processing units takes many inputs and generates an output that is a nonlinear function of the weighted sum of the inputs. The weights assigned to each of the inputs are obtained during a training process (often back-propagation) in which outputs generated by the nets are compared with target outputs. The answers you want the network to produce are compared with generated outputs, and the deviation between them is used as feedback to adjust weights.
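A single back-propagation step for a small network with one hidden layer can be sketched as below. This is an illustrative sketch, not the network in Figure 1: the two inputs, the single output, the sigmoid nonlinearity, the learning rate, the random starting weights, and the function names `forward` and `backprop_step` are all assumptions. Only the four hidden nodes follow the text.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(hidden_w, output_w, inputs):
    """Return the hidden activations and the network output."""
    x = inputs + [1.0]  # trailing 1.0 acts as a bias input
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in hidden_w]
    h = hidden + [1.0]
    output = sigmoid(sum(w * v for w, v in zip(output_w, h)))
    return hidden, output


def backprop_step(hidden_w, output_w, inputs, target, rate=0.5):
    """Adjust all weights in place, using the deviation between the
    target and the generated output as feedback."""
    hidden, output = forward(hidden_w, output_w, inputs)
    x = inputs + [1.0]
    h = hidden + [1.0]
    out_delta = (target - output) * output * (1.0 - output)
    # Hidden deltas are computed from the output weights before updating.
    hid_deltas = [out_delta * output_w[j] * hidden[j] * (1.0 - hidden[j])
                  for j in range(len(hidden))]
    for j in range(len(h)):
        output_w[j] += rate * out_delta * h[j]
    for j, row in enumerate(hidden_w):
        for i in range(len(x)):
            row[i] += rate * hid_deltas[j] * x[i]


random.seed(0)
# 4 hidden nodes, each with 2 input weights plus 1 bias weight.
hidden_w = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(4)]
# 1 output node with 4 hidden weights plus 1 bias weight.
output_w = [random.uniform(-0.5, 0.5) for _ in range(5)]

example, target = [1.0, 0.0], 1.0
_, before = forward(hidden_w, output_w, example)
backprop_step(hidden_w, output_w, example, target)
_, after = forward(hidden_w, output_w, example)
# After one step the output for this example lies closer to the target.
```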

The process of readjusting weights is important to increasing a model’s accuracy. Notice there are also four “hidden nodes”, or middle layer nodes, in Figure 1. These four hidden nodes are associated with the weighting process. The number of hidden nodes can be adjusted and there can be multiple levels of hidden nodes. The number of inputs, hidden nodes, outputs, and the weighting algorithms for the connections between nodes determine the complexity of a neural network, its accuracy, and the time it takes to create the neural network model. Because the configuration of hidden nodes and weights is so critical to neural networks, there are many approaches for finding the right number of hidden nodes and readjusting weights [3].
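To give a rough sense of how layer sizes drive complexity, the sketch below counts the weights in a fully connected feed-forward network. It assumes one bias weight per non-input node, and the layer sizes are hypothetical: only the four hidden nodes come from the text, while the three inputs and two outputs are chosen for illustration.

```python
def count_weights(layer_sizes):
    """Count the connection weights in a fully connected feed-forward
    network, including one bias weight for every non-input node."""
    pairs = zip(layer_sizes, layer_sizes[1:])
    return sum((n_in + 1) * n_out for n_in, n_out in pairs)


# 3 inputs, 4 hidden nodes, 2 outputs: (3+1)*4 + (4+1)*2 = 26 weights.
small = count_weights([3, 4, 2])

# Two hidden layers of 8 nodes: (3+1)*8 + (8+1)*8 + (8+1)*2 = 122 weights.
larger = count_weights([3, 8, 8, 2])
```

Every one of these weights has to be adjusted during training, which is one reason larger configurations take longer to build.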

 
Figure 1: Generic Neural Network

Different Types of Neural Networks

The above example is what is referred to as a feed-forward network, which is commonly used with supervised learning studies. Feed-forward networks are very popular due to their relative simplicity and stability.

It is possible to perform unsupervised learning with neural networks as well. The process is similar, but no output is specified during training. In contrast to supervised learning, an unsupervised network is not given the desired response, but organizes the data in a way it sees fit. Such self-organizing networks divide input examples into clusters depending on similarity, each cluster representing an unlabeled category. Kohonen learning is a well-known method in self-organizing neural networks [3].
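The clustering behavior of a self-organizing network can be sketched with simple competitive learning. This is a deliberately simplified illustration and not Kohonen's full algorithm: only the winning unit is updated, with no neighbourhood function and no decaying learning rate. The function names, the starting weights, and the sample points are assumptions made for the example.

```python
def nearest_unit(units, point):
    """Index of the unit whose weight vector is closest to the point."""
    distances = [sum((w - x) ** 2 for w, x in zip(unit, point))
                 for unit in units]
    return distances.index(min(distances))


def train_units(units, data, rate=0.3, epochs=20):
    """Move only the winning unit toward each input. No desired output
    is supplied at any point during training."""
    for _ in range(epochs):
        for point in data:
            winner = units[nearest_unit(units, point)]
            for i, x in enumerate(point):
                winner[i] += rate * (x - winner[i])
    return units


# Hypothetical unlabeled data forming two loose groups.
data = [[0.10, 0.10], [0.15, 0.20], [0.20, 0.10],
        [0.80, 0.90], [0.90, 0.80], [0.85, 0.85]]

units = train_units([[0.4, 0.4], [0.6, 0.6]], data)
clusters = [nearest_unit(units, p) for p in data]  # [0, 0, 0, 1, 1, 1]
```

Each unit ends up representing one unlabeled category, and a new input is assigned to the category of its nearest unit.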
 
Strengths and Weaknesses of Neural Networks
The greatest strength of neural networks is their ability to accurately predict outcomes of complex problems. In accuracy tests against other approaches, neural networks consistently score very high [4].
           
There are some drawbacks to neural networks. First, they have been criticized as being useful for prediction, but not always for understanding a model. It is true that early implementations of neural networks were criticized as “black box” prediction engines; however, with the new tools on the market today, this criticism is debatable.
Secondly, neural networks are susceptible to overtraining. If a network with a large capacity for learning is trained using too few data examples to support that capacity, the network first sets about learning the general trends of the data. This is desirable, but then the network continues to learn very specific features of the training data, which is usually undesirable. Such networks are said to have memorized their training data, and lack the ability to generalize. Commercial-grade neural networks today have effectively eliminated overtraining through “bootstrapping holdout (test) samples”, and by monitoring test versus training errors [3].
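Monitoring holdout error to guard against overtraining can be sketched as follows. This is a generic early-stopping rule and not the procedure of any commercial product: the function name, the `patience` parameter, and the error values are hypothetical. Training is treated as finished once the holdout error has failed to improve for a set number of epochs, and the best epoch seen so far is kept.

```python
def early_stopping_epoch(holdout_errors, patience=2):
    """Return (best_epoch, best_error) for a sequence of per-epoch
    holdout errors, stopping once `patience` epochs pass with no
    improvement over the best error seen so far."""
    best_error = float("inf")
    best_epoch = 0
    for epoch, error in enumerate(holdout_errors):
        if error < best_error:
            best_error, best_epoch = error, epoch
        elif epoch - best_epoch >= patience:
            break  # the network has started to memorize its training data
    return best_epoch, best_error


# Hypothetical holdout errors: they fall at first, then rise again as
# the network begins to fit very specific features of the training set.
holdout_errors = [0.50, 0.40, 0.35, 0.36, 0.38, 0.41]
best_epoch, best_error = early_stopping_epoch(holdout_errors)  # (2, 0.35)
```

The weights saved at the best epoch would then be the ones used for prediction.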

Another issue with neural networks is training speed. Neural networks require many passes to build. This means that creating the most accurate models can be very time consuming [5].
 
Examples of Neural Networks – Credit Card Analysis
Neural networks are a preferred technique for estimation tasks, which are common in financial markets and manufacturing, and they are used in many applications today. IBM, SAS, SPSS, HNC, Angoss, RightPoint, Thinking Machines, and Neo Vista are a few of the vendors working with neural network products.
HNC’s Falcon is used to detect fraud in the financial market, to the point where a sizable portion of all credit cards in America have been analyzed by it [3].

According to its vendor, Falcon Fraud Manager currently monitors more than 450 million payment card accounts worldwide and detects fraud with pinpoint accuracy using proven neural network models and other predictive technologies. The vendor states that its patented profiling technology and broad-based consortium models deliver reliable fraud scoring with the lowest false-positive rate in the industry, resulting in significantly reduced fraud losses [6].

In 2002, the Oak Ridge National Laboratory developed a data mining approach to predict bankruptcy in personal credit card accounts. Although the transaction patterns are time-series data streams, each account has both numeric and character data attributes, so a number of more complex factors must be considered in communicating the knowledge content within this data. In the data mining phase of this work, decision trees were used to partition the accounts within the database into groups of bad, bankrupt, or delinquent, based on attributes signifying their current status. The transactions for the accounts within each group were then used to train a partially recurrent neural network to recognize the behavior pattern in the running balance. When tested against a new set of accounts, the system showed predictive power for bankruptcy [5].
 
Example of Neural Network – WEBSOM – Document Queries
Kohonen describes a system for exploring very large databases of text documents called WEBSOM. The system employs self-organizing maps to locate documents that most closely satisfy a document query. Unlike traditional queries, which use methods such as keywords or bibliographic information, WEBSOM allows the user to write a short description of the document he is looking for.

The WEBSOM system actually consists of two separate self-organizing maps. The first map is called the fingerprint map, because it is used to produce a fingerprint (i.e., a description) of the words in a document and how closely they are related. First, articles (a, an, the) are discarded, as are words that occur very infrequently. Next, the text is broken up into overlapping triplets of consecutive words, and each triplet is fed into the fingerprint map. When this process is complete, the vectors in each neuron serve as a fingerprint or histogram of the document. An important property of this approach is that triplets containing semantically similar words will map to nearby areas of the map because they tend to appear in similar triplets. This means that even if a query does not use exactly the same terms a document uses to describe a concept, the system can still properly retrieve the relevant document.
Once the fingerprint of a document has been generated by the first map, it is given as training data to the second map, which is called the document map. That is, the data vector x given to the document map describes the entire content of the fingerprint map. This process is repeated for each document in the database, after which the document map contains the accumulated fingerprints of all the documents.
           
The user can then write a short description of the type of document he is looking for. A fingerprint of the description is then produced in the manner described previously. The document map is then searched to find the set of documents that most closely matches the query [6].
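The final search step can be pictured as a nearest-neighbour lookup over fingerprint vectors. The sketch below is an assumption-laden simplification rather than WEBSOM's actual search: the three-element fingerprints, the document names, the use of Euclidean distance, and the function name `closest_documents` are all hypothetical.

```python
import math


def closest_documents(query, fingerprints, k=2):
    """Return the names of the k documents whose fingerprint vectors
    lie closest to the query fingerprint."""
    ranked = sorted(fingerprints,
                    key=lambda name: math.dist(query, fingerprints[name]))
    return ranked[:k]


# Hypothetical fingerprints; real ones would be far longer vectors.
fingerprints = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.1],
    "doc_c": [0.6, 0.3, 0.1],
}
query_fingerprint = [0.8, 0.2, 0.0]

matches = closest_documents(query_fingerprint, fingerprints)
# ['doc_a', 'doc_c']
```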
 
Example of Neural Network Analyzing Insider Stock Trading Data
Two data mining techniques were compared for their ability to improve the prediction of abnormal returns using insider stock trading data. The two were neural networks (NN) and Multivariate Adaptive Regression Splines (MARS). In the comparison, both analyzed abnormal stock market returns from the same 343 companies over the identical 4-1/2 year period (1/93-6/97). The major findings were:

1) Both NN and MARS generally identified the same industries that had the most predictive abnormal stock returns.

2) Both found that predictions further in the future (12 and 9 months ahead) were more accurate than predictions closer to the trading date (6 and 3 months ahead).

3) Both obtained better predictive accuracy using four – rather than two – months of back aggregated stock data.

4) NN identified a substantially greater percentage of stocks in the group with the highest explained variance than did MARS.

5) Data from small and midsize companies led to higher predictive accuracy than data from large size (S&P 500) companies using NN, but not MARS.

The findings illustrate that the very complex interaction between insider trading data and abnormal stock returns can be systematically analyzed using non-linear techniques. Of the two assessed, NN led to comparatively more accurate predictions than did MARS [7].


References:  
 
  1. Mo Wang, S.J. Rees, S.Y. Liao, “Building an online purchasing behavior analytical system with neural network”, in Zanasi, Brebbia and Melli (eds.), Data Mining III, WIT Press, 2002.
  2. Richard Roiger and Michael Geatz, Data Mining: A Tutorial-Based Primer, Addison-Wesley, 2003.
  3. Robert Groth, Data Mining: Building Competitive Advantage, Prentice Hall, 2000.
  4. Alex Berson, Stephen Smith, Kurt Thearling, Building Data Mining Applications for CRM, McGraw-Hill, 1999.
  5. Usama Fayyad, Georges Grinstein, Andreas Wierse, Information Visualization in Data Mining and Knowledge Discovery, Morgan Kaufmann Publishers, 2002.
  6. “Falcon Fraud Manager”, 15 November 2003, <http://www.fairisaac.com>.
  7. Alan M. Safer, “A comparison of two data mining techniques to predict abnormal stock market returns”, Intelligent Data Analysis 7 (2003) 3–13, IOS Press.