THE STEPS INVOLVED IN DATA MINING
The extraction process typically proceeds through the following steps.
Data Selection: This involves choosing the types of data to be used. A database may contain data about customer purchases, lifestyles, demographics, census records, state taxes, etc. If a retailer wanted to decide how to lay out the display shelves in its store, it might need only the purchase and demographic data.
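As a minimal sketch of the selection step, assume the records are available as a list of Python dictionaries. The field names and values below are hypothetical. The retailer's shelf-layout question then amounts to keeping only the purchase and demographic fields and discarding the rest:

```python
# Hypothetical customer records; field names and values are illustrative only.
records = [
    {"customer_id": 1, "purchases": ["bread", "milk"], "age": 34,
     "region": "NY", "state_tax": 420.0, "lifestyle": "urban"},
    {"customer_id": 2, "purchases": ["milk", "eggs"], "age": 51,
     "region": "CA", "state_tax": 610.5, "lifestyle": "suburban"},
]

# Attributes relevant to the shelf-layout question: purchase data plus
# demographic data. Tax and lifestyle fields are deliberately left out.
SELECTED_FIELDS = ("customer_id", "purchases", "age", "region")

def select_fields(rows, fields):
    """Return a copy of each row restricted to the given fields."""
    return [{f: row[f] for f in fields} for row in rows]

selected = select_fields(records, SELECTED_FIELDS)
```

In a real system this step would usually be a database query that names only the needed columns; the dictionary version just makes the idea visible.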
Data Transformation: Once the data has been selected, it often needs to be cleaned up or transformed into values that the chosen data mining operation and technique can work with. For example, data may need to be converted into numeric values for use in a neural network, and new attributes may need to be defined or derived. In one case, the database included 500 different ways of identifying which U.S. state the information came from.
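The sketch below illustrates both kinds of transformation under simple assumptions. It maps inconsistent state labels to one canonical code, then one-hot encodes the result as numbers a neural network could consume. The alias table is hypothetical and tiny; a real one covering 500 variants would typically be built from the data itself.

```python
# Hypothetical lookup: a few of the many spellings a raw database might use
# for the same U.S. state, each mapped to one canonical two-letter code.
STATE_ALIASES = {
    "ny": "NY", "n.y.": "NY", "new york": "NY",
    "ca": "CA", "calif.": "CA", "california": "CA",
}

def normalize_state(raw):
    """Map a raw state label to a canonical code, or None if it is unknown."""
    return STATE_ALIASES.get(raw.strip().lower())

def one_hot(value, categories):
    """Encode a categorical value as a list of 0/1 numbers."""
    return [1 if value == c else 0 for c in categories]

raw_states = ["New York", " N.Y. ", "calif.", "CA"]
codes = [normalize_state(s) for s in raw_states]
categories = sorted(set(codes))
encoded = [one_hot(c, categories) for c in codes]
```

Here `codes` becomes `['NY', 'NY', 'CA', 'CA']` and `encoded` becomes `[[0, 1], [0, 1], [1, 0], [1, 0]]`, since the sorted categories are `['CA', 'NY']`. Returning `None` for unknown labels, rather than guessing, leaves unmatched variants visible for manual cleanup.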
Data Mining: The prepared data is then mined using the chosen technique to extract useful information. There are many data mining methods, and the one used usually depends on the type of information being sought and the type of data available. These methods include association, sequence-based analysis, clustering, classification, estimation, fuzzy logic, neural networks, fractal-based transforms, and genetic algorithms.
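To make one of these methods concrete, association mining is commonly described in terms of two measures. Support is the fraction of transactions containing an itemset. Confidence is how often a rule's consequent appears when its antecedent does. The following is a minimal sketch over hypothetical shopping baskets; it scores a single rule rather than searching for all rules the way a full algorithm such as Apriori would.

```python
# Hypothetical shopping-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    if not baskets:
        return 0.0
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule never fires
    return support(set(antecedent) | set(consequent), baskets) / base
```

For the rule "bread -> milk", bread appears in 3 of 4 baskets and bread with milk in 2 of 4, so the support of the pair is 0.5 and the rule's confidence is 0.5 / 0.75, about 0.67.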
Methods are often combined. For example, to develop a symbolic classification model that predicts whether a magazine subscriber will renew, one approach is to first use clustering to segment the database and then apply rule induction to build a classification model for each cluster of interest.
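The two-stage approach can be sketched as follows, under deliberately simple assumptions. Subscribers are described by two hypothetical numeric attributes, age and years subscribed. Clustering is plain k-means. Rule induction is reduced to a one-rule decision stump, which searches for the single "attribute >= threshold" test that best predicts renewal within each cluster. The stump is a stand-in for a full rule-induction algorithm, not a description of any particular tool.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

def induce_rule(rows, renewed, feature_names):
    """Find the single threshold rule with the highest accuracy on these rows."""
    best = None
    for f, name in enumerate(feature_names):
        for t in sorted({r[f] for r in rows}):
            for renew_if_at_least in (True, False):
                preds = [(r[f] >= t) == renew_if_at_least for r in rows]
                acc = sum(p == y for p, y in zip(preds, renewed)) / len(rows)
                if best is None or acc > best["accuracy"]:
                    best = {"feature": name, "threshold": t,
                            "renew_if_at_least": renew_if_at_least, "accuracy": acc}
    return best

def rules_per_cluster(points, renewed, k, feature_names):
    """Segment with k-means, then induce one rule inside each non-empty cluster."""
    labels, _ = kmeans(points, k)
    rules = {}
    for c in range(k):
        rows = [p for p, lab in zip(points, labels) if lab == c]
        ys = [y for y, lab in zip(renewed, labels) if lab == c]
        if rows:
            rules[c] = induce_rule(rows, ys, feature_names)
    return rules

# Hypothetical subscribers: (age, years_subscribed) and whether each renewed.
subscribers = [(22, 1), (58, 10), (24, 3), (57, 2), (23, 4), (60, 8), (25, 1), (59, 1)]
renewed = [False, True, True, False, True, True, False, False]
rules = rules_per_cluster(subscribers, renewed, k=2, feature_names=("age", "years_subscribed"))
```

On this toy data, the clusters separate younger and older subscribers. Each cluster ends up with a different renewal threshold on years subscribed: 3 years for the younger group and 8 for the older one. This is the point of segmenting first. A single rule fitted to everyone would blur those differences.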
Result Interpretation: Once the information has been extracted, it is analyzed according to the end user's requirements, and the relevant information is identified and presented to the decision maker through the decision support system. The purpose of interpretation is to visualize the output (logically or graphically) and to filter the information presented to the decision maker. It is not uncommon to discover during the interpretation step that the rules or the data selection need to be modified.
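The filtering part of interpretation can be sketched as follows. Mined rules that fall below chosen thresholds are dropped, the remainder is ranked, and the survivors are rendered as plain sentences a decision maker can read. The `Rule` structure, the threshold defaults, and the example rules are all hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    antecedent: tuple
    consequent: tuple
    support: float
    confidence: float

def filter_for_presentation(rules, min_support=0.1, min_confidence=0.6, top_n=5):
    """Keep rules that clear both thresholds, strongest first, as readable lines."""
    kept = [r for r in rules if r.support >= min_support and r.confidence >= min_confidence]
    # Rank by confidence, breaking ties by support.
    kept.sort(key=lambda r: (r.confidence, r.support), reverse=True)
    return [
        f"IF {' and '.join(r.antecedent)} THEN {' and '.join(r.consequent)} "
        f"(support {r.support:.0%}, confidence {r.confidence:.0%})"
        for r in kept[:top_n]
    ]

mined = [
    Rule(("bread",), ("milk",), 0.5, 0.67),
    Rule(("milk",), ("eggs",), 0.25, 0.33),
    Rule(("butter",), ("bread",), 0.5, 1.0),
]
report = filter_for_presentation(mined)
```

Here the weak "milk -> eggs" rule is filtered out. The report lists "IF butter THEN bread (support 50%, confidence 100%)" ahead of "IF bread THEN milk (support 50%, confidence 67%)". If too few or too many rules survive, that is often the signal to revisit the thresholds, the rules, or the data selection.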
Some of these decisions may involve large amounts of money, and management tends to be reluctant to embrace ideas it cannot understand or analyze for itself. If managers cannot understand the rules, it is hard for them to explain to a client how a decision was reached.