DATA MINING
by Stewart Deck
Data mining. You've probably heard of it. Or maybe you've heard of data warehousing. Mining and warehousing are related -- warehousing brings your data together for analysis. Mining sorts through the data you've collected and turns up interesting and useful connections.
Understanding data mining may be important to you. After all, the next two years will see an explosion of data mining projects, with almost four times as many as exist today, predicts Frank Gillett, an analyst at Forrester Research Inc. in Cambridge, Mass.
It all starts with a load of finely detailed historical data that needs to be sifted through for gems. Then, you need to decide what discrete problem you want to solve -- increasing direct-mail response rate, finding mortgage customers or boosting grocery sales, for example.
To get through all the data, you need mining tools based on algorithms that scan through the data looking for patterns (such as grocery shoppers buying peanut butter and jelly together).
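As a rough illustration of that kind of pattern search, the sketch below counts how often pairs of items turn up in the same shopping basket and keeps the pairs that appear in at least a given share of baskets. The baskets, the function names and the 50% threshold are all made up for this example; commercial mining tools use more elaborate algorithms than this.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"peanut butter", "jelly", "bread"},
    {"peanut butter", "jelly"},
    {"milk", "bread"},
    {"peanut butter", "jelly", "milk"},
]


def pair_counts(baskets):
    """Count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always recorded in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def frequent_pairs(baskets, min_support):
    """Return pairs whose share of all baskets is at least min_support."""
    total = len(baskets)
    return {
        pair: n / total
        for pair, n in pair_counts(baskets).items()
        if n / total >= min_support
    }


# Keep only pairs bought together in at least half of the baskets.
pairs = frequent_pairs(baskets, min_support=0.5)
print(pairs)
```

With these four baskets, only peanut butter and jelly clear the threshold: they appear together in three of the four.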
Most mining tools need to have data in a flat file format in order to start sorting through it, so the data is extracted and put in a flat text file. Then the mining process can begin.
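The extraction step might look something like the sketch below, which joins two small tables and writes the result out as comma-separated text, one row per purchase. The table names, columns and sample rows are assumptions made for this example, and an in-memory SQLite database stands in for whatever warehouse actually holds the data.

```python
import csv
import io
import sqlite3

# A tiny stand-in database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE purchases (customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 34), (2, 58);
    INSERT INTO purchases VALUES (1, 'peanut butter'), (1, 'jelly'), (2, 'milk');
""")


def extract_flat(conn, out):
    """Join the tables and write one flat text row per purchase."""
    rows = conn.execute(
        "SELECT c.id, c.age, p.item "
        "FROM customers c JOIN purchases p ON p.customer_id = c.id "
        "ORDER BY c.id, p.rowid"
    )
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["customer_id", "age", "item"])  # header row
    writer.writerows(rows)


# Write to an in-memory buffer here; a real job would open a file instead.
buffer = io.StringIO()
extract_flat(conn, buffer)
flat_text = buffer.getvalue()
print(flat_text)
```

Each customer's details are repeated on every row, which is what makes the file "flat": the mining tool can read it line by line without knowing anything about the original tables.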
The tools themselves work in a variety of ways. Some are desktop-based, others are client/server. Some, like Right Point Software Inc.'s, have one algorithm that does one type of search. Others, such as SAS Institute Inc.'s offering, include a tool kit of several algorithms.
"Even though mining gives the impression that you can turn a tool loose on the data you have, you need to have a general idea of what you're going after," explains Wayne Eckerson, vice president of technology services at the Data Warehousing Institute in Gaithersburg, Md. "You have to carefully select the variables."
A Fine Line
If you leave out a key variable, you may not find the relationship you're looking for; include too many, and the tool produces too much output, according to Eckerson. Over-reliance on the tools' capabilities could also lead to trouble, he warns.
There are other areas that could cause problems if not addressed in the beginning stages of a data mining project. You must have someone who knows what they're doing as your mining expert, says Herb Edelstein, an analyst at Two Crows Corp. in Potomac, Md. "To think you can do data mining without a statistical or mining background is mind-boggling," he says.
Properly selecting which data to include for which searches is imperative. Too much data won't produce useful results, so choices need to be made with a feeling for what can have an effect on the business. For example, a project leader with only a statistical background may not understand that a customer's age wouldn't be as good a predictor as an age-to-income ratio.
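A derived variable such as the age-to-income ratio is usually computed before the data goes to the mining tool. The sketch below shows one way to do that; the customer records and field names are invented for illustration, and whether the ratio actually predicts anything depends on the business and the data.

```python
# Hypothetical customer records; values are made up for illustration.
customers = [
    {"name": "A", "age": 30, "income": 60000},
    {"name": "B", "age": 60, "income": 60000},
    {"name": "C", "age": 45, "income": 0},
]


def add_age_to_income(records):
    """Return copies of the records with an added age_to_income field."""
    enriched = []
    for record in records:
        row = dict(record)  # copy so the original record is left untouched
        income = record["income"]
        # Leave the ratio undefined when income is zero rather than divide by zero.
        row["age_to_income"] = record["age"] / income if income else None
        enriched.append(row)
    return enriched


enriched = add_age_to_income(customers)
for row in enriched:
    print(row["name"], row["age_to_income"])
```

Customers A and B earn the same income, so age alone separates them; the ratio lets the tool also compare them with customers at other income levels.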
On the other hand, if the project leader has only a statistical and business background, he may not understand data storage, transportation and maintenance requirements, Edelstein cautions. Some projects suffer because too much effort is spent on preparing the data instead of refining the mining models.
The key, Edelstein says, is the data. "The real issue in mining is what you do with the data. Without data, all we have are opinions," he says.