HERB EDELSTEIN DISCUSSES THE USEFULNESS OF DATA MINING 10.14.97 by Alan Beck, editor in chief D S *
D S * : Where are CIOs most likely to go astray in implementing data mining technologies?
EDELSTEIN: "It's important to remember that data mining usually comes into an organization from the application side. Someone in marketing or finance has a problem (s)he wants to solve. Thus, a CIO who is IT-oriented, who buys tools and then says, 'We'll now focus the company on using these tools -- pick whatever you want,' is asking for trouble."
D S * : In what way?
EDELSTEIN: "Data mining tools are designed to solve business problems, and you cannot acquire these tools without knowing the specifics of the problems you're going to solve. It's quite likely that many companies will find themselves with more than one data mining tool, because not all tools are good for all problems.
"Typically, the pull for data mining comes not from the IS side but from a particular functional department in an organization that can make use of it. So most good CIOs should know that the way you avoid trouble is by working with your users."
D S * : How should problems be formulated to extract maximum benefit?
EDELSTEIN: "One of the keys to successful data mining is posing the question in such a way that the tool or tools can help you find the answer. But that's not always an easy thing to do: people need experience in focusing the questions down. Sometimes there is a tendency to say, 'I've got this wonderful data mining or knowledge-discovery tool. It will discover what's in here. I don't need to tell it what to look for.' In fact, you do.
"Just as a person working for a boss who issues vague instructions is unlikely to produce a good result, so a data mining tool asked merely, 'How do I improve my direct-mail campaign?' or told 'Find a profitable pattern in this data,' is not going to give you what you're looking for. In the former case, a better query would be: 'What can I do to improve my response rate?' Thus: 'Find patterns of people who are more likely to respond to this mail than others who are not as likely.' Another goal in this context might be: 'Find people who will spend more on their response.' And that pattern is likely to be quite different from the first!"
D S * : Then why is there such enthusiasm for unearthing hidden patterns?
EDELSTEIN: "The emphasis on finding what is hidden carries with it an unfortunate implication. There are many senior executives who are beginning to think that data mining is in some way magical. This is emphatically not the case. Data mining is a good tool, and things may be hidden in data sets. But you will not find them if you don't understand your business and your data. Data mining is not a substitute for knowing your business and your data. It is an aid to those who do."
D S * : But some databases are enormous, and there are so many potential avenues of inquiry to explore and so many dimensions to evaluate. How can these be effectively narrowed?
EDELSTEIN: "Data mining is not a random search tool. For example, people are awed by the fact that Deep Blue beat Garry Kasparov in chess -- a computer defeated a human being! But in fact that is not what happened. What happened was that a group of chess experts used a computer to amplify their chess knowledge. They programmed in much of their chess expertise and used areas where the computer was strong to help them in areas where they were weak. So _together_, human beings with a computer beat a top-notch chess player.
"Data mining is like that. Data mining does not operate independently of people who know their business and know what they're doing. If you've got terabytes of data, and you're relying on data mining to find interesting things in there for you, you've lost before you've even begun. You really need people who understand what it is they're looking for -- and what they can do with it once they find it.
"In the retail business, for instance, enormous amounts of data are generated through the prevalence of scanners. So an association is found between two products -- one apocryphal example is an association between beer and diapers. Now what do you do with that fact? Data mining can't tell you. Indeed, data mining can inform you that, after looking through huge amounts of retail information, that two or three or four different purchases are associated. What does this actually mean for the retailer? Should (s)he put them together or spread them apart in the store?
"It is unlikely, though not impossible, that in sifting through large volumes of information data mining will unearth a fact that will transform your business from marginal profitability into a leader in its field. Data mining can help you find outliers. That's particularly valuable in areas like fraud detection. But in most database marketing applications, it simply lets you do a better job. Now that better job may well translate into huge ROIs for the data mining effort: I know people who are looking at three-digit ROIs. But ultimately, it translates into improving the way you're already doing something -- perhaps from doing it 65% right to 70% right. That could yield 100%-200% ROI, but it's not taking you from dead wrong to 100% right.
"I get concerned about the disconnect between what some people expect from data mining and what data mining can actually deliver."
D S * : Let's get back to that hypothetical beer and diaper association. What is the executive to do if such connections are unearthed?
EDELSTEIN: "Data mining doesn't address that. And just because you find a connection doesn't mean there is anything you can do about it that makes sense. One executive might conclude, 'I'll put these two items on opposite sides of the store, so people must walk through more merchandise to buy both.' However, this might also make sales go down because people find shopping there inconvenient. Another executive may put the two items close together -- and find there is no significant change in sales. Just because you've noted an association does not mean there is anything you can do to take advantage of it.
"So if you rely solely on data mining without keeping it within the context of your business and your data, if you have no idea what you're going to do with the results, then it will become a great exercise for the consultants -- and you can spend a ton of money, but it won't be particularly good for the company. The most successful data mining projects are those that are focused on answering specific business questions: Who buys this? Under what circumstances do they buy it? Why did something we thought was going to work not work? How effective was this promotion?
"Data mining allows you to take things you can measure and predict things you can't measure."
D S * : So data mining is fruitless if one lacks sufficient business sense to perceive the nature of unearthed relationships?
EDELSTEIN: "Exactly. Suppose I guaranteed that tomorrow such-and-such a horse would win a race. But also suppose that you had never been to a racetrack, had no idea how to place a bet, had no understanding of odds. How useful would that information be to you? To make any money, you would quickly have to learn a lot about betting on horses.
"Suppose I told you that Exxon had discovered a major oil field off the Gulf Coast. Would that really mean that you could make money in the stock market by buying Exxon? In fact, a broker once communicated a similar fact to me before it was publicly announced, and in addition there had been no prior movement in the underlying stock price. So I bought some shares. When the fact was publicized, it had absolutely no effect on the stock price -- nobody cared. I didn't understand the business; I didn't know the real implications of such news.
"Even if datamining does find something useful, if you don't understand what it means for your business, what good is it?
"I also want to stress that data mining is successfully applied to a wide range of problems. The beer-and-diapers type of problem is known as association discovery or market-basket analysis, that is, describing something that has happened in existing data. This should be differentiated from predictive techniques. Most of the big payoff stuff has been in predictive modeling."
---
Herb Edelstein may be contacted at: herb@twocrows.com. For more information, see http://www.twocrows.com