
VASANT DHAR DISCUSSES KD/DM STRATEGIES                               10.07.97
by Alan Beck, editor in chief                                          D S *

D S * : Numerous decision-support and data mining experts have noted how important -- and how difficult -- it is to correctly formulate problems. What is the best strategy for problem formulation?

DHAR: "Indeed, there is no magic to any of these technologies. They will only work well if you have a reasonable idea of what you're looking for. As they say: 'A fool with a tool is still a fool.' Before you undertake any kind of data mining effort -- even if it is exploratory -- you must first define your business objectives. This might sound obvious, but I'm continually surprised at how inadequately some people go about this step.

"Defining business objectives involves nothing profound. I'm talking about things like: 'We want to understand our customer base better,' or 'We want to increase the productivity of our sales force,' or 'We want to improve the performance of our trading models.' Your definition must be useful for quantifying your goals.

"Once you've actually quantified your objectives, you can begin taking an inventory of data to see whether -- even in principle -- you 'can get there from here.' Sometimes you simply don't have the data, or the quality of data, that you need. But you must link your data inventory to those business objectives.

"Then you can set up the problem so the right metric is defined. For example, if you're trying to find out whether one group is more productive or behaves differently than another, you must have a quantifiable, usable metric to distinguish between them. At this point, technology can help you make the connection between data and business objectives. Data mining techniques really shine when the relationships are nonlinear or discontinuous."

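To make the idea of a "quantifiable, usable metric" concrete, here is a minimal sketch in Python. It assumes a hypothetical metric (monthly revenue per salesperson, in thousands of dollars) and made-up figures for two groups. It reports the difference in group means along with Welch's t statistic, one common way to judge whether such a difference is large relative to the spread in the data. The metric, the numbers, and the choice of statistic are illustrative assumptions, not something described in the interview.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(var_a / len(a) + var_b / len(b))


# Hypothetical data: monthly revenue per salesperson (in $1000s).
group_a = [52, 61, 58, 49, 66, 57]
group_b = [44, 50, 47, 53, 41, 48]

diff = mean(group_a) - mean(group_b)
t = welch_t(group_a, group_b)
print(f"mean difference: {diff:.2f}, Welch t: {t:.2f}")
```

A large statistic only suggests that the groups differ on this particular metric; whether the metric reflects the business objective is a separate question.
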
D S * : But how can we forge effective business objectives and evaluate the potential usefulness of data if our intuition misleads us so frequently? Similarly, how can we balance open-ended exploration against simply calculating answers to obvious questions?

DHAR: "Things shouldn't be completely open-ended. When you're looking for patterns, you are always looking for things that are interesting with respect to an objective -- that's where business schools come in! You should have a reasonably well-defined objective in mind. And once you have it, even exploratory data analysis is essentially hypothesis-testing. Within the context of an objective, you formulate hypotheses exactly as in statistics.

"For example, if your objective is to ascertain what makes salespeople productive, you still haven't formulated any hypotheses. The system will do that for you; it may say: 'Let's check to see whether salespeople with a certain number of years of experience, or a certain background, age, or demographic profile, tend to do better than others.' These are all hypotheses generated internally, and in that sense the analysis is still exploratory. But it's still focused by the initial objective. So matters aren't quite as open-ended as some might think.

"This is not to say that once you establish an objective you will find something. Many times, the problem is formulated, and you find nothing. But that does not change the underlying methodology."

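The notion of a system generating its own hypotheses within a fixed objective can be sketched roughly as follows. This is a simplified illustration, not a description of any particular data mining product: the records, the attribute names, and the median-split rule are all hypothetical. For each attribute, the sketch splits the salespeople at the median and ranks the attributes by how far apart the two halves are on the target metric.

```python
from statistics import mean, median

# Hypothetical records: attributes plus a productivity metric ("sales").
salespeople = [
    {"years_experience": 2, "age": 25, "sales": 40},
    {"years_experience": 3, "age": 41, "sales": 45},
    {"years_experience": 8, "age": 30, "sales": 70},
    {"years_experience": 10, "age": 52, "sales": 75},
    {"years_experience": 1, "age": 47, "sales": 38},
    {"years_experience": 12, "age": 28, "sales": 80},
]


def candidate_hypotheses(records, attributes, target):
    """Split records at each attribute's median and compare mean target values."""
    results = []
    for attr in attributes:
        cut = median(r[attr] for r in records)
        high = [r[target] for r in records if r[attr] > cut]
        low = [r[target] for r in records if r[attr] <= cut]
        if high and low:
            results.append((attr, cut, mean(high) - mean(low)))
    # Largest absolute gap first: the candidates most worth testing further.
    return sorted(results, key=lambda item: abs(item[2]), reverse=True)


ranked = candidate_hypotheses(salespeople, ["years_experience", "age"], "sales")
for attr, cut, gap in ranked:
    print(f"{attr} > {cut}: mean sales gap = {gap:+.1f}")
```

Each ranked gap is only a candidate hypothesis. It would still need a proper statistical test on more data, and, as noted above, the search may turn up nothing at all.
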
D S * : In the past, you have stressed that KD/DM technology cannot be effectively implemented without bridging the gap between technical and business personnel. Yet, the two groups usually speak very different languages and formulate goals quite differently. How do we solve this problem?

DHAR: "There is no simple answer. There must be a sincere commitment from each side to educate and learn from the other. For the technologist, the challenge is to obtain and retain the intentions of the businesspeople. On the business side, the imperative is to keep an open mind yet not become overawed by the technology or jargon. It is much easier for businesspeople to understand where technologists are coming from if the latter do a decent job of describing what data mining and exploratory data analysis are. These technologies are not crystal balls but well-defined procedures.

"I believe technologists often have a harder time understanding businesspeople than the other way around. But those technologists who succeed are extremely valuable!"

D S * : Should there be dedicated liaison staff for closing this communications gap?

DHAR: "I'm agnostic on this. Much depends upon the level of commitment and effort an organization is making. If the commitment is large enough, it probably does warrant that kind of position. On the other hand, if an organization is just exploring these avenues, it may not want to invest that kind of money up front, although that capability is definitely needed."

D S * : What is the most concise advice you can offer to CIOs and those in similar positions who must ferret out profitable information from complex profiles and large data sets?

DHAR: "First off, it should be understood that just getting into these technologies and doing exploratory data analysis is a clear imperative for most organizations. If you're not doing it, chances are that someone else is -- and they will be learning about the market a lot faster than you will.

"That does not mean it is necessary to invest a huge amount in resources. Usually, a small focused team is the way to go. If I were a CIO, I would find technologically competent people who understand the business -- or at least a few such people who have the ability to understand the business fairly quickly. The group may be either external or internal to the organization. Then have this team demonstrate success on a small number of key problems.

"I've personally seen this approach implemented on Wall Street, and it worked out quite well. Specifically, it demonstrated that these technologies could be profitably applied, and it generated enthusiasm from a larger group who were eager to realize the benefits of this type of thinking.

"It's also possible to start on a small number of problems that vary in their level of difficulty. Thus, the technologists can be subjected to a true test. Take some of the 'low fruit' and some of the 'high fruit' and demonstrate what it really takes to address problems on either end of the spectrum. The downside is a greater potential for failure on the more difficult problems -- but that can be an important learning exercise. Of course, if you succeed, it's dynamite."

---

For more information, see
http://www.stern.nyu.edu/~vdhar
http://www.prenhall.com/allbooks/be_0132820064.html


Alan Beck is editor in chief of D S * and vice president of publications for Tabor Griffin Communications. Comments are always welcome and should be directed to alan@tgc.com
