GOLDEN MEANS: DATA MINING AND MEASUREMENT; KEEP YOUR FINGERS FLAT ON THE PATIENT'S WRIST
by Manoel G. Mendonca


Many years ago, when he was still a pre-med student, my older brother took my wrist and said, "Let me take your pulse". He then laid his fingers on my pulse and after some time he told me, "Your heart rate is 84 bpm". Eager to try the procedure myself, I took his wrist, put my fingers on his pulse, and counted his heartbeat. After a brief time, I announced, "Yours is 164". He laughed at me and said, "You did it all wrong. You used the tips of your fingers to take my pulse. This way you counted your pulse as well as mine. You need to keep your fingers flat on the patient's wrist."

What does all this have to do with decision making, you might ask. The answer is that all decisions are based on data that are collected, polled, surveyed, sensed, or otherwise measured in some way. If the data are no good, you cannot make good decisions with them.

Sometimes, it is easy to see that the data are fishy. Upon facing a heart rate as described in our little story, a doctor would probably know that there is something wrong with the reading. However, the data used for decision making are far more complex than a simple pulse reading. It is not always clear that the data are fishy. In such cases, the decision-maker may end up saying "Take this patient to a hospital. He has a serious health condition", even though there is nothing wrong with the patient.

The situation is worse when the measurement process is also complex. This often happens when the data collection is manual, or when the data are based on subjective opinion (e.g., surveys). In those cases, the data are almost always error prone, even when good measurement practices are used to collect them.

In the above scenario, the only way to check whether you have good data is to analyze them and use common sense to interpret the results. Traditional statistical analysis can do that, but it gives you a limited view of the data and their problems. Traditional data analyses usually focus on a small number of hypotheses, i.e., recognized key areas of the data. Those areas are analyzed over and over again as new data are collected. Problems that are not caught by the first cycle of data analyses are unlikely to be caught later on.

Data mining techniques, on the other hand, can be quite useful for catching those problems. Data mining goes after unexpected, counter-intuitive patterns in the data. This lends itself nicely to gaining new insights into data quality as well as into the measurement process used to collect the data. Let me illustrate this with two examples drawn from a recent effort to mine customer satisfaction data for a large software development corporation.

Customer satisfaction surveys are conducted yearly, and the data are used to compare customers' satisfaction with the company's products against their satisfaction with competing products. Since traditional data analyses had been producing very few new insights, the company decided to check what data mining could produce from these data.

One of the data mining analyses we did was aimed at discovering differences in customer satisfaction attributes across product classes. A strong pattern that emerged was that customers of database products had lower satisfaction with the products' maintainability. While we were brainstorming to explain this result, many hypotheses were raised. Most tried to explain why the database products were more difficult to maintain than other products. The discussion ended when someone raised a much simpler hypothesis. Some customers were misunderstanding the question regarding the maintainability of the database products. They were confusing the maintainability of the products with the maintainability of the databases that were built using those products. Just as in our little story, the fingers were not flat on the patient's wrist. For the database products, the survey was measuring two different things at the same time.
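The kind of cross-class comparison described above can be sketched in a few lines. This is only an illustration, not the tool or method used in the study: the function name class_deviations, the 1-to-5 scale, the threshold, and every score in the toy data are assumptions invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy responses: (product_class, attribute, score on a 1-5 scale).
# These values are invented for illustration; they are NOT the survey data.
RESPONSES = [
    ("database", "maintainability", 2), ("database", "maintainability", 3),
    ("database", "reliability", 4), ("database", "reliability", 4),
    ("compiler", "maintainability", 4), ("compiler", "maintainability", 5),
    ("compiler", "reliability", 4), ("compiler", "reliability", 5),
    ("editor", "maintainability", 4), ("editor", "maintainability", 4),
    ("editor", "reliability", 4), ("editor", "reliability", 4),
]


def class_deviations(responses, threshold=1.5):
    """Flag (product_class, attribute, gap) where a class's mean score differs
    from the mean of all other classes on that attribute by >= threshold."""
    by_cell = defaultdict(list)
    for product_class, attribute, score in responses:
        by_cell[(product_class, attribute)].append(score)

    flagged = []
    for (product_class, attribute), scores in by_cell.items():
        # Pool the scores that every other class gave to the same attribute.
        others = [
            s
            for (other_class, other_attr), other_scores in by_cell.items()
            if other_attr == attribute and other_class != product_class
            for s in other_scores
        ]
        if not others:
            continue  # nothing to compare against
        gap = mean(scores) - mean(others)
        if abs(gap) >= threshold:
            flagged.append((product_class, attribute, round(gap, 2)))
    # Largest gaps first, so the strongest patterns are reviewed first.
    return sorted(flagged, key=lambda row: -abs(row[2]))


flagged = class_deviations(RESPONSES)
print(flagged)
```

A flagged gap only says where to look. Deciding whether it reflects a harder-to-maintain product or a misread question still takes the kind of discussion described above.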

Another analysis was aimed at discovering temporal deviations in customer satisfaction attributes. One of the strong patterns that emerged was that customer satisfaction with the products' documentation climbed sharply in 1996. The result was eventually explained by a modification to the 1996 survey questionnaire. The following sentence had been removed from it: "If you are not familiar with the product documentation skip the next question." As one might have guessed, the "next question" was about the customer's satisfaction with the documentation. Even though there was a "don't know" option for this question, many customers who were not familiar with the product documentation still answered it. Those customers caused the fluctuation in the documentation satisfaction scores in 1996, not some product improvement, not some market change. This type of problem is even more troublesome than the previous one. It is not a case of not having the fingers flat on the patient's wrist. It is a case of having the fingers on the pulse of the wrong patient (those customers who were not using the documentation).
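A temporal check of this kind can be sketched as follows. Again, this is a hedged illustration rather than the analysis actually performed: the function name yearly_shifts, the min_jump threshold, and all answers in the toy data are invented. The sketch reports the number of scored answers next to each jump, on the assumption that a sudden change in who answers a question is a hint that the measurement, not the product, has changed.

```python
from statistics import mean

# Hypothetical toy answers to one survey question, per year.
# None stands for a skipped question or a "don't know" answer.
# These values are invented for illustration; they are NOT the survey data.
ANSWERS = {
    1994: [3, 3, None, None, 4, None],
    1995: [3, 4, None, None, 3, None],
    1996: [3, 4, 5, 5, 4, 5],
}


def yearly_shifts(answers_by_year, min_jump=0.5):
    """Return (year, jump, scored_before, scored_now) for each year whose mean
    score moved by at least min_jump relative to the previous survey year."""
    rows = []
    years = sorted(answers_by_year)
    for prev, cur in zip(years, years[1:]):
        prev_scores = [a for a in answers_by_year[prev] if a is not None]
        cur_scores = [a for a in answers_by_year[cur] if a is not None]
        if not prev_scores or not cur_scores:
            continue  # no scored answers to compare
        jump = mean(cur_scores) - mean(prev_scores)
        if abs(jump) >= min_jump:
            # Keep the respondent counts: a jump that coincides with a change
            # in how many people answered deserves a look at the questionnaire.
            rows.append((cur, round(jump, 2), len(prev_scores), len(cur_scores)))
    return rows


shifts = yearly_shifts(ANSWERS)
print(shifts)
```

In the toy data the score jump in the last year coincides with a doubling of scored answers, which is the sort of signal that would prompt a comparison of the questionnaires from one year to the next.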

Neither problem was ever caught by the periodic, more traditional data analyses, because those analyses did not compare scores between product classes or between different years. They focused on comparing products within product classes, which is a limited view of the data. Those problems would have gone unnoticed for a long time had it not been for the broader view of the data that data mining made possible. In this kind of situation, data mining does more than produce new business insights and support strategic decision-making. It helps decision-makers to better understand the nature and quality of their data. It helps them to avoid making poor decisions.

---

For more information, see http://www.virtualgold.com

