THE MAGIC OF DATA MINING
by Ed Colet, Virtual Gold, Inc.
Data mining can be analytically complex, yet we want discovered patterns to be readily understood by end-users, many of whom may not have the technical background to truly understand the underlying analyses. Instead of hoping that end-users accept results on faith, it would be better if we helped them understand the MAGIC behind data mining.
Consider the following example to illustrate the difference between a straightforward query against data and data mining. A straightforward data query might ask, "How many sales of Yogurt were made in the last month?" and the retrieved answer is "52,679". Data mining, on the other hand, can let the user ask, "Tell me something interesting about sales of Yogurt." Data mining software may discover that, "There were 52,679 units of Yogurt sold during the last month. 79% of sales were also made along with Bananas, and sold on a Thursday or Friday." How or why the software decided to consider sales of fruit or day of week or any other attribute in the data set is typically hidden from the user. He or she need not know how the underlying algorithm is implemented. In a way, then, data mining has a "black box" flavor to it, in the sense that the user doesn't know what happens inside this analytical box.
The patterns that data mining presents are then left for the user to interpret, with no account of how they were discovered. It's almost as if the results should be preceded by the phrase "Believe it or not, ..." or "Take it on faith that...". So how can data mining patterns convince the user that they're important and valuable? There are two solutions. One is to make the algorithm and/or rationale behind the discovery explicit, so the path from raw data to observed result can be followed. If it's a brute force approach, the system tells the user that the software looked at sales of Yogurt and, because there's sufficient computer power available, evaluated every possible combination of Yogurt and anything else. The only results reported are those patterns that are probabilistically unusual, but happened to occur anyway. The second approach involves more of an intelligent agent implementation. The system may report that it evaluated Yogurt, day of the week and Fruit because the user's previous queries explicitly asked about these combinations, so the system decided to evaluate them in this case and present the results to the user.
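The brute force approach described above can be sketched in a few lines. This is only an illustration under stated assumptions, not how any particular data mining product works: the transactions are invented toy data, the function name unusual_patterns is hypothetical, and "probabilistically unusual" is approximated here by lift (how much more often an attribute occurs with the target than independence would predict), with an arbitrary cutoff of 1.5.

```python
from collections import Counter

# Hypothetical toy data, invented for illustration: (day of week, items in basket).
transactions = [
    ("Thu", {"Yogurt", "Bananas"}),
    ("Fri", {"Yogurt", "Bananas", "Bread"}),
    ("Mon", {"Bread", "Milk"}),
    ("Thu", {"Yogurt", "Bananas", "Milk"}),
    ("Tue", {"Milk", "Bread"}),
    ("Fri", {"Yogurt", "Bread"}),
    ("Wed", {"Bananas", "Milk"}),
    ("Sat", {"Bread"}),
]


def unusual_patterns(transactions, target, min_lift=1.5):
    """Pair the target with every other attribute and keep the unusual pairs.

    lift = P(target and attribute) / (P(target) * P(attribute)).
    A lift of 1.0 is what independence predicts; min_lift is an assumed cutoff.
    """
    n = len(transactions)
    attribute_counts = Counter()  # baskets containing each attribute
    joint_counts = Counter()      # baskets containing the target AND the attribute
    target_count = 0

    for day, items in transactions:
        # Treat the day of the week as just another attribute of the basket.
        attributes = set(items) | {"day=" + day}
        attribute_counts.update(attributes)
        if target in attributes:
            target_count += 1
            joint_counts.update(attributes - {target})

    found = []
    for attribute, joint in joint_counts.items():
        expected = (target_count / n) * (attribute_counts[attribute] / n)
        lift = (joint / n) / expected
        if lift >= min_lift:
            found.append((attribute, lift))

    # Strongest patterns first; ties broken alphabetically for a stable order.
    return sorted(found, key=lambda pair: (-pair[1], pair[0]))


for attribute, lift in unusual_patterns(transactions, "Yogurt"):
    print(f"Yogurt with {attribute}: lift {lift:.2f}")
```

On this toy data the scan reports Thursday, Friday and Bananas, and it could just as easily tell the user which combinations it examined and why these three survived the cutoff. That explanation is the "explicit rationale" the first solution calls for.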
We implicitly expect the user to understand how patterns are found, but neither of these two approaches to describing how conclusions are reached is widely adopted today. Instead, data mining hopes to convince the user of the value of discovered results on the basis of the "unusualness" or "interestingness" of the patterns alone. When the probabilistically unexpected occurs, we shock the user to attention. Since a lot of data mining is currently exploratory in nature, the next step in the process is typically for the user to subject these patterns to a more formal and rigorous evaluation. It is only at this stage that we expect the user to construct a traceable path from raw data to observed result. This typically involves formal statistical testing, the computation of p-values, and so on. The argument to support or refute the pattern can then be made on technical grounds associated with formal statistics. But because many data mining users are not formally trained statisticians but rather high-level business executives, this approach is problematic. A non-technical executive may not appreciate getting buried in technical minutiae in order to evaluate a data mining result. Data mining should appeal to the non-technical end user as well. The choices seem to be either to accept a pattern on faith or to learn to deal with the formal aspects.
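To give a sense of what the formal route involves, here is a minimal sketch of one possible test, a one-sided binomial test. It is one of many tests that could be used, and everything in it is an assumption for illustration: the function name binomial_tail is hypothetical, the counts are invented, and the test treats baskets as independent and the base rate as known.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Read as a one-sided p-value: if an item appeared in Yogurt baskets only at
    its overall base rate p, how likely is it to show up in at least k of the
    n Yogurt baskets purely by chance?
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Invented numbers: Bananas appear in 3 of 4 Yogurt baskets, and in half of
# all baskets overall.
p_value = binomial_tail(3, 4, 0.5)
print(f"p-value: {p_value:.4f}")  # p-value: 0.3125
```

With so few baskets, a pattern that looks striking is quite compatible with chance. This is exactly the kind of conclusion a non-technical executive is asked either to work out or to take on faith.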
An alternative would be to use what Professor Robert Abelson at Yale refers to as the "MAGIC criteria". MAGIC is an acronym for Magnitude, Articulation, Generality, Interestingness and Credibility. These are properties of data, analysis and presentation that can be used to evaluate the persuasiveness of a statistical argument. They can and should be applied to data mining. Rather than relying on the technical metrics that can be associated with each of these properties, the properties themselves should be explained in terms that a lay user can understand and use to evaluate results.
The descriptions of these properties are:
"M" is for magnitude. It refers to how large the observed effect is.
"A" is for articulation. This refers to how specifically a pattern is described. For example, "Yogurt sells well on Thursdays and Fridays" is better articulated than "Yogurt sells well on specific days of the week". Well articulated patterns are also more "actionable" (actionability refers to the ease with which a data mining pattern can be translated into subsequent business decisions). The "A" could then also refer to actionability.
"G" is for generality. It refers to how well a pattern generalizes beyond a particular data set, time frame, etc.
"I" is for interestingness. This refers to the likelihood that a result can change a user's currently held beliefs.
"C" is for credibility, or how believable a pattern or result appears to be.
There are, of course, formal metrics for each of these, but presenting them to the end-user would seem to defeat the point. Instead, the idea would be to present these properties in an appropriate way, perhaps with well understood measures, so that a non-technical end-user can adequately evaluate data mining results. We can then all appreciate the MAGIC behind data mining.
---
For more information, see http://www.virtualgold.com.