DIFFERENT ANSWERS TO THE SAME BUSINESS QUESTION(S): PART II - RESOLVING THIS DILEMMA
By Ed Colet


Last week, in Part I, I addressed how and why different analytical approaches applied to the same data can yield different results that may not be consistent with each other. In other words, there can be different answers purporting to resolve the same business issue. This week, in Part II, I address ways to resolve this dilemma should it occur. Several approaches are possible, and they vary in how much technical detail they require: (a) using an evaluation data set, (b) converting to general measures, (c) converting internal measures of interestingness, and (d) matching assumptions with domain knowledge.

Consider, as a running example, the hypothetical business issue of finding interesting patterns related to why customers cancel or renew their contracts. This scenario is broadly applicable to insurance, telecommunications, banking, media (subscriptions), and other domains. Suppose that impressively quantitative analyses, built on formal statistics and substantial computational resources, have tried to predict who cancels and why, yet have produced different and inconsistent recommendations. What is the business executive to do?

Use An Evaluation Data Set

An evaluation data set is a portion of the data that was set aside and not used in any of the original analyses, but that has characteristics similar to the data that were used. If different models lead to different conclusions, this evaluation data set can be used to test each of them: whichever model or approach most accurately describes, predicts, or fits the evaluation data is the "better" model. In fact, holding out data in this way is already a routine part of many data mining model-building techniques, where it is used to evaluate the technique itself. Now that data are often abundant rather than scarce, setting aside a portion for evaluative purposes may be a prudent decision.
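
A minimal sketch of the evaluation-set idea in Python, assuming each customer record is a dictionary of features paired with a cancelled/renewed outcome. The field names (`support_calls`, `tenure_months`), the two rule-based "models", and the synthetic data are all hypothetical stand-ins for whatever models the competing analyses actually produced; in practice each model would be built only from the analysis portion.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical record layout: (features, cancelled), where cancelled is True
# if the customer cancelled and False if they renewed.
Record = Tuple[Dict[str, int], bool]
# A "model" here is any function that predicts True (cancel) or False (renew).
Model = Callable[[Dict[str, int]], bool]


def holdout_split(records: List[Record], eval_fraction: float = 0.3,
                  seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Shuffle a copy of the data and set aside a fraction as the evaluation set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_eval = round(len(shuffled) * eval_fraction)
    # First element: data available for analysis; second: held back, never analyzed.
    return shuffled[n_eval:], shuffled[:n_eval]


def accuracy(model: Model, records: List[Record]) -> float:
    """Fraction of records whose cancel/renew outcome the model predicts correctly."""
    correct = sum(1 for features, cancelled in records
                  if model(features) == cancelled)
    return correct / len(records)


def pick_better(models: Dict[str, Model], evaluation: List[Record]) -> str:
    """Name of the model with the highest accuracy on the evaluation set."""
    return max(models, key=lambda name: accuracy(models[name], evaluation))


# Synthetic data for illustration only: here cancellation is driven by support calls.
rng = random.Random(42)
records: List[Record] = []
for _ in range(200):
    calls = rng.randint(0, 6)
    tenure = rng.randint(1, 36)
    records.append(({"support_calls": calls, "tenure_months": tenure}, calls >= 3))

# In practice, each competing model would be built from analysis_set only.
analysis_set, evaluation_set = holdout_split(records)

# Two hypothetical models that competing analyses might have produced.
models: Dict[str, Model] = {
    "model_a": lambda f: f["support_calls"] >= 3,
    "model_b": lambda f: f["tenure_months"] < 6,
}

for name, model in models.items():
    print(name, round(accuracy(model, evaluation_set), 3))
print("better on evaluation set:", pick_better(models, evaluation_set))
```

Accuracy is used here only because it is the simplest score; any measure that fits the business question could be swapped into `pick_better`.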

Conversion To General Measures

Analytical techniques have various indices and measures for assessing how well they perform on the data. Sometimes these internal measures can be converted into more general measures that apply to several different models. That way, one isn't comparing "apples vs. oranges" but is comparing the models against a common standard related to the business issue at hand. In our hypothetical cancel vs. renew scenario, the results of each model can be viewed in terms of a confusion matrix. A confusion matrix is a 2x2 table that shows the proportions of customers classified correctly (predicted to cancel and did cancel; predicted to renew and did renew) and incorrectly (predicted to cancel but renewed; predicted to renew but cancelled). A confusion matrix can be generated for any classification task, that is, any prediction of a categorical outcome such as cancel vs. renew, whether the underlying technique is a neural network, a decision tree, a regression model, a discriminant analysis, or something else.
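
A minimal sketch of building the 2x2 confusion matrix from one model's cancel/renew predictions, expressed as proportions as in the text. The function name and the example predictions are hypothetical; the point is that the calculation needs only predictions and actual outcomes, so it applies equally to any of the techniques listed above.

```python
from collections import Counter
from typing import Dict, List, Tuple

LABELS = ("cancel", "renew")


def confusion_matrix(predicted: List[bool], actual: List[bool]) -> Dict[Tuple[str, str], float]:
    """Proportion of customers in each (predicted, actual) cell.

    True means "cancel" and False means "renew" in both lists.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty prediction and outcome lists")
    name = {True: "cancel", False: "renew"}
    counts = Counter((name[p], name[a]) for p, a in zip(predicted, actual))
    n = len(actual)
    # Counter returns 0 for cells that never occur, so all four cells are filled.
    return {(p, a): counts[(p, a)] / n for p in LABELS for a in LABELS}


# Hypothetical predictions from one model, and what customers actually did.
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]

cm = confusion_matrix(predicted, actual)
for p in LABELS:
    print(f"predicted {p:6}:", {a: cm[(p, a)] for a in LABELS})
correct = cm[("cancel", "cancel")] + cm[("renew", "renew")]
print("correctly classified:", round(correct, 3))
```

With these made-up lists, 40% of customers are correctly predicted to cancel, 20% correctly predicted to renew, and each of the two error cells holds 20%.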

Looking at a confusion matrix alone, it is not always clear which model is best, because how well a classifier serves the business depends on the costs of its classification errors. Erroneously predicting that someone will cancel when they will in fact renew does not have the same business cost as erroneously predicting that someone will renew when in fact they will cancel. The latter may well be more costly than the former, so the two errors should not be treated as equal, and better models take these cost tradeoffs into account. So, in addition to generating a confusion matrix, one can generate an ROC (receiver operating characteristic) curve. An ROC curve plots the true positive rate (the proportion of actual cancellers correctly flagged) against the false positive rate (the proportion of renewers wrongly flagged) as the model's decision threshold varies; each threshold corresponds to a different assumption about the relative costs of the two errors. Plotting the ROC curves of several models on the same axes allows one to compare them while taking into account how they behave as the assumed costs of errors vary. This works regardless of the underlying algorithm or approach: if one model's curve lies above another's everywhere, it is the better model under any cost assumption, and where the curves cross, the better model depends on which costs actually apply.
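
A minimal sketch of both ideas, assuming each model outputs a score where higher means "more likely to cancel". `roc_points` sweeps the decision threshold and records the false and true positive rates; `auc` summarizes a curve by the area under it (a common single-number summary, added here rather than taken from the text); `expected_cost` prices the two kinds of error at one threshold. The scores and the 1:5 cost ratio are hypothetical placeholders.

```python
from typing import List, Tuple


def roc_points(scores: List[float], actual: List[bool]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) at every threshold, strictest first.

    A customer is flagged as a likely canceller when score >= threshold;
    actual[i] is True if customer i really cancelled.
    """
    pos = sum(actual)
    neg = len(actual) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nobody flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, a in zip(scores, actual) if s >= t and a)
        fp = sum(1 for s, a in zip(scores, actual) if s >= t and not a)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def expected_cost(scores: List[float], actual: List[bool], threshold: float,
                  cost_false_cancel: float, cost_missed_cancel: float) -> float:
    """Average cost per customer when the two errors are priced differently.

    cost_false_cancel: predicted to cancel, but renewed.
    cost_missed_cancel: predicted to renew, but cancelled.
    """
    fp = sum(1 for s, a in zip(scores, actual) if s >= threshold and not a)
    fn = sum(1 for s, a in zip(scores, actual) if s < threshold and a)
    return (fp * cost_false_cancel + fn * cost_missed_cancel) / len(actual)


actual = [True, True, False, True, False, False]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]  # hypothetical model A scores
scores_b = [0.6, 0.4, 0.5, 0.9, 0.8, 0.1]  # hypothetical model B scores

for name, scores in (("A", scores_a), ("B", scores_b)):
    print(name, "AUC:", round(auc(roc_points(scores, actual)), 3),
          "cost at 0.5:", round(expected_cost(scores, actual, 0.5, 1.0, 5.0), 3))
```

In this toy example, every point on model B's curve is matched or beaten by a point on model A's curve, so A is at least as good whatever the cost ratio; with real models whose curves cross, `expected_cost` at the business's actual costs is what decides.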

Converting Interestingness

Each data mining algorithm's implementation has a formal criterion that determines whether a pattern is interesting, and this criterion is usually specific to the approach. But by delving into the technical details of each algorithm's inner workings, these interestingness criteria can often be mapped onto one another. For example, support and confidence are commonly used to measure, respectively, how frequently a pattern occurs and the conditional probability that the pattern holds (how often the outcome occurs given that the pattern's condition is present). Related to both is a measure called lift, which can be thought of as how much "improvement" the pattern provides over not using it: the ratio of the pattern's confidence to the overall rate of the outcome. Support, confidence, and lift are mathematically related and can be "rewritten" in terms of each other. Because probability formalisms underlie many of these algorithms, it is then possible to express measures associated with one approach in terms of measures associated with another.
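
A minimal sketch of how the three measures relate, treating each customer record as a set of hypothetical attributes and a pattern as "antecedent implies consequent". Confidence is the support of the combined itemset divided by the support of the antecedent, and lift is confidence divided by the support of the consequent, which is why each can be rewritten in terms of the others. The attribute names and records are made up for illustration.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(itemset: Itemset, records: List[Itemset]) -> float:
    """Fraction of records that contain every item in the itemset."""
    return sum(1 for r in records if itemset <= r) / len(records)


def confidence(antecedent: Itemset, consequent: Itemset, records: List[Itemset]) -> float:
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    return support(antecedent | consequent, records) / support(antecedent, records)


def lift(antecedent: Itemset, consequent: Itemset, records: List[Itemset]) -> float:
    """Confidence relative to the consequent's base rate; 1.0 means no improvement."""
    return confidence(antecedent, consequent, records) / support(consequent, records)


# Hypothetical customer records.
records = [frozenset(r) for r in (
    {"late_payment", "cancelled"},
    {"late_payment", "cancelled"},
    {"late_payment", "cancelled"},
    {"late_payment"},
    {"cancelled"},
    {"auto_renew"},
    {"auto_renew"},
    {"auto_renew"},
)]
a, c = frozenset({"late_payment"}), frozenset({"cancelled"})
print("support:", support(a | c, records))
print("confidence:", confidence(a, c, records))
print("lift:", lift(a, c, records))
```

With these records, "late payment implies cancellation" has support 0.375, confidence 0.75, and lift 1.5: customers with a late payment cancel at 1.5 times the overall cancellation rate of 0.5.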

To reconcile different results from two approaches, one could in principle do the following: take all patterns marked as interesting by one approach and keep only those that would also be considered interesting under the other approach's criterion, then do the same in the opposite direction. Only patterns that are interesting under both approaches are retained, and these are more likely to be robust and stable.
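
A minimal sketch of this cross-check, assuming one approach judges patterns by confidence and the other by lift. The 0.7 and 1.2 thresholds, the attribute names, and the two "flagged" lists are hypothetical stand-ins for the outputs of two real mining runs.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

Itemset = FrozenSet[str]
Pattern = Tuple[Itemset, Itemset]  # (antecedent, consequent)


def support(itemset: Itemset, records: List[Itemset]) -> float:
    """Fraction of records that contain every item in the itemset."""
    return sum(1 for r in records if itemset <= r) / len(records)


def confidence(p: Pattern, records: List[Itemset]) -> float:
    antecedent, consequent = p
    return support(antecedent | consequent, records) / support(antecedent, records)


def lift(p: Pattern, records: List[Itemset]) -> float:
    return confidence(p, records) / support(p[1], records)


def reconcile(flagged_by_first: Iterable[Pattern], flagged_by_second: Iterable[Pattern],
              first_ok: Callable[[Pattern], bool],
              second_ok: Callable[[Pattern], bool]) -> Set[Pattern]:
    """Keep patterns from either approach only if they pass BOTH approaches' criteria."""
    candidates = set(flagged_by_first) | set(flagged_by_second)
    return {p for p in candidates if first_ok(p) and second_ok(p)}


def pattern(antecedent: str, consequent: str) -> Pattern:
    return frozenset({antecedent}), frozenset({consequent})


# Hypothetical customer records.
records = [frozenset(r) for r in (
    {"late_payment", "support_calls", "cancelled"},
    {"late_payment", "paperless", "cancelled"},
    {"support_calls", "renewed"},
    {"support_calls", "renewed"},
    {"paperless", "renewed"},
    {"paperless", "renewed"},
    {"paperless", "renewed"},
    {"paperless", "renewed"},
    {"renewed"},
    {"renewed"},
)]

# Hypothetical outputs of two mining runs with different interestingness criteria.
flagged_by_first = [pattern("paperless", "renewed"), pattern("late_payment", "cancelled")]
flagged_by_second = [pattern("support_calls", "cancelled"), pattern("late_payment", "cancelled")]

robust = reconcile(
    flagged_by_first, flagged_by_second,
    first_ok=lambda p: confidence(p, records) >= 0.7,  # assumed threshold
    second_ok=lambda p: lift(p, records) >= 1.2,       # assumed threshold
)
for antecedent, consequent in robust:
    print(sorted(antecedent), "->", sorted(consequent))
```

Here "paperless implies renewed" has high confidence but a lift of 1.0 (renewal is common anyway), "support calls imply cancelled" has high lift but low confidence, and only "late payment implies cancelled" survives both criteria.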

Assumption Matching

Every analytical technique makes implicit and explicit assumptions about the data. When faced with different results for the same business issue, one can also examine the technical details to determine what those assumptions are. Sometimes the assumptions are inconsistent with what is known about the domain. For example, an algorithm may assume that males and females are equally represented in the data when in fact this might not be true for a particular domain. A resulting pattern showing that a disproportionate number of cancellations come from females may then disappear once the correct base rate for females is incorporated. A different algorithm that did not make this assumption would not have found this "pattern". Under these circumstances, the model whose assumptions are consistent with domain knowledge is the better model.
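
A minimal sketch of how a base-rate assumption can create or remove such a "pattern". The customer and cancellation counts are hypothetical; the ratio compares a group's share of cancellations with its assumed share of customers, so 1.0 means the group cancels in proportion to its size.

```python
from typing import Dict


def share(counts: Dict[str, int], group: str) -> float:
    """Group's fraction of the total count."""
    return counts[group] / sum(counts.values())


def over_representation(cancellations: Dict[str, int], base_share: Dict[str, float],
                        group: str) -> float:
    """Group's share of cancellations divided by its assumed share of customers."""
    return share(cancellations, group) / base_share[group]


# Hypothetical counts.
customers = {"female": 600, "male": 400}
cancellations = {"female": 120, "male": 80}

uniform_assumption = {"female": 0.5, "male": 0.5}             # the algorithm's assumption
domain_base_rate = {g: share(customers, g) for g in customers}  # what the domain says

naive = over_representation(cancellations, uniform_assumption, "female")
corrected = over_representation(cancellations, domain_base_rate, "female")
print("assuming 50/50:", round(naive, 2))
print("using actual base rate:", round(corrected, 2))
# Per-group cancellation rates tell the same story as the corrected ratio.
print({g: cancellations[g] / customers[g] for g in customers})
```

With these made-up counts, females account for 60% of cancellations, which looks like over-representation (ratio 1.2) under a 50/50 assumption; but females are also 60% of customers, so the corrected ratio is 1.0 and both groups cancel at the same 20% rate.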

Last but not least, if the results remain inconsistent with one another after these checks, one can always choose not to act on any of them and simply conclude that the analysis is inconclusive.


Ed Colet is the Acting Director of Research at Virtual Gold Inc., responsible for developing analytical methods for data mining and for investigating human factors and usability issues of business intelligence systems. At present, he is in the final stage of completing a doctoral dissertation in the Cognition and Perception program at New York University's Department of Psychology. Ed has also worked for IBM Research at the T.J. Watson Research Center. At IBM, Ed was a member of the group that developed Advanced Scout, the data mining application for NBA teams. His research interests focus on statistical methods and human factors.

For more information, see http://www.virtualgold.com.
