ON THE IMPORTANCE OF AN EXPERT INTERPRETATION
by Ed Colet
The analysis of data through data mining, traditional database querying, or formal statistical analysis will always generate output that must be interpreted before conclusions can be drawn. Finding the information buried within the data is one thing; interpreting it is another. Both are equally important. This column illustrates the importance of an expert interpretation and discusses some ways that analytical tools can facilitate it.
The "Register Guard", a local newspaper in the Eugene, Oregon area, ran a front-page story earlier this summer under the headline "Popularity of running tails off, slows to a walk". The article's claim that running is declining in popularity was apparently supported by anecdotal evidence and by some statistics, such as a decline in the number of participating runners and decreased sales of running shoes. A rebuttal to this claim was submitted by Joe Henderson, a dedicated runner and writer for "Runner's World" magazine. A version of his counterarguments makes up his column in the November issue of "Runner's World".
His rebuttal offers an alternative interpretation of the decreased shoe sales, along with some further analysis. The decline in running shoe sales can be attributed to decreasing purchases by marginal runners -- those who run fewer than three times a week, and only a mile or two at a time. These are unlikely to be the people who regularly enter road races. It is also known that non-runners purchase running shoes for casual wear, so the decline may simply reflect changes in what is currently fashionable. He also points out that other measures, ones that reflect the behavior of non-marginal runners, indicate sustained growth.
The number of magazines and books devoted to running has increased. US road race entries are up by 10%, and more people ran marathons last year than in prior years. Therefore, he writes, "Reports of running's decline are greatly exaggerated."
The exchange above illustrates two points relevant to the development of data analytic tools. First, a statistic or pattern (e.g., a decline in running shoe sales) can be interpreted in different ways, leading to vastly different conclusions. Second, the more accurate interpretation is likely to be the one provided by a domain expert rather than a layperson.
Many data analytic tools are very good at analyzing data, retrieving answers, and discovering hidden patterns, but they are not as good at supporting the equally important process of interpreting these results. Consequently, a tool turns out to be only as good as the person using it. Given the sophisticated analytical skills needed to use many current data analytic tools, vendors seem to assume that tools usable only by expert users will output expert results. The problem is that these tools are targeted at the wrong expert: the quantitative expert may not also be a domain expert, the one who really knows the business domain and can interpret discovered patterns.
It is therefore important that domain expertise be incorporated into data analytic tools. One approach is to formalize expert knowledge in the underlying algorithms, perhaps in the form of business rules. But building an expert system is a difficult and complex undertaking. Another approach is to make it easy for a domain expert (who may not be quantitatively sophisticated) to ask follow-up queries and pursue his or her natural (but expert) chain of reasoning. A series of follow-up queries that the expert asks consistently can be automated and included in the output reports. This approach has the benefit of supporting both data analysis and data interpretation, which are equally important.
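To make the idea of an automated follow-up query concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the sales figures are invented, and the segment names ("marginal", "regular", "casual_wear") and function names are hypothetical rather than taken from any real tool or from the running statistics cited above. The sketch encodes one question a domain expert might routinely ask after seeing a decline in total sales, namely "which group of buyers accounts for it?", and attaches the answer to the report automatically.

```python
from collections import defaultdict

# Hypothetical records: (year, buyer_segment, pairs_sold).
# These numbers are invented purely for illustration.
SALES = [
    (1, "marginal", 100), (1, "regular", 50), (1, "casual_wear", 80),
    (2, "marginal", 70), (2, "regular", 55), (2, "casual_wear", 60),
]


def total_by_year(rows):
    """First-pass query: total pairs sold in each year."""
    totals = defaultdict(int)
    for year, _segment, pairs in rows:
        totals[year] += pairs
    return dict(totals)


def change_by_segment(rows, first_year, last_year):
    """Follow-up query: change in pairs sold for each buyer segment."""
    by_segment = defaultdict(dict)
    for year, segment, pairs in rows:
        by_segment[segment][year] = pairs
    return {
        segment: years[last_year] - years[first_year]
        for segment, years in by_segment.items()
    }


def build_report(rows, first_year=1, last_year=2):
    """Run the first-pass query, then the expert's usual follow-up if needed."""
    totals = total_by_year(rows)
    report = {"total_change": totals[last_year] - totals[first_year]}
    # The follow-up an expert would ask on seeing a decline is run
    # automatically and included in the output report.
    if report["total_change"] < 0:
        report["change_by_segment"] = change_by_segment(rows, first_year, last_year)
    return report


report = build_report(SALES)
print(report)
```

With these made-up numbers, the total falls while the "regular" segment grows, which is the kind of pattern a headline figure alone would hide. The interpretation still belongs to the domain expert; the tool only makes sure the relevant breakdown is in front of them.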
Ed Colet is the Acting Director of Research at Virtual Gold Inc., responsible for developing analytical methods for data mining and for investigating human factors and usability issues of business intelligence systems. At present, he is in the final stage of completing a doctoral dissertation in the Cognition and Perception program at New York University's Department of Psychology. Ed has also worked for IBM Research at the T.J. Watson Research Center. At IBM, Ed was a member of the group that developed Advanced Scout, the data mining application for NBA teams. His research interests focus on statistical methods and human factors.
For more information, see http://www.virtualgold.com.