HUMAN AND DATA MINING EXPERTISE
By Ed Colet

Data mining technologies depend on human expertise at many levels. Successful solutions have often required a team of analytical experts, domain experts, and technology experts. In this column, I examine the underlying aspects of expertise -- and how two facets of expertise, when implemented in a data mining solution, can yield long-term and sustained benefits.

Imagine a situation in which, for a given business problem, there is a person who can routinely point out hidden but meaningful information that strategically improves your decision making. This person would be held in high regard as an invaluable asset and compensated accordingly to reward and sustain this type of performance. Now consider a slightly different case, in which a person routinely points out hidden information -- but information that seems to come out of the proverbial left field, seemingly random contributions that are only loosely connected to the problem at hand. Because it is difficult to know what to make of this information, it's not useful in any practical way, although it may be mildly entertaining. This person would be considered an eccentric rather than an expert. One would think that the difference between the expert and the eccentric is their contribution to the business problem (or lack thereof). But it's deeper than this -- it goes to their underlying information processing abilities.

Now consider a data mining solution or application in the role of the person in the above scenario. The "expert" system is useful; the "eccentric" system is not. Even if both systems rely on the same algorithms to analyze the same data, the design and deployment of the solution determine the type of system it will become. How can expertise be designed into a system?

An answer comes from research in cognitive science that has focused on human expertise. One known characteristic of expertise is the presence of an extensive knowledge base. As we'll see, a substantial knowledge base helps one encode information in meaningful ways. Research (e.g. by Hayes) has shown that, regardless of the domain, there are no shortcuts for humans in building their knowledge base -- it takes about 10 years to develop a sufficiently rich one. For a businessman this could result from years of experience in the field; for an expert chess player it could come from years of experience with sophisticated chess strategies.

In addition to an extensive knowledge base, cognitive science has identified a second and subtler component of expertise: the use of superior encoding strategies. Early studies (e.g. by de Groot) examined the performance of chess players to try to account for the ability of experts to remember positions of pieces on the board, and even to re-create entire games. Superior memory seemed to be the obvious explanation. But later work by Chase and Simon determined that experts didn't have better memory than non-experts. Instead, their ability to remember positions of chess pieces on a board came from encoding meaningful and named patterns (e.g. Queen's Gambit) into memory. When pieces were placed in random or impossible positions, experts were no better than novices at remembering them. What's important is that experts don't encode discrete and isolated items, but whole formations, and thus it appeared that experts could encode "more" information.

For a data mining system to become as useful as an expert, the same two aspects of expertise should be in place -- a rich knowledge/data store, and effective encoding or presentation of information. Fortunately, unlike for humans, developing a knowledge base need not take 10 years. A knowledge base consists of a long-term data store of facts and relationships among facts. The facts may already exist in the form of repositories of historical data stored in a company's databases. Past and present queries and systematic analyses of such data are a start for building the relationships among facts. Transforming these into a rule system is one way to create a knowledge base. A second approach is to automatically discern and learn the relationships among facts (attributes in data) and to store this learned information back into the knowledge base as it accumulates. In other words, data mining results then become part of the data for future mining.
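The second approach can be illustrated with a small Python sketch. It is only one possible reading of the idea, not a description of any particular product: the class name KnowledgeBase, its methods, and the sample attribute names are all hypothetical, and simple co-occurrence counts stand in for whatever relationships a real system would learn. The point it shows is that the learned relationships live in the same store as the facts and are updated as new data arrives.

```python
from collections import Counter
from itertools import combinations


class KnowledgeBase:
    """Hypothetical store of facts (records) plus relationships learned from them."""

    def __init__(self):
        self.records = []             # the facts: each record is a list of attributes
        self.pair_counts = Counter()  # learned relationships: co-occurring attribute pairs

    def add_records(self, records):
        """Store new records and fold their attribute pairs back into the learned counts."""
        for record in records:
            items = sorted(set(record))
            self.records.append(items)
            self.pair_counts.update(combinations(items, 2))

    def support(self, a, b):
        """Fraction of all stored records that contain both attributes."""
        if not self.records:
            return 0.0
        pair = tuple(sorted((a, b)))
        return self.pair_counts[pair] / len(self.records)


# Illustrative data only (assumed attribute names).
kb = KnowledgeBase()
kb.add_records([["home_game", "win"], ["home_game", "win"], ["away_game", "loss"]])
earlier = kb.support("home_game", "win")

# Later data updates the same store, so earlier results inform future mining.
kb.add_records([["home_game", "loss"]])
later = kb.support("home_game", "win")
```

Keeping the counts inside the store, rather than recomputing them from scratch, is the design choice that makes mining results part of the data for later analyses.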

A knowledge base makes the second principle of expertise possible: superior and effective encoding/presentation of new information. New information is judged to be interesting and potentially useful only in the context of what's stored in the knowledge base. This approach has been implemented in some of our work with IBM's Advanced Scout software for professional basketball coaches. Advanced Scout takes patterns and automatically indexes them with video segments so that interesting patterns can be placed in an appropriate context (video) for interpretation. So, rather than being presented in isolation, a pattern is presented against a context of prior knowledge. The end result is much like a human expert pointing out interesting and useful information, rather than isolated and random information.
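As a rough illustration of judging new information against stored knowledge, the Python sketch below reports a pattern only when it departs from a stored baseline, and returns it together with that baseline. This is an assumption-laden sketch and not how Advanced Scout works: the function name present_in_context, the baselines dictionary, the 0.10 threshold, and the sample figures are all hypothetical.

```python
def present_in_context(pattern, observed_rate, baselines, min_deviation=0.10):
    """Return the pattern with its stored context if it departs from the baseline.

    Returns None when there is no prior knowledge to judge against, or when the
    observed rate is within min_deviation of the stored baseline.
    """
    baseline = baselines.get(pattern)
    if baseline is None:
        return None  # nothing in the knowledge base to compare with
    deviation = observed_rate - baseline
    if abs(deviation) < min_deviation:
        return None  # consistent with prior knowledge, so not reported
    return {
        "pattern": pattern,
        "observed": observed_rate,
        "baseline": baseline,
        "deviation": deviation,
    }


# Illustrative figures only (assumed, not real data).
baselines = {"shooting_pct_lineup_A": 0.45}
report = present_in_context("shooting_pct_lineup_A", 0.70, baselines)
ignored = present_in_context("shooting_pct_lineup_A", 0.48, baselines)
```

Returning the baseline alongside the observation mirrors the idea that a pattern is shown against prior knowledge rather than in isolation.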


Ed Colet is the Acting Director of Research at Virtual Gold Inc., responsible for developing analytical methods for data mining and for investigating human factors and usability issues of business intelligence systems. At present, he is in the final stage of completing a doctoral dissertation in the Cognition and Perception program at New York University's Department of Psychology. Ed has also worked for IBM Research at the T.J. Watson Research Center. At IBM, Ed was a member of the group that developed Advanced Scout, the data mining application for NBA teams. His research interests focus on statistical methods and human factors.

For more information, see www.virtualgold.com.
