
ALLOWING TECHNOLOGY TO GENERATE SOLUTIONS TO PROBLEMS
by Ed Colet

Technologies such as data mining are typically used as powerful tools for rapid calculations. The "real" and truly creative work of generating a solution to a problem and deciding on a course of action remains the purview of the domain expert or end-user decision maker. In this column I take a look at some evolving trends that suggest that technologies can play a greater role in actually generating new and creative solutions -- ones that a human domain expert wouldn't have discovered or thought of on their own.

A somewhat simplified view of the current process of knowledge discovery through data mining can be summarized as the following sequence of four operations: (1) The use of technology as a fast calculator. Powerful technologies are applied to large data sets to find hidden but potentially important patterns buried within the data. Raw computational power is what makes an efficient search of the large data space possible. (2) Interpretation of results by a human expert. Potentially important patterns are presented to the end-user analyst or domain expert for interpretation. (3) Follow-up analysis. Discovered patterns are subjected to further analysis to examine their veracity. This can be done via formal statistical testing or by issuing additional queries to test for expected predictions or logical inferences. (4) Decision making. The process concludes with a decision on a course of action.

Most of this process is human-intensive. But humans are notoriously poor at articulating the reasons that underlie their skilled insights. It is the insight provided by a human domain expert that guides the follow-up analysis, because the expert knows exactly what predictions and inferences should follow from a pattern and then tests for these. When the same data mining application is used to examine the same data, the better domain expert making the better decisions will come out on top. For example, in the case of two NBA coaches using Advanced Scout software to analyze game data, the difference comes down to which person is the better coach (assuming the players from both teams are equally talented and skilled). So the human still brings the critical element to the process, and ultimately determines the effectiveness of the application. But it's extremely difficult to "capture" and "replicate" the steps followed by the skilled domain expert.

But this paradigm may be changing in the near future based on some interesting developments in the area of Inductive Logic Programming (ILP). The evolution of ILP as applied to data mining can change both the role of technology and the role of the domain expert. ILP enables the computer to play a greater role in generating (or suggesting) a solution to a problem by encapsulating and learning from encoded rules of human reasoning and inferential processes. Knowledge rules can be combined into a theory, and the computer examines the inferences that follow from it. So rather than relying on technology to act only as a fast calculator, technology can take on some of the tasks associated with follow-up analysis and inference testing.
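To make the idea of "examining the inferences that follow" from a theory a little more concrete, here is a minimal sketch in Python. It is not an ILP system (a real one would also induce the rules from examples); it only shows the deductive half, in which a computer mechanically derives every consequence of a small rule set. The basketball facts and rules are invented purely for illustration and are not taken from Advanced Scout or any real application.

```python
# Toy illustration only: the facts and rules below are hypothetical.

def forward_chain(facts, rules):
    """Apply rules repeatedly until no new facts can be derived.

    facts: a set of strings that are known to hold.
    rules: a list of (premises, conclusion) pairs, where premises is a
           frozenset of strings and conclusion is a single string.
    Returns the set of all facts, both given and derived.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all its premises are already established.
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


# A hypothetical three-rule "theory" a coach might articulate.
rules = [
    (frozenset({"opponent_plays_zone", "team_has_shooters"}),
     "favor_perimeter_shots"),
    (frozenset({"favor_perimeter_shots"}),
     "expect_long_rebounds"),
    (frozenset({"expect_long_rebounds", "guards_rebound_well"}),
     "start_fast_breaks"),
]

facts = {"opponent_plays_zone", "team_has_shooters", "guards_rebound_well"}

# Keep only what the computer worked out beyond the starting facts.
inferred = forward_chain(facts, rules) - facts
print(sorted(inferred))
```

The third conclusion depends on the second, which depends on the first, so the program follows a chain of reasoning that the expert stated only one link at a time.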

ILP approaches are already being applied successfully in the medical sciences. Applications range from finding alternative pharmaceuticals for diseases such as Alzheimer's to finding improved approaches to embryo selection for IVF implantation. Problems in drug design and embryo selection share the characteristic that they involve multiple variables and attributes whose interactions are subtle and not well understood. Computer-generated solutions in these areas have been remarkably creative and have opened up promising new directions for research.

If the role of technology expands towards creative inferences, what is the expected role for the human domain expert? As technology grows into a collaborator, the human domain expert is still necessary to evaluate a computer-generated solution. The follow-up process of evaluating a solution is thus not that different from the present role of a domain expert in interpreting a discovered pattern. But an additional task for the human domain expert would be to add knowledge representing real-world constraints to the program, so that subsequent computer-generated solutions are feasible as well as novel. A remaining challenge is to find practical ways for a domain expert to articulate and encode such knowledge and transfer it into the technology.
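One simple way to picture expert-supplied constraints is as named checks that every computer-generated candidate must pass before it reaches the decision maker. The sketch below assumes candidates are plain records with a few numeric attributes; the attribute names, thresholds, and scores are all made up for illustration and do not describe any real system or data.

```python
# Toy illustration only: candidates, attributes, and thresholds are hypothetical.

def violated(candidate, constraints):
    """Return the names of the constraints that the candidate fails."""
    return [name for name, check in constraints if not check(candidate)]


def feasible(candidate, constraints):
    """A candidate is feasible when it fails no constraint."""
    return not violated(candidate, constraints)


# Hypothetical computer-generated solutions, each with a novelty score.
candidates = [
    {"name": "A", "novelty": 0.9, "cost": 120, "risk": 0.7},
    {"name": "B", "novelty": 0.6, "cost": 40, "risk": 0.1},
    {"name": "C", "novelty": 0.8, "cost": 55, "risk": 0.2},
]

# Real-world knowledge encoded by the domain expert as named checks.
constraints = [
    ("affordable", lambda c: c["cost"] <= 100),
    ("low_risk", lambda c: c["risk"] <= 0.3),
]

# Keep the feasible candidates, most novel first.
shortlist = sorted(
    (c for c in candidates if feasible(c, constraints)),
    key=lambda c: c["novelty"],
    reverse=True,
)
print([c["name"] for c in shortlist])
print(violated(candidates[0], constraints))
```

Naming each constraint means a rejected candidate can be reported together with the reasons it failed, which gives the expert something specific to review. The harder problem the column raises, getting that knowledge out of the expert's head in the first place, is untouched by this sketch.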


Ed Colet is the Acting Director of Research at Virtual Gold Inc., responsible for developing analytical methods for data mining and for investigating human factors and usability issues of business intelligence systems. At present, he is in the final stage of completing a doctoral dissertation in the Cognition and Perception program at New York University's Department of Psychology. Ed has also worked for IBM Research at the T.J. Watson Research Center. At IBM, Ed was a member of the group that developed Advanced Scout, the data mining application for NBA teams. His research interests focus on statistical methods and human factors.

For more information, see www.virtualgold.com.
