ALLOWING TECHNOLOGY TO GENERATE SOLUTIONS TO PROBLEMS
by Ed Colet
Technologies such as data mining are typically used as powerful tools for
rapid calculations. The "real" and truly creative work of generating a
solution to a problem and deciding on a course of action remains the purview
of the domain expert or end-user decision maker. In this column I take a look
at some evolving trends that suggest that technologies can play a greater role
in actually generating new and creative solutions -- ones that a human domain
expert wouldn't have discovered or thought of on their own.
A somewhat simplified view of the current process of knowledge discovery
through data mining can be summarized as the following sequence of four
operations: (1) The use of technology as a fast calculator. Powerful
technologies are applied to large data sets to find hidden but potentially
important patterns buried within the data. Raw computational power is what
makes an efficient search of the large data space possible. (2)
Interpretation of results by a human expert. Potentially important patterns
are presented to the end-user analyst or domain expert for interpretation. (3)
Follow-up analysis. Discovered patterns are subject to further analysis to
examine their veracity. This can be done via formal statistical testing or by
issuing additional queries to test for expected predictions or logical
inferences. (4) Decision making. The process concludes with a decision on a
course of action.
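
Steps (1) and (3) above can be sketched in a few lines of Python. This is only
a minimal illustration under stated assumptions: the possession records, the
attribute names, and the helper functions are all invented here, and the
follow-up check is a one-sided exact (hypergeometric) test chosen for
simplicity, not a method taken from any particular data mining product.

```python
from math import comb

def make_rows(lineup, half, scored, count):
    """Build `count` identical toy records (all values are invented)."""
    return [{"lineup": lineup, "half": half, "scored": scored} for _ in range(count)]

# Hypothetical possession log: which lineup was on the floor, which half,
# and whether the team scored (1) or not (0).
records = (
    make_rows("A", 1, 1, 5) + make_rows("A", 2, 1, 4) + make_rows("A", 1, 0, 1)
    + make_rows("B", 1, 1, 2) + make_rows("B", 2, 1, 2)
    + make_rows("B", 1, 0, 3) + make_rows("B", 2, 0, 3)
)

def find_strongest_pattern(rows, outcome):
    """Step 1: scan every attribute=value subset for the largest lift over baseline."""
    baseline = sum(r[outcome] for r in rows) / len(rows)
    best = None
    for attr in rows[0]:
        if attr == outcome:
            continue
        for value in sorted({r[attr] for r in rows}):
            subset = [r for r in rows if r[attr] == value]
            rate = sum(r[outcome] for r in subset) / len(subset)
            lift = rate - baseline
            if best is None or lift > best["lift"]:
                best = {"attr": attr, "value": value, "rate": rate, "lift": lift}
    return best

def one_sided_exact_p(rows, pattern, outcome):
    """Step 3: chance of seeing at least this many successes in the subset
    if successes were spread at random (hypergeometric upper tail)."""
    n_total = len(rows)
    k_success = sum(r[outcome] for r in rows)
    subset = [r for r in rows if r[pattern["attr"]] == pattern["value"]]
    n_subset = len(subset)
    observed = sum(r[outcome] for r in subset)
    upper = min(n_subset, k_success)
    tail = sum(
        comb(k_success, k) * comb(n_total - k_success, n_subset - k)
        for k in range(observed, upper + 1)
    )
    return tail / comb(n_total, n_subset)

pattern = find_strongest_pattern(records, "scored")
p_value = one_sided_exact_p(records, pattern, "scored")
print(pattern["attr"], pattern["value"], round(pattern["rate"], 2), round(p_value, 3))
```

On this invented data the search flags lineup A (a 0.9 scoring rate against a
0.65 baseline), and the exact test gives a one-sided p-value of about 0.029.
Testing a pattern on the same data that produced it overstates its
significance, so a real follow-up analysis would use fresh data. Steps (2) and
(4), interpretation and decision, are left to the human.
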
Most of this process is human intensive, yet humans are notoriously poor at
articulating the reasons that underlie their skilled insights. It is the
insight provided by a human domain expert that guides the follow-up analysis:
the expert knows which predictions and inferences should follow from a
pattern and then tests for them. When the same data
mining application is being used to examine the same data, the better domain
expert making the better decisions will come out on top. For example, in the
case of two NBA coaches using Advanced Scout software to analyze game data,
the difference comes down to which person is the better coach (assuming the
players from both teams are equally talented and skilled). So the human still
brings the critical element to the process, and ultimately determines the
effectiveness of the application. But it's extremely difficult to "capture"
and "replicate" the steps followed by the skilled domain expert.
This paradigm may change in the near future, however, thanks to some interesting
developments in the area of Inductive Logic Programming (ILP). The evolution
of ILP as applied to data mining can change both the role of technology and
the role of the domain expert. ILP enables the computer to play a greater role
in generating (or suggesting) a solution to a problem by encapsulating and
learning from encoded rules of human reasoning and inferential processes.
Knowledge rules can be combined into a theory, and the computer examines the
inferences that follow from it. So rather than acting only as a fast
calculator, technology can take on some of the tasks associated with
follow-up analysis and inference testing.
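
The flavor of this can be shown with a toy example. The sketch below is a
heavy simplification and rests on assumptions of my own: real ILP systems work
in first-order logic with far richer background knowledge, whereas this
version only searches conjunctions of simple properties. The compound names,
the properties, and the helpers `induce_rule` and `applies` are hypothetical.

```python
from itertools import combinations

# Background knowledge: properties known to hold for each (invented) compound.
background = {
    "compound_a": {"binds_target", "crosses_barrier"},
    "compound_b": {"binds_target"},
    "compound_c": {"binds_target", "crosses_barrier", "soluble"},
    "compound_d": {"crosses_barrier", "soluble"},
}
positives = {"compound_a", "compound_c"}  # examples labeled "active"
negatives = {"compound_b", "compound_d"}  # examples labeled "inactive"

def induce_rule(background, positives, negatives):
    """Return the shortest conjunction of properties that covers every
    positive example and no negative example, or None if there is none."""
    predicates = sorted(set().union(*background.values()))
    for size in range(1, len(predicates) + 1):
        for body in combinations(predicates, size):
            covered = {name for name, props in background.items() if set(body) <= props}
            if positives <= covered and not (covered & negatives):
                return body
    return None

def applies(body, props):
    """Check whether the induced rule fires for a new set of properties."""
    return set(body) <= props

# Induce a rule from the examples, then examine what it implies for new cases.
rule = induce_rule(background, positives, negatives)
unseen = {
    "compound_e": {"binds_target", "crosses_barrier"},
    "compound_f": {"crosses_barrier", "soluble"},
}
predictions = {name: applies(rule, props) for name, props in unseen.items()}
print("active if:", " and ".join(rule))
print(predictions)
```

Here the program, not the expert, proposes the rule (a compound is active if
it binds the target and crosses the barrier) and then works out what that rule
implies for compounds it has not seen.
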
ILP approaches have already been applied successfully in the medical
sciences. Applications range from finding alternative pharmaceuticals for
diseases such as Alzheimer's to finding improved approaches to embryo
selection for IVF implantation. Problems in drug design and embryo selection
share the characteristic that many variables and attributes interact in
subtle and poorly understood ways. Computer-generated solutions in these areas
have been remarkably creative and have opened up promising new directions for
research.
If the role of technology expands towards creative inferences, what is the
expected role for the human domain expert? As technology grows into a
collaborator, the human domain expert is still needed to evaluate a
computer-generated solution. The follow-up process of evaluating a solution
is thus not that different from the present role of a domain expert in
interpreting a discovered pattern. But a further task for the human
domain expert would be to encode knowledge of real-world constraints in the
program, so that subsequent computer-generated solutions are feasible as
well as novel. A remaining challenge is finding practical ways for a domain
expert to articulate and encode such knowledge and transfer it into the
technology.
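
One simple way to picture this kind of encoding is as a set of named checks
applied to each computer-generated candidate. The sketch below is purely
illustrative: the candidate designs, their fields, the threshold values, and
the helper names are all assumptions made up for this example.

```python
# Hypothetical computer-generated candidates, each with a predicted benefit.
candidates = [
    {"name": "design_1", "predicted_benefit": 0.9, "cost": 120, "dose_mg": 50},
    {"name": "design_2", "predicted_benefit": 0.7, "cost": 80, "dose_mg": 40},
    {"name": "design_3", "predicted_benefit": 0.8, "cost": 90, "dose_mg": 300},
    {"name": "design_4", "predicted_benefit": 0.6, "cost": 60, "dose_mg": 20},
]

# Real-world constraints supplied by the domain expert (invented thresholds).
constraints = {
    "within budget": lambda c: c["cost"] <= 100,
    "safe dose": lambda c: c["dose_mg"] <= 200,
}

def violations(candidate, constraints):
    """List the names of the constraints that a candidate breaks."""
    return [name for name, check in constraints.items() if not check(candidate)]

def feasible_ranked(candidates, constraints):
    """Keep candidates that break no constraint, best predicted benefit first."""
    kept = [c for c in candidates if not violations(c, constraints)]
    return sorted(kept, key=lambda c: c["predicted_benefit"], reverse=True)

ranked = feasible_ranked(candidates, constraints)
print([c["name"] for c in ranked])
for c in candidates:
    broken = violations(c, constraints)
    if broken:
        print(c["name"], "rejected:", ", ".join(broken))
```

The most novel candidate (design_1) is rejected here because it breaks the
budget constraint, which is the kind of judgment the expert would otherwise
have to make by hand for every proposed solution.
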
Ed Colet is the Acting Director of Research at Virtual Gold Inc.,
responsible for developing analytical methods for data mining and for
investigating human factors and usability issues of business intelligence
systems. At present, he is in the final stage of completing a doctoral
dissertation in the Cognition and Perception program at New York
University's Department of Psychology. Ed has also worked for IBM Research
at the T.J. Watson Research Center. At IBM, Ed was a member of the group
that developed Advanced Scout, the data mining application for NBA teams.
His research interests focus on statistical methods and human factors.
For more information, see www.virtualgold.com.