DATA MINING: THE TWO CULTURES, PART II
by Robert Grossman
4. Machine Learning
The essence of data mining is machine learning, and this is what
occurs in Step 6 above. Both the PM and KD traditions start
with a space of learning sets L. Each element of the space is a
particular learning set, that is, some data set which is to be
analyzed automatically.
The PM tradition requires a space of models M. In the fraud example above, M is the space of binary classifiers (0 for normal and 1 for fraud).
In the PM tradition, data mining can be thought of as a map from L to a space of models M:
L -------> M (PM perspective)
The input is a data set and the output is one or more models. The goal is to produce as accurate a classifier as possible.
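To make the map from L to M concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: each transaction is reduced to a single feature (its dollar amount), the hypothetical learner learn_threshold only considers classifiers of the form "flag amounts at or below a cutoff", and the six labeled transactions are invented. It is not a method described in this article, only one simple instance of a function that takes a learning set and returns a binary classifier.

```python
from typing import Callable, List, Tuple

# Assumption: a transaction is described by one feature, its amount in dollars.
Transaction = float
# A learning set pairs each transaction with a label: 0 = normal, 1 = fraud.
LearningSet = List[Tuple[Transaction, int]]
# A model is a binary classifier on transactions.
Model = Callable[[Transaction], int]


def learn_threshold(learning_set: LearningSet) -> Model:
    """Map a learning set to a model (an element of L to an element of M).

    The model flags amounts at or below a cutoff as fraud. The cutoff is
    picked from the observed amounts to maximize accuracy on the learning set.
    """
    def accuracy(cutoff: float) -> float:
        hits = sum(1 for amount, label in learning_set
                   if (1 if amount <= cutoff else 0) == label)
        return hits / len(learning_set)

    candidates = sorted({amount for amount, _ in learning_set})
    best = max(candidates, key=accuracy)
    return lambda amount: 1 if amount <= best else 0


# Invented learning set: (amount, label).
learning_set = [(1.50, 1), (1.00, 1), (45.00, 0),
                (12.00, 0), (80.00, 0), (1.75, 1)]

model = learn_threshold(learning_set)
print(model(1.25), model(30.00))  # prints: 1 0
```

The point of the sketch is the signature: the learner is the arrow in the diagram, and the classifier it returns is the element of M.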
For the fraud example, two measures are relevant: the detection rate of the model and the rate of false positives. In practice, an increase in the detection rate is usually accompanied by an increase in the false positive rate.
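The trade-off between the two measures can be illustrated with a small Python sketch. It assumes, hypothetically, that the model assigns each transaction a fraud score between 0 and 1 and flags every transaction whose score meets a threshold. The scores, labels, and the helper name rates are invented for the example.

```python
from typing import List, Tuple


def rates(scores: List[float], labels: List[int],
          threshold: float) -> Tuple[float, float]:
    """Return (detection rate, false positive rate) at a given threshold.

    A transaction is flagged as fraud when its score is >= threshold.
    Detection rate = flagged fraud / all fraud.
    False positive rate = flagged normal / all normal.
    """
    flagged = [s >= threshold for s in scores]
    fraud = sum(labels)
    normal = len(labels) - fraud
    detected = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    false_alarms = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    return detected / fraud, false_alarms / normal


# Invented fraud scores and true labels (1 = fraud, 0 = normal).
scores = [0.95, 0.80, 0.60, 0.40, 0.70, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for threshold in (0.9, 0.5, 0.25):
    detection, false_positive = rates(scores, labels, threshold)
    print(threshold, detection, false_positive)
# prints:
# 0.9 0.25 0.0
# 0.5 0.75 0.25
# 0.25 1.0 0.5
```

On this toy data, lowering the threshold raises the detection rate from 25% to 100%, and the false positive rate rises with it from 0% to 50%.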
The KD tradition replaces the space of models M with a space of assertions or predicates P. To be more concrete, a simple type of assertion is a conditional: if X then Y. For example, if a credit card transaction is for less than $2 and the transaction occurs at a gas station, then the transaction is fraudulent. Here X is the conjunction that the transaction is for less than $2 and occurs at a gas station, and Y is the claim that the transaction is fraudulent.
From the KD viewpoint, learning can be thought of as a map from L to a space of assertions or predicates P:
L -------> P (KD perspective)
The input is a data set and the output is one or more assertions. The goal is for the predicate or predicates discovered to be as relevant and useful as possible.
For the fraud example, three measures are relevant. An assertion has confidence c% if c% of the transactions that satisfy X also satisfy Y. An assertion has support s% if s% of all transactions satisfy both X and Y. Finally, one can attach various complexity measures to the assertion. The goal is to find assertions which have low complexity and high confidence and which cover an interesting set of examples.
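As a worked illustration, the following Python sketch computes support and confidence for the gas station rule on eight invented transactions. It assumes the usual association-rule convention in which support counts the transactions that satisfy both X and Y. The field names, the data, and the helper support_and_confidence are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Transaction = Dict[str, Any]
Predicate = Callable[[Transaction], bool]


def support_and_confidence(transactions: List[Transaction],
                           x: Predicate, y: Predicate) -> Tuple[float, float]:
    """Return (support %, confidence %) for the assertion "if X then Y".

    Support: share of all transactions satisfying both X and Y.
    Confidence: share of transactions satisfying X that also satisfy Y.
    """
    both = sum(1 for t in transactions if x(t) and y(t))
    antecedent = sum(1 for t in transactions if x(t))
    support = 100.0 * both / len(transactions)
    confidence = 100.0 * both / antecedent if antecedent else 0.0
    return support, confidence


# Invented transactions for illustration only.
transactions = [
    {"amount": 1.50, "merchant": "gas station", "fraud": True},
    {"amount": 1.00, "merchant": "gas station", "fraud": True},
    {"amount": 1.75, "merchant": "gas station", "fraud": False},
    {"amount": 40.00, "merchant": "gas station", "fraud": False},
    {"amount": 1.25, "merchant": "grocery", "fraud": False},
    {"amount": 95.00, "merchant": "electronics", "fraud": True},
    {"amount": 22.00, "merchant": "grocery", "fraud": False},
    {"amount": 60.00, "merchant": "restaurant", "fraud": False},
]


def x(t: Transaction) -> bool:
    # X: the transaction is for less than $2 and occurs at a gas station.
    return t["amount"] < 2 and t["merchant"] == "gas station"


def y(t: Transaction) -> bool:
    # Y: the transaction is fraudulent.
    return t["fraud"]


support, confidence = support_and_confidence(transactions, x, y)
print(f"support = {support:.1f}%, confidence = {confidence:.1f}%")
# prints: support = 25.0%, confidence = 66.7%
```

Three of the eight transactions satisfy X and two of those are fraudulent, so the rule has confidence 2/3 and support 2/8 on this toy data.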
When someone talks about automatically extracting patterns or automatically discovering information, what they really mean is that there is an algorithm which takes a learning set and produces a model (in the PM tradition) or which takes a learning set and produces one or more predicates (in the KD tradition). When there was less digital data and more analysts, automating this step was not as important. Today, with far more digital data than can ever be analyzed by hand, the automation of this step (Step 6 in the data mining process) is a key enabling technology for a variety of scientific, engineering, and business problems.
A deeper understanding of machine learning requires that the two maps above be understood in a probabilistic framework, in the sense that one tries to understand the probability that the model or assertions produced are accurate.
5. Impact and Implementation
In this section, we discuss some of the practical and organizational issues in data mining projects.
An executive involved in a data mining project is responsible for making sure that the results of the project can be effectively exploited by the organization. For projects with a KD focus, this means that the modeling or analyst group understands how data mining can assist them and does not feel threatened by a new technology, and that reports summarizing discoveries reach the relevant decision makers. For projects with a PM focus, this means that the operational managers are included in the early discussions so that the predictive models produced can be easily exploited by the organization.
When designing and implementing data mining systems, the data mining administrator (DMA) must understand whether the primary goal is 1) to improve predictive modeling of an important business process (PM) or 2) to give analysts and modelers new knowledge and insights (KD). Those offering professional services involving data mining must also be aware of the same distinction.
6. Summary
Both the predictive modeling (PM) culture and the knowledge discovery (KD) culture are essential to data mining. In some sense, data mining is about the interaction of these two cultures and the scaling up of traditional algorithms and systems from small data sets to the large data sets which are common today.
Machine learning is a step in the data mining process. In the PM culture, this step takes a learning set and produces one or more predictive models. In the KD culture, this step takes a learning set and produces one or more assertions (which are interpreted as discovered knowledge). The essence of data mining is the automation of this step. With the amount of data growing so quickly, it is simply no longer practical to develop all predictive models or assertions by hand.
The PM culture favors accuracy over understandability; the KD culture favors understandability over accuracy.
Developing a good practical solution to a data mining problem requires understanding both the PM and KD perspectives and implementing a solution incorporating techniques from both cultures. Some problems benefit from a viewpoint emphasizing the PM perspective; others from a viewpoint emphasizing the KD perspective. It is important to understand both cultures and the expectations and objectives of the project team if an appropriate data mining solution is to be successfully developed.
References:
[1] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From Data
Mining to Knowledge Discovery: An Overview," in Advances in Knowledge
Discovery and Data Mining, edited by U. M. Fayyad, G. Piatetsky-Shapiro,
P. Smyth, and R. Uthurusamy, AAAI Press/MIT Press, pp. 1-34, 1996.
For more information, see http://www.magnify.com