DATA, ANALYTICAL METHODS, AND BUSINESS OBJECTIVES
by Ed Colet
Data, Analytical Methods, and Business Objectives: Three aspects of data
mining.
A successful data mining solution can be characterized as the application of
effective algorithms on rich data sets in order to achieve well-defined
business objectives. These three aspects of "analytical methods", "data", and
the "business objective" are all inter-related. In this column, I address the
relationship amongst these three aspects, and how the characteristics of any
one of these aspects will influence the others.
In order for a data mining application to have a strong chance of success,
each of these three aspects must be well-defined. If one or more of these
aspects remain vaguely defined or poorly articulated, any data mining
application/solution is less likely to succeed. An organization that has
well-defined business objectives but lacks any data is obviously not in a
position to make use of data mining at all. Likewise, large amounts of stored
data that remain untouched by analytical tools, because no clear business
objective justifies an analysis, do not bode well for a data mining solution.
It is therefore necessary for all three aspects to be sufficiently well-defined.
The three aspects are also interdependent: the defining characteristics of
any one of them have implications for (or place constraints on) the other
two. This can be illustrated as follows.
If the intent is to provide a commercially viable data mining solution for a
customer (e.g. an organization), it is usually effective to first have the
customer define a business objective. Depending on the domain, examples may
be to increase profit, to reduce churn, to detect fraud, etc. From a
well-defined objective, one can then assess the nature of available data. For
example, one has to consider several questions such as: Are data available?
Do the data contain attributes relevant to the business objective? If not,
can relevant attributes be derived from existing data? Should additional data
collection mechanisms and procedures be developed? Having a clear
understanding of the characteristics of data then affects the types of
analytical techniques to consider. For example, if the data are primarily
made up of categorical values (i.e. non-numeric values), then one cannot apply
algorithms designed for numeric data. By the same token, if numeric
attributes are used, but they are utilized to indicate categories (e.g. 1:
East, 2: West, etc.), then applying a numerically oriented analytical
technique such as computing a mean would be meaningless. Defining a business
objective first, then assessing data, and then applying an appropriate method
is an example of a situation in which all three aspects are well-defined.
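The point about numeric codes that stand for categories can be shown with a
minimal Python sketch. The region codes, labels, and the helper name
summarize_categorical are hypothetical and chosen only for illustration; the
sketch contrasts a mean computed over the codes with a frequency count, which
is one summary that does suit categorical data.

```python
from collections import Counter
from statistics import mean

# Hypothetical region codes: the numbers are labels, not quantities.
REGION_LABELS = {1: "East", 2: "West", 3: "North"}
region_codes = [1, 1, 2, 3, 3, 3]


def summarize_categorical(codes, labels):
    """Return (most common label, frequency table) for coded categories."""
    counts = Counter(labels[code] for code in codes)
    most_common_label, _ = counts.most_common(1)[0]
    return most_common_label, dict(counts)


# The arithmetic mean runs without error, but the result matches no region
# code, so it says nothing about the data.
misleading_mean = mean(region_codes)

# A frequency count respects the categorical nature of the attribute.
top_region, frequencies = summarize_categorical(region_codes, REGION_LABELS)
```

Here the mean falls between the codes for West and North and describes no
actual region, whereas the frequency table reports how many records belong to
each one.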
One need not always start by articulating a business objective first. If
the intent is for an organization simply to explore the feasibility of
data mining in general, the starting point may be the organization's data. In
other words, an organization may start by asking, "what can be done with the
data that has been collected and stored?" So, depending on the data
characteristics, a variety of analytical techniques can be demonstrated in
terms of plausible business scenarios. For example, deviation detection
methods can be employed to detect outliers in the data that may be of
interest. Surprising associations among attribute values can be discovered
with association rules analysis. Grouping of similar database records
relies on clustering analyses. Grouping of records on the basis of values of
one attribute calls for classification techniques. Results from each of these
can then be evaluated by the organization in terms of plausible business
objectives to determine whether a full-scale deployment of a data mining
solution would be valuable. Thus, in this scenario, one proceeds by first
defining the data characteristics, then the analytical techniques, and then
evaluating results in terms of defined business objectives/scenarios.
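As one illustration of the exploratory techniques listed above, deviation
detection can be sketched in a few lines of Python. This is a simple z-score
rule, not a method prescribed in this column; the function name
flag_deviations, the threshold of 2.0, and the transaction amounts are all
assumptions made for the example.

```python
from statistics import mean, pstdev


def flag_deviations(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold.

    The threshold of 2.0 is an assumed cutoff, not a recommended standard.
    """
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values are identical, so nothing deviates
    return [v for v in values if abs(v - center) / spread > threshold]


# Hypothetical daily transaction amounts with one unusual record.
amounts = [102, 98, 101, 99, 100, 97, 103, 250]
outliers = flag_deviations(amounts)
```

Whether a flagged record is actually "of interest" is a business judgment; the
code only narrows down where the organization might look.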
In cases when the intent is one of pure research, the starting point is
often an analytical technique itself. For example, a "better" algorithm for
detecting patterns may be invented in the course of researching computational
methods. This is then typically tested on artificial data sets that contain
intentionally hidden patterns. Testing the algorithm's performance is based
on simulations in which the characteristics of such data are systematically
varied, for example by adding more records to test scalability. At
this point, data characteristics have thus become well-defined. If the
algorithm's performance appears promising, hypothetical business
scenarios/objectives are created to demonstrate practical applications for
using the algorithm beyond a research exercise. In a pure research
environment, defining the analytical technique, then generating data, and then
considering business objectives is a common sequence.
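The research sequence just described can be sketched roughly in Python under
stated assumptions. The item names, the planted pair ("A", "B"), the rates,
and the pair-counting routine are all hypothetical stand-ins: the routine
plays the role of the "algorithm under test", and the record count is the
data characteristic being varied.

```python
import random
import time
from collections import Counter
from itertools import combinations

NOISE_ITEMS = ["C", "D", "E", "F"]  # hypothetical background items


def generate_records(n, rng, pattern_rate=0.6, noise_rate=0.2):
    """Build n artificial records; the pair ("A", "B") is the hidden pattern."""
    records = []
    for _ in range(n):
        record = set()
        if rng.random() < pattern_rate:
            record.update(("A", "B"))
        record.update(i for i in NOISE_ITEMS if rng.random() < noise_rate)
        records.append(record)
    return records


def most_frequent_pair(records):
    """Stand-in for the algorithm under test: count co-occurring item pairs."""
    counts = Counter()
    for record in records:
        counts.update(combinations(sorted(record), 2))
    return counts.most_common(1)[0][0]


def run_scalability_trial(sizes, seed=0):
    """For each record count, check pattern recovery and time the algorithm."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        records = generate_records(n, rng)
        start = time.perf_counter()
        found = most_frequent_pair(records)
        elapsed = time.perf_counter() - start
        results.append((n, found == ("A", "B"), elapsed))
    return results


trial_results = run_scalability_trial([1_000, 5_000, 25_000])
```

Each entry of trial_results records the number of records, whether the
planted pattern was recovered, and the elapsed time, which is the kind of
evidence one would inspect before proposing business scenarios for a new
algorithm.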
So regardless of whether the goal of one's work in data mining is one of
pure research, or is geared completely to developing commercial applications,
it is necessary to have the three related aspects of "data", "objectives", and
"analytical methods" sufficiently well-defined so that the data mining efforts
succeed.
Ed Colet is the Acting Director of Research at Virtual Gold Inc.,
responsible for developing analytical methods for data mining and for
investigating human factors and usability issues of business intelligence
systems. At present, he is in the final stage of completing a doctoral
dissertation in the Cognition and Perception program at New York
University's Department of Psychology. Ed has also worked for IBM Research
at the T.J. Watson Research Center. At IBM, Ed was a member of the group
that developed Advanced Scout, the data mining application for NBA teams.
His research interests focus on statistical methods and human factors.
For more information, see www.virtualgold.com