
DATA, ANALYTICAL METHODS, AND BUSINESS OBJECTIVES
by Ed Colet

Data, Analytical Methods, and Business Objectives: Three aspects of data mining.

A successful data mining solution can be characterized as the application of effective algorithms to rich data sets in order to achieve well-defined business objectives. These three aspects (analytical methods, data, and business objectives) are interrelated. In this column, I address the relationship among these three aspects and how the characteristics of any one of them influence the others.

For a data mining application to have a strong chance of success, each of these three aspects must be well-defined. If one or more of them remains vaguely defined or poorly articulated, the application is less likely to succeed. An organization that has well-defined business objectives but lacks any data is obviously in no position to use data mining at all. Conversely, an organization that has stored large amounts of data, but leaves that data untouched by analytical tools because no clear business objective justifies an analysis, is also poorly positioned for a data mining solution to succeed. All three aspects must be sufficiently well-defined.

The three aspects are also interdependent: the defining characteristics of any one of them have implications for (or place constraints on) the other two. The following three scenarios, each starting from a different aspect, illustrate this.

If the intent is to provide a commercially viable data mining solution for a customer (e.g., an organization), it is usually effective to have the customer define a business objective first. Depending on the domain, examples might be to increase profit, reduce churn, or detect fraud. With a well-defined objective in hand, one can then assess the nature of the available data by asking several questions: Are data available? Do the data contain attributes relevant to the business objective? If not, can relevant attributes be derived from existing data? Should additional data collection mechanisms and procedures be developed? A clear understanding of the data's characteristics, in turn, determines which analytical techniques to consider. For example, if the data consist primarily of categorical values (i.e., non-numeric values), then algorithms designed for numeric data cannot be applied directly. By the same token, if numeric attributes are used only to indicate categories (e.g., 1: East, 2: West), then applying a numerically oriented technique such as computing a mean would be meaningless. Defining a business objective first, then assessing the data, and then applying an appropriate method is an example of a situation in which all three aspects are well-defined.
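
As a minimal illustration of the region-code example, the Python sketch below assumes a hypothetical region attribute coded as 1: East, 2: West, and 3: North (the North code and the sample values are invented for illustration). The mean of the codes can be computed, but it refers to no actual region; a frequency count, the mode, or a one-hot encoding treats the attribute as the categorical variable it really is.

```python
from collections import Counter
from statistics import mean

# Hypothetical coding scheme: the integers are labels, not quantities.
REGION_CODES = {1: "East", 2: "West", 3: "North"}

# Hypothetical sample of customer records, each storing one region code.
region_column = [1, 1, 2, 3, 1, 2]

# Numerically oriented summary: it runs without error but is meaningless,
# because 1.67 is not a region and the ordering 1 < 2 < 3 is arbitrary.
naive_mean = mean(region_column)

# Categorical summary: count how often each label occurs and take the mode.
counts = Counter(REGION_CODES[code] for code in region_column)
most_common_region, frequency = counts.most_common(1)[0]

# One-hot encoding: one 0/1 indicator per region, so numeric algorithms
# no longer treat the codes as ordered quantities.
labels = sorted(REGION_CODES.values())
one_hot = [[1 if REGION_CODES[code] == label else 0 for label in labels]
           for code in region_column]

print(f"mean of codes: {naive_mean:.2f}")  # prints 1.67
print(f"counts: {dict(counts)}")
print(f"mode: {most_common_region} ({frequency} records)")
print(f"one-hot columns {labels}: {one_hot}")
```

One-hot encoding is one common way to let numerically oriented algorithms consume a categorical attribute without imposing a spurious order on its values; whether it suits a particular algorithm is a separate judgment.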

One need not always start by articulating a business objective. If the intent is for an organization simply to explore the feasibility of data mining in general, the starting point may be the organization's data. In other words, the organization may start by asking, "What can be done with the data that has been collected and stored?" Depending on the data's characteristics, a variety of analytical techniques can then be demonstrated in terms of plausible business scenarios. For example, deviation detection methods can flag outliers in the data that may be of interest. Surprising associations among attribute values can be discovered with association rule analysis. Grouping similar database records relies on clustering analysis. Assigning records to groups defined by the values of one particular attribute calls for classification techniques. The organization can then evaluate the results of each technique in terms of plausible business objectives to determine whether a full-scale deployment of a data mining solution would be valuable. Thus, in this scenario, one proceeds by first defining the data characteristics, then the analytical techniques, and finally evaluating the results in terms of defined business objectives or scenarios.
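
Two of the exploratory techniques named above can be sketched in a few lines of Python. The sketch is illustrative only: the transaction amounts, the shopping baskets, the z-score threshold of 2.0, and the function names zscore_outliers and rule_stats are all hypothetical. Deviation detection is approximated here by a simple z-score test, which is only one of many possible outlier criteria. An association rule A → B is scored by its standard support (the fraction of records containing both A and B) and confidence (the fraction of records containing A that also contain B).

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Hypothetical helper: return values more than `threshold`
    population standard deviations away from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


def rule_stats(baskets, antecedent, consequent):
    """Hypothetical helper: support and confidence of antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    with_a = [b for b in baskets if a <= b]        # records containing A
    with_both = [b for b in with_a if c <= b]      # records containing A and B
    support = len(with_both) / len(baskets)
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return support, confidence


# Hypothetical transaction amounts; one is far out of line with the rest.
amounts = [52.0, 48.5, 50.2, 49.9, 51.3, 47.8, 50.6, 480.0]
print(zscore_outliers(amounts))  # prints [480.0]

# Hypothetical shopping baskets, one set of items per record.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
print(rule_stats(baskets, {"bread"}, {"butter"}))  # prints (0.6, 0.75)
```

On these baskets, four records contain bread and three of those also contain butter, which is where the support of 3/5 and confidence of 3/4 come from. Clustering and classification would need more machinery than fits here and are left out of the sketch.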

When the intent is pure research, the starting point is often an analytical technique itself. For example, a "better" algorithm for detecting patterns may be invented in the course of researching computational methods. Such an algorithm is then typically tested on artificial data sets that contain intentionally hidden patterns. Its performance is assessed through simulations in which the characteristics of the data are systematically varied, for example by adding more records to test scalability. At this point, the data characteristics have become well-defined. If the algorithm's performance appears promising, hypothetical business scenarios or objectives are created to demonstrate practical applications of the algorithm beyond a research exercise. In a pure research environment, defining the analytical technique, then generating data, and then considering business objectives is a common sequence.
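
The research workflow described above (plant a known pattern in artificial data, then vary one data characteristic such as the number of records) can be sketched as a small simulation harness. Everything in it is a hypothetical stand-in: make_synthetic plants a fixed rule (records with x above 0.8 are labelled 1, with a small fraction of labels flipped as noise), find_threshold is a deliberately simple single-cut search standing in for the "better" algorithm under test, and the record counts are arbitrary.

```python
import random
import time


def make_synthetic(n_records, threshold=0.8, noise=0.05, seed=0):
    """Hypothetical generator: (x, label) pairs with a planted rule,
    label = 1 iff x > threshold, with a fraction `noise` of labels flipped."""
    rng = random.Random(seed)  # fixed seed keeps each run reproducible
    data = []
    for _ in range(n_records):
        x = rng.random()
        label = int(x > threshold)
        if rng.random() < noise:
            label = 1 - label  # inject label noise
        data.append((x, label))
    return data


def find_threshold(data):
    """Hypothetical stand-in algorithm: the cut point that misclassifies
    the fewest records under the rule 'predict 1 iff x > cut'."""
    data = sorted(data)
    # Cut below every record: all predicted 1, so every 0-label is an error.
    errors = len(data) - sum(label for _, label in data)
    best_errors, best_cut = errors, 0.0
    for x, label in data:
        # Moving this record to the 'predict 0' side fixes it if its label
        # is 0 and breaks it if its label is 1.
        errors += 1 if label == 1 else -1
        if errors < best_errors:
            best_errors, best_cut = errors, x
    return best_cut


# Vary one data characteristic (record count) and watch accuracy and runtime.
for n in (1_000, 10_000, 100_000):
    data = make_synthetic(n)
    start = time.perf_counter()
    recovered = find_threshold(data)
    elapsed = time.perf_counter() - start
    print(f"{n:>7} records: recovered threshold {recovered:.3f} in {elapsed:.4f} s")
```

Because the planted threshold is known, the harness can check both whether the algorithm recovers the hidden pattern and how its runtime grows with the number of records; the exact timings depend on the machine.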

So whether one's work in data mining is pure research or geared entirely toward developing commercial applications, the three related aspects of "data," "objectives," and "analytical methods" must be sufficiently well-defined for the data mining effort to succeed.


Ed Colet is the Acting Director of Research at Virtual Gold Inc., responsible for developing analytical methods for data mining and for investigating human factors and usability issues of business intelligence systems. At present, he is in the final stage of completing a doctoral dissertation in the Cognition and Perception program at New York University's Department of Psychology. Ed has also worked for IBM Research at the T.J. Watson Research Center. At IBM, Ed was a member of the group that developed Advanced Scout, the data mining application for NBA teams. His research interests focus on statistical methods and human factors.

For more information, see www.virtualgold.com
