A PRIMER ON THE PROCESS OF DATA MINING
by John K. Thompson, Vice President Marketing, Magnify Inc.
Magnify, Inc., the Chicago-based data mining company, recently published a white paper titled "The Data Mining Process - A Primer". Intended as an easy-to-read overview for an executive and managerial audience, the paper describes the five phases of the data-mining process and introduces some of the latest concepts in data-mining, such as portable predictive models, predictive services, and closed-loop decision making. The following, excerpted from the introduction to the white paper, indicates some of the other issues it addresses:
The term 'data-mining' has become quite popular in the business world recently, and it is used to describe either of two distinct concepts.
In the first sense, data-mining refers to the mathematical approach used to discover patterns, associations, changes, and anomalies within a defined set of data. In the second, it refers to the entire process of data-mining: the set of activities that begins with the preparation of data for the application of data-mining algorithms and ends with an analysis of the resulting information. The white paper presents a detailed review of the steps involved in the data-mining process, and so uses the term in the second sense.
The data-mining process is quite similar to the process of data-warehousing, and the two, while not strictly dependent, are most certainly interrelated. Data-mining can be performed without data-warehousing, and the reverse is true as well, but most of the effort expended to build a data warehouse yields better leverage and return on investment if data-mining is performed on the data it contains.
In both disciplines, it is critically important to understand and follow a well-defined process in order to achieve the goals of a given project. To the casual observer, successful data-mining projects seem to follow the same process as successful data-warehousing projects. In Magnify's view, however, there are distinctions between the two that are subtle, yet critical. A second objective of the paper is to clearly delineate the differences between data-mining and data-warehousing.
The importance of the data-mining process is well recognized and widely documented by industry experts. In a typical corporate setting, however, several rounds of hands-on data-mining projects are necessary before corporate IT staff feel comfortable with the operational components of a data-mining system. Another objective of the paper, then, is to impart a clear understanding of what is required, from an operational framework perspective, to build a solid foundation that supports a data-mining project.
(If you would like to read more, a copy of the entire white paper can be obtained by contacting John Thompson at jkt@magnify.com)