SOME THOUGHTS ON THE CURRENT STATE
OF DATA MINING SOFTWARE APPLICATIONS: PART I
by Kurt Thearling
As a former developer of data mining software, I can understand how difficult it is to create applications that are relevant to business users. Much of the data mining community comes from an academic background and has focused on the algorithms buried deep in the bowels of the technology. But algorithms are not what business users care about. Over the past few years, the technology of data mining has moved from the research lab to Fortune 500 companies, requiring a significant change in focus. The core algorithms are now a small part of the overall application: perhaps 10% of a larger part which is itself only 10% of the whole, or roughly 1% in all.
That being said, the focus of this article is to point out some areas in the remaining 99% that need to be improved upon. Here's my current top ten list:
When someone in marketing needs to have a database scored, they usually have to call someone in IT and cross their fingers that it will be done correctly. If the marketing campaigns that rely on the scores are run on a continuous (daily) basis, this means a lot of phone calls and a lot of manual processing. Instead, the process that makes use of the scores should drive the model scoring. Scoring should be integrated with the driving applications via published APIs (a standard would be nice, but it's probably too soon for this) and run-time library scoring engines. Automation will reduce processing time, allow the most up-to-date data to be used, and reduce errors.
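To make the idea of a run-time library scoring engine a little more concrete, here is a minimal Python sketch. It assumes the model is a logistic regression; the class `ScoringEngine`, the function `select_campaign_targets`, and the field names are hypothetical illustrations, not any product's API. The point is that the campaign application loads the model and scores the current records itself on each run, with no hand-off to IT.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoringEngine:
    """Hypothetical run-time scoring library: a logistic-regression model
    packaged so that any application can load it and call score() directly."""
    intercept: float
    weights: Dict[str, float]

    def score(self, record: Dict[str, float]) -> float:
        # Linear combination of the fields the model knows about;
        # fields missing from the record count as 0.
        z = self.intercept + sum(
            w * record.get(field, 0.0) for field, w in self.weights.items()
        )
        # Logistic link turns the linear score into a probability in (0, 1).
        return 1.0 / (1.0 + math.exp(-z))


def select_campaign_targets(
    customers: List[Dict[str, float]], engine: ScoringEngine, threshold: float
) -> List[int]:
    """The campaign application drives scoring itself: each run scores the
    current customer records and keeps those at or above the threshold."""
    return [
        int(c["customer_id"])
        for c in customers
        if engine.score(c) >= threshold
    ]


# Illustrative model and today's (made-up) customer records.
engine = ScoringEngine(
    intercept=-2.0, weights={"recent_purchases": 0.8, "tenure_years": 0.1}
)
today = [
    {"customer_id": 1, "recent_purchases": 4, "tenure_years": 2},
    {"customer_id": 2, "recent_purchases": 0, "tenure_years": 5},
]
print(select_campaign_targets(today, engine, threshold=0.5))  # prints [1]
```

Because the scoring logic lives in a library the campaign system calls, a daily campaign simply re-scores whatever data is current at run time; swapping in a retrained model means replacing the engine's parameters, not re-running a manual process.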
3) Exporting Models to Other Applications: This is really an extension of #2. Once a model has been produced, other applications (especially the applications that will drive the scoring process) need to know that it exists. Technologies such as OLE automation can make this process relatively straightforward. It's just a matter of adding an "export" button to the data mining user interface and providing a means for external applications to extend the export functionality. Exporting models will then close the loop between data mining and the applications that need to use the results (scores). Besides exporting the model itself, it would be useful to include summary statistics and other high-level information about the model so that the external application can incorporate this information into its own process.
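As an illustration of exporting a model together with its summary information, the following Python sketch writes the package as JSON rather than through OLE automation; JSON is simply a swapped-in, easy-to-read interchange format here. The function names `export_model` and `import_model`, the field names, and the example values are all hypothetical.

```python
import json
from typing import Any, Dict


def export_model(name: str, intercept: float, weights: Dict[str, float],
                 summary: Dict[str, Any]) -> str:
    """Serialize a model plus high-level summary information into a JSON
    document that an external application can read. The field names are
    illustrative, not a standard format."""
    package = {
        "model_name": name,
        "model_type": "logistic_regression",
        "parameters": {"intercept": intercept, "weights": weights},
        "summary": summary,
    }
    return json.dumps(package, indent=2, sort_keys=True)


def import_model(document: str) -> Dict[str, Any]:
    """What a consuming application might do: parse the package and check
    that the pieces it needs are present before using it."""
    package = json.loads(document)
    for key in ("model_name", "parameters", "summary"):
        if key not in package:
            raise ValueError(f"exported model is missing '{key}'")
    return package


# Hypothetical model and summary statistics, for illustration only.
exported = export_model(
    "cross_sell_v1",
    -2.0,
    {"recent_purchases": 0.8, "tenure_years": 0.1},
    {"training_records": 50000, "response_rate": 0.04},
)
package = import_model(exported)
print(package["summary"]["response_rate"])  # prints 0.04
```

Shipping the summary alongside the parameters lets the receiving application, for example, compare a campaign's actual response against the rate the model was built on without going back to the data mining tool.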
4) Business Templates: Solving a business problem is much more valuable to a user than solving a statistical modeling problem. This means that a cross-selling-specific application is more valuable than a general modeling tool that can create cross-selling models. It might simply be a matter of changing terminology and making a few modifications to the user interface, but those changes are important. From the users' perspective, it means that they don't have to stretch very far to map their current understanding of the problem onto the software they are using.
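One way to picture a business template is as a thin translation layer: the user answers questions in business terms, and the template turns those answers into the specification a general modeling tool expects. The Python sketch below assumes a hypothetical `ModelingSpec` and `cross_sell_template`, and assumes that product ownership is stored in fields named `owns_<product>`; none of this comes from a real product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelingSpec:
    """What a general-purpose modeling tool asks for, in statistical terms."""
    target_field: str
    predictor_fields: List[str]
    population_filter: str


def cross_sell_template(current_product: str, offer_product: str,
                        customer_attributes: List[str]) -> ModelingSpec:
    """Hypothetical cross-selling template: the user answers in business
    terms ("customers who own X; will they buy Y?") and the template
    translates the answer into the generic modeling specification."""
    target = f"owns_{offer_product}"
    return ModelingSpec(
        # The "response" the model predicts is ownership of the offered product.
        target_field=target,
        # Predict from customer attributes, never from the target itself.
        predictor_fields=[a for a in customer_attributes if a != target],
        # Only customers who already have the current product are candidates.
        population_filter=f"owns_{current_product} == 1",
    )


spec = cross_sell_template("checking", "savings", ["age", "balance", "owns_savings"])
print(spec.target_field)  # prints owns_savings
```

The user never sees the words "target field" or "population filter"; the modeling engine underneath is unchanged, and only the vocabulary at the surface is adapted to the business problem.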
---
The concluding segment of this commentary will be published in next week's DS*.