
THE MATURATION OF DATA MINING
by Ed Colet


Data mining, with its promise of knowledge discovery through the automated detection of hidden patterns in data, is a technology undergoing maturation. In the early stages of any technology, technical issues matter more than using the technology to solve problems, and basic research matters more than applied research. Ways to apply the technology may not even be known, and focusing on them too early can compromise scientific progress. A mature technology, on the other hand, is one in which application matters more than advances in the underlying principles. This is not to say that basic research isn't conducted in mature technologies (it still is), but the field is characterized by evolutionary rather than revolutionary changes. So where is data mining today?

Views on the maturity of data mining differ depending on whom you ask. Some folks believe that data mining is still a young technology in which addressing technical issues is most important. To researchers, developing the "right" algorithm, coming up with the best clustering technique, or finding the optimal optimization routine is more significant than its application. The feeling is that technically sound techniques will eventually find their way toward solving problems. Other folks (mostly applied practitioners and consultants) believe that the underlying techniques and technology aren't as important as their applications. A neural network is a neural network; whether it has an extra hidden layer or a different architecture doesn't really matter. What's important, for example, is whether it can be used to classify credit-worthy loan applicants.

In our view, data mining is a young but rapidly maturing technology. For data mining to fully mature, three stages must occur in sequence: a solid foundation in basic technology, the presence of compelling applications and solutions, and a useful infrastructure for developing such solutions. Seeing where data mining stands with respect to these stages can inform a decision about whether and how to use it in one's business.

Three conditions for a maturing technology:

One prerequisite for a successful technology is a firm basis in underlying science. I believe that this exists in the data mining field today. It has deep roots in computer science (AI, databases, etc.) and mathematics (statistics). Early scientific papers can be characterized as contests between "competing" algorithms. Today, success in the data mining arena may no longer depend on the best algorithm or the best technology, but on the best uses of technology. In the early stage of data mining, the thought was that the better algorithm could solve basically any problem. This was somewhat dangerous in that it led to a situation where "if you have a hammer, everything looks like a nail." In those early days there was also concern about the hype associated with data mining and the problems it promised to solve. We seem to have moved past this period and on to the next stage.

A second condition for a maturing technology is the presence of compelling applications. I also believe that this exists today. There are numerous success stories about the benefits of data mining, in domains ranging from retailing to sports. The point is that, given competing technologies, success can be determined by the applications and solutions available. For example, the greater availability of video content in VHS rather than Betamax format led to the adoption of VHS as the standard. In operating systems, the lack of applications for OS/2 proved problematic despite its arguably superior underlying technology. Here at Virtual Gold, some of our work has been in providing compelling data mining applications for the sports domain.

The third condition, and one necessary for sustained growth, is an efficient way to develop and deploy such applications. This means that an infrastructure for such development must be in place. In software development, this includes developer kits and, more recently, technology such as our patent-pending VirtualMiner Framework.

In the current state of data mining, this third condition is lacking. The consequence is that adopting data mining technology in corporate environments can be complex and costly. Application development still relies on generic programming tools that need to be customized to fit vertical markets. Highly specialized programming skill is required, and this translates to high costs. After development, high levels of technical and/or analytical sophistication are often required to use the applications effectively. Also, these highly trained analysts are often not the same people responsible for making decisions based on the discovered patterns, which can lead to delays between analysis and action.

Having an infrastructure or framework that eliminates these difficulties is a way to hasten the maturity of the field, and it represents an evolutionary (even revolutionary) step in the maturation process. By providing developers with tools to build customized query interfaces that are cross-platform, Web-enabled, and able to take advantage of existing database query applications, our patent-pending VirtualMiner Framework will make it easier to develop compelling applications built on top of existing technologies.

Our recommendation to those considering adopting data mining is that the primary concern should be the ability to build solutions rapidly, rather than the evaluation of the underlying pattern detection algorithms.

---

For more information, see http://www.virtualgold.com.

