LINKING THE DATA MINING AND PREDICTIVE SERVICES ENVIRONMENTS
by John K. Thompson
Previously, I have described how data mining systems are being decomposed into more granular components of functionality. In this article, I will describe the initial efforts to build an open-standards-based technology that enables the packaging of knowledge gained during data mining into portable structures. Implied by the description "portable structures" is the ability to transport these models between firms and locations. The transportation of models and ensembles of models will be facilitated by existing networking capabilities such as the Internet, Intranets, Extranets, and other communication infrastructures.
Currently, there is an ongoing effort to build a group of companies in the United States and Europe that have a common interest in establishing a standard for sharing models between vendor environments. The participants currently engaged in conversation range from data mining software and service firms and hardware vendors to end users and leading experts in the field of data mining and advanced analytical applications. The work is on track to announce the group's goals, operating structure, membership, and delivery time frames before the end of 1998. I would prefer to be more specific on time frames, but I will freely admit that building consensus among such a diverse group of very large and very small firms is more challenging than building software to spec and on time, which is quite difficult in and of itself.
The portability of predictive models is to be accomplished through the establishment of a new markup language named Portable Predictive Modeling Language (PMML). PMML will be yet another markup language in the rapidly expanding collection of languages being defined in Extensible Markup Language (XML). The World Wide Web Consortium (W3C) has recently adopted and announced XML as the standard metalanguage for defining industry- and application-specific markup languages. Some of the proposed languages that have been announced are Precision Graphics Markup Language (PGML), Mathematical Markup Language (MathML), and Chemical Markup Language (CML). For a concise overview of the newly proposed XML-based markup languages, refer to the article by Jason Levitt in the May 25, 1998 issue of INFORMATIONWEEK at http://www.informationweek.com.
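To make the idea of a portable model concrete, here is a minimal sketch in Python of how a simple scoring model might be written out as XML in one environment and read back in another. The element and attribute names (PortableModel, Intercept, Coefficient), the function names, and the sample model are all hypothetical and chosen only for illustration. They are not the PMML vocabulary, which the group has yet to define.

```python
import xml.etree.ElementTree as ET


def export_model(name, intercept, coefficients):
    """Serialize a linear scoring model to an XML string.

    The tag names used here are hypothetical, not a published schema.
    """
    root = ET.Element("PortableModel", {"name": name, "type": "linear"})
    ET.SubElement(root, "Intercept").text = repr(intercept)
    for field, weight in coefficients.items():
        ET.SubElement(root, "Coefficient", {"field": field, "weight": repr(weight)})
    return ET.tostring(root, encoding="unicode")


def import_model(xml_text):
    """Rebuild a scoring function from the XML produced by export_model."""
    root = ET.fromstring(xml_text)
    intercept = float(root.find("Intercept").text)
    coefficients = {
        c.get("field"): float(c.get("weight")) for c in root.findall("Coefficient")
    }

    def score(record):
        # Linear score: intercept plus the weighted sum of the input fields.
        return intercept + sum(w * record[f] for f, w in coefficients.items())

    return score


# "Vendor A" exports a model it has built (illustrative numbers only).
document = export_model(
    "churn_score", 0.5, {"tenure_months": -0.25, "support_calls": 2.0}
)

# "Vendor B" receives the text over the network and scores a record with it.
score = import_model(document)
result = score({"tenure_months": 12, "support_calls": 3})
print(result)
```

The point of the sketch is that the receiving side needs only the XML text and an agreed vocabulary, not the software that built the model.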
The value proposition for PMML is based on these premises:
The name of the group is still being debated, but the Data Mining Group (DMG) seems to be the front runner. The group's ultimate goal is to formulate, promote, and implement a standard language that moves the data mining industry away from proprietary stand-alone systems toward a federated systems approach. The federated approach allows each vendor to build software and systems for data mining and predictive services on technological and philosophical premises that are defensible and provide true differentiation, while allowing the firms using the systems, the end users, to operate in a way that provides interoperability and leverage across their investments in the base-level technology and the value-added applications that they either build or buy.
As my colleagues and I move this initiative forward, I will provide updates on the progress of the group and of the technology. I am certain that the latter will be of more interest than the former. We are working hard to bring a robust standard into the industry while trying to minimize the political agendas and the non-productive activities that we have seen in standards groups that have gone before us.
As always, your input is welcome.
---
John Thompson, Vice President - Marketing, Magnify, Inc. You can reach me at jkt@magnify.com