
LINKING THE DATA MINING AND PREDICTIVE SERVICES ENVIRONMENTS
by John K. Thompson


Previously, I described how data mining systems are being decomposed into more granular functional components. In this article, I describe the initial efforts to build an open, standards-based technology for packaging the knowledge gained during data mining into portable structures. The term "portable structures" implies the ability to transport these models between firms and locations. Existing networking infrastructure, such as the Internet, intranets, extranets, and other communication channels, will carry these models and ensembles of models.

An effort is currently under way to form a group of companies in the United States and Europe that share an interest in establishing a standard for sharing models between vendor environments. The companies involved in the discussions so far include data mining software and service firms, hardware vendors, and end users, as well as leading experts in data mining and advanced analytical applications. The work is on track for the group to announce its goals, operating structure, membership, and delivery time frames before the end of 1998. I would prefer to be more specific about time frames, but I freely admit that building consensus among such a diverse group of very large and very small firms is more challenging than building software to spec and on time, which is difficult enough in itself.

Model portability is to be achieved by establishing a new markup language, the Predictive Modeling Markup Language (PMML). PMML will join the rapidly growing family of markup languages defined with the Extensible Markup Language (XML). The World Wide Web Consortium (W3C) recently adopted XML as the standard metalanguage for defining industry- and application-specific markup languages. Proposed languages announced so far include the Precision Graphics Markup Language (PGML), the Mathematical Markup Language (MathML), and the Chemical Markup Language (CML). For a concise overview of these proposed XML-based languages, see Jason Levitt's article in the May 25, 1998 issue of INFORMATIONWEEK (http://www.informationweek.com).
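To make the idea of an XML-based model format concrete, here is a minimal Python sketch that packages a logistic regression model as XML text using the standard library's xml.etree.ElementTree module. The element and attribute names (DataDictionary, DataField, RegressionModel, RegressionTable, NumericPredictor, normalizationMethod) are borrowed from later published versions of PMML and simplified; treat them as an assumption for illustration rather than the specification under discussion here. The output is not a complete or validated PMML document.

```python
import xml.etree.ElementTree as ET


def export_model(intercept, coefficients):
    """Serialize a logistic regression model to PMML-style XML text.

    Tag and attribute names are assumptions modeled on later PMML
    versions and simplified for illustration; not a validated document.
    """
    root = ET.Element("PMML")

    # Declare each input variable the model expects.
    dictionary = ET.SubElement(root, "DataDictionary")
    for name in coefficients:
        ET.SubElement(dictionary, "DataField", name=name, optype="continuous")

    # "logit" tells a consumer to apply the logistic transform to the sum.
    model = ET.SubElement(root, "RegressionModel", normalizationMethod="logit")
    table = ET.SubElement(model, "RegressionTable", intercept=str(float(intercept)))
    for name, coef in coefficients.items():
        ET.SubElement(table, "NumericPredictor", name=name, coefficient=str(float(coef)))

    # Plain text: can be mailed, posted, or copied between vendor systems.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical credit model with two input fields.
    print(export_model(-2.0, {"income": 0.5, "debt_ratio": 1.5}))
```

Because the result is ordinary text, any system with an XML parser can read it back, which is the property that makes such a structure portable across firms and vendors.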

The value proposition for PMML is based on these premises:

  1. Firms implementing data mining applications and solutions will want to share models within their organizations and with other firms involved in related business practices (e.g., insurance firms, financial services providers, individual agents, and related companies).

  2. As market pressure reduces the price of data mining applications and systems, end user firms will implement multiple applications and systems, from multiple vendors.

  3. As data mining systems are decomposed, deployment environments and data mining systems will be purchased from different vendors. For example, a consumer packaged goods (CPG) company might buy statistical modeling tools from SPSS, export the resulting models to PMML-based structures, and deploy them in Magnify's predictive services engine for production use.
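On the deployment side of such an arrangement, the production engine only has to read the model document and apply it to incoming records. The Python sketch below illustrates that step with the standard library's XML parser. It assumes simplified PMML-style element names borrowed from later PMML versions; the model document, field names, and coefficient values are hypothetical, and neither vendor's actual export format or engine interface is represented.

```python
import math
import xml.etree.ElementTree as ET

# A model document as it might arrive from the modeling tool.
# Field names and values are hypothetical.
MODEL_XML = """<PMML>
  <RegressionModel normalizationMethod="logit">
    <RegressionTable intercept="-2.0">
      <NumericPredictor name="income" coefficient="0.5"/>
      <NumericPredictor name="debt_ratio" coefficient="1.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def load_model(xml_text):
    """Extract intercept, coefficients, and normalization from the document."""
    root = ET.fromstring(xml_text)
    model = root.find("RegressionModel")
    table = model.find("RegressionTable")
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("NumericPredictor")
    }
    return intercept, coefficients, model.get("normalizationMethod", "none")


def score(model, record):
    """Apply the loaded model to one record (a dict of field values)."""
    intercept, coefficients, normalization = model
    linear = intercept + sum(c * record[name] for name, c in coefficients.items())
    if normalization == "logit":
        # Logistic transform maps the linear predictor to a probability.
        return 1.0 / (1.0 + math.exp(-linear))
    return linear


if __name__ == "__main__":
    model = load_model(MODEL_XML)
    print(score(model, {"income": 2.0, "debt_ratio": 1.0}))
```

The scoring code never needs to know which tool built the model; it depends only on the agreed document structure, which is the point of a shared standard.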

  4. As the PMML standard matures and expands, models built with different algorithms can be merged into hybrid models that provide more accurate scores and profiles in real time. For example, a Classification and Regression Tree (CART) model of consumer credit risk could be merged with a logistic regression model of bankruptcy to produce a composite score estimating the probability that a segment of the client base, or an individual client, will default on its financial obligations.
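The article does not specify how merged models would combine their outputs. One simple possibility, assumed here purely for illustration, is a weighted average of the default probabilities produced by each model. In the Python sketch below, the tree splits, leaf probabilities, regression coefficients, field names, and the default weight of 0.5 are all hypothetical.

```python
import math


def credit_risk_tree(client):
    """Hypothetical CART-style tree; each leaf holds a default probability."""
    if client["missed_payments"] > 2:
        return 0.60 if client["debt_ratio"] > 0.5 else 0.35
    return 0.15 if client["debt_ratio"] > 0.5 else 0.05


def bankruptcy_logit(client):
    """Hypothetical logistic regression for the probability of bankruptcy."""
    linear = -3.0 + 2.0 * client["debt_ratio"] + 0.4 * client["missed_payments"]
    return 1.0 / (1.0 + math.exp(-linear))


def composite_score(client, tree_weight=0.5):
    """Blend both model outputs for one client with a weighted average."""
    if not 0.0 <= tree_weight <= 1.0:
        raise ValueError("tree_weight must be between 0 and 1")
    return (tree_weight * credit_risk_tree(client)
            + (1.0 - tree_weight) * bankruptcy_logit(client))


def segment_score(clients, tree_weight=0.5):
    """Average composite score across a segment of the client base."""
    return sum(composite_score(c, tree_weight) for c in clients) / len(clients)


if __name__ == "__main__":
    segment = [
        {"missed_payments": 0, "debt_ratio": 1.5},
        {"missed_payments": 5, "debt_ratio": 0.5},
    ]
    print(composite_score(segment[0]), segment_score(segment))
```

A weighted average keeps the composite on the same probability scale as its inputs; more sophisticated combinations (for example, a second-stage model trained on both outputs) are equally plausible and are not implied by this sketch.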

The name of the group is still being debated, although the Data Mining Group (DMG) appears to be the front runner. Its ultimate goal is to formulate, promote, and implement a standard language that moves the data mining industry away from proprietary, stand-alone systems toward a federated approach. In a federated approach, each vendor builds data mining and predictive services software on defensible technological and philosophical premises that provide true differentiation, while the firms using the systems, the end users, gain interoperability and leverage across their investments in both the base-level technology and the value-added applications they build or buy.

As my colleagues and I move this initiative forward, I will report on the progress of both the group and the technology. I am certain that the latter will be of more interest than the former. We are working hard to bring a robust standard to the industry while minimizing the political agendas and unproductive activities we have seen in earlier standards groups.

As always, your input is welcome.

---

John Thompson, Vice President - Marketing, Magnify, Inc. You can reach me at jkt@magnify.com

