Next Article Table of Contents Previous Article

ENTERPRISE INTEGRATION, DATA FEDERATION, AND THE DKMS
by Joseph M. Firestone, Ph.D.

Stonebraker's Enterprise Integration Solutions

In an eloquent article called "United We Stand," [1] Michael Stonebraker offered a viewpoint on the "islands of information" problem. He defined Enterprise Integration as ". . . the ability to read from and write to all of the applications and data sources across the enterprise." And he indicated: "Such integration supports unified views of information and lets you update synchronously across systems." Stonebraker goes on to distinguish four approaches to the "islands of information" problem: (1) application integration, an approach he associates with a number of commercial Enterprise Application Integration packages; (2) data warehousing systems; (3) messaging systems; and (4) data federation systems.

The application integration approach implements integration by attaching a layer of "glue" providing an interface from an external integration system to each application being integrated. Stonebraker faults commercial EAI systems for their: (1) failure to provide a unified view of data across databases; (2) inability to handle "conceptual mismatches; and (3) difficulties in scaling to the enterprise level.

Stonebraker views data warehousing as providing integration of applications at the data level. The integration is physical in that data must be extracted, cleansed, transformed and moved from their original data stores to data warehouses or data marts. He lists the following drawbacks of the data warehousing approach: (1) stale data, only periodically refreshed; (2) inflexibility in analysis due to unavailability of data not in a data warehouse or data mart; (3) copy costs providing a barrier to integration and therefore scalability in certain instances; and (4) overloaded resources because of the strain on the system caused by retail transactional histories.

Messaging systems connect applications by providing an information bus through which updates are published to subscribers. The drawbacks of messaging systems, according to Stonebraker are: (1) subscribers can integrate data only if the data owner has published it onto the information bus, (2) scalability problems, because the bandwidth of the information bus is too narrow to publish all enterprise data, and (3) difficulties in working with dynamic environments.

Data federation performs integration while leaving the source data in place. Multiple physical databases are integrated into a single logical one, and independent enterprise systems produce integrated global behavior through economic forces. Data federation systems retain local control while scaling to hundreds of machines. They also support a global view of enterprise data resources, dynamic load balancing across system resources, and adapt and adjust query execution accordingly.

After stating the four approaches Stonebraker makes three important points:

"Data federation systems are coming. The only way to integrate realtime data in many independent system is to federate the systems. . ." "Data Transformation is a required feature . . ." , since "there are few semantically identical data sets. . . In a data federation system, conversion is done as the end user accesses data." "You can achieve scalability to the enterprise only by logical integration." Messaging and data warehousing systems create physical integration in specific data stores. "Restricting integration to a single physical machine will prohibit enterprise level scalability. Only logical integration over multiple machines offers the possibility of enterprise scalability." And he concludes that an enterprise must decide when to perform application integration and when to perform data integration. And further, that application integration should be used "when performing update integration for small numbers of existing applications; while data integration should be used "when you need unified views or are integrating many systems."

If one does choose the data integration route, then the critical choices are between physical and logical integration, and between aged data and realtime data. Physical integration of aged data leads to data warehouses and data marts, while physical integration of realtime data leads to the Operational Data Store and messaging systems. Logical integration of either aged or realtime data, on the other hand, leads to data federation systems. Logical integration is necessary if it is impractical to move the data to a single location.

Finally, Stonebraker believes that there are powerful market trends moving away from data warehousing and toward data federation. One such trend is marked by business conditions "demanding realtime information, which warehouses can't provide." The second trend is the increasing need of businesses to respond to changing business conditions and external events. This need creates a corresponding need for IT solutions that can produce Dynamic Enterprise integration (DEI) (my term, rather than Stonebraker's), rather than static integration. The combination of these trends suggests a move toward data federation and away from data warehousing, because only data federation can produce the necessary access to both integrated realtime data about rapidly changing conditions, and integrated aged data providing the historical and baseline information, needed for survival and competitive advantage.

Stonebraker's treatment of enterprise integration is one of those possibly seminal articles providing a philosophy that can quickly grab hold of the IT industry and provide direction for a major cycle of activity. As such, it deserves careful examination from readers and critics alike. The issues he raises need a full airing from as many points of view as possible, before we quickly march down the road he has laid before us. The rest of this paper will try to provide this airing and also an alternative point of view -- the point of view of Distributed Knowledge Management System-based (DKMS-based) integration.

Natural and Artificial Systems Integration

Enterprise integration is not synonymous with either integrating data or integrating applications, as Stonebraker's viewpoint suggests. Enterprise integration refers to integrating a natural system, a social system, to be more precise; and not to integrating only one subsystem of this broader system -- its artificial computing system.

When we use the unqualified general term loosely, and apply it to artificial systems integration, rather than natural systems integration, we gloss over the fact that there isn't universal agreement about what we mean in this more specific context. We may all agree that we really don't mean enterprise integration as that term is applied to social systems. We also may agree that we all have in mind integration of certain aspects of the computing system of enterprises. But that is where our agreement ends.

Stonebraker distinguishes between:

Enterprise data integration and enterprise application integration, and also between physical integration and logical integration within the enterprise data integration category. But one can also distinguish:

Enterprise artificial information integration, and enterprise artificial knowledge integration, as major categories of approaches to enterprise artificial systems integration. One's view of what constitutes an adequate enterprise integration solution will depend on which of these types of integration one thinks is essential to the enterprise's ability to compete and adapt.

Stonebraker's view of enterprise artificial systems integration is also too restricted in its development of the application integration category. His discussion of data integration is relatively rich. His discussion of application integration is perfunctory, by comparison.

And, considering the relatively short-shrift he gives application integration, it is not unfair to say that his discussion is biased towards a data integration approach.

This is certainly the type of approach that has been most successful in physically integrating data in recent years. But you can't arrive at a fair assessment of its effectiveness for enterprise artificial systems integration through an analysis that either excludes its main competitors, or provides them with less than comprehensive consideration. Especially since the focus of our integration activity is now shifting from physical integration, which favors data-centric approaches, to logical integration, which does not necessarily favor data.

Enterprise Application Integration (EAI)

Stonebraker criticizes EAI for focusing on synchronized updates across applications and not providing a unified view of data across the enterprise. He also faults it because each application has its own version of some key concept such as "customer" or "order," and he thinks that reconciling such "conceptual mismatches" is much more difficult than integrating the records in an underlying database. Finally, he thinks that the scalability obstacle inherent in integrating 100 different applications by creating 100 specific adapters connecting the applications to the integrative "glue" of an EAI system is insurmountable. These criticisms of EAI are hard to sustain without far more detailed evidence and benchmarking results than Stonebraker presents in his article.

First, certain EAI packages such as Vitria Technology's BusinessWare [2] and Template Software's Enterprise Integration Template (EIT) [3] do provide a unified view of the enterprise. EIT provides a unified view of objects, data, and methods across the enterprise. Vitria provides a unified view of business processes and of data and methods as they relate to processes.

Second, certain EAI packages have the capability to handle conceptual mismatches. They do so by using an object model to provide multiple interpretations of the same data. Again, Template's EIT provides a good example of an existing product that uses a semantic object model to resolve ambiguities across enterprise stovepipes.

Third, Stonebraker's scalability criticism needs to be documented with much more evidence before it can be uncritically accepted. All of the major EAI vendors claim scalability, and many use federation architectures containing distributed application servers to back that claim.

NEON [4] and Vitria Technology, two of the examples cited by Stonebraker, claim scalability through distributed processing and broadly-based connectivity to varieties of data sources. Template, meanwhile, offers as an integrative mechanism a virtual, in-memory, cached, self-reflexive object model, resident in distributed application servers, along with connectivity to data stores and applications of all types.

The above is not to say that Stonebraker is necessarily incorrect in all his claims about the application integration approach. But he seems clearly incorrect in claiming that practitioners of the approach don't generally provide a unified view of the enterprise, and it will take a lot more proof than he presents to support his other two criticisms.

The Data Federation Approach

Stonebraker's various comments on the data federation approach also contain some questionable arguments and conclusions. First, "data federation systems are coming." And it's true that federation is the way to go in enterprise artificial systems integration. But federation is not "the only way to integrate realtime data in many independent systems," if by this Stonebraker means federating data stores alone. Realtime data can also be integrated by federated systems of databases and application servers through an object layer, as has been amply demonstrated by various EAI vendors.

Second, and while "data transformation is a required feature . . ." , since "there are few semantically identical data sets. . .", it's also true that data transformation is not restricted to data integration approaches. If it is true that "in a data federation system, conversion is done as the end user accesses data," it is also true that in-place, realtime conversion can occur in any of the other major integration approaches, provided only that they employ both federated servers and an object layer for performing integration.

Third, and while "you can achieve scalability to the enterprise only by logical integration," I can't agree with the plain implication of the context of this remark, that a data federation approach is the only appropriate one. Logical integration is also achieved by a federated, object layer approach to integration, and that is the real competitor to Stonebraker's data federation approach, not data warehousing or messaging.

Fourth, the suggestion that application integration should be used "when performing update integration for small numbers of existing applications; while data integration should be used "when you need unified views or are integrating many systems," is a direct consequence of the view and that only logical integration through federated data systems is scalable. But as I've indicated earlier, this view is questionable, and is in no way supported by Stonebraker's brief remarks on the subject.

Fifth, in stating that only data federation can produce the necessary access to both integrated realtime data about rapidly changing conditions, and integrated aged data providing the historical and baseline information needed for survival and competitive advantage, Stonebraker again draws a conclusion that is dependent on his ill-supported critique of application integration and his refusal to consider other artificial systems integration approaches. He believes that only the federated data approach can produce DEI.

But such a conclusion is unsupported by his arguments against application integration and belied by his failure to consider approaches beyond data and application integration. It is also contradicted by the consideration that DEI must be based on more than data integration alone. To remain competitive, businesses must have more than current and accurate data. They must also have well-confirmed models, validated business rules and procedures, and other classes of useful knowledge, including software applications. An artificial system that provides true dynamic integration must handle changes in all components of knowledge and information, and not simply in the data-based ones. It is for this reason that the data federation approach is by its very nature inadequate. It integrates only islands of data, it leaves most of the "islands of information" and knowledge of the enterprise still untouched.

Artificial Information Integration

Artificial Information Integration, like data integration, can be physical or logical. Since information is composed of both data and conceptual commitments, physical integration requires that both be extracted from their original disparate enterprise stores and through a process of object warehousing, very similar to data warehousing, be transformed and migrated to a physical object warehouse. Alternatively, multiple sources of information may be placed in communication through a messaging system in exactly the same way this occurs for data integration.

These physical integration alternatives have the same disadvantages for information integration that Stonebraker identified for data integration. And again, a better solution for the integration problem is provided by federating information sources, just as it was by federating data sources.

An information federation, like a data federation, doesn't migrate data anywhere, it manipulates data in place, according to the business rules specified in the system. It employs multiple distributed application servers along with multiple distributed data stores to maintain a unified view through a common object model. These application servers are called Active Information Managers (AIMs). [5] They provide process control and distribution services to the information federation to synchronize and adjust it to locally determined changes. Finally, like data federations, information federations employ broad ranging connectivity to read from and write to, the distributed data stores and applications of the enterprise.

An information federation, like a data federation, is scalable. It is scalable because: its connectivity to application servers allows it to access applications transparently and also because new information managers can be added as needed to distribute the processing and query load across broadly distributed resources.

Artificial Knowledge Integration

The Artificial Knowledge Integration approach is very much like the information integration approach. The difference is in the nature of the information being processed. In the knowledge federation, knowledge production application servers support formal analytical modeling and data mining, and apply validation criteria to the knowledge production process. Otherwise, the Artificial Knowledge Manager (AKM) [6] uses the same object model, process distribution and control services, and connectivity features used by AIMs. Knowledge federations share the same capability to provide a unified view, ability to transform data in place, and scalability, as data and information federations.

The DKMS Solution to the "Islands of Information" Problem

The application, information, and knowledge integration approaches to enterprise artificial systems integration can all be implemented using Distributed Knowledge Management Architecture (DKMA) [5] [7] [8] [9]. The Key Architectural Components of the DKMS are:

The Artificial Knowledge Manager (AKM) Stateless Application Servers Application Servers that maintain State Object/Data Stores Object Request Brokers (ORBs, e.g., CORBA, DCOM) Client Application Components. I've discussed these in some detail elsewhere [6], and for the most part refer you to that treatment. All the components in the system are integrated by the distributed AKM (or perhaps AIM, if the approach is one of information integration) application server. Here is a summary of its features.

The AKM provides Process Control Services, an Object Model of the DKMS, and connectivity to all enterprise information, data stores, and applications. In addition, the AKM provides connectivity to ORBs.

Abbreviations

Web = Web Information Server
Pub = Publication & Delivery Server
KDD = Knowledge Discovery in Databases/Data MiningServers
ETML = Extraction, Transformation, Migration and Loading
DDS = Dynamic Data Staging Area
DW = Data Warehouse
ODS = Operational Data Store
ERP = Enterprise Resource Planning
Query = Query and Reporting Server
CTS = Component Transaction Server
BPE = Business Process Engine
ROLAP = Relational On-Line Analytical Processing

Process Control Services include:

In-memory proactive object state management and synchronization across distributed objects and through intelligent agents; Component management and Workflow Management through intelligent agents Transactional multithreading; business rule management and processing; and metadata management. An In-memory Active Object Model/Persistent Object Store is characterized by:

Event-driven behavior; DKMS-wide model with shared representation; Declarative business rules; Caching along with partial instantiation of objects; A Persistent Object Store for the AKM; Reflexive Objects. Connectivity Services have:

Language APIs: C, C++, Java, CORBA, COM; Databases: Relational, ODBC, OODBMS, hierarchical, network, flat file, etc.; Wrapper connectivity for application software: custom, CORBA, or COM-based; and Applications connectivity whether applications are mainframe, server, or desktop-based. DKM architecture provides a unified view of the enterprise and handles semantic conversions "on-the-fly" through the AKM's object model. It's also scalable to the enterprise level due to (a) its distributed, federated structure, (b) its partial instantiation capability allowing it to load parts of objects into memory (c) its virtual in-memory cached object store, and (d) its broad connectivity to data stores and applications.

The DKMS, The Data Federation and The Enterprise Knowledge Portal

Enterprise Information and Knowledge Portals are subjects of great interest these days. [10][11]. They seem to represent an evolution and convergence of both data warehousing/business intelligence applications and unstructured content management applications. They also represent a very active area of current investment and new business activity.

It is interesting that the success of EIPs and EKPs will probably depend upon the extent to which their back-end is successful in integrating the diverse structured and unstructured data, information, and knowledge accessed through such portals. If integration in the face of change cannot produce data, information, and knowledge consistency, along with acceptable performance, within EIPs and EKPs, these applications will soon lose the favor of business. Viador and Information Advantage, two early entrants into the EIP field from data warehousing, have allied themselves with Stonebraker's Cohera to try to ensure consistency of data across the enterprise and performance across the enterprise. No doubt they realize the vital importance of such consistency to the long-term success of their products.

In other words, the key to the success of EIPs and EKPs is their performance and their capability in adjusting to change in the EIP or EKP application. In the end, the success of EKPs is dependent on architecture, and, in particular, on the success of architectures offering logical integration in providing performance and adaptability.

The issue of which type of federated architecture to use will be a critical issue in determining the success and future of EIP/EKP applications. I believe the approach we follow should not be a data integration approach, or even an information integration approach. I think, instead, it should be a knowledge integration approach.

Those who possess integrated knowledge, after all, have a competitive advantage in decision making over those who possess only integrated data or even integrated information. If this is correct, it means that EKP applications should implement DKM architecture and DKMS solutions rather than data federations.

DKMS Brief No. Nine

References

[1] Michael Stonebraker, "United We Stand," Intelligent Enterprise (April 20, 1999), 39-45.

[2] Vitria Technology at www.vitria.com.

[3] Template Software at www.template.com.

[4] New Era Of Networks at neonsoft.com.

[5] Joseph M. Firestone,"Architectural Evolution in Data Warehousing," available at www.dkms.com/White_Papers.htm

[6] Joseph M. Firestone," The Artificial Knowledge Manager Standard: A Strawman," available at www.dkms.com/White_Papers.htm

[7] Joseph M. Firestone, "DKMS Brief No. One: The Corporate Information Factory or The Corporate Knowledge Factory?" at www.dkms.com/White_Papers.htm.

[8] Joseph M. Firestone, "DKMS Brief No. Three: Software Agents in Distributed Knowledge Management Systems," at www.dkms.com/White_Papers.htm.

[9] Joseph M. Firestone, "DKMS Brief No. Four: Business Process Engines in Distributed Knowledge Management Systems," at www.dkms.com/White_Papers.htm.

[10] Christopher C. Shilakes and Julie Tylman, "Enterprise Information Portals," Merrill Lynch, 16 November, 1998

[11] Joseph M. Firestone, "DKMS Brief No. Eight: Enterprise Information Portals and Enterprise Knowledge Portals," at www.dkms.com/White_Papers.htm.


Biography

Joseph M. Firestone, Ph.D.
CEO, Chief Scientist
Executive Information Systems Inc (EIS)
703-461-8823, eisai@home.com

Joseph M. Firestone, Ph.D. is CEO and Chief Scientist of Executive Information Systems (EIS) Inc. Joe has varied experience in consulting, management, information technology, decision support, and social systems analysis. Currently, he focuses on product, methodology, architecture, and solutions development in Enterprise Information and knowledge Portals, where he performs Knowledge and knowledge management audits, training, and facilitative systems planning, requirements capture, analysis, and design. Joe was the first to define and specify the Enterprise Knowledge Portal Concept. He is widely published in the areas of Decision Support (especially Enterprise Information and Knowledge Portals, Data Warehouses/Data Marts, and Data Mining), and Knowledge Management, and has recently completed a full-length industry report entitled "Approaching Enterprise Information Portals." Joe is a founding member of the Knowledge Management Consortium International (KMCI), Editor of the new KMCI Journal, Chairperson of the KMCI’s Artificial Knowledge Management Systems SIG, a member of its Executive Committee, its Metaprise Project, and the KMCI Institute Governing Council. Joe is a frequent speaker at national conferences on KM and Portals. He is also developer of the Web site www.dkms.com, one of the most widely visited Web sites in the Portal and KM fields. DKMS.com has now reached a visitation rate of 83,000 visits annually.

Executive Information Systems Inc

The Executive Information Systems (EIS) Enterprise Knowledge Portal (EKP) is the only portal solution that provides the assurance that enterprise decision making will be based on validated knowledge. EIS’s EKP lets enterprises avoid the risk involved in Enterprise Information Portals which claim to offer increases in competitive advantage, ROI, speed of innovation, productivity, effectiveness and profitability, but have as a central vulnerability the fact that they are only capable of managing data and information, not knowledge.

Enterprises using EIP-based solutions when they could be using EKP-based ones, are gambling that unvalidated information can produce promised EIP benefits. The central value proposition of the EIS EKP is that it replaces gambling on unvalidated information with knowledge-based decision making. That is why it is much more likely to achieve the promised benefits of EIP-based solutions than its EIP competitors.

For more information, see www.dkms.com

Top of Page


Previous Article  |  Table of Contents  |  Next Article