Next Article Table of Contents Previous Article

DATA WAREHOUSES, DATA MARTS, AND DATA WAREHOUSING
by Joseph M. Firestone, Ph.D.

Data Warehouses, Data Marts, and Data Warehousing: New Definitions and New Conceptions

Introduction

Bill Inmon's definition of the data warehouse has been dominant since the beginning of the field. It is: "a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision making process." [1, P. 1] Data Warehousing has lately been undergoing substantial changes in architecture and a broadening of related functional applications. With these changes have come other definitions of the Data Warehouse and evolving conceptions of data warehousing These have been offered with no real attempt to confront other, different definitions or conceptions, or to explore the reasons for disagreements in definitions, and the conceptual commitments or gaps that these definitions imply. This DKMS Brief will explore a number of data warehouse and data mart definitions and their relation to the idea of the Distributed Knowledge Management System (DKMS). [2] It will also analyze the meaning of "data warehousing," in light of changes in data warehousing systems and changes in definitions.

Data Warehouse, Data Mart and the DKMS

Inmon's phrase, taken alone, does not distinguish a data warehouse from a data mart, or enterprise wide data warehouses from process-oriented data warehouses. That is, it does not distinguish subject-oriented, integrated, time-variant, non-volatile data stores of differing scope.

Inmon and his collaborators define a data mart as "a subset of a data warehouse that has been customized to fit the needs of a department." [3, P. 70] They also emphasize that "a data mart is a subset of a data warehouse, containing a small amount of detailed data and a generous portion of summarized data." There is no agreement on this, as there is a strong counter position that atomic data marts are the foundation of the data warehouse. [4, 346-348]

The following types of DSS data stores all fit the characterization "subjectoriented, integrated, time-variant, non-volatile collection of data in support of management's decision making process." They represent different concepts, but together they should provide a framework for reasoning about the issue of definition.

  • A galactic data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision making process about any and all enterprise business processes and departments, and about the enterprise taken as a whole.
  • A business process-oriented data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision making process about any and all business processes and their interactions with one another and the external world.
  • A department-oriented data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision making process about any and all departments, and their interactions with one another and with the external world.
  • A business process data mart is a subject-oriented, integrated, timevariant, non-volatile collection of data in support of management's decision making process focused on a single business process.
  • A departmental data mart is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision making process focused on a single department.

This framework provides three types of data warehouses and two types of data marts. I think businesses are mostly interested in business process data warehouses and data marts. And that the initial interest in Galactic Data Warehouses has faded, while a focus on departmental data warehouses and data marts is less desirable because it is not consistent with the widely endorsed business process orientation toward increasing productivity. Other recent definitions of data warehouse are focusing on the idea that a simple partwhole relationship exists between these categories and that the union of data marts is the data warehouse.

The union of departmental data marts, however, is not a data warehouse, because this union doesn't necessarily provide management decision support for cross-departmental business processes, or for departmental interactions among themselves and with the external world. Still a departmental data mart is a subset of a galactic data warehouse or a department-oriented data warehouse as defined above.

The union of business process data marts is also not a data warehouse, as Ralph Kimball and his collaborators suggest, [4, Pp. 19, 200-203, 266-271] because this union doesn't necessarily provide management decision support for departments, or for departmental interactions among themselves and with the external world. Still a business process data mart is a subset of a galactic data warehouse or a business process-oriented data warehouse as defined above.

The above definitions of the data warehouse don't preclude the possibility that the data warehouse could be distributed. While insisting that the data warehouse is a unified, integrated logical entity, at the physical level the possibility is there that the data warehouse could be distributed. In other papers, I introduced the DKMS concept [2]. A DKMS is a system that manages the integration of distributed objects into a functioning whole producing, maintaining, and enhancing a business knowledge base. The DKMS must not only manage data, but all of the objects, object models, process models, use case models, object interaction models, and dynamic models, used to process data and to interpret it to produce a business knowledge base. More details on the DKMS are available in [2] and [5].

The DKMS then, is a distributed system of objects. What is the relationship between the Data Warehouse and the DKMS?

A short answer is that a data warehouse is a data store whether distributed or not. While a DKMS is a comprehensive information and knowledge processing system that may include data stores such as data warehouses and data marts, as well clients, application servers, non-DSS data stores, and client front ends. A DKMS can contain a data warehouse and therefore is not strictly comparable to it. What the DKMS is comparable to however, is the data warehousing system -- the system whose components include not only DSS data stores (including the data warehouse and its associated data marts), but also clients and application servers. This comparison will be developed further in discussing the data warehousing process.

Data Warehousing

Moving to "data warehousing" as an activity or process concept, I think the data warehousing field needs to decide whether the term merely refers to establishing, updating, and maintaining a DSS-oriented data store or set of data stores; or whether it refers to a broader process of implementing an information systems application for producing, maintaining, and enhancing an enterprise's knowledge base, including the data store components of that knowledge base. If we use the term in the broad sense, we must recognize that these days such an application and therefore the data warehousing process it supports, can contain (a) a wide variety of components, and not just front-end clients and back-end database servers, and (b) a variety of functions beyond querying and reporting.

Varieties of Components

Regarding the variety of data warehousing system components, for example, Kimball, Reeves, Ross, and Thornthwaite, distinguish the following main as part of their "Big Shopper Data Warehouse Technical Architecture Model." [4, P. 508]

  • Source Systems
  • A Data Staging Area with a data staging application server, archive files and other data stores
  • A Base Level Presentation Server with a metadata catalog and the data warehouse
  • An application server, with a report management server, a ROLAP Web interface, a ROLAP service, and a security service
  • A Web Server
  • A projected OLAP Server Data Mart for Budget Tracking and EIS applications
  • A projected Data Mining Server for Customer Scoring, and Purchase Behavior Pattern Analysis Applications
  • An end user application client An end user advanced analysis client, and
  • A Web data access client

Many of the same or similar components are identified by Berson and Smith [6, P. 116], who also emphasize the importance of the Information Delivery System in data warehousing architecture, and by Douglas Hackney, in his very interesting "Understanding and Implementing Successful Data Marts." [7, esp. Pp. 61-64]

Indeed, in light of these and other examples, it is hard to avoid the conclusion that the concept of a data warehousing system should recognize the possibility of a range of application server components as part of DW architecture.

Most of the acronyms and abbreviations should be familiar. But DDS, ERP, CTS, BPE, and AKM need brief clarifying. DDS refers to the Dynamic Data Store, the data store resulting from the operation of the ETML Server. ERP refers to Enterprise Resource Planning applications such as SAP and BAAN. The issue of the relationship between ERPs and data warehousing is an important current issue in the field, especially since SAP's release of its own data warehousing offering.

CTS is the acronym for Component Transaction Servers. Component Transaction Servers (CTS) such as Sybase's Jaguar CTS, and Microsoft's Transaction Server provide software components with data access and interaction capability. Like Web Application Servers these are stateless, though they do provide broad connectivity and multi-threading, and can play a role in integrating complex data warehousing systems.

BPEs are Business Process Engines, an idea requiring some context to explain. The development of multi-tier distributed processing systems was characterized by the appearance of application servers. Application servers provide services to other components in a distributed processing system by executing business logic and data logic on data accessed from database servers. The class of application servers is sub-divided by Rymer's [8, P. 1ff.] distinction between "stateless" and in-memory server environments. Application Servers with Active in-memory Object Models he calls Business Process Engines, a term similar to Vaskevitch's [9] Business Process Automation Engines.

Application Servers are the primary place where business rules live in the Data Warehousing System. And BPEs mainly perform OLTP and Batch processing under constraints and conditions imposed by these rules.

BPEs may be classified by the processes they support. Some examples of BPEs are: Collaborative Planning Servers, ETML Servers, KDD/Data Mining Servers, Knowledge Publication and Delivery (KPD) Servers, Query and Report creation, modification, scheduling, and Delivery Servers, and Relational On-Line Analytical Processing (OLAP) Servers. Some of these components have been or will be discussed elsewhere. But it is important to recognize that they are all BPEs because they process business rules, both declarative and procedural, and maintain state in memory.

Finally, AKM is the acronym for Active Knowledge Manager. [5] An AKM is a special kind of BPE providing process control and distribution services, an object model of the Data Warehousing System, and connectivity to all enterprise information, data stores, and applications.

Process Control and Distribution Services include:

  • in - memory proactive object state management and synchronization across distributed objects (including business rule management and processing, and metadata management);
  • component management;
  • workflow management;
  • transactional multithreading;
  • CORBA and/or COM messaging,

The in-memory Active Object Model and Persistent Object Store Model components of the AKM include:

  • Event-driven behavior;
  • DKMS-wide model with shared representation;
  • Declarative business rules;
  • Caching through partial instantiation; and
  • A Persistent Object Store for the AKM.
  • Connectivity Services of the AKM include:
  • Language APIs: C, C++, Java, CORBA, COM
  • Databases: Relational, ODBC, OODBMS, hierarchical, network, flat file, etc.
  • Wrapper connectivity for application software: custom, CORBA, or COM-based
  • Applications including all categories

Changes in data, metadata, business rules or other object methods are arbitrated and synchronized by the AKM. It must perform object and component state management across applications, processing types, and physical platforms through messaging and intelligent agent activities, while providing connectivity to the various data sources and applications within the Data Warehousing System. This is Dynamic Integration (DI). [5] It requires distributed, proactive monitoring and management of changes in objects and components introduced by the various server-based applications in the system.

To perform dynamic integration, the system must:

  • look for changes in shared objects and additions to the total pool of objects and relationships,
  • alert all system components sharing the objects of such changes, and also
  • make decisions about which changes should be implemented in each affected component throughout the system.

It is important that changes in shared objects are propagated and new objects are created in real-time, so that a single view of the Data Warehousing System's object model is maintained. This is why in-memory, proactive operation is so important.

Varieties of Functions

In classifying BPEs by the processes they support I also indicated the variety of functions corresponding to the variety in components: Collaborative Planning, ETML, Knowledge Discovery in Databases (KDD)/Data Mining, Knowledge Publication and Delivery (KPD), Query and Report Creation, Modification, Scheduling, and Delivery, Web Information Access, and Relational On-Line Analytical Processing (ROLAP). Planning, KDD, KPD, ROLAP and Web Access functions are developments post-dating initial concepts of the Data Warehousing system. The query and reporting management functions have moved from the client to the server. The cumulative effects of these changes is to greatly expand the process content of the data warehousing system, and therefore to change the nature of data warehousing. While previously data warehousing referred to preparing data for an integrated data store, loading data into that store, and then querying the store an off-the-shelf reporting tool; now the term data warehousing can refer to performing the whole range of the above functions.

Conclusion

The continued evolution of data warehousing systems both in the variety of their components and functions, the developing O-O and distributed processing orientation, and the change in definitions to try to accommodate distributed data marts, clarifies the similarity between the Data Warehousing System and the DKMS. The data warehousing system is increasingly like the DKMS in its distributed character, its multi-functionality, and the variety of its components. Two things still distinguish it from the DKMS.

First, though data warehousing systems are evolving toward an object-based integration, they are not there yet. The AKM present in my personal view of data warehousing is still idiosyncratic. In the view of most data warehousing practitioners, the AKM described in this paper, is replaced by a metadata layer having only a fraction of the functionality of the AKM. And second, data warehousing systems are still conceptualized as primarily DSS-based systems, whereas DKMSs recognize no such limitation.

Nevertheless, the trends in the field are clear. With the increasing pressure from ERP and Web-based applications, data warehousing systems will need to integrate transactional applications. And with the further evolution of ETML technology, object models will soon be used to integrate metadata from various stages of the data warehousing process. When these changes come to pass, during the next two years, the evolution of The Data Warehousing System to the DKMS will be complete.

DKMS Brief No. Six

References

[1] W. H. Inmon, "What is a Data Warehouse?" Prism Tech Topic, Vol. 1, No. 1, 1995

[2] I introduced the DKMS concept in two previous White Papers "ObjectOriented Data Warehouse," and "Distributed Knowledge Management Systems: The Next Wave in DSS." Both are available at www.dkms.com/White_Papers.htm.

[3] See W. H. Inmon, Claudia Imhoff, and Ryan Sousa, Corporate Information Factory (New York, NY: John Wiley & Sons, 1998).

[4] Ralph Kimball, Laura Reeves, Margy Ross, and Warren Thornthwaite, The Data Warehouse Life Cycle Toolkit (New York: John Wiley & Sons, 1998).

[5] Joseph M. Firestone,"Architectural Evolution in Data Warehousing." available at www.dkms.com/White_Papers.htm. See also White Papers 7, 9-11, and the DKMS briefs.

[6] Alex Berson, and Stephen J. Smith, Data Warehousing, Data Mining, and OLAP (New York: McGraw-Hill, 1997).

[7] Douglas Hackney, Understanding and Implementing Successful Data Marts (Reading, MA: Addison-Wesley, 1997).

[8] John Rymer, "Business Process Engines, A New Category of Server Software, Will Burst the Barriers in Distributed Application Performance Engines," Emeryville, CA, Upstream Consulting White Paper, April 7, 1998 at www.persistence.com/products/wp_rymer.html. See also my "DKMS Brief No. Four: Business Process Engines in Distributed Knowledge Management Systems," available at www.dkms.com/White_Papers.htm.

[9] David Vaskevitch, Client/Server Strategies (San Mateo, CA: IDG Books, 1993), P. 288


Biography

Joseph M. Firestone, Ph.D.
CEO, Chief Scientist
Executive Information Systems Inc (EIS)
703-461-8823, eisai@home.com

Joseph M. Firestone, Ph.D. is CEO and Chief Scientist of Executive Information Systems (EIS) Inc. Joe has varied experience in consulting, management, information technology, decision support, and social systems analysis. Currently, he focuses on product, methodology, architecture, and solutions development in Enterprise Information and knowledge Portals, where he performs Knowledge and knowledge management audits, training, and facilitative systems planning, requirements capture, analysis, and design. Joe was the first to define and specify the Enterprise Knowledge Portal Concept. He is widely published in the areas of Decision Support (especially Enterprise Information and Knowledge Portals, Data Warehouses/Data Marts, and Data Mining), and Knowledge Management, and has recently completed a full-length industry report entitled "Approaching Enterprise Information Portals." Joe is a founding member of the Knowledge Management Consortium International (KMCI), Editor of the new KMCI Journal, Chairperson of the KMCIís Artificial Knowledge Management Systems SIG, a member of its Executive Committee, its Metaprise Project, and the KMCI Institute Governing Council. Joe is a frequent speaker at national conferences on KM and Portals. He is also developer of the Web site www.dkms.com, one of the most widely visited Web sites in the Portal and KM fields. DKMS.com has now reached a visitation rate of 83,000 visits annually.

Executive Information Systems Inc

The Executive Information Systems (EIS) Enterprise Knowledge Portal (EKP) is the only portal solution that provides the assurance that enterprise decision making will be based on validated knowledge. EISís EKP lets enterprises avoid the risk involved in Enterprise Information Portals which claim to offer increases in competitive advantage, ROI, speed of innovation, productivity, effectiveness and profitability, but have as a central vulnerability the fact that they are only capable of managing data and information, not knowledge.

Enterprises using EIP-based solutions when they could be using EKP-based ones, are gambling that unvalidated information can produce promised EIP benefits. The central value proposition of the EIS EKP is that it replaces gambling on unvalidated information with knowledge-based decision making. That is why it is much more likely to achieve the promised benefits of EIP-based solutions than its EIP competitors.

For more information, see www.dkms.com

Top of Page


Previous Article  |  Table of Contents  |  Next Article