[ PREVIOUS ARTICLE | Table of Contents | NEXT ARTICLE ]

ALL ROADS LEAD TO THE DATA WAREHOUSE
by Doug Laney, Consulting Manager, Prism Solutions


Breadth of Computing Initiatives

Consider all the types of business systems initiatives organizations are undertaking today: BPR, EIS, DSS, TPS, KBS, AI, CBR, SI, EDI. This is quite a list of acronyms and quite a variety of systems that must be supported with data. Some of these systems produce data needed by other systems. Some of these systems would benefit from designers first examining the data produced by another.

Run vs. Report the Business

When it comes down to it there are really only two types of business systems we develop: one that runs the business and one that reports the business. This is really the difference between systems that are data input centric and those that are data output centric. The common element here is data access, availability, integration, and analysis -- all among the promises of the data warehouse.

Running the Business

Systems that run the business include such initiatives as: transaction processing systems (TPS), business process reengineered (BPR) systems, electronic data interchange (EDI) and traditional systems integration (SI) applications. These systems generate a lot of data that corporate business analysts, marketing specialists, executive management, operational management, customer support and financial management all would love to wrap their arms around. Limiting their access to this data limits their effectiveness; delivering this data to them yields both obvious and inconceivable opportunities for cost savings and new avenues of revenue.

Business Process Reengineering

As the "competitive advantage" promise of the late '80s and early '90s, BPR continues to be a way for IS to respond to strategic organizational and operational changes. The IS manager's "But we've always done it this way!" argument doesn't cut it when executives decide to reengineer the business. What the savvy IS manager should be asking is, "How can I get you the information you need, Mr. Executive, to help you decide how to effectively reengineer the business?" Again, the answer is data access, availability, integration and analysis -- all of which are delivered via a data warehouse.

Now, this savvy IS manager knows that a data warehouse project to support the design of a larger BPR initiative should take only three to four months. It should be an easy sell to ensure the correct design of reengineered systems. This cushion may also help IS gear up for the BPR effort! The BPR initiative itself most certainly has a requirement for improved end user access to data. Knowing the problems caused in the past trying to provide end user access to operational data, the IS manager should propose a data warehouse as part of the BPR effort. The result: smoothly running, uninterrupted business systems and unlimited end user access to information.

Knowledge Based Systems

Another type of system used to run the business is the operational artificial intelligence (AI) / knowledge based system (KBS). Typically integrated with one of the other traditional IS applications, these applications voraciously crave data instead of creating much data.

The renowned failure and limited popularity of KBS systems is, in a large part, due to their inability to access transactional data in the volumes required with any acceptable performance. KBS applications reading live operational databases are doomed to performance anemia and data starvation. Typically, design teams are forced to make concessions that limit the resulting intelligence and utility of these systems. Combine this with the usual expectation build-up of KBS systems, and neither user expectations nor management expectations are very often met.

Now consider intelligent processing of this nature against a high-performance, access-tuned, properly organized data warehouse. AI may yet have a chance!

Transaction Processing Systems

The pain most organizations feel with their TPS applications is in the maintenance cycle. Much of the maintenance activity surrounds collecting a continually widening variety and quantity of data. Business analysts and other end users of data see TPS as the root of all data -- a place that should collect and create all the data they need to analyze. As competitive challenges and business growth drive the need for more data to analyze, these TPS become saddled with difficult maintenance efforts. For example, marketing systems that once merely tracked product sales are now stretched to track service records, additional distribution channels, industry sales data and a deeper history. With a data warehouse, much of the new integrated data requirements and historical data requirements easily can be placed along with reoriented TPS data where they belong --outside the legacy TPS environment.

Electronic Data Interchange

Much like TPS applications, EDI systems are creators of vast amounts of data. With EDI systems, however, data must be formatted and databases organized for very specific processing, as agreed upon by the "interchanging parties." This greatly prohibits the introduction of ancillary data used by one party and not the other. In a data warehouse, both the EDI-specific data and ancillary data can coexist in an integrated database and be made available to end users and other business systems.

Systems Integration

Many early systems integration projects involved coordinating the processing of two or more business functions, e.g. sales and manufacturing. Today we see SI projects attempting to facilitate the integration of business units from merged or acquired companies. The problem remains the same: How do you integrate or migrate the systems without affecting their current operation? Not surprising, the answer here too is very often a data warehouse. It allows IS to design an integrated database accommodating the functions of both systems, or simply providing a single view of the combined operations for analysts and managers. In fact, the data warehouse design becomes a roadmap for merging at the business process level downstream.

Reporting the Business

On the other side of the IS coin are systems designed to report and analyze the business. These include decision support systems (DSS), executive information systems (EIS) and intelligent CASE-based reasoning (CBR) applications. In general, these systems have a need for a huge breadth and depth of data -- and not just data, but information. Turning data into information involves reorganizing data, deriving new data, integrating disparate data and presenting data to end users. As stated earlier, access to operational data is an important consideration.

However, we've seen the damage that can be done by giving users access to raw production data: unmet expectations and unmanageable impact on legacy systems.

Let's examine how the data warehouse can help us manageably meet the expectations and requirements of DSS, EIS and CBR projects.

Decision Support Systems

DSS may be a limited term for a family of applications that support knowledge workers in the organization. Others think of them as applications that support the popular Open Book Management philosophy: empowering employees at all levels with access to business and financial information that they have a direct impact upon.

Either way, DSS requirements demand that information is presented in a fashion that is easy to understand and applies business rules to derive new information. These requirements boil down to a few common-sense DSS data management tenets:

Note these are data management tenets, not a recipe for transforming data on-the-fly from the legacy system to the desktop--also known as a recipe for failure. In a data warehouse these tenets are easily applied, and DSS end users are more assured of the data's integrity, availability, and comprehensibility.

Executive Information Systems

EIS and DSS applications are similar in that they present information for decision making. Certainly there is a vast gray area, but EIS applications may be distinguished in that they typically allow greater flexibility in "slicing-and-dicing" data. EIS can be characterized by canned reports allowing unpredictable drill-down functionality.

To a large degree the DSS data management tenets apply to EIS as well. Yet, EIS also needs data to be even more integrated, summarized, historical and involatile. To further integrate data, it's not enough to simply place data from different sources in the same database. Tables must be physically combined through subject-oriented denormalization techniques. To further summarize the data, physical rollups by department, region, plant, time period, and/or product are needed to allow high-performance drill-down.

Data used to run the business frequently includes only the current state of the business. Once data is overwritten, its previous value cannot be retrieved (discounting archival systems). This is not acceptable for systems that report the business. Especially in EIS applications, an account of last year's inventory levels at this time, for example, is critical to effective trend analysis. For this reason, EIS data must have an involatile historical component. (Yes, some of detailed data becomes less important over time, but the granularity topic is not within the scope of this article.)

Clearly, a high-performance, high-usability EIS application must have a very different data architecture than exists in the legacy system environment. The data warehouse again comes to the rescue by providing a place and a process for storing integrated, summarized, reorganized, historical, involatile information to allow both a variety of canned reports and much-simplified drill-down capabilities. The need for a new way to physically represent data for EIS has led to revolutionary multidimensional data warehouse modeling techniques such as the star schema.

CASE-Based Reasoning and Friends

On the fringe, other types of systems such as CBR, data mining and unstructured data retrieval applications provide organizations with new ways to explore their data. And, just as in EIS and DSS applications, data availability, data organization, data integration and data retention are all tantamount design considerations. Still, CBR, data mining and unstructured data retrieval applications are usually gauged to be only as successful as the quantity of data they have access to. CBR applications assist in decision making by considering the factors of a current event, searching for another past similar event, and extrapolating/interpolating to derive an optimal response. The response, however, can only be as optimal as the amount of "cases" it has to draw from. Data mining applications use sophisticated statistical techniques to uncover correlations between seemingly unrelated data. The larger the sample, the more powerful the inference based on the correlation.

In unstructured data retrieval applications, users search for information in large (usually indexed text-based) files. With more data to search, users can be more specific with their queries. More data to search means higher user satisfation.

None of these data stores is supported by the typical legacy database. Only warehoused data can truly ensure the success of these types of applications.

Conclusion

As organizations consider the multitude of information systems initiatives planned for delivery over the next couple years (and beyond), it can be striking to also consider how they are going to respond to the unforeseen changes in the organization and the unpredictable informational needs of their employees. Some challenges we already are painfully aware of:

Data warehousing is a hot topic for many reasons. The above IS challenges are not going away; they will become more pronounced as the chasm grows between systems that run the business and those that report it. Creating a data architecture that includes data warehousing to bridge this chasm should be considered a long-term solution, not a stop-gap approach to serious data management.

---

For more information, see http://www.prismsolutions.com/

[ PREVIOUS ARTICLE | Table of Contents | NEXT ARTICLE ]