DATA WAREHOUSE -- PROTOTYPE OR PILOT?
by Sid Adelman


In the late 1700s, Catherine the Great, Empress of Russia, wanted to see how her subjects lived. Her favorite minister (and lover), Prince Grigori Alexandrovich Potemkin, wanted to keep her in the dark about the miserable conditions in which the Russian peasants actually lived. He put her off until he could build clean villages and populate them with Disneyesque peasants, sanitized and dressed in new, clean rustic costumes. The Empress was shown only these villages and was pleased to see how her subjects supposedly lived. These villages came to be called "Potemkin Villages." In this century we have the equivalent of Potemkin Villages: the prototypes created to show upper management a sanitized version of the data warehouse.

A prototype is a throwaway, a mock-up of the eventual product. In the case of a data warehouse, it usually holds a small amount of data that has been carefully selected to show a capability, but it does nothing to help an organization make more meaningful decisions. The data in a prototype lacks integrity, is not current and is often plain wrong. It's much like a movie set that looks fine from the front, but behind the facade there is no flooring, plumbing, electricity, roof or anything else needed to make a building useful and usable.

Why would a company want to spend the time and money to develop a prototype? This article will address each of the reasons and point out the fallacy of this approach.

Some develop a prototype to prove the concept of the data warehouse. This is akin to having to prove the concept of wireless transmission. As Michael Haisten of VITAL has said, "We know it works." There have been countless success stories that can be validated without having to develop a prototype.

Others implement a prototype to get management buy-in. The idea is that unless management can see a "working" data warehouse in their own organization with some of their own data, they will have no faith that it actually works. There are other ways to get management buy-in without the prototype exercise. To name just a few: 1. have a query tool vendor give a demonstration using some of the organization's sample data, 2. visit other installations that have working data warehouses, and 3. attend conferences where case studies are presented. It is also possible that unless management sees their own people implementing something, they have no faith in their staff's ability to complete a project successfully. This could be the result of previous failures of I/S to deliver anything meaningful, on time and within budget. A prototype will not prove that these in-house incompetents can implement a robust data warehouse.

Still others want the prototype to help the organization evaluate tools. While the prototype may give some indication of the level of vendor service (vendors' service is usually excellent in the selling phase), a prototype rarely uncovers the warts found on most tools. Wart revelation usually comes after the databases get large, the queries get complex and the really dirty and complex data has to be cleaned and transformed. These conditions rarely occur while building a prototype.

And then others use the prototype to give the organization the experience it needs to build a data warehouse. A prototype will, in fact, provide some experience, and some of it may be useful, but the same experience could be gained by building a real and meaningful capability. A prototype does not provide experience with the really difficult tasks that must be accomplished to deliver a data warehouse, such as handling complex and dirty data, designing and tuning very large databases and dealing with unruly users.

There are other problems with a prototype. No matter how the prototype is explained to the users and to management, they either don't hear or don't accept the explanations. Some common user complaints:

  1. "The data is wrong" -- even though they were told that the data is just sample data and the results are not expected to be correct.
  2. "This is exactly what I want and need right now. Why won't you let me have it?" -- even though they were told it's just a mock up and can't be used.
  3. When users see the speedy response time of the prototype (with 1,000 rows of sample data), they come to expect similar response time when the real data warehouse is delivered. They don't remember what you told them about the performance difference between a database with 1,000 rows and one with 1,000,000 rows.

Build a Pilot

The sensible alternative to building a prototype is to build a pilot. A pilot is the first real data warehouse application, with real data that has been understood, documented, properly selected, cleaned, transformed and summarized. Pilots take longer than prototypes, but when they are complete, the organization has something that is meaningful and useful. If the pilot is properly done, it is not a throwaway project; it can be expanded and enhanced. The number and types of users can grow. The pilot data warehouse can incorporate additional data entities, attributes, subject areas and historical data for trend analysis, and it can begin to incorporate external data.

A pilot should have the following characteristics:

  1. It should provide information that is important to the organization. Management should gain some important benefits from the use of the pilot. As a result of a successful pilot, management will want to support further data warehouse efforts.

  2. The data should be of a reasonable size: not so small that the naysayers can discount the experience, but large enough and sufficiently challenging that important lessons are learned. It should also not be so large that the project is delayed or jeopardized by performance problems.

  3. The user sponsor of the pilot should desperately want the pilot to succeed and should support the effort through the problems that will inevitably arise. The sponsor should back the project financially and provide the right people at the right time for:

  4. The pilot data warehouse should be used. It does little good to have a pilot that is used by two people once a month. The pilot data warehouse should get significant exercise.

  5. The source data for the pilot should come from a minimum number of files and databases and should be reasonably clean. Migrating, cleaning, transforming and integrating data is labor-intensive. The more sources for the data and the dirtier the data, the longer the project will take.

  6. The pilot should be completed in a reasonable period of time. Long projects try management's patience and are often discontinued. The key is scoping. The pilot project should be scoped to minimize the time to implementation.

Any organization that is considering a prototype should instead choose to implement a pilot project. The benefits of a pilot project far outweigh the extra time and effort required for implementation.

Catherine the Great went to her grave believing her subjects to be healthier and happier than they actually were. Had Catherine's minister been caught in his ruse, he might have met an early professional (or actual) demise. I/S Directors who attempt to show their upper management a "Potemkin Data Warehouse" may see the demise of their careers, or at least of their credibility.

For more information, see http://www.planxpert.com
