[ PREVIOUS ARTICLE | Table of Contents | NEXT ARTICLE ]

THE DATA WAREHOUSE DATABASE EXPLOSION, PART II
by Sid Adelman


What's an Organization to do?

A company recently monitored which of their reports were actually being used. The sad results were that less than 25% of the reports were ever read. The company shrewdly halted the unread reports and only reinstated them when users demanded their return. Understanding the situation can reveal opportunities to improve.

If an organization understands the problems of letting the databases increase in an uncontrolled manner and is willing to take appropriate steps, it should consider the following:

  1. Recognize that size is important (in this case, big is bad) and that it can have a strongly negative impact on the cost, performance, availability, risk and the management of the DW. This recognition needs to be communicated to all those on the DW team.
  2. Develop data standards that, among other things, include standards on transformations, data replication, data stewardship and naming standards.
  3. Implement a cost justification methodology that can be used uniformly as funds are being requested. The hard numbers can help management understand and evaluate options.
  4. If possible, charge the users for the data they request. This can be very difficult because other departments may also be using the data and distributing the costs may become complex. Chargebacks are never appreciated but they are one of the few techniques that force users to make a business case for additional data.
  5. Allow enough time to implement the DW properly. This admonition extends far beyond the data explosion concern. Management should understand the costs and risks of an unrealistic schedule.
  6. The project should be phased and the users should understand the plan and know when each grouping of data would become available to them.
  7. Educate and sell the users on the importance of their participation in the process of determining data requirements. Instruct them on the costs and problems of frivolous requests for data.
  8. With the users, evaluate the necessity of keeping instance data as well as summarized data. This requirement will probably change as the data ages. Older instance data may be archived while the summarized data will probably be retained.
  9. Carefully consider the need to replicate data. Many organization replicate for questionable reasons ("The user wants his data to be under his physical control.").
  10. Understand the need for how frequently the data is stored (daily, weekly, monthly).
  11. Allow sharing of data rather than each user having their own copy of DW data. If possible, encourage sharing by lowering user costs or appealing to higher motives.
  12. If data models have captured the structure of the DW and if a repository has been populated with metadata about the DW, the models and the metadata should be used to understand the inventory of DW data and to minimize storing redundant data.
  13. Give Data Administration the authority to determine data sources and give them the responsibility for populating metadata in a repository.
  14. Give Data Administration or the DW Administrator the authority to control what data will be stored in the DW. This would minimize storing redundant or unneeded data.
  15. Be sure the right analysts are working on the project. They need to feel comfortable with the users and know what questions to ask. Consider a team that has analysts that come from the user departments.
  16. Monitor the usage of the data. This is not easily done since there are no automated techniques to do so -- at the time of this writing, the author is unaware of any such automated techniques in the RDBMS products. Consider archiving data that is rarely accessed and archiving data that is never accessed (Murphy's corollary is that purged data will be requested three weeks following deletion.).
  17. Review archiving schemes to determine how much of historical data must be online and whether all instance data must be kept online. Determine response time requirements and if historical data may be restored from less costly, but slower, storage media.
  18. Consider the use of data marts (departmental or subject DW databases). They will not decrease the overall amount of data to be stored or the CPU to run the loads but they can substantially improve the performance of the queries.

The steps above are relevant for most DWs regardless of size.

A very large DW database should not automatically be considered as a source of pride buy may be an indication of IT sloth. Organizations need to recognize the source of the disorderly growth of their databases and understand whether that growth is justified.

Many of the problems highlighted in this piece are a result of lack of a strategic DW plan. The problems resulting from the explosion of data may just be symptomatic of a larger problem and could be seen as a wake-up call to organizations to review their DW architecture, strategy, standards and methodologies.

For more information, see http://www.planxpert.com


[ PREVIOUS ARTICLE | Table of Contents | NEXT ARTICLE ]