[
PREVIOUS ARTICLE |
Table of Contents |
NEXT ARTICLE
]
THE DATA WAREHOUSE DATABASE EXPLOSION, PART II
by Sid Adelman
What's an Organization to do?
A company recently monitored which of their reports were actually being
used. The sad results were that less than 25% of the reports were ever read.
The company shrewdly halted the unread reports and only reinstated them when
users demanded their return. Understanding the situation can reveal
opportunities to improve.
If an organization understands the problems of letting the databases
increase in an uncontrolled manner and is willing to take appropriate steps,
it should consider the following:
- Recognize that size is important (in this case, big is bad) and that it
can have a strongly negative impact on the cost, performance, availability,
risk and the management of the DW. This recognition needs to be communicated
to all those on the DW team.
- Develop data standards that, among other things, include standards on
transformations, data replication, data stewardship and naming standards.
- Implement a cost justification methodology that can be used uniformly as
funds are being requested. The hard numbers can help management understand
and evaluate options.
- If possible, charge the users for the data they request. This can be very
difficult because other departments may also be using the data and
distributing the costs may become complex. Chargebacks are never appreciated
but they are one of the few techniques that force users to make a business
case for additional data.
- Allow enough time to implement the DW properly. This admonition extends
far beyond the data explosion concern. Management should understand the
costs and risks of an unrealistic schedule.
- The project should be phased and the users should understand the plan and
know when each grouping of data would become available to them.
- Educate and sell the users on the importance of their participation in the
process of determining data requirements. Instruct them on the costs and
problems of frivolous requests for data.
- With the users, evaluate the necessity of keeping instance data as well as
summarized data. This requirement will probably change as the data ages.
Older instance data may be archived while the summarized data will probably
be retained.
- Carefully consider the need to replicate data. Many organization
replicate for questionable reasons ("The user wants his data to be under his
physical control.").
- Understand the need for how frequently the data is stored (daily, weekly,
monthly).
- Allow sharing of data rather than each user having their own copy of DW
data. If possible, encourage sharing by lowering user costs or appealing to
higher motives.
- If data models have captured the structure of the DW and if a repository
has been populated with metadata about the DW, the models and the metadata
should be used to understand the inventory of DW data and to minimize storing
redundant data.
- Give Data Administration the authority to determine data sources and give
them the responsibility for populating metadata in a repository.
- Give Data Administration or the DW Administrator the authority to control
what data will be stored in the DW. This would minimize storing redundant or
unneeded data.
- Be sure the right analysts are working on the project. They need to feel
comfortable with the users and know what questions to ask. Consider a team
that has analysts that come from the user departments.
- Monitor the usage of the data. This is not easily done since there are
no automated techniques to do so -- at the time of this writing, the author
is unaware of any such automated techniques in the RDBMS products. Consider
archiving data that is rarely accessed and archiving data that is never
accessed (Murphy's corollary is that purged data will be requested three
weeks following deletion.).
- Review archiving schemes to determine how much of historical data must be
online and whether all instance data must be kept online. Determine response
time requirements and if historical data may be restored from less costly,
but slower, storage media.
- Consider the use of data marts (departmental or subject DW databases).
They will not decrease the overall amount of data to be stored or the CPU to
run the loads but they can substantially improve the performance of the
queries.
The steps above are relevant for most DWs regardless of size.
A very large DW database should not automatically be considered as a source
of pride buy may be an indication of IT sloth. Organizations need to
recognize the source of the disorderly growth of their databases and
understand whether that growth is justified.
Many of the problems highlighted in this piece are a result of lack of a
strategic DW plan. The problems resulting from the explosion of data may
just be symptomatic of a larger problem and could be seen as a wake-up call
to organizations to review their DW architecture, strategy, standards and
methodologies.
For more information, see http://www.planxpert.com
[
PREVIOUS ARTICLE |
Table of Contents |
NEXT ARTICLE
]