
HISTORICAL DATA: THE FOUNDATION OF DATA MINING, PART II
by W H Inmon


The changing nature of data, especially at the detailed level, presents an intriguing story when viewed over time.

RESTATING DATA OVER TIME

The classical approach to the amendment of data over time (or the restatement of data over time) is to go back and alter the detailed data, and do a recalculation based on the amended data.

The process of restating data at the detailed level is tedious and, in every case, extremely resource intensive. But once old detailed data has been examined and restated, it can be relied upon as a basis for current and crisp analysis and calculation. The data miner feels very satisfied once the old detailed data has been restated properly.
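As a rough illustration of what restatement at the detailed level involves, the following Python sketch rewrites every historical detail row under an amended classification and then recalculates a summary. The store numbers, region names, and the REGION_AMENDMENT table are hypothetical, invented only for this example.

```python
# Hypothetical amendment: store 17 is reassigned from region EAST to CENTRAL,
# and history is to be restated as if it had always belonged to CENTRAL.
REGION_AMENDMENT = {17: "CENTRAL"}

# Hypothetical historical detail rows.
DETAIL = [
    {"store": 17, "region": "EAST", "amount": 50},
    {"store": 17, "region": "EAST", "amount": 70},
    {"store": 22, "region": "EAST", "amount": 40},
]


def restate(rows, amendment):
    """Return a restated copy of every detail row (one full pass over the data)."""
    return [
        {**row, "region": amendment.get(row["store"], row["region"])}
        for row in rows
    ]


def total_by_region(rows):
    """Recalculate regional totals from the detail rows."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals


restated = restate(DETAIL, REGION_AMENDMENT)
print(total_by_region(DETAIL))
print(total_by_region(restated))
```

Even in this toy case every detail row must be read and rewritten, which suggests why the cost grows with the volume of history.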

SOME DRAWBACKS TO RESTATEMENT

But going backward in time and reshaping the detailed data has a surprisingly large number of drawbacks. Some of the drawbacks of retrofitting data are -

There are, then, some powerful reasons not to restate historical data at all. Indeed, at some point in time the volume of data that must be passed during the restatement process precludes historical restatement.

NOT RESTATING DATA

What does an organization do when the point is reached where it is impractical to restate data? There are several possibilities.

The first possibility for making the best of not restating data is to make extensive use of metadata. By using metadata to carefully define data, and to track the changes to that data over time, the analyst is able to interpret the changes in data correctly.
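One way to picture the metadata approach is a small registry of definitions with effective dates. The Python sketch below is only an assumption about how such metadata might be organized; the "customer class" codes, their meanings, and the dates are hypothetical. The detailed data is left untouched, and each stored code is interpreted with the definition that was in effect when the record was written.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical metadata: each entry records when a definition of the
# customer class codes took effect and what the codes meant from then on.
# Entries must be kept sorted by effective date.
DEFINITION_HISTORY = [
    (date(1990, 1, 1), {"A": "retail", "B": "wholesale"}),
    (date(1995, 1, 1), {"A": "retail", "B": "commercial", "C": "wholesale"}),
]


def definition_as_of(when):
    """Return the code definitions in effect on the given date."""
    effective_dates = [start for start, _ in DEFINITION_HISTORY]
    index = bisect_right(effective_dates, when) - 1
    if index < 0:
        raise ValueError(f"no definition recorded on or before {when}")
    return DEFINITION_HISTORY[index][1]


def interpret(record_date, code):
    """Interpret a stored code using the definition valid when it was recorded."""
    return definition_as_of(record_date)[code]


print(interpret(date(1993, 6, 1), "B"))
print(interpret(date(1996, 6, 1), "B"))
```

The same code "B" is read differently depending on the date of the record, which is the kind of interpretation the metadata is meant to support.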

A second approach is to operate at a summary level. When examined closely, the differences in data over time are most relevant to detailed data. At the summary level the historical discrepancies that occur are "ironed out". In other words, the higher the level of summarization, the less of a problem the lack of historical restatement becomes.
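The "ironing out" effect can be sketched with a small, invented data set. In the Python example below the sales rows, categories, and amounts are hypothetical: a category is assumed to have been split in 1995, so category-level figures are not comparable across years, while yearly totals are unaffected by the change.

```python
from collections import defaultdict

# Hypothetical detail rows: (year, category, amount). In 1995 the old
# category "B" is assumed to have been split into "B" and "C".
SALES = [
    (1994, "A", 100), (1994, "B", 300),
    (1995, "A", 110), (1995, "B", 180), (1995, "C", 130),
]


def summarize(rows, key):
    """Sum the amount of each row under the grouping chosen by key."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)


# Lower level of summarization: the definition change distorts comparisons.
by_category = summarize(SALES, key=lambda r: (r[0], r[1]))
# Higher level of summarization: the definition change has no effect.
by_year = summarize(SALES, key=lambda r: r[0])

print(by_category[(1994, "B")], by_category[(1995, "B")])
print(by_year)
```

Category "B" appears to fall sharply only because its definition changed, whereas the yearly totals remain directly comparable without any restatement.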

So simply ignoring historical differences in data at the detailed level and not restating the data is an acceptable approach in many cases.

PRECEDENTS

Is there a precedent for allowing historical discrepancies to occur and stand with no correction or restatement? Interestingly, there are many valid and normal cases. Consider the following -

There are, then, many measurements over time where the basis for the measurement has changed. The notion that data MUST be classified on exactly the same foundation over all units of time is an idealistic one, but it is impractical for all but the smallest collections of historical data.
---
For more information, see http://www.pine-cone.com

