METADATA AND DATA MINING: TWO UNLIKELY PEAS IN A POD, PART II
by W H Inmon
Many aspects of data and processing are constantly changing. Viewed at a single instant in time, data appears static, because at that moment nothing about it is changing. Viewed historically, however, data is dynamic and is seen to change. This is where metadata comes in. Metadata describes data and records how that data changes over time.
AN EXAMPLE
As an example, suppose a data miner does an analysis based on data created in 1998. Management is impressed with the speed with which the analysis is done. In fact, management is so impressed that it requests a similar report using data from five years earlier. Given the historical data in the data warehouse, the report for 1993 is quickly produced and placed in management's hands. More enthusiasm and congratulations are expected, but instead management grumbles that the IT organization does not know what it is doing when it comes to business. The data miner is taken aback.
Further analysis shows that the 1993 report lists revenues of $10,000 while the 1998 report lists revenues of $3,000,000. Management declares that the data is "all screwed up".
Rather than take this news lying down, the data miner points out to management that --
When management stops and reflects on the many external factors that have changed over time, management realizes that an increase in revenues from $10,000 to $3,000,000 is quite possible.
Stated differently, when management viewed just the data, an increase from $10,000 to $3,000,000 seemed unlikely. But once the context of the data was considered, the increase seemed very reasonable.
CONTEXT AND CONTENT OF DATA
In other words, the content of data alone is not sufficient for understanding and interpreting data over time. When management ponders data over time, the context of the data is as important as the content. It is the context that allows the content to be understood and interpreted.
And where is the context of data stored and managed? The context of data is stored and managed in the metadata infrastructure.
With metadata describing all the different aspects of data over time, the data miner can reach into the metadata grab bag and explain why conditions occurred. But when there is no metadata and the data miner is asked to interpret data from years past, trying to understand what numbers mean and why they change becomes a very difficult task.
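As a rough illustration of the idea above, the following Python sketch keeps a small context record for each year and lists what differs between two years. It is only a sketch under stated assumptions: the YearContext class, the explain_change function, and every attribute and value shown are hypothetical and are not taken from any particular metadata product or from the reports discussed in this article.

```python
from dataclasses import dataclass, field


@dataclass
class YearContext:
    """Hypothetical metadata record describing the context of one year's data."""
    year: int
    notes: dict[str, str] = field(default_factory=dict)


def explain_change(earlier: YearContext, later: YearContext) -> list[str]:
    """Return one line for every context attribute that differs between two years."""
    differences = []
    for key in sorted(set(earlier.notes) | set(later.notes)):
        before = earlier.notes.get(key, "not recorded")
        after = later.notes.get(key, "not recorded")
        if before != after:
            differences.append(
                f"{key}: {before} ({earlier.year}) -> {after} ({later.year})"
            )
    return differences


# Illustrative values only; none of these come from the article.
ctx_1993 = YearContext(1993, {"product_lines": "1", "sales_regions": "domestic only"})
ctx_1998 = YearContext(1998, {"product_lines": "4", "sales_regions": "domestic and export"})

for line in explain_change(ctx_1993, ctx_1998):
    print(line)
```

With records like these stored alongside the warehouse data, a data miner who is questioned about two very different numbers could list the recorded differences in context rather than reconstruct them from memory.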
IN SUMMARY
Because data miners operate on historical data, the context of data becomes a very important issue. And because context is an important issue, metadata -- by extension -- becomes a very important part of the data miner's tool kit.
---
For more information, see http://www.pine-cone.com