
DATA MINING WITH THE EXPLORATION WAREHOUSE: PART II
by W H Inmon


The exploration warehouse is physically much smaller than an enterprise data warehouse because it is built on token-based technology. With token-based technology, data can be represented and stored inside a computer far more compactly than in a standard relational database. In fact, token-based technology compresses data to the extent that the entire exploration warehouse, indexes included, can be held in memory. That has three consequences for the data miner. Because the data and its indexes reside in memory, analysis is very fast. Because the indexes are in memory as well, the data miner can look at the data in the exploration warehouse in any manner desired. And because the data is reduced in size so significantly, the cost of processing goes down dramatically.
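To make the idea concrete, here is a minimal sketch of one way a token-based store might work. It assumes that "token" means a small integer standing in for each distinct value in a column (what is commonly called dictionary encoding); the actual technology may differ in its details. The function names and the sample column of state codes are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def tokenize_column(values):
    """Replace each distinct value with a small integer token.

    Assumption: one dictionary per column; each distinct value is stored once.
    """
    dictionary = {}   # value -> token
    tokens = []       # one token per row, in row order
    for value in values:
        # A value seen for the first time gets the next unused token.
        token = dictionary.setdefault(value, len(dictionary))
        tokens.append(token)
    return dictionary, tokens


def build_index(tokens):
    """Build an in-memory index: token -> list of row numbers holding it."""
    index = defaultdict(list)
    for row, token in enumerate(tokens):
        index[token].append(row)
    return index


def rows_matching(value, dictionary, index):
    """Return the row numbers whose column holds `value`."""
    token = dictionary.get(value)
    return [] if token is None else list(index[token])


# Hypothetical sample data: a column of state codes with many repeats.
states = ["TX", "CA", "TX", "NY", "CA", "TX"]
dictionary, tokens = tokenize_column(states)
index = build_index(tokens)

print(tokens)                                   # [0, 1, 0, 2, 1, 0]
print(rows_matching("TX", dictionary, index))   # [0, 2, 5]
```

Each distinct value is stored once and every row carries only a small integer, which is where the reduction in size comes from when a column has many repeated values. The index is just a map from token to row numbers, so it too is small enough to keep in memory alongside the data.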

In truth, an exploration warehouse can be created in standard relational technology, and there is merit in doing so: an exploration warehouse in relational technology is better than no exploration warehouse at all. But the advantages offered by token-based technology are such that, as long as token-based technology is available, there is no point in ever having a relational exploration warehouse.

TOKEN BASED TECHNOLOGY

So why would an organization wish to create a token-based exploration warehouse? There are many reasons. An exploration warehouse contains history and detail, which is exactly what the data miner needs for analytical processing. And what exactly is the data miner looking for? The data miner is looking for patterns in the data that have business relevance: correlations among data elements, relationships among data attributes, and trends that have hitherto gone undiscovered.

Often the data miner operates on intuition. In some cases the data miner looks for things simply because he/she suspects that they are there. In other cases the data miner scans data on the assumption that he/she will recognize what is wanted when it is found. In a word, the data miner is exploring. And the very essence of exploration is looking at data iteratively and heuristically, looking at lots of data, and looking at details.

One of the reasons the exploration warehouse forms such a good foundation for the data miner is that the data in the exploration warehouse does not change between queries (unless, of course, the data miner goes back to the enterprise data warehouse and deliberately reconstitutes the data in the exploration warehouse).

When the data miner reruns a query, the data miner knows that any differences in the results are a function of changed hypotheses, not of data values that shifted from one query to the next. The ability to stabilize the data in an exploration warehouse is one of the salient characteristics of the DSS exploration environment.
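The point about stable data can be illustrated with a small sketch. It assumes the exploration warehouse is loaded once from the enterprise data warehouse and then treated as read-only; the `Row` type, the sample rows, and the `average_spend` function are all hypothetical.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class Row(NamedTuple):
    """One immutable detail record (hypothetical layout)."""
    customer: str
    region: str
    spend: int


# Assumption: the snapshot is copied out of the enterprise warehouse once.
# A tuple of NamedTuples cannot be modified in place between queries.
SNAPSHOT: Tuple[Row, ...] = (
    Row("A", "west", 120),
    Row("B", "east", 80),
    Row("C", "west", 60),
    Row("D", "west", 150),
)


def average_spend(rows: Tuple[Row, ...],
                  hypothesis: Callable[[Row], bool]) -> Optional[float]:
    """Average spend over the rows selected by the miner's hypothesis."""
    selected = [row.spend for row in rows if hypothesis(row)]
    return sum(selected) / len(selected) if selected else None


# The same hypothesis run twice against the frozen data gives the same answer.
first = average_spend(SNAPSHOT, lambda r: r.region == "west")
rerun = average_spend(SNAPSHOT, lambda r: r.region == "west")

# A refined hypothesis gives a different answer, and the only thing
# that changed is the hypothesis.
refined = average_spend(SNAPSHOT, lambda r: r.region == "west" and r.spend > 100)

print(first, rerun, refined)   # 110.0 110.0 135.0
```

Because the snapshot cannot change underneath the queries, the move from 110.0 to 135.0 can be attributed entirely to the refined hypothesis.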

WHAT THE DATA MINER IS LOOKING FOR

Why does the data miner need to constantly refine his/her queries? The reason is that the patterns and relationships that the data miner is looking for are very adept at hiding. There are many reasons why these patterns and relationships do not jump out -

For more information, see http://www.pine-cone.com

The concluding segment of this commentary will appear in next week's edition of D S * .

