
THE BASIC REQUIREMENTS FOR A SUCCESSFUL DATA WAREHOUSE
by Zak Pines

Before a business can perform data mining on its data to gain strategic insights, the company's data warehouse must be in proper shape. These warehouse requirements can be broken down into two fundamental necessities - the data has to be accessible, and the data has to be able to support the business view.

For the data to be accessible, it must be saved in a format that is relatively open. Such databases as DB2, Oracle, Sybase, or SQL Server would meet this criterion. If the data is saved in a closed format, another program must be developed to open it - not the worst situation in the world, but one which would require time and money.

The first requirement is straightforward - a member of the company's IT staff will know immediately how the data is formatted. The second criterion, that the data must support the business view, is slightly more involved, and may require closer scrutiny of the data itself.

For example, if a bank is interested in analyzing customer retention, then data must be collected on a customer-to-customer basis - not an account-to-account basis.

The account-to-account collection is incomplete if one is interested in analyzing the conditions under which customers renew or cancel their accounts. If a customer moves from one city to another, cancels his account at the branch in his old city, and starts an account at the branch in his new city, the system would record this as losing a current customer and bringing in a new one. In reality, the customer has been retained and has simply changed his account status with the bank.
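The difference between the two views can be sketched in a few lines of Python. This is only an illustration: the record layout, the account and customer identifiers, and the function names are all assumptions made for the example, not anything drawn from a real banking system.

```python
from collections import defaultdict

# Hypothetical account records: (account_id, customer_id, branch, status).
# Customer C-1 moved cities: one account closed, another opened.
accounts = [
    ("A-100", "C-1", "Old City", "closed"),
    ("A-200", "C-1", "New City", "open"),
    ("A-300", "C-2", "Old City", "closed"),
]

def account_level_losses(records):
    """Account-to-account view: every closed account counts as a lost customer."""
    return sum(1 for _, _, _, status in records if status == "closed")

def customer_level_losses(records):
    """Customer-to-customer view: a customer is lost only if no account remains open."""
    statuses = defaultdict(list)
    for _, customer_id, _, status in records:
        statuses[customer_id].append(status)
    return sum(1 for held in statuses.values() if "open" not in held)
```

With this sample data, the account-level count reports two losses, while the customer-level count reports one, because the customer who moved still holds an open account.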

Even if the company collects the name of the account holder, the two accounts cannot necessarily be linked. It is possible that two different account holders have the same first and last name; it is also possible that one person's capital is being used in accounts under two different names - a personal checking account under the individual's name and an investment account under his broker's name.

If a problem such as this does develop, where the data being collected does not address the business need, the company must make some changes. First of all, all future data collection should be adjusted accordingly - data should be collected on every customer, with every account that the customer holds indexed by a unique customer identification number.
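One way to picture such a layout is a customer table and an account table, where each account row carries the customer's identification number. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for whichever database the company actually runs; the table names, column names, and sample rows are assumptions made for illustration.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with hypothetical customer and account tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        -- Each account points back to exactly one customer.
        -- (SQLite only enforces this link if PRAGMA foreign_keys = ON is set.)
        CREATE TABLE accounts (
            account_id  INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            branch      TEXT NOT NULL,
            status      TEXT NOT NULL
        );
    """)
    conn.execute("INSERT INTO customers VALUES (1, 'J. Smith')")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?, ?)",
        [(100, 1, "Old City", "closed"), (200, 1, "New City", "open")],
    )
    return conn

def accounts_for(conn, customer_id):
    """Return every account held by one customer, regardless of branch."""
    rows = conn.execute(
        "SELECT account_id, branch, status FROM accounts "
        "WHERE customer_id = ? ORDER BY account_id",
        (customer_id,),
    )
    return rows.fetchall()
```

Because both accounts share the same customer identification number, a single lookup returns the closed account in the old city and the open account in the new one together.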

This, however, will not solve the problem for data that has already been collected - and the bank cannot afford to simply ignore all of its past data. Instead, the IT staff can make educated guesses in translating the account data into customer data, but this new data may not be completely accurate.
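As a rough illustration of what such an educated guess might look like, the sketch below pairs a closed account with an account opened under the same name within a short window afterward. The rule itself, the 30-day window, and the record layout are all assumptions chosen for the example, and the rule can be wrong in exactly the ways described above: two different people may share a name, and one person's money may sit under two names.

```python
from datetime import date

def guess_links(closed, opened, max_gap_days=30):
    """Pair each closed account with accounts opened under the same name soon after.

    Each record is a hypothetical (account_id, holder_name, event_date) tuple.
    The result is a list of guessed (closed_id, opened_id) pairs, not confirmed links.
    """
    links = []
    for c_id, c_name, c_date in closed:
        for o_id, o_name, o_date in opened:
            same_name = c_name.strip().lower() == o_name.strip().lower()
            gap = (o_date - c_date).days
            if same_name and 0 <= gap <= max_gap_days:
                links.append((c_id, o_id))
    return links

closed_accounts = [("A-100", "John Smith", date(1998, 3, 1))]
opened_accounts = [
    ("A-200", "john smith ", date(1998, 3, 10)),  # same name, 9 days later
    ("A-300", "John Smith", date(1998, 9, 1)),    # same name, months later
]
guessed = guess_links(closed_accounts, opened_accounts)
```

Here only the account opened nine days after the closing is linked. Any pairs produced this way would still need review before being treated as customer-level data.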

Of course, if the company's data has been collected along the appropriate lines in the first place, then this problem does not exist, and data mining can begin without a hitch.

If problems are evident, altering the old data and transforming the data collection process could be difficult. But it is a step that will be rewarded down the line, when a data mining system can be set up to draw strategic insights from the data and help support the company's business view.


Zak Pines is an Analyst and Special Operation associate for Virtual Gold, Inc, an industry leader in intuitive data mining software. Pines is involved in developing end-to-end data mining solutions in various industries. Prior to joining Virtual Gold in 1998, he worked at the IBM T.J. Watson Research Center (Hawthorne, NY), from 1995 to 1997. While at IBM, Pines helped develop Advanced Scout, a data mining program used extensively by coaches of the National Basketball Association to devise new strategies based on the automatic identification of hidden patterns in game data and video. Pines is a graduate of Yale University with a B.A. in economics.

For more information, see www.virtualgold.com
