CONSIDERATIONS FOR INITIATING DATA MINING INTO AN ORGANIZATION
By Ed Colet


Data mining is an evolving technology for discovering meaningful patterns hidden in large amounts of data. Its roots lie first in technological advances that allowed organizations to collect and store large amounts of data relatively easily. Then came the recognition that the information within these large databases represented a competitive asset, and standard querying tools came to be used routinely to extract it. More recently, it became clear that further competitive advantages can be gained if hidden patterns (i.e., patterns unknown to the analysts) can be extracted as well. Data mining is the technology that promises to discover such hidden information -- and there's a high level of interest in implementing data mining technology to realize this promise. But getting started is easier said than done. What are some of the considerations and prerequisites for successfully introducing data mining into your organization?

In implementing a data mining engagement, several aspects should be explicitly addressed with an eye on both present and future considerations. These aspects include business issues, data availability, personnel, technology, and evaluative approaches.

Business issues -- identifying a core competency: In the long run, it's strategically more effective (but more difficult) to mine data on aspects related to the business' core competency. If a business manufactures items, data mining should be targeted towards improving what's manufactured. Quite often, it's convenient to mine data that's readily available but pertains only to issues that are tangential to the core of one's business. For example, the product manufacturer could mine data on the quantity of items sold, since this information is usually readily available. But improving what's manufactured will ultimately translate to an increase in items sold, and these benefits are more sustainable in the long run.

Data: Identifying a business' core competency is only part of the process. It is also necessary to identify data directly related to ways of improving that core competency, because data mining isn't possible without available data. In a data mining engagement, most of the resources and effort go into ensuring that the relevant data is in an appropriate format and can be analyzed in a meaningful manner.

Personnel: The involvement of the "right" personnel to provide the "right" skills is another important consideration. One executive characterized these skill sets as the equivalent expertise of three Ph.D.s. One is a business expert who identifies the strategic issues facing the organization. Another is a technology expert who ensures that the relevant data buried within data storage systems can be made available. Last but not least is the analytical expert who applies the appropriate mathematical modeling techniques to the data.

Technology: Data mining is a technologically intensive undertaking. It shouldn't be a completely separate technological initiative, because this can excessively burden the I/S departments with additional support and maintenance loads. If it's integrated with an organization's current infrastructure, it becomes easier for data mining to become a routine part of the organization's business practices. This can be achieved by taking advantage of existing investments in technological infrastructure where appropriate and effective. For example, consider using the organization's current data storage technologies, and if the organization is moving towards a data warehouse, plan on evolving the data mining components to use the data warehouse as well.

Evaluative procedures: After data mining is underway, it's necessary to evaluate the project. The timing of the evaluation can be critically important. If the project is evaluated too soon, results may not yet be evident because of the inherent complexities associated with introducing a new technology. If it is evaluated too late, costs can be unnecessarily high should the project turn out not to be a success. Evaluative metrics need to be developed so that quantifiable measures are readily interpretable. The most common are financial measures related to ROI (return on investment). The beneficial results of data mining should also be presented in actionable ways -- i.e., clear business decisions should readily follow from discovered patterns.
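
As a rough illustration of a financial measure of this kind, the short Python sketch below computes ROI in its usual form, (benefit - cost) / cost, at several evaluation checkpoints. The function name, the checkpoint labels, and every dollar figure are hypothetical and chosen only for illustration; an actual engagement would define its own benefit and cost accounting.

```python
def roi(total_benefit: float, total_cost: float) -> float:
    """Return ROI as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


# Hypothetical cumulative figures (benefit, cost) at three checkpoints.
# These numbers are invented for illustration, not taken from any project.
checkpoints = [
    ("month 3", 20_000.0, 100_000.0),
    ("month 9", 90_000.0, 120_000.0),
    ("month 18", 210_000.0, 140_000.0),
]

for label, benefit, cost in checkpoints:
    # A negative ROI early on does not by itself mean the project failed;
    # it may only mean the evaluation came too soon.
    print(f"{label}: ROI = {roi(benefit, cost):.0%}")
```

With figures like these, an evaluation at the earliest checkpoint would show a strongly negative ROI even though the later checkpoint is positive, which is the timing concern described above.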

Beyond the initial considerations and evaluative approaches, the continued use of data mining within an organization usually raises issues of scaling up to more data. It can also involve further development, in the form of additional data collection applications and/or data analytic routines. The continued benefit should be more informed decision-making and continued competitive advantages.

---

Ed Colet is the Acting Director of Research at Virtual Gold Inc., responsible for developing analytical methods for data mining and for investigating human factors and usability issues of business intelligence systems. At present, he is in the final stage of completing a doctoral dissertation in the Cognition and Perception program at New York University's Department of Psychology. Ed has also worked for IBM Research at the T.J. Watson Research Center. At IBM, Ed was a member of the group that developed Advanced Scout, the data mining application for NBA teams. His research interests focus on statistical methods and human factors.

For more information, see http://www.virtualgold.com.

