
HOW MUCH CAN A BUSINESS SPEND ON DATA MINING?
by Ed Colet


How much can an organization afford to spend on data mining? The answer depends on many factors (the particular business domain, the existing technical infrastructure, etc.), so a general answer expressed as a dollar amount or a percentage of a budget will likely be too variable to be useful. But three points are clear: (1) Getting started with data mining can itself answer the question for a particular organization. (2) A data mining engagement has clear service components that need to be considered (i.e., budgeted for), and these can drive up costs. (3) A full-scale deployment with much lower costs that can provide a sustained return on investment is readily available.

A highly recommended way of getting started with data mining is through a pilot project. A pilot project is typically a short-term engagement of 1 to 3 months between a vendor and an organization (or entirely within the organization) in which certain business issues are defined, data are accessed and mined, and results are presented with a prototype implementation of an application or solution. For a particular organization, a pilot can show whether the organization can gain any advantage through data mining technology, and whether it is in a position to deploy and sustain a full-scale data mining operation throughout the organization.

Pilot projects are readily affordable, especially in light of current Information Technology (IT) spending. A pilot project can cost less than $10,000. The Meta Group's "1999 Worldwide IT Trends and Benchmark Report", which looked at 1998 IT spending, found that in large corporations IT spending ranged from $8,000 to $12,000 per employee. New technology development and hardware and software maintenance took the largest shares at approximately $2,000 each. Interestingly enough, only $1,075 was spent on data-center support and just $215 on data warehousing. The report also found that a third of the costs are associated with web-related business technologies, including networking, hardware, software, Internet, and intranet costs (note: some of these web-related costs may actually pertain to data-related processing activity). The point is that a data mining pilot project will cost the organization roughly what it spends on IT support for a single employee.
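The comparison above can be checked with a few lines of arithmetic. The following is only a rough sketch in Python: the dollar figures are the ones quoted in this article, while the function and variable names are made up for illustration, and the $10,000 pilot cost is treated as an upper bound.

```python
# Figures quoted in the article (1998 IT spending per employee, large corporations).
PILOT_COST = 10_000                      # upper bound quoted for a pilot project, in dollars
IT_SPEND_PER_EMPLOYEE = (8_000, 12_000)  # low and high ends of the reported range


def pilot_in_employee_equivalents(pilot_cost, per_employee_spend):
    """Return how many employees' worth of IT spending the pilot cost equals."""
    return pilot_cost / per_employee_spend


low, high = IT_SPEND_PER_EMPLOYEE
most = pilot_in_employee_equivalents(PILOT_COST, low)    # pilot vs. the low end of the range
least = pilot_in_employee_equivalents(PILOT_COST, high)  # pilot vs. the high end of the range

print(f"A ${PILOT_COST:,} pilot equals {least:.2f} to {most:.2f} "
      "employees' worth of IT spending")
```

Under these assumptions the pilot comes to roughly one employee's worth of IT spending at either end of the range, which is the point the paragraph makes.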

Data mining is viewed as an expensive undertaking because of what happens after the pilot project. Components of a full-scale deployment typically include software licensing fees, software development to integrate with existing infrastructure and applications, end-user training and support (made necessary by the complexity of most data mining tools), upgraded hardware and software where needed, developer training, maintenance, upgrades, and possible outsourcing arrangements. Because many of these items are typical of any software development project, an organization will generally already know how to budget and allocate for them.

With data mining, the unique product technology provided by the vendor will typically be priced in the high tens of thousands of dollars (either as a purchased piece of software or as a software license for one or more users). Although this is generally affordable, costs in a typical implementation will rise rapidly because of the overhead that necessarily accompanies the product.

Chief among these overhead items is typically the "user community". This community usually consists of a group of quantitative analysts (to analyze the data), a group of IT staff (to ensure that large amounts of data are accessible from databases with acceptable performance), and business experts (who know which business issues to pursue). Also associated with this community are developers who customize code and integrate applications so that all parties can fulfill their roles. It is the expense of attracting and retaining these highly skilled personnel that can quickly drive costs upward. The cost of sustaining such a typical full-scale data mining effort can easily surpass a million dollars per year. In fact, a certain financial institution has allocated $40 million to create and sustain its data mining effort.

But a full-scale deployment of a data mining solution can be achieved for much less. We at Virtual Gold, Inc., have long recognized that typical full-scale implementations and deployments are rather costly, not to mention highly complex. This is further supported by a survey of 1999 IT budgets reported in the July 1999 issue of Datamation. Of the 262 respondents, 52% have IT budgets of under $3 million, 26% have budgets of $3-$10 million, 12% have budgets of $11-$50 million, and only 10% have budgets of over $50 million.

Given the above costs and these budgets, a typical data mining engagement may be out of reach for more than half of the respondents. A lower-cost, readily sustainable, full solution is available through our patent-pending VirtualMiner Framework(TM) technology. These lower costs not only enable more organizations to reap the benefits of data mining technology, but also translate into higher returns on investment.
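To make the "out of reach" reasoning concrete, here is a small Python sketch using only the figures quoted in this article. The variable names are illustrative, and the million-dollar yearly cost is taken as a floor for a typical full-scale effort, which is an assumption based on the "can easily surpass" wording above.

```python
# Survey shares quoted from the July 1999 Datamation report (262 respondents).
BUDGET_TIERS = [
    ("under $3 million", 52),
    ("$3-$10 million", 26),
    ("$11-$50 million", 12),
    ("over $50 million", 10),
]

YEARLY_EFFORT_COST = 1_000_000     # assumed floor: "can easily surpass a million dollars"
SMALLEST_TIER_CEILING = 3_000_000  # upper limit of the lowest budget tier

# Sanity check: the quoted shares should account for all respondents.
total_share = sum(share for _, share in BUDGET_TIERS)

# For any budget below the ceiling, the effort takes MORE than this fraction.
min_fraction = YEARLY_EFFORT_COST / SMALLEST_TIER_CEILING

lowest_tier_name, lowest_tier_share = BUDGET_TIERS[0]
print(f"Shares sum to {total_share}%")
print(f"{lowest_tier_share}% of respondents ({lowest_tier_name}) would spend "
      f"more than {min_fraction:.0%} of their IT budget on such an effort")
```

In other words, for the 52% of respondents in the lowest tier, a million-dollar yearly effort would consume more than a third of the entire IT budget.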


Ed Colet is the Acting Director of Research at Virtual Gold Inc., responsible for developing analytical methods for data mining and for investigating human factors and usability issues of business intelligence systems. At present, he is in the final stage of completing a doctoral dissertation in the Cognition and Perception program at New York University's Department of Psychology. Ed has also worked for IBM Research at the T.J. Watson Research Center. At IBM, Ed was a member of the group that developed Advanced Scout, the data mining application for NBA teams. His research interests focus on statistical methods and human factors.

For more information, see http://www.virtualgold.com.

