LEARNING CURVES AND RETURNS ON INVESTMENT
by Ed Colet, Virtual Gold, Inc.
Data mining is an emerging technology that has been applied to a wide variety of domains. An important issue is understanding the effect that introducing data mining technology can have on an organization. As providers and users of new technology, we all hope that benefits ultimately accrue and that we reap a healthy return on investment (ROI). In general, the benefits of a new technology over time most likely follow a characteristic curve. Understanding the shape of this curve has implications for how and when to evaluate ROI.
The February 1999 issue of the Communications of the ACM has an interesting article by Robert L. Glass entitled "The realities of software technology payoffs". The article reviews research studies that evaluated the benefits of several software technology approaches that effusively promised dramatic improvements. Improvements were considered in terms of both software productivity and software quality. The technology approaches included Structured Techniques, 4GL, CASE, Formal Methods, Cleanroom, Process Models, and Object-Orientation. Although some individual studies showed marked improvements (400%-500% in one study of CASE tools), the results taken together are mixed. Glass concludes that the productivity and quality benefits of these approaches can realistically be expected to range widely, from 5% to 70%, but that in general there are "too few studies, and the benefits don't match the claims."
It is often hard to see a clear trend when drawing general conclusions across several different approaches, and so it is hard to make general claims about ROI. One thing that was apparent in the article, however, and was brought out by a study of CASE tools, is that there is a characteristic curve associated with the introduction of new technology. Plotted as benefits (y-axis) over time (x-axis), the curve first slopes downward, showing an initial loss of productivity, then slopes upward, showing slow improvement, and reaches its peak after some period of time. This has been called the "learning curve", and its shape resembles a check mark. The time and benefit scales will differ by domain and from case to case, but the overall trend is thought to be the same.
In the context of data mining, one reason that the curve has this shape is the difficulty of balancing the apparently contradictory elements of complexity and ease of use. Implementing data mining successfully is a complex operation, due in large part to scalability issues: large data sets and the integration of numerous and possibly diverse sources of data have to be addressed. Ease of use is equally important: the decision maker or analyst may not be a technical end-user, so an easy-to-use interface has to sit on top of the sophisticated mathematical algorithms. Together, these two elements affect the time needed to learn and use the data mining solution.
Knowing the nature of this curve should help in determining the appropriate moment to evaluate ROI. If either complexity or ease of use is problematic, the downward phase is prolonged. An ROI evaluation conducted during this phase will not be accurate, because it captures the initial loss of productivity but not the benefits that follow. The problem, of course, is that organizations seek to evaluate ROI early in order to decide on further investment in the technology. The more appropriate time to evaluate ROI is during the upward phase, and preferably once the solution is well integrated into the organization (at the peak of the curve), when the benefits can be accurately assessed.
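To make the timing argument concrete, here is a minimal sketch of a check-mark curve and of what an ROI evaluation would see at different points along it. Everything in it is an assumption made for illustration: the piecewise-linear shape, the function names, and the numbers (a dip of -20 benefit units at month 3, a peak of +40 at month 12) are hypothetical and do not come from Glass's article or from any measured project.

```python
# Illustrative only: a piecewise-linear "check-mark" learning curve.
# All parameter values are hypothetical, chosen to show the shape.

def benefit(month, dip_month=3, dip=-20.0, peak_month=12, peak=40.0):
    """Benefit in a given month, relative to the pre-adoption baseline (0)."""
    if month <= 0:
        return 0.0
    if month <= dip_month:
        # Downward phase: productivity falls while the tool is being learned.
        return dip * month / dip_month
    if month < peak_month:
        # Upward phase: slow improvement from the dip toward the peak.
        frac = (month - dip_month) / (peak_month - dip_month)
        return dip + frac * (peak - dip)
    # Plateau: the solution is well integrated into the organization.
    return peak


def cumulative_benefit(months, **curve):
    """Total benefit accrued from month 1 through `months`, inclusive."""
    return sum(benefit(m, **curve) for m in range(1, months + 1))


if __name__ == "__main__":
    for evaluated_at in (3, 6, 9, 12, 18):
        total = cumulative_benefit(evaluated_at)
        print(f"ROI evaluated at month {evaluated_at:2d}: "
              f"cumulative benefit {total:8.1f}")
```

Under these assumed numbers, the cumulative benefit is still negative well after the monthly benefit has started to climb, which is why an evaluation made during or just after the downward phase understates the eventual return.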
Another issue is that in today's climate of rapid technological change and accelerated development, what is current today is obsolete tomorrow. As a result, new technology has to show benefits sooner and sooner (i.e., the curve should begin to turn upward earlier in time). At least two approaches, both part of Virtual Gold's philosophy, are possible. The first is to work closely with customers to get a sense of their complexity and ease of use issues, which translates into knowing how to integrate a solution into their environment effectively. The second is our VirtualMiner(tm) framework, which provides tools to help a software developer build or maintain a data mining application, so that an internal developer can rapidly build an integrated solution. With these approaches, successful solutions can be developed and accurate returns on investment can be determined.
---
For more information, see http://www.virtualgold.com.