FORMAL COMPUTATIONS AND CONSIDERATIONS IN CALCULATING A
RETURN ON INVESTMENT (ROI)
By Ed Colet
Technology is an expensive undertaking, and historically it has been difficult to show conclusively that investments in technology translate into productivity gains. Data mining is a recent technology that promises to provide competitive advantages and financial gains to organizations that invest in it. But a data mining implementation can be quite costly, so a justifiable concern is whether the investment can be definitively shown to be worthwhile. In this two-part series, we'll take a look at issues related to return on investment (ROI). Part I addresses the formal computation of ROI and argues that the relationships among the elements of this formulation can themselves be investigated and understood with data mining software. Part II will cover how measures of how interesting a pattern may be can be translated into financial terms that can be used directly for decision making.
In a previous column ("Learning curves and returns on investment", DSstar, February 9, 1999: Vol. 3, No. 6, http://www.tgc.com/dsstar/99/0209/990209.html), the point was made that the time interval over which an ROI value is computed is critically important. This is because the introduction of new technology has historically followed a learning curve in which there is an initial drop in productivity followed by a rise. The durations of this drop and rise are directly related to ease of use and complexity. As such, the ROI figure can vary widely depending on this curve, and one should be cautious when interpreting the ROI value.
In today's column, we'll look more closely at how ROI is computed. In economic terms, a "nominal ROI" is defined as the difference between the amount gained (after an agreed-upon interval such as 1 year) and the amount initially invested (in that interval), divided by this initial investment. This is formally written as the following equation: ROI = (gain - investment) / investment. One variation of this expression is an "adjusted ROI", in which, for example, the gain amount is adjusted for the opportunity cost of not having invested the same amount elsewhere. The adjustment is made by dividing the "gain" term by a figure (e.g. 1.05, assuming a 5% return had the investment been allocated elsewhere).
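The two formulas above can be written as a minimal Python sketch. The function names and the sample figures (a gain of 120,000 on an investment of 100,000) are assumptions chosen only for illustration. The 5% alternative return is the example rate mentioned above.

```python
def nominal_roi(gain, investment):
    """Nominal ROI = (gain - investment) / investment."""
    if investment <= 0:
        raise ValueError("investment must be positive for ROI to be defined")
    return (gain - investment) / investment


def adjusted_roi(gain, investment, alternative_return=0.05):
    """Adjusted ROI: divide the gain term by (1 + alternative_return)
    to account for opportunity cost, then apply the nominal formula."""
    discounted_gain = gain / (1.0 + alternative_return)
    return nominal_roi(discounted_gain, investment)


# Illustrative figures only (not taken from a real deployment).
print(f"nominal:  {nominal_roi(120_000, 100_000):.1%}")   # 20.0%
print(f"adjusted: {adjusted_roi(120_000, 100_000):.1%}")  # 14.3%
```

Note that the adjustment is applied only to the gain term, as described in the text. The investment in the denominator is left unchanged.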
In practical terms, the only element that one controls directly is the initial investment amount. To maximize ROI, one wants this amount to be low and the gain to be high. In any technology investment, such as a data mining software implementation, this amount will include the cost to either license or purchase the software. But associated with this initial investment are supplementary costs such as upgraded hardware, networking infrastructure, and the substantial costs of hiring and retaining the skilled analysts who are often necessary just to be able to use the software. These supplementary costs can be quite high in a full-scale deployment (see "How much can a business spend on data mining?", DSstar, July 20, 1999: Vol. 3, No. 29. http://www.tgc.com/dsstar/99/0720/990720.html). To maximize ROI, one has to decide which of these supplementary costs are necessary and how each additional dollar spent on costs other than the software (e.g. a faster processor, an extra analyst) will relate to the amount gained. After a certain point, of course, there will be diminishing returns. But modeling the exact nature of the relationship between supplementary costs and final gain can be complex. Virtual Gold has long understood this relationship, and thus our approach has been to develop and deploy data mining technology in ways that simplify and reduce these costs as much as possible via our pilot programs and our VirtualMiner Framework.
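The effect of diminishing returns on ROI can be illustrated with a small Python sketch. Everything numeric in it is a labeled assumption: the software cost, the square-root gain curve, and the range of supplementary spending are hypothetical values chosen only so that each extra dollar adds less gain than the last. A real relationship would have to be estimated from data and may look quite different.

```python
import math

SOFTWARE_COST = 50_000.0  # hypothetical license cost, for illustration only


def hypothetical_gain(supplementary):
    """Assumed concave gain curve: each extra supplementary dollar
    (hardware, analysts, ...) adds less gain than the previous one."""
    return 60_000.0 + 400.0 * math.sqrt(supplementary)


def roi(supplementary):
    """Nominal ROI when the investment is software plus supplementary costs."""
    investment = SOFTWARE_COST + supplementary
    return (hypothetical_gain(supplementary) - investment) / investment


def best_supplementary_spend(candidates):
    """Return the candidate spending level with the highest ROI."""
    return max(candidates, key=roi)


# Try supplementary budgets from 0 to 200,000 in steps of 5,000.
candidates = range(0, 200_001, 5_000)
best = best_supplementary_spend(candidates)
print(f"best supplementary spend: {best}, ROI: {roi(best):.1%}")
```

Under these assumed figures, ROI first rises with supplementary spending and then falls, so the best budget is neither zero nor the maximum. The simple grid search stands in for whatever modeling technique fits the real data.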
The left term of the numerator, the "final gain", is usually computed with explicit considerations specific to the domain in question. For example, the final gain for a sports data mining application may incorporate factors such as the amount gained per point scored, ticket sales, and concession stand sales. A telecommunications application, on the other hand, will consider aspects such as churn rates.
Even when both the initial investment costs and the final gain figures are readily computable, a maximal ROI may not always represent the best option. Consider the following scenarios, in which data mining technology is compared to the current situation of not using it. Option A is to invest $100,000 per year in data mining. The anticipated final gain is computed to be $150,000 per year. The nominal ROI is therefore a healthy 50%. Option B is the current situation: assume that its initial costs are negligible and that its final gains have historically been computed at $50,000 per year. Its nominal ROI is effectively infinite (if negligible costs are represented as zero, the formula divides by zero).
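The arithmetic for these two scenarios can be checked with a short Python sketch. The dollar figures are the ones given above. The function and variable names are illustrative, and returning infinity for a zero investment is a convention adopted here to mirror the text, not a standard definition.

```python
import math


def nominal_roi(gain, investment):
    """Nominal ROI = (gain - investment) / investment.
    A zero investment with a positive gain is reported as infinite,
    mirroring the 'negligible costs represented as zero' case."""
    if investment == 0:
        return math.inf if gain > 0 else math.nan
    return (gain - investment) / investment


# Annual figures from the two scenarios described above.
options = {
    "A": {"investment": 100_000, "gain": 150_000},  # invest in data mining
    "B": {"investment": 0, "gain": 50_000},         # current situation
}

# For each option, record (net gain, nominal ROI).
summary = {
    name: (o["gain"] - o["investment"], nominal_roi(o["gain"], o["investment"]))
    for name, o in options.items()
}

for name, (net_gain, roi) in summary.items():
    print(f"Option {name}: net gain = {net_gain}, nominal ROI = {roi}")
```

Both options produce the same net gain while their ROI values differ enormously, which is why ROI alone cannot settle the choice.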
If net gain (final gain minus investment, e.g. measured as net profit) is most important, then both options show a net gain of $50,000, and neither option seems to be better than the other. If increasing the number of units sold is important, then option A will allow more units to be sold, and one should invest in data mining software. If a higher ROI is most important, then option B, not investing in data mining software, appears to be the better option. Deciding which factor is most important is not trivial and could depend on a variety of circumstances. Deciding which of these factors, and which other, more intangible factors, are most important, and under what circumstances, is exactly the type of general question suitable for a data mining application. An example of an intangible factor to consider is the potential gain in market share made by a competitor through its use of data mining software.
As such, a financial data mining application optimized to evaluate ROI options could itself be a useful solution.
Next week's column will address ways to translate various data mining metrics into financial terms so that decision making is facilitated.
Ed Colet is the Acting Director of Research at Virtual Gold Inc., responsible for developing analytical methods for data mining and for investigating human factors and usability issues of business intelligence systems. At present, he is in the final stage of completing a doctoral dissertation in the Cognition and Perception program at New York University's Department of Psychology. Ed has also worked for IBM Research at the T.J. Watson Research Center. At IBM, Ed was a member of the group that developed Advanced Scout, the data mining application for NBA teams. His research interests focus on statistical methods and human factors.
For more information, see http://www.virtualgold.com.