DATA MINING -- IT'S NOT JUST FOR STATISTICIANS ANYMORE               10.14.97
by David Danziger, Trajecta, Inc.                                      D S *

In the last few years, the concept of data mining has moved from the hallowed halls of academia and various research consortia into mainstream business practice. However, until very recently, data mining has lived in one of two places within firms: the IT department or the analytical department. Historically, this arrangement made perfect sense. IT staffers were the only ones with clear access to the data being collected for analysis, and statistical analysts were the only ones with the know-how to use the complex statistical packages required for any sophisticated manipulation and interpretation of the data. Yet that paradigm no longer holds. A confluence of four key factors has made data and data-based decision making more accessible throughout organizations, from the lowest levels to top executives. An investigation into each of these four factors provides insights for strategic decision makers in companies that hope to make better use of their data, and thus enhance their bottom lines.

Factor number one is the incredible increase in computing power. Many engineers are, no doubt, familiar with Moore's Law. In its popular form, this concept holds that processing power will roughly double every 18 months. This rate of increase has had a tremendous impact on the type of processing that can be managed by a desktop PC as opposed to a supercomputer or even a high-level workstation. For decision makers, this distinction is a critical one. It means that data-driven decision making need not involve exorbitant costs for highly specialized, single-purpose hardware. Rather, the same machines that run word processing, spreadsheet, and other common applications can also be used for sophisticated data mining operations. This shift has been helped along by the steep drop in RAM prices over the past year or so. Many of the high-end data mining tools rely on large amounts of RAM, and thus the RAM price drop has made data mining that much more affordable on the hardware side.
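
To put the 18-month doubling figure in concrete terms, here is a minimal Python sketch of the arithmetic. The function name and the choice of time horizons are illustrative assumptions, and the 18-month period is the rule of thumb quoted above, not a guarantee.

```python
def relative_power(months, doubling_period_months=18.0):
    """Processing power relative to today after `months` months,
    assuming (hypothetically) a steady doubling every 18 months."""
    return 2.0 ** (months / doubling_period_months)


# Print the implied growth factor over a few planning horizons.
for years in (1.5, 3, 6, 9):
    factor = relative_power(years * 12)
    print(f"After {years} years: about {factor:.0f}x today's power")
```

Under that assumption, a desktop PC bought nine years from now would have roughly 64 times the processing power of one bought today, which is why workloads once reserved for workstations keep migrating to ordinary office machines.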

Key Trend #1: Strategic decision makers can expect computational speed to continue to increase for the foreseeable future. Whether or not it will actually double every 18 months, and whether or not RAM prices will continue their free-fall, cannot be stated with certainty. What can be stated is that both raw processing power and large amounts of RAM will continue to become more affordable over time, thus making data mining applications more accessible to those who can benefit most from them.

Factor number two is the accumulation of large amounts of data over the course of the last several years. The sheer volume of data that can be tapped or acquired by companies has increased to such a degree that even small and mid-size companies can utilize data for better decision making. Many large companies invested sizeable sums in recording demographic and transactional data on their customers. These forward-thinking companies are beginning to see their long-term investments pay dividends, as they can now target their marketing and promotional efforts with unparalleled precision. In addition to the large proprietary data repositories held by big companies, the growth of companies that specialize in data warehousing has also had a tremendous effect. These companies, with their multi-terabyte data warehouses, provide a valuable outsourcing option for small, mid-size, and even large companies that do not wish to, or cannot afford to, make large long-term investments in developing an in-house data warehouse.

Key Trend #2: Data warehousing and data mining will continue to have a symbiotic effect on one another. As data mining gets hotter, data warehousing will continue to grow. Similarly, as data warehousing continues to grow and becomes more efficient, data mining will become even more precise as a result of more and better data. Thus, as data warehouses continue to grow, watch for the price of large amounts of data to drop accordingly.

Factor number three is the advancement of methodologies such as neural networks which insulate the end-user from the statistical guts of a program. This lets decision makers of all stripes realize the benefits of data modeling without requiring a detailed knowledge of statistical concepts. Admittedly, some people love to get in and tinker with mathematical equations, algorithms, and coefficients, and software packages will always be available for these folks. However, this group will represent an ever-smaller percentage of those who are using data mining for business decision making. Most people will want the power and benefits that accompany the complex data mining algorithms without the side effects of waking up in the middle of the night because they've just had a nightmare about their Calculus final exam. Can you remember the difference between a local and an absolute minimum? It's an irrelevant question with the arrival of next-generation data mining programs.

Key Trend #3: Data mining programs will become easier and easier to use, in the sense that a detailed knowledge of statistics will be less and less necessary. What will be necessary is detailed domain knowledge. Understanding the business problem at hand and the nature of the data being modeled will be critical for the modeler to derive usable information from the models. In the next generation of tools, look for statistical measures such as "r-squared" and "k-statistic" to be replaced on-screen by more straightforward, operational terms like "ROI," "point of maximum profit," or "point of lowest cost."
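
As a rough illustration of what an operational measure like "point of maximum profit" might mean behind the scenes, consider the Python sketch below. It is not taken from any particular product: the function name, the response scores, and the revenue and cost figures are all made-up assumptions. The idea is simply to rank customers by a model's predicted response rate and find how many to contact before the next contact costs more than it is expected to earn.

```python
def point_of_maximum_profit(scores, revenue_per_response, cost_per_contact):
    """Return (customers_to_contact, expected_profit) at the best cutoff.

    scores: predicted response probabilities, one per customer
            (hypothetical output of some data mining model).
    """
    ranked = sorted(scores, reverse=True)  # most promising customers first
    best_n, best_profit, running = 0, 0.0, 0.0
    for n, p in enumerate(ranked, start=1):
        # Expected gain from contacting the n-th best customer.
        running += p * revenue_per_response - cost_per_contact
        if running > best_profit:
            best_n, best_profit = n, running
    return best_n, best_profit


# Illustrative numbers only: six customers, $50 per response, $5 per contact.
scores = [0.30, 0.05, 0.12, 0.02, 0.20, 0.08]
n, profit = point_of_maximum_profit(scores, 50.0, 5.0)
print(f"Contact the top {n} customers for an expected profit of ${profit:.2f}")
```

With these invented figures the cutoff falls at the top three customers. A decision maker would see only that number and the dollar figure, while the statistics that produced the scores stay out of view.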

Factor number four is the highly visual nature of the current generation of data modeling software packages on the market. The upper echelon of today's data mining tools has advanced graphical user interfaces (GUIs) that give non-expert users the comfort of a point-and-click environment as well as visually detailed representations of the data being modeled. For data mining to continue to penetrate the decision making levels of organizations, these visualization features will be critical. Whether in the data modeling or the decision making process, it is much easier to look at a detailed graph, chart, or picture than at a series of numbers.

Key Trend #4: Decision makers are going to want to see results in a graphical way rather than by pointing at a cell in a spreadsheet. Data mining and data visualization will thus become invaluable counterparts to one another over the next one to two years. Data mining software companies that are strong in their modeling algorithms but cannot produce highly visual interfaces will form partnerships with data visualization companies, or they will not survive. Thus, decision makers looking to invest in data mining software should make sure that the product they are purchasing has an intuitive, visual appeal, so that people at all levels of the company are comfortable using it and can understand the derived business knowledge.

As these trends for the future continue to materialize, it will become more and more clear that data mining is NOT just for statisticians anymore.

---

For more information, see http://www.trajecta.com/
