
THE POSSIBILITY OF REAL-TIME DATA MINING
By Ed Colet


Those of us familiar with data mining know that it can automatically discover hidden patterns in large amounts of data, and that these patterns can lead to valuable information and new knowledge. For good reasons, data mining is typically performed as an offline analytical process rather than through real-time analysis of data streams. But as we'll see in this column, there are certain situations where real-time data mining is the better approach.

In the wide variety of domains that use data mining (retailing, marketing, banking, sports, etc.), there are prerequisite conditions that make a data mining solution a good fit: a lot of data is available, and valuable hidden patterns are known to exist. Consequently, each of these domains tends to use the same data mining approach - "collect, store and then analyze" data. At the risk of sounding trite, one might think that it's not possible to do this any other way, i.e., you can't analyze data until it has been collected and stored. In general, a lot happens in a data mining implementation after the data is collected and before the analysis actually occurs. First, online transactional systems collect and process incoming data. Second, this data is aggregated, and the aggregated data is migrated into other data storage systems. Thus, the incoming data stream is not the data that is analyzed. Rather, it is the aggregated data that is analyzed by data mining systems and tools. As a result, data mining is done as an offline process that runs at fixed, scheduled intervals - daily, weekly, or monthly.
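To make the "collect, store and then analyze" sequence concrete, here is a minimal sketch in Python. It is only an illustration under assumed names: the record layout, the `aggregate_daily` and `analyze` functions, and the sales figures are all hypothetical and do not come from any particular data mining product.

```python
from collections import defaultdict
from statistics import mean


def aggregate_daily(transactions):
    """Store step: collapse raw (day, store, amount) records into daily totals."""
    totals = defaultdict(float)
    for day, store, amount in transactions:
        totals[(day, store)] += amount
    return dict(totals)


def analyze(aggregated):
    """Analyze step: average daily sales per store, computed offline."""
    per_store = defaultdict(list)
    for (_day, store), total in aggregated.items():
        per_store[store].append(total)
    return {store: mean(values) for store, values in per_store.items()}


# Collect step: raw records as a transactional system might log them
# (made-up values for illustration).
raw = [
    ("1999-01-04", "A", 20.0),
    ("1999-01-04", "A", 30.0),
    ("1999-01-04", "B", 10.0),
    ("1999-01-05", "A", 70.0),
]

warehouse = aggregate_daily(raw)  # run on a schedule, e.g. nightly
report = analyze(warehouse)       # sees only the aggregates, never the stream
```

Note that `analyze` never touches the individual transactions. Whatever pattern lives in the raw stream but not in the daily totals is invisible to it, and nothing is learned until the next scheduled run.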

An offline approach to data mining reflects sound practice because the data has to be cleaned, checked for accuracy, etc. It may also be necessary for performance reasons. With such large amounts of data, it is necessary to store the data in systems that are optimized for rapid analysis - e.g., by loading OLAP cubes. Doing data mining offline (not on incoming data streams) is a good strategy as long as the following conditions exist: plentiful data, value in discovering hidden patterns, and sufficient time to perform and interpret the analysis.

There are situations in which plentiful data contains hidden patterns, but there isn't time to post-process the data prior to its analysis, and the analysis of real-time data streams can be critical - literally a matter of life and death. Data mining implementations are much less common in these situations, and the traditional approach of collect, store and analyze isn't the best one. As reported in the January/February 1999 issue of Technology Review, the University of Pennsylvania Medical Center is testing an interesting use of technology to analyze data and essentially perform data mining in real time.

The UP Medical Center uses an artificial intelligence system to collect and analyze a patient's vital signs. As we know, various data monitors surround a critical care patient. But unless someone (or something) is watching the monitor screens beside the patient, the collected data is not really being used effectively. In contrast, the system being tested at UP Medical Center essentially monitors the monitors that watch the patient. Several measures, including blood pressure, blood flow, respiratory rate, etc., are monitored simultaneously. The objective is for the system to discover related and predictive signals of dangerous trends - before they're manifested in obvious ways. For example, blood accumulating around the heart can be a serious problem, but one that is not usually noticed until the patient's blood pressure drops significantly. In addition, the system can learn the patient's ideal vital signs via neural network and fuzzy logic software, and alert medical staff to possibly impending problems when the readings deviate from them.
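The idea of learning a patient's normal readings and flagging deviations as they arrive can be sketched in a few lines of Python. This is not the UP Medical Center system: it swaps the neural network and fuzzy logic software for a much simpler running mean and standard deviation, and the class name, threshold, warm-up length, and readings are all assumptions made up for illustration.

```python
import math


class BaselineMonitor:
    """Learns a running baseline for one vital sign and flags large deviations.

    Hypothetical stand-in for the learned model described in the text; it
    keeps a running mean and variance (Welford's method) instead.
    """

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # alert beyond this many standard deviations
        self.warmup = warmup        # readings to observe before alerting
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, value):
        """Check one reading as it arrives; return True if it warrants an alert."""
        alert = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                alert = True
        if not alert:
            # Fold only normal readings into the baseline.
            self.n += 1
            delta = value - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (value - self.mean)
        return alert


# Made-up systolic blood pressure readings; the last one drops sharply.
readings = [120, 122, 119, 121, 120, 121, 95]
monitor = BaselineMonitor()
alerts = [monitor.update(r) for r in readings]
```

Each reading is evaluated the moment it arrives, with no storage or aggregation step in between, which is the essential difference from the offline approach. A real system would watch several measures together and look for predictive combinations, which a single-signal threshold like this cannot do.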

The significance of this program is that it makes better use, in real time, of data that is already being collected. Real-time analysis means that preventive action can be taken before a problem becomes obvious. And when the stakes are critical, such as life or death, real-time analysis is a better approach than collect, store, and analyze. As the system being tested at UP Medical Center illustrates, real-time data mining is entirely possible. And perhaps, in terms of medical practice, the effective use of available data can result in a clinician arriving at a doctor's "decision" rather than a doctor's "opinion".

---

For more information, see http://www.virtualgold.com.

