GOLDEN MEANS: DISCOVERING KNOWLEDGE FROM DATA -- PART II
by Inderpal Bhandari,
executive editor at large
How does one discover knowledge from data? Let me continue to count the ways.
Last week, I discussed two scenarios for discovery. In the first, the trick was to identify the relevant data. In the second, the trick was to collect and process data in a way that did not affect the outcome of the decision. Let the count go on.
As is well known, the key to success often lies in merely showing up. Well, much of the time, the key to discovery lies in simplifying access to data. Consider a situation where an organization has its mission-critical data in an ultra-reliable mainframe environment. This is not exactly an arrangement where one can play around with data; without the help of a mainframe programming expert, doing so is a tall order.
An analogous situation is when such data resides in the proprietary data sets of statistical packages. There you require the help of a statistical expert to access the data. In general, these situations have an intermediary step involving a programming expert before the data show up, i.e., can be used by a business specialist to gain some insight about the business. Of course, not having access to relevant data is equivalent to having no knowledge of that data. And, as we saw last week, adding relevant data leads to discovery. Consequently, discovery often occurs once a business specialist makes use of technology that simplifies access to business data.
GUI front ends to SQL databases as well as to legacy or proprietary systems, On-line Analytical Processing (OLAP) tools, and perhaps even Query Management Systems are examples of such technology. They help the business specialist play around with the data, ask questions about it, explore it, and so forth. That process leads to discovery.
Lastly, consider the scenario where access has been simplified and a business specialist can merrily ask questions to his heart's content. While the business specialist will discover knowledge from data, some patterns will remain hidden. Fundamentally, this occurs because of the combinatorial explosion of questions that can be asked about multidimensional data. There are simply too many possible combinations to think through to be sure that all relevant questions have been asked and answered.
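To get a rough feel for that explosion, here is a minimal sketch in Python. It assumes a hypothetical data set in which each dimension takes a fixed number of distinct values, and it counts a "question" as any slice that either pins a dimension to one value or leaves it unconstrained. Both the counting rule and the numbers are illustrative assumptions, not figures from any real system.

```python
from math import prod


def count_slices(values_per_dimension):
    """Count the distinct slices of a multidimensional data set.

    Assumption: a slice either pins a dimension to one of its values or
    leaves it unconstrained, so a dimension with k values contributes
    (k + 1) choices. The fully unconstrained slice is included.
    """
    return prod(k + 1 for k in values_per_dimension)


# Hypothetical example: every dimension has 10 distinct values.
for n_dims in (2, 5, 10):
    print(n_dims, "dimensions:", count_slices([10] * n_dims), "possible slices")
```

Even under this simple counting rule, the total grows multiplicatively with every dimension added, which is why nobody can expect to work through all the questions by hand.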
Data mining technology addresses this situation. These programs automatically identify interesting patterns in data. One could say that they guide the user to the questions that the user may not have thought to ask of the data.
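As a toy illustration of the idea, the sketch below scans every combination of dimensions and flags the slices whose average differs sharply from the overall average. The records, the field names, the function find_interesting_slices, and the definition of "interesting" as a relative deviation from the overall mean are all assumptions made for this example. Real data mining products use far more sophisticated measures, and this is not a description of any of them.

```python
from itertools import combinations
from statistics import mean

# Hypothetical sales records, invented for illustration only.
RECORDS = [
    {"region": "East", "product": "A", "revenue": 100},
    {"region": "East", "product": "A", "revenue": 100},
    {"region": "East", "product": "B", "revenue": 100},
    {"region": "East", "product": "B", "revenue": 100},
    {"region": "West", "product": "A", "revenue": 100},
    {"region": "West", "product": "A", "revenue": 100},
    {"region": "West", "product": "B", "revenue": 20},
    {"region": "West", "product": "B", "revenue": 20},
]


def find_interesting_slices(records, dimensions, measure,
                            threshold=0.5, min_rows=2):
    """Flag slices whose mean deviates from the overall mean.

    Assumption: "interesting" means the relative deviation is at least
    `threshold`. The overall mean is assumed to be nonzero.
    """
    overall = mean(r[measure] for r in records)
    findings = []
    # Try every non-empty combination of dimensions.
    for size in range(1, len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            # Group the measure by the values of the chosen dimensions.
            groups = {}
            for r in records:
                key = tuple(r[d] for d in dims)
                groups.setdefault(key, []).append(r[measure])
            for key, values in groups.items():
                if len(values) < min_rows:
                    continue  # too few rows to say anything
                deviation = (mean(values) - overall) / overall
                if abs(deviation) >= threshold:
                    findings.append((dict(zip(dims, key)), round(deviation, 2)))
    # Largest deviations first.
    return sorted(findings, key=lambda f: -abs(f[1]))


for slice_, deviation in find_interesting_slices(
        RECORDS, ("region", "product"), "revenue"):
    print(slice_, deviation)
```

In this made-up data, neither the West region nor product B looks alarming on its own, yet the combination of the two does. That is the kind of question a user looking at one dimension at a time might never think to ask.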
The identification of hidden patterns leads to discovery of knowledge from data. The effect is more dramatic than in the other discovery scenarios that I have covered, since discovery is twofold: The business specialist first discovers that there are hidden patterns, and then uses those patterns to gain insights about the business.
However, as we have seen above, data mining technologies are by no means the only way to discover knowledge from data. That may be one of the points of confusion about what is and what is not data mining. Manufacturers of data mining technologies have perhaps contributed to that confusion by equating data mining with knowledge discovery.
The contrast between data mining and the other technologies is perhaps best understood in the light of the following anecdote. George Bernard Shaw once sent a telegram to Winston Churchill: AM RESERVING TWO TICKETS FOR YOU FOR MY PREMIERE. COME AND BRING A FRIEND -- IF YOU HAVE ONE.
Churchill replied: CANNOT COME TO FIRST SHOW BUT WILL ATTEND SECOND PERFORMANCE -- IF THERE IS ONE.
Unlike all the other technologies that we covered in this article, data mining must have an element of one-upmanship. It must go beyond what the user is capable of discovering on his own by asking questions of data.
---
Inderpal Bhandari can be reached via
http://www.virtualgold.com