DISCOVERING KNOWLEDGE FROM DATA -- PART I
by Inderpal Bhandari, executive editor at large
How does one discover knowledge from data? Let me count the ways. I am not sure if there are as many ways to discover knowledge from data as there are to skin the proverbial cat, but there are indeed a number of situations where such discovery is possible. Let us start at one extreme and work our way across the spectrum.
To begin with, consider a situation where a decision is about to be made and the decision maker stumbles across data that are relevant to the decision. For example, consider a retail chain whose executives have decided that it needs more outlets to gain market share. They must now decide exactly where to locate those outlets.
Driven mainly by the goal of establishing a strong competitive presence, their initial strategy is to locate the outlets in the zip codes where a competitor already has an outlet. Then, they fortuitously come across data on the demographics of different regions and on the patterns of consumption for related products.
It is clear from even the most cursory examination that the location pattern of the competitor outlets is far from optimal. Several regions that appear to be prime targets for the product have no outlets, while several other regions have outlets that clearly should have been closed. The executives decide to target the former regions, forgoing their initial strategy of co-locating with the competitors.
In the above example, the key to discovery was the knowledge that certain data existed (or could be collected) and that those data were relevant to making the decision. At the outset, the impediment to discovery was one of two things: either the executives lacked the knowledge that a certain kind of data was relevant, or they failed to apply that knowledge, having been steered away from it by their initial strategic focus. In either case, the way to address such impediments is to seek a second opinion on important decisions and pray for good luck.
Next, let us consider a situation where there is no such impediment. The decision maker is aware of the relevant data but does not wish to review it. For example, many people do not want to read the plots or critiques of movies, plays, and other such productions. They feel it will lessen the experience of watching them. Consequently, such data cannot be used to decide which movie to see.
In general, such situations arise when the measurement of the data can affect the outcome of the decision. Physicists will recognize the analogy to the famous Uncertainty Principle in quantum physics, which, loosely stated, says that one can never know both the exact position and the exact momentum of a particle at the same time; in the popular account, measuring one disturbs the other.
A technology called collaborative filtering can help the hapless moviegoer and people similarly affected. Collaborative filtering uses a database of ratings by people who have already seen the movie. Characteristics of those people are also recorded in the database. Thus, it is straightforward to estimate the probability that a person with certain characteristics will like a certain movie. Better still, one can rank the movies for that person in order of those probabilities, which makes it possible to suggest or recommend a movie (or book or CD or other such product). Clearly, this is an instance of discovering knowledge from data, since users can be alerted to products that they are likely to enjoy but are not aware of.
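The ranking step described above can be illustrated with a small sketch. It is not how any particular collaborative filtering product works; the ratings, the single "age group" characteristic, the movie titles, and the function names are all hypothetical, and the smoothing of the estimates is an added assumption so that a movie with one favorable rating is not treated as a sure thing.

```python
from collections import defaultdict

# Hypothetical ratings database: (viewer characteristic, movie, liked it?)
RATINGS = [
    ("under_30", "Movie A", True),
    ("under_30", "Movie A", True),
    ("under_30", "Movie A", False),
    ("under_30", "Movie B", False),
    ("under_30", "Movie B", False),
    ("under_30", "Movie C", True),
    ("over_30", "Movie A", False),
    ("over_30", "Movie B", True),
    ("over_30", "Movie B", True),
]


def like_probabilities(ratings, group):
    """Estimate, per movie, the chance that a viewer in `group` likes it."""
    counts = defaultdict(lambda: [0, 0])  # movie -> [likes, total ratings]
    for rated_group, movie, liked in ratings:
        if rated_group == group:
            counts[movie][0] += int(liked)
            counts[movie][1] += 1
    # Add-one (Laplace) smoothing: an assumption of this sketch, used so
    # that small samples do not yield estimates of exactly 0 or 1.
    return {
        movie: (likes + 1) / (total + 2)
        for movie, (likes, total) in counts.items()
    }


def recommend(ratings, group, already_seen=()):
    """Rank unseen movies for `group`, most likely to be liked first."""
    probabilities = like_probabilities(ratings, group)
    unseen = [(m, p) for m, p in probabilities.items() if m not in already_seen]
    return sorted(unseen, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for movie, probability in recommend(RATINGS, "under_30"):
        print(f"{movie}: {probability:.2f}")
```

Note that the moviegoer never reads a plot summary or a critique: the recommendation draws only on other people's ratings and on the characteristics they share with the moviegoer.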
Next week, we will continue the count of ways in which knowledge can be discovered from data. Technologies and attendant situations that remain to be covered include legacy systems, graphical query interfaces and query management systems, and data mining. Perhaps the significance of such a compilation is best appreciated by considering the advice Oscar Wilde gave to young playwrights: "The first rule for a young playwright to follow is not to write like Henry Arthur Jones. The second and third rules are the same." Unlike the rules stipulated by Oscar Wilde, the rules for decision making are quite diverse. Our compilation emphasizes the different rules that apply to different decision-making scenarios.
---
Inderpal Bhandari can be reached via http://www.virtualgold.com