
GUIDED KNOWLEDGE DISCOVERY                                           10.07.97
by Frank McGuff, Principal, Telos Solutions, Inc.

Data mining has been shown to be an effective component in today's analytical processes. Corporations are using data mining to perform critical studies of corporate performance, including the following:

(1) Verify and quantify folklore.

Every corporation has long-standing beliefs about its business. Some of these may have reached the status of folklore, while others may simply be "gut-level" beliefs held by corporate management. The easiest and most powerful first use of data mining is to verify and quantify these long-standing beliefs.

(2) Detailed customer-product mix.

The simplest question of "who buys what, when and where?" may be the next question to study. The answer to this question can increase your ability to focus on the customer-product mix that is most profitable. This particular question is explored in more detail below.
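As a rough illustration of the "who buys what, when and where?" question, the sketch below totals profit over any combination of attributes. The records, field names and the total_profit_by helper are invented for this example; a real study would draw its rows from the data warehouse.

```python
from collections import defaultdict

# Made-up sales records; the field names are assumptions for illustration.
SALES = [
    {"segment": "retail", "product": "widgets", "quarter": "Q1", "region": "east", "profit": 120.0},
    {"segment": "retail", "product": "widgets", "quarter": "Q2", "region": "east", "profit": 150.0},
    {"segment": "retail", "product": "gadgets", "quarter": "Q1", "region": "west", "profit": 40.0},
    {"segment": "wholesale", "product": "widgets", "quarter": "Q1", "region": "west", "profit": 300.0},
    {"segment": "wholesale", "product": "gadgets", "quarter": "Q2", "region": "east", "profit": -25.0},
]


def total_profit_by(records, *fields):
    """Sum profit for every distinct combination of the given fields."""
    totals = defaultdict(float)
    for row in records:
        key = tuple(row[field] for field in fields)
        totals[key] += row["profit"]
    return dict(totals)


def rank(totals):
    """Order the combinations from most to least profitable."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# "Who buys what": profit per (customer segment, product) pair.
who_buys_what = rank(total_profit_by(SALES, "segment", "product"))

# "When and where": profit per (region, quarter) pair.
when_and_where = rank(total_profit_by(SALES, "region", "quarter"))

for key, profit in who_buys_what:
    print(key, profit)
```

With these invented rows, the wholesale-widgets pair ranks first and the wholesale-gadgets pair, which loses money, ranks last. The same helper answers the "when and where" half of the question simply by grouping on different fields.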

(3) Discovery of new and useful information.

Once you have exhausted what is commonly believed within your company, you can start to explore new and useful information. By this time, your analysts will be thoroughly familiar with the knowledge discovery process and will feel comfortable moving into new areas.

What is Knowledge Discovery in Databases?

"KDD Process is the process of using the database along with any required selection, preprocessing, subsampling and transformation of it, to apply data mining methods (algorithms) to enumerate patterns from it; and to evaluate the products of data mining to identify the subset of the enumerated patterns deemed 'knowledge'. "

Knowledge discovery in databases (KDD) places data mining as a single step within a much larger business process. The steps of that process include data extraction and transformation, the data mining itself, and the interpretation of the results. The most important step, however, is the development of actionable items. KDD should be performed by individual analysts, focusing on their own business requirements. Each analyst should be able to perform the following critical steps in knowledge discovery:

(1) Cost-effective exploratory studies.

Each analyst should be able to discover new information relations or explore old ones. They should be able to do this through cost-effective exploration and then, with the same tool, develop full-scale models. This is a process that must be performed by the individual analysts in order to satisfy the unique requirements of their business environment.

(2) Development of explanatory models.

Knowledge discovery can help explain the past by discovering the relationships among large numbers of attributes. Although analysts may be able to study one or two related attributes by hand, studying many more than that is practical only with data mining. Overly simplistic models are as misleading as overly complex models. The Guided Knowledge Discovery product will help the analyst start with a large model and reduce it, removing elements that do not have sufficient impact. Ultimately, this yields a model that satisfies the analyst's needs. This follows the precept that the model should be as simple as possible, but no simpler.
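One common way to start with a large model and strip out low-impact elements is backward elimination over a linear model. The sketch below is only an illustration of that general technique, not a description of how any particular product works. The data, the attribute names, the min_impact threshold and the choice of R-squared as the measure of "impact" are all assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def r_squared(rows, target, attrs):
    """R-squared of a least-squares fit of target on attrs plus an intercept."""
    X = [[1.0] + [float(row[a]) for a in attrs] for row in rows]
    y = [float(row[target]) for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - sum(b * xi for b, xi in zip(beta, x))) ** 2
                 for x, yi in zip(X, y))
    return 1.0 - ss_res / ss_tot


def backward_eliminate(rows, target, attrs, min_impact=0.01):
    """Repeatedly drop the attribute whose removal costs the least R-squared,
    stopping once every remaining attribute costs at least min_impact."""
    kept = list(attrs)
    while len(kept) > 1:
        full = r_squared(rows, target, kept)
        impacts = {a: full - r_squared(rows, target, [k for k in kept if k != a])
                   for a in kept}
        weakest = min(impacts, key=impacts.get)
        if impacts[weakest] >= min_impact:
            break
        kept.remove(weakest)
    return kept


# Made-up data: sales depend on price_cut and ad_spend, not on weather.
price_cut = [0, 1, 2, 3, 0, 1, 2, 3]
ad_spend = [0, 0, 1, 1, 2, 2, 3, 3]
weather = [3, 1, 4, 1, 5, 9, 2, 6]
ROWS = [{"price_cut": p, "ad_spend": a, "weather": w, "sales": 10 + 3 * p + 2 * a}
        for p, a, w in zip(price_cut, ad_spend, weather)]

kept = backward_eliminate(ROWS, "sales", ["price_cut", "ad_spend", "weather"])
print(kept)
```

On this toy data the procedure discards weather and keeps the two attributes that actually drive sales. A production tool would likely use a more careful significance test than a fixed R-squared threshold.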

(3) Development of predictive models.

Above all else, the results of the discovery process must be useful. This means being able to improve your business by taking new actions. The analyst develops and uses the predictive model to simulate how effective these actions will be before any costly implementations are undertaken. The analyst can check not only that the desired goals are likely to be achieved but also that no negative side effects will be introduced.

(4) Visualization.

It is often easier to detect interesting patterns through visualization than by any other means. The analyst should be able to extract data from the warehouse and not only visualize it but also use the visualization to guide the knowledge discovery process.

Based upon these requirements, any product that performs Guided Knowledge Discovery should meet three objectives:

(1) Embedded business process.

The knowledge discovery process is completely embedded within common business processes. It combines a variety of tools and techniques to present information in whatever format is most appropriate. These consist of data visualization and data mining, with the information residing in a dimensional data warehouse. The product also includes pre-defined studies that simplify the data mining process significantly.

(2) Interpreting the results.

The product helps the user interpret the results. Data mining produces a large number of results; some will be extremely useful while many will be irrelevant. Guided Knowledge Discovery helps you separate what is important from what is plentiful.

(3) Development of actionable items.

The user can easily evaluate the potential actionable items. Once the results have been selected, the product helps the analyst determine the best course of action. Guided Knowledge Discovery helps the analyst build up a set of 'scenarios', each of which can be simulated on the data warehouse. The results of the simulations will help the analyst implement the best course of action.
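The scenario idea can be sketched as a loop that applies a predictive model to the same data under each candidate action and compares the outcomes. Everything in the sketch is hypothetical: the warehouse rows, the discount scenarios, and especially the predicted_profit model and its ELASTICITY figure, which stand in for whatever model the discovery process actually produced.

```python
# Made-up warehouse extract: one row per product line.
WAREHOUSE = [
    {"product": "widgets", "price": 10.0, "unit_cost": 6.0, "base_units": 100},
    {"product": "gadgets", "price": 20.0, "unit_cost": 15.0, "base_units": 50},
]

# Assumed demand response: each 1% of discount lifts unit sales by 4%.
ELASTICITY = 4.0


def predicted_profit(row, discount):
    """Stand-in predictive model: profit for one product at a given discount."""
    units = row["base_units"] * (1.0 + ELASTICITY * discount)
    margin = row["price"] * (1.0 - discount) - row["unit_cost"]
    return units * margin


def simulate(scenarios, rows):
    """Return the total predicted profit for each named scenario."""
    return {name: sum(predicted_profit(row, discount) for row in rows)
            for name, discount in scenarios.items()}


# Each scenario is a candidate action, here expressed as a discount fraction.
SCENARIOS = {"no change": 0.0, "5% discount": 0.05, "20% discount": 0.20}

results = simulate(SCENARIOS, WAREHOUSE)
best = max(results, key=results.get)

for name, profit in results.items():
    print(f"{name}: {profit:.2f}")
print("best scenario:", best)
```

Under these assumed numbers the modest discount edges out doing nothing, while the deep discount erodes margin faster than it adds volume. The value of the exercise is that this comparison is made before any action is taken.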

Summary

Aaron Zornes (Executive Director of the META Group) has said that the second generation of data mining applications will be based upon interactive frameworks. These will provide the ability to:

  1. Analyze very large databases (scalable).
  2. Perform more iterations and perform them more easily.
  3. Discover more opportunities.
  4. Provide real-time, interactive analysis.
  5. Enable faster, better refined decisions.

These objectives should be met as follows:

1. Analyze very large databases.

In reality, we want a scalable process: one that performs well on small as well as large amounts of data. Much can be learned from relatively small amounts of highly aggregated data as well as from large amounts of highly detailed data.

The studies that analysts perform range from those small enough to run in a client-based tool to those large enough to require large parallel-processor servers. Guided Knowledge Discovery must support this range with the same user interface. The analyst should be able to use a simple iconic interface whether the request is processed on their workstation or on the server.

Small studies will extract the data from the warehouse, transform it, execute the mining process and let the user view the results, all on the client workstation. Large-scale, high-end studies can be performed on a parallel-processor server. There, the extraction and data mining process will automatically be separated into parallel processes to reduce processing time, and the results will be merged so that they can be studied as a single result.
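The split-then-merge pattern for large studies can be sketched as follows. This is a minimal illustration under stated assumptions: the basket data and the pair-counting "mining" step are invented, and Python threads are used only to keep the example short. A CPU-bound job on a real parallel server would more likely use separate processes or machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Made-up market baskets standing in for rows extracted from the warehouse.
BASKETS = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "eggs"],
    ["bread", "milk"],
]


def mine_partition(baskets):
    """Count how often each pair of products is bought together."""
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        for i, first in enumerate(items):
            for second in items[i + 1:]:
                counts[(first, second)] += 1
    return counts


def partition(rows, n_parts):
    """Split the rows into n_parts roughly equal slices."""
    return [rows[i::n_parts] for i in range(n_parts)]


def mine_in_parallel(rows, n_parts=3):
    """Mine each slice independently, then merge the partial results."""
    merged = Counter()
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        for partial in pool.map(mine_partition, partition(rows, n_parts)):
            merged.update(partial)  # Counter.update adds counts together
    return merged


pair_counts = mine_in_parallel(BASKETS)
for pair, count in pair_counts.most_common():
    print(pair, count)
```

Because pair counts from separate slices simply add up, the merged result is identical to what a single pass over all the data would give. That is the property that lets the analyst study the output as a single result.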

2. Perform more iterations and perform them more easily.

The ability to create new studies "by example" means that the analyst never really has to start from scratch. This will dramatically reduce the amount of time that it takes the analyst to go through a complete discovery process.

New studies can be created from old studies as well as from their results.

3. Discover more opportunities.

Since new studies can be developed easily and quickly, the analyst will have more time to explore and discover explanatory and predictive models. Ultimately, this means that analysts will be not only more effective at their jobs but also much more efficient.

4. Provide real-time, interactive analysis.

Discovery is an iterative process. In order for it to be most effective, it must be interactive and as near real-time as possible. Guided Knowledge Discovery allows the analyst to explore small sets of data so that the analysis and visualization can be performed very quickly. This significantly reduces the time and expense required to complete the knowledge discovery process.

5. Enable faster, better refined decisions.

The only response time that matters is how long it takes to respond to different business situations. Some of these are opportunities and some are problems to be solved. In either case, the complexity of today's business environment requires more than simple data extraction and report writing tools.

Conclusion

Guided Knowledge Discovery elevates data mining from a stand-alone operation and embeds it within a complete decision support environment. If you have found data mining overwhelming, look for products that support the complete process of Knowledge Discovery in Databases.

---

For more information, see http://members.aol.com/fmcguff/index.htm

