DATA WAREHOUSE VS PATTERN WAREHOUSE: PART II
by Kamran Parsaye, Information Discovery, Inc.
The patterns a data mining system discovers are stored in the PatternWarehouse. Just as a data warehouse stores data, the PatternWarehouse stores patterns -- it is an information repository that stores relationships between data items, but not the data itself. While data items are stored in the data warehouse, we use the PatternWarehouse to store the patterns and relationships among them.
A PatternWarehouse is not a knowledge-base. A knowledge-base includes information that is usually known to humans, is often hand-coded and is somewhat static -- changing it requires care and effort. A PatternWarehouse holds far more dynamic information (which is automatically re-generated once a month with new data), is often surprising to users, and detects trends and patterns of change as they happen.
A data warehouse is used to store data items, while the PatternWarehouse stores relationships between the items. The PatternWarehouse contains much more condensed information than the database. While a database may contain 1 million or 100 million records, a pattern-set uses just a few bytes per pattern. Although larger databases include more and more patterns, the condensation ratio is immense -- terabytes of data include megabytes of patterns. Let us also note that different data sets include different pattern types, and these pattern types may be related. For instance, a class of customers that is highly profitable may also have specific affinities.
Fast Access to the PatternWarehouse
The PatternWarehouse is represented as a set of "pattern-tables" within a traditional relational database. This solves several potential issues regarding user access rights, security control, multi-user access, etc.
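As a rough illustration of what a pattern-table inside a relational database might look like, the sketch below stores rule-style patterns in SQLite through Python's standard sqlite3 module. The table name, the column names and the sample values are all assumptions made for this example; the article does not describe the actual PatternWarehouse schema.

```python
import sqlite3


def create_pattern_store():
    """Create an in-memory database holding one hypothetical pattern-table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE pattern_rules (
            pattern_id INTEGER PRIMARY KEY,
            condition  TEXT NOT NULL,    -- left-hand side of the pattern
            conclusion TEXT NOT NULL,    -- right-hand side of the pattern
            confidence REAL NOT NULL,    -- how often the pattern holds
            support    INTEGER NOT NULL  -- number of records behind it
        )
        """
    )
    return conn


def add_pattern(conn, condition, conclusion, confidence, support):
    """Store one pattern and return its generated id."""
    cur = conn.execute(
        "INSERT INTO pattern_rules (condition, conclusion, confidence, support) "
        "VALUES (?, ?, ?, ?)",
        (condition, conclusion, confidence, support),
    )
    return cur.lastrowid


def strong_patterns(conn, min_confidence):
    """Return patterns at or above a confidence threshold, strongest first."""
    return conn.execute(
        "SELECT condition, conclusion, confidence FROM pattern_rules "
        "WHERE confidence >= ? ORDER BY confidence DESC",
        (min_confidence,),
    ).fetchall()


# Illustrative values only; they are not taken from any real data set.
conn = create_pattern_store()
add_pattern(conn, "segment = 'A'", "product = 'X'", 0.82, 1200)
add_pattern(conn, "segment = 'B'", "product = 'Y'", 0.41, 300)
print(strong_patterns(conn, 0.5))
```

Because the patterns live in ordinary tables, the database's existing access rights and multi-user facilities apply to them without extra work.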
But obviously, we need a language to access and query the contents of PatternWarehouses. SQL may be considered an obvious first candidate for this, but when SQL was designed over 30 years ago, data mining was not a major issue. SQL was designed to access data stored in databases. We need pattern-oriented languages to access PatternWarehouses storing various types of exact and inexact patterns. Often, it is very hard to access these patterns with SQL.
Hence a PatternWarehouse cannot be conveniently queried in a direct way using a relational query language. Not only are some patterns not easily stored in a simple tabular format, but by just looking up influence factors in pattern-tables we may get incorrect results. We need a "pattern-kernel" that consistently manages and merges patterns. The pattern-kernel forms the heart of PQL.
PQL: The Pattern Query Language(TM) does for the decision support space what SQL does for the data space. While SQL relies on the relational algebra, PQL uses the "pattern algebra". PQL was designed to access PatternWarehouses just as SQL was designed to access databases, and it was designed to be very similar to SQL. It allows knowledge-based queries just as SQL allows data-based queries. PQL also uses SQL as part of its operation, i.e. PQL queries are decomposed into a set of related SQL queries, then the results are re-combined. However, business users do not usually see PQL. They just click on a browser-based graphic user interface to retrieve patterns on the intranet, so they can begin to access knowledge immediately without lengthy training sessions or analytical know-how.
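The decompose-and-recombine idea can be sketched in a few lines of Python. This is not PQL and does not reproduce its syntax or its pattern-kernel; it only shows one pattern-level question being answered by several related SQL queries whose results are merged afterwards. The two pattern-tables, their columns and the sample rows are hypothetical.

```python
import sqlite3

# Two hypothetical pattern-tables with illustrative rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE profitability_patterns (
        segment TEXT, profit_level TEXT, confidence REAL
    );
    CREATE TABLE affinity_patterns (
        segment TEXT, product_a TEXT, product_b TEXT, confidence REAL
    );
    INSERT INTO profitability_patterns VALUES
        ('A', 'high', 0.9), ('B', 'low', 0.8);
    INSERT INTO affinity_patterns VALUES
        ('A', 'checking', 'card', 0.7), ('B', 'savings', 'loan', 0.6);
    """
)


def affinities_of_profitable_segments(conn):
    """Answer 'which affinities do highly profitable segments have?'."""
    # SQL query 1: find the segments with a high-profitability pattern.
    segments = [
        row[0]
        for row in conn.execute(
            "SELECT segment FROM profitability_patterns "
            "WHERE profit_level = 'high' ORDER BY segment"
        )
    ]
    # SQL query 2 (one per segment): fetch that segment's affinity patterns.
    combined = {}
    for segment in segments:
        combined[segment] = conn.execute(
            "SELECT product_a, product_b FROM affinity_patterns "
            "WHERE segment = ? ORDER BY product_a, product_b",
            (segment,),
        ).fetchall()
    # Re-combination step: results are merged into one answer per segment.
    return combined


print(affinities_of_profitable_segments(conn))
```

A single SQL join would answer this particular question just as well. The split is kept only to show the shape of the approach, in which the recombination step is the place where pattern-specific logic would go.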
Using PQL has a multitude of technical and business benefits that reinforce each other. Not only does it provide faster response with less computing, but it also delivers more accurate, more consistent and higher quality knowledge. Responses to knowledge queries are more efficient because patterns have already been pre-computed. The overall computational burden is reduced by avoiding the repeated discovery sessions that are unknowingly performed by multiple analysts. In many cases, avoiding repeat discovery sessions performed by the same analyst is itself a significant benefit.
With PQL the user still performs analysis (e.g. visualizes affinity patterns), but the results delivered for the same level of computational effort are orders of magnitude better because the user now analyzes refined knowledge, not data. And 100 different analysts will no longer get 100 different answers from the same data, because there is a central knowledge repository.
Web-Delivery of Knowledge
The web is a natural medium for delivering knowledge. As more and more corporations deploy intranets, it has become an essential vehicle for information sharing. The PatternWarehouse should naturally be supported on the corporate intranet -- and can easily reach out to the internet.
Inter/intranets and PatternWarehouses work well together because the net easily delivers what the PatternWarehouse stores. And, because knowledge is not measured by volume, but by its content and impact, what moves across the intranet does not have high volume, but has high impact. The result is the ability to access and distribute knowledge with unprecedented ease. And other inter/intranet resources are just a mouse-click away.
The knowledge delivery method used in conjunction with PQL: The Pattern Query Language is browser-based and the graphic user interface uses JavaScript so you can easily tailor it to your specific needs. And, web-based delivery eases the bottleneck of corporate software distribution, making it possible to support many users.
A natural way of delivering information to users on the web is a document organized as a collection of information of different types, e.g. text, data, graphs, etc. An Explainable Document looks like any other web-page at first, but does an incredible amount more by allowing users to dynamically obtain explanations that clarify, justify and substantiate the information presented within the document.
Explainable documents are automatically composed and contain English text and graphs that the system generates all by itself. Each piece of information, be it a sentence or a graph, is generated for a specific purpose -- and makes a specific point. When a user requests an explanation for a piece of information, the system composes a concise and to the point explanation and presents it as yet another explainable document.
Industrial Applications
PatternWarehouses have a wide range of industrial applications from banking to manufacturing and the internet.
In Banking & Financial Services, PatternWarehouses can be used for Customer Relationship Management by storing patterns of Prospecting & Acquisition, Affinity & Cross Sell, Profitability & Wallet-share, Channel Utilization, Retention & Attrition and Risk Analysis. These patterns can re-shape the thinking of a marketing department. And, by combining trend analysis with wallet-share and affinity results, Right Time Marketing is implemented based on dynamic customer life cycle models.
In Retail & Consumer Packaged Goods, PatternWarehouses can be used both for category management and for other specific analyses which take advantage of patterns, e.g. multi-dimensional market-basket analysis. For consumer packaged goods companies, the patterns of market-share variation by store-chain and demographics, as well as panel data, can be stored -- including historical performance from the chain level down to the region and store levels. And, patterns of promotions analysis determine the factors influencing the effectiveness of advertising.
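To make the idea of a market-basket affinity pattern concrete, here is a minimal sketch that derives pairwise support and confidence figures from a handful of baskets. It is a plain, one-dimensional co-occurrence count, not the multi-dimensional analysis mentioned above and not the discovery method of any particular product. The function name and the sample baskets are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_affinities(baskets):
    """Compute support and confidence for every item pair that co-occurs."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    total = len(baskets)
    patterns = {}
    for (a, b), count in pair_counts.items():
        patterns[(a, b)] = {
            "support": count / total,                  # share of all baskets
            "confidence_a_to_b": count / item_counts[a],
            "confidence_b_to_a": count / item_counts[b],
        }
    return patterns


# Illustrative baskets only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
patterns = pair_affinities(baskets)
for pair, stats in sorted(patterns.items()):
    print(pair, stats)
```

Each resulting entry is a few numbers per item pair, which is the kind of compact record that could be kept in a pattern-table instead of the baskets themselves.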
In Right Time Direct Marketing, trend-based patterns are used to offer the right product to the right customer at the right time. The warehouse stores not only the patterns for each customer segment, but also when each segment is ready for an offer, which is determined by discovering customer lifecycle models. Typical questions include: When is the right time to sell product X with product Y? Who is ready to buy product X now (based on having product Y or other factors)?
In Internet Web-log Access Analysis, patterns of user activity on a web-site are stored and help focus on-line marketing efforts. The patterns discovered and stored relate to how the user traversed the site from page to page; items the user entered or was interested in; repeat visits; and the end-result, e.g. did the user purchase anything. This helps increase click-through rates and guides the generation of suitable content.
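One simple form of page-to-page traversal pattern is a count of how often visitors move from one page directly to another. The sketch below assumes the web log has already been grouped into per-visit page sequences; the function name and the sample sessions are hypothetical.

```python
from collections import Counter


def page_transitions(sessions):
    """Count direct page-to-page moves across all sessions."""
    transitions = Counter()
    for pages in sessions:
        # Pair each page with the page visited immediately after it.
        transitions.update(zip(pages, pages[1:]))
    return transitions


# Illustrative sessions only.
sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "products"],
    ["home", "about"],
]
transitions = page_transitions(sessions)
for (src, dst), count in transitions.most_common():
    print(f"{src} -> {dst}: {count}")
```

The stored pattern is the transition count itself, so the warehouse would hold one small row per observed page pair rather than the raw log lines.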
In Manufacturing Quality & Warranty, the warehouse stores patterns of unusual data densities that are the tell-tale signs of process variations within manufacturing and assembly operations. This helps automatically identify the combinations of plant, build-date and product model that have a higher breakdown or malfunction rate, allowing quality engineers to quickly zoom in on the source of problems. Automatically identifying the factors that give rise to problems improves product quality, which in turn increases customer satisfaction by reducing complaints.
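As a simplified stand-in for the density-based discovery described above, the sketch below groups units by plant, build period and model and flags the groups whose failure rate is well above the overall rate. The record fields, the thresholds and the sample data are assumptions made for this example, and a plain rate comparison is used instead of any actual density analysis.

```python
from collections import defaultdict


def elevated_failure_groups(units, ratio=2.0, min_units=2):
    """Flag (plant, build_period, model) groups with unusually high failure rates.

    A group is flagged when it has at least `min_units` units and its
    failure rate exceeds `ratio` times the overall failure rate.
    """
    if not units:
        return []

    totals = defaultdict(int)
    failures = defaultdict(int)
    for unit in units:
        key = (unit["plant"], unit["build_period"], unit["model"])
        totals[key] += 1
        if unit["failed"]:
            failures[key] += 1

    overall_rate = sum(failures.values()) / len(units)
    flagged = []
    for key, count in totals.items():
        rate = failures[key] / count
        if count >= min_units and rate > ratio * overall_rate:
            flagged.append((key, rate))
    # Highest failure rate first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Illustrative records only: 3 of 4 units from plant P1 failed, none from P2.
units = [
    {"plant": "P1", "build_period": "period-1", "model": "X", "failed": f}
    for f in (True, True, True, False)
] + [
    {"plant": "P2", "build_period": "period-1", "model": "X", "failed": False}
    for _ in range(6)
]
print(elevated_failure_groups(units))
```

The `min_units` guard keeps a group with only one or two units from being flagged on the strength of a single failure.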
---
For more information, see http://www.datamining.com.