
REAL WORLD EXPERIENCES WITH DATA MINING AND KNOWLEDGE DISCOVERY: PART II
by Jill Dyche & Evan Levy


For this specific engagement, Baseline chose the Discovery suite of data mining tools from HyperParallel Corporation ( http://www.hyperparallel.com ). HyperParallel had the advantage of offering more than one type of knowledge discovery algorithm to apply to the problem, and we needed as much flexibility as possible. The HyperParallel algorithms used for this exercise were:

Predictive Modeling: Developing a model from historical data for predicting a behavior such as a customer's likelihood of buying a product.

Clustering: Developing a model that segments customers into unforeseen groups that have similar characteristics.

Sequential Analysis: Detecting sequences of events that occur often.

Affinity Analysis: Detecting sets of products and/or services that are purchased together.
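
To make the affinity idea concrete, here is a minimal sketch in Python. It is not the HyperParallel algorithm, whose internals are not described here; it simply counts how often each pair of products appears together on the same customer account and keeps the pairs that meet a minimum support. The function name, the sample accounts, and the 50% threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets, min_support=0.5):
    """Return {(product_a, product_b): support} for pairs meeting min_support.

    Support is the fraction of baskets that contain both products.
    """
    n = len(baskets)
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair has a single canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical accounts and their products (illustrative data only).
baskets = [
    {"Custom Calling", "Rotary Hunt", "Grouping"},
    {"Custom Calling", "Rotary Hunt"},
    {"Custom Calling", "VoiceMail"},
    {"Rotary Hunt", "Grouping"},
]

frequent_pairs = pair_support(baskets, min_support=0.5)
for pair, support in sorted(frequent_pairs.items()):
    print(pair, support)
```

A production tool would go beyond pairs to larger product sets and would prune the search space, but the counting step above is the core of the technique.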

Since this company has over 1.5 terabytes of data, Baseline was careful about selecting the data we wanted to mine. Ultimately, we focused on customer and billing data in order to discover customer purchase behavior that might affect sales of different products and services.

The results were surprising. After the initial run, we analyzed large business customers. There was significant interest in reviewing customer product purchases and the unique combinations of products that were purchased by an individual customer. A review of business customers and their purchases uncovered some significant information.

Purchases - High Revenue Customers      Purchases - All Customers

       Custom Calling                     Custom Calling
       Rotary Hunt                        Rotary Hunt
       Ultraline (1.5 Mbit)               Grouping
       Non-Listing                        Code Restriction
       Non-Published                      Non-Published
       Grouping                           VoiceMail
       Multiplex Channel                  Message Waiting

-- Two example affinity results taken from a telecommunications mining activity. --

The table reflects the most popular products purchased by two customer segments, high revenue customers (revenue > $6500/month) and all customers, and it illustrates some interesting similarities and differences between them. The data clearly supports the traditional view that the highest revenue customers purchase data services (Ultraline) and high-quantity voice products (grouping, multiplex). However, these results also uncovered a previously unknown detail about the high revenue customers: their purchase of vertical features (custom calling, rotary hunt). This finding clearly conflicts with another traditional marketing view, that big revenue customers (big companies) use their own phone equipment to support vertical feature functions.
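
A comparison like the one in the table can be sketched in a few lines of Python. This is only an illustration of the ranking step, not the tool's method: the record layout, field names, and sample customers are hypothetical, and only the $6500/month threshold comes from the engagement.

```python
from collections import Counter

HIGH_REVENUE_THRESHOLD = 6500  # dollars per month, as used for the table


def top_products(customers, n=3, min_revenue=None):
    """Rank products by how many customers hold them.

    If min_revenue is given, only customers billing more than that amount
    are counted. Ties are broken alphabetically so the result is stable.
    """
    counts = Counter()
    for customer in customers:
        if min_revenue is not None and customer["monthly_revenue"] <= min_revenue:
            continue
        counts.update(set(customer["products"]))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked[:n]]


# Hypothetical customer records (illustrative data only).
customers = [
    {"id": "A", "monthly_revenue": 9000,
     "products": ["Custom Calling", "Ultraline", "Rotary Hunt"]},
    {"id": "B", "monthly_revenue": 7200,
     "products": ["Custom Calling", "Ultraline"]},
    {"id": "C", "monthly_revenue": 400,
     "products": ["Custom Calling", "VoiceMail"]},
    {"id": "D", "monthly_revenue": 150,
     "products": ["VoiceMail", "Message Waiting"]},
]

high_revenue_top = top_products(customers, min_revenue=HIGH_REVENUE_THRESHOLD)
all_customers_top = top_products(customers)
print("High revenue:", high_revenue_top)
print("All customers:", all_customers_top)
```

Placing the two ranked lists side by side is what exposes products that are popular in one segment but not the other.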

Not only would this type of affinity analysis have been impossible with standard SQL, the business had never even considered the possibility of finding this information. This was a product sales area that the company had long ago abandoned because of the advent of commercially available business-oriented phone equipment and PBXs (most business phone systems offer features similar to the phone company's vertical feature offerings). With this new information, they could expand a highly profitable product area and respond more effectively to their customers' needs. Furthermore, they had an entirely new (and much more accurate) marketing strategy for some key products!

In another knowledge discovery iteration, we used predictive modeling (a combination of induction and decision tree analysis) to discover which customers were most likely to buy a particular product. This information -- previously impossible to obtain with the telephone company's existing technologies -- helped the marketing department focus on generating the most sales with the fewest contacts.

Again, the tool did all of the work here: It used 200+ customer description attributes to identify unique clusters of customers based upon an individual product. (Again, this type of exponentially increasing analysis would have taken years with standard SQL queries!) The engine not only identified and scored the customer clusters (or segments), it also identified the specific attributes affecting each cluster. The findings in this exercise included some sometimes strange but always valid rules:

Customers with Caller ID live in North Carolina and Virginia and have both Custom Calling and Rotary Hunt.

Business customers with custom calling are in the petroleum (fuel stations) and retail (grocery, department stores) industries, bill more than $374/month, and have Centrex (phone company provided PBX) service.

Business customers with the AutoStar product are in the military (bases) and banking industries in metropolitan areas with greater than 500,000 people and have analog data, rotary hunt, and multiplexed products.

The key here is to find the deltas: Which customers fit the descriptions but do not currently purchase the product? These customers are what target-marketers call "the low-hanging fruit."
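
Finding the deltas amounts to a simple filter once a rule is in hand. The sketch below encodes the custom calling rule above as a profile and lists the customers who match it but do not yet have the product. The field names, the helper functions, and the sample customers are hypothetical; only the rule's conditions (industry, bill above $374/month, Centrex) come from the findings.

```python
def find_deltas(customers, matches_profile, product):
    """Return ids of customers who fit the profile but lack the product."""
    return [
        c["id"]
        for c in customers
        if matches_profile(c) and product not in c["products"]
    ]


def custom_calling_profile(customer):
    """Profile taken from the custom calling rule in the findings."""
    return (
        customer["industry"] in {"petroleum", "retail"}
        and customer["monthly_bill"] > 374
        and "Centrex" in customer["products"]
    )


# Hypothetical customer records (illustrative data only).
customers = [
    {"id": "C1", "industry": "petroleum", "monthly_bill": 500,
     "products": {"Centrex", "Custom Calling"}},
    {"id": "C2", "industry": "retail", "monthly_bill": 420,
     "products": {"Centrex"}},
    {"id": "C3", "industry": "retail", "monthly_bill": 300,
     "products": {"Centrex"}},
    {"id": "C4", "industry": "banking", "monthly_bill": 900,
     "products": {"Centrex"}},
    {"id": "C5", "industry": "petroleum", "monthly_bill": 375,
     "products": {"Centrex", "Rotary Hunt"}},
]

deltas = find_deltas(customers, custom_calling_profile, "Custom Calling")
print(deltas)
```

Here C1 already has the product, C3 bills too little, and C4 is in the wrong industry, which leaves C2 and C5 as the prospects worth contacting first.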

---

The third and final part of this series will appear in next week's DS*.

