
PUTTING DATA MINING TO WORK: THE SEQUEL, PART I
by Michael J. A. Berry


Readers of Data Mining Techniques for Marketing, Sales and Customer Support may recall that in the final chapter, "Putting Data Mining to Work," we described an ongoing data mining project for a cellular phone company. At the time that chapter was written, the project had been defined, but not yet executed. Now, several months later, I thought it would be interesting to go back and find out what, if anything, the data mining project had actually accomplished. For this article, I interviewed Gregory Lampshire of the data mining center at Naviant Technology Solutions. Gregory was responsible for most of the data mining work in this engagement.

The Original Problem

The goal of the data mining project was to identify groups of subscribers with an unusually high likelihood of canceling their subscriptions. These high-risk customers would be the targets of a telemarketing campaign aimed at retaining them. The experimental design allowed for the comparison of three groups:

  1. The general population
  2. Customers judged by the model to be high risk for whom no intervention was performed
  3. Customers judged by the model to be high risk for whom some intervention was performed

Our hope, of course, was that group two would suffer high attrition compared to group one, but that group three would not. This result would demonstrate that data mining is an effective way to develop predictive models for customer churn and that such models are worth having because they can lead to effective intervention.
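The hoped-for outcome of this three-way design comes down to a comparison of attrition rates. The sketch below is illustrative only: the group labels and the (group, churned) record format are hypothetical, and reading "group three would not" as "no higher attrition than the general population" is a simplifying assumption rather than the project's actual success criterion.

```python
from collections import Counter

# Hypothetical labels for the three groups in the experimental design.
GENERAL = "general_population"
HIGH_RISK_CONTROL = "high_risk_no_intervention"
HIGH_RISK_TREATED = "high_risk_intervention"


def attrition_rates(records):
    """Return the fraction of subscribers in each group who cancelled.

    `records` is an iterable of (group, churned) pairs, where `churned`
    is True if the subscriber cancelled during the measurement period.
    """
    totals = Counter()
    cancelled = Counter()
    for group, churned in records:
        totals[group] += 1
        if churned:
            cancelled[group] += 1
    return {group: cancelled[group] / totals[group] for group in totals}


def hoped_for_result(rates):
    """True if untreated high-risk subscribers churn more than the general
    population while treated high-risk subscribers do not (assumption:
    "do not" means a rate no higher than the general population's)."""
    return (rates[HIGH_RISK_CONTROL] > rates[GENERAL]
            and rates[HIGH_RISK_TREATED] <= rates[GENERAL])
```

Keeping the rate calculation separate from the success test makes it easy to swap in a stricter criterion, such as a statistical significance test, without touching the counting code.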

The Data to be Mined

Two sources of data were to be used to develop the churn model: call detail data culled from the switches that actually route the calls, and customer summary data from an existing marketing database. Our intention was to merge the two sources so that a given subscriber's data from the marketing database (billing plan, tenure, type of phone, total minutes of use, home town, etc.) would be linked to the detail records for each of his or her calls. That way, a single model could be built using independent variables from both sources.
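The intended merge can be sketched as a lookup from each call record to the marketing profile of the subscriber involved. Everything here is an assumption for illustration: the field names (`caller`, `callee`, `minutes`, `billing_plan`, `tenure_months`) and the hypothetical function `link_calls_to_profiles` are not from the project, and linking only on the calling number is a simplification; a fuller version would also attach incoming calls to the receiving subscriber.

```python
def link_calls_to_profiles(profiles, calls):
    """Attach each call to the marketing profile of the subscriber who placed it.

    profiles: dict mapping a subscriber's phone number to a dict of
              summary fields (billing plan, tenure, and so on).
    calls:    iterable of call-detail dicts with at least a "caller" key.

    Calls placed from numbers with no marketing profile are dropped,
    mimicking an inner join on the phone number.
    """
    linked = []
    for call in calls:
        profile = profiles.get(call["caller"])
        if profile is None:
            continue
        # Profile fields are added alongside the call fields; the call's own
        # fields take precedence if a name happens to appear in both.
        linked.append({**profile, **call})
    return linked
```

At the scale of this project the same linkage would more likely be done as a database join; the dictionary version simply shows the shape of the merged records a single model would be trained on.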

Evolution of the Project

Almost at once, the difficulties of working with a mix of operational and historical data forced changes in the experimental design. Although call detail data was regularly written to tape, the tapes were not archived. Each switch had its own collection of reel-to-reel tapes like the ones used to represent computers in 1960s movies. These tapes were continuously recycled to maintain a 90-day moving window: the tapes recorded 90 days ago were overwritten with today's calls. Since 8 tapes were written every day, we found ourselves looking at over 700 tape reels, each of which had to be loaded by hand into a borrowed 9-track tape drive. Once loaded, the call detail data, which was written in an arcane format unique to the switching equipment, needed extensive preprocessing before it was ready for analysis.

The marketing data, on the other hand, consisted of monthly summaries and was about 45 days out of date. The 90-day call detail window had already moved past the beginning of its oldest month, and the most recent month of call detail corresponded to marketing data that would not be available for another 45 days, so there was little overlap between the two data sets. Due to time and budget constraints, we elected not to wait the several months it would have taken to collect three months of matching data from the two systems. Instead, we built separate models based on the two sources.
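The timing problem can be made concrete by counting the calendar months that both sources cover completely. This sketch assumes that the call detail window spans the 90 days ending on a given date and that a month's marketing summary becomes available once that month ended at least 45 days earlier, a simplification of "about 45 days out of date". The function names are hypothetical.

```python
from datetime import date, timedelta


def last_day_of_month(year, month):
    """Return the final date of the given calendar month."""
    if month == 12:
        return date(year, 12, 31)
    return date(year, month + 1, 1) - timedelta(days=1)


def overlapping_months(as_of, window_days=90, marketing_lag_days=45):
    """List (year, month) pairs fully covered by both data sources.

    Assumptions (illustrative, not from the project):
      * call detail covers the `window_days` days ending on `as_of`;
      * a month's marketing summary is available once that month ended
        at least `marketing_lag_days` days before `as_of`.
    """
    window_start = as_of - timedelta(days=window_days - 1)
    marketing_cutoff = as_of - timedelta(days=marketing_lag_days)

    months = []
    year, month = window_start.year, window_start.month
    while date(year, month, 1) <= as_of:
        first = date(year, month, 1)
        last = last_day_of_month(year, month)
        # A month counts only if the call-detail window holds all of it
        # and its marketing summary has already been published.
        if first >= window_start and last <= marketing_cutoff:
            months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months
```

For an arbitrary date such as June 15, 1997, the window starts on March 18 and the marketing cutoff falls on May 1, so `overlapping_months(date(1997, 6, 15))` returns only `[(1997, 4)]`: a single month of matched data, which is why waiting for three overlapping months would have taken several more months.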

What the Call Detail Revealed

The 70 million call detail records we collected were reduced to 10 million by filtering out records that did not relate to calls to or from the churn model population of around 400,000 subscribers. Even before predictive modeling began, simple profiling of the call detail data suggested many possible avenues for increasing profitability. Once call detail was available in a queryable form, it became possible to answer questions such as:

The answers to these and many other questions suggested a number of marketing initiatives to stimulate cellular phone use at particular times and in particular ways. Furthermore, as we had hoped, variables built around measures constructed from the call detail, such as size of calling circle, proved to be highly predictive of churn.
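Both steps described above, reducing the raw call detail to the model population and deriving a variable such as calling-circle size, are straightforward to express. The sketch assumes call records are dicts with hypothetical `caller` and `callee` fields, and it defines the calling circle as the set of distinct numbers a subscriber called or received calls from; the article does not give the exact definition used in the project.

```python
from collections import defaultdict


def filter_to_population(calls, population):
    """Yield only calls placed by or to a subscriber in the model population.

    Written as a generator so that a large stream of call records can be
    filtered without holding all of it in memory.
    """
    for call in calls:
        if call["caller"] in population or call["callee"] in population:
            yield call


def calling_circle_sizes(calls, population):
    """Count the distinct numbers each subscriber called or heard from.

    Subscribers with no calls in the data get a calling-circle size of 0.
    """
    circles = defaultdict(set)
    for call in calls:
        caller, callee = call["caller"], call["callee"]
        if caller in population:
            circles[caller].add(callee)
        if callee in population:
            circles[callee].add(caller)
    return {subscriber: len(circles[subscriber]) for subscriber in population}
```

Passing `population` as a set keeps each membership test constant-time on average, which matters when tens of millions of records are each checked against roughly 400,000 subscribers.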

Part II of this commentary will appear in the upcoming edition of DS*.

