
DECISION TECHNOLOGIES IN DATABASE MARKETING: PART VIII
by Gene M. Ferruzza, Senior VP, Decision Technologies


The marketer, in this case, is the expert, and the rule or query is based on his experience or expertise. Alternatively, the rule could have been a deliverable from another data-mining project. In either case, the knowledge discovered is the list of customers in the extracted segment. With further data mining, more information may be uncovered from the segment; now that the most profitable segment has been isolated, other new profiles can be developed based on the more- and less-profitable segments.
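
As a minimal sketch of such an expert rule, the following Python fragment splits a customer list into the extracted segment and the remainder. The field names (annual_revenue, tenure_years), the thresholds, and the records themselves are hypothetical and chosen only for illustration; they are not drawn from any real data mart.

```python
# Hypothetical customer records; the fields and values are illustrative assumptions.
customers = [
    {"id": 1, "annual_revenue": 1200.0, "tenure_years": 6},
    {"id": 2, "annual_revenue": 150.0, "tenure_years": 1},
    {"id": 3, "annual_revenue": 900.0, "tenure_years": 4},
    {"id": 4, "annual_revenue": 2000.0, "tenure_years": 2},
]


def is_profitable(customer, min_revenue=800.0, min_tenure=3):
    """Expert rule (assumed thresholds): high revenue and an established relationship."""
    return (customer["annual_revenue"] >= min_revenue
            and customer["tenure_years"] >= min_tenure)


# The "knowledge discovered" is simply the list of customers matching the rule.
segment = [c["id"] for c in customers if is_profitable(c)]
remainder = [c["id"] for c in customers if not is_profitable(c)]

print("profitable segment:", segment)
print("remaining customers:", remainder)
```

Both lists can then feed further profiling: the rule isolates the segment, and the remainder serves as the comparison group.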

In many cases, segmentation based on customer behavior data is less useful to the business user than other information that is not routinely captured in the data mart. It may be more useful to segment the customers according to such characteristics as attitudes, feelings, tendencies, or other information unrelated to their past customer behavior. This type of information is available only through direct contact with the customer or prospect, most often through written surveys and telephone or face-to-face interviews.

Customer or prospect data marts often contain millions of records, whereas it is impractical to make direct contact with more than a couple of thousand individuals through surveys or telemarketing. Primary research data are available only for those surveyed. Using this information to segment the entire customer base (which may number in the millions) depends heavily on data-mining techniques. For the survey data or the segmentation to be useful, it must be imputed onto the rest of the customer or prospect base.

Data imputation involves associating the survey data or segmentation with data that are available for all customers. For example, in a customer satisfaction survey conducted on two thousand customers, we may have information on whether or not these customers are satisfied with product performance. The entire customer base may consist of four million customers. Using customer data available for all four million customers (including the customers who were surveyed), we develop a model with only the two thousand surveyed customers, targeted at describing satisfied and dissatisfied customers. The model is then deployed on all four million customers. The model imputes the satisfaction information onto each individual customer record, indicating whether that customer has the profile of a satisfied or dissatisfied customer (based on the surveyed sample).
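
The imputation workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer base, the two features (service calls and tenure), and the survey answers are invented, and a nearest-centroid classifier stands in for whatever modeling technique would actually be chosen.

```python
from statistics import mean

# Hypothetical customer base: id -> (service_calls, tenure_years).
# These features are assumed to be available for every customer.
base = {
    101: (0, 8), 102: (5, 1), 103: (1, 6), 104: (6, 2),
    105: (0, 7), 106: (4, 1), 107: (1, 9), 108: (7, 3),
}

# Hypothetical survey results, available only for a small sample:
# True means the customer reported being satisfied.
survey = {101: True, 102: False, 103: True, 104: False}


def centroid(rows):
    """Average each feature across the given rows."""
    return tuple(mean(column) for column in zip(*rows))


def sq_dist(a, b):
    """Squared Euclidean distance (features left unscaled for brevity)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# "Develop the model" using only the surveyed customers.
satisfied = centroid([base[cid] for cid, ok in survey.items() if ok])
dissatisfied = centroid([base[cid] for cid, ok in survey.items() if not ok])

# "Deploy the model" on every record, surveyed or not.
imputed = {
    cid: sq_dist(features, satisfied) <= sq_dist(features, dissatisfied)
    for cid, features in base.items()
}

for cid in sorted(imputed):
    label = "satisfied" if imputed[cid] else "dissatisfied"
    print(cid, "has the profile of a", label, "customer")
```

The point of the sketch is the division of labor: the survey supplies the target for a few records, while the features shared by all records carry that target to the rest of the base.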

Overview of Data-Driven Model Development

Understanding complex modeling methodologies is critical for developing the types of models most useful for database marketing. As mentioned above, complex modeling techniques are data-driven. In data-driven model development processes, the goal is to map customer data (from a database) to a target customer behavior (also represented in the database). The model represents an algorithm for mapping the data to the target behavior.

As discussed above, model development begins with customer data from the data mart, and the resulting model(s), along with customer communications strategies, are deployed through a decision system, or campaign manager. The data-mining processes used to develop complex segmentation models for database marketing vary substantially, depending on the type of model, the modeler, and the customer data that are available. For this reason, I provide only a general description of a complex modeling methodology, organized according to the phases of a model development project: data preparation, data representation, exploratory data analysis, variable reduction, tuning of model parameters, and model recalibration.

Data Preparation

The process of developing a data mart goes a long way towards providing access to centralized customer data in a form useful for data-mining operations, particularly model development. A professionally developed data mart allows the modeler to sample the customer base efficiently and to employ a rich set of customer characteristics. Because customer and prospect databases may involve millions of individual records, the modeler usually will operate on a sample of the database. Proper extraction of the sample is critical to the success of the model.

The modeler first must identify an unambiguous target behavior for each individual record in the sample. Such behaviors include product purchase, defection, high or low usage, marketing response, and customer acquisition. Any one of these behaviors can be used to tune the parameters of parametric or nonparametric models. If the target behavior is extracted from a later time period than all of the other customer data, a predictive model may be developed; if not, the resulting model is descriptive.
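
To make the timing distinction concrete, here is a small Python sketch that builds one modeling row per customer, taking the input characteristics from before a cutoff date and the target behavior from after it. The transaction log, the cutoff date, and the field names (spend_before, n_before, purchased_after) are hypothetical.

```python
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount).
transactions = [
    (1, date(2023, 3, 10), 40.0),
    (1, date(2024, 2, 5), 25.0),
    (2, date(2023, 7, 22), 60.0),
    (3, date(2023, 11, 1), 15.0),
    (3, date(2023, 12, 9), 30.0),
    (3, date(2024, 4, 18), 55.0),
]

# Assumed boundary between the observation period and the outcome period.
cutoff = date(2024, 1, 1)


def build_rows(transactions, cutoff):
    """Features come from before the cutoff; the target comes from on or after it."""
    rows = {}
    for customer_id, when, amount in transactions:
        row = rows.setdefault(
            customer_id,
            {"spend_before": 0.0, "n_before": 0, "purchased_after": False},
        )
        if when < cutoff:
            row["spend_before"] += amount
            row["n_before"] += 1
        else:
            # Target behavior: any purchase in the later period.
            row["purchased_after"] = True
    return rows


rows = build_rows(transactions, cutoff)
for customer_id in sorted(rows):
    print(customer_id, rows[customer_id])
```

Because the target is drawn strictly from the later period, a model fitted to these rows would be predictive in the sense used above. If the target were computed from the same period as the features, the same code would yield a descriptive data set.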

It is crucial that the sample be extracted by statistically sound methods. All characteristics of the sample will be apparent in the model's performance. If the sample is biased (e.g., because a segment of customers has accidentally been excluded or included), this bias will affect model performance (e.g., the model will not deploy properly on customers belonging to a missing segment).
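
One common way to guard against an accidentally excluded segment is to sample each segment proportionally and then compare the segment shares of the sample with those of the full base. The Python sketch below assumes a hypothetical base of 1,000 customers split across three regions; stratified sampling is offered here as one statistically sound option, not as the only acceptable method.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every segment so that none is left out."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def segment_shares(records, key):
    """Proportion of records falling in each segment."""
    counts = Counter(key(r) for r in records)
    total = sum(counts.values())
    return {segment: n / total for segment, n in counts.items()}


def by_region(record):
    return record["region"]


# Hypothetical base: 600 "north", 300 "south", and 100 "west" customers.
base = [
    {"id": i, "region": "north" if i < 600 else "south" if i < 900 else "west"}
    for i in range(1000)
]

sample = stratified_sample(base, by_region, fraction=0.1)

print("base shares:  ", segment_shares(base, by_region))
print("sample shares:", segment_shares(sample, by_region))
```

A large gap between the two sets of shares, or a segment missing from the sample altogether, signals the kind of bias that would later surface when the model is deployed.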

Gene Ferruzza may be contacted at gmf@cmsnet.com

Part IX of this series will appear in next week's edition of D S * .

