
DECISION TECHNOLOGIES IN DATABASE MARKETING: PART VI
by Gene M. Ferruzza, Senior VP, Decision Technologies


Segmentation techniques include basic and complex data-mining operations using attitudinal, behavioral, and demographic data. All customer segmentation schemes use a set of instructions that, when executed, will place a customer into a particular segment. This set of instructions is referred to as a "model." The remaining discussion of data-mining techniques (both basic and complex) focuses on segmentation models and the process of modeling.

A segmentation model should not be confused with a data model. A data model defines the design of a database: how and where data will be stored and how data will be retrieved. A segmentation model enables us to classify customers into different categories (e.g., by which product they are likely to buy, or by high or low risk of defection) or to estimate a value for each customer (e.g., how much a customer will spend, or what a customer's usage will be next month). In the rest of this article, "model" always means a segmentation model.

The customer base is usually segmented according to customer attitudes or behaviors that are relevant to the corporate relationship. Each individual customer must be assigned to a segment through the use of a descriptive or predictive model. An important characteristic of a model is whether it predicts a customer's future behavior or describes a customer's past behavior.

One of the most beneficial aspects of data mining is the ability of predictive models to forecast a customer's behavior, such as purchase activity, defection, response to communications, product usage, or other activities. In contrast, descriptive models are used to profile the characteristics of individual customers within a customer population. They are not designed to forecast behavior, only to compare one type of customer with another. For example, a descriptive model may be developed to separate heavy cellular phone users from light users within a customer base, using third-party demographic data as inputs to build a profile of each type of user. The resulting model is then applied to individual prospects in a prospect database to create a list of individuals who look like heavy users. The model predicts nothing about these individuals' future behavior; it simply produces a list of prospects whose profiles resemble those of heavy users rather than light users.
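
A descriptive look-alike model of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the author's method: it uses a simple nearest-centroid comparison (one technique among many), and the demographic fields (age, household income in $K) and all values are hypothetical. Each prospect is compared with the average profile of heavy users and of light users and is kept if it sits closer to the heavy-user profile.

```python
from math import dist

# Hypothetical demographic inputs per person: (age in years, household income in $K).
# In practice these would come from third-party demographic data.
heavy_users = [(34, 85), (29, 92), (41, 78)]
light_users = [(63, 40), (58, 35), (70, 30)]


def centroid(rows):
    """Average profile (per-field mean) of a group of customers."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))


def heavy_lookalikes(prospects, heavy, light):
    """Return IDs of prospects whose profile is closer to the heavy-user profile.

    Nothing here forecasts behavior: prospects are only compared with the
    profiles of two existing customer groups.
    """
    heavy_c, light_c = centroid(heavy), centroid(light)
    return [pid for pid, profile in prospects.items()
            if dist(profile, heavy_c) < dist(profile, light_c)]


# Hypothetical prospect database.
prospects = {"P1": (31, 88), "P2": (66, 33), "P3": (45, 70)}
selected = heavy_lookalikes(prospects, heavy_users, light_users)  # ['P1', 'P3']
```

In a real application the inputs would be scaled first, so that a field with large numeric values does not dominate the distance.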

All segmentation models fall into one of six categories: a model is either descriptive or predictive and either fixed, parametric, or non-parametric.

Fixed Models

Fixed models are the simplest type of model and the easiest to understand. A simple example is a fixed model for calculating the potential profitability of a customer at a bank: (0.004 * CUSTOMER INCOME) + (TIME-ON-BOOKS in months * 50).

This formula has two parameters (0.004 and 50) whose values are already known, based on an understanding of what profitability means for this particular bank. These parameters are fixed. The customer income and time-on-books values come from the individual customer's data. When this model is deployed on the customer base, the CUSTOMER INCOME and TIME-ON-BOOKS values of every customer are given as inputs to the model. The resulting profitability value is added to the customer record, for use as a measure of customer value in a marketing program.
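
Deploying this fixed model amounts to evaluating the formula for every record. The Python sketch below assumes hypothetical customer records with an income field (in dollars) and a time-on-books field (in months); the field names and values are illustrative only.

```python
# Fixed parameters from the text: known in advance, never estimated from data.
INCOME_WEIGHT = 0.004   # per dollar of CUSTOMER INCOME
TENURE_WEIGHT = 50      # per month of TIME-ON-BOOKS


def fixed_profitability(customer_income, time_on_books_months):
    """(0.004 * CUSTOMER INCOME) + (TIME-ON-BOOKS in months * 50)."""
    return INCOME_WEIGHT * customer_income + time_on_books_months * TENURE_WEIGHT


# Hypothetical customer records; field names are illustrative only.
customers = [
    {"id": "C1", "income": 60000, "time_on_books": 24},
    {"id": "C2", "income": 120000, "time_on_books": 6},
]

# Deployment: score every customer and add the value to the customer record.
for record in customers:
    record["profitability"] = fixed_profitability(record["income"], record["time_on_books"])

# C1: 0.004 * 60000 + 24 * 50 = 240 + 1200 = 1440
# C2: 0.004 * 120000 + 6 * 50 = 480 + 300 = 780
```

Because both weights are constants, nothing is learned from the data; the customer data only supply the inputs.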

Parametric Models

Parametric models differ from fixed models in that they include one or more parameters whose values are not fixed in advance. Examples of parametric models are linear and logistic regression models. A parametric version of our banking profitability model might look something like this: (W1 * CUSTOMER INCOME) + (W2 * TIME-ON-BOOKS) + W3, where W1, W2, and W3 are unknown parameters that are estimated from the data. A parametric model is based on the assumption that there is a particular functional relationship between the target behavior and the independent variables. For example, a linear regression model assumes a linear relationship between the independent variables and the target variable. A logistic regression model, used when the target is a yes/no outcome such as defection, assumes that the independent variables are linearly related to a logistic (log-odds) transformation of the probability of the target outcome.
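
To make the logistic assumption concrete, here is a minimal Python sketch of scoring with a logistic regression model for a yes/no target. The defection target and the weight values are hypothetical, chosen only for illustration; in a real model they would be estimated from the data. The final check shows that the log-odds (logit) of the predicted probability equals the linear combination of the inputs.

```python
from math import exp, log

# Hypothetical parameters for a yes/no target (defection); a real model
# would estimate these from the data.
W1, W2, W3 = -0.00002, -0.05, 1.5


def linear_score(income, time_on_books):
    """The linear part of the model: W1 * income + W2 * time_on_books + W3."""
    return W1 * income + W2 * time_on_books + W3


def defection_probability(income, time_on_books):
    """Logistic regression: map the linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-linear_score(income, time_on_books)))


def logit(p):
    """Log-odds transformation, the inverse of the logistic function."""
    return log(p / (1.0 - p))


p = defection_probability(50000, 12)
# The logit of the probability recovers the linear score:
# -0.00002 * 50000 - 0.05 * 12 + 1.5 = -1.0 - 0.6 + 1.5 = -0.1
assert abs(logit(p) - linear_score(50000, 12)) < 1e-9
```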

A regression fits the target (here, the profitability value) to a predefined structure: (W1 * X1) + (W2 * X2) + W3 = Answer. If the data don't fit this structure (that is, no setting of the parameters gives acceptable performance), the modeler must transform and manipulate the data until they do. This process can be time-consuming; it usually accounts for the majority of model development time. Once the data are prepared, one of several standard regression search algorithms is used to find the set of parameters (W1, W2, and W3) that gives the best fit to the target (in this case, the profitability value). Because the parameters are tuned to fit the data, the model development process is said to be "data-driven."
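
The following minimal Python sketch estimates W1, W2, and W3 by ordinary least squares, one standard regression technique, by solving the normal equations directly. The data are hypothetical: income is in thousands of dollars, and the targets are generated without noise from W1 = 4 (that is, 0.004 per dollar, matching the fixed model), W2 = 50, and W3 = 10, so the fit should recover those values. Real data would be noisy, and production work would normally rely on a numerical library with a more stable solver.

```python
def fit_least_squares(rows, targets):
    """Estimate (W1, W2, W3) in  W1*X1 + W2*X2 + W3 = Answer  by ordinary least squares.

    Forms the normal equations (X^T X) w = X^T y and solves them by Gaussian
    elimination with partial pivoting.
    """
    X = [[x1, x2, 1.0] for x1, x2 in rows]  # constant 1.0 column carries W3
    k = 3
    # Augmented matrix [X^T X | X^T y], one row per parameter.
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(X, targets))]
        for i in range(k)
    ]
    # Forward elimination.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, k):
            factor = A[row][col] / A[col][col]
            for c in range(col, k + 1):
                A[row][c] -= factor * A[col][c]
    # Back substitution.
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (A[i][k] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return w


# Hypothetical training data: (CUSTOMER INCOME in $K, TIME-ON-BOOKS in months).
rows = [(40, 12), (60, 36), (80, 6), (100, 24), (120, 48)]
# Noise-free targets built from known weights so the fit can be checked.
targets = [4.0 * inc + 50.0 * tob + 10.0 for inc, tob in rows]

W1, W2, W3 = fit_least_squares(rows, targets)
# W1 ≈ 4.0 (per $K), W2 ≈ 50.0, W3 ≈ 10.0, up to floating-point rounding
```

Unlike the fixed model, the weights here come out of the data: change the training rows and the estimated parameters change with them.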

Part VII of this series will appear in the next edition of DS*.
---
Gene Ferruzza may be contacted at gmf@cmsnet.com 

