DECISION TECHNOLOGIES IN DATABASE MARKETING: PART X
by Gene M. Ferruzza, Senior VP, Decision Technologies
Simplification of the data also can be important. In contrast to normalization, which ensures that we don't lose information, simplification allows us to eliminate data that hold no value for the model. For example, EDA may indicate that it is important to know whether or not the individual is MARRIED, but that no further benefit is gained from knowing whether the customer's marital status is SINGLE, DIVORCED, or UNKNOWN. In this case the modeler may, to simplify the data, collapse the SINGLE, DIVORCED, and UNKNOWN values into a single NOT MARRIED category.
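The marital-status example can be sketched in a few lines of Python. This is only an illustration: the function name and the decision to reject unexpected values (rather than silently fold them into NOT MARRIED) are assumptions, not part of the process described above.

```python
# Values that the (hypothetical) EDA found to carry no extra information.
NOT_MARRIED_VALUES = {"SINGLE", "DIVORCED", "UNKNOWN"}


def simplify_marital_status(status):
    """Collapse a raw marital-status value into MARRIED / NOT MARRIED."""
    normalized = status.strip().upper()
    if normalized == "MARRIED":
        return "MARRIED"
    if normalized in NOT_MARRIED_VALUES:
        return "NOT MARRIED"
    # Assumption: a value not covered by the EDA is flagged, not guessed at.
    raise ValueError(f"unexpected marital status: {status!r}")


raw_statuses = ["MARRIED", "SINGLE", "DIVORCED", "UNKNOWN", "married"]
simplified = [simplify_marital_status(s) for s in raw_statuses]
print(simplified)
```

Raising an error on unseen values keeps the simplification tied to what the EDA actually examined; a production process might prefer a different policy.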
Variable Reduction
Variable reduction is becoming increasingly important in data-mining modeling processes because of the increasing amounts of data being collected. For database marketing, it is imperative that we simplify models (by using fewer variables) to aid the user in understanding how the model operates (i.e., makes decisions). In addition, use of fewer variables may make the model more robust and accurate. Development of variable reduction techniques has kept pace with development of modeling technologies. Many effective techniques for variable reduction exist; here, we consider two popular approaches.
Bivariate correlation analysis often is used during EDA and usually before any model development. This variable-reduction technique calculates the correlation between each customer characteristic and the target behavior.
For example, if the target behavior is the purchase of a product, each customer characteristic is analyzed across all the customers, to determine whether it is correlated with the purchase of a product. All customer characteristics can then be ranked by the strength of their correlations with product purchase. The modeler then can work first with the most strongly correlated characteristics in a model and may never use the characteristics at the bottom of the list.
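A minimal sketch of this ranking, assuming numeric customer characteristics and a 0/1 purchase flag, might look like the following. The customer records and characteristic names (prior_orders, household_size, age) are invented for illustration, and Pearson correlation is only one of several possible correlation measures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(customers, target, characteristics):
    """Rank characteristics by |correlation| with the target, strongest first."""
    ys = [c[target] for c in customers]
    scores = {
        name: pearson([c[name] for c in customers], ys)
        for name in characteristics
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented toy data: purchased is 1 if the customer bought the product.
customers = [
    {"prior_orders": 5, "household_size": 2, "age": 30, "purchased": 1},
    {"prior_orders": 4, "household_size": 3, "age": 50, "purchased": 1},
    {"prior_orders": 6, "household_size": 2, "age": 40, "purchased": 1},
    {"prior_orders": 1, "household_size": 3, "age": 35, "purchased": 0},
    {"prior_orders": 0, "household_size": 2, "age": 45, "purchased": 0},
    {"prior_orders": 2, "household_size": 3, "age": 55, "purchased": 0},
]

ranked = rank_by_correlation(
    customers, "purchased", ["age", "household_size", "prior_orders"]
)
for name, r in ranked:
    print(f"{name}: {r:+.3f}")
```

Ranking on the absolute value matters: a strongly negative correlation is as useful to the modeler as a strongly positive one.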
A second variable-reduction technique uses data-driven modeling algorithms to help find the key customer characteristics, eliminating the less important ones. In this approach, one or more models are developed, using all possible customer characteristics, to discover which customer characteristics are consistently the key drivers (of the target behavior) and which characteristics can be left out without decreasing the model's effectiveness. An advantage of this approach is that the influence of each customer characteristic is analyzed not alone but along with all of the other characteristics. Thus, this technique takes into account that the importance of some customer characteristics may depend on the state of other customer characteristics (in other words, two or more characteristics may interact to affect the target behavior).
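One way to illustrate the idea is backward elimination: start with every characteristic, then drop any whose removal does not reduce the model's effectiveness. The sketch below is a toy, not a description of any particular product or algorithm. The "model" is a simple majority-vote lookup table scored on the same data it was built from, and the characteristic names (char_a, char_b, char_c) and the data are invented. The data are constructed so that char_a and char_b matter only in combination, which a one-at-a-time correlation analysis would miss.

```python
from collections import Counter, defaultdict


def table_accuracy(rows, target, features):
    """In-sample accuracy of a majority-vote lookup table keyed on features."""
    votes = defaultdict(Counter)
    for row in rows:
        key = tuple(row[f] for f in features)
        votes[key][row[target]] += 1
    # Each key predicts its most common target value.
    correct = sum(max(counter.values()) for counter in votes.values())
    return correct / len(rows)


def backward_eliminate(rows, target, features):
    """Drop features one at a time while accuracy does not fall below baseline."""
    kept = list(features)
    baseline = table_accuracy(rows, target, kept)
    changed = True
    while changed and len(kept) > 1:
        changed = False
        for feature in list(kept):
            trial = [f for f in kept if f != feature]
            if table_accuracy(rows, target, trial) >= baseline:
                kept = trial
                changed = True
                break
    return kept


# Invented data: purchase happens only when char_a and char_b differ;
# char_c is pure noise.
rows = [
    {"char_a": a, "char_b": b, "char_c": c, "purchased": int(a != b)}
    for a in (0, 1)
    for b in (0, 1)
    for c in (0, 1)
]

kept = backward_eliminate(rows, "purchased", ["char_a", "char_b", "char_c"])
print("kept:", kept)
print("char_a alone:", table_accuracy(rows, "purchased", ["char_a"]))
```

In practice the model would be evaluated on held-out data rather than in-sample, and a real modeling algorithm would replace the lookup table.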
In an example unrelated to marketing, consider a modeler developing a model to estimate the stopping distance of a car. Clearly, whether the road is wet makes a difference. However, its importance may vary depending on the speed of the car, the pressure on the brake pedal, the type of tires, and the type of road. Without considering all these characteristics simultaneously, the modeler has difficulty deciding how large an effect the wet road has on stopping distance.
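The interaction between road condition and speed can be made concrete with the idealized braking formula d = v^2 / (2 * mu * g), where mu is the tire-road friction coefficient. The sketch below is a simplification: the friction values are assumed round numbers chosen for illustration, and reaction time, brake pressure, and tire type are ignored.

```python
G = 9.81  # gravitational acceleration, m/s^2

# Assumed, illustrative friction coefficients (not measured values).
FRICTION = {"dry": 0.7, "wet": 0.4}


def braking_distance(speed_mps, surface):
    """Idealized braking distance in meters: v^2 / (2 * mu * g)."""
    mu = FRICTION[surface]
    return speed_mps ** 2 / (2 * mu * G)


def wet_road_penalty(speed_mps):
    """Extra meters needed to stop on a wet road versus a dry one."""
    return braking_distance(speed_mps, "wet") - braking_distance(speed_mps, "dry")


for speed in (10, 20, 30):
    print(f"{speed} m/s: wet road adds {wet_road_penalty(speed):.1f} m")
```

Because the penalty grows with the square of the speed, no single number describes "the effect of a wet road"; it can only be stated together with the speed, which is the point of analyzing characteristics simultaneously.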
Tuning of Model Parameters
The model development phase of most data-mining processes is a search for an optimal mapping of customer data to a target behavior or characteristic. In database marketing, a typical behavior model will map all or part of the customer data in a data mart to a specific behavior or characteristic. Figure 5 shows the theoretical mapping and its business application.
Models come in many forms, depending on the algorithms or technology used to search for the best parameters. As discussed above, these technologies may be either parametric or non-parametric.
---
Gene Ferruzza may be contacted at gmf@cmsnet.com
---
The final installment of this series will appear in the next edition of
D S * .