
IBM DATA MINING: AN INTERVIEW WITH DAVID MARTIN                      10.21.97
by Alan Beck, editor in chief                                          D S *

To learn more about IBM's perspectives on data mining, D S * interviewed David Martin, Development Manager for IBM's net.Mining Solutions group.

---

D S * : CIOs are faced with enormous numbers of complex profiles. How can they position their technologies to efficiently ferret out provocative behaviors?

MARTIN: "It is vital to look at _all_ the methods and means by which a company interacts with its customers, whether that's through warranty cards, inbound/outbound call centers, the Internet, face-to-face meetings, etc., as a component of a multichannel customer relationship management solution. Information collected, analyses produced, and results of various campaigns can be leveraged throughout the organization regardless of where information originally comes from or for what purpose an analysis was done.

"Today many are struggling with the ability of their IT infrastructure to even support customer relationship management, but as an appropriately clear vision of it is communicated down the ranks of an IT organization as well as up and out across that organization, IT's role in synergizing that data will grow, so that the stored knowledge can be both leveraged and increased."

D S * : With the tremendous proliferation in both number of profiles and aspects per profile, can DB dimensions be ascertained a priori?

MARTIN: "It's difficult. As you set up each individual repository that your IT community wishes to support for each group of users, you should provide an architecture strategy oversight group that can review ongoing developments, not with the intent to force that evolving structure into an enterprise-level structure, but with the perspective of looking for opportunities to synergize the subcomponents and unify them where appropriate.

"There are two technical aspects important in this context. One is technology that provides access to heterogeneous databases. Several companies, including IBM, are now providing such resources. Second is the utilization of technical aptitudes of IT personnel to assist the rest of the organization in identifying the opportunities for synergy and then providing suitable support so those disparate islands can be brought closer together.

"Recently, the industry has seen a strong push for "enterprise data marts," which is really that mechanism bubbling up from the bottom rather than being guided from above. The type of characteristics promulgated throughout an organization can only be adequately ascertained through analysis at a high level by trained staff, e.g. statisticians, object modeling experts, and those who thoroughly understand the business domains of the various segments."

D S * : Once the dimensions have been determined, how can they be efficiently tested?

MARTIN: "One technique, called attribute focusing, was developed at IBM by Inderpal Bhandari and has subsequently become quite well known. Here you utilize different information repositories, looking at key attributes of interest to a specific population. Then you attempt to run down all of the potentially correlatable attributes from external sources, looking for deviations from the norm.

"In some respects, this is the reverse of what IBM's FAMS (Fraud and Abuse Management System) employs in order to identify deviations from the expected norm. Attribute focusing looks for attributes that appear to have high correlation, without any knowledge going into the effort to identify those that could be codependent. Many of those aspects are based upon statistical modeling, but the application of those models is carried out in a manner that was not previously considered.

"A second method is providing to trained machine-learning or statistical analysts segments of data representing populations that a component of the organization considers exemplary of either positive or negative occurances. Such exemplars should be cross-analyzed with exemplars from other organizations in a search for the same types of correlations between organizations. Hopefully, the identification of such correlations will elicit specific attributes that can then provide good indications of why those exemplars are representative of those specific subpopulations.

"A number of different, principally statistical techniques can implement this method. Principal components analysis is one; clustering is another. They can be used together to tease out the most distinctive attributes for the exemplars from the subpopulations."

D S * : FAMS emphasizes fuzzy logic in the assignment of scores. Why?

MARTIN: "Fuzzy logic provides for an item to not necessarily be in or out of a given set of elements. Lotfi Zadeh, who developed the technique at Berkeley, compares it to a state of being not quite or partially in a set. FAMS looks at the deviations from the norm for a given set for some attribute of the population. In the case of health care, for example, it reviews how many procedures of a certain type were performed during a given period in conjunction with other attributes such as where an individual went to school, the number of years in practice, location of practice, etc.

"From a fuzzy logic perspective, this can provide partial, complete, or no membership for any given physician or practice group to be in or out of a set having minimum membership criteria. For example, you may have fifteen numeric attributes reflecting the practice of a given physician. By producing Gaussian or other types of curves representing the norm across the entire population, I can selectively set a threshold, say one or two standard deviations, and then identify individuals who might exceed certain of those measures. Through fuzzy logic, you can come up with an entire subset of the entire population who exceed overall the criteria for membership. For instance, someone might have seven out of ten possible attributes exceeding the set norm. Hence the subpopulation that exceeds those seven would be viewed as potentially fraudulent practices.

"Of course, the very same effort could be applied to a wellness perspective: practices with a high value for a low number of people admitted to hospital with a particular disease could be applied for review of other practices with characteristics falling within the ranges of that criteria. Some of these might be, for example, the number of annual physicals performed given a population base. That type of preventive care might be highly correlated with avoiding lengthy hospital stays.

"Essentially, you're building a soft description of the characteristic set you're looking for. Fuzzy logic is used to populate that bucket with those who meet some, if not all, of the ranges of those criteria."

D S * : How does IBM utilize such strategies within its own organization?

MARTIN: "IBM has a real focus on business intelligence. IBM feels business intelligence is a strategic component of IT: it is not a fad or flash in the pan. From the glass houses where we process payrolls, claims, and data allowing the organization to automate its underlying clerical infrastructure, IBM sees business intelligence as a means to help customers move forward by assisting them with automation of analytical or knowledge components.

"Thus, the expertise of a single individual can be leveraged by a tool such as IBM's Intelligent Miner or a software package like FAMS or by its Business Discovery Solutions and then disseminated out to the entire organization through applications like Lotus Notes and Domino, so the entire organization can actually see the benefits of automation of knowledge generation and management instead of simply clerical issues. We see business intelligence as a key enabling technology that is critical to the life-blood of corporations' infomation technology.

"IBM itself already sees the value-added benefits of having this type of knowledge management infrastructure in place in their efforts at customer care and customer relationship management. Of course, the first to appreciate this are organizations with very large IT budgets, since they are used to doing analyses against their data for actuarial or risk evaluations. They see that customer acquisition, care and retention are vital to the growth of a base of profitable customers. And IBM allows itself to organize itself in a way that most effectively meets criteria that customers are looking for, regardless of how underlying markets might shift."


Alan Beck is editor in chief of D S * and vice president of publications for Tabor Griffin Communications. Comments are always welcome and should be directed to alan@tgc.com
