
WHEN THINGS ARE INCONCLUSIVE AFTER YEARS OF STUDY - PERHAPS TRY A DATA MINING APPROACH
by Ed Colet


There are many situations in which years of effort and tremendous financial resources have been devoted to the careful study of a specific issue, only to conclude that little can be concluded. For every study claiming to show an effect, there is another study pointing out the lack of an effect, and thus we arrive at a lack of consensus. If the issue is important enough (e.g. a public health concern), further studies are commissioned. But if skilled and talented minds have devoted themselves to such issues, why does such a situation persist? Perhaps it's because in situations where effects are truly subtle (but important), traditional analytical approaches are limited. If so, a data mining approach may be the alternative that yields real knowledge and insight. In this column, I expand on this notion by discussing it in the context of a possible link between cell phone use and brain cancer.

The issue of cell phones and brain cancer is an example of a situation marked by a lack of agreement about whether there is anything to worry about. After reading about this issue (most recently in the NY Times, 10/26/1999, "Cell phones: Questions but no answers", page F8), I believe a data mining approach would be helpful. Perhaps we would then see a future headline reading, "Cell phones: Clear answers to some questions".

As reported in the NY Times, Wireless Technology Research (WTR), an independent research group sponsored by the wireless phone industry's trade association, has completed a $27 million, six-year study of whether there is a causal link between cell phone use and the occurrence of brain cancer. WTR's mandate ends in December of this year. The WTR Director claims that there are enough alarms to advise consumers to distance themselves from cell phones, and that the public should be informed about possible risks via a "consumer information package", combined with an extensive after-marketing research program to collect more data. This is in contrast to the wireless industry's stance of assuring the public that cell phones are safe and that no immediate action is necessary. Needless to say, the industry is against making any policy that would risk alarming the public unnecessarily, especially since much of the research should first be scrutinized through the scientific peer review process. The industry is in favor of conducting more studies.

There may be a link between cell phones and brain cancer, but numerous factors are probably involved. It is difficult to show causation from correlation, and in this case the correlations themselves are not strong and may be moderated by numerous intervening variables. Examples of variables that may confound things and make results difficult to interpret include the particular wireless technology used (which differs between the US and Europe), the different ways phones are used, and differences among users (age, gender, physical health, and socio-economic status can all predispose one to medical conditions). Given numerous possible confounds, the traditional reductionist approach is to design experiments that control or eliminate the possibility of confounding (e.g. use only a specific phone model, match experimental and control groups on user characteristics, etc). But if the effect is subtle and moderated by other variables (which are now excluded), then this approach may not find the link. In statistical terms, there is insufficient power to detect an effect that exists. Therefore, conducting more traditional studies along these lines may not be productive.
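To make the point about power concrete, here is a minimal sketch of an approximate power calculation for comparing two proportions. The baseline rate, the relative risk of 1.5, and the group sizes are hypothetical values chosen only for illustration; they are not figures from the WTR study or any other study. The function name power_two_proportions is likewise an assumption of this sketch, and the normal approximation it relies on is rough when the expected number of cases is very small.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_exposed, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the normal approximation and ignores the (tiny) chance of
    rejecting in the direction opposite to the true effect.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)

    # Standard error under the null hypothesis (pooled proportion).
    p_bar = (p_control + p_exposed) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)

    # Standard error under the alternative (separate proportions).
    se_alt = sqrt(
        p_control * (1 - p_control) / n_per_group
        + p_exposed * (1 - p_exposed) / n_per_group
    )

    diff = abs(p_exposed - p_control)
    return 1 - std_normal.cdf((z_crit * se_null - diff) / se_alt)


# Hypothetical scenario: a rare outcome and a modest relative risk.
BASELINE_RATE = 0.0001              # assumed: 1 case per 10,000 people
EXPOSED_RATE = BASELINE_RATE * 1.5  # assumed relative risk of 1.5

for n in (10_000, 100_000, 1_000_000):
    power = power_two_proportions(BASELINE_RATE, EXPOSED_RATE, n)
    print(f"n per group = {n:>9,}: approximate power = {power:.2f}")
```

Under these assumed numbers, power stays low until the groups become very large, which is the sense in which a tightly controlled study of modest size can miss a subtle effect that really exists.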

An alternative methodology would be an expansionist rather than a reductionist approach: collect data on numerous possibly important variables. This could be done as part of the "after-marketing data collection" proposal of the WTR. Some data can even be collected automatically (type of phone, time and length of call, etc). Coupled with user-profile data (age, gender, etc), this can result in a tremendously rich data set for mining. Amassing a large amount of data over a long time period and subjecting it to data mining analysis may uncover interesting patterns (e.g. right-handed users have tumors on the right side). Well-designed data mining technology can easily handle the large amount of data generated by numerous users, and tease out the nature of complex interactions among variables. Doing so may shed more definitive light on the safety of cell phones.
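As a toy illustration of how an interaction can hide in pooled data, the sketch below compares a pooled risk ratio with risk ratios computed separately for each phone type. All counts are invented for this example and are not real data; the labels type_a and type_b and the heavy-user flag are hypothetical, and a real data mining tool would search over many such variables rather than one chosen in advance.

```python
from collections import defaultdict

# Hypothetical counts, invented purely for illustration (not real data).
# Each row: (phone_type, heavy_user, cases, people)
RECORDS = [
    ("type_a", True, 30, 100_000),
    ("type_a", False, 15, 100_000),
    ("type_b", True, 45, 300_000),
    ("type_b", False, 45, 300_000),
]


def risk_ratio(rows):
    """Risk among heavy users divided by risk among light users."""
    cases = defaultdict(int)
    people = defaultdict(int)
    for _phone_type, heavy_user, n_cases, n_people in rows:
        cases[heavy_user] += n_cases
        people[heavy_user] += n_people
    return (cases[True] / people[True]) / (cases[False] / people[False])


# Risk ratio when phone type is ignored.
pooled = risk_ratio(RECORDS)

# Risk ratio within each phone type (the moderating variable).
by_type = {
    phone_type: risk_ratio([r for r in RECORDS if r[0] == phone_type])
    for phone_type in sorted({r[0] for r in RECORDS})
}

print(f"pooled risk ratio: {pooled:.2f}")
for phone_type, ratio in by_type.items():
    print(f"{phone_type} risk ratio: {ratio:.2f}")
```

In this made-up data set the effect is confined to one phone type, so the pooled ratio is diluted, and a study restricted to the other type would see no effect at all. Recording the moderating variable, rather than excluding it, is what lets the pattern surface.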


Ed Colet is the Acting Director of Research at Virtual Gold Inc., responsible for developing analytical methods for data mining and for investigating human factors and usability issues of business intelligence systems. At present, he is in the final stage of completing a doctoral dissertation in the Cognition and Perception program at New York University's Department of Psychology. Ed has also worked for IBM Research at the T.J. Watson Research Center. At IBM, Ed was a member of the group that developed Advanced Scout, the data mining application for NBA teams. His research interests focus on statistical methods and human factors.

For more information, see http://www.virtualgold.com.

