THE POTENTIAL FOR FLAWED REASONING IN THE CONTEXT OF DM
by Ed Colet

Interpretation is an important aspect of data mining. The analysis of large datasets can generate numerous patterns, any of which may be important. A careful interpretation of these results on the part of the end-user/analyst is often warranted. The process of interpreting patterns often leads to more questions, so a sequence of follow-up queries is issued and drill-down reports are generated. Thus a chain of reasoning is built. But just as a chain is only as strong as its weakest link, a flawed step in reasoning can adversely affect the conclusions that are drawn and the actions that are taken. In this column I look at some errors in reasoning that people frequently and unknowingly commit, and why.

Logical reasoning deals with the issue of whether conclusions logically follow from the set of premises. A valid conclusion is one that logically follows from the premises. What can or cannot logically follow is determined by the formal rules of logic. Note that whether an argument is valid or not is independent of the truth or falsehood of the premises or conclusions. Research has shown that people are often not very good at following the formal rules of logic when they undertake deductive or inferential reasoning.

The news about flawed reasoning is not all bad though. While people may not do well when reasoning in the abstract (i.e. with symbols and letters), they can do quite well when the same task and argument structure is presented in a context or domain they're familiar with. Thus a domain expert interpreting data mining patterns can do quite well. But when an analyst looks at data mining patterns in an unfamiliar domain, such as when data mining is used to explore a data space the analyst does not know, subsequent reasoning may be problematic.

A syllogism that requires the rule of inference called modus ponens is one type of argument that people follow quite well. The argument takes the form "A implies B, and given A, then we can infer B" (denoted herein as: "A then B; A; conclude B"). The conclusion easily and logically follows from the premises. In the context of data mining, a discovered pattern in this form may be: If a customer is aware of the sale/promotion for cheese, then the customer will buy cheese. The customer sees the sale on cheese. Therefore you conclude that a purchase will follow.

While people have little trouble with modus ponens, the rule of modus tollens is often not applied as well. The modus tollens rule states that "A then B; not B; conclude not A". The error that people often make is thinking that A, rather than only not A, remains a possible conclusion. In the context of our data-mining example, this argument would take the following form: If a customer has seen the promotion on cheese, then the customer will make a purchase. The customer did not make a purchase. The logical conclusion is that the customer did not see the promotion. The analyst/store manager may mistakenly reason that the customer could still have seen the promotion (which is inconsistent with the premises), and take other actions rather than enhancing awareness of the sale on cheese.
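The two rules above, and the error described, can be checked mechanically with a truth table. The following is a minimal sketch, not part of the original column: the helper names `implies` and `is_valid` are illustrative assumptions. It treats an argument form as valid when no assignment of true/false to A and B makes every premise true and the conclusion false.

```python
from itertools import product


def implies(a, b):
    """Material implication: "A then B" is false only when A is true and B is false."""
    return (not a) or b


def is_valid(premises, conclusion):
    """Return True if no truth assignment makes all premises true and the conclusion false."""
    for a, b in product([True, False], repeat=2):
        if all(p(a, b) for p in premises) and not conclusion(a, b):
            return False  # found a counterexample to the argument form
    return True


# The shared first premise: "A then B".
rule = lambda a, b: implies(a, b)

# Modus ponens: "A then B; A; conclude B".
modus_ponens = is_valid([rule, lambda a, b: a], lambda a, b: b)

# Modus tollens: "A then B; not B; conclude not A".
modus_tollens = is_valid([rule, lambda a, b: not b], lambda a, b: not a)

# The error discussed above: "A then B; not B; conclude A".
tollens_error = is_valid([rule, lambda a, b: not b], lambda a, b: a)

print("modus ponens valid: ", modus_ponens)
print("modus tollens valid:", modus_tollens)
print("'not B, so A' valid:", tollens_error)
```

Note that the check never asks whether the premises are actually true of any customer; it only asks whether the conclusion follows from them, which is the sense of "valid" used in this column.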

Failing to reason logically can affect a sequence of reasoning in hypothesis testing. This has been demonstrated using a test called the Wason card selection task. A rule such as "if a card has a vowel on one side, then it has an even number on the other" is presented for evaluation. Four cards are laid on the table showing, for example, an "E", "K", "4", "7". Which cards are the only ones that need to be turned over to judge the correctness of the rule? The answer is the "E" and the "7". To test the rule, one has to look for a violation of it, and these two cards are the only ones that can reveal one: the "E" may have an odd number on the other side, and the "7" may have a vowel on the other side. Turning over the "4" to reveal either a vowel or a consonant would not falsify the rule. Turning over the "K" also provides no information on the validity of the rule.
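The card selection can be worked through in a few lines of code. This is a rough sketch under stated assumptions: every card is assumed to have a letter on one side and a number on the other, and the function names (`violates`, `must_turn`) are made up for illustration. A card needs to be turned over only if some possible hidden face would violate the rule.

```python
VOWELS = set("AEIOU")


def is_vowel(face):
    return face.isalpha() and face.upper() in VOWELS


def is_even_number(face):
    return face.isdigit() and int(face) % 2 == 0


def violates(letter, number):
    """The rule "vowel implies even number" is broken only by a vowel paired with an odd number."""
    return is_vowel(letter) and not is_even_number(number)


def must_turn(visible):
    """A card is worth turning only if some hidden face could expose a violation."""
    if visible.isalpha():
        # Assumption: the hidden side is a number; one even and one odd stand in for all cases.
        return any(violates(visible, hidden) for hidden in ["4", "7"])
    # Assumption: the hidden side is a letter; one vowel and one consonant stand in for all cases.
    return any(violates(hidden, visible) for hidden in ["E", "K"])


cards = ["E", "K", "4", "7"]
to_turn = [card for card in cards if must_turn(card)]
print(to_turn)  # ['E', '7']
```

The "4" drops out because neither a vowel nor a consonant behind it breaks the rule, and the "K" drops out because the rule says nothing about consonants.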

In the context of our cheese scenario, imagine that the analyst wants to test the hypothesis that if customers are aware of the sale on cheese, then they will make a purchase. The way to test this is to issue a query or drill-down report to find (a) whether any customers who saw the sale did not make a purchase; and (b) whether any customers who did not make a purchase were aware of the sale. These correspond to turning over the "E" and the "7" in the card task. Quite often, other reports or queries are issued unnecessarily, or the appropriate queries or reports are not made. If so, then the string of queries that makes up the chain of reasoning is not sound.
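The two drill-downs can be sketched as filters over customer records. Everything in this example is hypothetical: the `Customer` fields, the function names, and the four toy records are invented purely for illustration and do not come from any real dataset or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: int
    saw_promotion: bool
    bought_cheese: bool


# Invented toy records, one for each combination of awareness and purchase.
customers = [
    Customer(1, saw_promotion=True, bought_cheese=True),
    Customer(2, saw_promotion=True, bought_cheese=False),
    Customer(3, saw_promotion=False, bought_cheese=True),
    Customer(4, saw_promotion=False, bought_cheese=False),
]


def aware_but_did_not_buy(records):
    """Query (a): start from customers who saw the sale, keep those who did not purchase."""
    aware = [c for c in records if c.saw_promotion]
    return [c for c in aware if not c.bought_cheese]


def non_buyers_who_were_aware(records):
    """Query (b): start from customers who did not purchase, keep those who saw the sale."""
    non_buyers = [c for c in records if not c.bought_cheese]
    return [c for c in non_buyers if c.saw_promotion]


violations_a = aware_but_did_not_buy(customers)
violations_b = non_buyers_who_were_aware(customers)

# The hypothesis survives only if neither drill-down finds a counterexample.
hypothesis_holds = not violations_a and not violations_b

print([c.customer_id for c in violations_a])
print([c.customer_id for c in violations_b])
print("hypothesis holds:", hypothesis_holds)
```

In this toy data, customers 1, 3 and 4 cannot falsify the hypothesis whatever else is learned about them, which is why reports on buyers or on unaware customers add nothing to the test.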

What underlies much of the difficulty in logical reasoning is a "confirmatory bias" that people harbor. It's easier to think in terms of positive, rather than negative, information. And there is a tendency to seek confirmation of a hypothesis, or to weigh confirmatory evidence more heavily than information that would disconfirm it. While it's always reassuring to be told that one's right, it may be more enlightening to find out when one's wrong.


Ed Colet is the Acting Director of Research at Virtual Gold Inc., responsible for developing analytical methods for data mining and for investigating human factors and usability issues of business intelligence systems. At present, he is in the final stage of completing a doctoral dissertation in the Cognition and Perception program at New York University's Department of Psychology. Ed has also worked for IBM Research at the T.J. Watson Research Center. At IBM, Ed was a member of the group that developed Advanced Scout, the data mining application for NBA teams. His research interests focus on statistical methods and human factors.

For more information, see www.virtualgold.com
