
CAN VOTING PATTERNS REALLY BE MINED?
by Ed Colet

The recently concluded Presidential election was especially closely contested. Already, extensive and sophisticated data analyses are generating all sorts of interesting results. These results can be quite influential in determining future campaign strategies, and perhaps even current strategies and policies for governing. As such, one might think that the analysis of voting patterns, and the decision-making that follows it, is similar to the analysis and decision-making that a business routinely conducts to determine future strategic directions. In this column, I argue that this cannot really be the case, and that conclusions about voting patterns should be drawn with caution.

The following results were reported by CNN, based on exit poll data as of 4:30am the morning after the election. There's an interesting split in votes cast for Gore vs. Bush on the basis of whether there was a gun in the household. Among voters in households with a gun, 36% voted for Gore and 61% for Bush. Among voters in households without a gun, 58% voted for Gore and 39% for Bush. There's also an interesting split among women by employment: working women split 58% to 39% for Gore vs. Bush, and non-working women split 44% to 52%. Patterns such as these and many others are typical of the results that representatives of each party will be ferreting out for weeks, and that the media will be reporting and interpreting for days to come.

But it should be noted that basing decisions on purported voting patterns can be risky, because voting, unlike a business transaction, is truly anonymous. A business that wants to set strategy based on consumer purchases can do so because the necessary data are usually available. A business can determine the exact products purchased by a particular individual (especially if the purchase was made by credit card and/or shipped to the purchaser's address). Being able to identify the purchaser makes it possible to determine other characteristics such as gender, age, race, etc. In practice, a business usually does not analyze purchases down to the level of the individual, either because of privacy policies or because it is not valuable to do so. But if there's a requirement to extract this information (e.g. in response to a subpoena), it is certainly possible to do so. Unlike a consumer purchase, an individual's vote is truly private and anonymous. After the voter signs in, what happens in the voting booth once the curtain is closed cannot be linked to the individual voter.

Voting patterns are based on aggregated analysis and exit polls. Aggregated analysis associates known characteristics of a voting district with voting results from that district. A district has a certain profile associated with it, such as income levels (from the tax rolls), gender and race distributions (from census data), family household sizes and ages of children (from school enrollments), and other characteristics. Associating these demographics with voting results is one way to discover patterns. But pockets of truly interesting patterns -- e.g. the breakdown of votes based on whether there is a gun in the household -- may not be visible in the aggregate. This type of information has to come from one-on-one surveys or exit polls.
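To make the limitation of aggregated analysis concrete, here is a minimal sketch in Python. Every number in it is invented for illustration and does not come from any real district or poll; the function names are hypothetical as well. It shows two made-up districts whose totals are identical even though gun-owning and non-gun-owning voters behave very differently in one and identically in the other.

```python
# Hypothetical district: every count below is invented for illustration.
# Each group maps to (number of voters, votes for Gore).
scenario_a = {"gun": (40, 10), "no_gun": (60, 45)}  # sharp split by gun ownership
scenario_b = {"gun": (40, 22), "no_gun": (60, 33)}  # no split at all


def district_totals(groups):
    """Collapse per-group counts into the totals an aggregate analysis sees."""
    return {
        "voters": sum(n for n, _ in groups.values()),
        "gun_voters": groups["gun"][0],
        "gore_votes": sum(gore for _, gore in groups.values()),
    }


def gore_share_by_group(groups):
    """Per-group Gore share: detail that district totals alone do not reveal."""
    return {name: gore / n for name, (n, gore) in groups.items()}


# Both scenarios produce identical district-level totals...
print(district_totals(scenario_a) == district_totals(scenario_b))
# ...even though the within-group voting patterns differ.
print(gore_share_by_group(scenario_a))
print(gore_share_by_group(scenario_b))
```

An analyst who sees only the district totals cannot tell these two scenarios apart, which is why the within-group breakdown has to come from asking voters directly.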

An exit poll is based on asking voters whom they voted for and then asking about personal characteristics. This relies on people volunteering information about their vote, as well as information about their gender, race, age, etc. But beyond these obvious characteristics are more sensitive ones such as income, party affiliation, union membership, sexual orientation, whom they voted for in the past, religious affiliation and whether they attend religious services, position on issues such as abortion -- and this is only a partial list. Also, admitting that there is a gun in your household can be a more sensitive matter, with different social stigmas attached, in the Northeast than in the South, and thus there are regional influences on what people are willing to reveal.

At best, a person might divulge only a few of these characteristics along with how they voted. So one has a "snapshot" view rather than a "composite" picture. Few people would be willing to identify themselves on every one of these attributes. So if you wanted to "drill down" to find out how a person with particular values on each of these attributes voted, it's simply not possible -- you don't have this information. Exit polls also cannot capture information from people who refuse to be polled (non-response bias), and they assume that the people who do agree to be polled are responding truthfully.
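The effect of non-response bias can be sketched with a few lines of arithmetic. The Python below is only an illustration under assumed numbers: the true vote share and the response rates are hypothetical, and the function name is made up for this example. It computes the share a poll would observe when supporters of two candidates agree to be polled at different rates.

```python
def observed_share(true_share, response_rate_a, response_rate_b):
    """Share of respondents backing candidate A.

    true_share: actual fraction of voters who chose candidate A
    response_rate_a: fraction of A voters who agree to be polled
    response_rate_b: fraction of B voters who agree to be polled
    """
    responders_a = true_share * response_rate_a
    responders_b = (1 - true_share) * response_rate_b
    return responders_a / (responders_a + responders_b)


# Hypothetical dead-even race in which A voters are somewhat more willing
# to talk to pollsters (60% vs. 50%). These rates are assumptions.
print(round(observed_share(0.5, 0.6, 0.5), 3))  # prints 0.545

# With equal response rates the poll recovers the true share.
print(observed_share(0.5, 0.5, 0.5))  # prints 0.5
```

In this made-up case a tied race looks like a lead of several points, purely because one group was more willing to answer.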

Controlling for non-response bias so that the sample remains representative, sampling a sufficient number of people, and adjusting for untruthful answers all rely on sophisticated survey techniques. For the most part these techniques are employed to ensure accurate data. But each statistical manipulation or adjustment that is introduced may also introduce a distortion from the true situation -- what vote was cast, and by whom? Because the voting system protects the sanctity of an individual's vote, it is not possible to get to the actual data that one really should be analyzing. Ironically, in a close election, analyzing the data becomes more critical, yet discerning true trends and patterns becomes more difficult. Elections aren't decided on the basis of polls and surveys, and so strategic decisions based on polls should be made with caution.


Ed Colet is the Acting Director of Research at Virtual Gold Inc., responsible for developing analytical methods for data mining and for investigating human factors and usability issues of business intelligence systems. At present, he is in the final stage of completing a doctoral dissertation in the Cognition and Perception program at New York University's Department of Psychology. Ed has also worked for IBM Research at the T.J. Watson Research Center. At IBM, Ed was a member of the group that developed Advanced Scout, the data mining application for NBA teams. His research interests focus on statistical methods and human factors.

For more information, see www.virtualgold.com.
