CAN VOTING PATTERNS REALLY BE MINED?
by Ed Colet
The recently concluded Presidential election was especially closely
contested. Already, extensive and sophisticated data analyses are
generating all sorts of interesting results. These results can be
quite influential in determining future strategies for campaigning, and
perhaps even current strategies and policies for governing. As such, one
might think that the analysis of voting patterns and the decision-making that
follows such analysis is similar to the analysis and decision-making that a
business routinely conducts to determine future strategic directions. In this
column, I argue that this cannot really be the case, and that conclusions
about voting patterns should be drawn with caution.
The following results were reported by CNN, based on exit poll data as of
4:30am the morning after the election.
There's an interesting split in votes cast for Gore vs. Bush on the basis of
whether there was a gun in the household. In households with a gun, 36%
voted for Gore and 61% voted for Bush. In households without a gun, 58%
voted for Gore and 39% for Bush. There's also an interesting split among
women by employment. Working women split 58% to 39% for Gore vs. Bush, and
non-working women split 44% to 52% for Gore vs. Bush. Patterns such as these
are typical of the results that representatives of each party will be
ferreting out for weeks, and that the media will be reporting and
interpreting for days to come.
But making decisions based on purported voting patterns could be risky,
because voting is by its very nature anonymous -- unlike transactions in a
business setting. If a business wants to determine a strategy based on
consumer purchases, it can do so because the necessary data are
usually available. A business can determine the exact products purchased by a
particular individual (especially if paid for by credit card and/or shipped to
the purchaser's address). Being able to identify the purchaser makes it
possible to determine other characteristics such as gender, age, race, etc. In
practice, a business usually does not analyze purchases down to the level of
the individual due to privacy policies and/or the fact that it's not valuable
to do so. But if there's a requirement to extract this information (e.g. in
response to a subpoena) it is certainly possible to do so. Unlike a consumer
purchase, an individual's vote is truly private and anonymous. Once a voter
has signed in, what happens in the voting booth after the curtain is closed
cannot be linked to that individual.
Voting patterns are based on aggregated analysis and exit polls. Aggregated
analysis associates known characteristics of a voting district with voting
results from that district. A district has a certain profile associated with
it such as income levels (from the tax rolls), gender and race distributions
(from census data), family household sizes and age of children (from school
enrollments), and other characteristics. Associating these demographics with
voting results is one way to discover patterns. But pockets of truly
interesting patterns -- e.g. the breakdown of votes based on whether there is
a gun in the household -- may not be visible in the aggregate. This type of
information has to come from one-on-one surveys or exit polls.
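To illustrate why a district tally cannot reveal a subgroup breakdown, here is
a minimal sketch. Every number in it is hypothetical: an invented district of
1,000 voters, 400 of whom live in households with a gun, that splits evenly
between the two candidates. The function name consistent_splits is likewise
made up for this example.

```python
# All numbers are hypothetical and chosen only for illustration.
GUN_HOUSEHOLD_VOTERS = 400      # voters living in households with a gun
NO_GUN_HOUSEHOLD_VOTERS = 600   # voters living in households without a gun
GORE_TOTAL = 500                # district-wide Gore votes; Bush gets the other 500


def consistent_splits(step=100):
    """List (gore_gun, gore_no_gun) pairs that reproduce the district total."""
    splits = []
    for gore_gun in range(0, GUN_HOUSEHOLD_VOTERS + 1, step):
        gore_no_gun = GORE_TOTAL - gore_gun
        # Keep only splits that fit inside the non-gun household population.
        if 0 <= gore_no_gun <= NO_GUN_HOUSEHOLD_VOTERS:
            splits.append((gore_gun, gore_no_gun))
    return splits


for gore_gun, gore_no_gun in consistent_splits():
    gun_share = gore_gun / GUN_HOUSEHOLD_VOTERS
    no_gun_share = gore_no_gun / NO_GUN_HOUSEHOLD_VOTERS
    print(f"Gore share: {gun_share:.0%} of gun households, "
          f"{no_gun_share:.0%} of non-gun households")
```

Every split listed produces the same 500-500 district result, so the tally
alone cannot distinguish a district where gun households went entirely for
Bush from one where they went entirely for Gore.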
An exit poll is based on asking voters whom they voted for and then asking
about personal characteristics. This relies on people volunteering
information about their vote, as well as information about their gender, race,
age, etc. But beyond these obvious characteristics are more sensitive ones
such as income, party affiliation, union membership, sexual orientation, who
they voted for in the past, religious affiliation and whether they attend
religious services, position on issues such as abortion -- and this is only a
partial list. Also, admitting that there is a gun in your household may carry
a different social stigma in the Northeast than in the South, and thus there
are regional influences on what people are willing to reveal.
At best, a person might divulge only a few of these characteristics along
with how they voted. So one has a "snapshot" view rather than a "composite"
picture. Few people would be willing to identify themselves on every one of
these attributes. So if you wanted to "drill down" to find out how a person
with particular values on each of these attributes voted, it's simply not
possible -- you don't have this information. Exit polls also cannot capture
information from people who refuse to be polled (non-response bias), and they
assume that the people who do agree to be polled are responding truthfully.
Controlling for non-response bias to still ensure a representative sample,
sampling a sufficient number of people, and adjusting for non-truthful
answers, all rely on sophisticated survey techniques. For the most part these
techniques are employed to ensure accurate data. But each statistical
manipulation or adjustment that is introduced may also introduce a distortion
from the true situation -- what vote was cast, and by whom? Because the voting
system is one that protects the sanctity of an individual's vote, it is not
possible to get to the actual data that one really should be analyzing.
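The effect of non-response bias, and of adjusting for it, can be sketched with
a little arithmetic. The figures below are hypothetical: an evenly split
precinct in which Gore voters agree to be polled 60% of the time and Bush
voters 40% of the time. The adjustment shown is a simple inverse-response-rate
weighting, used here only as one illustrative technique, not as a description
of what any actual polling organization did. The function names are invented
for this example.

```python
def observed_share(true_a, true_b, response_rate_a, response_rate_b):
    """Share of poll respondents who report voting for candidate A."""
    respondents_a = true_a * response_rate_a
    respondents_b = true_b * response_rate_b
    return respondents_a / (respondents_a + respondents_b)


def adjusted_share(observed_a_share, assumed_rate_a, assumed_rate_b):
    """Reweight the observed share by the inverse of ASSUMED response rates."""
    weighted_a = observed_a_share / assumed_rate_a
    weighted_b = (1 - observed_a_share) / assumed_rate_b
    return weighted_a / (weighted_a + weighted_b)


# Hypothetical precinct: 500 Gore voters, 500 Bush voters (true share 50%).
raw = observed_share(500, 500, response_rate_a=0.6, response_rate_b=0.4)

# The adjustment recovers the truth only if the assumed rates are correct.
good_guess = adjusted_share(raw, assumed_rate_a=0.6, assumed_rate_b=0.4)
bad_guess = adjusted_share(raw, assumed_rate_a=0.5, assumed_rate_b=0.4)

print(f"raw poll: {raw:.1%}")
print(f"adjusted with correct rates: {good_guess:.1%}")
print(f"adjusted with wrong rates: {bad_guess:.1%}")
```

The raw poll overstates Gore's share at 60%. Weighting with the correct
response rates brings it back to 50%, but the true rates are exactly what the
pollster cannot observe, and a wrong guess leaves a distorted estimate.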
Ironically, in a close election, analyzing data becomes more critical, but
discerning true trends and patterns becomes more difficult. Elections aren't
decided on the basis of polls and surveys, and so strategic decisions based
on polls should be made with caution.
Ed Colet is the Acting Director of Research at Virtual Gold Inc.,
responsible for developing analytical methods for data mining and for
investigating human factors and usability issues of business intelligence
systems. At present, he is in the final stage of completing a doctoral
dissertation in the Cognition and Perception program at New York
University's Department of Psychology. Ed has also worked for IBM Research
at the T.J. Watson Research Center. At IBM, Ed was a member of the group
that developed Advanced Scout, the data mining application for NBA teams.
His research interests focus on statistical methods and human factors.
For more information, see www.virtualgold.com.