DATA MINING TAKES A LEAF OUT OF JURASSIC PARK
by Krishnakumar Ramanujam
Recently, I read a report about cities in the US where the police constantly monitored several blocks of streets using automatic surveillance equipment. A few nights ago, I also watched a program on CNN that described an entire city in the UK that was under automatic surveillance. The program went on to talk about the trend of increasing use of automatic surveillance by law enforcement agencies, increasing numbers of walled communities, and increasing numbers of prisons.
These trends suggest that large amounts of multimedia data are being gathered. Using these data efficiently and correctly is, of course, a different question altogether. The book "Jurassic Park" exemplifies what I am talking about.
In this acclaimed novel, later made into a very successful movie, Michael Crichton takes the reader to a fictitious dinosaur theme park located on a remote island. The park has been populated with dinosaurs, thanks to sophisticated genetic engineering technologies. The designers and wardens of the park have paid the utmost attention to the safety and security of visitors, for obvious reasons. Apart from the usual gamut of electrified fences and heavy-duty weaponry, each animal in the park is monitored throughout the day using surveillance cameras, sensors, and other mechanisms -- and this is where the opportunity for data mining entrepreneurs arises. The surveillance data are fed into a central computer, which then spews out answers based on a simple querying mechanism.
The computer software controlling the surveillance system has been designed to answer a very simple question - "Can the system account for the presence of <n> <animals>?", where <n>, referring to the number of animals to be accounted for, and <animals>, referring to the exact species of dinosaur, are user inputs.
Therein lay the flaw in the system. The software designers concentrated on the possibility that some animals would attempt to escape, and hence wanted to alert the user when the system could not account for all the animals that were supposed to be there. The Achilles heel, however, was that this mode of questioning never alerted the user when the system could account for more animals than were supposed to be present.
The park designers never considered the possibility that the number of animals could actually increase due to reproduction (which they thought they had ruled out by keeping only animals of the same sex in the park). Consequently, if there were supposed to be 20 velociraptors in the park, the system would answer "Yes" when asked whether it could account for 20 velociraptors, but it might very well also have answered "Yes" had someone thought to ask whether it could account for 30, or 40, velociraptors (depending on the reproductive efficiency of the velociraptor). Eventually, of course, this flaw in the system, combined with other circumstances, led to the collapse of the park.
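To make the flaw concrete, here is a minimal sketch in Python of the two kinds of question. It is only an illustration, not the software described in the novel: the function names, the status strings, and the figure of 37 tracked velociraptors are all assumptions made up for the example.

```python
from collections import Counter


def can_account_for(tracked, species, n):
    """One-sided query, as in the story: 'Yes' when at least n animals are tracked."""
    return tracked[species] >= n


def audit(tracked, species, expected):
    """Two-sided check (hypothetical): flag a shortfall and a surplus alike."""
    observed = tracked[species]
    if observed < expected:
        return "missing"
    if observed > expected:
        return "surplus"
    return "ok"


# Assumed data: 20 velociraptors were expected, but 37 are actually being tracked.
tracked = Counter({"velociraptor": 37})

answer_20 = can_account_for(tracked, "velociraptor", 20)  # the question the wardens asked
answer_30 = can_account_for(tracked, "velociraptor", 30)  # the question nobody asked
status = audit(tracked, "velociraptor", 20)               # compares observed with expected
```

With these assumed numbers, the one-sided query answers "Yes" for both 20 and 30 animals, so the wardens see nothing wrong, while the two-sided check reports a surplus. The only difference between the two functions is the question being asked of the same data.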
The moral of the story: the longest journey begins with a single step, but it must be the right step. You cannot reach Vegas if you board the flight to LA. To get the right answer, you must start with the right question.
In other words, a computer program for the analysis of surveillance data will inevitably look for unusual conditions. But, also inevitably, its software designers will have made some assumptions about what those unusual conditions are. Security will be breached when those assumptions are violated.
---
You can find out more about data mining via http://www.virtualgold.com