Next Article Table of Contents Previous Article

Analysis & Commentary:

SCIENTISTS FIND NEW WAY TO SPOT HACKERS
by Mike Martin

A data mining method long overlooked by the computer security industry may be the best way to detect clever hackers, a team of researchers from universities in Pennsylvania and Iowa said.

Rough sets, the researchers claim, are the technique of choice to identify footprints left by computer intruders. They are a form of data mining, the automated extraction of hidden predictive information from databases, commonly used to detect hackers.

"Intrusion detection systems help network administrators prepare for and deal with network security attacks," Pennsylvania State University information science professor and team member Chao-Hsien Chu told United Press International from University Park. "These systems collect information from network sources and analyze them for signs of intrusion and misuse."

The team's study is the first to evaluate and compare multiple data mining methods for intrusion detection, Chu said.

Computer security breaches occur with alarming frequency. Within the past three years, denial-of-service attacks have shuttered Yahoo, Amazon, eBay, Datek and eTrade.

"The number of attacks is doubling every year and the General Accounting Office estimates that only 1 to 4 percent of these attacks will be detected," Chu said.

Chu's team used a simple e-mail program -- Unix sendmail -- to study the performance of three data mining methods that can be used to detect network attacks: inductive learning, neural networks, and rough sets.

Inductive learning occurs when a subject infers general laws from particular examples. Neural networks allow a computer to learn from input data. Rough sets are mathematical algorithms that take the automated learning process a step further -- they interpret the uncertain, vague, or imprecise information that unauthorized network intruders may leave behind to avoid detection.

With every object, we associate information or data that may be precise or imprecise. If objects are patients suffering from a certain disease, for instance, symptoms of the disease form information about the patients. That information may be precise -- the patient has the flu -- or imprecise -- the patient has a viral infection.

Rough sets replace each vague concept with a pair, or set, of precise concepts -- a lower and the upper approximation of the imprecise term. The imprecise concept of a viral disease might be replaced, for instance, by a set of two precise concepts that roughly approximate the patient's true condition -- the patient has a cold and the patient has a stomach ache.

Such increasingly precise approximation allows one to rule out other precise concepts that do not approximate the patient's condition. AIDS and viral hepatitis, while precise concepts, do not approximate the condition of a patient with the flu. Rough sets rule out inaccurate data -- from any database -- using this method of rough approximation.

While neural network and inductive learning technologies have been used successfully to detect intruders, rough sets have been overlooked, a fact that makes the team's conclusion all the more remarkable.

"Within data mining methods, rough sets provide better accuracy, followed by neural networks and inductive learning," Chu told UPI. Rough sets are approximately 75 percent accurate, neural nets approximately 70 percent accurate, and inductive learning about 51 percent accurate in ferreting out intruders.

Smarter methods such as rough sets will replace commercially available watchdog systems that depend on traditional statistical techniques, Chu said. Such traditional techniques allow a network administrator to detect a simple intrusion.

"If you work for a company that's open from 8 to 5 and you see unexplained network activity increase substantially at 5:05, that may be an intruder's footprint," University of Missouri professor and computer science chairperson Harry Tyrer told UPI from Columbia.

Seasoned hackers, however, rarely leave such clear footprints.

"Rough sets are starting to come up as being important for this reason," Tyrer said. Intrusion evidence may be spotty at best and "rough sets allow you to work with that kind of incomplete information," he explained.

The study appears in the February issue of the journal Decision Sciences.

Top of Page


Previous Article  |  Table of Contents  |  Next Article