Analysis & Commentary:
SCIENTISTS FIND NEW WAY TO SPOT HACKERS
by Mike Martin
A data mining method long overlooked by the computer security industry may be
the best way to detect clever hackers, a team of researchers from universities
in Pennsylvania and Iowa said.
Rough sets, the researchers claim, are the technique of choice to identify
footprints left by computer intruders. They are a form of data mining, the
automated extraction of hidden predictive information from databases, commonly
used to detect hackers.
"Intrusion detection systems help network administrators prepare for and deal
with network security attacks," Pennsylvania State University information
science professor and team member Chao-Hsien Chu told United Press
International from University Park. "These systems collect information from
network sources and analyze them for signs of intrusion and misuse."
The team's study is the first to evaluate and compare multiple data mining
methods for intrusion detection, Chu said.
Computer security breaches occur with alarming frequency. Within the past
three years, denial-of-service attacks have shuttered Yahoo, Amazon, eBay,
Datek and eTrade.
"The number of attacks is doubling every year and the General Accounting
Office estimates that only 1 to 4 percent of these attacks will be detected,"
Chu said.
Chu's team used a simple e-mail program -- Unix sendmail -- to study the
performance of three data mining methods that can be used to detect network
attacks: inductive learning, neural networks, and rough sets.
Inductive learning occurs when a subject infers general laws from particular
examples. Neural networks allow a computer to learn from input data. Rough
sets are mathematical algorithms that take the automated learning process a
step further -- they interpret the uncertain, vague, or imprecise information
that unauthorized network intruders may leave behind to avoid detection.
With every object, we associate information or data that may be precise or
imprecise. If objects are patients suffering from a certain disease, for
instance, symptoms of the disease form information about the patients. That
information may be precise -- the patient has the flu -- or imprecise -- the
patient has a viral infection.
Rough sets replace each vague concept with a pair, or set, of precise concepts
-- a lower and the upper approximation of the imprecise term. The imprecise
concept of a viral disease might be replaced, for instance, by a set of two
precise concepts that roughly approximate the patient's true condition -- the
patient has a cold and the patient has a stomach ache.
Such increasingly precise approximation allows one to rule out other precise
concepts that do not approximate the patient's condition. AIDS and viral
hepatitis, while precise concepts, do not approximate the condition of a
patient with the flu. Rough sets rule out inaccurate data -- from any database
-- using this method of rough approximation.
While neural network and inductive learning technologies have been used
successfully to detect intruders, rough sets have been overlooked, a fact that
makes the team's conclusion all the more remarkable.
"Within data mining methods, rough sets provide better accuracy, followed by
neural networks and inductive learning," Chu told UPI. Rough sets are
approximately 75 percent accurate, neural nets approximately 70 percent
accurate, and inductive learning about 51 percent accurate in ferreting out
intruders.
Smarter methods such as rough sets will replace commercially available
watchdog systems that depend on traditional statistical techniques, Chu said.
Such traditional techniques allow a network administrator to detect a simple
intrusion.
"If you work for a company that's open from 8 to 5 and you see unexplained
network activity increase substantially at 5:05, that may be an intruder's
footprint," University of Missouri professor and computer science chairperson
Harry Tyrer told UPI from Columbia.
Seasoned hackers, however, rarely leave such clear footprints.
"Rough sets are starting to come up as being important for this reason," Tyrer
said. Intrusion evidence may be spotty at best and "rough sets allow you to
work with that kind of incomplete information," he explained.
The study appears in the February issue of the journal Decision Sciences.
|