Intrusion detection is a major concern in information security. It aims to
detect exploitation of system vulnerabilities by known legitimate users or by
unknown illegitimate users. If the system being exploited belongs to the
financial sector or to any mission-critical organization, the consequences for
the organization could be catastrophic. A well-known instance of masquerader
activity dates back to Nov 2002, when 30,000 credit histories were stolen by a
helpdesk employee and sold to hackers, who logged in as legitimate users and
downloaded all credit reports. The information in those reports was used to
withdraw money from bank accounts and to carry out illegal transactions using
the victims' credit cards. It is considered one of the most alarming cases in
US history and caused major losses to a large number of users. Whenever an
illegitimate user logs into a system, that user has the potential to inflict
tremendous economic loss on the organization.
Direct Hit!
Applies To: IT managers
USP: Develop intrusion detection solutions using Open Source software
Primary Link: None
Google Keywords: Intrusion detection
Previous research on intrusion detection applied statistical distance-based
methods such as Euclidean Distance, Mahalanobis Distance, Manhattan Distance,
Canberra Metric and Czekanowski Coefficient. Probabilistic techniques were then
applied to detect intrusions using data collected offline, and Finite Automata
models were also built for detection. However, these methods suffer from poor
detection accuracy, and the percentage of intrusions they detect is low.
Lately, Neural Networks, Fuzzy Logic, Genetic Algorithms, and their
combinations have been applied to data collected both online and offline. As a
result, detection accuracy has improved compared to the earlier techniques.
Intrusions can be divided into eight basic categories: Eavesdropping and
Packet Sniffing, Snooping and Downloading, Tampering or Data Diddling, Spoofing,
Jamming, Flooding, Masquerading, Exploiting Vulnerabilities, Password Cracking
and Keys. Many techniques have been developed to test for these intrusions. One
data set was developed by Matthias Schonlau, who collected Unix command data
from 50 users; it is used as a benchmark for evaluating IDS based on command
sequences. Six methods were evaluated on this benchmark, viz Uniqueness, Bayes
One-step Markov, Hybrid Multi-step Markov, Compression, Sequence-Match and
Incremental Probabilistic Action Modeling. They showed very low false alarm
rates, but the missing alarms fell in the range of 30-60%. Roy A Maxion
extended Schonlau's work by testing the hypothesis of enriched command lines and
achieved a detection level of 82%.
Snort is an Open Source network intrusion prevention and detection system
that uses a rule-driven language and combines the benefits of signature,
protocol and anomaly-based inspection methods. It can be set up as an advanced
IDS together with Apache, MySQL, PHP, and ACID, and it can work in three
different modes: Sniffer, Packet Logger and Intrusion Detection. Using Snort,
data can be collected from various parts of a network and intruders can be
detected.
However, when such a detection system is examined in real-life conditions, it
does not produce appreciable results. The reason is that these algorithms work
well only when the training and testing instances are clean, that is, largely
free of noise. The information security world therefore needs novel algorithms
for intrusion detection that achieve high detection accuracy on real data.
Masquerader detection
The aim of this research is to apply machine learning algorithms that are
most appropriate for the classification of proper and improper usage of the
resources, thereby detecting improper users, called masqueraders. Masqueraders
will be detected using two methods: the first one using the variations in the
probability distributions of system usage; and the second one using a machine
learning algorithm.
The audit source for our experiment is enriched and truncated command
sequences, which are processed to detect masqueraders.
Enriched command: purge rm -i -i; clear ; /bin/ls -al | more
Truncated command: purge
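The relationship between the two forms can be illustrated with a minimal Python sketch. It assumes, based only on the example above, that the truncated form is the first whitespace-separated token of the enriched line; the function name `truncate_command` is hypothetical and not part of the original experiment.

```python
def truncate_command(enriched_line: str) -> str:
    """Return the truncated form of an enriched command line.

    Assumption: the truncated command is simply the first
    whitespace-separated token, with all arguments dropped.
    """
    tokens = enriched_line.split()
    return tokens[0] if tokens else ""


enriched = "purge rm -i -i; clear ; /bin/ls -al | more"
print(truncate_command(enriched))  # purge
```

A real preprocessing step would probably also need to handle aliases, pipes and command separators, which this sketch ignores.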
Naïve Bayes Classifier
The Naïve Bayes Classifier is a supervised learning algorithm with a very
high success rate in text classification and information retrieval. It is based
on the Bayes Rule. The posterior probability p(u | cs) of user u given a command
sequence cs is:

$$p(u \mid cs) = \frac{p(u)\, p(cs \mid u)}{p(cs)} \qquad \text{Eq. (1)}$$

where p(u) is the prior probability of user u, p(cs | u) is the probability
that the command sequence was generated by user u, and p(cs) is the probability
of occurrence of the command sequence cs.
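As a worked illustration of Eq (1), the sketch below computes the posterior for two users. The user names and all probability values are made up for illustration, and p(cs) is obtained by summing p(u) p(cs | u) over the listed users, which assumes those users are the only possible sources of the sequence.

```python
def posterior(priors, likelihoods):
    """Apply the Bayes Rule to get p(u | cs) for every user u.

    priors[u] is p(u) and likelihoods[u] is p(cs | u).
    p(cs) is computed as the sum of p(u) * p(cs | u) over all users,
    which assumes the listed users are exhaustive.
    """
    evidence = sum(priors[u] * likelihoods[u] for u in priors)
    return {u: priors[u] * likelihoods[u] / evidence for u in priors}


# Hypothetical values, chosen only to make the arithmetic easy to follow.
priors = {"user_a": 0.5, "user_b": 0.5}
likelihoods = {"user_a": 0.02, "user_b": 0.005}

for user, prob in posterior(priors, likelihoods).items():
    print(user, round(prob, 3))
```

With equal priors, the user whose profile makes the command sequence four times more likely ends up with four times the posterior probability.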
The approach uses four phases for detection and comparison. The first phase
is data preprocessing: the enriched command sequences are stripped of their
arguments, and the resulting truncated commands are processed. In the second
phase, learning is done using Naïve Bayes Classifiers, which use the
multivariate Bernoulli model for the command sequences. In the third phase, a
probabilistic approach using the Euclidean distance measure is applied to model
the sequences. The fourth phase compares the results of the two methods.
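The third phase can be sketched roughly as follows. The article does not spell out how the distributions are built, so this example assumes that each command sequence is reduced to relative command frequencies over a shared vocabulary and that two such distributions are compared with the Euclidean distance. The function names and the sample commands are hypothetical.

```python
import math
from collections import Counter


def command_distribution(commands, vocabulary):
    """Relative frequency of each vocabulary command in a sequence."""
    counts = Counter(commands)
    total = len(commands)
    if total == 0:
        return [0.0 for _ in vocabulary]
    return [counts[c] / total for c in vocabulary]


def euclidean_distance(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical data: a user's profile and a block to be tested.
profile_cmds = ["ls", "cd", "ls", "vi", "ls", "cd"]
test_cmds = ["gcc", "make", "ls", "gcc", "make", "gcc"]
vocab = sorted(set(profile_cmds) | set(test_cmds))

distance = euclidean_distance(
    command_distribution(profile_cmds, vocab),
    command_distribution(test_cmds, vocab),
)
print(round(distance, 3))
```

A larger distance means the test block deviates more from the user's usual command usage; the threshold at which a block is flagged would have to be tuned on real data.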
The data set is organized into blocks of 100 commands. The learning task
here is to model a binary classifier.
The output of the binary classification can be stated as:
Classification = True, if not Masquerader
Classification = False, if Masquerader
The masquerader data were taken as positive examples and the legitimate
user's data were taken as negative examples. Here a true positive refers to a
masquerader block of 100 commands that is detected as such. A false positive
refers to a legitimate user's block that is misclassified as masquerader.
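Organizing a command stream into blocks of 100 can be sketched as below. The article does not say how an incomplete final block is handled; this example assumes it is dropped, and the function name is hypothetical.

```python
def split_into_blocks(commands, block_size=100):
    """Split a command stream into consecutive, non-overlapping blocks.

    Assumption: a trailing block with fewer than block_size commands
    is discarded rather than padded.
    """
    usable = len(commands) - len(commands) % block_size
    return [commands[i:i + block_size] for i in range(0, usable, block_size)]


stream = ["ls"] * 250  # hypothetical stream of 250 truncated commands
blocks = split_into_blocks(stream)
print(len(blocks), [len(b) for b in blocks])
```

Each block then becomes one instance for the binary classifier, labelled as either legitimate or masquerader.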
Probability of masquerader detection
In our experiment, only each user's own (self) data is used for training. We
compute only p(ci | u) for the user's self profile. For non-self, we assume that
each of the m commands has an equal random probability of 1/m. For a given test
block d, p(d | self) and p(d | non-self) can then be compared. If the ratio of
p(d | self) to p(d | non-self) is high, it is more likely that the command
block d came from user u.
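A minimal sketch of this self versus non-self comparison is given below. It is a simplification and not the multivariate Bernoulli model itself: each command in a block is treated as an independent draw, p(ci | u) is estimated from training counts with add-one smoothing (an assumption), and the comparison is done in log space to avoid underflow. The function names, the vocabulary and the sample data are hypothetical.

```python
import math
from collections import Counter


def train_self_profile(training_commands, vocabulary, alpha=1.0):
    """Estimate p(c | u) for every command c in a fixed vocabulary.

    alpha is an add-one style smoothing constant (an assumption), so
    commands the user never typed still get a small probability.
    """
    counts = Counter(training_commands)
    total = len(training_commands) + alpha * len(vocabulary)
    return {c: (counts[c] + alpha) / total for c in vocabulary}


def log_likelihood_ratio(block, self_profile):
    """log p(d | self) - log p(d | non-self) for a block d.

    Non-self assigns every command the same probability 1/m, where m
    is the vocabulary size. Every command in the block is assumed to
    be part of the vocabulary.
    """
    m = len(self_profile)
    log_self = sum(math.log(self_profile[c]) for c in block)
    log_nonself = len(block) * math.log(1.0 / m)
    return log_self - log_nonself


vocab = ["ls", "cd", "vi", "gcc", "make"]
profile = train_self_profile(["ls"] * 6 + ["cd"] * 3 + ["vi"], vocab)

own_block = ["ls", "cd", "ls"]
foreign_block = ["gcc", "make", "gcc"]
print(log_likelihood_ratio(own_block, profile) > 0)      # True
print(log_likelihood_ratio(foreign_block, profile) > 0)  # False
```

A positive log ratio favours the self hypothesis, while a negative one suggests a masquerader. In practice the decision threshold would be tuned rather than fixed at zero.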
Comparison results
Performance is measured in terms of false positives and false negatives. A
false positive is a legitimate user's block that is misclassified as a
masquerader block (a false alarm), and the false positive rate is the number of
false positives divided by the total number of legitimate blocks. A false
negative is a missed alarm for an actual masquerader block, and the false
negative rate is the number of missed masquerader blocks divided by the total
number of masquerader blocks. The table shows that Naïve Bayes Classifiers
perform well compared with the usual probability distribution model: the
detection rate is higher, whereas the false alarm rate is lower. Individual
users are identified by the Naïve Bayes Classifiers.
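The two error rates can be computed as in the sketch below. It assumes the standard definitions, with false positive rate = FP / (FP + TN) and false negative rate = FN / (FN + TP), and it treats a masquerader block as the positive class. The function name and the sample labels are hypothetical and are not the results of the experiment.

```python
def error_rates(actual, predicted):
    """Return (false_positive_rate, false_negative_rate).

    actual and predicted are lists of booleans, one per block, where
    True means "masquerader block" (the positive class).
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Hypothetical labels for six blocks: two masquerader, four legitimate.
actual = [True, True, False, False, False, False]
predicted = [True, False, True, False, False, False]
print(error_rates(actual, predicted))  # (0.25, 0.5)
```

The detection rate is then 1 minus the false negative rate, which is 0.5 for these sample labels.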
Conclusion and future work
In our experiment we tested the masquerade detection problem with two
methods: Probability Distribution using the Euclidean Distance Measure, and
Naïve Bayes Classifiers. We observed that Naïve Bayes Classifiers perform well
in detecting masqueraders. This work can be extended by applying hybrid machine
learning algorithms to detect the class of intruders, using data collected from
a free/Open Source software parallel environment. We have also analyzed the
sequences of system calls executed by privileged programs to detect intrusions
based on the normal usage patterns of the system. The enriched and truncated
command sequences are collected from the network and used to build a model of
each user's normal behavior with Naïve Bayes Classifiers. The resulting
detection rate is very high and in line with the usual probability techniques.
Dr S Mercy Shalinie and Ms T Subbulakshmi, Deptt of Comp Sc and Engg,
Thiagarajar College of Engg, Madurai