1. Introduction
The 21st century has witnessed the profound impact of the Internet, one of the most transformative inventions of our time. Today, the Internet transcends numerous boundaries, revolutionizing the way we communicate, engage in recreational activities, work, shop, socialize, enjoy music and movies, order food, manage finances, send birthday wishes to friends, and more. These service applications are indispensable to modern organizations, which demand uninterrupted availability and global accessibility around the clock.
The exponential growth of sensitive services and web-based applications has become a magnet for hackers seeking lucrative gains, technological secrets, including vaccine-related information, or any competitive edge. This surge in valuable data has not only enticed criminal organizations globally but has also led certain governmental entities to recruit exceptionally skilled security experts for cyberattack operations.
The continuous expansion of both lawful and unlawful activities has led to an exponential increase in the complexity and volume of Internet traffic. As a result, network security administrators grapple with ever-evolving and intricate challenges, striving to swiftly block malicious traffic. To do so, they rely heavily on three key tools, which stand as primary instruments for detecting and filtering suspicious traffic: firewalls, SIEM (Security Information and Event Management) systems, and IDSs (Intrusion Detection Systems).
To scrutinize and identify potentially suspicious activities within network traffic using IDSs, two primary detection methods prevail: signature-based and anomaly-based detection. Signature-based (or misuse) detection methods employ pattern-matching techniques to identify previously known attacks. Their primary advantage is high accuracy, ensuring minimal false positives and negatives when detecting previously recognized attacks. Anomaly-based detection methods require an initial phase to learn normal traffic patterns, employing techniques such as machine learning, statistical analysis, or knowledge-based methodologies. Any significant deviation between observed traffic and the established norms is flagged as suspicious. Their primary advantage is the capability to identify unknown attacks with commendable accuracy.
The current state of the art presents a myriad of intriguing techniques (e.g., [1]-[4]) and tools that have notably bolstered network security by effectively detecting and thwarting malicious traffic. Nevertheless, the challenge persists: cyberattacks continue to wreak havoc and inflict substantial damage. Hence, any novel contribution that mitigates the risks associated with network traffic would be immensely valued.
This paper introduces a novel technique employing differential analysis to discern suspicious network traffic. The approach first segments traffic into small time slices, transforming each of them into a value in ${\mathbb{R}}^{n}$. It then computes the divergence between neighboring slices to unveil abrupt changes in traffic behavior. Finally, clustering techniques are applied to the abstracted intervals to validate traffic homogeneity (a single class) or detect significant variations (multiple classes), indicating potential suspicious activities.
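The pipeline just described (slice, measure, compare neighbors) can be sketched in a few lines. The sketch below is illustrative only: the feature vectors, the Euclidean distance, and the "twice the average gap" threshold are our own toy choices, not the paper's exact parameters.

```python
import math

def euclidean(u, v):
    """Distance between the feature vectors of two neighbouring slices."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Feature vectors of four successive 10-minute slices (toy values):
# each component counts packets of one protocol within the slice.
slices = [[1.0, 3.0, 1.0], [1.0, 2.0, 2.0], [0.0, 3.0, 2.0], [9.0, 0.0, 14.0]]

# Divergence between each pair of neighbouring slices.
gaps = [euclidean(slices[i], slices[i + 1]) for i in range(len(slices) - 1)]

# A gap far above the average signals an abrupt change of behaviour:
# here the last slice stands out, so the fourth slice is flagged.
mean_gap = sum(gaps) / len(gaps)
suspicious = [i + 1 for i, g in enumerate(gaps) if g > 2 * mean_gap]
```

The first two gaps are small and similar, while the last one is an order of magnitude larger, which is exactly the kind of sudden variation the approach targets.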
The approach we introduce is geared towards enhancing the efficiency of Security Information and Event Management (SIEM) [5], an integral component of a Security Operations Center (SOC) [6]. A SIEM, such as Wazuh [7], encapsulates a suite of functionalities aimed at gathering, analyzing, and presenting information sourced from network and security devices. It essentially integrates two vital components: Security Information Management (SIM) and Security Event Management (SEM). SIM focuses on storing, analyzing, and reporting log files, while SEM is responsible for real-time monitoring, event correlation, notifications, and console views.
The rest of this paper is organized as follows. Section 2 delves into related works within the field. Section 3 details the methodology of the approach. Section 4 presents three case studies. Finally, concluding remarks are presented in Section 5.
2. Related Work
The state of the art contains many valuable techniques that have significantly contributed to improving the security of network services and applications. Here, the study focuses on anomaly-based detection techniques and methods that try to detect suspicious traffic based on IP packet information such as the IP address (layer 3 in the TCP/IP model), TCP or UDP ports (layer 4), and web application data (layer 5).
Najafabadi et al. proposed in [8] an anomaly detection mechanism for detecting HTTP GET flood attacks. They used the Principal Component Analysis (PCA) subspace method on browsing behavior instances extracted from HTTP server logs in order to detect abnormal behaviors. They applied the approach to detect some DDoS and HTTP GET flood attacks. The approach relies on supervised machine learning techniques.
In [9], Betarte et al. proposed a machine learning method to enhance the well-known ModSecurity [10], a Web Application Firewall provided by OWASP, by using one-class classification and n-gram techniques on three datasets. The method relies on supervised machine learning techniques and provides better detection and false-positive rates than the original version of ModSecurity.
Wang et al. presented in [11] a new web anomaly detection method that uses the Frequent Closed Episode Rules Mining (FCER Mining) algorithm to analyze web logs and detect new, unknown web attacks. The method relies on supervised machine learning techniques and has a detection rate of 96.67% and a false alarm rate of 3.33% for detecting abnormal users.
In [12], Bronte et al. proposed an anomaly detection approach that uses the cross-entropy technique to calculate three metrics: cross-entropy parameters (CEP), cross-entropy value (CEV), and cross-entropy data type (CET). These metrics compare the deviation between learned request profiles and a new web request. The cross-entropy approach performs better than the Value Length and Mahalanobis distance approaches. The approach relies on supervised machine learning techniques, focuses on detecting four types of web attacks (SQLI, XSS, RFI, and DT), and has a detection rate of 66.7%.
Ren et al. presented in [13] a method based on the bag of words (BOW) model to extract features and efficiently detect web attacks with hidden Markov algorithms. BOW has a higher detection rate and a lower false alarm rate than n-gram feature-extraction algorithms. The approach relies on supervised machine learning techniques to detect SQL injection and cross-site scripting attacks. The accuracy increased to 96%, while the false alarm rate remained low.
In [14], Pukkawanna et al. proposed a method using the port pair distribution and the Kullback-Leibler (KL) divergence to detect suspicious flows when the KL divergence deviates from an adaptive 3-sigma rule-based threshold. The approach relies on unsupervised machine learning techniques to detect mimicry attacks and does not need any previous learning step.
Hounkpevi proposed in [15] a method using k-means, the port pair distribution, and the Kullback-Leibler (KL) divergence that improves on [14]. The approach compares the traffic of the current time interval with that of nearby intervals by applying the k-means algorithm; any significant divergence means that the current time interval's traffic is suspicious. The approach relies on unsupervised machine learning techniques to detect mimicry attacks and appears more efficient than [14].
In [16], Munz et al. presented a novel network data mining approach that applies the k-means clustering algorithm to feature datasets extracted from flow records. Training data containing unlabelled flow records are separated into clusters of normal and anomalous traffic. The approach relies on unsupervised machine learning techniques to detect port scans and DoS attacks. A challenge of this approach is determining the optimal number of clusters.
Asselin et al. presented in [17] an anomaly detection model based on a crawling method and an n-gram model that effectively reduces access to the log files generated by web servers. It has shown to be a good solution for black-box analysis of web applications, but it is not efficient for detecting attacks that use cookie or POST data. The approach relies on unsupervised machine learning techniques to detect brute force, DDoS, Crawler Miss, High Load, and Anomalous Query attacks, and has a detection rate of 95%.
Swarnkar and Hubballi described in [18] a new method for payload-based anomaly detection that learns normal behavior and detects deviations. The approach builds a frequency range of occurrences of n-grams from packets in the training phase and counts the number of deviations from that range to detect anomalies. The approach showed lower false positives and a higher detection rate than Anagram-based methods.
Kang et al. [19] described a one-class classification method for improving intrusion detection performance against malicious attacks. Result scores were evaluated on artificially generated instances in a two-dimensional space. In the detection phase, the approach is based on simple logic: the center of the normal patterns is set at (0, 0), and the two malicious class centers at (1, 1) and (−1, −1), respectively. Experimental results on simulated data show better performance.
Camacho et al. [20] developed a framework that uses a PCA-based multivariate statistical process control (MSPC) approach. The framework monitors both the Q-statistic and the D-statistic, making it possible to establish control limits and to report anomalies when these limits are consistently exceeded.
Yoshimura et al. [21] proposed a new model called DOC-IDS, an intrusion detection system based on Perera's deep one-class classification. The approach relies on supervised machine learning techniques to detect multiple attack types and has a detection rate of 97%.
Zavrak et al. [22] proposed an intrusion detection and prevention architecture called SAnDet, which is based on an anomaly-based attack detection module that uses the EncDec-AD method to detect attacks. The approach relies on semi-supervised machine learning techniques to detect DoS and port scan attacks and has a detection rate of 99.3%.
The evaluation of the previous approaches according to the criteria described below is summarized in Table 1.
Table 1. Evaluation of the approaches.
| Author | Techniques | Attack types | Target | Learning type | Logic rules | Training is not required | Multi-target | Detection rate |
|---|---|---|---|---|---|---|---|---|
| Pukkawanna et al. [14], 2015 | Kullback-Leibler (KL) divergence | Mimicry attacks | TCP/UDP ports | Unsupervised | × | ✓ | × | 12.5% |
| Hounkpevi [15], 2020 | Kullback-Leibler (KL) divergence; k-means | Mimicry attacks | TCP/UDP ports | Unsupervised | × | ✓ | × | 66.7% |
| Najafabadi et al. [8], 2017 | PCA (Principal Component Analysis) subspace method | HTTP GET flood, DDoS | HTTP.Url | Supervised | × | × | × |  |
| Betarte et al. [9], 2018 | One-class classification; n-gram | Multiple attacks | HTTP.Url | Supervised | × | × | × | 90% |
| Wang et al. [11], 2017 | FCER (Frequent Closed Episode Rules) Mining algorithm | Unknown web attacks | HTTP.Url | Supervised | × | × | × | 96.67% |
| Bronte et al. [12], 2016 | Cross-entropy | SQLI, XSS, RFI, DT | HTTP.Url | Supervised | × | × | × | 66.7% |
| Ren et al. [13], 2018 | Bag of words (BOW) model; hidden Markov algorithms | SQL injection, cross-site scripting | HTTP.Url | Supervised | × | × | × | 96% |
| Munz et al. [16], 2007 | k-means algorithm | Port scans, DoS | TCP/UDP ports | Unsupervised | × | ✓ | × |  |
| Asselin et al. [17], 2016 | Black-box approach (crawling based); n-gram model | Brute force, DDoS, Crawler Miss, High Load, Anomalous Query | HTTP.Url | Unsupervised | × | ✓ | × | 95% |
| Yoshimura et al. [21], 2022 | One-class classification | Multiple attacks |  | Supervised | × | × | × | 97% |
| Zavrak et al. [22], 2023 | EncDec-AD; LSTM | DoS, Portscan |  | Semi-supervised | × | × | × | 99.3% |
The existing approaches can be evaluated according to several criteria, such as:
Attack Types: the different types of attacks detected by the approach.
Target: the fields of the IP packet that the approach analyzes to detect suspicious behavior, such as the IP address, HTTP.Url, and TCP/UDP port.
Learning Types: whether the approach uses supervised or unsupervised machine learning techniques.
Logic Rules: whether the approach provides an expressive language, such as temporal logic, to specify a rich variety of malicious traffic (fine-grained specification).
Training is not required: most existing approaches require a training step, but a few do not.
Multi-Target: the ability of the approach to detect suspicious traffic that requires analyzing many fields of IP packets at the same time.
Detection Rate: the percentage of detected malicious traffic.
3. Methodology
The detection of suspicious traffic is based on the following simple observation: the nature of the traffic should not change suddenly; if it does, the traffic is suspicious. For example, there is no reason for the nature of the traffic in the period ${P}_{1}$ = [10:00 am, 10:30 am] to be so different from that of the period ${P}_{2}$ = [10:30 am, 11:00 am]. However, distinctions might reasonably exist between daytime and nighttime traffic patterns, as well as between traffic from different years.
Let $\mathcal{F}\mathrm{:}\mathbb{R}\to \mathbb{R}$ be a function such that $y=\mathcal{F}\left(x\right)$ measures a particular feature of the network traffic (e.g., x is time and y is the number of packets coming from a specific country). If the curve of $\mathcal{F}$ is as shown by Figure 1, then there is clearly a sudden variation from $\mathcal{F}\left(4\right)$ to $\mathcal{F}\left(5\right)$, which is suspicious.
More precisely, the traffic τ will be split into one or many sequences of ordered slices. To each of these slices, we apply a function $\mathcal{F}$ that measures some of its features. After that, we compute the distance between successive values of $\mathcal{F}$, as shown in Figure 2. A sudden change of $\mathcal{F}$ appears if there exists a large deviation between the measured distances.
Figure 1. Sudden variation in traffic.
Figure 2. Looking for sudden variation in traffic.
The function $\mathcal{F}$ may not yield a single real value in $\mathbb{R}$; its outputs could instead lie in ${\mathbb{R}}^{n}$. For example, it might produce a complete distribution that assesses various characteristics across the analyzed slices of the trace. In such scenarios, the disparity between $\mathcal{F}$ values can be assessed with measures like the KL-divergence or the Euclidean distance.
Furthermore, in determining whether the variation between successive $\mathcal{F}$ values exhibits abrupt changes or unacceptable deviations, clustering analysis can be valuable. If more than one cluster results, while the traffic distributions of successive slices are expected to change smoothly, we conclude that the analyzed traffic is suspicious.
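As an illustration of this clustering check, here is a minimal one-dimensional k-means (k = 2) applied to the divergences between successive slices. The divergence values and the homogeneity criterion (cluster centers less than 1.0 apart) are invented for the example; they are not the paper's calibrated settings.

```python
def kmeans_1d(xs, k=2, iters=20):
    """Tiny 1-D k-means, enough to check whether successive divergences
    form one tight group (homogeneous traffic) or several groups."""
    centers = [min(xs), max(xs)]
    groups = [[] for _ in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in xs:
            j = min(range(k), key=lambda c: abs(x - centers[c]))
            groups[j].append(x)
        # Move each center to the mean of its group (keep it if empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Divergences between neighbouring slices (toy values): one clear outlier.
divs = [0.9, 1.1, 1.0, 0.8, 7.5]
centers, groups = kmeans_1d(divs)

# Two well-separated clusters: the traffic is not homogeneous.
homogeneous = (max(centers) - min(centers)) < 1.0
```

With these values the algorithm settles on one cluster of four small divergences and a singleton around 7.5, so the homogeneity test fails and the corresponding slice would be flagged.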
In the subsequent sections, we elaborate on and formalize all of these analyses.
To maintain simplicity in presenting the approach, we concentrate solely on network traffic. However, it’s important to note that the same concept can be extended to analyze any type of log file.
3.1. Preliminary Notations
In order to articulate the definition of suspicious traffic formally and succinctly, it is essential to establish a set of initial notations.
We assume that network traffic is represented by a sequence of stamped IP packets or messages, each of which is a structure containing a header and a payload. We suppose that we have access to any field (e.g., IP addresses, ports, and protocols) of any non-encrypted header of the network protocols (e.g., IP, TCP, and UDP) inside an intercepted traffic.
Definition 1 (Messages). We denote by $\mathcal{M}$ the set of messages that could be found in the network traffic.
${f}_{n}$: we use ${f}_{n}$ to range over the possible fields in messages of $\mathcal{M}$. Examples of ${f}_{n}$ are given in Table 2.
$m@{f}_{n}$: if m is a message and ${f}_{n}$ is an attribute, we denote by $m@{f}_{n}$ the value of ${f}_{n}$ in m.
Table 2. Examples of attributes.
Stamped messages are called events and are defined as follows:
Definition 2 (Events). We denote by $\mathcal{E}$ the set of possible events built from $\mathcal{M}$ as follows:
$\begin{array}{l}e ::= \langle t\mathrm{,}m\rangle \\ t ::= time\\ m\in \mathcal{M}\end{array}$
$e@{f}_{n}$: we denote by $e@{f}_{n}$ the value of ${f}_{n}$ in e. It is defined as follows: $\langle t,m\rangle @\text{T}=t$ and $\langle t,m\rangle @{f}_{n}=m@{f}_{n}$ if ${f}_{n}\ne \text{T}$.
A sequence of stamped events forms a trace.
Definition 3 (Trace). A trace τ over $\mathcal{E}$ is defined using the following BNF grammar:
$\begin{array}{l}\tau ::= \epsilon \mid e \mid e\mathrm{.}\tau \\ e\in \mathcal{E}\end{array}$
where $\epsilon$ is the empty trace. The "." represents the chronological order, i.e., if e appears before e' in a trace τ, then necessarily e happened earlier than e'.
We introduce the following propositional logic, which allows verifying whether an event in a trace satisfies some conditions. The main purpose of this language is to define specific patterns of messages we are looking for within the trace, such as messages having a given source or destination IP address or port.
Definition 4 (Propositional Event Logic). Let ${f}_{n}$ be a field name and v be a value. We introduce the Propositional Event Logic (PEL) as follows:
$\begin{array}{l}p\mathrm{,}q ::= \text{true} \mid \text{false} \mid {f}_{n}\ op\ v \mid p\vee q \mid p\wedge q \mid \neg p\\ op ::= \;=\; \mid \;\ne\; \mid \;\le\; \mid \;\ge\; \mid \;<\; \mid \;>\end{array}$
An event e respects a proposition p, and we say that $p\left(e\right)=\text{true}$, if one of the following conditions holds:
$\begin{array}{l}\text{true}\left(e\right)=\text{true}\\ \left(\neg p\right)\left(e\right)=\neg p\left(e\right)\\ \left(p\vee q\right)\left(e\right)=p\left(e\right)\vee q\left(e\right)\\ \left(p\wedge q\right)\left(e\right)=p\left(e\right)\wedge q\left(e\right)\\ \left({f}_{n}\ op\ v\right)\left(e\right)=\left(e@{f}_{n}\right)\ o{p}^{\mathrm{?}}\ v\end{array}$
For instance, to evaluate (TCP.DestPort = 80)(e), we check whether $\left(e@\text{TCP.DestPort}\right){=}^{?}80$.
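To make the semantics concrete, here is a small sketch that represents events as dictionaries and PEL propositions as Python predicates. The encoding (dictionaries, closures, the sample field values) is our own illustration, not part of the formal development.

```python
import operator

# An event is modelled as a dict from field names to values
# (a simplification of Definition 2; "T" is the timestamp field).
event = {"T": 36000, "IP.Prot": 6, "TCP.DestPort": 80}

def atom(fn, op, v):
    """Atomic proposition  f_n op v  of Definition 4."""
    return lambda e: op(e[fn], v)

def p_and(p, q): return lambda e: p(e) and q(e)
def p_or(p, q):  return lambda e: p(e) or q(e)
def p_not(p):    return lambda e: not p(e)

# (TCP.DestPort = 80) AND (IP.Prot = 6), in the spirit of the example above.
p = p_and(atom("TCP.DestPort", operator.eq, 80),
          atom("IP.Prot", operator.eq, 6))
```

Evaluating `p(event)` then plays the role of the check $p\left(e\right)=\text{true}$.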
3.2. Trace Slicing
This step requires meticulous attention to maximize the approach's effectiveness. It is important to decompose the trace into one or multiple sequences of slices characterized by smooth variations. The end user must have a clear understanding of the nature of their activity to identify where sudden changes should not occur. Below, we provide some illustrative examples:
Significant and sudden fluctuations in traffic volume are often indicative of potential Denial of Service (DoS) attacks. To detect this activity, it is appropriate to divide the traffic trace τ into successive discrete slices, denoted as ${\tau}_{1}\mathrm{,}\cdots \mathrm{,}{\tau}_{n}$, each representing a predefined time window, such as 10 minutes.
The previous analysis is more precise and efficient if we separate the traffic of different IP addresses. Input traffic can also be separated from output traffic. A sudden variation in input traffic can be due to a DoS attack, whereas a variation in output traffic can be generated by malware (e.g., botnet) activity. Therefore, this kind of separation allows us to identify both the IP address involved in the suspicious traffic and the nature of the attack.
The input and output traffic of different IP addresses can be further separated into traffic related to different IP protocols and TCP ports.
The previous divisions can be further refined, as we will show in the case study section. For instance, we can separate the traffic of different days of the week. By doing so, we assume that the traffic of successive Mondays should not present a sudden change.
The forthcoming definition introduces a slicing function designed to partition a trace, catering to diverse scenarios and requirements.
Definition 5 (Slicing). Let p be a propositional formula in PEL and τ be a trace in $\mathcal{T}$. We inductively define a slicing function ${\mathcal{S}}_{p}\left(\tau \right)$ as follows:
$\begin{array}{l}{\mathcal{S}}_{p}\left(\epsilon\right) ::= \epsilon\\ {\mathcal{S}}_{p}\left(e\right) ::= \left\{\begin{array}{ll}\epsilon & \text{if}\ p\left(e\right)=\text{false}\\ e & \text{if}\ p\left(e\right)=\text{true}\end{array}\right.\\ {\mathcal{S}}_{p}\left(e\mathrm{.}\tau \right) ::= {\mathcal{S}}_{p}\left(e\right)\mathrm{.}{\mathcal{S}}_{p}\left(\tau \right)\end{array}$
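Under the same dictionary encoding of events used earlier, the slicing function reduces to an order-preserving filter. The trace, field names, and the three 10-second windows below are illustrative assumptions:

```python
def slice_trace(p, trace):
    """S_p of Definition 5: keep, in chronological order, the events
    of the trace that satisfy the proposition p."""
    return [e for e in trace if p(e)]

def slice_seq(props, trace):
    """Extension to a sequence of propositions <p_1, ..., p_n>."""
    return [slice_trace(p, trace) for p in props]

# Toy trace: each event carries a timestamp (seconds) and a protocol.
trace = [{"T": 1, "IP.Prot": 6},
         {"T": 12, "IP.Prot": 17},
         {"T": 25, "IP.Prot": 6}]

# Three 10-second windows, in the spirit of the <p(i)> abbreviation.
windows = [lambda e, lo=lo: lo <= e["T"] < lo + 10 for lo in (0, 10, 20)]
slices = slice_seq(windows, trace)
```

Each window proposition selects the events falling in its time interval, yielding one sub-trace per window.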
Let ${p}_{1}\mathrm{,}\cdots \mathrm{,}{p}_{n}$ denote propositions. We extend the slicing function to operate on sets and sequences of propositions as follows:
$\begin{array}{l}{\mathcal{S}}_{\left\{{p}_{1},\cdots ,{p}_{n}\right\}}\left(\tau \right)=\left\{{\mathcal{S}}_{{p}_{1}}\left(\tau \right),\cdots ,{\mathcal{S}}_{{p}_{n}}\left(\tau \right)\right\}\\ {\mathcal{S}}_{\langle {p}_{1},\cdots ,{p}_{n}\rangle}\left(\tau \right)=\langle {\mathcal{S}}_{{p}_{1}}\left(\tau \right),\cdots ,{\mathcal{S}}_{{p}_{n}}\left(\tau \right)\rangle \end{array}$
If $p\left(i\right)$ is a proposition that depends on i, we use the notation ${\langle p\left(i\right)\rangle}_{start\mathrm{,}jmp}^{end}$ as an abbreviation of
$\langle p\left(start\right)\mathrm{,}p\left(start+jmp\right)\mathrm{,}\cdots \mathrm{,}p\left(start+n\ast jmp\right)\rangle $
where n is the natural number such that $start+n\ast jmp<end$ and $start+\left(n+1\right)\ast jmp\ge end$. For instance:
${\langle p\left(i\right)\rangle}_{\mathrm{1,2}}^{8}$ is the same as $\langle p\left(1\right)\mathrm{,}p\left(3\right)\mathrm{,}p\left(5\right)\mathrm{,}p\left(7\right)\rangle $, and
${\langle \left(\text{T}\ge \mathrm{10.00.}j\right)\wedge \left(\text{T}\le \mathrm{10.00.}\left(j+10\right)\right)\rangle}_{j=\mathrm{0,10}}^{60}$ is the same as $\langle {p}_{1}\mathrm{,}\cdots \mathrm{,}{p}_{6}\rangle $, where:
$\begin{array}{l}{p}_{1}=\left(\text{T}\ge \mathrm{10.00.00}\right)\wedge \left(\text{T}\le \mathrm{10.00.10}\right)\\ {p}_{2}=\left(\text{T}>\mathrm{10.00.10}\right)\wedge \left(\text{T}\le \mathrm{10.00.20}\right)\\ {p}_{3}=\left(\text{T}>\mathrm{10.00.20}\right)\wedge \left(\text{T}\le \mathrm{10.00.30}\right)\\ {p}_{4}=\left(\text{T}>\mathrm{10.00.30}\right)\wedge \left(\text{T}\le \mathrm{10.00.40}\right)\\ {p}_{5}=\left(\text{T}>\mathrm{10.00.40}\right)\wedge \left(\text{T}\le \mathrm{10.00.50}\right)\\ {p}_{6}=\left(\text{T}>\mathrm{10.00.50}\right)\wedge \left(\text{T}\le \mathrm{10.00.60}\right)\end{array}$
Example 1 (Selection). Let τ be the trace containing the traffic captured between 10:00:00 and 10:00:52, focusing on IP.Prot, as shown by Table 3.
Table 3. Captured traffic.
Let $\phi ={\langle \left(\text{T}\ge \mathrm{10.00.}j\right)\wedge \left(\text{T}\le \mathrm{10.00.}\left(j+10\right)\right)\rangle}_{j=\mathrm{0,10}}^{60}$. When slicing τ using φ, we compute ${\mathcal{S}}_{\phi}\left(\tau \right)$, resulting in the sequence $\langle {\tau}_{1}\mathrm{,}\cdots \mathrm{,}{\tau}_{6}\rangle $, as illustrated in Table 4.
Table 4. Sliced captured traffic.
3.3. Feature Measuring
Each slice derived from the preceding step is transformed into an element of ${\mathbb{R}}^{n}$ ($n\ge 1$) by quantifying certain characteristics through a predefined function $\mathcal{F}$. For simplicity, we concentrate on a class of functions $\mathcal{F}$ that produce distributions by counting events satisfying specified conditions, as delineated in the following definition:
Definition 6 (Feature Measuring Function). Let q be a propositional formula in PEL and τ be a trace in $\mathcal{T}$. We inductively define a feature measuring function ${\mathcal{F}}_{q}\left(\tau \right)$ as follows:
$\begin{array}{l}{\mathcal{F}}_{q}\left(\epsilon\right) ::= 0\\ {\mathcal{F}}_{q}\left(e\right) ::= \left\{\begin{array}{ll}0 & \text{if}\ q\left(e\right)=\text{false}\\ 1 & \text{if}\ q\left(e\right)=\text{true}\end{array}\right.\\ {\mathcal{F}}_{q}\left(e\mathrm{.}\tau \right) ::= {\mathcal{F}}_{q}\left(e\right)+{\mathcal{F}}_{q}\left(\tau \right)\end{array}$
Broadly speaking, ${\mathcal{F}}_{q}\left(\tau \right)$ returns the number of packets in τ that satisfy the property q.
We also extend the feature measuring function to operate on a sequence of propositions ${q}_{1}\mathrm{,}\cdots \mathrm{,}{q}_{n}$ and on sets and sequences of traces as follows:
$\begin{array}{l}{\mathcal{F}}_{\langle {q}_{1},\cdots ,{q}_{n}\rangle}\left(\tau \right)=\langle {\mathcal{F}}_{{q}_{1}}\left(\tau \right),\cdots ,{\mathcal{F}}_{{q}_{n}}\left(\tau \right)\rangle \\ {\mathcal{F}}_{q}\left(\langle {\tau}_{1},\cdots ,{\tau}_{n}\rangle \right)=\langle {\mathcal{F}}_{q}\left({\tau}_{1}\right),\cdots ,{\mathcal{F}}_{q}\left({\tau}_{n}\right)\rangle \\ {\mathcal{F}}_{q}\left(\left\{{\tau}_{1},\cdots ,{\tau}_{n}\right\}\right)=\left\{{\mathcal{F}}_{q}\left({\tau}_{1}\right),\cdots ,{\mathcal{F}}_{q}\left({\tau}_{n}\right)\right\}\end{array}$
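A direct reading of Definition 6 in the same dictionary encoding: ${\mathcal{F}}_{q}$ counts matching events, and the extension maps a sequence of propositions to a count vector. The slice below is a toy reconstruction chosen to match the counts reported for ${\tau}_{1}$ in Table 5 (one ICMP, three TCP, one UDP packet); the actual packets of Table 3 are not reproduced here.

```python
def F(q, trace):
    """F_q of Definition 6: number of events in the trace satisfying q."""
    return sum(1 for e in trace if q(e))

def F_seq(qs, trace):
    """Extension to a sequence of propositions: one count per proposition."""
    return [F(q, trace) for q in qs]

# Toy slice tau_1: one ICMP (1), three TCP (6) and one UDP (17) packet.
tau1 = [{"IP.Prot": p} for p in (1, 6, 6, 6, 17)]

# psi = <q1, q2, q3, q4> as in Example 2.
psi = [lambda e, k=k: e["IP.Prot"] == k for k in (1, 6, 17)]
psi.append(lambda e: e["IP.Prot"] not in (1, 6, 17))

dist = F_seq(psi, tau1)   # the count vector of tau_1
```

The resulting vector plays the role of the rows of Table 5.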
Example 2. Let us examine the trace provided in Example 1. Let $\psi =\langle {q}_{1}\mathrm{,}{q}_{2}\mathrm{,}{q}_{3}\mathrm{,}{q}_{4}\rangle $ such that ${q}_{1}=\left(\text{IP}\text{.Prot}=1\right)$, ${q}_{2}=\left(\text{IP}\text{.Prot}=6\right)$, ${q}_{3}=\left(\text{IP}\text{.Prot}=17\right)$ and ${q}_{4}=\left(\text{IP}\text{.Prot}\ne 1\right)\wedge \left(\text{IP}\text{.Prot}\ne 6\right)\wedge \left(\text{IP}\text{.Prot}\ne 17\right)$. Applying the function ${\mathcal{F}}_{\psi}$ to the slices ${\tau}_{1}\mathrm{,}\cdots \mathrm{,}{\tau}_{6}$ depicted in Table 4 yields the outcomes illustrated in Table 5.
Table 5. Quantification of slices using $\mathcal{F}$.
| ${\mathcal{S}}_{\phi}\left(\tau \right)$ | ${\mathcal{F}}_{\psi}\left({\tau}_{i}\right)$ |
|---|---|
| ${\tau}_{1}$ | $\langle 1,3,1,0\rangle $ |
| ${\tau}_{2}$ | $\langle 1,2,2,0\rangle $ |
| ${\tau}_{3}$ | $\langle 0,3,2,0\rangle $ |
| ${\tau}_{4}$ | $\langle 0,0,0,5\rangle $ |
| ${\tau}_{5}$ | $\langle 0,2,1,1\rangle $ |
| ${\tau}_{6}$ | $\langle 2,0,0,0\rangle $ |
For instance, ${\mathcal{F}}_{\psi}\left({\tau}_{1}\right)=\langle \mathrm{1,3,1,0}\rangle $ indicates that slice ${\tau}_{1}$ contains 1 packet with IP.Prot = 1, 3 packets with IP.Prot = 6, 1 packet with IP.Prot = 17, and 0 packets with other IP.Prot values.
The distributions of these slices serve as inputs to algorithms like KLDivergence, enabling the measurement of traffic divergence across distinct slices. However, in cases where certain events are absent during observation, their frequencies register as zero, posing a challenge for computing KLDivergence and potentially leading to division by zero errors. To address this issue, we must either explore alternative divergence techniques or slightly adjust the data distribution through methods such as smoothing. The following definition illustrates one of the wellknown smoothing techniques.
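As an illustration of this divergence step, here is a minimal KL-divergence computation on two smoothed slice vectors. For this sketch we add a normalization step (so that each vector sums to 1, as KL expects for probability distributions); the input vectors are the smoothed values of ${\tau}_{1}$ and ${\tau}_{2}$.

```python
import math

def normalize(v):
    """Scale a non-negative vector so that it sums to 1."""
    s = sum(v)
    return [x / s for x in v]

def kl(p, q):
    """Kullback-Leibler divergence D(p || q); smoothing guarantees
    that every q_i is strictly positive, so no division by zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Smoothed vectors of two neighbouring slices, then normalized.
p1 = normalize([2/6, 4/6, 2/6, 1/6])   # tau_1 after smoothing
p2 = normalize([2/6, 3/6, 3/6, 1/6])   # tau_2 after smoothing

d = kl(p1, p2)   # small value: the two neighbouring slices are similar
```

A near-zero divergence indicates similar slices, while a spike in the divergence between one pair of neighbours signals the kind of abrupt change the approach flags.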
Definition 7 (Laplace Smoothing). Let $v=\langle {v}_{1}\mathrm{,}\cdots \mathrm{,}{v}_{n}\rangle $ be a sequence of real numbers. We denote by ${\pi}^{k}\left(v\right)$ the k-Laplace Smoothing Distribution (k-LSD) of v, defined as follows:
${\pi}^{k}\left(v\right)=\langle \frac{k+{v}_{1}}{k+{\displaystyle {\sum}_{i=1}^{n}{v}_{i}}}\mathrm{,}\cdots \mathrm{,}\frac{k+{v}_{n}}{k+{\displaystyle {\sum}_{i=1}^{n}{v}_{i}}}\rangle $
We augment the function $\mathcal{F}$ with Laplace smoothing as follows:
Definition 8 (Feature Measuring Function with Smoothing). We denote by ${\widehat{\mathcal{F}}}_{p}$ the smoothed version of ${\mathcal{F}}_{p}$, obtained through the application of the smoothing function ${\pi}^{1}$. More formally:
${\widehat{\mathcal{F}}}_{p}={\pi}^{1}\circ {\mathcal{F}}_{p}$
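A direct implementation of the ${\pi}^{k}$ formula of Definition 7, checked against the ${\tau}_{1}$ row of Table 6; exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

def laplace(v, k=1):
    """pi^k of Definition 7: component i becomes (k + v_i) / (k + sum(v))."""
    denom = k + sum(v)
    return [Fraction(k + x, denom) for x in v]

# tau_1 of Table 5: <1, 3, 1, 0>  ->  <1/3, 2/3, 1/3, 1/6> as in Table 6.
smoothed = laplace([1, 3, 1, 0])
```

Every component is now strictly positive, which is exactly what the KL-divergence computation of the previous paragraph requires.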
Example 3. By applying ${\pi}^{1}$ to column 2 of Table 5, we obtain ${\widehat{\mathcal{F}}}_{\psi}\left({\tau}_{i}\right)$ as shown by column 3 of Table 6.
Table 6. Quantification and smoothing of slices using $\widehat{\mathcal{F}}$.
| ${\mathcal{S}}_{\phi}\left(\tau \right)$ | ${\mathcal{F}}_{\psi}\left({\tau}_{i}\right)$ | ${\widehat{\mathcal{F}}}_{\psi}\left({\tau}_{i}\right)$ |
|---|---|---|
| ${\tau}_{1}$ | $\langle 1,3,1,0\rangle $ | $\langle \frac{1+1}{1+5},\frac{1+3}{1+5},\frac{1+1}{1+5},\frac{1+0}{1+5}\rangle =\langle \frac{1}{3},\frac{2}{3},\frac{1}{3},\frac{1}{6}\rangle $ |
| ${\tau}_{2}$ | $\langle 1,2,2,0\rangle $ | $\langle \frac{1+1}{1+5},\frac{1+2}{1+5},\frac{1+2}{1+5},\frac{1+0}{1+5}\rangle =\langle \frac{1}{3},\frac{1}{2},\frac{1}{2},\frac{1}{6}\rangle $ |
| ${\tau}_{3}$ | $\langle 0,3,2,0\rangle $ | $\langle \frac{1+0}{1+5},\frac{1+3}{1+5},\frac{1+2}{1+5},\frac{1+0}{1+5}\rangle =\langle \frac{1}{6},\frac{2}{3},\frac{1}{2},\frac{1}{6}\rangle $ |
| ${\tau}_{4}$ | $\langle 0,0,0,5\rangle $ | $\langle \frac{1+0}{1+5},\frac{1+0}{1+5},\frac{1+0}{1+5},\frac{1+5}{1+5}\rangle =\langle \frac{1}{6},\frac{1}{6},\frac{1}{6},1\rangle $ |
| ${\tau}_{5}$ | $\langle 0,2,1,1\rangle $ | $\langle \frac{1+0}{1+5},\frac{1+2}{1+5},\frac{1+1}{1+5},\frac{1+1}{1+5}\rangle =\langle \frac{1}{6},\frac{1}{2},\frac{1}{3},\frac{1}{3}\rangle $ |
| ${\tau}_{6}$ | $\langle 2,0,0,0\rangle $ | $\langle \frac{1+2}{1+5},\frac{1+0}{1+5},\frac{1+0}{1+5},\frac{1+0}{1+5}\rangle =\langle \frac{1}{2},\frac{1}{6},\frac{1}{6},\frac{1}{6}\rangle $ |
When detecting suspicious activities within traffic data, it can be advantageous to prioritize specific positions within the values returned by $\widehat{\mathcal{F}}$ in ${\mathbb{R}}^{n}$. For instance, if $\widehat{\mathcal{F}}$ yields $\left({v}_{1}\mathrm{,}\cdots \mathrm{,}{v}_{n}\right)$ where each ${v}_{i}$ represents traffic originating from a specific country, these values might be weighted according to the respective country's reputation in cyberattacks, assigning greater weight to countries with negative reputations. At present, there is no systematic approach to guide end users in determining these weight values. However, we believe that fine-tuning these weights based on intuition could enhance detection capabilities.
The subsequent definition formalizes the concept of weights.
Definition 9 (Weighting Function ω). We denote by ω a weighting function that accepts a weight vector in ${\left({\mathbb{R}}^{+}\right)}^{n}$ and a tuple in ${\mathbb{R}}^{n}$, and returns a probability distribution, i.e., $\omega \mathrm{:}{\left({\mathbb{R}}^{+}\right)}^{n}\times {\mathbb{R}}^{n}\to {\left[\mathrm{0,1}\right]}^{n}$.
Let ${V}_{1}\mathrm{,}\cdots \mathrm{,}{V}_{m}$ be in ${\mathbb{R}}^{n}$. We extend ω to a set $\left\{{V}_{1}\mathrm{,}\cdots \mathrm{,}{V}_{m}\right\}$ and a sequence $\langle {V}_{1}\mathrm{,}\cdots \mathrm{,}{V}_{m}\rangle $ of tuples as follows:
$\begin{array}{l}\omega \left(\left\{{V}_{1}\mathrm{,}\cdots \mathrm{,}{V}_{m}\right\}\right)=\left\{\omega \left({V}_{1}\right)\mathrm{,}\cdots \mathrm{,}\omega \left({V}_{m}\right)\right\}\\ \omega \left(\langle {V}_{1}\mathrm{,}\cdots \mathrm{,}{V}_{m}\rangle \right)=\langle \omega \left({V}_{1}\right)\mathrm{,}\cdots \mathrm{,}\omega \left({V}_{m}\right)\rangle \end{array}$
The following definition provides an example of ω.
Definition 10 (Scalar Product Weighting Function). We define the scalar product weighting function, abbreviated as spw, as follows: $\begin{array}{l}\text{spw}\mathrm{:}{\left({\mathbb{R}}^{+}\right)}^{n}\times {\mathbb{R}}^{n}\to {\left[\mathrm{0,1}\right]}^{n}\\ \text{spw}\left(\text{w}\mathrm{,}v\right)=\langle \frac{{\text{w}}_{1}\times {v}_{1}}{\text{w}\mathrm{.}v}\mathrm{,}\cdots \mathrm{,}\frac{{\text{w}}_{n}\times {v}_{n}}{\text{w}\mathrm{.}v}\rangle \end{array}$
where $\text{w}\mathrm{.}v$ is the scalar product of the two vectors w and v, i.e.: $\text{w}\mathrm{.}v={\displaystyle \sum _{i=1}^{n}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{\text{w}}_{i}\times {v}_{i}$
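Definition 10 translates directly into code. The following is a minimal sketch (the name `spw` mirrors the abbreviation in the definition):

```python
def spw(w, v):
    """Scalar product weighting: scale each component v_i by its weight w_i
    and normalize by the scalar product w.v, so the result is a
    probability distribution (its components sum to 1)."""
    dot = sum(wi * vi for wi, vi in zip(w, v))
    return [wi * vi / dot for wi, vi in zip(w, v)]
```

Applied to the first slice of Example 4 below, `spw([0.2, 0.2, 0.2, 0.4], [1/3, 2/3, 1/3, 1/6])` returns approximately `[0.2, 0.4, 0.2, 0.2]`.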
We extend the function $\widehat{\mathcal{F}}$
by incorporating a weighting function as follows:
Definition 11 (Feature Measuring Function with Smoothing and Weighting). Let ω be a weighting function. In the sequel, we denote by ${\widehat{\mathcal{F}}}_{p\mathrm{,}\omega}$ the weighted version of ${\widehat{\mathcal{F}}}_{p}$ using the weighting function ω. More precisely: ${\widehat{\mathcal{F}}}_{p\mathrm{,}\omega}=\omega \circ {\widehat{\mathcal{F}}}_{p}$, and for any trace τ and weight vector w, we have: ${\widehat{\mathcal{F}}}_{p\mathrm{,}\omega}\left(\text{w}\mathrm{,}\tau \right)=\omega \left(\text{w}\mathrm{,}{\widehat{\mathcal{F}}}_{p}\left(\tau \right)\right)$
Example 4. Let’s examine the trace provided in Example 3. Suppose we aim to prioritize packets containing ports not in {1, 6, 17}. As an example, we apply the weighting function $\omega =\text{spw}$ with weights $\text{w}=\langle \mathrm{0.2,0.2,0.2,0.4}\rangle $. The results are illustrated in Table 7.
Table 7. Slice distribution.

| ${\mathcal{S}}_{\phi}\left(\tau \right)$ | ${\mathcal{F}}_{\psi}\left({\tau}_{i}\right)$ | ${\widehat{\mathcal{F}}}_{\psi}\left({\tau}_{i}\right)$ | ${\widehat{\mathcal{F}}}_{\psi \mathrm{,}\omega}\left(\text{w}\mathrm{,}{\tau}_{i}\right)$ |
| --- | --- | --- | --- |
| ${\tau}_{1}$ | $\langle \mathrm{1,3,1,0}\rangle $ | $\langle \frac{1}{3}\mathrm{,}\frac{2}{3}\mathrm{,}\frac{1}{3}\mathrm{,}\frac{1}{6}\rangle $ | $\langle \mathrm{0.2,0.4,0.2,0.2}\rangle $ |
| ${\tau}_{2}$ | $\langle \mathrm{1,2,2,0}\rangle $ | $\langle \frac{1}{3}\mathrm{,}\frac{1}{2}\mathrm{,}\frac{1}{2}\mathrm{,}\frac{1}{6}\rangle $ | $\langle \mathrm{0.2,0.3,0.3,0.2}\rangle $ |
| ${\tau}_{3}$ | $\langle \mathrm{0,3,2,0}\rangle $ | $\langle \frac{1}{6}\mathrm{,}\frac{2}{3}\mathrm{,}\frac{1}{2}\mathrm{,}\frac{1}{6}\rangle $ | $\langle \mathrm{0.1,0.4,0.3,0.2}\rangle $ |
| ${\tau}_{4}$ | $\langle \mathrm{0,0,0,5}\rangle $ | $\langle \frac{1}{6}\mathrm{,}\frac{1}{6}\mathrm{,}\frac{1}{6}\mathrm{,1}\rangle $ | $\langle \mathrm{0.067,0.067,0.067,0.8}\rangle $ |
| ${\tau}_{5}$ | $\langle \mathrm{0,2,1,1}\rangle $ | $\langle \frac{1}{6}\mathrm{,}\frac{1}{2}\mathrm{,}\frac{1}{3}\mathrm{,}\frac{1}{3}\rangle $ | $\langle \mathrm{0.1,0.3,0.2,0.4}\rangle $ |
| ${\tau}_{6}$ | $\langle \mathrm{2,0,0,0}\rangle $ | $\langle \frac{1}{2}\mathrm{,}\frac{1}{6}\mathrm{,}\frac{1}{6}\mathrm{,}\frac{1}{6}\rangle $ | $\langle \mathrm{0.429,0.143,0.143,0.286}\rangle $ |

3.4. Divergence Measuring
After abstracting and transforming the traffic into smoothed distributions, the next step involves measuring the divergence between adjacent slices within each sequence. To achieve this, we employ a divergence function such as the KL-Divergence.
Definition 12 (Divergence Function). A divergence measuring function, denoted by Δ, can be any function with the following signature: $\Delta \mathrm{:}{\left[\mathrm{0,1}\right]}^{n}\times {\left[\mathrm{0,1}\right]}^{n}\to \mathbb{R}$.
Examples of divergence measuring functions are given in Table 8.
Table 8. Examples of divergence functions.

| Divergence | Δ $::=$ KL-Divergence [23] | Cosine [24] | TF-IDF [25] | … |
Notice that, since the KL-Divergence, usually denoted by ${D}_{KL}$, between two distributions $P=\left({p}_{1}\mathrm{,}\cdots \mathrm{,}{p}_{n}\right)$ and $Q=\left({q}_{1}\mathrm{,}\cdots \mathrm{,}{q}_{n}\right)$ is not commutative (i.e., ${D}_{KL}\left(P\parallel Q\right)\ne {D}_{KL}\left(Q\parallel P\right)$, as shown by Equations (1) and (2)), we can consider the symmetrized value $\Delta \left(P\mathrm{,}Q\right)=KL\left(P\mathrm{,}Q\right)={D}_{KL}\left(P\parallel Q\right)+{D}_{KL}\left(Q\parallel P\right)$ as the divergence value.
${D}_{KL}\left(P\parallel Q\right)={\displaystyle \sum _{i=1}^{n}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{p}_{i}\times {\mathrm{log}}_{2}\left(\frac{{p}_{i}}{{q}_{i}}\right)$ (1)
${D}_{KL}\left(Q\parallel P\right)={\displaystyle \sum _{i=1}^{n}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{q}_{i}\times {\mathrm{log}}_{2}\left(\frac{{q}_{i}}{{p}_{i}}\right)$ (2)
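Equations (1) and (2) and the symmetrized divergence can be sketched in a few lines of Python (the function names `kl` and `delta` are ours):

```python
from math import log2

def kl(p, q):
    """One-sided KL-Divergence D_KL(P || Q), as in Equations (1) and (2).
    All components are assumed nonzero, which the smoothing guarantees."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q))

def delta(p, q):
    """Symmetrized divergence: D_KL(P || Q) + D_KL(Q || P)."""
    return kl(p, q) + kl(q, p)
```

For the first two weighted slices of Example 4, `delta([0.2, 0.4, 0.2, 0.2], [0.2, 0.3, 0.3, 0.2])` gives approximately 0.100, the first value obtained in Table 9 below.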
Example 5. We apply the KL-Divergence to the trace of Example 4. The result is shown in Table 9.
Table 9. Slice distribution.

| ${\mathcal{S}}_{\phi}\left(\tau \right)$ | ${\mathcal{F}}_{\psi}\left({\tau}_{i}\right)$ | ${u}_{i}={\widehat{\mathcal{F}}}_{\psi \mathrm{,}\omega}\left(\text{w}\mathrm{,}{\tau}_{i}\right)$ | $PKL\left({u}_{i}\mathrm{,}{u}_{i+1}\right)$ | $PKL\left({u}_{i+1}\mathrm{,}{u}_{i}\right)$ | $KL\left({u}_{i}\mathrm{,}{u}_{i+1}\right)$ |
| --- | --- | --- | --- | --- | --- |
| ${\tau}_{1}$ | $\langle \mathrm{1,3,1,0}\rangle $ | ${u}_{1}=\langle \mathrm{0.2,0.4,0.2,0.2}\rangle $ | 0.049 | 0.051 | 0.100 |
| ${\tau}_{2}$ | $\langle \mathrm{1,2,2,0}\rangle $ | ${u}_{2}=\langle \mathrm{0.2,0.3,0.3,0.2}\rangle $ | 0.0755 | 0.066 | 0.142 |
| ${\tau}_{3}$ | $\langle \mathrm{0,3,2,0}\rangle $ | ${u}_{3}=\langle \mathrm{0.1,0.4,0.3,0.2}\rangle $ | 1.3435 | 1.2440 | 2.587 |
| ${\tau}_{4}$ | $\langle \mathrm{0,0,0,5}\rangle $ | ${u}_{4}=\langle \mathrm{0.067,0.067,0.067,0.8}\rangle $ | 0.5107 | 0.6265 | 1.137 |
| ${\tau}_{5}$ | $\langle \mathrm{0,2,1,1}\rangle $ | ${u}_{5}=\langle \mathrm{0.1,0.3,0.2,0.4}\rangle $ | 0.4024 | 0.5388 | 0.941 |
| ${\tau}_{6}$ | $\langle \mathrm{2,0,0,0}\rangle $ | ${u}_{6}=\langle \mathrm{0.429,0.143,0.143,0.286}\rangle $ | | | |
3.5. Divergence Clustering
After quantifying the divergence between successive slices of traces, the next step is to ascertain if significant abrupt changes have occurred. To accomplish this, we estimate the number of clusters generated by the divergence values. If this count exceeds one, we infer that the trace contains suspicious traffic.
Definition 13 (Clustering). Let ${\mathcal{C}}_{n}{\mathrm{:2}}^{\mathbb{R}}\to \left\{\text{true}\mathrm{,}\text{false}\right\}$ be a clustering function that estimates the optimal number of clusters N associated with a dataset in ${2}^{\mathbb{R}}$. It returns true if $N\ge n$, indicating that the threshold for suspicious activity has been surpassed, and false otherwise.
We are particularly interested in ${\mathcal{C}}_{2}$: when it returns true, the traffic is considered suspicious. Examples of the ${\mathcal{C}}_{2}$ function are provided in Table 10.
Table 10. Examples of clustering functions.

| Clustering | ${\mathcal{C}}_{2}$ $::=$ HC [26] | KM [27] | EM [28] | … |
Example 6. Let’s apply the K-means algorithm with the Elbow Method to compute ${\mathcal{C}}_{2}$ on the trace from the previous example, as illustrated in Table 11.
Table 11. K-means results.

| Cluster 1 | Cluster 2 |
| --- | --- |
| 0.100, 0.142, 0.941, 1.137 | 2.587 |
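Since the divergence values are one-dimensional, ${\mathcal{C}}_{2}$ can be approximated in a few lines of pure Python. The sketch below runs a minimal 1-D K-means with k = 2 and compares within-cluster distortions for k = 1 and k = 2 as a crude stand-in for the Elbow Method; the helper names are ours:

```python
def two_means_1d(values, iters=20):
    """Minimal 1-D K-means with k = 2, seeded with the extreme values."""
    c = [min(values), max(values)]
    for _ in range(iters):
        groups = ([], [])
        for v in values:
            # bool indexing: True (1) when v is closer to centroid c[1]
            groups[abs(v - c[1]) < abs(v - c[0])].append(v)
        c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
    return groups

def distortion(groups):
    """Within-cluster sum of squared distances to each cluster mean."""
    total = 0.0
    for g in groups:
        m = sum(g) / len(g)
        total += sum((v - m) ** 2 for v in g)
    return total
```

On the divergences of Table 9, the k = 2 partition isolates 2.587 in its own cluster, matching Table 11, and its distortion drops far below the k = 1 distortion, so the elbow sits at two clusters.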
3.6. Suspicious Traffic Detection
Now, we have all the necessary ingredients to define suspicious traffic.
Definition 14 (Suspicious Traffic)
Let τ be a trace.
Let $\phi =\langle {p}_{1}\mathrm{,}\cdots \mathrm{,}{p}_{n}\rangle $ be a sequence of n propositions.
Let $\psi =\langle {q}_{1}\mathrm{,}\cdots \mathrm{,}{q}_{m}\rangle $ be a sequence of m propositions.
Let $\text{w}=\langle {\text{w}}_{1}\mathrm{,}\cdots \mathrm{,}{\text{w}}_{n}\rangle $ be a weight vector in ${\mathbb{R}}^{n}$.
Let $\Delta \mathrm{:}{\left[\mathrm{0,1}\right]}^{n}\times {\left[\mathrm{0,1}\right]}^{n}\to \mathbb{R}$ be a divergence measuring function such as the KL-Divergence.
Let ${\mathcal{C}}_{2}{\mathrm{:2}}^{\mathbb{R}}\to \left\{\text{true}\mathrm{,}\text{false}\right\}$ be a clustering function that estimates the optimal number of clusters N for a dataset in ${2}^{\mathbb{R}}$ and returns true if $N\ge 2$, false otherwise.
We define $Suspiciou{s}_{\Delta \mathrm{,}{\mathcal{C}}_{2}}^{\omega}\left(\tau \mathrm{,}\phi \mathrm{,}\psi \mathrm{,}\text{w}\right)$, a generic function designed to detect suspicious traffic within an analyzed trace τ, as follows:
$Suspiciou{s}_{\Delta \mathrm{,}{\mathcal{C}}_{2}}^{\omega}\left(\tau \mathrm{,}\phi \mathrm{,}\psi \mathrm{,}\text{w}\right)={\mathcal{C}}_{2}\left(\Delta \left({\widehat{\mathcal{F}}}_{\psi \mathrm{,}\omega}\left(\text{w}\mathrm{,}{\mathcal{S}}_{\phi}\left(\tau \right)\right)\right)\right)$
The Suspicious function integrates the various analyses, conducted in the sequence depicted in Figure 3, and returns true if the traffic is deemed suspicious and false otherwise. It requires three functions, ω, Δ, and ${\mathcal{C}}_{2}$, as well as four parameters: τ, w, φ, and ψ.
Figure 3. Steps involved in detecting suspicious traffic.
Example 7. Let’s apply the Suspicious function to the trace provided in Example 1 to ascertain whether a sudden change exists. Based on the results shown in Table 11, where ${\mathcal{C}}_{2}$ generates more than one cluster, we deduce that:
$Suspiciou{s}_{\Delta \mathrm{,}{\mathcal{C}}_{2}}^{\omega}\left(\tau \mathrm{,}\phi \mathrm{,}\psi \mathrm{,}\text{w}\right)=\text{true}$
The suspicious traffic is triggered on slice ${\tau}_{3}$.
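Under the choices made in the running example (add-one smoothing, ω = spw, Δ = symmetrized KL-Divergence), the whole pipeline can be sketched end to end. The largest-gap test at the end is a crude stand-in for the K-means + Elbow estimate of ${\mathcal{C}}_{2}$, and all helper names are ours:

```python
from math import log2

def suspicious(count_vectors, w, slice_len):
    """Sketch of Suspicious: count_vectors are the per-slice feature counts
    F_psi(tau_i), w the weight vector, slice_len the events per slice."""
    def smooth(c):                      # F-hat: add-one smoothing
        return [(1 + x) / (1 + slice_len) for x in c]
    def spw(v):                         # omega: scalar product weighting
        dot = sum(wi * vi for wi, vi in zip(w, v))
        return [wi * vi / dot for wi, vi in zip(w, v)]
    def kl(p, q):                       # one-sided KL-Divergence
        return sum(pi * log2(pi / qi) for pi, qi in zip(p, q))
    u = [spw(smooth(c)) for c in count_vectors]
    d = [kl(a, b) + kl(b, a) for a, b in zip(u, u[1:])]
    # C_2 stand-in: flag the trace when the sorted divergences show a gap
    # larger than half their range, i.e. they split into >= 2 clusters.
    s = sorted(d)
    gaps = [b - a for a, b in zip(s, s[1:])]
    return max(gaps) > (s[-1] - s[0]) / 2
```

Fed with the six count vectors of Example 3 and w = ⟨0.2, 0.2, 0.2, 0.4⟩, this sketch returns true, triggered by the jump around ${\tau}_{3}$.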
4. Case Study
In this section, we present three cases of detecting suspicious activities using two distinct datasets comprising real traffic. The first case detects suspicious activities based on daily patterns in the dataset from [29]. The second and third cases utilize the UNSW-NB15 dataset [30] to detect suspicious traffic by analyzing DNS and HTTP traffic, respectively.
4.1. Detecting Suspicious Activities Based on Days of the Week
An example of an interesting dataset with real traffic is available at [29]. It contains 21,000 rows and covers the traffic of 10 workstations with local IP addresses over a period of three months. Half of these local IP addresses were hacked at some point during this period, became members of different botnets, and generated abnormal traffic.
A screenshot of a part of the dataset is shown in Figure 4, where:
date: yyyy-mm-dd (from 2006-07-01 through 2006-09-30);
l_ipn: local IP address (coded as an integer from 0 to 9);
r_asn: remote ASN (an integer identifying the remote Autonomous System Number);
f: flows (number of connections during the corresponding day).
Figure 4. A part of the dataset provided by [29].
We try to detect the infected computers based on the following assumption: for each workstation of the network, the nature of traffic may vary across different days of the week. For instance, weekend traffic could differ significantly from weekday traffic. However, when we consider a specific day, such as Monday, there is no compelling reason for its traffic to change substantially from one week to another. This implies that Monday’s traffic should remain relatively consistent across all weeks, and a similar pattern is expected for the other days of the week, such as Tuesday, Wednesday, and so forth.
Based on this assumption, we proceed as follows: we segregate the traffic associated with each workstation and each day of the week into distinct files. With ten workstations and seven days a week, this results in a total of 70 files. Each of these files then undergoes analysis to identify any abrupt changes.
Here are the values of the parameters used within the Suspicious function to detect suspicious traffic.
τ (trace): the dataset available at [29].
The trace is split into various slices, each exclusively comprising traffic linked to a specific IP address and a designated day of the week. To illustrate, for IP address 0, distinct slices are allocated for Mondays, Tuesdays, and so forth, and similar slices are built for IP addresses 1 to 9. Through this division, we implicitly assume that, for any IP address, the traffic of different Mondays should be quite similar, and likewise for the other days of the week. More formally, the slicing is based on the following set of propositions:
$\phi ={\displaystyle \underset{\begin{array}{c}0\le i\le 9\\ 0\le j\le 6\end{array}}{\cup}}\left\{{p}_{i\mathrm{,}j}\right\}$
where
${p}_{i\mathrm{,}j}={\langle \left(IP=i\right)\wedge \left(date\mathrm{.}dd=j\right)\rangle}_{j\mathrm{,}j+7}^{N}$
and $N=21000$
represents the number of events in the trace.
Let $\psi =\langle \left({q}_{1}={v}_{1}\right)\mathrm{,}\cdots \mathrm{,}\left({q}_{n}={v}_{n}\right)\rangle $ where ${v}_{1}\mathrm{,}\cdots ,{v}_{n}$ are the different values that appear in the column r_asn, presented in ascending order.
$\text{w}=\left({\text{w}}_{1}\mathrm{,}\cdots \mathrm{,}{\text{w}}_{n}\right)=\left(\mathrm{1,}\cdots \mathrm{,1}\right)$
. This captures the fact that each element of the partition has the same weight.
Δ is the KL-Divergence.
${\mathcal{C}}_{2}$ is the composition of K-means and the Elbow Method: K-means performs the clustering, and the Elbow Method estimates the optimal number of clusters.
All these fixed parameters are the input of our Suspicious function, which concludes whether the traffic is suspicious or not. It proceeds as follows:
After applying the function ${\mathcal{S}}_{\phi}$
to the dataset, we obtain a separate file for each IP address and each day of the week. For instance, for IP address 0 and Monday, we generate a file that will be analyzed independently for suspicious traffic. This file aggregates traffic not only from a single Monday but from multiple Mondays, and our objective is to detect any sudden changes in the distribution of traffic from one Monday to another. We repeat this process for the other days of the week and for the remaining IP addresses.
The traffic from each IP address and each day of the week undergoes transformation through the function ${\mathcal{F}}_{\psi}$
, resulting in a point in ${\mathbb{R}}^{n}$
, where each dimension represents the number of connections related to every r_asn, and n is the total number of r_asn.
Thanks to the function Δ, we quantify the divergence between every two successive Mondays for each IP address, and we repeat this process for the other days of the week as well.
Using the function ${\mathcal{C}}_{2}$ (the composition of K-means and the Elbow Method), we estimate the number of clusters generated by the previous steps.
If we observe two or more clusters for any analyzed sequence, we infer that the traffic is suspicious.
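As a concrete illustration of the slicing step on this dataset, the sketch below groups the CSV rows by (l_ipn, weekday) and builds, for every date, a connection-count vector indexed by r_asn. The file path, column names, and date format follow the dataset description above but should be treated as assumptions:

```python
import csv
from collections import defaultdict
from datetime import datetime

def build_slices(csv_path):
    """Return {(l_ipn, weekday): [one count vector per date]} where each
    vector counts flows per r_asn; weekday 0 is Monday."""
    asns = set()
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            day = datetime.strptime(row["date"], "%Y-%m-%d")
            key = (int(row["l_ipn"]), day.weekday())
            counts[key][row["date"]][row["r_asn"]] += int(row["f"])
            asns.add(row["r_asn"])
    dims = sorted(asns)  # one dimension of R^n per distinct r_asn
    return {key: [[per_day[a] for a in dims]
                  for _, per_day in sorted(days.items())]
            for key, days in counts.items()}
```

Each resulting sequence of vectors is then smoothed, weighted, and compared pairwise with Δ, exactly as in the running example.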
Below, we present the results obtained from the Elbow Method corresponding to the different days and IP addresses.
1) Monday: Based on the analysis of Monday traffic depicted in Figure 5, we identify five non-suspicious machines (3, 5, 6, 7, and 9) and five suspicious machines (0, 1, 2, 4, and 8).
Figure 5. Elbow results for every Monday.
2) Tuesday: Based on the analysis of Tuesday traffic depicted in Figure 6, we identify five non-suspicious machines (3, 5, 6, 7, and 9) and five suspicious machines (0, 1, 2, 4, and 8).
3) Wednesday: Based on the analysis of Wednesday traffic shown in Figure 7, we observe five non-suspicious machines (3, 5, 6, 7, and 9) and five suspicious machines (0, 1, 2, 4, and 8).
4) Thursday: According to the analysis depicted in Figure 8, we identify five non-suspicious machines (3, 5, 6, 7, and 9) and five suspicious machines (0, 1, 2, 4, and 8) on Thursday.
5) Friday: Based on the analysis presented in Figure 9, we observe five non-suspicious machines (3, 5, 6, 7, and 9) and five suspicious machines (0, 1, 2, 4, and 8) on Friday.
Figure 6. Elbow results for every Tuesday.
Figure 7. Elbow results for every Wednesday.
Figure 8. Elbow results for every Thursday.
Figure 9. Elbow results for every Friday.
6) Saturday: According to the analysis shown in Figure 10, we identify five non-suspicious machines (3, 5, 6, 7, and 9) and five suspicious machines (0, 1, 2, 4, and 8) on Saturday.
Figure 10. Elbow results for every Saturday.
7) Sunday: Based on the analysis presented in Figure 11, we observe five non-suspicious machines (3, 5, 6, 7, and 9) and five suspicious machines (0, 1, 2, 4, and 8) on Sunday.
Here are the conclusions extracted from Figures 5-11:
There are five clear elbows, showing that the number of clusters related to the traffic of the machines with l_ipn values 0, 1, 2, 4, and 8 is greater than one; these machines are therefore the origins of the suspicious traffic shown in Table 12.
The five machines with l_ipn values 3, 5, 6, 7, and 9 exhibit no elbow, meaning that their number of clusters is one; they are therefore not associated with any suspicious traffic, as shown in Table 12.
Figure 11. Elbow results for every Sunday.
Table 12. Detecting suspicious traffic based on days of the week.

| | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Unsuspicious | 3, 5, 6, 7, 9 | 3, 5, 6, 7, 9 | 3, 5, 6, 7, 9 | 3, 5, 6, 7, 9 | 3, 5, 6, 7, 9 | 3, 5, 6, 7, 9 | 3, 5, 6, 7, 9 |
| Suspicious | 0, 1, 2, 4, 8 | 0, 1, 2, 4, 8 | 0, 1, 2, 4, 8 | 0, 1, 2, 4, 8 | 0, 1, 2, 4, 8 | 0, 1, 2, 4, 8 | 0, 1, 2, 4, 8 |

Total Suspicious: 0, 1, 2, 4, 8.

| | Predicted Negative | Predicted Positive |
| --- | --- | --- |
| Actual Negative | True Negative (TN) | False Positive (FP) |
| Actual Positive | False Negative (FN) | True Positive (TP) |
Our approach predicted that 5/10 local IPs are botnets. In fact, exactly those 5/10 local IPs are real botnets. Therefore:

| | Predicted Negative | Predicted Positive | Total |
| --- | --- | --- | --- |
| Actual Negative | 5 | 0 | 5 |
| Actual Positive | 0 | 5 | 5 |
| Total | 5 | 5 | 10 |
It follows that:
$\text{Accuracy}=\frac{\text{TP}+\text{TN}}{\text{TP}+\text{TN}+\text{FP}+\text{FN}}=\frac{10}{10}=\mathrm{100\%}$.
$\text{Precision}=\frac{\text{TP}}{\text{TP}+\text{FP}}=\frac{5}{5}=\mathrm{100\%}$.
$\text{Recall}=\frac{\text{TP}}{\text{TP}+\text{FN}}=\frac{5}{5}=\mathrm{100\%}$.
False Negative (FN) = 0%.
False Positive (FP) = 0%.
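The metrics above follow the standard confusion-matrix definitions, which can be captured once and reused for the other case studies (a small helper of our own):

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }
```

For this case study, `metrics(5, 5, 0, 0)` yields 100% for all three metrics.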
Performance: Our code was executed on an Ubuntu virtual machine with a 2.3 GHz Intel Core i9 processor, equipped with 2 cores and 4 GB of RAM. The total execution time to process the entire dataset, consisting of 21,000 rows covering 10 workstations over a three-month period, was approximately 51.7 seconds.
4.2. Detecting Suspicious Activities Based on DNS and HTTP Traffic
The UNSW-NB15 dataset [30] was generated using the IXIA PerfectStorm tool. It encompasses nine categories of modern attack types, incorporates realistic behaviors of normal traffic, and comprises 49 features across various categories, some of which are illustrated in Figure 12. Used as an attack tool, IXIA dispatches both benign and malicious traffic to different network nodes. A sample of certain fields from this traffic is shown in Table 13.
Figure 12. UNSW-NB15: example of features.
The network contains three sub-networks, as shown in Figure 13.
1) Sub-network 1 (Server 1): contains nodes with source IP addresses from 59.166.0.0 to 59.166.0.9.
2) Sub-network 2 (Server 2): contains nodes with source IP addresses from 175.45.176.0 to 175.45.176.3.
Table 13. UNSW-NB15 samples.
Figure 13. UNSW-NB15 network.
3) Sub-network 3 (Server 3): contains nodes with source IP addresses from 149.171.126.0 to 149.171.126.19.
Sub-networks 1 (Server 1) and 3 (Server 3) are configured to exhibit normal traffic patterns, whereas sub-network 2 (Server 2) is associated with abnormal or malicious activities.
We apply our approach to various source IP addresses within all sub-networks, under the assumption that the nature of the outbound traffic should not undergo sudden changes.
4.2.1. Detecting Suspicious Activities Based on DNS Traffic
Below are the values of the required parameters used within the Suspicious function for detecting suspicious traffic:
τ (trace): the dataset available at [30].
Let $\phi =\langle \left(\text{IP}\text{.SourceAdd}=ip{s}_{1}\right)\mathrm{,}\cdots \mathrm{,}\left(\text{IP}\text{.SourceAdd}=ip{s}_{n}\right)\rangle $
where $\left\{ip{s}_{1}\mathrm{,}\cdots \mathrm{,}ip{s}_{n}\right\}$
are the different IP source addresses appearing in τ.
Let $\begin{array}{l}\psi =\langle \left(\text{IP}\text{.DestAdd}=ip{d}_{1}\right)\wedge \left(\text{UDP}\text{.DestPort}=DNS\right)\mathrm{,}\cdots \mathrm{,}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\left(\text{IP}\text{.DestAdd}=ip{d}_{n}\right)\wedge \left(\text{UDP}\text{.DestPort}=DNS\right)\rangle \end{array}$, where $\left\{ip{d}_{1}\mathrm{,}\cdots \mathrm{,}ip{d}_{n}\right\}$ are the different IP destination addresses appearing in τ.
$\text{w}=\left({\text{w}}_{1}\mathrm{,}\cdots \mathrm{,}{\text{w}}_{n}\right)=\left(\mathrm{1,}\cdots \mathrm{,1}\right)$
.
Δ is the KL-Divergence.
${\mathcal{C}}_{2}$ is the composition of K-means and the Elbow Method: K-means performs the clustering, and the Elbow Method estimates the optimal number of clusters.
Below, we present the results obtained from the Elbow Method corresponding to the different sub-networks (servers).
1) Sub-network 1 (Server 1): source IP addresses from 59.166.0.0 to 59.166.0.9, shown in Figure 14.
Figure 14. Elbow results for sub-network 1 (Server 1) based on DNS services.
2) Sub-network 2 (Server 2): source IP addresses from 175.45.176.0 to 175.45.176.3, shown in Figure 15.
Figure 15. Elbow results for sub-network 2 (Server 2) based on DNS services.
3) Sub-network 3 (Server 3): source IP addresses from 149.171.126.0 to 149.171.126.19, shown in Figure 16.
Figure 16. Elbow results for sub-network 3 (Server 3) based on DNS services.
From Figure 14, the maximum distortion value for sub-network 1 (Server 1) is above 8. In Figure 15, the maximum distortion value for sub-network 2 (Server 2) exceeds 80. Meanwhile, in Figure 16, the maximum distortion value for sub-network 3 (Server 3) is above 70, but for only one IP address, whereas more than 20 IP addresses have a distortion value of zero.
Based on these findings, our approach predicts that sub-network 2 (Server 2) is suspicious, as it exhibits abnormal or malicious activities in the network traffic. Therefore:

| | Predicted Negative | Predicted Positive | Total |
| --- | --- | --- | --- |
| Actual Negative | 2 | 0 | 2 |
| Actual Positive | 0 | 1 | 1 |
| Total | 2 | 1 | 3 |
It follows that:
$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}=\frac{3}{3}=100\%$.
$\text{Precision}=\frac{TP}{TP+FP}=\frac{1}{1}=100\%$.
$\text{Recall}=\frac{TP}{TP+FN}=\frac{1}{1}=\mathrm{100\%}$.
False Negative (FN) = 0%.
False Positive (FP) = 0%.
4.2.2. Detecting Suspicious Activities Based on HTTP Traffic
Below are the values of the required parameters used within the Suspicious function for detecting suspicious traffic:
τ (trace): the dataset available at [30].
Let $\phi =\langle \left(\text{IP}\text{.SourceAdd}=ip{s}_{1}\right)\mathrm{,}\cdots \mathrm{,}\left(\text{IP}\text{.SourceAdd}=ip{s}_{n}\right)\rangle $
where $\left\{ip{s}_{1}\mathrm{,}\cdots \mathrm{,}ip{s}_{n}\right\}$
are the different IP source addresses appearing in τ.
Let $\begin{array}{l}\psi =\langle \left(IP\mathrm{.}DestAdd=ip{d}_{1}\right)\wedge \left(TCP\mathrm{.}DestPort=HTTP\right)\mathrm{,}\cdots \mathrm{,}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\left(IP\mathrm{.}DestAdd=ip{d}_{n}\right)\wedge \left(TCP\mathrm{.}DestPort=HTTP\right)\rangle \end{array}$, where $\left\{ip{d}_{1}\mathrm{,}\cdots \mathrm{,}ip{d}_{n}\right\}$ are the different IP destination addresses appearing in τ.
$\text{w}=\left({\text{w}}_{1}\mathrm{,}\cdots \mathrm{,}{\text{w}}_{n}\right)=\left(\mathrm{1,}\cdots \mathrm{,1}\right)$
.
Δ is the KL-Divergence.
${\mathcal{C}}_{2}$ is the composition of K-means and the Elbow Method.
Below, we present the results obtained from the Elbow Method for different subnetworks (servers).
1) Sub-network 1 (Server 1): source IP addresses from 59.166.0.0 to 59.166.0.9, shown in Figure 17.
2) Sub-network 2 (Server 2): source IP addresses from 175.45.176.0 to 175.45.176.3, shown in Figure 18.
3) Sub-network 3 (Server 3): source IP addresses from 149.171.126.0 to 149.171.126.19, shown in Figure 19.
From Figure 17, the maximum distortion value for sub-network 1 (Server 1) is above 4. In Figure 18, the maximum distortion value for sub-network 2 (Server 2) exceeds 2. Meanwhile, in Figure 19, the maximum distortion value for sub-network 3 (Server 3) is above 0.008, but only for one IP address, while more than 20 IP addresses have a distortion value of zero.
Based on these findings, our approach predicts that sub-networks 1 (Server 1) and 2 (Server 2) are suspicious, as they exhibit abnormal or malicious activities in the network traffic. Therefore:
Figure 17. Elbow results for sub-network 1 (Server 1) based on HTTP services.
Figure 18. Elbow results for sub-network 2 (Server 2) based on HTTP services.
Figure 19. Elbow results for sub-network 3 (Server 3) based on HTTP services.

| | Predicted Negative | Predicted Positive | Total |
| --- | --- | --- | --- |
| Actual Negative | 1 | 1 | 2 |
| Actual Positive | 0 | 1 | 1 |
| Total | 1 | 2 | 3 |
It follows that:
$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}=\frac{2}{3}=67\%$.
$\text{Precision}=\frac{TP}{TP+FP}=\frac{1}{2}=50\%$.
$\text{Recall}=\frac{TP}{TP+FN}=\frac{1}{1}=100\%$.
False Negative (FN) = 0%.
False Positive (FP) $=\frac{1}{3}=33.33\%$.
4.3. Discussion
Table 14. Evaluation of the proposed approach.

| Techniques | Attack types | Target | Learning type | Logic rules | Training not required | Multi-target | Detection rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| KL-Divergence, Cosine Similarity, TF-IDF, K-means | Suspicious attacks against different targets | IP addresses, TCP/UDP ports, HTTP.Url, others | Unsupervised | ✓ | ✓ | ✓ | 100% |
Table 14 summarizes the main features of the proposed approach. Although it has shown the best detection rate (100%), our experimental dataset remains small, and we need to apply the approach to further, more representative datasets to obtain better estimates of this parameter and other metrics.
5. Conclusions
This paper introduces a promising new technique for incident detection, leveraging differential analysis. Initially, the traffic is dispersed via a slicing function ${\mathcal{S}}_{\phi}$, partitioning it into sequences of slices based on propositional logical formulas φ specified by the end user. Subsequently, each slice undergoes transformation through a measuring function ${\mathcal{F}}_{\psi}$, mapping it to a point in ${\mathbb{R}}^{n}$ by quantifying selected characteristics defined by the end user via a formula ψ. Following this, the distances between successive values returned by ${\mathcal{F}}_{\psi}$, associated with the same sequence, are evaluated using a designated divergence function Δ (e.g., the KL-Divergence). Lastly, employing a clustering technique (e.g., K-means), the values produced by Δ are clustered and the number of clusters is estimated. If any sequence yields more than one cluster, it indicates suspicious activity.
The experimental results demonstrate significant promise, with 100% accuracy achieved across both datasets used in the experiments. However, it is essential to note that this level of accuracy may not be guaranteed with other datasets and is contingent upon the parameters selected for analysis, such as φ and ψ.
In addition to its remarkable efficiency, the approach exhibits versatility in tackling a wide array of attacks spanning various activities, including those targeting networks, operating systems, and applications. Notably, it operates without necessitating any learning step or data.
Looking ahead, our future endeavors entail applying this methodology to diverse datasets encompassing log files that capture a spectrum of activities across networks, operating systems, and applications. Furthermore, we aspire to integrate this approach into an opensource Security Information and Event Management (SIEM) tool like Wazuh, thereby extending its accessibility and practicality within cybersecurity frameworks.