
Table 3 A brief introduction of unsupervised machine learning (UML) methods

From: Blockchain-oriented approach for detecting cyber-attack transactions

Type

Description

Related methods

Distance-based techniques

These techniques use the distance from an observation to its kth nearest neighbor and compute an anomaly score from the local density around each observation

KNN/AvgKNN. These assign an anomaly score to an observation based on the distance to its kth nearest neighbor and on the average distance to its k nearest neighbors, respectively (Angiulli and Pizzuti 2002). The higher the anomaly score, the more abnormal the observation

Local Outlier Factor (LOF). Similar to KNN, it assigns an anomaly score to a given observation by comparing its local density with the densities of its neighbors (Breunig et al. 2000)
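The kth-neighbor and average-distance scores above can be sketched in a few lines. This is an illustrative implementation only — the paper does not prescribe a library, and scikit-learn's NearestNeighbors plus the toy data are my choices:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Normal cluster around the origin plus one obvious outlier at index 30.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(30, 2)), [[5.0, 5.0]]])

k = 5
# Ask for k + 1 neighbors because each point's nearest neighbor is itself.
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
dists, _ = nn.kneighbors(X)

knn_score = dists[:, -1]                   # distance to the kth neighbor (KNN)
avg_knn_score = dists[:, 1:].mean(axis=1)  # mean of the k distances (AvgKNN)
```

Under both variants the injected outlier receives the largest score, matching the "higher score, more abnormal" convention in the table.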

Clustering-based techniques

These techniques assume that normal observations belong to large or dense clusters, whereas abnormal observations belong to small or sparse ones. They operate on the output of clustering methods

Cluster-Based Local Outlier Factor (CBLOF). It classifies a given observation as an outlier if the size or the density of its cluster is below a threshold (He et al. 2003)
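The size-threshold rule in CBLOF can be illustrated with a short sketch. The clustering backend (KMeans), the toy data, and the threshold value are my assumptions, not the paper's; full CBLOF also weights by distance to the nearest large cluster, which is omitted here:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two dense clusters plus a tiny, sparse group of three points (indices 80-82).
X = np.vstack([
    rng.normal(0.0, 0.3, size=(40, 2)),
    rng.normal(6.0, 0.3, size=(40, 2)),
    [[3.0, 10.0], [3.2, 10.1], [2.8, 9.9]],
])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
sizes = np.bincount(km.labels_, minlength=3)

# CBLOF-style size rule: points in a cluster smaller than the threshold
# are treated as outliers (density could be checked analogously).
threshold = 5
is_outlier = sizes[km.labels_] < threshold
```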

Histogram-based techniques

These techniques first construct a histogram over each feature's values, then compute an anomaly score for a given observation from the height of the bin in which it falls

Histogram-based Outlier Score (HBOS). It assigns an anomaly score by building a histogram with a fixed or a dynamic bin width for each attribute, and computes the outlier score from the height of the bin in which the data point falls (Goldstein and Dengel 2012)
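The two-step procedure above can be sketched directly with NumPy histograms. This is a minimal fixed-bin-width illustration under assumptions of my own (bin count, epsilon, toy data), not the reference implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
# 200 normal points plus one outlier at index 200.
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)), [[8.0, 8.0]]])

def hbos_score(X, bins=10):
    """Sum over features of -log of the normalized height of each point's bin."""
    score = np.zeros(len(X))
    for j in range(X.shape[1]):
        hist, edges = np.histogram(X[:, j], bins=bins, density=True)
        # Map each value to its bin; clip so the maximum lands in the last bin.
        idx = np.clip(np.digitize(X[:, j], edges) - 1, 0, bins - 1)
        # A small epsilon avoids log(0) for empty bins.
        score += -np.log(hist[idx] + 1e-12)
    return score

scores = hbos_score(X)
```

Points falling in low (sparse) bins accumulate large scores, so the injected outlier scores highest.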

Kernel-based techniques

These algorithms apply a linear classifier to a non-linear problem by transforming linearly inseparable data into a linearly separable representation, typically via a kernel-induced feature space

One-class SVM (OCSVM). It is an unsupervised algorithm that learns a decision function for novelty detection by fitting a boundary around the normal observations (Schölkopf et al. 2001). Any observation that falls outside the learned boundary is flagged as an outlier
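A minimal OCSVM sketch with scikit-learn (the kernel, nu, and toy data are illustrative choices, not values from the paper):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)
X_train = rng.normal(0.0, 1.0, size=(200, 2))  # normal observations only

# nu upper-bounds the fraction of training points left outside the boundary.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)

# predict returns +1 inside the learned boundary, -1 outside it.
preds = ocsvm.predict(np.array([[0.0, 0.0], [6.0, 6.0]]))
```

A point at the center of the training mass falls inside the boundary (+1), while a far-away point falls outside (-1).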

Neural network-based techniques

These techniques assume that normal observations fall in high-probability regions of the learned model, while abnormal data points lie in low-probability regions

Variational Autoencoder (VAE). Its architecture consists of an encoder and a decoder, with the optimization goal of minimizing the reconstruction error between the encoded-decoded output and the original input (Kingma and Welling 2013)

Deep Support Vector Data Description (DeepSVDD). Inspired by the one-class SVM, Ruff et al. (2018) detect anomalies by training a neural network to minimize the volume of a hypersphere that encloses the network representations of normal samples
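A full VAE or DeepSVDD is too heavy for a sketch, but the reconstruction-error scoring principle the VAE relies on can be illustrated with a linear autoencoder (PCA encode/decode). This substitution, and the toy data, are mine — the paper itself uses the neural architectures above:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
# Normal data lies near a one-dimensional line; the last point (index 100) does not.
t = rng.normal(0.0, 1.0, size=100)
X = np.column_stack([t, 2.0 * t + rng.normal(0.0, 0.05, size=100)])
X = np.vstack([X, [[3.0, -6.0]]])

# Encode to one dimension and decode back; anomalies reconstruct poorly.
pca = PCA(n_components=1).fit(X)
recon = pca.inverse_transform(pca.transform(X))
recon_error = np.square(X - recon).sum(axis=1)
```

Points consistent with the learned low-dimensional structure reconstruct almost exactly, while the off-manifold point incurs the largest reconstruction error — the quantity a VAE-based detector thresholds.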

Ensemble-based techniques

These techniques combine multiple estimators for anomaly detection, reducing the variance of model accuracy and making the algorithm more robust

Feature Bagging (FB). It combines the results of multiple individual outlier detection models, each trained on a small subset of features randomly selected from the original feature set (Lazarevic and Kumar 2005)

Isolation Forest (IF). This technique exploits the fact that anomalies are isolated faster than normal points: it builds trees by splitting on randomly chosen features at random values drawn from each feature's range (Liu et al. 2012)
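The isolation idea can be sketched with scikit-learn's IsolationForest (an illustrative choice of library and toy data; the paper does not fix either):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(5)
# 200 normal points plus one outlier at index 200.
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)), [[8.0, 8.0]]])

# Random axis-aligned splits isolate the far-away point in very few steps,
# so it receives the most anomalous score.
iso = IsolationForest(n_estimators=100, random_state=0).fit(X)
labels = iso.predict(X)        # -1 for anomalies, +1 for inliers
scores = iso.score_samples(X)  # lower means more anomalous
```

Because the outlier is isolated by short paths across the random trees, it gets the lowest `score_samples` value and is labeled -1.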