Anomaly detection is a data partitioning task that separates outliers from normative data points. It is a primary data analysis task with diverse applications and has been studied under many models and assumptions; an unsupervised learning approach to this problem assumes no prior information about the data. Spectral methods such as spectral clustering have been widely used to solve the clustering problem. These methods use the leading K eigenvectors of the graph Laplacian matrix to detect K clusters, provided the graph has a clear community structure.
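For concreteness, the sketch below illustrates this standard spectral clustering pipeline. It is not the thesis's specific construction: the symmetric affinity matrix W, the normalized Laplacian, and the k-means step on the row-normalized embedding are assumed here for illustration.

```python
# A minimal sketch of spectral clustering, assuming a symmetric affinity
# matrix W; the Laplacian normalization and k-means step are illustrative.
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(W, K):
    """Cluster points using the leading K eigenvectors of the normalized graph Laplacian."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt   # normalized Laplacian
    eigvals, eigvecs = eigh(L)                               # eigenvalues in ascending order
    V = eigvecs[:, :K]                                       # leading K eigenvectors
    V = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)  # row-normalize
    return KMeans(n_clusters=K, n_init=10).fit_predict(V)    # assign K clusters
```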
When the data consists of unbalanced clusters, as in anomaly detection, the spectral properties are determined by a dominating component. In such cases traditional graph clustering methods fail, whereas the spectral embedding norm has been shown to overcome this challenge.
This thesis generalizes the previously proposed spectral embedding norm by introducing a weighted norm. Under a single natural condition, namely limited connectivity between the clusters and the background, we prove that this quantity can be used to detect the clusters of interest.
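The following is a hedged sketch of one plausible form of such a weighted norm, assuming it is a weighted sum of squared entries of the leading eigenvectors of the normalized graph Laplacian; the number of retained eigenvectors and the weight vector are illustrative choices, not the exact definition used in the thesis.

```python
# A sketch of a per-point weighted spectral embedding norm; the truncation
# level num_vecs and the weights are assumptions for illustration only.
import numpy as np
from scipy.linalg import eigh

def weighted_spectral_embedding_norm(W, num_vecs, weights=None):
    """Return a per-point score from the leading eigenvectors of the
    normalized Laplacian of the affinity matrix W."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt
    eigvals, eigvecs = eigh(L)
    V = eigvecs[:, :num_vecs]                      # leading eigenvectors
    if weights is None:
        weights = np.ones(num_vecs)                # unweighted norm as a special case
    return (weights * V**2).sum(axis=1)            # weighted squared embedding norm per point
```

Under this assumed form, thresholding the score separates candidate cluster points from the dominating background; the weighting scheme and threshold selection analyzed in the thesis may differ.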
Experiments on both synthetic data sets and real-world defect detection images demonstrate the effectiveness of the algorithm, and its performance is stable with respect to parameter choices.