The main focus of this thesis is using machine learning and data mining techniques to solve challenging problems. Three problems from different subject areas are discussed: nuclear material detection, drug ranking and target tracking in video sequences. The techniques of the three problems described are all based on an efficiently solvable variant of normalized cut, Normalized Cut Prime (NC').
The first problem concerns detecting concealed illicit nuclear material, an important part of strategies preventing and deterring nuclear terrorism. What makes this an extremely difficult task are physical limitations of nuclear radiation detectors (arising from energy resolutions and efficiency) and shielding materials terrorists would presumably use to surround the radioactive nuclear material and absorb some of the radiation, thereby reducing the strength of the detected signal. This means the central data analysis problem is identifying a potentially very weak signal, and distinguishing it from both background noise arising from the detector characteristics and naturally occurring environmental radiation. We aim at enhancing the capabilities of detection with algorithmic methods specifically tailored to nuclear data. A novel graph-theory-based methodology based on NC' is used, called Supervised Normalized Cut (SNC). This data mining method classifies measurements obtained from very low resolution plastic scintillation detectors. The accompanying computational study, comparing SNC method with several alternative classification methods shows that in terms of accuracy, the SNC method is on par with alternative approaches, yet SNC is computationally more efficient.
The second subject area is in the field of drug ranking. This problem refers to placing in rank order, according to their effectiveness, several drugs treating the same disease, using data derived from cell images. Current technologies use the recently developed high-throughput drug profiling (high content screening or HCS). Despite the potential of HCS for accurate descriptions of drug profiles, it produces a deluge of data of quantitative and multidimensional nature, posing analytical challenges in the data mining process. Our new framework is designed to alleviate these difficulties, in the way of producing graph theoretic descriptors and automatically ordering the performance of drugs, called fractional adjusted bi-partitional score (FABS), a way of converting classification to scores. We experimented with the FABS framework by implementing different algorithms and assessing the accuracy of results by a comparative study, which includes other four baseline methods. The conclusion is encouraging: FABS implemented with NC' consistently outperforms other implementations of FABS and alternative methods currently used for ranking that are unrelated to FABS.
The third problem is target tracking in video sequences -- it can be framed as an unsupervised learning problem: the goal is to delineate a target of interest in a video from background. The tracking task is cast as a graph-cut, incorporating intensity and motion data into the formulation. Tests on real-life benchmark videos show that the developed technique, NC-track, based on NC', is more efficient than many existing techniques, and that it delivers good quality results.