Post-Deployment runtime verification (RV) has recently emerged as a complementary technology to extend coverage of conventional software verification and testing methods. This thesis is an attempt to tackle three major barriers that need to be surmounted before RV technologies become in widespread use:
Barrier-1: Lack of an expressive, yet efficiently monitorable, specification language. Distributed software behavior is projected onto an observation interface consisting of data-carrying (or parameterized) events, such as Linux system calls including argument values, and self-replicating deterministic finite automata (SR-DFAs) are introduced for RV purposes as well as anomaly-based intrusion detection in embedded and general-purpose software systems based on these parametric traces.
Barrier-2: The substantial performance and power overhead of pure software RV frameworks. NUVA, which stands for nonuniform verification architecture, a distributed automata-based RV architecture for software specifications in the form of SR-DFAs. NUVA has been implemented over a cache-coherent nonuniform-memory-access (ccNUMA) multiprocessor and can be deployed on the FPGA fabric that will reside on all next-generation processor chips. The core of NUVA is a coherent distributed automata transactional memory (ATM) that efficiently maintains states of a dynamic population of automata checkers organized into a rooted dynamic directed acyclic graph (DAG) concurrently shared among all processor nodes.
Barrier-3: Formal specifications are hard to formulate and maintain for evolving complex embedded and general-purpose software systems. Therefore, specification mining has long ago been envisioned to play a key role in software verification, modification and documentation. However, in order to scale beyond simple, library/API-level properties having short temporal spans, specification mining tools need to support more expressive specification languages that can capture complex, application-level properties. This thesis introduces a bio-inspired complete specification mining methodology for SR-DFAs using an iterative and interactive mining tool, called ParaMiner. ParaMiner relies on novel mining algorithms invoking multiple-sequence alignment (MSA) techniques to enable learning specifications from temporal slices of software behavior while overcoming the initial-state uncertainty problem.
SR-DFAs and ParaMiner have been leveraged in a new specification-based intrusion detection (ID) framework that protects distributed, reactive computing systems against cyberattacks having very sparse signatures, arbitrarily long time spans and wide attack fronts. Such attacks lie outside the scope of conventional anomaly-based ID methods which typically work with short event windows and ignore manipulated data objects, such as files and sockets. We demonstrate the effectiveness of the constructed SR-DFAs at classifying as well as resolving subtle behaviors typical of cyberattacks with varying evasion parameter values.