Specification Mining: New Formalisms, Algorithms and Applications
Specification is the first and arguably the most important step for formal verification and correct-by-construction synthesis. These tasks require understanding precisely a design's intended behavior, and thus are only effective if the specification is created right. For example, much of the challenge in bug finding lies in finding the specification that mechanized tools can use to find bugs. It is extremely difficult to manually create a complete suite of good-quality formal specifications, especially given the enormous scale and complexity of designs today. Many real-world experiences indicate that poor or the lack of sufficient specifications can easily lead to misses of critical bugs, and in turn design re-spins and time-to-market slips.
This dissertation presents research that mitigates this manual and error-prone process through automation. The overarching theme is specification mining - the process of inferring likely specifications by observing a design's behaviors. We explore formalisms and algorithms to mine specifications from different sources, and demonstrate that the mined specifications are useful if not essential for a variety of applications such as verification, diagnosis and synthesis. The first part of the dissertation presents two approaches to mine specifications dynamically from simulation or execution traces. The first approach offers a simple but effective template-based remedy to the aforementioned problem. The second approach presents a novel formalism of specification mining based on the notion of sparse coding, which can learn latent structures in an unsupervised setting, and thus are not restricted by predefined templates. Additionally, we show that the mined specifications from both approaches can be used to localize bugs effectively.
In the second part of the dissertation, we study the problem of synthesis from temporal logic specifications. This synthesis approach offers an attractive proposition - one can automatically construct a functionally correct system from its behavioral description. The downside, however, is that it completely relies on the user to not only specify the intended behaviors of the system but also the assumptions on the environment. The latter is especially tricky in practice as environment assumptions are often implicit knowledge and seldom documented. We propose a framework that learns assumptions from the counterstrategies of an unrealizable specification to systematically guide it towards realizability. We further show that, the proposed counterstrategy-guided assumption mining approach enables the automatic synthesis of a new class of semi-autonomous controllers, called human-in-the-loop (HuIL) controllers. A crucial component of such a controller is an advisory that determines when to switch control from the autonomous controller to the human operator. We formalize the criteria that characterize a HuIL controller, by taking into account of human factors such as response time, and describe how to construct the advisory using assumption mining.
Human inputs are still critical in specification. In the last part of this dissertation, we describe two efforts on broadening the scope of specification mining with creative use of human inputs. The first is the design of a crowdsourced specification mining game called CrowdMine. The main idea of CrowdMine is to transform a design's traces into images and leverage the human ability to recognize patterns in images to assist the process of mining specifications. The second effort examines the feasibility of converting natural language specifications to formal specifications, with a focus on how specification mining encapsulated in a natural language processing (NLP) layer may assist non-expert users of formal methods at the requirement stage of a design.