Skip to main content
eScholarship
Open Access Publications from the University of California

Constructing Parsers by Example via Interactive Program Synthesis

  • Author(s): Leung, Alan
  • Advisor(s): Lerner, Sorin
  • et al.
Abstract

Parsers – programs that extract structure from strings – are fundamental components of many software systems. Although parsing theory may have found its roots in early work on programming languages, the growth of computing during the intervening decades has expanded the role of parsers into many systems one might not immediately expect: email clients, video games, spreadsheet programs, and relational databases are only a few among myriad examples of systems that extract structured information from input text. As a result, the construction of parsers has become a ubiquitous programming task that is performed by developers across a large spectrum of domains. It is not just a task for programming language experts anymore.

Unfortunately, despite over forty years of research on parsing, writing parsers remains a painstaking, manual process that is prone to subtle bugs and pitfalls. Existing tools for generating parsers assume a great deal of background knowledge in parsing and formal language theory, so the learning curve is high.

In this dissertation, we argue that it is possible to make parsing more accessible by combining interactive visual feedback with the programming-by-example paradigm, wherein users synthesize programs simply by providing example inputs and outputs demonstrating the result of the intended computation. Towards this aim, we present novel algorithms for (1) constructing syntactic specifications by example, (2) constructing lexical analyses by example, and (3) visualizing progress toward parser completion. We instantiate these algorithms in two graphical development environments we have implemented, Parsify and its successor Parsimony, whose central user interaction paradigm is that of programming-by-example. Finally, via user study we demonstrate that non-expert users indeed show significantly better performance when using our system.

Main Content
Current View