Skip to main content
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations bannerUC Berkeley

Democratizing Web Automation: Programming for Social Scientists and Other Domain Experts


We have promised social scientists a data revolution, but it has not arrived. What stands between practitioners and the data-driven insights they want? Acquiring the data. In particular, acquiring the social media, online forum, and other web data that was supposed to help them produce big, rich, ecologically valid datasets. Web automation programming is resistant to high-level abstractions, so end-user programmers end up stymied by the need to reverse engineer website internals—DOM, JavaScript, AJAX. Programming by Demonstration (PBD) offered one promising avenue towards democratizing web automation. Unfortunately, as the web matured, the programs became too complex for PBD tools to synthesize, and web PBD progress stalled.

This dissertation describes how I reformulated traditional web PBD around the insight that demonstrations are not always the easiest way for non-programmers to communicate their intent. By shifting from a purely Programming-By-Demonstration view to a Programming-By-X view that accepts a variety of user-friendly inputs, we can dramatically broaden the class of programs that come in reach for end-user programmers. Our Helena ecosystem combines (i) usable PBD-based program drafting tools, (ii) learnable programming languages, and (iii) novel programming environment interactions. The end result: non-coders write Helena programs in 10 minutes that can handle the complexity of modern webpages, while coders attempt the same task and time out in an hour. I conclude with a discussion of the abstraction-resistant domains that will fall next and how hybrid PL-HCI breakthroughs will vastly expand access to programming.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View