The usefulness of today's websites is limited by their form and ease of access. Even though the web contains an ever-expanding wealth of information, much of it exists in a form that is not directly useful. How can end-users access the web in a way that meets their needs?
We present record and replay (R+R) as a way to bridge the gap between a website's functionality and the end-user's goal. R+R leverages an interface the user knows and is stable --- that is, the webpage --- in order to automate repetitive tasks. A R+R system observes a user interacting with a website and produces a script which, when executed, repeats the original interaction. End-users can use R+R to automate a sequence of actions and programmers can use these recordings as an API to execute more complicated tasks. Unfortunately, as websites become more complex, R+R becomes increasingly difficult.
The challenge with modern websites is that a single demonstration of the interaction has limited information, making scripts fragile to changes in the website. For past R+R systems, this was less of an issue because of the static nature of websites. But as the web becomes more dynamic, it becomes difficult to produce a robust script that mimics the interactivity of the user and can adapt to changes on the page.
To solve this problem, we developed Ringer, a R+R system for the web. Ringer is built on three key abstractions --- actions, triggers, and elements. Ringer takes a user demonstration as input and synthesize a script that interacts with the page as a user would. To make Ringer scripts robust, we develop novel methods for web R+R. In particular, Ringer uses the following features:
Inferring triggers automatically which synchronize the script with the state of the webpage
Monitoring the replay execution to ensure actions faithfully mimic the user
Identifying elements on the replay-time page using a similarity metric
To evaluate our work, we run Ringer on a suite of real-world benchmarks by replaying interactions on Alexa-ranked websites. We compare Ringer against a current state-of-the-art replay tool and find that Ringer is able to replay all 29 benchmark interactions, compared to only 5 benchmarks for the previous approach. Additionally, our benchmarks show that a replayer needs to synchronize with the state of a webpage in order to replay correctly, motivating Ringer's use of triggers. We show that our trigger inference algorithm can synthesize sufficient synchronization, while also having the added benefit of speeding up the replay execution. Finally, we show that R+R is useful as a building block for end-user applications by building two such tools using Ringer. One allows end-users to scrape structured data from a website simply through demonstration. The other allows end-users to aggregate real-time data from various websites in the form of live tiles, by specifying the data they want on a website through demonstration.