Open Access Publications from the University of California

Teaching People and Machines to Enhance Images

  • Author(s): Berthouzoz, Floraine Sara Martianne
  • Advisor(s): Agrawala, Maneesh
  • et al.

Procedural tasks such as following a recipe or editing an image are very common. They require a person to execute a sequence of operations (e.g. chop onions, or sharpen the image) in order to achieve the goal of the task. People commonly use step-by-step tutorials to learn these tasks. We focus on software tutorials, more specifically photo manipulation tutorials, and present a set of tools and techniques to help people learn, compare and automate photo manipulation procedures. We describe three different systems that are each designed to help with a different stage in acquiring procedural knowledge.

Today, people primarily rely on hand-crafted tutorials in books and on websites to learn photo manipulation procedures. However, putting together a high-quality step-by-step tutorial is time-consuming. As a consequence, many online tutorials are poorly designed, which can lead to confusion and slow down the learning process. We present a demonstration-based system for automatically generating succinct step-by-step visual tutorials of photo manipulations. An author first demonstrates the manipulation using an instrumented version of GIMP (GNU Image Manipulation Program) that records all changes in interface and application state. From the example recording, our system automatically generates tutorials that illustrate the manipulation using images, text, and annotations. It leverages automated image labeling (recognition of facial features and outdoor scene structures in our implementation) to generate more precise text descriptions of many of the steps in the tutorials. A user study finds that our tutorials are effective for learning the steps of a procedure: users are 20-44% faster and make 60-95% fewer errors when using our tutorials than when using screen-capture video tutorials or hand-designed tutorials.
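To make the idea of turning a recorded demonstration into succinct text steps more concrete, here is a minimal sketch. It is not the system described above: it assumes a hypothetical log format (`RecordedOp` with a command name, its final parameters, and an optional image label such as "left eye" standing in for the output of automated image labeling), and it uses one simple compression rule, merging consecutive invocations of the same command on the same region so only the final setting is shown.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RecordedOp:
    """One recorded operation (hypothetical log format, not GIMP's)."""
    command: str                                # e.g. "Free Select", "Curves"
    params: dict = field(default_factory=dict)  # final parameter values
    region: Optional[str] = None                # hypothetical image label, e.g. "left eye"


def compress(ops):
    """Merge consecutive invocations of the same command on the same region,
    keeping only the last one (its parameters are the final setting)."""
    steps = []
    for op in ops:
        if steps and steps[-1].command == op.command and steps[-1].region == op.region:
            steps[-1] = op  # replace the earlier tweak with the later one
        else:
            steps.append(op)
    return steps


def describe(op):
    """Render one step as text, mentioning the labeled region when known."""
    where = f" on the {op.region}" if op.region else ""
    params = ", ".join(f"{k}={v}" for k, v in op.params.items())
    text = f"Apply {op.command}{where}"
    return f"{text} ({params})." if params else f"{text}."


def generate_tutorial(ops):
    """Return numbered text steps for a recorded operation log."""
    return [f"{i}. {describe(op)}" for i, op in enumerate(compress(ops), start=1)]


if __name__ == "__main__":
    log = [
        RecordedOp("Free Select", {}, "left eye"),
        RecordedOp("Curves", {"midpoint": 140}),
        RecordedOp("Curves", {"midpoint": 150}),
    ]
    for line in generate_tutorial(log):
        print(line)
```

Running the script prints two steps: the selection phrased with its region label, and a single Curves step carrying only the final `midpoint` value, since the two Curves adjustments were merged.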

We also demonstrate a new interface that allows learners to navigate, explore and compare large collections (i.e., thousands) of photo manipulation tutorials based on their command-level structure. Some websites collect tens of thousands of photo manipulation tutorials. These collections typically contain many different tutorials for the same task. For example, there are many different tutorials that describe how to recolor the hair of a person in an image. Learners often want to compare these tutorials to understand the different ways a task can be done. They may also want to identify common strategies that are used across tutorials for a variety of tasks. However, the large number of tutorials in these collections and their inconsistent formats can make it difficult for users to systematically explore and compare them. Current tutorial collections do not exploit the underlying command-level structure of tutorials; to explore a collection, users must either page through long lists of tutorial titles or perform keyword searches on the natural-language tutorial text. We present a new browsing interface to help learners navigate, explore and compare collections of photo manipulation tutorials based on their command-level structure. Our browser indexes tutorials by their commands, identifies common strategies within the tutorial collection, and highlights the similarities and differences between sets of tutorials that execute the same task. User feedback suggests that our interface is easy to understand and use, and that users find command-level browsing to be useful for exploring large tutorial collections. Users strongly preferred exploring tutorial collections with our browser over keyword search.
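The three browser operations named above (indexing by command, finding common strategies, comparing tutorials for the same task) can be illustrated with a small sketch. It assumes each tutorial has already been reduced to an ordered list of command names, and it uses frequent contiguous command n-grams as a stand-in notion of a "strategy"; the browser described here may define and mine strategies differently. The function names and sample data are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(tutorials):
    """Inverted index: command name -> set of tutorial ids that use it."""
    index = defaultdict(set)
    for tid, commands in tutorials.items():
        for cmd in commands:
            index[cmd].add(tid)
    return index


def common_strategies(tutorials, n=2, min_count=2):
    """Contiguous command n-grams shared by at least `min_count` tutorials.

    Each n-gram is counted at most once per tutorial. Results are sorted by
    descending count, then alphabetically, so the output is deterministic.
    """
    counts = Counter()
    for commands in tutorials.values():
        grams = {tuple(commands[i:i + n]) for i in range(len(commands) - n + 1)}
        counts.update(grams)
    kept = [(gram, c) for gram, c in counts.items() if c >= min_count]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


def compare(tutorials, a, b):
    """Return (shared, only_in_a, only_in_b) command sets for two tutorials."""
    sa, sb = set(tutorials[a]), set(tutorials[b])
    return sa & sb, sa - sb, sb - sa


if __name__ == "__main__":
    tutorials = {
        "hair-1": ["Free Select", "Feather", "Hue-Saturation"],
        "hair-2": ["Free Select", "Feather", "Colorize"],
        "eyes-1": ["Ellipse Select", "Feather", "Hue-Saturation"],
    }
    print(sorted(build_index(tutorials)["Feather"]))
    print(common_strategies(tutorials))
    print(compare(tutorials, "hair-1", "hair-2"))
```

With this sample data, the two hair-recoloring tutorials share the select-then-feather steps and differ only in the final color command, which is the kind of similarity and difference a command-level browser can surface without reading the tutorial text.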

Finally, we present a framework for generating content-adaptive macros (programs) that can transfer complex photo manipulation procedures to new target images. After learners master a photo manipulation procedure, they often apply it repeatedly to multiple images. For example, they might routinely apply the same vignetting effect to all their photographs. This process can be very tedious, especially for procedures that involve many steps. While image manipulation programs provide basic macro authoring tools that allow users to record and then replay a sequence of operations, these macros are brittle and cannot adapt to new images. We present a more comprehensive approach for generating content-adaptive macros that can automatically transfer operations to new target images. To create these macros, we make use of multiple training demonstrations. Specifically, we use automated image labeling and machine learning techniques to adapt the parameters of each operation to the new image content. We show that our framework is able to learn a large class of the most commonly used manipulations using as few as 20 training demonstrations. Our content-adaptive macros allow users to transfer photo manipulation procedures with a single button click, thereby significantly simplifying repetitive procedures.
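The difference between a brittle replayed macro and a content-adaptive one is that the adaptive macro predicts each parameter from the new image instead of reusing a fixed value. A minimal sketch of that idea follows, under loudly stated assumptions: the parameter is a hypothetical brightness adjustment, the only image feature is mean luminance (Rec. 601 luma weights), and the learner is one-feature ordinary least squares. The framework above uses richer image labels and machine learning techniques that this sketch does not reproduce, and it uses only three demonstrations for brevity rather than the roughly 20 mentioned above.

```python
from statistics import mean


def mean_luminance(pixels):
    """Mean Rec. 601 luma of an image given as a list of (r, g, b) in 0..255."""
    return mean(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels)


def fit_linear(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # every demo had the same feature value: predict the mean
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


class AdaptiveParameter:
    """One operation parameter learned from demonstrations (hypothetical API)."""

    def __init__(self, feature_fn):
        self.feature_fn = feature_fn
        self.a, self.b = 0.0, 0.0

    def fit(self, images, chosen_values):
        """Learn from (image, value the demonstrator chose) pairs."""
        xs = [self.feature_fn(img) for img in images]
        self.a, self.b = fit_linear(xs, chosen_values)
        return self

    def predict(self, image):
        """Predict the parameter value for a new target image."""
        return self.a * self.feature_fn(image) + self.b


if __name__ == "__main__":
    # Three uniform gray "images"; darker ones received a larger brightness boost.
    demos = [[(v, v, v)] * 4 for v in (50, 100, 150)]
    boosts = [60, 40, 20]
    brightness = AdaptiveParameter(mean_luminance).fit(demos, boosts)
    target = [(75, 75, 75)] * 4
    print(round(brightness.predict(target), 2))
```

Here the fitted rule is roughly `boost = -0.4 * luminance + 80`, so the unseen mid-dark target receives a boost of about 50; a replayed macro would instead apply whichever fixed value was recorded.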

Together, these tools provide an automated system to create high-quality step-by-step tutorials, allow users to explore and compare large collections of online tutorials, and facilitate repetitive procedures by automating the transfer of photo manipulations. As more and more instructional material appears online, we believe that providing such tools for learning, comparing and automating procedures will be essential to help people work efficiently with software tools.
