eScholarship
Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations

stability.gpt: Combining GPT-4 and Stable Diffusion to Generate Storybooks that are Textually and Visually Cohesive

Abstract

Generative artificial intelligence is well suited to creative tasks. Image generation models such as Stable Diffusion, DALL-E, and Midjourney produce images from a text prompt. Large language models such as ChatGPT generate text, and they are particularly strong at creative content like stories and poems. Systems that combine large language models with image generation models to create storybooks already exist, but they lack image consistency, both in visual style and in how faithfully the images depict the text on the page.

This paper introduces stability.gpt, a new system that uses the latest version of GPT (GPT-4) and Stable Diffusion to generate storybooks with a consistent image style. After users created stories with stability.gpt and ten stories from a diverse set of genres were analyzed, it became clear that character consistency has improved and that the images support the story text, but there is still room for improvement in generating consistent animals, objects, and styles.
