Engaging with a TV show in the age of the Internet often means avoiding show-related content for months for fear of being spoiled. While spoiler detection research shows promising results for protecting viewers from generic spoilers, these approaches do not actually solve the problem for viewers who are mid-show: what constitutes a spoiler depends on where a viewer is in the show, and spoiler detection on its own is too coarse to capture this nuance. Instead, we propose the task of spoiler recognition, which seeks to assign a spoiler to the episode of a given show that it reveals. We pose this task as semantic text matching and present a dataset of comments and episode summaries for evaluating model performance. The dataset consists of ~3.1K manually-labeled test comments, ~2.8K manually-labeled validation comments, and over 200K auto-labeled comments for training. We experimentally demonstrate the utility of this training set and use it to benchmark the performance of BigBird, Nyströmformer, and Longformer on this task. Specifically, we cross-encode summaries with comments and examine the mean reciprocal rank scores. Our results find Longformer to be best suited for this task. We also perform an error analysis to shed light on the kinds of challenges spoiler recognition poses. We present this dataset and these results to facilitate future research into spoiler recognition.
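To make the evaluation setup concrete, the following is a minimal sketch (not the paper's actual code) of how mean reciprocal rank can be computed for this task: each comment is scored by a cross-encoder against every episode summary of its show, episodes are ranked by score, and the reciprocal of the correct episode's rank is averaged over comments. The function name and the toy scores below are illustrative assumptions, not from the paper.

```python
def mean_reciprocal_rank(score_lists, gold_indices):
    """Compute MRR for spoiler recognition.

    score_lists[i][j] is the cross-encoder score of comment i against
    episode summary j; gold_indices[i] is the index of the episode the
    comment actually spoils.
    """
    rr_total = 0.0
    for scores, gold in zip(score_lists, gold_indices):
        # Rank episodes by descending score; find the 1-based rank
        # of the gold episode.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        rank = ranked.index(gold) + 1
        rr_total += 1.0 / rank
    return rr_total / len(score_lists)

# Toy example: made-up scores for 3 comments over a 4-episode show.
scores = [
    [0.1, 0.9, 0.3, 0.2],  # gold episode at index 1 -> rank 1
    [0.5, 0.4, 0.3, 0.2],  # gold episode at index 2 -> rank 3
    [0.2, 0.1, 0.6, 0.9],  # gold episode at index 3 -> rank 1
]
gold = [1, 2, 3]
print(round(mean_reciprocal_rank(scores, gold), 4))  # (1 + 1/3 + 1) / 3 ≈ 0.7778
```

In practice the per-episode scores would come from a long-document cross-encoder such as Longformer applied to each (summary, comment) pair; the ranking and averaging step is model-agnostic.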