
Stick to your Role! Stability of Personal Values Expressed in Large Language Models

Abstract

Standard evaluation of Large Language Models (LLMs) relies on many different queries posed in similar minimal contexts (e.g. multiple-choice questions). Conclusions from such evaluations say little about how models behave in new contexts (e.g. in deployment). We argue that context-dependence should be studied as a property of LLMs in its own right. We study the stability of value expression across different contexts (conversation topics) at two levels: Rank-order stability on the population (interpersonal) level, and Ipsative stability on the individual (intrapersonal) level. We observe consistent trends (the Mixtral, Mistral, Qwen, and GPT-3.5 model families being more stable than LLaMa-2 and Phi) across both types of stability, two different simulated populations, and even a downstream behavioral task. Overall, LLMs exhibit low rank-order stability, highlighting the need for future research on role-playing LLMs, as well as on context-dependence in general. This paper provides a foundational step in that direction, and is the first study of value stability in LLMs.
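The two stability notions used in the abstract follow standard psychometric definitions and can be illustrated with a small sketch. The following Python snippet is an illustrative sketch only, not the paper's implementation: the simulated data, array names, and choice of Spearman correlation are assumptions. It computes rank-order stability by correlating personas' rank ordering on each value across two contexts, and ipsative stability by correlating each persona's own value profile across the same two contexts.

# Illustrative sketch (hypothetical data and names, not the paper's code).
# scores_ctx_a / scores_ctx_b: value-expression scores for each simulated
# persona in two different conversation contexts.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_personas, n_values = 50, 10
scores_ctx_a = rng.random((n_personas, n_values))
scores_ctx_b = scores_ctx_a + 0.3 * rng.standard_normal((n_personas, n_values))

# Rank-order (interpersonal) stability: for each value, correlate the
# personas' rank ordering between the two contexts, then average over values.
rank_order = np.mean([
    spearmanr(scores_ctx_a[:, v], scores_ctx_b[:, v])[0]
    for v in range(n_values)
])

# Ipsative (intrapersonal) stability: for each persona, correlate its own
# value profile between the two contexts, then average over personas.
ipsative = np.mean([
    spearmanr(scores_ctx_a[p, :], scores_ctx_b[p, :])[0]
    for p in range(n_personas)
])

print(f"Rank-order stability: {rank_order:.2f}")
print(f"Ipsative stability:   {ipsative:.2f}")

High rank-order stability means the relative ordering of simulated personas is preserved across contexts; high ipsative stability means each persona keeps its own value profile across contexts.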
