eScholarship
Open Access Publications from the University of California

Neuro-Symbolic Models of Human Moral Judgment

Creative Commons 'BY' version 4.0 license
Abstract

There has been exciting recent progress in computational modeling of moral cognition. Work in this area tends to describe the cognitive mechanisms of human moral judgment using symbolic models, which are interpretable and written in terms of representations that carry meaning. However, these models fall short of capturing the full human capacity for moral judgment: they cannot process natural language inputs and instead rely on formal problem specifications. This inability to interface with natural language also limits the practical usefulness of symbolic models. Meanwhile, there have been steady advances in conversational AI systems built on large language models (LLMs), which do interface with natural language. However, these systems fall short as models of human reasoning, particularly in the domain of morality. In this paper we explore the possibility of building neuro-symbolic models of human moral cognition that use the strengths of LLMs to interface with natural language (specifically, to extract morally relevant features from it) and the strengths of symbolic approaches to reason over those representations. Our goal is to construct a model of human moral cognition that interfaces with natural language, predicts human moral judgments with high accuracy, and does so in a way that is transparent and interpretable.
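The abstract describes a two-stage pipeline but does not specify the features, the LLM prompt, or the form of the symbolic model. The following is a minimal sketch under stated assumptions, not the authors' implementation. A neural stage (replaced here by a hypothetical keyword stand-in, `extract_features`, where a real system would call an LLM) maps a short story to a few named features. A symbolic stage (`judge`, a hypothetical logistic rule with made-up weights) then combines those features into a probability that the action is permissible. All feature names, keyword lists, and weights are illustrative assumptions.

```python
import math
import re
from dataclasses import dataclass


# HYPOTHETICAL feature set: the paper's actual morally relevant features may differ.
@dataclass
class MoralFeatures:
    harm_to_others: float     # 0.0 (none) to 1.0 (clear harm)
    benefit_to_others: float  # 0.0 (none) to 1.0 (clear benefit)
    breaks_rule: bool         # does the agent violate a social rule?


# Keyword lists used only by the offline stand-in extractor below.
HARM_WORDS = {"hurt", "hurts", "injure", "injures", "damage", "damages"}
BENEFIT_WORDS = {"help", "helps", "save", "saves", "rescue", "rescues"}
RULE_WORDS = {"cut", "cuts", "lie", "lies", "steal", "steals"}


def extract_features(text: str) -> MoralFeatures:
    """Neural stage (stand-in). A real system would prompt an LLM to fill in
    these fields; a keyword heuristic is used here so the sketch runs offline."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return MoralFeatures(
        harm_to_others=1.0 if words & HARM_WORDS else 0.0,
        benefit_to_others=1.0 if words & BENEFIT_WORDS else 0.0,
        breaks_rule=bool(words & RULE_WORDS),
    )


# HYPOTHETICAL weights; a real model would fit these to human judgment data.
WEIGHTS = {"harm_to_others": -3.0, "benefit_to_others": 2.0, "breaks_rule": -1.5}
BIAS = 1.0


def judge(features: MoralFeatures) -> tuple[float, dict[str, float]]:
    """Symbolic stage: a weighted sum of named features passed through a
    logistic function. Returns P(permissible) and each feature's contribution,
    so every judgment can be traced back to the features that drove it."""
    contributions = {
        name: weight * float(getattr(features, name))
        for name, weight in WEIGHTS.items()
    }
    score = BIAS + sum(contributions.values())
    return 1.0 / (1.0 + math.exp(-score)), contributions


if __name__ == "__main__":
    story = "Someone cuts in line to help an injured stranger."
    p, why = judge(extract_features(story))
    print(f"P(permissible) = {p:.3f}")
    for name, c in why.items():
        if c:  # show only the features that actually moved the score
            print(f"  {name}: {c:+.1f}")
```

Keeping the judgment as a weighted sum over named features is what makes the output inspectable: each prediction comes with the per-feature contributions that produced it. Replacing the keyword stand-in with a real LLM call would change only `extract_features`, leaving the interpretable reasoning stage untouched.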
