Does Explainable AI Need Cognitive Models?

Abstract

Explainable AI (XAI) aims to explain the behavior of opaque AI systems, and in this way, increase their trustworthiness. However, current XAI methods are explanatorily deficient. On the one hand, "top-down" XAI methods allow for global and local prediction, but rarely support targeted internal interventions. On the other hand, "bottom-up" XAI methods may support such interventions, but rarely provide global behavioral predictions. To overcome this limitation, we argue that XAI should follow the lead of cognitive science in developing cognitive models that simultaneously reproduce behavior and support interventions. Indeed, novel methods such as mechanistic interpretability and causal abstraction analysis already reflect cognitive modeling principles that are familiar from the explanation of animal and human intelligence. As these methods might serve as best practices for trustworthy AI, they deserve closer philosophical scrutiny.
