Skip to main content
eScholarship
Open Access Publications from the University of California

UC Santa Barbara

UC Santa Barbara Electronic Theses and Dissertations bannerUC Santa Barbara

Unveiling Covert Threats: Towards Physically Safe and Transparent AI Systems

Abstract

This thesis examines the ethical implications of artificial intelligence through the lens of physical safety. It scrutinizes the various ways AI systems can instigate unsafe behavior in users, emphasizing the underexplored domain of covertly unsafe language. To improve the safety-related reasoning ability of large language models, we propose FARM to systematically generate rationales attributed to credible sources for physical safety scenarios. Lastly, we close with a broader discussion on AI transparency, delineating its differing research threads and associated considerations such as safety, and call toward a human-centered approach to evaluate future research, centering on the foundational debate of whether humans should trust intelligent systems.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View