Unveiling Covert Threats: Towards Physically Safe and Transparent AI Systems
- Mei, Alex
- Advisor(s): Wang, William Y
Abstract
This thesis examines the ethical implications of artificial intelligence through the lens of physical safety. It scrutinizes the ways AI systems can instigate unsafe behavior in users, emphasizing the underexplored domain of covertly unsafe language. To improve the safety-related reasoning ability of large language models, we propose FARM, which systematically generates rationales attributed to credible sources for physical safety scenarios. We close with a broader discussion of AI transparency, delineating its distinct research threads and associated considerations such as safety, and call for a human-centered approach to evaluating future research, centered on the foundational debate of whether humans should trust intelligent systems.