Testing the Extents of AI Empathy: A Nightmare Scenario
Too Long; Didn't Read
This document describes an evaluation of how various AI assistants handle empathetic conversations. The AIs evaluated are Claude, Gemini, ChatGPT, Willow, Pi.ai, Mistral, and a customized version of Claude. Each was prompted with scenarios in which the user is sad, happy, or having nightmares. Responses were assessed on seven criteria: expression of sympathy, attempts to understand the user, provision of space for emotions, quality of advice, affirmative conversation, manifestation of empathy, and escalation of serious issues. Willow and Pi.ai demonstrated the most empathy, while base Claude and Gemini needed prompting, and Mistral had to be accessed through its API. The customized Claude performed well compared to the benchmarks.