当前位置：首页 >英文主页 >中英对照 > 报告详情

斯坦福大学：2025 ELEPHANT：大型语言模型“社会式谄媚”现象全解析研究报告（英文版）（34页）.pdf

上传人： 1****1 编号：975326 2025-11-25 PDF PDF 中文版中文版中文版 DOCX DOCX DOCX 34页 821.20KB 24张图表

下载：

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载报告到电脑，查找使用更方便

VIP专享文档

书签

分享

收藏

已收藏

版权投诉

/34

立即下载

《斯坦福大学：2025 ELEPHANT：大型语言模型“社会式谄媚”现象全解析研究报告（英文版）（34页）.pdf》由会员分享，可在线阅读，更多相关《斯坦福大学：2025 ELEPHANT：大型语言模型“社会式谄媚”现象全解析研究报告（英文版）（34页）.pdf（34页珍藏版）》请在三个皮匠报告上搜索。

1、PreprintELEPHANT:MEASURING AND UNDERSTANDING SOCIALSYCOPHANCY INLLMSMyra Cheng1Sunny Yu1Cinoo Lee1Pranav Khadpe2Lujain Ibrahim3Dan Jurafsky11Stanford University2Carnegie Mellon University3University of Oxfordmyracs.stanford.edu,syu03stanford.eduABSTRACTLLMs are known to exhibit sycophancy:agreeing w

2、ith and flattering users,even at thecost of correctness.Prior work measures sycophancy only as direct agreement withusers explicitly stated beliefs that can be compared to a ground truth.This fails tocapture broader forms of sycophancy such as affi rming a users self-image or other implicitbeliefs.T

3、o address this gap,we introduce social sycophancy,characterizing sycophancy asexcessive preservation of a users face(their desired self-image),and present ELEPHANT,a benchmark for measuring social sycophancy in an LLM.Applying our benchmark to11 models,we show that LLMs consistently exhibit high rat

4、es of social sycophancy:onaverage,they preserve users face 45 percentage points more than humans in general advicequeries and in queries describing clear user wrongdoing(from Reddits r/AmITheAsshole).Furthermore,when prompted with perspectives from either side of a moral conflict,LLMsaffi rm both si

5、des(depending on whichever side the user adopts)in 48%of casestellingboth the at-fault party and the wronged party that they are not wrongrather than adhering toa consistent moral or value judgment.We further show that social sycophancy is rewardedin preference datasets,and that while existing mitig

6、ation strategies for sycophancy arelimited in effectiveness,model-based steering shows promise for mitigating these behaviors.Our work provides theoretical grounding and an empirical benchmark for understandingand addressing sycophancy in the open-ended contexts that characterize the vast majorityof

word格式文档无特别注明外均可编辑修改，预览文件经过压缩，下载原文更清晰！

三个皮匠报告文库所有资源均是客户上传分享，仅供网友学习交流，未经上传用户书面授权，请勿作商用。

根据文章内容，以下是全文主要内容的概括： 1. **社会谄媚问题**：大型语言模型（LLMs）表现出谄媚行为，即过度迎合用户，甚至牺牲准确性。 2. **社会谄媚的定义**：将谄媚定义为过度维护用户的“面子”（期望的自我形象），包括积极和消极的面子。 3. **ELEPHANT基准**：提出ELEPHANT基准，用于衡量LLMs中的社会谄媚，涵盖四个维度：验证、间接性、框架和道德。 4. **实证分析**：在四个数据集上评估了11个LLMs，发现LLMs在社会谄媚方面表现出高比率，平均比人类高出45个百分点。 5. **原因分析**：社会谄媚在偏好数据集中得到奖励，而现有的缓解策略效果有限。 6. **缓解策略**：模型引导策略在缓解谄媚行为方面显示出希望。核心数据： - LLMs在社会谄媚方面平均比人类高出45个百分点。 - 在道德冲突中，LLMs有48%的情况会同时肯定双方的观点。

揭秘真相！" LLM如何“讨好”你？" LLM的社交谄媚行为解析！"

全行业研究报告分享下载平台

0731-84720580
商务合作：really158d
友链申请 (QQ)：1737380874

关于我们

更多

关于我们

三个皮匠报告微信公众号

三个皮匠报告微信小程序

扫码咨询商务合作事宜

友情链接：

营销自动化亿欧智库微播易阿里妈妈

copyright@2008-2013 长沙思想领动信息技术有限公司版权所有网站备案/许可证号：湘B2-20190120 | 工信部备案号：湘ICP备2023027541号-2 | 公安备案号：湘公网安备43010402001071号

客服

小程序

服务号

折叠