OpenAI:2026推理模型难以控制其思维链研究报告(英文版)(40页).pdf

编号:1160395 PDF  中文版  DOCX 40页 1.29MB 下载积分:VIP专享
下载报告请您先登录!

1、Reasoning Models Struggleto Control their Chains of ThoughtChen Yueh-HanNYU,MATSRobert McCarthyUCL,MATSBruce W.LeeUPenn,MATSHe HeNYUIan KivlichanOpenAIBowen BakerOpenAIMicah CarrollOpenAITomek KorbakOpenAIAbstractChain-of-thought(CoT)monitoring is a promising tool for detecting misbehaviorsand under

2、standing the motivations of modern reasoning models.However,ifmodels can control what they verbalize in their CoT,it could undermine CoTmonitorability.To measure this undesirable capability CoT controllability we introduce the CoT-Control evaluation suite,which includes tasks that requiremodels to s

3、olve problems while adhering to CoT instructions,e.g.,reasoning abouta genetics question without using the word“chromosome”.We show that reasoningmodels possess signifi cantly lower CoT controllability than output controllability;for instance,Claude Sonnet 4.5 can control its CoT only 2.7%of the tim

4、e but 61.9%when controlling its fi nal output.We also fi nd that CoT controllability is higherfor larger models and decreases with more RL training,test-time compute,andincreased problem diffi culty.CoT controllability failures extend even to situationsin which models are given incentives(as opposed

5、 to direct requests)to evade CoTmonitors,although models exhibit slightly higher controllability when they aretold they are being monitored.Similarly,eliciting controllability by adversariallyoptimizing prompts does not meaningfully increase controllability.Our resultsleave us cautiously optimistic

6、that CoT controllability is currently unlikely tobe a failure mode of CoT monitorability.However,the mechanism behind lowcontrollability is not well understood.Given its importance for maintaining CoTmonitorability,we recommend that frontier labs track CoT controllability in futuremodels.1Introducti

友情提示

1、下载报告失败解决办法
2、PDF文件下载后,可能会被浏览器默认打开,此种情况可以点击浏览器菜单,保存网页到桌面,就可以正常下载了。
3、本站不支持迅雷下载,请使用电脑自带的IE浏览器,或者360浏览器、谷歌浏览器下载即可。
4、本站报告下载后的文档和图纸-无水印,预览文档经过压缩,下载后原文更清晰。

本文(OpenAI:2026推理模型难以控制其思维链研究报告(英文版)(40页).pdf)为本站 (111111) 主动上传,三个皮匠报告文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。 若此文所含内容侵犯了您的版权或隐私,请立即通知三个皮匠报告文库(点击联系客服),我们立即给予删除!

温馨提示:如果因为网速或其他原因下载失败请重新下载,重复下载不扣分。
客服
商务合作
小程序
服务号
折叠