Reasoning Models Struggle to Control their Chains of Thought

Chen Yueh-Han (NYU, MATS), Robert McCarthy (UCL, MATS), Bruce W. Lee (UPenn, MATS), He He (NYU), Ian Kivlichan (OpenAI), Bowen Baker (OpenAI), Micah Carroll (OpenAI), Tomek Korbak (OpenAI)

Abstract

Chain-of-thought (CoT) monitoring is a promising tool for detecting misbehaviors and understanding the motivations of modern reasoning models. However, if models can control what they verbalize in their CoT, it could undermine CoT monitorability. To measure this undesirable capability, CoT controllability, we introduce the CoT-Control evaluation suite, which includes tasks that require models to solve problems while adhering to CoT instructions, e.g., reasoning about a genetics question without using the word “chromosome”. We show that reasoning models possess significantly lower CoT controllability than output controllability; for instance, Claude Sonnet 4.5 can control its CoT only 2.7% of the time but 61.9% when controlling its final output. We also find that CoT controllability is higher for larger models and decreases with more RL training, test-time compute, and increased problem difficulty. CoT controllability failures extend even to situations in which models are given incentives (as opposed to direct requests) to evade CoT monitors, although models exhibit slightly higher controllability when they are told they are being monitored. Similarly, eliciting controllability by adversarially optimizing prompts does not meaningfully increase controllability. Our results leave us cautiously optimistic that CoT controllability is currently unlikely to be a failure mode of CoT monitorability. However, the mechanism behind low controllability is not well understood. Given its importance for maintaining CoT monitorability, we recommend that frontier labs track CoT controllability in future models.

1 Introducti