Monitoring Monitorability

Melody Y. Guan, Miles Wang, Micah Carroll, Zehao Dou, Annie Y. Wei, Marcus Williams, Benjamin Arnav, Joost Huizinga, Ian Kivlichan, Mia Glaese, Jakub Pachocki, Bowen Baker

OpenAI

Abstract

Observability into the decision making of modern AI systems may be required to safely deploy increasingly capable agents. Monitoring the chain-of-thought (CoT) of today's reasoning models has proven effective for detecting misbehavior. However, this “monitorability” may be fragile under different training procedures, data sources, or even continued system scaling. To measure and track monitorability, we propose three evaluation archetypes (intervention, process, and outcome-property) and a new monitorability metric, and introduce a broad evaluation suite. We demonstrate that these evaluations can catch simple model organisms trained to have obfuscated CoTs, and that CoT monitoring is more effective than action-only monitoring in
practical settings. We compare the monitorability of various frontier models and find that most models are fairly, but not perfectly, monitorable. We also evaluate how monitorability scales with inference-time compute, reinforcement learning optimization, and pre-training model size. We find that longer CoTs
are generally more monitorable and that RL optimization does not materially decrease monitorability even at the current frontier scale. Notably, we find that for a model at a low reasoning effort, we could instead deploy a smaller model at a higher reasoning effort (thereby matching capabilities) and obtain a higher monitorability, albeit at a higher overall inference compute cost. We further investigate agent-monitor scaling trends and find that scaling a weak monitor's test-time compute when monitoring a strong agent increases monitorability. Giving the weak monitor access to CoT not only improves monito