OpenAI: 2025 GDPval: Performance Evaluation Report of AI Models on Real-World Economically Valuable Tasks (English edition) (29 pages).pdf

Uploader: 1****1   ID: 925258   2025-09-28   29 pages   11.28 MB

GDPVAL: EVALUATING AI MODEL PERFORMANCE ON REAL-WORLD ECONOMICALLY VALUABLE TASKS

Tejal Patwardhan, Rachel Dias, Elizabeth Proehl, Grace Kim, Michele Wang, Olivia Watkins, Simon Posada Fishman, Marwan Aljubeh, Phoebe Thacker, Laurance Fauconnet, Natalie S. Kim, Patrick Chao, Samuel Miserendino, Gildas Chabot, David Li, Michael Sharman, Alexandra Barr, Amelia Glaese, Jerry Tworek

OpenAI

ABSTRACT

We introduce GDPval, a benchmark evaluating AI model capabilities on real-world economically valuable tasks. GDPval covers the majority of U.S. Bureau of Labor Statistics Work Activities for 44 occupations across the top 9 sectors contributing to U.S. GDP (Gross Domestic Product). Tasks are constructed from the representative work of industry professionals with an average of 14 years of experience. We find that frontier model performance on GDPval is improving roughly linearly over time, and that the current best frontier models are approaching industry experts in deliverable quality. We analyze the potential for frontier models, when paired with human oversight, to perform GDPval tasks cheaper and faster than unaided experts. We also demonstrate that increased reasoning effort, increased task context, and increased scaffolding improve model performance on GDPval. Finally, we open-source a gold subset of 220 tasks and provide a public automated grading service at to facilitate future research in understanding real-world model capabilities.

1 INTRODUCTION

There is growing debate about how increasingly capable AI models could affect the labor market, whether by automating specific tasks, replacing entire occupations, or creating entirely new kinds of work (Brynjolfsson et al., 2025; Chen et al., 2025). Current approaches to measure the economic impact of AI focus on indicators such as adoption rates, usage patterns, and GDP growth attributed to AI (Chatterji et al.,

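The key points of "GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks" are:
1. GDPval is a new benchmark that evaluates how AI models perform on real-world economically valuable tasks. It covers 44 occupations across the top 9 sectors of the U.S. economy.
2. Each task is built from the actual work products of industry experts with an average of 14 years of experience.
3. Frontier model performance on GDPval has improved roughly linearly over time, and the current best frontier models are approaching industry experts in deliverable quality.
4. When paired with human oversight, frontier models may complete GDPval tasks more cheaply and quickly than unaided experts.
5. Increasing reasoning effort, task context, and scaffolding improves model performance on GDPval.
6. A gold subset of 220 tasks has been open-sourced, along with a public automated grading service, to support future research on real-world model capabilities.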
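The preview does not say how the "cheaper and faster with human oversight" comparison in point 4 is modeled. The sketch below illustrates one plausible workflow, built on our own assumptions rather than the paper's stated method. A model produces a deliverable and an expert reviews it. The model may retry up to k times. If every attempt is rejected, the expert does the task unaided.

Under these assumptions, the expected cost is (c_model + c_review) multiplied by the sum of (1 - p)^i for i = 0..k-1, plus (1 - p)^k multiplied by c_expert. Expected time follows the same pattern. The following are hypothetical: the names `Workflow`, `expected_assisted`, and `assisted_beats_unaided`, the acceptance probability `p_accept`, and all cost and time parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    """Hypothetical per-task parameters (illustrative assumptions, not GDPval data)."""
    model_cost: float   # cost of one model attempt
    model_time: float   # time for one model attempt
    review_cost: float  # cost for an expert to review one model deliverable
    review_time: float  # time for an expert to review one model deliverable
    expert_cost: float  # cost for an expert to do the task unaided
    expert_time: float  # time for an expert to do the task unaided
    p_accept: float     # assumed probability that a model deliverable passes review


def expected_assisted(w: Workflow, max_attempts: int) -> tuple[float, float]:
    """Expected (cost, time) of: model attempts, expert reviews each attempt,
    and the expert does the task from scratch if all attempts are rejected."""
    if not 0.0 <= w.p_accept <= 1.0:
        raise ValueError("p_accept must be in [0, 1]")
    if max_attempts < 0:
        raise ValueError("max_attempts must be non-negative")
    cost = time = 0.0
    p_reach = 1.0  # probability this attempt happens (all earlier attempts failed)
    for _ in range(max_attempts):
        cost += p_reach * (w.model_cost + w.review_cost)
        time += p_reach * (w.model_time + w.review_time)
        p_reach *= 1.0 - w.p_accept
    # Fallback: with probability (1 - p)^k the expert still does the whole task.
    cost += p_reach * w.expert_cost
    time += p_reach * w.expert_time
    return cost, time


def assisted_beats_unaided(w: Workflow, max_attempts: int) -> bool:
    """True if the assisted workflow is expected to be both cheaper and faster."""
    cost, time = expected_assisted(w, max_attempts)
    return cost < w.expert_cost and time < w.expert_time


if __name__ == "__main__":
    # Illustrative numbers only (arbitrary units); NOT taken from the GDPval paper.
    demo = Workflow(model_cost=1.0, model_time=0.1, review_cost=20.0,
                    review_time=0.5, expert_cost=400.0, expert_time=7.0,
                    p_accept=0.5)
    for k in (0, 1, 3):
        cost, time = expected_assisted(demo, max_attempts=k)
        print(f"k={k}: expected cost={cost:.1f}, expected time={time:.2f}, "
              f"cheaper and faster: {assisted_beats_unaided(demo, k)}")
```

In this toy model, assistance pays off only when reviewing a deliverable is much cheaper than producing it and the acceptance probability is high enough. When p_accept is near zero, every attempt adds review cost on top of the unaided expert's full cost.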