
Update from the LLM Scaling Laws Frontier.pptx

Uploaded by: 一*** | ID: 653513 | 2025-05-01 | 17 pages | 9.25 MB

Update from the LLM scaling laws frontier
Jason Clinton, CISO
April 2025
Leading intelligence increases and cybersecurity implications

Our perspective
Research lab, Think tank, Startup

Benchmarks double-click
Benchmarks: Graduate-level reasoning (GPQA Diamond³), Agentic coding (SWE-bench Verified²), Agentic tool use (TAU-bench), Multilingual Q&A (MMMLU), Visual reasoning (MMMU, validation)
Models: OpenAI o3 (high), Gemini 2.5 Pro, Claude 3.7 Sonnet (64K extended thinking), Claude 3.7 Sonnet (no extended thinking)
Scores in extracted order (row and column alignment is not recoverable from the preview):
Before the Claude 3.7 Sonnet labels: 83.3%, 82.9%, 81.7%, 63.8%, 69.1%, 84.0%, Retail 70.4%, Airline 52%, Airline (no value), Retail (no value)
After the Claude 3.7 Sonnet labels: 68.0%, 83.2%, 86.1%, 75%, 71.8%, Retail 81.2%, Airline 58.4%, 78.2%/84.8%, 62.3%/70.3%

Claude models lead on honesty
Anthropic models secure all top positions on the MASK leaderboard*, a benchmark designed to measure AI honesty when pressured to make false statements. Anthropic's models demonstrate superior alignment with facts under pressure, setting the standard for trustworthy AI.
*MASK (Model Alignment between Statements and Knowledge) evaluates models' resistance to providing false information, even when prompted to do so.

MASK Leaderboard: measures model honesty under pressure to lie
Claude 3.7 Sonnet with thinking: 82.13 ± 1.25
Claude 3 Opus: 79 ± 1.31
Claude 3.5 Sonnet: 72.33 ± 2.45 (this value appears at the end of the extracted list, detached from its label)
o1-Pro: 61.60 ± 0.86
gpt 4o: 60.0 ± 2.07
GPT 4.5 Preview: 56.93 ± 4.02
Deepseek R1: 57.32 ± 2.58
Gemini 2.5 Pro Experimental: 55.93 ± 3.49

What is Claude's role in our lives?
2024: Claude assists. Claude helps individuals do their current work better, making each person the best version of themselves.
2025: Claude collaborates. Claude does hours of independent work for you, on par with experts, expanding what every person or team is capable of.
2027: Claude pioneers. Claude finds breakthrough solutions to challenging problems that would have taken teams years to achieve.

Agents are AI syste…
