
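RAND Corporation: Toward Comprehensive Benchmarking of the Biological Knowledge of Frontier Large Language Models, 2025 research report (English edition) (67 pages).pdf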

Uploaded by: 1****1 | ID: 977933 | 2025-11-28 | 67 pages | 4.95 MB


SUNISHCHAL DEV, CHARLES TEAGUE, GRANT ELLISON, KYLE BRADY, YING-CHIANG JEFFREY LEE, SARAH L. GEBAUER, HENRY ALEXANDER BRADLEY, DAWID MACIOROWSKI, BRIA PERSAUD, JORDAN DESPANIE, BARBARA DEL CASTELLO, ALYSSA WORLAND, MICHAEL MILLER, ADRIAN SALAS, DAVE NGUYEN, JAMES LIU, JASON JOHNSON, ANDREW SLOAN, WILL STONEHOUSE, TRAVIS MERRILL, THOMAS GOODE, GREG MCKELVEY, JR., ELLA GUEST

Toward Comprehensive Benchmarking of the Biological Knowledge of Frontier Large Language Models

Research Report

For more information on this publication, visit www.rand.org/t/RRA3797-1.

About RAND

RAND is a research organization that develops solutions to public policy challenges to help make communities throughout the world safer and more secure, healthier and more prosperous. RAND is nonprofit, nonpartisan, and committed to the public interest. To learn more about RAND, visit www.rand.org.

Research Integrity

Our mission to help improve policy and decisionmaking through research and analysis is enabled through our core values of quality and objectivity and our unwavering commitment to the highest level of integrity and ethical behavior. To help ensure our research and analysis are rigorous, objective, and nonpartisan, we subject our research publications to a robust and exacting quality-assurance process; avoid both the appearance and reality of financial and other conflicts of interest through staff training, project screening, and a policy of mandatory disclosure; and pursue transparency in our research engagements through our commitment to the open publication of our research findings and recommendations, disclosure of the source of funding of published research, and policies to ensure intellectual independence. For more information, visit www.rand.org/about/research-integrity. RAND's publications do not necessarily reflect the opinions of its rese

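According to the report Toward Comprehensive Benchmarking of the Biological Knowledge of Frontier Large Language Models, the main content is summarized as follows:

1. Background: As artificial intelligence (AI) advances, AI systems are becoming increasingly knowledgeable in biology and chemistry, and malicious actors could use them to develop biological and chemical weapons.
2. Method: The study evaluated 39 frontier and older large language models (LLMs) on 6 biological and chemical knowledge benchmarks and 2 refusal benchmarks. The set included models without safety fine-tuning and biology fine-tuned models.
3. Findings:
   - Frontier LLMs outperformed human experts on biology laboratory protocol questions and graduate-level life sciences questions.
   - Many public biology and chemistry benchmarks have been saturated by the latest generation of models and may no longer be suitable for assessing the capabilities of future models.
   - Models without safety fine-tuning were effective at refusing harmful requests, but also showed reduced performance on the knowledge benchmarks.
4. Recommendations:
   - Benchmark creators should include human baselines so that model performance results are interpretable.
   - Benchmark creators should build more challenging and more specialized evaluations to mitigate saturation.
   - Benchmark implementers should report key implementation details to improve reproducibility and comparability.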