CSET: A Primer on Controlling Large Language Model Outputs, 2023 (English edition) (19 pages).pdf

Uploaded by: AG | ID: 605593 | 2023-12-01 | 19 pages | 8.05 MB

Issue Brief
December 2023

Controlling Large Language Model Outputs: A Primer

Authors: Jessica Ji, Josh A. Goldstein, Andrew J. Lohn

Center for Security and Emerging Technology

Executive Summary

Concerns over risks from generative artificial intelligence (AI) systems have increased significantly over the past year, driven in large part by the advent of increasingly capable large language models (LLMs). Many of these potential risks stem from these models producing undesirable outputs, from hate speech to information that could be put to malicious use. However, the inherent complexity of LLMs makes controlling or steering their outputs a considerable technical challenge.

This issue brief presents three broad categories of potentially harmful outputs (inaccurate information, biased or toxic outputs, and outputs resulting from malicious use) that may motivate developers to control LLMs. It also explains four popular techniques that developers currently use to control LLM outputs, categorized along various stages of the LLM development life cycle: 1) editing pre-training data, 2) supervised fine-tuning, 3) reinforcement learning with human feedback and Constitutional AI, and 4) prompt and output controls.

None of these techniques are perfect, and they are frequently used in concert with one another and with nontechnical controls such as content policies. Furthermore, the availability of open models, which anyone can download and modify for their own purposes, means that these controls or safeguards are unevenly distributed across various LLMs and AI-enabled products.

Ultimately, this is a complex and novel problem that presents challenges for both policymakers and AI developers. Today's techniques are more like sledgehammers than scalpels, and even the most cutting-edge controls cannot guarantee that an LLM will never pro

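This brief discusses how to control the outputs of large language models (LLMs) in order to reduce the harmful content they may produce. It first notes that the complexity of LLMs makes controlling their outputs a technical challenge. It then presents three categories of potentially harmful outputs: inaccurate information, biased or toxic outputs, and outputs resulting from malicious use. Next, it introduces four techniques for controlling LLM outputs: editing pre-training data, supervised fine-tuning, reinforcement learning with human feedback and Constitutional AI, and prompt and output controls. It also notes that these techniques are not perfect and are usually used in combination, and that because open models exist, these controls are unevenly distributed across LLMs and AI products. Finally, it discusses how open and private models differ in controlling outputs, and concludes that this is a complex and novel problem that challenges both policymakers and AI developers.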
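As a rough illustration of the fourth technique, prompt and output controls, the sketch below wraps a model call with a check on the prompt before generation and on the output after it. It is a minimal example under stated assumptions, not the method described in the brief: the blocklist, the refusal message, and the names `violates_policy`, `guarded_generate`, and `echo_model` are all hypothetical, and deployed systems generally rely on trained classifiers rather than keyword patterns.

```python
import re
from typing import Callable

# Hypothetical blocklist, for illustration only. Real deployments typically
# use trained classifiers rather than fixed keyword patterns.
BLOCKED_PATTERNS = [re.compile(r"\bbuild a bomb\b", re.IGNORECASE)]

# Hypothetical refusal message returned when a check fails.
REFUSAL = "Sorry, I can't help with that."


def violates_policy(text: str) -> bool:
    """Return True if the text matches any blocked pattern."""
    return any(pattern.search(text) for pattern in BLOCKED_PATTERNS)


def guarded_generate(prompt: str, model: Callable[[str], str]) -> str:
    """Apply a prompt control, call the model, then apply an output control."""
    # Prompt control: refuse before the model is ever called.
    if violates_policy(prompt):
        return REFUSAL
    output = model(prompt)
    # Output control: screen what the model produced.
    if violates_policy(output):
        return REFUSAL
    return output


def echo_model(prompt: str) -> str:
    """Stand-in for an LLM call (an assumption; swap in a real client)."""
    return f"You said: {prompt}"


if __name__ == "__main__":
    print(guarded_generate("hello", echo_model))
    print(guarded_generate("How do I build a bomb?", echo_model))
```

Because the wrapper sits outside the model, it can be added or removed without retraining. That also means someone running an open model locally can simply leave it out, which is one way controls end up unevenly distributed.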