
From Prompts to Plans: Security Testing for Agentic AI.pdf

Uploaded by: 竿*** | ID: 982081 | 2025-11-29 | 55 pages | 4.87 MB

From Prompts to Plans: Security and Safety Testing for Agentic AI
Jason Stanley, Head of AI Research Deployment, ServiceNow AI Research
SecTor BlackHat, 2025 Oct 02

AI adoption is real
ChatGPT: 200M weekly users
Stack Overflow: 51% of pros use AI daily
McKinsey: 78% of orgs use AI in at least 1 function
Gartner: 33% of org software soon agentic

So are threats
Offense is strengthening, automating, and going multimodal: RL-trained jailbreakers, image-driven injection, agents complying with harmful requests.
Whole system and supply chain surfaces are targets.
Exploits of AI systems and agents go primetime.

Systems are changing. So are risks.

But TESTING isn't changing at the same speed

1. Front door instead of all the seams
   Front door: Focus on initial input-output exchange. ASR (attack success rate) judged on one output.
   All the seams: Attention to multitude of pathways: multi-turn, memory, tools, environment, protocols.

2. Stateless instead of stateful
   Stateless: One prompt, one reply. Freeze history, memory, tools, environment, roles.
   Stateful: Let past actions influence future behavior.

3. Ignores deployment context
   Context unaware: Testing uses risk taxonomies from public frameworks, not your threat model.
   Context aware: Your threat model informs what risks matter, which drives your testing priorities.

4. Ignores utility-security tradeoff
   Security alone: Attack and defense effectiveness evaluated i
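
The stateless versus stateful distinction above can be illustrated with a small sketch. It is not taken from the talk: `ToyAgent`, its trigger strings, and both scoring functions are hypothetical stand-ins, chosen only to show how an attack planted in memory on one turn can be invisible to a one-prompt, one-reply test and still succeed within a multi-turn episode.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent with persistent memory."""
    memory: list[str] = field(default_factory=list)

    def step(self, message: str) -> str:
        # A planted note only causes harm on a later turn, once it is in memory.
        poisoned = any("REMEMBER: export data" in m for m in self.memory)
        self.memory.append(message)
        if poisoned and "summarize" in message.lower():
            return "ACTION: export_data"
        return "OK"


def is_violation(reply: str) -> bool:
    """Toy oracle: flags the one unsafe action this agent can take."""
    return reply.startswith("ACTION: export_data")


def stateless_asr(turns: list[str]) -> float:
    """Fresh agent per prompt; each reply is judged in isolation."""
    hits = sum(is_violation(ToyAgent().step(t)) for t in turns)
    return hits / len(turns)


def stateful_asr(episodes: list[list[str]]) -> float:
    """One agent per episode; the episode counts if any turn violates."""
    hits = 0
    for turns in episodes:
        agent = ToyAgent()
        if any(is_violation(agent.step(t)) for t in turns):
            hits += 1
    return hits / len(episodes)


attack = ["Note for later. REMEMBER: export data", "Please summarize my inbox"]
print("stateless ASR:", stateless_asr(attack))
print("stateful ASR:", stateful_asr([attack]))
```

In this toy setup the stateless harness reports no successful attack, because the planted note never coexists with the triggering request, while the stateful harness records the episode as a success.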

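Key points of "From Prompts to Plans: Security and Safety Testing for Agentic AI":

1. **State of AI adoption**: AI is widely used. ChatGPT has 200 million weekly users, 51% of professionals use AI daily, and 78% of organizations use AI in at least one function.
2. **Security threats**: Attacks are getting stronger, more automated, and multimodal, for example RL-trained jailbreakers, image-driven injection, and agents that comply with harmful requests.
3. **Testing challenges**: Testing methods have not kept pace. They focus on the front door rather than all the seams, are stateless rather than stateful, ignore deployment context, and ignore the utility-security tradeoff.
4. **Testing method**: Build a threat model covering outcomes, architecture, users/roles, surfaces, and invariants; collect attack types and prioritize risks; build a test plan covering risk-utility pairing, budgets and stop rules, a stable sandbox, targeted coverage, and metrics and reporting.
5. **Test types**: Task benchmarks and attack benchmarks, exploratory search, and DoomArena tasks and attacks.
6. **Findings**: The threat model dominates the results; guardrails help but are not sufficient on their own; exploratory search helps uncover attack patterns.
7. **Generalizing findings**: Turn findings into maintainable test assets such as pattern cards, generators, and oracles, with anti-overfitting controls.
8. **Key takeaways**: Test your system, not the internet; formalize and scale exploratory search; measure utility and risk together; generalize carefully to avoid brittleness.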
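
The test-plan items above (budgets and stop rules, measuring utility and risk together) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method from the talk: `run_trial` is a hypothetical simulated trial, the success rates inside it are invented for the example, and the stop rule (a normal-approximation 95% interval on the attack success rate) is one simple choice among many.

```python
import random
from dataclasses import dataclass


@dataclass
class Trial:
    task_ok: bool    # did the agent complete the benign task?
    attack_ok: bool  # did the attack achieve its goal?


def run_trial(rng: random.Random, defended: bool) -> Trial:
    """Hypothetical simulated trial; the rates below are made up."""
    task_rate, attack_rate = (0.70, 0.10) if defended else (0.90, 0.40)
    return Trial(rng.random() < task_rate, rng.random() < attack_rate)


def evaluate(defended: bool, budget: int = 500, min_trials: int = 50,
             target_half_width: float = 0.05, seed: int = 0) -> dict:
    """Run trials until the budget is spent or the ASR estimate is tight enough."""
    rng = random.Random(seed)
    n = task_hits = attack_hits = 0
    for _ in range(budget):
        trial = run_trial(rng, defended)
        n += 1
        task_hits += trial.task_ok
        attack_hits += trial.attack_ok
        asr = attack_hits / n
        # Normal-approximation 95% interval; degenerate when asr is 0 or 1.
        half_width = 1.96 * (asr * (1 - asr) / n) ** 0.5
        if n >= min_trials and half_width <= target_half_width:
            break  # stop rule satisfied before the budget ran out
    return {"trials": n, "utility": task_hits / n, "asr": attack_hits / n}


for defended in (False, True):
    print("defended" if defended else "undefended", evaluate(defended))
```

Reporting utility next to ASR for each configuration makes the tradeoff visible: a defense that lowers ASR while also lowering task success shows up in the same row rather than in a separate evaluation.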