
Certification of Reinforcement Learning: Challenges per Safety Criticality and Autonomy.pdf

Uploaded by: 哆哆 | ID: 631243 | 2025-04-19 | 15 pages | 1.69 MB

Certification of Reinforcement Learning: Challenges per Safety Criticality and Autonomy
Marta Ribeiro (1), Fynn Opperman (2)
(1) Assistant Professor, Aerospace Faculty
(2) PhD Candidate, Aerospace Faculty

Current State Machine Learning Certification
[Timeline figures for Europe and the USA; years shown: 2020, 2023, 2024]

Supervised vs Reinforcement Learning
  Supervised Learning: Labeled Data
  [Diagram labels: Data (known, pre-processed data), Algorithm, F(x)]
  Reinforcement Learning:
  [Diagram labels: Environment, Algorithm, Action, Reward]

Reinforcement Learning
[Diagram labels: Algorithm (Lookup Table, Neural Network), Environment (Simulated, Real), State, Reward, Action]
  Autonomous / Decision Support
  Immediate / Delayed Reaction

Reinforcement Learning
  Future rewards: sequence of actions vs immediate action
  bad action can lead to a good state
  Maximization of reward
  Multi-objective reward formulation

Reinforcement Learning
  Safeguards against:
  Non-safe/prohibited actions
  Actions that move the system to bad/unknown states

Reinforcement Learning Data?
[1] MLEAP-D4 Final Report

Reinforcement Learning Data?
  Pre-Training Assurance
  Model Training
  Post Training Assurance (Deployment Assurance)
  No pre-existing data
  Running the model to learn about actions taken
  Post-assessment explainability/predictability of actions

Proposal
  Pre-Training Assurance
  Model Training
  Define ODD + Assumptions
  In/Out of Distribution Detection Method
  Runtime Verification
  Training Environment Analysis
  Reward Formulation Analysis
  RL Algorithm Selection
  Hyperp
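
The slide point that a bad action can lead to a good state, and that the agent maximizes reward over a sequence of actions rather than per action, can be illustrated with a discounted return. The sketch below is only an illustration: the discount factor of 0.9 and both reward sequences are invented here and do not come from the slides.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Sum the rewards, discounting each one by gamma per time step."""
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (gamma ** step) * reward
    return total


# Hypothetical reward sequences, invented for illustration only.
greedy_path = [1.0, 0.0, 0.0]     # good immediate reward, nothing afterwards
patient_path = [-1.0, 0.0, 10.0]  # bad first action that reaches a good state

greedy_return = discounted_return(greedy_path)
patient_return = discounted_return(patient_path)

# The sequence that starts with a penalized action has the higher return.
print(greedy_return, patient_return)
```

Under these assumed numbers the sequence with the worse first step wins, which is one reason per-action checks alone may not capture what a certified RL policy is optimizing.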

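This document examines the challenges of certifying reinforcement learning, focusing on safety criticality and autonomy. It first contrasts supervised learning, which relies on labeled data, with reinforcement learning, which relies on an algorithm, an environment, and a reward signal. It then reviews the current state of machine learning certification in Europe and the USA. It stresses the need for safeguards in reinforcement learning against unsafe or prohibited actions and against actions that move the system into bad or unknown states. It proposes an assurance framework for model training and deployment: defining the operational design domain (ODD) and assumptions before training, anomaly detection during training, and continuous monitoring and evaluation before deployment. Finally, using examples such as aircraft maintenance and automated air traffic control, it discusses applications of reinforcement learning in different domains and stresses that certification must account for human factors and risk mitigation.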
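
The two safeguards named in the slides (against prohibited actions, and against actions that move the system to bad or unknown states) could be combined with an ODD check in a small runtime filter. What follows is a minimal sketch under stated assumptions, not the authors' method: `OperationalDesignDomain`, `shielded_action`, the `altitude_m` feature, the action names, and the one-step `predict_next` model are all hypothetical, and the ODD test is a plain bounds check rather than a statistical out-of-distribution detector.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

State = Dict[str, float]


@dataclass(frozen=True)
class OperationalDesignDomain:
    """Hypothetical ODD: per-feature (low, high) bounds on the state."""
    bounds: Dict[str, Tuple[float, float]]

    def contains(self, state: State) -> bool:
        # A state is inside the ODD only if every bounded feature is
        # present and within its interval.
        return all(
            name in state and low <= state[name] <= high
            for name, (low, high) in self.bounds.items()
        )


def shielded_action(
    state: State,
    policy: Callable[[State], str],
    odd: OperationalDesignDomain,
    prohibited: FrozenSet[str],
    predict_next: Callable[[State, str], State],
    fallback_action: str,
) -> str:
    """Return the policy's action only if it passes all runtime checks."""
    if not odd.contains(state):
        return fallback_action  # current state is outside the ODD
    action = policy(state)
    if action in prohibited:
        return fallback_action  # action is explicitly prohibited
    if not odd.contains(predict_next(state, action)):
        return fallback_action  # action would leave the ODD
    return action


# Illustrative setup; every value below is an assumption.
odd = OperationalDesignDomain(bounds={"altitude_m": (300.0, 3000.0)})
prohibited = frozenset({"shutdown"})


def always_descend(state: State) -> str:
    return "descend"


def predict_next(state: State, action: str) -> State:
    delta = {"descend": -100.0, "climb": 100.0}.get(action, 0.0)
    return {"altitude_m": state["altitude_m"] + delta}


def safe(state: State) -> str:
    return shielded_action(
        state, always_descend, odd, prohibited, predict_next, "hold"
    )
```

With this setup, `safe({"altitude_m": 1000.0})` passes the policy's `"descend"` through, while `safe({"altitude_m": 350.0})` falls back to `"hold"` because the predicted next state would drop below the assumed lower bound. A filter like this only covers states and actions that can be enumerated or bounded in advance, so it complements rather than replaces the training-time analyses listed in the proposal.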