Trustworthy Policy Learning under the Counterfactual No-Harm Criterion (34 pages)
Counterfactual No-Harm Criterion: Individual Risk and Trustworthy Policy Learning

Peng Wu
Joint work with Zhi Geng, Yue Liu, Haoxuan Li, and Chunyuan Zheng.
Beijing Technology and Business University
October 17, 2023
DataFun Causal Inference Online Summit 23

1. Introduction
2. Sharp Bounds of the No-Harm Criterion
3. No-Harm Trustworthy Policy Learning
4. Experiments

Introduction: Background

1. Policy learning determines which individuals should be treated based on their covariates, and it is important that humans can trust a decision made by an algorithm.
2. A trustworthy algorithm is expected to meet various advanced requirements, including fairness, diversity, explainability, accountability, safety, etc.
3. In this talk, we discuss the "harmlessness" of policy learning.

Introduction: What is No-Harm?

Hippocratic oath: "First do no harm."
Isaac Asimov's Laws of Robotics: "A robot may not injure a human being or, through inaction, allow a human being to come to harm."

Introduction: What is No-Harm? A Toy Example

Consider two policies. The first is a treatment policy that is useful for 70% of patients but harms the other 30%. The second is useful for 40% of patients and harms none.
The two policies have the same average causal effect (40%): 70% - 30% = 40% for the first and 40% - 0% = 40% for the second. Clearly, the second policy is preferable.
However, if the second policy is useful for only 30% of patients but still causes no harm, which policy is preferred: the first or the second?

Introduction: Notation

Notation: Observed data (Xi,Ti,Yi):i