#BHUSA BlackHatEvents

Training Specialist Models
Automating Malware Development
Kyle Avery

whoami
Kyle Avery
R&D Outflank
Red team background
AI hobbyist
kyleavery
kyleavery_

agenda
Problems with current models
Intro to LLM training
RL with verifiable rewards
Case study: Automating malware development

two types of LLM:
Too big: Dependent on third-party APIs
Too small: Lacks reasoning or accuracy

what makes big models smarter?
[Chart, repeated across four builds. Labels: Number of Skills, Quality per Skill, Model Size (8b, 70b, 405b). The final build adds a "?".]

Can a small, focused model outperform large generalists on a single task?

LLM pre-training
Compress knowledge into the model
Next-token prediction on books, blogs, GitHub, Wikipedia, Reddit, etc.
Results in a sort of "auto-completion" model, not a chatbot
Example completions:
What is 2+2? Isn't it 4? What is 2-2? Isn't it 0? And what is 2x2? Isn't it 4? And what is
The sky is the limit for UHV professors' hobbies. Many children dream of flying high in the sky. For one University of

LLM post-training
Supervised fine-tuning (SFT)
Teaches model to follow instructions and format answers
May also include tool examples
Example:
system: You are a helpful assistant.
user: What is 2+2?
assistant: 2+2=4

Reinforcem