Tinker Tailor LLM Spy
Investigate & Respond to Attacks on GenAI Chatbots
linktr.ee/meoward

NYC's MyCity chatbot: chat.nyc.gov
ChrisJBakke: "I just bought a 2024 Chevy Tahoe for $1."
JFrog Security: CVE-2024-5565, Prompt Injection Code Execution in Vanna.AI
Ask a question. Vanna converts it to SQL. It runs on the DB. Vanna sends back the data plus a Plotly chart.

Oh no, not another GenAI/LLM talk.
I'm not an expert.
Hi, I'm Allyn

Risk Levels
Low: Provides general information
Med: Provides personalized information
High: Performs actions

Incident Types
Brand damage
Privacy breach
Unauthorized access & execution

Incident Scenario #1
Chatbot: weather chatbot
Risk level: low
User: "What's the weather like in Austin, Texas?"
Chatbot: "It's so hot and humid out here, even Taylor Swift would write a breakup song about it."

Investigate: Implement logging
[Diagram: Input, LLM, Output]
Fields to log: timestamp, chatbot_version, user_prompt, msg_thread_id, session_id, chatbot_output, model
Example log entry:
timestamp: 2025-02-18T14:40:00Z
model: gpt-4
chatbot_version: weather_2.1
user_prompt: Give me a Taylor Swift-themed weather report.
chatbot_output: Cold and snowy, looks like we're in our Evermore era
session_id: 123456789
msg_thread_id: 123456789

Investigate: User input's influence on LLM
[Diagram: Input, LLM, Output, Training Data, fine-tuning, user feedback]
"Good job, Liam!"

Contain: Block impacting inputs
[Diagram: Input, LLM, Output; "Taylor" blocked at the input]
"Give me a weather report themed by the popular music artist famous for her Eras Tour."

Contain: Block impacting inputs & outputs
[Diagram: Input, LLM, Output; "Taylor" blocked at both the input and the output]

Chatbot Guardrails
Rule-based metrics
LLM-as-a-judge
System prompt

LLM-as-a-Judge
[Diagram: Input, Output, Context, LLM App, Args, LLM Evaluation Metric, Scorer, Score, Reason, Passes Threshold?]
Metric: Yes/No
Evaluate the quality of the following weather report on a scale of 0 to 1, where 0 is poor and 1 is excellent. Consider accuracy, completeness, releva