Workshop: Scaling AI: Leveraging Caching Algorithms to Maximize Agentic ROI (PDF, 24 pages)
Scaling AI: Leverage Caching Algorithms to Maximize Agentic ROI

Session Abstract

Agentic AI innovation is creating cost pressures that are crushing profit margins across industries. Today's AI architects face a critical challenge: balancing cost efficiency against rapid development cycles and competitive time-to-market advantages. Key-value caching (KV Cache) offers a powerful solution. When implemented effectively, KV Cache can transform your AI economics by dramatically reducing time to first token and slashing cost per token, all without sacrificing performance or time to market. WEKA's Val Bercovici and Betsy Chernoff will demonstrate how to architect and deploy caching algorithms that optimize your token economics, and will share practical strategies for achieving cost-effective agentic AI innovation at scale.

WEKA 2025

Val Bercovici | Chief AI Officer
Betsy Chernoff | AI Product Marketing Lead

Scaling AI: Leveraging Caching Algorithms to Maximize Agentic ROI

Workshop Goal
Leave with a concrete understanding of design patterns for KV cache optimization.

Agenda
1. Challenges
2. Key Metrics
3. Impact of Caches in LLMs
4. Implementation
5. Insights from Our Labs
6. Best Practices

Achieving Profitable Inference Is Extremely Difficult

Common pattern of a modern inference system:
- Choice of GPU servers (not just NVIDIA)
- List of services the system offers to customers (API consumption)
- A mechanism for knowledge security
- The system that interfaces directly with the GPU servers and schedules the work. DeepSeek showed that working at this level enables efficiencies.
- Prompt and answer caching to avoid the need to run inference at all
- A way to ingest knowledge with RAG
- Option to rent or buy the equipment, in a COLO or in the cloud
- A team of wicked smart people to put it all together!
- A system that understands inference sessions and can route