Ke Yang_Mooncake: Optimizing LLM Inference with a Disaggregated Architecture and Trading Storage for Compute.pdf

No. 1268205, PDF, 50 pages, 12.42 MB

Mooncake: Optimizing LLM Inference with a Disaggregated Architecture and Trading Storage for Compute
Ke Yang, Approaching.AI Tech Expert | Mooncake Core Contributor

CONTENTS
Background: LLM Inference in the Long-context Era
Mooncake: A KVCache-centric Disaggregated Architecture
Mooncake LLM Ecosystem Collaboration

Current Paradigm: Data + Algorithm + Hardware = Intelligence
Algorithm: Transformer is all we need?
Data: Big Data is everywhere
Hardware: Huang's Law takes over
Intelligence: AI becomes everywhere too

The Old Scaling Law is Slowing Down
BUT, who uses it?
The old scaling law: larger model, more data, growing computing power.
Performance gains from adding more parameters are increasingly limited.
It is becoming difficult to gather enough high-quality data to feed ultra-large models.

Everyone is Talking about Scaling Law, But the Real Question is What to Scale?
https:/

More Data + Larger Model + Longer Context = Higher Intelligence
Long input (Kimi): In March 2024, Kimi became one of the leading large model services thanks to its strong long-context (long-input) processing capability.
Long output (DeepSeek R1): In January 2025, DeepSeek R1 quickly rose to become one of the most renowned large model services for its strong reasoning (long-output) capability.

More Data + Larger Model + Longer Context = Higher Intelligence
Chain-of-Thought

More Data + Larger Model + Longer Context = Higher Intelligence
AI applications are evolving from simple chat to complex agent-based systems.
Simple chat: single-turn, short inputs/outputs.
Agent-based systems: multi-turn, complex execution topologies, long inputs/outputs.

More Data + Larger Model + Longer Context = Heavier Workload
Higher inference cost
Longer response time
Lack of computing and memory resources

One of the key bottlenecks in the long-context era: inference costs are skyrocketing
Amazon reports that over 90% of costs come from inference rather than training
