2893 - IBM Storage Ceph in the World of AI/ML Workloads.pdf

ID: 983011

PDF, 16 pages, 441.09 KB

IBM Storage Ceph in the World of AI/ML Workloads
Kyle Bader, Chief Architect, Data and AI, Ceph, IBM Storage
IBM TechXchange 2025, Session 1408
Orlando, FL, October 6-9
IBM TechXchange | 2025 IBM Corporation

Slide 3: Agenda
hetero inference
prefill/decode disagg
kv caching (enabler) - optimize tco

Slide 4: What is inference KV caching?
Prompt: "GPT, can you summarize this document?" (PDF)
IBM Granite 3 paper: 30k tokens
Attention is All You Need paper: 8k tokens
Prefill: build tensors representing prompt context
Decode: generate response to prompt

Slide 5: Prefill rate decay
[Figure labels: Prefill, Decode, Prefill Rate (tokens/second), Tensor, Tokens, Time to first token, t]
Update weights across all layers with each new token, reducing prefill rate
O(n^2) attention complexity

Slide 6: LMCache Block
[Figure labels: Long context, Cache block 1, Cache block 2]
LMCache default cache block size is 256 tokens

KV Cache Size Calculator
Selected Model: Qwen/Qwen3-32B
Hidden Size: 5120
Number of Attention Heads: 64
Number of Hidden Layers: 64
Number of Key-Value Heads: 8
Head Size: 80 (Hidden Size / Attention Heads)
Data Type Size: 2 bytes
Total Elements: 2 x 64 x 256 x 8 x 80 = 20,971,520
Total Bytes: 20,971,520 x 2 = 41,943,040 bytes
KV Cache Size: 41,943,040 / 1024^3 ≈ 0.0391 GB
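The KV Cache Size Calculator arithmetic above can be reproduced with a short Python sketch. The function name `kv_cache_block_bytes` and its parameter names are illustrative, not taken from the slides or from LMCache. Reading the leading factor of 2 as "one key tensor plus one value tensor per layer" is an assumption, since the slide does not label it.

```python
def kv_cache_block_bytes(num_layers, num_kv_heads, head_size,
                         block_tokens=256, dtype_bytes=2):
    """Bytes needed to hold the KV cache for one block of tokens."""
    # Assumed meaning of the leading 2: keys and values are both stored.
    elements = 2 * num_layers * block_tokens * num_kv_heads * head_size
    return elements * dtype_bytes


# Values listed on the slide for Qwen/Qwen3-32B.
hidden_size = 5120
num_attention_heads = 64
head_size = hidden_size // num_attention_heads  # slide: Hidden Size / Attention Heads

block_bytes = kv_cache_block_bytes(num_layers=64, num_kv_heads=8,
                                   head_size=head_size)
print(block_bytes)             # 41943040
print(block_bytes / 1024**3)   # 0.0390625, which the slide rounds to 0.0391 GB
```

With the default 256-token block, each block for this model is exactly 40 MiB, which is the unit of transfer the later slides reason about.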

Slide 7: Space for time
[Figure labels: Prefill (compute), Decode, cache, t, Speedup]
If cache blocks can be loaded from storage faster than they can be computed, we reduce time-to-first-token.

Slide 8: Parallel
[Figure labels: Prefill (compute), Decode, cache, cache, t, Speedup]
We can load smaller cache blocks in parallel to further reduce the time-to-first-token.
Computed prefill progresses sequentially.

Slide 9: Architecture
Components: vLLM, Dynamo, NIXL, Ceph RGW
Labels: Requests cache blocks; Manage KV Cache blocks, cache logic; High performance IO layer; Cache block persistence; S3 via obj backe
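The speedup argument on the "Space for time" and "Parallel" slides can be sketched as a rough model in Python. Only the 30k-token prompt, the 256-token block and the 41,943,040-byte block size come from the slides. The prefill rate, the per-stream bandwidth and the stream count are hypothetical placeholders, not measurements. The model also assumes a constant prefill rate (the deck says it decays with context length, so the compute estimate is optimistic) and assumes parallel streams do not contend for bandwidth.

```python
import math

BLOCK_TOKENS = 256          # LMCache default block size (from the slides)
BLOCK_BYTES = 41_943_040    # per-block KV size for Qwen/Qwen3-32B (from the slides)


def ttft_compute(prompt_tokens, prefill_rate):
    """Seconds to prefill by computation at a constant tokens/second rate."""
    return prompt_tokens / prefill_rate


def ttft_load(prompt_tokens, stream_bandwidth, streams=1):
    """Seconds to load all cache blocks, `streams` blocks at a time.

    stream_bandwidth is bytes/second available to each stream.
    """
    blocks = math.ceil(prompt_tokens / BLOCK_TOKENS)
    rounds = math.ceil(blocks / streams)
    return rounds * BLOCK_BYTES / stream_bandwidth


prompt_tokens = 30_000                 # IBM Granite 3 paper (from the slides)
compute_s = ttft_compute(prompt_tokens, prefill_rate=5_000)        # hypothetical rate
serial_s = ttft_load(prompt_tokens, stream_bandwidth=1024**3)      # hypothetical 1 GiB/s
parallel_s = ttft_load(prompt_tokens, stream_bandwidth=1024**3, streams=8)

print(compute_s, serial_s, parallel_s)
```

Under these placeholder numbers, loading is faster than computing, and loading eight blocks at a time is faster again. Whether that holds in practice depends on the actual prefill rate and storage throughput.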
