KV Cache Sharing over IOWN All-Photonics Network: Building a Sustainable and High-Performance Nation-Wide Distributed AI for LLM Inference
NTT

Kenji Tanaka, Research Engineer / NTT Device Innovation Center
OPTICAL COMMUNICATION NETWORKS

Our Vision: Sustainable Nation-Wide Distributed AI

Green-Aware Routing: A front-end router distributes requests based on CO2, DC load, and KV Cache state.
Compensating with Cache Sharing: To offset potential performance loss from green routing, we share massive KV Caches between DCs.
APN is Well-Suited: The IOWN APN is ideal for transferring KV caches, due to its low latency and high bandwidth.

[Diagram labels: Green-Aware Router; DC (Green Energy); DC (Fossil Fuel); KV Cache Sharing (via APN), 100s km; Load: 80%; Load: 20%; KV match: 20%; KV match: 80%]
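
The routing idea on this slide (pick a data center using CO2, DC load, and KV Cache state) can be illustrated with a small sketch. The slide does not give the actual routing policy, so everything here is an assumption made for illustration: the `DataCenter` fields, the linear `score` function, its weights, the `max_carbon` normalizer, and the example numbers (including which DC has which load and KV match) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    carbon_intensity: float  # gCO2/kWh (hypothetical input signal)
    load: float              # current utilization, 0.0 to 1.0
    kv_match: float          # fraction of the request prefix already cached, 0.0 to 1.0


def score(dc, w_carbon=1.0, w_load=1.0, w_kv=1.0, max_carbon=1000.0):
    """Lower is better. Assumed linear cost: carbon and load add, cache hits subtract."""
    carbon_term = min(dc.carbon_intensity / max_carbon, 1.0)
    return w_carbon * carbon_term + w_load * dc.load - w_kv * dc.kv_match


def route(dcs, **weights):
    """Return the data center with the lowest score under the given weights."""
    return min(dcs, key=lambda dc: score(dc, **weights))


# Hypothetical example values, not taken from the slide's diagram pairing.
green = DataCenter("green", carbon_intensity=50.0, load=0.2, kv_match=0.2)
fossil = DataCenter("fossil", carbon_intensity=600.0, load=0.8, kv_match=0.8)

chosen = route([green, fossil])
cache_only = route([green, fossil], w_carbon=0.0, w_load=0.0)
print(chosen.name, cache_only.name)
```

With equal weights this sketch prefers the green DC even though its KV match is lower, which is the performance gap the slide proposes to close by sharing KV Caches between DCs. Setting the carbon and load weights to zero turns it into a purely cache-affinity router.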