Precision Time Protocol for AI Back-End Networks: Modeling and Deployment Insights

No. 1012029 | PDF | 16 pages | 1.08 MB

PTP in AI Networks
Ahmad Byagowi, Turba.ai
Amit Oren, Broadcom
Bhaskar Chinni, Broadcom
OCP Special Focus: Artificial Intelligence (AI)

Agenda
Introduction: Need for time synchronization in AI networks
Phantom Jam, Phantom Traffic and Phantom Delay
How TCP determines channel capacity
Use cases & benefits of delay awareness
Test data
Conclusion

Phantom Jam
An emerging behavior of cascading controllers
Source: https:/

How Does TCP Determine Channel Capacity?
Additive Increase, Multiplicative Decrease
Source: https:/

Potential Solution
Open loop backed with time slices instead of independent controllers

Importance of the Network for AI Workloads
(Figure: XPUs with attached HBM stacks)
4 × HBM3E (9.6 Tbps each): 38.4 Tbps
8 × HBM4 (12.8 Tbps each): 102.4 Tbps
Besides improvements in network speeds, efficiency is also important. Efficiency means effective traffic scheduling, and one-way latency (OWL) can be an effective tool for traffic schedulers. OWL requires precise time on all nodes, and precise time is a product of time synchronization.

PTP for Network Efficiency (for OWL Capability)
OWL from host A to host B is the time between A's NIC transmit timestamp and B's NIC receive timestamp for the same packet. Unlike RTT/2, OWL captures asymmetry (different paths or queuing in each direction), which is common in Clos/leaf-spine fabrics.
Why OWL matters in AI workloads: collectives (e.g., ring/tree all-reduce) and MoE token routing are barrier-sensitive, so tail OWL (p99/p99.9) often controls step time even when average latency is low. Microbursts (incast to a single ToR egress) can create millisecond-class queuing spikes that dominate p99 OWL.
Production-grade measurement patterns (hardware-assisted):
Clock sync: use PTP (IEEE 1588/802.1AS) with boundary/transparent clocks so that both endpoints' NIC PHCs are aligned…
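The "Additive Increase, Multiplicative Decrease" rule on the TCP channel-capacity slide can be illustrated with a toy congestion-window loop. This is a minimal sketch under stated assumptions, not a TCP implementation: the fixed capacity, the increment of one segment per loss-free round and the halving factor of 0.5 (the classic Reno-style choice) are illustrative values, and the function name `aimd_trace` is hypothetical. It shows the sawtooth by which a sender finds capacity only by overshooting it, which is the behavior of the independent controllers that the "Potential Solution" slide proposes replacing with time slices.

```python
def aimd_trace(capacity: float, rounds: int, increase: float = 1.0,
               decrease: float = 0.5, start: float = 1.0) -> list[float]:
    """Congestion window (in segments) after each RTT round, toy AIMD model.

    Assumptions (illustrative only): a loss is signalled whenever the window
    exceeds a fixed path capacity; no slow start, timeouts or competing flows.
    """
    cwnd = start
    trace: list[float] = []
    for _ in range(rounds):
        if cwnd > capacity:
            cwnd *= decrease   # multiplicative decrease after a loss signal
        else:
            cwnd += increase   # additive increase for a loss-free round
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    # Sawtooth: climbs 2, 3, ..., 11, halves to 5.5, then climbs again.
    print(aimd_trace(capacity=10, rounds=20))
```

With many such loops running independently and reacting to each other's queues, the oscillations can cascade, which is the "phantom jam" effect the slides name.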
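The per-package memory bandwidth figures on the "Importance of the Network for AI Workloads" slide can be checked as follows, assuming 8 bits per byte and the per-stack rates exactly as quoted (9.6 Tbps = 1.2 TB/s per HBM3E stack, 12.8 Tbps = 1.6 TB/s per HBM4 stack). The ratio on the last line is derived arithmetic, not a figure from the slides.

```latex
\begin{aligned}
4 \times 9.6\,\text{Tbps}  &= 38.4\,\text{Tbps}  = 4.8\,\text{TB/s} \\
8 \times 12.8\,\text{Tbps} &= 102.4\,\text{Tbps} = 12.8\,\text{TB/s} \\
\frac{102.4}{38.4} &= \frac{8}{3} \approx 2.67
\end{aligned}
```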
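The OWL definition on the "PTP for Network Efficiency" slide, and its contrast with RTT/2, can be sketched as below. This assumes both NIC PHCs already share a PTP-disciplined time base and that timestamps are integers in nanoseconds; the `Sample` record layout and the helper names are hypothetical. The RTT here is formed as the sum of the two OWLs, ignoring B's turnaround time. The percentile uses the nearest-rank convention, one of several in common use.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One packet observed at both ends (hypothetical record layout)."""
    tx_ns: int  # sender NIC transmit timestamp, sender's PHC
    rx_ns: int  # receiver NIC receive timestamp, receiver's PHC


def owl_ns(s: Sample) -> int:
    """One-way latency = receive timestamp minus transmit timestamp.

    Only meaningful if both PHCs are synchronized: any residual clock
    offset between them adds directly to the result.
    """
    return s.rx_ns - s.tx_ns


def percentile_nearest_rank(values: list[int], p: float) -> int:
    """Nearest-rank percentile for p in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Asymmetric path: 2 us from A to B, 6 us back because of queuing.
    a_to_b = Sample(tx_ns=1_000, rx_ns=3_000)
    b_to_a = Sample(tx_ns=10_000, rx_ns=16_000)
    rtt = owl_ns(a_to_b) + owl_ns(b_to_a)
    print(owl_ns(a_to_b), owl_ns(b_to_a), rtt / 2)  # 2000 6000 4000.0

    # 98 quiet packets plus 2 hit by a 1 ms microburst.
    owls = [2_000] * 98 + [1_000_000] * 2
    print(percentile_nearest_rank(owls, 50), percentile_nearest_rank(owls, 99))
```

RTT/2 reports 4 µs in both directions and hides the 2 µs vs 6 µs split. In the second case the median stays at 2 µs while p99 is 1 ms, the tail behavior the slide says controls step time.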
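The clock-sync step relies on PTP's delay request-response exchange (IEEE 1588): the master sends Sync at t1 (master clock), the slave receives it at t2 (slave clock), the slave sends Delay_Req at t3 and the master receives it at t4. The standard estimates are offset = ((t2 - t1) - (t4 - t3)) / 2 and mean path delay = ((t2 - t1) + (t4 - t3)) / 2, both of which assume equal delay in each direction. The sketch below, with hypothetical names and a simulated exchange, shows that a path asymmetry biases the offset by (d_ms - d_sm) / 2. This is the same asymmetry the OWL slide highlights, and switch queuing is one of its sources, which is one reason the slide calls for boundary/transparent clocks.

```python
from dataclasses import dataclass


@dataclass
class PtpExchange:
    """Timestamps (ns) from one Sync / Delay_Req exchange."""
    t1: int  # Sync sent, master clock
    t2: int  # Sync received, slave clock
    t3: int  # Delay_Req sent, slave clock
    t4: int  # Delay_Req received, master clock


def offset_and_delay(x: PtpExchange) -> tuple[float, float]:
    """Return (slave offset from master, mean path delay) in ns.

    Both estimates assume the master->slave and slave->master delays are equal.
    """
    ms = x.t2 - x.t1  # true delay d_ms plus offset
    sm = x.t4 - x.t3  # true delay d_sm minus offset
    return (ms - sm) / 2, (ms + sm) / 2


def simulate(true_offset: int, d_ms: int, d_sm: int) -> PtpExchange:
    """Hypothetical helper: timestamps for a known offset and one-way delays.

    Slave clock = master clock + true_offset.
    """
    t1 = 0
    t2 = t1 + d_ms + true_offset
    t3 = t2 + 1_000  # slave replies 1 us later (arbitrary)
    t4 = t3 + d_sm - true_offset
    return PtpExchange(t1, t2, t3, t4)


if __name__ == "__main__":
    print(offset_and_delay(simulate(150, 500, 500)))  # (150.0, 500.0)
    # 400 ns of asymmetry: offset error of (500 - 900) / 2 = -200 ns.
    print(offset_and_delay(simulate(150, 500, 900)))  # (-50.0, 700.0)
```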

