Networking for AI Scaling (PDF, 19 pages)
Networking for AI Scaling
Ram Velaga, SVP/GM, Broadcom

The Network Is the Computer

AI Scale-up and Scale-out Networking
[Figure: scale-up within a rack; scale-out across racks through a leaf/spine fabric; and connectivity across data centers (Datacenter 1 and Datacenter 2, each with its own spine/leaf fabric).]

XPU Scale-up: High-Bandwidth Memory Sharing Across XPUs
[Figure: two XPUs, each with its attached HBM stacks.]
4 x HBM3E (9.6 Tbps each) = 38.4 Tbps
8 x HBM4 (12.8 Tbps each) = 102.4 Tbps
Key requirements: high networking bandwidth, efficient data transfer, reliable transport.

Ethernet Scale-up: High Performance, Open, Existing Specifications
[Figure: XPU A, XPU B and XPU C run different transports (OCP SUE Transport and other accelerator transports) on top of a shared Ethernet header, Ethernet data link and Ethernet PHY, which together form Ethernet for Scale-up Networking (ESUN).]
ESUN focus areas: L2/L3 framing, efficient headers, error recovery, lossless network.
Transports execute at their own pace, with freedom to innovate and implement push vs. pull memory access, the ordering model, load balancing and scheduling.

Ethernet for Scale-Up Networking (ESUN): Key Focus Areas & Contributions
- Memory semantics / RDMA transactions
- Congestion control (ECN, CBFC)
- Reliability (LLR, PFC)
- Efficiency (optimized headers, AFH)
86 members and growing.

Ethernet for Scale-up Networking: SAI/SONiC Workgroup

Tomahawk Ultra: Ultra-low Latency, High-performance and Reliable Ethernet

Ethernet Scale-up Networking: End-to-end Ultra-low Latency
[Figure: two endpoints, each with the stack XPU transaction interface → packing/mapping → transport → network header → Ethernet MAC → link → PHY, connected through Tomahawk Ultra; latency annotations: 150 ns (Tx+Rx) and 250 ns.]

70% power reduction

Open CPO Ecosystem: NTT

1M link-flap-free device hours*
*Source: Siamak Amiralizadeh, D. Alduino, W. Zhang, V. Lowalekar, G. Wang, N. Ge, R. Zhu, N. Hoang, J. Stever, J. Xu, J. Pruitt, A. John, S. Agrawal, F. Mercado, O. Moeller, D. Young. "CPO Technology Evaluation for Hyperscale Data Center Fabric Switches." 2025.

Industry's First 100T CP
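The aggregate memory bandwidth figures on the XPU scale-up slide are the per-stack bandwidth multiplied by the number of HBM stacks. The short Python check below reproduces that arithmetic and converts the totals to TB/s. It assumes bandwidth simply adds across stacks, and the helper name aggregate_hbm_tbps is hypothetical, used only for illustration.

```python
def aggregate_hbm_tbps(stacks: int, per_stack_tbps: float) -> float:
    """Total HBM bandwidth in Tbps: number of stacks x per-stack bandwidth.

    Assumption: per-stack bandwidth adds linearly across stacks.
    """
    return round(stacks * per_stack_tbps, 1)


# Per-stack figures as given on the slide (Tbps).
CONFIGS = [
    ("HBM3E", 4, 9.6),
    ("HBM4", 8, 12.8),
]

for name, stacks, per_stack in CONFIGS:
    total_tbps = aggregate_hbm_tbps(stacks, per_stack)
    total_tbytes = total_tbps / 8  # 8 bits per byte, so Tbps / 8 = TB/s
    print(f"{stacks} x {name} ({per_stack} Tbps) = {total_tbps} Tbps "
          f"= {total_tbytes} TB/s")

# Prints:
# 4 x HBM3E (9.6 Tbps) = 38.4 Tbps = 4.8 TB/s
# 8 x HBM4 (12.8 Tbps) = 102.4 Tbps = 12.8 TB/s
```

Moving from 4 x HBM3E to 8 x HBM4 raises per-XPU memory bandwidth by roughly 2.7x, which is the growth the scale-up network is expected to keep pace with when memory is shared across XPUs.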
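ESUN lists credit-based flow control (CBFC) under congestion control, alongside ECN (Explicit Congestion Notification), and pairs it with reliability mechanisms such as LLR (link-layer retry) and PFC (Priority Flow Control). The slides do not describe how ESUN specifies CBFC, so the following is only a minimal, generic sketch of the credit-based idea under assumed behaviour. The receiver grants one credit per free buffer slot, the sender transmits only while it holds credits, and credits return to the sender as the receiver drains its buffer. Because the sender never holds more credits than there are free slots, the receiver buffer cannot overflow, which is what makes the link lossless. The names Receiver, Sender and simulate are hypothetical and not part of any ESUN or Broadcom interface.

```python
from collections import deque


class Receiver:
    """Receiver with a fixed-size buffer (hypothetical model)."""

    def __init__(self, buffer_slots: int):
        self.buffer = deque()
        self.capacity = buffer_slots

    def accept(self, packet) -> None:
        # If the sender respects its credits, this check never fires.
        if len(self.buffer) >= self.capacity:
            raise OverflowError("receiver buffer overflow")
        self.buffer.append(packet)

    def drain(self, n: int) -> int:
        """Consume up to n packets; return how many buffer slots were freed."""
        freed = 0
        while self.buffer and freed < n:
            self.buffer.popleft()
            freed += 1
        return freed


class Sender:
    """Sender that may only transmit while it holds credits."""

    def __init__(self, initial_credits: int):
        self.credits = initial_credits
        self.pending = deque()

    def try_send(self, rx: Receiver) -> int:
        """Send queued packets while credits remain; return the number sent."""
        sent = 0
        while self.pending and self.credits > 0:
            rx.accept(self.pending.popleft())
            self.credits -= 1
            sent += 1
        return sent

    def return_credits(self, n: int) -> None:
        self.credits += n


def simulate(packets: int, buffer_slots: int, drain_per_tick: int) -> int:
    """Deliver `packets` packets; return the number of ticks it takes."""
    if drain_per_tick <= 0:
        raise ValueError("drain_per_tick must be positive")
    rx = Receiver(buffer_slots)
    tx = Sender(initial_credits=buffer_slots)  # one credit per free slot
    tx.pending.extend(range(packets))
    delivered, ticks = 0, 0
    while delivered < packets:
        tx.try_send(rx)
        freed = rx.drain(drain_per_tick)
        delivered += freed
        tx.return_credits(freed)  # models a credit-update message
        ticks += 1
    return ticks
```

For example, simulate(10, 4, 2) sends four packets on the first tick, then two per tick as credits come back, and finishes in five ticks without the receiver buffer ever exceeding four packets. A drop-based design would instead let the sender run ahead and rely on retransmission after loss.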