Zhiping Yao, Senior Director, Alibaba Group
Enabling Scalable Infrastructure: Alibaba's High-Performance Network 8.0 and 102T Ethernet Switch
Networking

High Performance Network of Alibaba Cloud
Topology-aware collective communication; flowlet and adaptive routing
High reliability: dual-uplink bonding; RDMA traffic uninterrupted on a single-link failure
Multi-tenant isolation; RDMA virtualization
End-to-end self-developed switch hardware/software and optical modules
HPN 7.0 (ACM SIGCOMM '24) is the bedrock of Alibaba's AI fabric.
Alibaba Cloud started deploying HPN 7.0, based on the 51.2T TH5, in 09/2023.

Openness paves the way to greatness
6.0 Fabric Switch: 25.6 Tbps, 128x 200G QSFP56
7.0 Fabric Switch: 51.2 Tbps, 128x 400G QSFP112
8.0 Fabric Switch: 102.4 Tbps, 128x 800G OSFP800 / 64x 1.6T OSFP1600
Access Switch: 8.0 Tbps, 24x 200G + 8x 400G
Optical modules: 400G/800G/1.6T
Move fast and build ourselves: customized in-house switches using merchant silicon such as TH5 and TH6; open platform UNP for multi-source and multi-ODM
Open-source collaboration: SONiC, SAI, FRR, gRPC/gNMI, OpenBMC, etc.
Disaggregation: network hardware and software disaggregation with an open NOS such as SONiC; merchant-silicon whitebox switches and routers; pluggable optics and MSA
Cost-efficient: competition will accelerate cost optimization; a pay-as-you-go model provides flexibility

HPN 8.0: era of training and inference convergence
[Figure: Training and Inference Converged Architecture. Back-end cluster HPN 8.0 (10k / xxxK cluster); HPN PODs (13K / 100K); back-end clusters HPN 7.x; DSW tiers; front-end DSW with GPU head nodes and CPFS/GPFS; VPC, OSS, CPFS; DC cluster at AZ scale; DC Core; inter-DC RDMA]
Key building block upgrades: 102.4T switch, 800G NIC, 800G/1.6T optics
Mu
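The switch figures quoted on these slides follow from simple arithmetic: nominal aggregate capacity is port count times per-port speed. The Python sketch below reproduces each quoted number from its port configuration and shows that each fabric generation doubles capacity (25.6T, 51.2T, 102.4T). The dictionary `SWITCH_PORT_CONFIGS` and the function `aggregate_gbps` are hypothetical names chosen for illustration, not anything from the talk. The sketch also assumes nominal line rates, so it says nothing about usable throughput after encoding and protocol overhead.

```python
"""Sanity-check aggregate switch capacity from the port configurations on the slides."""

# Hypothetical illustration data: (port count, per-port speed in Gbps),
# transcribed from the slide specs. Names are illustrative only.
SWITCH_PORT_CONFIGS = {
    "6.0 Fabric Switch, 128x 200G QSFP56": [(128, 200)],
    "7.0 Fabric Switch, 128x 400G QSFP112": [(128, 400)],
    "8.0 Fabric Switch, 128x 800G OSFP800": [(128, 800)],
    "8.0 Fabric Switch, 64x 1.6T OSFP1600": [(64, 1600)],
    "Access Switch, 24x 200G + 8x 400G": [(24, 200), (8, 400)],
}


def aggregate_gbps(ports):
    """Return total nominal capacity in Gbps: sum of port count * per-port speed."""
    return sum(count * speed_gbps for count, speed_gbps in ports)


if __name__ == "__main__":
    for name, ports in SWITCH_PORT_CONFIGS.items():
        # Keep integer Gbps to avoid float rounding; convert to Tbps only for display.
        print(f"{name}: {aggregate_gbps(ports) / 1000:.1f} Tbps")
```

Running it prints 25.6, 51.2, 102.4, 102.4 and 8.0 Tbps, matching the slides. Both 8.0 fabric switch variants reach 102.4 Tbps: the 64x 1.6T option trades port count for per-port speed.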