AI Networking Defines the AI Super Factory
AI Fabric Unlocks The Full Potential Of AI Factory
NVIDIA Network
Qingchun Song

The Data Center Is The Computer, The Network Defines The Data Center
[Diagram: SuperNIC; InfiniBand/Ethernet; Spectrum-X Ethernet AI Switch; Quantum-X InfiniBand Switch; BlueField-3 DPU; Secured User Access; Storage/Management; NVLink Switch; Compute Scale-Up; Compute Scale-Out]

Spectrum-X Ethernet Brings High Performance to Ethernet For Training
OTS Ethernet (Hyperscale Clouds) vs. Spectrum-X Ethernet (AI Factories):
TCP (Low Bandwidth Flows and Utilization) vs. RoCE (High Bandwidth Flows and Utilization)
High Jitter Tolerance vs. Low Jitter Tolerance (Long Tail Kills Performance)
Heterogeneous Traffic, Average Multi-Pathing vs. Bursty Network Capacity, Predictable Performance
Loosely Coupled Applications vs. Distributed Tightly-Coupled Processing

The New Inference Architecture: Distributed Inference
Spectrum-X Ethernet Expands To Inference
Diagram (2024 Inference to 2025 Inference):
Single GPU; scale means n x single GPU
Scales to 100s of GPUs (Spectrum-X Ethernet East-West)
Prefill/Decode using the same GPU
KVCache Disaggregation (Spectrum-X Ethernet East-West)
"Thinking models" (Spectrum-X Ethernet Storage)
Single agents
Multi-agents (Spectrum-X Ethernet Storage)
Multi-turn, Global KVCache (Spectrum-X Ethernet Storage)
Application Compromises
"Knowledge-models"

Spectrum-X Ethernet: Low Jitter Communications for AI
Switch-To-SuperNIC, End-To-End Network Processing, Bringing High Performance To Ethernet
[Figure: Number of Operations vs. Time (msec), Spectrum-X Ethernet compared with OTS Ethernet]

Ultra-High-Speed Traffic Monitoring
Distribute Data Across All Switch Ports, Ignoring Data Ordering (Spectrum-X Ethernet Switch)
Reordering: Receive Data and Place It Back in Order (Spectrum-X Ethernet SuperNIC)
Schedule Data