Public
RecoNIC: RDMA-enabled Compute Offloading on FPGA-based SmartNIC
Guanwen (Henry) Zhong, Senior Researcher
AMD Research and Advanced Development
2024 OFA Virtual Workshop

2 | Public
ML model size and GPU performance over the past 10 years
  ML model size over 10 years: 8600x
    Exponential growth from 61M in 2012 to 530B in 2021
  AMD GPU performance over 10 years: 50x
  ML model size has outpaced the growth in single-GPU performance over the past 10 years

3 | Public
Ethernet speeds over the past 40 years
  ML model size over 10 years: 8600x
  AMD GPU performance over 10 years: 50x
  Ethernet speed over 10 years: 10x
    Significantly slower than GPU advancement and ML model size growth
  Emergence of scale-out architectures
    A sea of heterogeneous nodes connected via the high-speed network
  *Source: Ethernet Roadmap 2023 by Ethernet Alliance

4 | Public
Emergence of scale-out architectures
  A sea of nodes connected via a high-speed, low-latency network interconnect
  Heterogeneity within a node
    CPUs, FPGAs, GPUs, ASICs (such as TPUs), SmartNICs
  SmartNIC acts as an intermediate hub for various components
    Regular "NIC" functions: protocol handling, vSwitch, crypto, ...
    Value-add "NIC" functions: TOE, RDMA, security, telemetry, ...
    Upper-layer processing: transport layer and above, accelerating streaming and lookaside applications
  High-speed and low-latency networking: RDMA

5 | Public
Data communication in scale-out setups
  Traditional way: incurs multiple data copies
  Programmable SmartNIC-enabled system: zero copy
    1. Enable direct memory access among peers
    2. Bring data as close to compute as possible

6 | Public
What kind of programmable SmartNIC features do we need in a scale-out system?
  Normal network packets
    TCP, UDP, DCCP, SCTP, QUIC, ...
  Remote direct memory access (RDMA)
    RoCEv2 packets
    Shared by host, GPU and FPGA
    Bring data as close to accelerators as possible for fast and adap