Short-reach Optical Interface (SROI)
Scale Connectivity for Optimized Data-movement Through Memory for AI/ML Applications
Siamak Tavallaei, Sr. Principal Engineer, Samsung Semiconductor, Inc.
FTI Workshop: Short Reach Optical Interconnects

Outline
  Trending requirement and building blocks to help
  Stargate Challenge
  How a tool may guide the focus on architectural decisions
  How short-reach optics may help
  Call-to-action for the SROI team

Baseline Server Node
  SRAM/Cache: T0
  CPU-Mem: T1
  Local Node Storage: T3
  Storage on DC Network: T4
  Legend:
    M: Local DDRx Memory
    C: CPU
    S: NVMe/PCIe SSD Storage
    N: NIC

AI/ML-optimized Memory Hierarchy
  SRAM/Cache: T0
  GPU-HBM: T1
  CPU-Mem (+CXL): T2 (T2+)
  Storage on SO: T3-SO
  GPU-HBM-SU: T1-SU
  CPU-Mem-SU (+CXL): T2 (T2+)-SU
  Storage: T3
  Storage on SU: T3-SU
  Storage on DC Network: T4
  AI Infrastructure Memory SO: T2-SO
  Scale-up (SU)
  Scale-out (SO)
  Data-movement is through memory
  Physically Disaggregate
  Logically Compose

Traditional scale-up fabrics couple CPUs to build large symmetric multi-processing (SMP) systems
  Run large, parallel processing workloads under one OS with efficient protocols for Load/Store
  Memory/cache-coherence in hardware with low latency for small payloads
  CXL fulfills these requirements (Major CPU manufacturers' Root Ports)
Emerging scale-up fabrics for AI/ML natively couple xPUs to switches
  Use the same protocol as native xPUs for the distributed processing paradigm
  Do not require the hardware-based cache-coherence or the low-latency features
  Require high-throughput interconnects for emerging software to move data ahead of use
Hybrid fabric switches emerge to "bridge" between native scale-up fabric protocols
  Bridge xLink used by xPUs and the standard CXL protocol
  Scale-up disaggregated memory pooling and sharing

Stretched Pyramid
  Increase Capacity at each Tier
  SRAM/Cache: T0
  GPU-HBM: T1
  CPU-Mem(+CXL)T2(