Introduction to Co-Packaged Optics (CPO)
Architecture, Use Cases, and Operational, Software Implications
Wataru Ishida
NTT Innovative Devices
CHIPLETS AND ADVANCED PACKAGING/PHOTONICS

Why NVLink Uses Copper, Not Optics
Power Consumption
“If we had to use optics, we would have had to use transceivers and retimer and those transceivers and retimer alone would have cost 20,000 watts” - Jensen Huang, GTC 2024
Reliability
“co-packaged optics, a promising new chip technology designed to reduce energy consumption, is not yet reliable enough for deployment in the company's flagship graphics processing units (GPUs).” - Jensen Huang, GTC 2025

Scale-up (NVLink/SUE) vs Scale-out (IB/Ethernet) network
[Diagram: groups of GPUs and NICs, each group joined by a scale-up network over copper, with the groups connected through a scale-out network]
Bandwidth: 7.2 Tbps (NVLink Gen5) vs 800 Gbps (ConnectX-8)
Massive bandwidth needed for scale-up
Scalability: 72 vs more than 10k
Scale-up network size is limited by 1.5 m copper reach

In Synchronous Scale-Up, One Link Down Halts All
[Figure: Impact of Transceiver Failures (FIT 1000) on GPU training. Axes: GPU Count (0 to 20,000) and Rollback Overhead (%) (0 to 25)]
Scale-up networks use tightly synchronized collective communications (e.g., AllReduce)
Even one link failure breaks the operation; there is no failover or retry
This causes costly rollbacks in large-scale GPU training

Why Optical Modules Fail: Dust and Laser
Environmental contamination (dust, debris) is the leading cause of transceiver failures
The next major cause is internal laser failures

Laser Lifetime Is All About Temperature
[Figure: Estimated FIT vs Temperature (Arrhenius Model, normalized at 40 °C = 100 FIT). Axes: Temperature (°C) (0 to 120) and Estimated FIT (0 to 6000)]

Retimers, Heat, and the Limits of LPO at 200G/lane
Retimer