Resolving Operational Barriers to AI Scale-Up with Co-Packaged Copper/Optics Sockets
Peter Winzer

How Much I/O Does AI Need?

[Figure: roofline plot. Horizontal axis: operational intensity (Flops/Byte), 0.1 to 1000, captioned "How data-hungry is your algorithm", with the labels "Very hungry: simple calculations" and "Not so hungry: complex calculations". Vertical axis: system performance (Flops/s), 100 G to 100 E. Annotation: "XPUs could use orders of magnitude more I/O BW".]

Sources:
[1] S. Williams, A. Waterman, and D. Patterson, “Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures,” Communications of the ACM, 52(4), 65-76 (2009).
[2] N. P. Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit™,” Proc. 44th Annual Int. Symp. on Computer Architecture, 1-12 (2017).
[3] H. Ltaief et al., “Scaling the ‘Memory Wall’ for Multi-Dimensional Seismic Processing with Algebraic Compression on Cerebras CS-2 Systems,” ACM/IEEE Int. Conf. High Performance Computing, Networking, Storage, and Analysis (SC23) (2023).
[4] Nvidia NVL72; online: https:/

How Much I/O Does AI Need?

[Figure: the same roofline plot with data points added for Fugaku, Condor Galaxy, Google TPUv1, Nvidia K80 GPU, and Intel Haswell CPU, plus a region label of which only "limited" survives.]

Sources:
[1] S. Williams, A. Waterman, and D. Patterson, “Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures,” Communications of the ACM, 52(4), 65-76 (2009).
[2] N. P. Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit™,” Proc. 44th Annual Int. Symp. on Computer Architecture, 1-12 (2017).
[3] H. Ltaief et al., “Scaling the “Memory Wall”