Ian Buck, VP of HPC and Hyperscale, NVIDIA
Shaping the Future of Open Infrastructure for AI

Giga-Scale AI is Transforming Data Centers
Driving extreme co-design from chip to grid with open collaboration

NVIDIA Giga-Scale Reference Designs
Areas: Power, Cooling, Networking, Compute, Mechanical
Highlights: Scale-Up, Spectrum-X Ethernet, Open Collaboration, CPX, Power Smoothing, 45°C Liquid Cooling, MGX

Blackwell Optimizations Achieve 5X Throughput in 2 Months
Multi-fold reduction in token costs
Chart: GPT-OSS-120B throughput (TPS per GPU, 0 to 60,000) vs. interactivity (TPS per user, 0 to 600), from the GPT-OSS launch through InferenceMAX and TensorRT-LLM + speculative decoding optimizations.
Cost per million tokens: $0.11 at the GPT-OSS launch (Aug 2025), $0.02 today
Annotation: 5x throughput in 2 months, 30,000 TPS/GPU

Extreme Hardware-Software Co-Design for Inference Performance
A $5M GB200 NVL72 investment can generate $75M in token revenue
Chart: cost breakdown (non-GPU costs, GPU costs, profit) for H200 NVL8 and GB200 NVL72.
Chart: measured DeepSeek-R1 throughput (TPS per GPU, 0 to 12,500) vs. interactivity (TPS per user), H200 vs. GB200, showing a 15x gain; annotated with NVL72, FP4, Dynamo, TRT-LLM, TRT Model Optimizer, and CUDA Graphs.
AI Factory ROI: $5M cost, $75M revenue
Revenue estimates assume 3-year operation on 72 GPUs at 50 TPS/user with DeepSeek-R1 and $1.45/M token cost, based on InferenceMAX results and the SemiAnalysis TCO model; actual ROI may vary.

Inference Complexity is Exploding
More parameters, experts, reasoning, kernels & shapes, and context
Chart: inference complexity over time (2018, 2023, 2024, 2025), progressing through Dense Transformers, Dense LLMs, Mixture of Experts, and Massive Context (video generation, software application development).
Models shown: BERT, Llama3, DS-R1, GPT OSS, Kimi K2, Llama4, Qwen3, Cosmos, Gemini, LTM-2-mini, Sora2
Annotations: 1 expert and 10K kernels/shapes; 300+ experts and 10M kernels/shapes; 1M+ context tokens (2,000x vs. BERT)

Next Generation Vera Rubin for Giga-Scale AI
OCP MGX-compatible infrastructure
Vera Rubin NVL144, Vera Rubin CPX
Compute, Memory Bandwidth, NVL
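The AI Factory ROI footnote can be sanity-checked with simple arithmetic. The sketch below backs out the sustained per-GPU throughput that the $75M revenue figure implies for 72 GPUs over 3 years at $1.45 per million tokens. It rests on assumptions the slide does not state: 365-day years, 100% utilization, and treating the $1.45/M figure as revenue earned per million tokens. The function names are hypothetical and only for illustration.

```python
# Back-of-envelope check of the "AI Factory ROI" footnote.
# Hypothetical assumptions (not stated on the slide): 365-day years,
# 100% utilization, and $1.45 earned per million generated tokens.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def implied_tps_per_gpu(revenue_usd: float, usd_per_million_tokens: float,
                        years: float, num_gpus: int) -> float:
    """Sustained tokens/s per GPU needed to earn `revenue_usd` over `years`."""
    total_tokens = revenue_usd / usd_per_million_tokens * 1_000_000
    total_seconds = years * SECONDS_PER_YEAR
    return total_tokens / total_seconds / num_gpus


def revenue_usd(tps_per_gpu: float, usd_per_million_tokens: float,
                years: float, num_gpus: int) -> float:
    """Token revenue for a given sustained per-GPU throughput."""
    total_tokens = tps_per_gpu * num_gpus * years * SECONDS_PER_YEAR
    return total_tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    tps = implied_tps_per_gpu(75e6, 1.45, years=3, num_gpus=72)
    print(f"implied throughput: {tps:,.0f} TPS per GPU")
    print(f"revenue / investment: {75e6 / 5e6:.0f}x")
```

Under these assumptions the $75M estimate corresponds to roughly 7,600 TPS per GPU sustained for three years, which falls within the throughput range plotted on the DeepSeek-R1 chart. Lower utilization would require proportionally higher throughput to reach the same revenue.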