Liquid Cooling Solutions for Large-Scale AI Clusters.pdf

Uploaded by: 明****

ID: 1011550

2025-12-21

PDF, 15 pages, 3.69 MB


Liquid Cooling Solutions for Large-Scale AI Clusters
Daniel Kapesa, Product Manager, Supermicro
AI CLUSTERS

Outline
1. AI Cluster Workloads Challenges
2. Liquid Cooling Fundamentals
3. Facility-Level Heat Rejection
4. Actionable Strategies for Deployment
5. Call to Action

AI Cluster Workloads Challenges

xPUs Power Trend
- AI power demand requires new power delivery and cooling approaches

AI vs Compute Power
(Figure label: Rubin NVL576)

Thermal Challenges in AI Clusters
- AI workloads generate unprecedented heat densities (multi-kilowatt GPUs per node).
- Managing heat efficiently is critical to maintaining performance and reliability.
- Traditional air cooling faces physical and efficiency limits at scale.
(Diagram labels: Design, Power, Thermals)

Liquid Cooling Fundamentals
- Direct liquid cooling removes heat at the source (cold plates on CPUs/GPUs)
- Higher heat transfer efficiency than air cooling
- Key parameters: coolant temperature, flow rate, pressure, redundancy

Leveraging OCP Collaboration
- Additional cold plates remove 90%+ of system heat
- Additional cold plates cover DIMMs, VRMs, PCIe, PSUs

System Architecture Overview
- Modular building blocks for scalable liquid-cooled AI clusters
- Modular components: cold plates, coolant distribution units (CDUs), manifolds
- Scalability and serviceability considerations for hyperscale deployments
- Importance of balanced coolant flow and temperature control

Vertical CDMs
- Increased server density per rack
- Enhanced serviceability and maintenance
- Front I/O for cold aisle access
- Front NIC cabling; rear liquid cooling and power cables

Integration Challenges
- Mechanical and fluidic interface complexity in dense racks
- Leak prevention and maintenance accessibility
- Monitoring coolant quality, temperature, and flow in real time

Rack-Level Leakage Mechanism
- Factory-tested hose kits with pre-installed seals
- Leak detection sensor
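The slides list coolant temperature and flow rate among the key parameters but give no numbers. As a rough illustration of how the two relate, the sketch below applies the standard heat balance Q = m_dot * c_p * dT to estimate the coolant flow a rack would need. Everything numeric here is an assumption rather than a figure from the presentation: the function name `required_flow_lpm` is hypothetical, the fluid properties are approximate values for plain water (real loops often use water/glycol mixes), and the 100 kW rack and 10 K temperature rise are example inputs. Only the default capture fraction of 0.9 echoes the "90%+ of system heat" figure on the cold plate slide.

```python
# Approximate properties of plain water near 30 degC (assumed, not from the slides).
WATER_CP_J_PER_KG_K = 4180.0
WATER_DENSITY_KG_PER_M3 = 996.0


def required_flow_lpm(heat_load_w, delta_t_k, capture_fraction=0.9,
                      cp=WATER_CP_J_PER_KG_K,
                      density=WATER_DENSITY_KG_PER_M3):
    """Estimate the coolant flow (litres per minute) for a given heat load.

    heat_load_w      -- total electrical/thermal load of the rack in watts
    delta_t_k        -- allowed coolant temperature rise across the rack in kelvin
    capture_fraction -- share of the heat that ends up in the liquid loop
    """
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    if not 0 < capture_fraction <= 1:
        raise ValueError("capture_fraction must be in (0, 1]")

    liquid_heat_w = heat_load_w * capture_fraction      # heat carried by the coolant
    mass_flow_kg_s = liquid_heat_w / (cp * delta_t_k)   # Q = m_dot * c_p * dT
    volume_flow_m3_s = mass_flow_kg_s / density
    return volume_flow_m3_s * 1000.0 * 60.0             # m^3/s to L/min


if __name__ == "__main__":
    # Hypothetical 100 kW rack with a 10 K coolant temperature rise.
    print(f"{required_flow_lpm(100_000, 10):.1f} L/min")
```

Under these assumptions the example rack needs roughly 130 L/min. Halving the allowed temperature rise doubles the required flow, which is one reason the two parameters have to be chosen together.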
