《HC2022.NVIDIA Grace.JonathonEvans.v5.pdf》由会员分享,可在线阅读,更多相关《HC2022.NVIDIA Grace.JonathonEvans.v5.pdf(20页珍藏版)》请在三个皮匠报告上搜索。
1、NVIDIA GRACEJONATHON EVANS-NVIDIA|HOT CHIPS 34NVIDIA GRACE NVIDIAs First Server CPU 72 Arm v9.0 cores SVE2 support Virtualization Extensions:Nested Virtualization,S-EL2 support RAS v1.1 GIC v4.1 SMMU v3.1 Built on TSMC 4N process nodeDatacenter ReadyNVIDIA GRACEDesigned from the Ground-Up to be a Su
2、perchipNVLINK-C2C Used to create the Grace Hopper,and Grace Superchips Removes the typical cross-socket bottlenecks Up to 900GB/s of raw bidirectional BW Same BW as GPU to GPU NVLINK on Hopper Low power interface-1.3 pJ/bit More than 5x more power efficient than PCIe Enables coherency for both Grace
3、 and Grace Hopper superchipsHigh Speed Chip to Chip Interconnect900 GB/sCPULPDDR5xCPULPDDR5xGRACECPUCPULPDDR5xCPULPDDR5xGRACECPUGRACE SUPERCHIP Targets Arm standards for off the shelf OS compatibility Arm Server Base System Architecture(SBSA)Arm Server Base Boot Requirements(SBBR)Arm Memory Partitio
4、ning and Monitoring(MPAM)Arm Performance Monitoring Units(PMUs)Standards Compliant Platform900 GB/sCPULPDDR5xCPULPDDR5xGRACECPUCPULPDDR5xCPULPDDR5xGRACECPUGRACE HOPPER Unified Memory with shared page tables Shared CPU and GPU virtual address space GPU access to pageable memory System allocator suppo
5、rt for GPU memory Yes,malloced and mmaped pointers!Native atomics,including standard C+atomic supportHeterogenous Coherency900 GB/sCPULPDDR5xGPUHBMHOPPERGPUGPUHBMCPULPDDR5xGRACECPUCPULPDDR5xGPUHBMGPUHBMCPULPDDR5xHOPPERGPUGRACECPUNVLINK-C2CSuperchip Scaling|CPU/GPU|Extended GPU MemoryEnables remote N
6、VLINK connected GPUs,to access Graces memory at native NVLINK speedsNVSWITCHGPUHBMCPULPDDR5xGRACECPUCPULPDDR5xGPUHBMHOPPERGPUNVIDIA GRACE NVIDIA fabric and distributed cache design 3,225.6 GB/s Bi-section BW Scalable to 72+cores 117MB of L3 cache Arm Memory Partitioning and Monitoring(MPAM)Supports