Enabling Macroheterogeneity through a “System-of-Systems” Approach

Samantika Sury, Larry Kaplan, Matt Turner, Eric Borch, Bob Wisniewski, Aalap Tripathy, Tushar Krishna*
Hewlett Packard Enterprise
*Georgia Institute of Technology

SERVER: AI HW SW CO-DESIGN / NIC / HPC

Macroheterogeneity enables building a system-of-systems

Data center macroheterogeneity
- Contains many loosely-coupled systems
- HPC, AI, general cloud, private cloud, and I/O as separate components with estate-like fabric across data center
- Cloud-like technologies for multi-tenancy, virtualization, and containerization

System macroheterogeneity
- Single system with main compute partition and specialized partitions tightly connected by a high-performance data and network fabric
- Couple mod-sim with AI and/or quantum accelerators to enable workflows and provide a pathway to extensible and modular systems
- High-performance fabric as a unified network across system partitions

Macroheterogeneity

[Figure labels: Main system of many nodes; Specialized accelerator(s); CPU partition for non-accelerated codes; Data partition; UEC-based network connects partitions; System; Private/Science Cloud; Data center]

Potential future system architecture framework for Hybrid HPC: HPC + AI + Quantum
- SW and HW defined partitions that work together as one system
- Provide a way to integrate new architecture at scale or run workflow across the system
- Tight coupling of partitions so data can be shared without needing to write to disk
- Shared system, network (e.g. HPE Slingshot), and resource management
- Challenge is to enable a tightly-coupled system of systems with high utilization

General use case categories (details later):
- Inner kernels being replaced by surrogates with inference queries within mod/sim
- AI steering of mod/sim: often in an ensemble environment; need to couple and sync a large number of jobs
- Use mod/sim to ge