
AI: Pushing Infra Boundaries - Memory Is a Key Factor.pdf

Uploaded by: a****d | ID: 185017 | 2024-10-07 | 20 pages | 1.60 MB

Slide 1 | 2024 SNIA. All Rights Reserved.
AI: Pushing Infra boundaries
Memory is a key factor
Presented by Manoj Wadekar, Meta
AI Systems Technologist

Slide 2
Meta Community Statistics
Approximately 3.98B people use at least one of Meta services monthly
Family Daily active users: 3.19B
Ref: Meta 4Q23 Results

Slide 3
AI Use Cases at Meta
Ranking and Recommendations
  Personalized Recommendations
  Deep Learning Recommendation Models (DLRM)
  Training and Inference
Generative AI: Large Language Models and more
  Llama2
  Open access to LLMs for research and commercial use
  Training and Inference (Prefill and Decode)

Slide 4
AI Challenging DC Infra

Slide 5
AI needs for DC Infra
CPU-centric Scale-out applications
  Millions of small stateless applications
  Failure handling through redundancy
  Scale performance through large number of nodes
Accelerator-centric AI Apps
  AI job spread across 1000s of GPUs
  Failure penalty of large job restart
  Performance scaling depends on all the components in the cluster (GPU/Accel, memory, network)

Slide 6
AI Jobs: Scaling the performance
[Figure: GPU grid (labels GPU0, GPU4, GPU8, GPU12, GPU16, GPU20, GPU24, GPU28) grouped by Pipeline Parallel, Tensor/Context Parallel, and Data Parallel]

Slide 7
AI Jobs: Scaling the performance
[Figure: the same GPU grid, with Scale-Out Network (High Bandwidth) and Scale-Up Network (Highest Bandwidth, lowest latency)]

Slide 8
Diversity of AI system requirements
  Difficult to serve all classes of models with a single system design point
  AI use cases are pushing all the design points through software/hardware co-design
  Need for innovation in all the design points: compute, network, memory, packaging, connectivity, coolin
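
The "AI Jobs: Scaling the performance" slides name three parallelism dimensions (pipeline, tensor/context, data) without showing how GPUs are assigned to them. The following is a minimal sketch of one possible assignment, not the layout Meta uses: the group sizes (tp=4, pp=2, dp=4) and the ordering with the tensor dimension varying fastest are assumptions made for illustration, and `rank_to_coord` and `tensor_group` are hypothetical helper names.

```python
from typing import List, NamedTuple


class Coord(NamedTuple):
    data: int      # data-parallel replica index
    pipeline: int  # pipeline stage index
    tensor: int    # tensor/context-parallel shard index


def rank_to_coord(rank: int, tp: int, pp: int, dp: int) -> Coord:
    """Split a global GPU rank into (data, pipeline, tensor) coordinates.

    Assumed ordering: tensor varies fastest, then pipeline, then data,
    so the GPUs of one tensor group have consecutive ranks.
    """
    world_size = tp * pp * dp
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} outside world of {world_size} GPUs")
    tensor = rank % tp
    pipeline = (rank // tp) % pp
    data = rank // (tp * pp)
    return Coord(data, pipeline, tensor)


def tensor_group(rank: int, tp: int) -> List[int]:
    """Ranks that share a tensor/context-parallel group with `rank`."""
    base = rank - rank % tp
    return list(range(base, base + tp))


if __name__ == "__main__":
    # Illustrative sizes only: 4 * 2 * 4 = 32 GPUs.
    for r in (0, 13, 31):
        print(r, rank_to_coord(r, tp=4, pp=2, dp=4), tensor_group(r, tp=4))
```

Keeping a tensor group on consecutive ranks is a natural fit when those GPUs share the scale-up network, which the slides describe as having the highest bandwidth and lowest latency; the other two dimensions then communicate over the scale-out network.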

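This presentation covers Meta's recent work on how AI is pushing infrastructure boundaries. Main points:
1. About 3.98 billion people use at least one Meta service each month, and the family of apps has 3.19 billion daily active users.
2. The challenges AI poses for data center infrastructure, contrasting CPU-centric scale-out applications with accelerator-centric AI applications.
3. How AI job performance scales: a job is spread across many GPUs using pipeline, tensor/context, and data parallelism, connected by scale-up and scale-out networks.
4. Growing memory demand: larger models drive up memory capacity and bandwidth requirements.
5. Memory technology innovation, such as integrated memory innovations and hierarchical memory designs.
6. Memory expansion options, including node-local memory, memory expansion cards, and second-tier memory attached through the host CPU's memory controller.
The presentation calls for innovation in memory technology, system architecture, software/hardware co-design, and interconnects to keep up with the boundaries AI keeps pushing.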
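
The link between model size and memory capacity can be made concrete with a rough calculation. This is a minimal sketch under stated assumptions, not a figure from the presentation: it counts weights only (no activations, optimizer state, or KV cache), assumes 2 bytes per parameter (16-bit weights), and uses a hypothetical 70-billion-parameter model and a hypothetical 80 GB accelerator. The function names are illustrative.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for model weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_devices_for_weights(num_params: float,
                            device_memory_gb: float,
                            bytes_per_param: int = 2) -> int:
    """Smallest number of devices whose combined memory holds the weights."""
    needed = weight_memory_gb(num_params, bytes_per_param)
    return math.ceil(needed / device_memory_gb)


if __name__ == "__main__":
    params = 70e9  # hypothetical model size
    print(weight_memory_gb(params))             # weights at 2 bytes/param
    print(min_devices_for_weights(params, 80))  # hypothetical 80 GB device
```

Because real jobs also need memory for activations and other state, this weights-only estimate is a lower bound on the capacity required.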