
Scale Out Batch Inference with Ray.pdf

Uploaded by: 竿*** | ID: 981499 | 2025-11-29 | PDF | 36 pages | 3.19 MB

This report belongs to the collection: QCon San Francisco 2024 speaker presentation slides


Scale Out Batch Inference with Ray
Cody Yu, Staff Software Engineer

Hello
Cody Yu
- Tech Lead, LLM Performance, Anyscale
- vLLM / SGLang / Apache TVM committer
- ex-Founding Engineer, BosonAI
- ex-Senior Applied Scientist, AWS AI
- PhD, Computer Science, UCLA 19

We are in a GenAI era
(Images generated by OpenAI GPT-4o)

Batch inference is getting high demand
- Multi-modality data sources: camera, mic, PDF, sensor, tabular, text, audio, image, video
- Structured and unstructured multi-modality data
- Embedding models, large language models
- Vector DB, model training, classification, knowledge retrieval
- Pipeline stages: Read (CPU), Pre-process (CPU), Model (GPU), Post-process (CPU)
- Cloud storage

Challenges with Batch Inference
- Scale: Large data scale (100s of GBs, TBs, or more)
- Reliability: Spot + on-demand instances
- Compute: Multi-stage + heterogeneous compute
- Flexibility: Bring any OSS model & customize
- SLAs: Focus on high throughput / low cost vs. low latency

Multi Layer Approach
- Ray Core: A scalable AI compute engine
- Ray Data: An efficient and scalable data processing pipeline on Ray
- LLM inference engine powered by open source vLLM: The most popular open source LLM inference framework

Ray: Scalable AI Compute Engine

Ray Overview
- Ray (Distributed) Libraries:
  - Ray Tune/Train: Training
  - RLlib: Reinforcement learning
  - Ray Serve: Online inference
  - Ray Data: Data processing
- Ray (Core): A general-purpose distributed execution layer
  - Remote functions (tasks) and classes (actors)

Pushing Scalability to 1000s of Nodes
[Diagram: a head node (driver, workers, raylet, Global Control Service (GCS), dashboard server) and worker nodes (workers, raylet)]
Global Control Service (GCS), used for:
- Actor scheduling
- Placement group scheduling
- Node resource views

[The preview text ends here.]
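The four pipeline stages named in the slides, Read (CPU), Pre-process (CPU), Model (GPU), and Post-process (CPU), can be illustrated with a minimal single-process sketch in plain Python. This is not Ray Data's API or the speaker's code: the function names (`read`, `preprocess`, `model`, `postprocess`, `run_pipeline`) are hypothetical, and the "model" is a placeholder that counts words instead of calling a real LLM. It only shows how records flow through the stages one batch at a time.

```python
from typing import Dict, Iterable, Iterator, List

Batch = List[str]


def read(records: Iterable[str], batch_size: int) -> Iterator[Batch]:
    """Read stage (CPU): group raw records into fixed-size batches."""
    batch: Batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def preprocess(batch: Batch) -> Batch:
    """Pre-process stage (CPU): trim whitespace and lowercase the text."""
    return [text.strip().lower() for text in batch]


def model(batch: Batch) -> List[int]:
    """Model stage (GPU in the talk): placeholder that counts words.

    Assumption: a real pipeline would run an LLM or embedding model here.
    """
    return [len(text.split()) for text in batch]


def postprocess(batch: Batch, outputs: List[int]) -> List[Dict[str, object]]:
    """Post-process stage (CPU): pair each input with its model output."""
    return [{"text": t, "num_words": n} for t, n in zip(batch, outputs)]


def run_pipeline(records: Iterable[str], batch_size: int = 2) -> List[Dict[str, object]]:
    """Stream batches through all four stages and collect the results."""
    results: List[Dict[str, object]] = []
    for raw in read(records, batch_size):
        clean = preprocess(raw)
        results.extend(postprocess(clean, model(clean)))
    return results
```

Because `read` is a generator, only one batch is held in memory at a time, which is the property that matters once the input grows to the "100s of GBs, TBs, or more" scale mentioned in the slides. In a distributed setting each stage would run on its own pool of CPU or GPU workers rather than in one loop.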


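Based on the report, the main content is summarized as follows:
- **Growing demand for large-scale batch inference**: With the rise of multi-modality data sources and large language models (LLMs), demand for batch inference keeps increasing.
- **Batch inference challenges**: Large data scale, reliability, heterogeneous compute, flexibility, and SLA requirements.
- **Ray platform**: Provides a scalable AI compute engine and data processing pipeline that support large-scale batch inference.
- **Ray Data**: Optimizes data pre-processing, reduces GPU idle time, and improves resource utilization.
- **vLLM**: The most popular LLM inference framework, supporting high throughput.
- **RayLLM-Batch**: An Anyscale library for large-scale, cost-optimized LLM batch inference.
- **Case study**: Shows the performance and cost efficiency of RayLLM-Batch when processing large datasets.
- **Key points**:
  - Supports thousands of nodes.
  - Improves LLM batch inference throughput.
  - Lowers cost.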
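The summary's point about reducing GPU idle time can be illustrated with a generic producer/consumer sketch: CPU pre-processing runs in a background thread and fills a small bounded buffer, so the model stage finds the next batch already prepared. This is an analogy in plain Python, not how Ray Data is implemented; `prefetch`, `infer_all`, and the `depth` parameter are hypothetical names chosen for the example.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List

_DONE = object()  # sentinel marking the end of the stream


def prefetch(
    batches: Iterable[List[str]],
    preprocess: Callable[[List[str]], List[str]],
    depth: int = 2,
) -> Iterator[List[str]]:
    """Pre-process batches in a background thread, up to `depth` ahead."""
    buffer: queue.Queue = queue.Queue(maxsize=depth)

    def producer() -> None:
        try:
            for batch in batches:
                # Blocks when the buffer is full, which bounds memory use.
                buffer.put(preprocess(batch))
        finally:
            # Always signal completion so the consumer never waits forever.
            buffer.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            return
        yield item


def infer_all(
    batches: Iterable[List[str]],
    preprocess: Callable[[List[str]], List[str]],
    model: Callable[[List[str]], List[int]],
) -> List[int]:
    """Run the model stage on batches that were prepared ahead of time."""
    outputs: List[int] = []
    for ready in prefetch(batches, preprocess):
        outputs.extend(model(ready))
    return outputs
```

The bounded queue keeps the order of batches and applies backpressure: pre-processing cannot run arbitrarily far ahead of the model. How much real overlap this gives in a single Python process depends on whether the model call releases the interpreter lock, as accelerator-backed libraries typically do, which is one reason distributed frameworks place the stages in separate worker processes.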
