Scale Out Batch Inference with Ray.pdf

Uploaded by: 竿*** ID: 981499 2025-11-29 36 pages 3.19 MB

Scale Out Batch Inference with Ray
Cody Yu, Staff Software Engineer

Hello
- Cody Yu
- Tech Lead, LLM Performance, Anyscale
- vLLM/SGLang/Apache TVM committer
- ex-Founding Engineer, BosonAI
- ex-Senior Applied Scientist, AWS AI
- PhD, Computer Science, UCLA 19

We are in a GenAI era
[Figure: images generated by OpenAI GPT-4o]

Batch inference is getting high demand
- Multi-Modality Data Sources: Camera, Mic, PDF, Sensor, Tabular, Text, Audio, Image, Video
- Structured and Unstructured Multi-Modality Data
- Embedding Models, Large Language Models
- Vector DB, Model Training, Classification, Knowledge Retrieval
- Read (CPU) -> Pre-process (CPU) -> Model (GPU) -> Post-process (CPU)
- Cloud Storage

Challenges with Batch Inference
- Scale: Large data scale (100s of GBs, TBs, or more)
- Reliability: Spot + On demand Instances
- Compute: Multi stage + Heterogeneous Compute
- Flexibility: Bring any OSS Model & Customize
- SLAs: Focus on high throughput/low cost vs low latency

Multi Layer Approach
- Ray Core: A scalable AI compute engine
- Ray Data: An efficient and scalable data processing pipeline on Ray
- LLM Inference Engine powered by Open Source vLLM: The most popular open source LLM inference framework

Ray
Scalable AI Compute Engine

Ray Overview
- Ray (Distributed) Libraries
  - Ray Tune/Train: Training
  - RLlib: Reinforcement learning
  - Ray Serve: Online inference
  - Ray Data: Data processing
- Ray (Core): A general-purpose distributed execution layer
  - Remote functions (tasks) and classes (actors)

Pushing Scalability to 1000s of Nodes
[Diagram: Ray cluster with a head node and worker nodes; labels: Driver, Worker, Raylet, Global Control Service (GCS), Dashboard server]

Pushing Scalability to 1000s of Nodes
- Global Control Service (GCS) used for:
  - Actor scheduling
  - Placement group scheduling
  - Node resource views
[Diagram: the same cluster layout repeated; the preview text is cut off here]
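The four-stage pipeline named on the slides (Read, Pre-process, Model, Post-process) can be sketched with the Python standard library alone. This is a minimal illustration of the idea and does not use Ray Data, vLLM, or RayLLM-Batch: each stage runs in its own thread and hands batches to the next stage through a bounded queue, so the CPU stages can prepare the next batch while the model stage is busy. All function names, the toy "model", and the `max_pending` parameter are assumptions made for this sketch.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream between stages


def read(n_rows, batch_size):
    """Read stage (CPU): yield batches of raw records (toy data, assumed)."""
    rows = [f"record {i}" for i in range(n_rows)]
    for start in range(0, n_rows, batch_size):
        yield rows[start:start + batch_size]


def preprocess(batch):
    """Pre-process stage (CPU): normalise the text."""
    return [text.upper() for text in batch]


def model(batch):
    """Model stage: a stand-in for GPU inference; scores text by its length."""
    return [len(text) for text in batch]


def postprocess(batch):
    """Post-process stage (CPU): wrap raw scores in result records."""
    return [{"score": score} for score in batch]


def run_stage(fn, inbox, outbox):
    """Apply fn to every batch from inbox until the sentinel arrives."""
    while True:
        batch = inbox.get()
        if batch is SENTINEL:
            outbox.put(SENTINEL)  # pass the end-of-stream marker along
            return
        outbox.put(fn(batch))


def run_pipeline(n_rows=10, batch_size=4, max_pending=2):
    """Run read -> preprocess -> model -> postprocess with overlapping stages."""
    stages = [preprocess, model, postprocess]
    # Bounded queues give backpressure: a fast stage blocks instead of
    # buffering an unbounded number of batches in memory.
    queues = [queue.Queue(maxsize=max_pending) for _ in range(len(stages) + 1)]

    def feed():
        for batch in read(n_rows, batch_size):
            queues[0].put(batch)
        queues[0].put(SENTINEL)

    threads = [threading.Thread(target=feed)]
    for i, fn in enumerate(stages):
        threads.append(
            threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
        )
    for t in threads:
        t.start()

    results = []
    while True:
        batch = queues[-1].get()
        if batch is SENTINEL:
            break
        results.extend(batch)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(len(run_pipeline()))
```

In a real deployment the threads would be replaced by distributed workers on heterogeneous CPU and GPU nodes, which is the role the slides assign to Ray Data and vLLM; the bounded queue here only mimics the backpressure that such a streaming pipeline needs.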

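Based on the content of the report, its main points can be summarized as follows:
- **Growing demand for large-scale batch inference**: With the rise of multi-modal data sources and large language models (LLMs), demand for batch inference keeps increasing.
- **Batch inference challenges**: Large data scale, reliability, heterogeneous compute, flexibility, and SLA requirements.
- **Ray platform**: Provides a scalable AI compute engine and a data processing pipeline that support large-scale batch inference.
- **Ray Data**: Optimizes data pre-processing, reduces GPU idle time, and improves resource utilization.
- **vLLM**: The most popular open source LLM inference framework, supporting high throughput.
- **RayLLM-Batch**: An Anyscale library for large-scale, cost-optimized LLM batch inference.
- **Case study**: Shows the performance and cost-effectiveness of RayLLM-Batch when processing large datasets.
- **Key points**:
  - Supports thousands of nodes.
  - Improves the throughput of LLM batch inference.
  - Lowers cost.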