当前位置：首页 > 报告详情

3323 - 使用 vLLM 和 Red Hat AI 优化大型语言模型以进行推理.pdf

上传人：竿*** 编号：982933 2025-11-29 PDF PDF 27页 1.97MB

该报告所属合集： IBM TechXchange 2025嘉宾演讲PPT合集-杂项

打包下载报告合集

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载报告到电脑，查找使用更方便

VIP专享文档

书签

分享

收藏

已收藏

版权投诉

/27

立即下载

《3323 - 使用 vLLM 和 Red Hat AI 优化大型语言模型以进行推理.pdf》由会员分享，可在线阅读，更多相关《3323 - 使用 vLLM 和 Red Hat AI 优化大型语言模型以进行推理.pdf（27页珍藏版）》请在三个皮匠报告上搜索。

1、Orlando,FLOctober 69IBM TechXchange 2025Session code 3323Carlos Condado,Sr.Product Marketing ManagerChristopher Nuland,Principal Technical Marketing ManagerRed HatOptimizing LLMs for Inference with vLLM and Red Hat AIWhat you will learn in this session0102030405Inference optimization principlesMaxim

2、ize performance and cut costsToken-based distributed inference predictable performanceTrack and meet inference SLOsRed Hat AI:open,enterprise AI platformIBM TechXchange|2025 IBM CorporationGenerative AI is transforming industries but inference-related processes increase complexities and costsIBM Tec

3、hXchange|2025 IBM Corporation3The AI promise vs.The operational realityIBM TechXchange|2025 IBM Corporation4The ripple effect across your teams and businessSlow innovationMissed revenue opportunitiesHigh costsManaging siloed solutionsLimited scalabilityDeployment frictionUnderperforming modelsUnreli

4、able experienceThe orchestrators and builders of AI apps and agents5Inference optimization principlesInference optimization principlesHigh-performant inference runtimeQuantized modelsFast and cost-effective inference6NeuronTPUGaudiInstinctGPULlamaQwenDeepSeekGemmaMistralMolmoPhiNemotronGraniteSpyrev

5、LLM is the inference runtime for the hybrid cloudEdgePrivate CloudPhysicalVirtual Public Cloud7OpenAI introduced gpt-ossOn Aug 5th,2025 vLLM had Day 0 support for gpt-oss,on NVIDIA&AMD GPUs8Meta introduced Llama 4On April 5th,2025 1.vLLM had Day 0 support for llama 4&2.Meta quantized the FP8 version

6、 using Red Hats open source LLM Compressor9vLLM is the inference runtime for the hybrid cloudNative Hugging Face integrationSimple APIs for online and offline inferenceOpenAI-compatible API protocolAdvanced algorithms for high QPS servingSingle server/GPU to distributed/multi GPUKV cache optimizatio

word格式文档无特别注明外均可编辑修改，预览文件经过压缩，下载原文更清晰！

三个皮匠报告文库所有资源均是客户上传分享，仅供网友学习交流，未经上传用户书面授权，请勿作商用。

根据《Optimizing LLMs for Inference with vLLM and Red Hat AI》的内容，以下是全文关键点的概括： 1. **LLM推理优化**：通过vLLM和Red Hat AI平台，优化大型语言模型（LLM）的推理性能，降低成本。 2. **性能提升**：vLLM支持OpenAI的gpt-oss和Meta的Llama 4，实现快速、高效的推理。 3. **模型压缩**：使用Red Hat的LLM Compressor，通过量化模型减少内存和计算需求，提高效率。 4. **分布式推理**：支持基于token的分布式推理，优化性能和成本。 5. **Red Hat AI平台**：提供开放、企业级的AI平台，支持混合云环境。 6. **社区贡献**：Red Hat与UC Berkeley等机构合作，推动vLLM和LLM Compressor的发展。 7. **模型优化**：提供多种优化后的模型，如Llama 3.1 8B、70B和70B-FP8，平衡性能和成本。 8. **AI代理构建**：Red Hat AI支持使用Llama Stack构建AI代理，并集成到OpenShift AI中。 9. **可扩展性**：通过Kubernetes和vLLM，实现AI工作负载的动态扩展和资源管理。

揭秘性能提升秘诀" 如何降低AI成本？" 构建高效AI应用的利器"

全行业研究报告分享下载平台

0731-84720580
商务合作：really158d
友链申请 (QQ)：1737380874

关于我们

更多

关于我们

三个皮匠报告微信公众号

三个皮匠报告微信小程序

扫码咨询商务合作事宜

友情链接：

营销自动化亿欧智库微播易阿里妈妈

copyright@2008-2013 长沙思想领动信息技术有限公司版权所有网站备案/许可证号：湘B2-20190120 | 工信部备案号：湘ICP备2023027541号-2 | 公安备案号：湘公网安备43010402001071号

客服

小程序

服务号

折叠