
07-goodput-shi-ni-suo-xu-de-yi-qie-ji-yu-slo-de-llm-tui-li-fu-wu-ji-zhun-ce-shi-zheng-yu-chen-.pptx

Uploaded by: d*** | ID: 1035853 | 2026-01-04 | 38 pages | 11.93 MB

Lightning Talk: Goodput is All You Need: SLO-based LLM Inference Benchmarking
郑宇宸 (Yu-Chen Cheng), Software Engineer, 2025/11/15

Speaker: focuses on building AI/LLM infrastructure based on Kubernetes. Previously at Heywhale and WizardQuant; incoming SDE at Tencent Cloud CSIG TI-ONE.

Contents
01 Background
02 LLM Inference Benchmarking
03 Takeaways

01 Background: LLM Inference and Benchmarking

LLM Inference (Source: https://huggingface.co/blog/tngtech/llm-performance-prefill-decode-concurrent-requests)
- Prefill: computation-intensive
- Decode: memory-intensive

LLM Inference Stack on Kubernetes
- Inference
  - Parameter tuning: Tensor Parallelism, Data Parallelism, Expert Parallelism
  - Frameworks: vLLM, SGLang, TensorRT-LLM
  - Optimization: Speculative Decoding, Flash Attention
- KV Cache Management: KV Cache Offloading, KV Cache Sharing
- Orchestration
  - Routing and Load Balancing: Prefix-Aware Routing, Fairness Routing, SLO-Aware Routing
  - PD (Prefill/Decode) Disaggregation
  - Autoscaling

LLM Inference Metrics (Source: https://…)
- TTFT (Time to First Token): the initial response time, measured from the start of the request until the first output token is generated.
  TTFT = time_at_first_token - start_time
- TPOT (Time Per Output Token): the average time between two consecutive generated tokens.
  TPOT = (e2e_latency - TTFT) / (num_output_tokens - 1)

Service Level Objective (SLO) (Source: https://…)
- Chat applications: P99 TTFT 200 ms, P99 ITL (inter-token latency) 50 ms
- Retrieval-Augmented Generation (RAG): P99 TTFT 300 ms, P99 ITL 100 ms, P99 request latency 3 s
- Offline batch: max throughput 100 RPS
- Code completion: P99 request latency 2 s

Pareto Frontier + SLO Attainment
Example: LLM Performance Explorer by BentoML (Source: https://…, https://…)

SLO Violation

What is Goodput?
Goodput measures the number of completed requests …
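The two formulas on the metrics slide can be sketched in Python, assuming a benchmark client that records the request start time and the arrival time of every streamed output token. `RequestTrace` and its fields are hypothetical names chosen for this sketch, not part of any benchmarking tool.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Client-side timestamps (seconds) for one streamed request (hypothetical type)."""
    start_time: float
    token_times: list[float]  # arrival time of each output token, in order

    @property
    def ttft(self) -> float:
        # TTFT = time_at_first_token - start_time
        return self.token_times[0] - self.start_time

    @property
    def e2e_latency(self) -> float:
        # End-to-end latency: request start until the last output token arrives.
        return self.token_times[-1] - self.start_time

    @property
    def tpot(self) -> float:
        # TPOT = (e2e_latency - TTFT) / (num_output_tokens - 1)
        num_output_tokens = len(self.token_times)
        if num_output_tokens < 2:
            return math.nan  # undefined when only one token was generated
        return (self.e2e_latency - self.ttft) / (num_output_tokens - 1)

    @property
    def itls(self) -> list[float]:
        # Inter-token latencies: the gap between each pair of consecutive tokens.
        return [b - a for a, b in zip(self.token_times, self.token_times[1:])]


if __name__ == "__main__":
    trace = RequestTrace(start_time=0.0, token_times=[0.15, 0.20, 0.26, 0.30])
    print(f"TTFT={trace.ttft:.3f}s TPOT={trace.tpot:.3f}s E2E={trace.e2e_latency:.3f}s")
    # prints: TTFT=0.150s TPOT=0.050s E2E=0.300s
```

Because e2e_latency - TTFT is exactly the time from the first to the last token, TPOT equals the mean of one request's inter-token gaps. ITL-based SLOs such as the P99 ITL targets instead constrain the distribution of the individual gaps, so a single long stall can violate an ITL target while barely moving TPOT.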
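The preview cuts off the goodput definition, so the following is a minimal sketch under one common reading: goodput counts only the completed requests that meet every latency SLO, per second of benchmark time, whereas ordinary throughput counts all completed requests. The record type, the metric keys (`ttft`, `tpot`, `e2e`), and the nearest-rank percentile are assumptions for illustration. The chat thresholds come from the SLO slide, with TPOT standing in for ITL as a simplification.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestResult:
    """Per-request latencies in milliseconds (hypothetical record type)."""
    ttft_ms: float
    tpot_ms: float
    e2e_ms: float


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile for p in [0, 100]; one of several common definitions."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def meets_slo(r: RequestResult, slo: dict[str, float]) -> bool:
    # A request counts as "good" only if every metric named in the SLO is within its limit.
    observed = {"ttft": r.ttft_ms, "tpot": r.tpot_ms, "e2e": r.e2e_ms}
    return all(observed[metric] <= limit for metric, limit in slo.items())


def slo_attainment(results: list[RequestResult], slo: dict[str, float]) -> float:
    # Fraction of completed requests that met every SLO threshold.
    return sum(meets_slo(r, slo) for r in results) / len(results)


def goodput(results: list[RequestResult], slo: dict[str, float], duration_s: float) -> float:
    # Requests per second that completed AND met every SLO threshold.
    return sum(meets_slo(r, slo) for r in results) / duration_s


if __name__ == "__main__":
    # Hypothetical results from a 2-second benchmark window.
    results = [
        RequestResult(ttft_ms=120, tpot_ms=30, e2e_ms=900),
        RequestResult(ttft_ms=250, tpot_ms=35, e2e_ms=1400),  # misses the TTFT limit
        RequestResult(ttft_ms=180, tpot_ms=60, e2e_ms=2000),  # misses the TPOT limit
        RequestResult(ttft_ms=150, tpot_ms=40, e2e_ms=1100),
    ]
    chat_slo = {"ttft": 200.0, "tpot": 50.0}  # chat thresholds, TPOT standing in for ITL
    duration_s = 2.0
    print(f"throughput={len(results) / duration_s:.1f} req/s "
          f"goodput={goodput(results, chat_slo, duration_s):.1f} req/s "
          f"attainment={slo_attainment(results, chat_slo):.0%} "
          f"P99 TTFT={percentile([r.ttft_ms for r in results], 99):.0f} ms")
    # prints: throughput=2.0 req/s goodput=1.0 req/s attainment=50% P99 TTFT=250 ms
```

The same run reports 2.0 req/s of throughput but only 1.0 req/s of goodput, which is the gap an SLO-based benchmark is meant to expose. The slide's targets are written as P99 percentiles over all requests; the per-request check above is a stricter, request-level reading of the same thresholds.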
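The "Pareto Frontier + SLO Attainment" idea can be sketched as a dominance filter, assuming each benchmark run is summarized by one throughput number (higher is better) and one P99 latency number (lower is better). This is a generic sketch, not the implementation behind BentoML's LLM Performance Explorer; `BenchPoint` and the run labels are hypothetical.

```python
from typing import NamedTuple, Optional


class BenchPoint(NamedTuple):
    name: str              # hypothetical configuration label
    throughput: float      # higher is better (e.g. output tokens/s)
    p99_latency_ms: float  # lower is better (e.g. P99 TTFT)


def dominates(a: BenchPoint, b: BenchPoint) -> bool:
    # a dominates b if it is no worse on both axes and strictly better on at least one.
    no_worse = a.throughput >= b.throughput and a.p99_latency_ms <= b.p99_latency_ms
    strictly_better = a.throughput > b.throughput or a.p99_latency_ms < b.p99_latency_ms
    return no_worse and strictly_better


def pareto_frontier(points: list[BenchPoint]) -> list[BenchPoint]:
    # Keep only points that no other point dominates, ordered by latency (O(n^2)).
    frontier = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(frontier, key=lambda p: p.p99_latency_ms)


def best_under_slo(points: list[BenchPoint], latency_slo_ms: float) -> Optional[BenchPoint]:
    # Highest-throughput frontier point that still satisfies the latency SLO, if any.
    feasible = [p for p in pareto_frontier(points) if p.p99_latency_ms <= latency_slo_ms]
    return max(feasible, key=lambda p: p.throughput, default=None)


if __name__ == "__main__":
    runs = [
        BenchPoint("A", 800, 120),
        BenchPoint("B", 1500, 180),
        BenchPoint("C", 1400, 250),  # dominated by B
        BenchPoint("D", 2600, 450),
        BenchPoint("E", 2000, 500),  # dominated by D
    ]
    print([p.name for p in pareto_frontier(runs)])  # ['A', 'B', 'D']
    print(best_under_slo(runs, latency_slo_ms=200).name)  # B
```

The frontier alone only lists the trade-offs; adding a latency SLO turns it into a single choice (here run B at a 200 ms target), and a run that reports the highest raw throughput (D) is rejected once its latency violates the SLO.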
