
04-kubernetes-was-built-for-service-resource-orchestration.-maas-changes-everything-yu-wen-yuan-.pdf

Uploaded by: d*** · ID: 1035789 · 2026-01-04 · 15 pages · 3.60 MB

**Kubernetes Was Built for Service-Resource Orchestration. MaaS Changes Everything.**
Yu Wenyuan (于文渊), Technical Lead of the Alibaba Cloud Bailian (百炼) platform and Head of the Systems Engineering Team at Tongyi Lab. 2025/11/15

**Kubernetes's mission**
The de facto platform for service orchestration over uniform resources.
Kubernetes original mission: control-plane orchestration over uniform compute (Pods, Nodes, Deployments) for predictable, long-running web or microservice workloads.
Kubernetes core abstractions: Pod, Deployment, ReplicaSet, Service, Ingress, Autoscaler. The scheduler optimizes placement; HPA manages scale by metrics.
Targets: designed to manage web services and microservices (from Borg), with predictable workloads and uniform resource profiles (mainly HTTP/gRPC requests).

**Typical web services on Kubernetes**
Manage and orchestrate services over uniform resources.
Ingress: accepts incoming requests.
Service: routes requests to pods.
Pod chain and orchestration: an application is a service mesh.
Load balancing, autoscaling, and blue-green rollout are control-plane-centric.

**What makes MaaS different?**
The AI infra is more complex than web service infra.
Various models: LLMs, VLMs, and AIGC models; thousands of models in different sizes.
Diverse resources: different GPUs and accelerators, optimized for different model types.
Complex inference infrastructure: cascade pipelines for efficient execution; state (KV-Cache) persistence and session affinity.

**K8s ecosystem developments**
Community attempts at serving GenAI on K8s.
WG Serving: discusses and enhances support for inference serving in Kubernetes, including autoscaling, multi-host/multi-node, orchestration, and device resource management.
AIBrix: provides a Kubernetes CRD for GenAI model serving, encapsulating the complexities of ML deployments on Kubernetes.
llm-d & Gateway API Inference Extension: inference gateway, request scheduler, routing, and metrics about performance, availability, and capabilities. Integrating vLLM as the default model inf…

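1. **Kubernetes positioning**: Originally a service orchestration platform over uniform resources (such as Pods and Nodes), designed for predictable web and microservice workloads.
2. **MaaS challenges**: AI infrastructure is more complex. It must manage diverse models (LLMs, VLMs, and others), heterogeneous GPU resources, and cascaded inference pipelines, and it must support KV cache persistence.
3. **Community efforts**: WG Serving, AIBrix, the Gateway API, and others attempt to support AI inference on K8s, but native scheduling struggles to meet MaaS requirements.
4. **Alibaba Cloud in practice**: One of the world's largest MaaS providers, with hundreds of thousands of GPUs deployed across 10+ regions, handling billions of requests and trillions of tokens per day.
5. **Architectural shift**: Part of the control-plane logic (such as routing and resource mapping) moves down into the data plane, enabling request-level scheduling, KV cache awareness, and SLO-driven execution, and improving serving efficiency.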
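To make "request-level scheduling" with "KV cache awareness" more concrete, here is a minimal Python sketch of a router that prefers the replica already holding the longest cached prefix of a request and breaks ties by load. It is an illustration only, not the design described in the talk: the `Replica` class, the `route` function, the block size, and the `max_in_flight` limit are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical model-serving replica; every field is illustrative."""
    name: str
    in_flight: int = 0  # requests currently being served
    # Token-block prefixes assumed to be resident in this replica's KV cache.
    cached_prefixes: set = field(default_factory=set)


def prefix_blocks(tokens, block_size=4):
    """Return the cumulative block-aligned prefixes of a token list as tuples."""
    return [tuple(tokens[:end]) for end in range(block_size, len(tokens) + 1, block_size)]


def cached_blocks(replica, blocks):
    """Count how many leading blocks of the request the replica already caches."""
    hits = 0
    for block in blocks:
        if block not in replica.cached_prefixes:
            break
        hits += 1
    return hits


def route(replicas, tokens, max_in_flight=8):
    """Pick the replica with the longest cached prefix; break ties by lower load."""
    blocks = prefix_blocks(tokens)
    # Skip saturated replicas unless every replica is saturated.
    candidates = [r for r in replicas if r.in_flight < max_in_flight] or list(replicas)
    best = max(candidates, key=lambda r: (cached_blocks(r, blocks), -r.in_flight))
    best.in_flight += 1
    best.cached_prefixes.update(blocks)  # assume the prefix is cached after serving
    return best
```

With two idle replicas, a follow-up turn that extends an earlier prompt lands on the replica that served the first turn, which is the session-affinity effect. An unrelated prompt goes to the less loaded replica. A per-request decision like this sits in the request path rather than in a control-plane reconcile loop, which is the shift the summary describes.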