04-kubernetes-was-built-for-service-resource-orchestration.-maas-changes-everything-yu-wen-yuan-.pdf-三个皮匠报告

Kubernetes Was Built for Service-Resource Orchestration, MaaS Changes Everything

Yu Wenyuan (于文渊)
Technical Lead, Alibaba Cloud Bailian Platform; Head of the System Engineering Team, Tongyi Lab
2025/11/15

Kubernetes' Mission
The de facto platform for service orchestration over uniform resources

Kubernetes Original Mission
Control-plane orchestration over uniform compute (Pods, Nodes, Deployments)
Predictable, long-running web or microservice workloads.

Kubernetes Core Abstractions
Pod, Deployment, ReplicaSet, Service, Ingress, Autoscaler.
Scheduler optimizes placement.
HPA manages scale by metrics.

Targets:
Designed to manage web services/microservices (from Borg)
Predictable workloads and uniform resource profiles (mainly HTTP/gRPC requests)

Typical web services on Kubernetes
Manage and orchestrate services over uniform resources
Ingress: Accept incoming requests
Service: Route requests to pods
Pod Chain & Orchestration: An application is a service mesh
Load balancing, autoscaling, and blue-green rollout are control-plane-centric.

What makes MaaS different?
The AI infra is more complex than web service infra
Various models: LLMs, VLMs, AIGC models; thousands of models in different sizes
Diverse resources: Different GPUs and accelerators, optimized for different model types
Complex inference infrastructure: Cascade pipeline for efficient execution; state (KV-Cache) persistence and session affinity.

K8s Ecosystem Developments
Community attempts at serving GenAI on K8s
WG Serving: Discusses and enhances the support of inference serving in Kubernetes, including autoscaling, multi-host/multi-node, orchestration, and device resource management.
AIBrix: Provides a Kubernetes CRD for GenAI model serving, encapsulating the complexities of ML deployments on Kubernetes.
llm-d & Gateway API Inference Extension: Inference gateway, request scheduler, routing, and metrics about performance, availability, and capabilities. Integrating vLLM as the default model inf...
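The "state (KV-Cache) persistence and session affinity" point above can be made concrete with a small routing sketch. This is not taken from the slides or from any of the projects listed; it is a minimal illustration, assuming a hypothetical `pick_replica` helper and made-up replica names, of how a gateway might pin a session to one model replica so that the replica's KV cache can be reused across turns. It uses rendezvous (highest-random-weight) hashing, which keeps a session on the same replica as long as that replica stays in the pool. A real inference gateway would likely also weigh load and cache state, which this sketch ignores.

```python
import hashlib


def pick_replica(session_id: str, replicas: list[str]) -> str:
    """Pick a replica for a session using rendezvous hashing.

    Hypothetical helper for illustration only: every replica gets a
    score derived from (session_id, replica), and the highest score wins.
    """
    if not replicas:
        raise ValueError("no replicas available")

    def score(replica: str) -> int:
        # Hash the session/replica pair and read the first 8 bytes as an integer.
        digest = hashlib.sha256(f"{session_id}|{replica}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(replicas, key=score)


if __name__ == "__main__":
    # Replica names are placeholders, not real pod names.
    pool = ["model-replica-0", "model-replica-1", "model-replica-2"]
    print(pick_replica("session-42", pool))
```

The design choice worth noting: when a replica other than the chosen one is removed from the pool, the session's assignment does not change, so scaling events only disturb the sessions that were pinned to the removed replica.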
