Kubernetes Was Built for Service-Resource Orchestration, MaaS Changes Everything.

Wenyuan Yu (于文渊)
Technical Lead, Alibaba Cloud Bailian Platform
Head of the Systems Engineering Team, Tongyi Lab
2025/11/15

Kubernetes's mission
The de facto platform for service orchestration over uniform resources

Kubernetes Original Mission
- Control-plane orchestration over uniform compute (Pods, Nodes, Deployments)
- Predictable, long-running web or microservice workloads.

Kubernetes Core Abstractions
- Pod, Deployment, ReplicaSet, Service, Ingress, Autoscaler.
- Scheduler optimizes placement.
- HPA manages scale by metrics.

Targets:
- Designed to manage web services and microservices (from Borg)
- Predictable workloads and uniform resource profiles (mainly HTTP/gRPC requests)

Typical web services on Kubernetes
Manage and orchestrate services over uniform resources
- Ingress: Accept incoming requests
- Service: Route requests to pods
- Pod Chain & Orchestration: An application is a service mesh
- Load balancing, autoscaling, blue-green rollout; control-plane-centric.

What makes MaaS different?
The AI infra is more complex than web service infra
- Various models: LLMs, VLMs, AIGC models; thousands of models in different sizes
- Diverse resources: different GPUs and accelerators, optimized for different model types
- Complex inference infrastructure: cascade pipeline for efficient execution; state (KV-Cache) persistence and session affinity.

K8s Ecosystem Developments
Community efforts to serve GenAI on K8s
- WG Serving: Discuss and enhance the support of inference serving in Kubernetes, including AutoScaling, MultiHost/MultiNode, orchestration and device resource management.
- AIBrix: Provides a Kubernetes CRD for GenAI model serving, encapsulating complexities in ML deployments on Kubernetes.
- llm-d & Gateway API Inference Extension: Inference Gateway, request scheduler, routing, and metrics about performance, availability and capabilities. Integrating vLLM as the default model inf