Beyond Reactive Scaling: Optimizing Amazon EKS Cost and Performance (sponsored by ScaleOps). PDF, 45 pages.
2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.

COP369-S
Steven Feltner, Principal Architect at ScaleOps
Beyond Reactive Scaling: Optimizing AWS EKS Cost and Performance

Agenda
- The Impossible Triangle
- Kubernetes Resource Management Foundations
- Container Rightsizing (CPU, Memory, GPU, Storage, etc.)
- Horizontal Scaling
- Node Scaling & Spot adoption
- CRDs & Policies for Scale
- A Holistic Solution

The Impossible Triangle
- Reliability
- Performance
- Cost Efficiency

10 pods - Manual tuning, achieve all 3
Performance & reliability: Cost
Cost efficiency: Reliability & Performance
1000 pods - Waste lowers efficiency
5000 pods - Fragmentation lowers efficiency

Scale Reality: Request vs Usage

Scale Reality: Node Fragmentation

Wasted budget or degraded performance?
The Scaling Dilemma
Reactive autoscaling forces you to pick one

Kubernetes Resource Management Foundations
Requests: the scheduler uses them to place the pod on a node with sufficient capacity.
Limits: applied by the kubelet and enforced by the kernel using cgroups.
- CPU limits are enforced by CPU throttling.
- Memory limits are enforced reactively: terminations (out of memory, OOM) happen only when the kernel detects memory pressure.
Together: determine QoS class and preemption.

QoS Classes
- Guaranteed: requests = limits (evicted last)
- BestEffort: no requests/limits (evicted first)
- Burstable: requests [...] CPU
Batch: Bursty, time-constrained
GPU: Tensor cores CPU

VPA Rightsizing Challenges at Scale
VPA performance has not been tested in large clusters.

Safe Descheduling & Re-Packing
- VPA eviction (restarts pod, scheduler re-places) → re-packing opportunity
- Deployment rollouts (natural pod recreation) → re-packing opportunity
- Descheduler runs (identifies fragmentation) → eviction API → re-packing

The Safety Layer:
- PDBs (prevent mass evictions)
- Safe-to-evict annotations (protect system pods)
- Graceful shutdown (SIGTERM gr