Capital One: From Chaos Testing to Continuous Verification (PDF, 17 pages)
SPS328
Fail Smarter: How Capital One Builds Cloud Resilience

Troy Koss (He/Him), Director, Reliability Engineering, Capital One
Sheng Liao (She/Her), Sr. Enterprise Support Manager, AWS

© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Agenda
Why continuous verification matters
Capital One's transformation journey
Automated reliability verification framework
Scale chaos engineering and measure outcomes

Chaos engineering proactively builds confidence by revealing hidden system weaknesses
1 - Specify Starting Point: How is app performing at uninterrupted state - use metrics like SLOs
2 - State Your Assertions & Hypothesis: How do you expect your system to handle failures or issues
3 - Execute Experiment (Chaos): To confirm your assertions
4 - Observe & Learn From Outcomes: Empirically understand if your hypothesis is correct
Fix Identified Gaps & Expand Experiment Scope

Resilience degrades in the gaps between periodic manual tests
In complex distributed systems, entropy is a constant force. Configurations drift, dependencies update, and code changes daily.
[Figure: Predictability over Time]

Time between Gamedays and point-in-time resilience testing show a need for change
[Figure: timeline with labels: Change Happens (new or repeat gap); Resilience Test; Gap Identified & Fix Applied; Next Resilience Test; Unidentified Time]

We had to fundamentally change how teams managed their reliability

There is a need fo