Microsoft SONiC
SONiC Flaky Case Management

Agenda
- Pain points of issue triage
- Solution
- Flaky Case Management

SONiC Nightly Test Challenges

Infra
- 200 testbeds
- 55 topologies
- 70 hardware SKUs
- 3000 test suites

Nightly Run
- 250 nightly pipelines
- 250,000 test cases run weekly
- 40,000 test failures per week
- 400 tasks in the last month

What we qualify
- 490 images in the past 6 months
  - 202505: 12 images
  - 202411: 30 images
  - 202405: 25 images
  - Master: 80 images
  - Internal: 210 images
- 2300 commits in the sonic-buildimage repo last year

Challenges
- Manual triage bottleneck
- Fragmented data sources
- Limited early detection
- Repetitive inquiries

SONiC Test Circle
- Nightly Test: pipelines run for many platforms.
- Analyzer: collects failures, analyzes them, and uploads IcM.
- Geneva: internal tool that generates IcM.
- Creation tool: transforms IcM into work items and supports auto assignment.
- Nightly guard: people who monitor IcM, do triage, and decide whether to create a work item.
- Triage meeting: reviews whether the work items are valid and correctly assigned, and collects feedback.

Nightly Hawk Software Infrastructure

- Legacy: the nightly test is triggered by the traditional pipeline, or by ElasticTest with the retry mechanism disabled.
- Consistent: the nightly test is triggered by ElasticTest with the retry mechanism enabled, and all retry attempts in one run failed.
- Flaky: the nightly test is triggered by ElasticTest with the retry mechanism enabled, and within one run some attempts succeed while others end in failure or error.
- Common summary: the failure has a common but not meaningful summary (e.g. AssertionError), so it cannot be aggregated. Most of those summaries have already been fixed by standardizing assertions.

Failure Types

- Data Collection
- Classification: different types need different diagnosis.
- Calculation: calculate the pass rate at different levels.
- Granularity Funneling: surface each issue at the accurate level.
- Deduplication: reduce noise and stay accurate.

Analyzer Automation Workflow

Based on pass rate calculation
Fai