当前位置:首页 > 报告详情

左移以提高工程效率.pdf

上传人: 竿*** 编号:981544 2025-11-29 22页 600.63KB

1、Shifting Left for Better Engineering Efficiency-Migration stories along the wayYing DaiPrincipal Software Engineer Roblox IntroductionA little more about myselfWhat Ive done at RobloxMigrations:A trend to shift leftMetrics:reliability and productivity TestingProductionAgenda1.Background 2.The first

2、migration3.The second migration4.LearningsBackgroundI started at the Telemetry team at RobloxFast user growth!Exciting!Oncall,oncall&oncallThe worst part:Every Sev call,ppl would question Telemetry firstBad production reliability-Low Engineering ProductivityOur own DC.2K micro services.Billions Acti

3、ve Time Series.Take a closer look at the Telemetry problem In house Telemetry tool:from metrics collection&processing,to storage,to visualization Expensive.Inflexible.Slow.Inconsistent aggregation results with other standard tools Some teams have their own Telemetry setup with inconsistent metricsLe

4、ss than 99.8%availabilityWe need a better Telemetry solution!Raw Data CollectorTemp StorageProcessing K,VK,V storeStep 3:DeprecationRemove the old pipeline Claim victoryStep 1:New solutionBuy vs BuildGrafana Enterprise,VictoriaMetricsProductionizeStep 2:Transition Dual-write the underlying dataData

5、consistencyRegenerate alerts&dashboardsPlanOne quarter is enough!Reality Migration of basic tools is very hard.Technically:A lot of customizations were made to the in-house toolAnnotations,Player Globe View,latency differencesEngineers are used/attached to the old toolIt was used for almost 10 years

6、Even if we included a new link to every existing chart,engineers chose to stay with the old tool.If we force engineers to move,it would harm their productivityWhat happened thenInstead of one quarter,it took three quarters100%availability for multiple quarters ReliableMulti region,multi AZError isol

word格式文档无特别注明外均可编辑修改,预览文件经过压缩,下载原文更清晰!
三个皮匠报告文库所有资源均是客户上传分享,仅供网友学习交流,未经上传用户书面授权,请勿作商用。
根据报告的内容,以下是全文主要内容的简明扼要概括: 1. **背景**:Roblox的快速用户增长导致生产可靠性低,工程生产力下降。 2. **迁移挑战**:从内部Telemetry工具迁移到Grafana和VictoriaMetrics,过程艰难,耗时三季。 3. **迁移成果**:实现100%可用性,多区域、多可用区部署,错误隔离和分片,支持3倍增长。 4. **改进措施**:自动化金丝雀测试,提高变更部署的可靠性。 5. **自动化金丝雀测试**:通过自动化测试和默认金丝雀实例,减少生产问题。 6. **未来计划**:继续左移,改进测试环境,支持持续交付。 核心数据: - 迁移耗时三季。 - 100%可用性。 - 3倍增长。 - 20%的事故由变更部署引起。
Roblox如何提升工程效率?" 从痛点到高效" Roblox如何实现左移?"
客服
商务合作
小程序
服务号
折叠