High-Resolution Platform Observability
Brian Martin (he/they)
IOP Systems

Brian Martin (he/they)
    performance and optimization
    OSS projects (Rezolus, Pelikan, RPC-Perf, ...)
    IOP Systems, ex-Twitter, ex-[...]

[...] of Observability
    logs
    metrics
    tracing

Platform Observability
    logs, metrics, and traces
    health, utilization, and performance
    visualize, analyze, and action

Types of Metrics
    gauges
    counters
    distributions

Gauges
    instantaneous reading
    may increase or decrease
    have no history

Counters
    integrated reading
    monotonically non-decreasing
    capture history

Distributions
    observed property
    summary statistics
    sketches/histograms

Metric Usage
    health
    utilization
    performance

Health Metrics

Sad Server Story
    the database is slow
    one backend has high latency
    traffic is normal

Sad Server Story
    writes slow
    SSD worn out

Sad Server Story
    fix metrics
    fleetwide report
    remediation

Sad Switch Story
    data corruption
    checksum errors
    a few racks

Sad Switch Story
    faulty switches?
    reproduce
    buggy kernel

Sad Switch Story
    rack-level metrics
    alerting
    kernel fix

Health Metrics
    coverage
    aggregation

Utilization Metrics

% CPU Utilization

Utilization Metrics
    sampling interval
    aggregated vs disaggregated

Performance Metrics

Performance Metric Dangers
    histograms
    summary metrics
    bucket width

Platform Metric Sources

Hardware Performance Counters
    instructions and cycles
    cache hits/misses
    frequency

Energy Monitoring
    RAPL
    NVML

Enhanced Berkeley Packet Filter

    SEC("raw_tp/block_rq_complete")
    int BPF_PROG(
        block_rq_complete,
        struct request *rq,
        int error,
        unsigned int nr_bytes
    )
    ...

Other eBPF Samplers
    block io
    syscalls
    tcp packet latency

eBPF Program Cost
    counter increments
    hash lookups

eBPF Performance Tricks
    use arrays
    make tables mmap()-able
    use plain arrays!

The future is eBPF
    lower overhead
    better resolution
    histograms

Thank You!
Rezolus
Brian Martin (he/they)
IOP Systems
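
Sketch for the Gauges and Counters slides: because a counter is an integrated, monotonically non-decreasing reading, any two samples of it yield an average rate over the time between them, which is one way to read "capture history". A gauge sampled at the same two instants says nothing about what happened in between. The code below is a minimal illustration in plain C; the `counter_sample` type and `counter_rate` function are hypothetical names invented for this example and are not taken from Rezolus.

```c
#include <assert.h>
#include <stdint.h>

/* One timestamped reading of a counter (hypothetical type for illustration). */
struct counter_sample {
    uint64_t value;        /* cumulative count since the counter started */
    uint64_t timestamp_ns; /* time of the reading, in nanoseconds */
};

/* Average rate, in events per second, between two counter readings. */
double counter_rate(struct counter_sample earlier, struct counter_sample later)
{
    /* Counters never decrease, and the readings must be ordered in time. */
    assert(later.value >= earlier.value);
    assert(later.timestamp_ns > earlier.timestamp_ns);

    double delta = (double)(later.value - earlier.value);
    double seconds = (double)(later.timestamp_ns - earlier.timestamp_ns) / 1e9;
    return delta / seconds;
}
```

The rate stays correct however far apart the two readings are; a longer interval only makes it a coarser average.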
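
Sketch for the "aggregated vs disaggregated" point on the Utilization Metrics slide: one possible reading is that an average across cores can hide a single saturated core. The example assumes per-core utilization is already available as percentages; both function names are hypothetical.

```c
#include <stddef.h>

/* Aggregated view: mean utilization (percent) across all cores. */
double mean_utilization(const double *per_core, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += per_core[i];
    return n ? sum / (double)n : 0.0;
}

/* Disaggregated view: utilization (percent) of the busiest single core. */
double max_utilization(const double *per_core, size_t n)
{
    double max = 0.0;
    for (size_t i = 0; i < n; i++)
        if (per_core[i] > max)
            max = per_core[i];
    return max;
}
```

With four cores at 100%, 0%, 0%, and 0%, the aggregated figure is 25% while one core is fully busy.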
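
Sketch for the histogram slides ("bucket width", "counter increments", "use plain arrays"): a histogram can be kept as a plain array of counters, where recording a value is one index computation plus one increment. The power-of-two bucketing below is an assumption chosen for brevity, not the scheme described in the talk, and all names are hypothetical. It also shows why bucket width matters: a quantile can only be reported to the resolution of the bucket it falls in.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BUCKETS 64

/* A histogram stored as a plain array of counters. */
struct histogram {
    uint64_t buckets[NUM_BUCKETS];
};

/* Bucket 0 holds values 0 and 1; bucket i (i >= 1) holds [2^i, 2^(i+1)). */
unsigned bucket_index(uint64_t value)
{
    unsigned idx = 0;
    while (value > 1) {
        value >>= 1;
        idx++;
    }
    return idx;
}

/* Recording a value is a single counter increment. */
void histogram_record(struct histogram *h, uint64_t value)
{
    h->buckets[bucket_index(value)]++;
}

/* Largest value that can fall in the bucket containing quantile q (0..1). */
uint64_t histogram_quantile_upper(const struct histogram *h, double q)
{
    assert(q >= 0.0 && q <= 1.0);

    uint64_t total = 0;
    for (unsigned i = 0; i < NUM_BUCKETS; i++)
        total += h->buckets[i];
    if (total == 0)
        return 0;

    /* Rank of the requested quantile, rounded up, at least 1. */
    uint64_t target = (uint64_t)(q * (double)total);
    if ((double)target < q * (double)total)
        target++;
    if (target < 1)
        target = 1;

    uint64_t seen = 0;
    for (unsigned i = 0; i < NUM_BUCKETS; i++) {
        seen += h->buckets[i];
        if (seen >= target)
            return (i == NUM_BUCKETS - 1) ? UINT64_MAX
                                          : (UINT64_C(1) << (i + 1)) - 1;
    }
    return UINT64_MAX;
}
```

With these buckets, a recorded value of 100 can only be reported as "at most 127". Narrower buckets reduce that error at the cost of a larger array.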