Device Local Diagnosis
Shashank Neelam, Google
Greg Boudreau, Cisco
SONiC Workshop

1. Overview
2. Vendor Defined Rules/Schema
3. NOS On-Device Monitor (SONiC)
4. OpenConfig Telemetry

Current State of Health Monitoring on the Switch
Issues for Network Operators

Challenge: Description
Limited Remediation & Specificity: Current monitoring for common FRUs (e.g., PSUs, fans via SONiC psud, thermalctld) often lacks automated remediation steps and fails to monitor for specific, nuanced hardware failure scenarios.
Inflexible Signal Handling: Current systems are often unable to interpret unique signals or anomalous system behaviors from components, limiting proactive fault detection.
Over-reliance on Log Parsing: Critical system health reporting is heavily dependent on parsing logs, which is a reactive, rather than proactive, approach to identifying failures.
Incomplete Component Coverage: Monitoring is frequently missing for smaller, less common, but still critical components on a device, creating blind spots in overall system health.

What is device local diagnosis?
A generic framework for hardware vendors to define discrete rules for monitoring and managing hardware health and failures respectively, with the following characteristics:
- Supports granularity and flexibility to cover a wide range of sources
- Designed to be structured for supporting multiple underlying HW platforms, SW versions, and HW versions
- Generic inputs and outputs between system and operator regardless of underlying software
Standardized

Why use device local diagnosis?
Rapid Failure Response: Minimizes lag time between detection of failure and beginning of recovery logic (both locally and from remote sources)
Vendor Defined Device Intelligence: Fully defined by HW vendor with best insight into underlying behav