Beyond Durability: Enhancing Database Resilience and Reducing the Entropy Using Write-Ahead Logging at Netflix
Prudhviraj Karumanchi & Vidhya Arvind
November 19th, 2024
QCon SF 2024

Vidhya Arvind, Staff Software Engineer, Data Platform, Netflix
Prudhviraj Karumanchi, Staff Software Engineer, Data Platform, Netflix

The day we got lucky
- At 9 AM on a regular weekday
- Table schema change
- One system error
- Result: data corrupt
- Reads are failing
- After data restore, we need to replay writes
- Phew! (Caches) (Dual writes to Kafka)
That was a lucky save.

It's not that simple, is it?

Reality Check
- What if we weren't lucky enough to have a solution?
- How do we guarantee protection for critical applications?
- How do we prepare for unknown failure modes?
And the next outage won't wait for luck.

Scale amplifies every challenge.
At Netflix Scale

Data Reliability Challenges at Scale
- Production incident
- Data integrity
- Need bespoke solution
- Cost teams time and money

Agenda
- Introduction
- Data Reliability Challenges
- WAL Architecture
- How WAL Addresses Challenges
- Failure domains

Netflix Architecture: 10,000 ft view

Closer look at stateful systems

- Accidental data loss/corruption
- System entropy
- Multi-partition mutations
- Data replication

Diagram: Client, MicroService, DB (writes); Durable & Visible
Durable: indicates that the data is persisted.
Visible: indicates that the newly written data is visible in the read path.

Diagram: Client, MicroService, DB (writes)
Retries and back-off do not work becau