Efficient Incremental Processing with Netflix Maestro and Apache Iceberg
November 19, 2024, QCon San Francisco 2024
Jun He, Netflix

Outline
01 Introduction
02 Architectural design
03 Use cases & examples
04 Takeaways & future improvements

Introduction

Introduction
Landscape of data insights at Netflix

Data for Business Needs
Existing and new business initiatives
Streaming
Games
Ads
Live

Common Problems
Data Accuracy
Data Freshness
Cost Efficiency
Exabyte data warehouse
Business needs for new initiatives
More than $150M per year

Late Arriving Data
Key challenge
Event time: 10:20 PM (table partition hour=22)
Processing time: 8:20 AM (table partition hour=8)
Late arriving event

Big Data Analytics Platform
BDAP tech stack and other BDAP internal services
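
To make the Late Arriving Data slide concrete, here is a minimal Python sketch of how an event's hourly partition differs depending on whether it is keyed by event time or by processing time. The helper names (hour_partition, is_late_arriving) and the calendar dates are assumptions made for illustration; only the 10:20 PM and 8:20 AM times and the hour=22 and hour=8 partitions come from the slide. This is not Maestro or Iceberg code.

```python
from datetime import datetime


def hour_partition(ts: datetime) -> str:
    """Return the hourly partition key for a timestamp, e.g. 'hour=22'."""
    return f"hour={ts.hour}"


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def is_late_arriving(event_time: datetime, processing_time: datetime) -> bool:
    """An event is treated as late here if it is processed in a later hour
    than the hour in which it occurred (an assumed, simplified definition)."""
    return hour_bucket(event_time) < hour_bucket(processing_time)


# Hypothetical dates; the times of day match the slide.
event_time = datetime(2024, 11, 18, 22, 20)       # event happened at 10:20 PM
processing_time = datetime(2024, 11, 19, 8, 20)   # event processed at 8:20 AM

event_partition = hour_partition(event_time)         # partition by event time
landing_partition = hour_partition(processing_time)  # partition by processing time
late = is_late_arriving(event_time, processing_time)

print(event_partition, landing_partition, late)
```

Under these assumptions the event belongs to hour=22 by event time but lands in hour=8 by processing time, so any job that already processed hour=22 has missed it.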

Existing Solutions
Lookback window: Data accuracy, Data freshness, Cost efficiency
Ignoring late arriving data: Data accuracy, Data freshness, Cost efficiency

Incremental Processing
What is it
Incremental processing is an approach that processes data in batch, but only the new or changed data. It requires capturing incremental data changes and tracking their states (i.e., whether a change has been processed by a workflow or not).
11Efficient Incremental Processing with Netflix Maestro and Apa