Charting New Territory: LinkedIn's Early Bet on Flink Batch for Large-Scale Workloads
Venkat Sowrirajan | Staff Engineer at LinkedIn

About Me
Venkat Sowrirajan
- 10+ years building and scaling distributed data systems
- Unifying batch + stream at LinkedIn with Apache Flink
- Previously, core contributor to Spark's Push-Based Shuffle
- Passionate about solving complex infra challenges and contributing back to OSS

Today's journey
- Problem Context
- Building a Unified Stream-Batch Platform
- Flink-Batch architecture overview
- Scaling a Real-World Ads Use Case
- Results, Summary and Takeaways

01 Problem Context

Problem Context: Same Logic, Twice the Work
01 Split Ecosystem & Tooling
- Separate APIs & Semantics
- Inconsistent Data Semantics
02 Operational Overhead
- Dual Infra
- Debugging is Hard
03 Lack of Pipeline Reuse
- Code Duplication
- No Unified Testing or Versioning

02 Vision: Unified Stream & Batch Platform

Vision: Unified Stream & Batch Platform
Operations
- Easier deployment
- Testing
- Workload Management
Strategy
- One Engine, All Workloads
Benefits
- Consistent Logic & Tooling

03 Why Flink Batch?

Why Flink Batch?
01 Unified Engine
02 Efficient for Large-Scale Data
03 Rich SQL & API Ecosystem

Key Components of Flink Batch platform
Component         Technology
Engine            Flink
Connector         Iceberg+
Shuffle Service   Celeborn
Orchestrator      Airflow
Observability     Flink History Server+

Flink Batch Architecture at LinkedIn
[Architecture diagram: Flink Batch on Yarn Application Mode. Labels: User; User App (Flink SQL, UDF, Conf); Control Plane; Workflow Orchestration; Airflow; Batch Control Plane; Flink Client; Yarn Resource Manager; App Master (Job Manager, Resource Manager); TM; TM; Celeborn Shuffle Service; HDFS; Flink History Server]

04 Use Case - Scaling Ads Model Training Data Generation

Use case: Ads Model Training Data Generation
[Diagram: Impressions (Medium), Scores (Large), Clicks (Small) and Videos (Small) are combined through one Inner Join and two Left Outer Joins to produce Ad Training Data]
- Reads/Writes - Iceberg tables
- Joins 4 tables I