© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.

ANT336
Enterprise-scale ETL optimization for Apache Spark
Giovanni Matteo Fumarola (he/him), Sr. Manager, Software Dev., AWS
Vivek Shrivastava (he/him), Principal Delivery Architect, ProServe, AWS

Agenda
Introduction
ETL Common Challenges
Spark on AWS
Spark on AWS Security
Spark on AWS Unified
Spark on AWS Performance
Conclusion

Today's businesses move fast. Data processing shouldn't slow you down.

Spark-Based ETL Landscape in AWS
AWS Service | Component
Amazon S3 | Storage layer
AWS Glue Data Catalog | Metadata management
AWS Lake Formation | Access control
Amazon Athena | Interactive analysis
Amazon EMR | Data processing
AWS Glue | Data processing

Challenges in Today's ETL
Inconsistent Governance & Control: manual enforcement; fragmented data; data leaks
Inefficient Data Access and Performance: metadata bottlenecks & inefficient reads/writes; redundant data scans; complex schema evolution & encryption
Fragmented Spark Ecosystem Across Services: versioning problems; compatibility issues; duplication of logic and maintenance overhead

Imagine an ETL Platform That's Secure, Unified, and Fast
Optimized Performance with S3: What if your jobs ran faster, read smarter, and wrote efficiently to S3?
Unified Spark Engine: What if one Spark engine powered Glue, EMR, and Athena alike?
Governance Built Into Spark
What if