Explore the New Functionality of Apache Spark 3.5
Data+AI Summit 2024
Daniel Tenedorio (dtenedor)
2024 Databricks Inc. All rights reserved.

Transforming and Querying Data for Everyone!

100+ Data Sources
1+ Billion Annual Downloads
100K+ Stack Overflow Questions
41K+ Commits
3700+ GitHub Contributors

Still #1 in developer activity for over ten years!
3,700 contributors, 41,000 commits

About Us
Daniel Tenedorio (GitHub: dtenedor)
Wenchen Fan (GitHub: cloud-fan)
Xiao Li (GitHub: gatorsmile)
The Spark team at Databricks

Agenda
Spark Connect: Deploy and update Spark clusters independently from their clients
SQL Features: HyperLogLog aggregates based on Apache Datasketches, array manipulation functions, IDENTIFIER clause, and more
PySpark Features: Arrow-optimized Python UDFs, Python UDTFs, new testing API, improved error messages, and more
Spark Streaming: Support for multiple stateful operators, checkpointing for RocksDB state store, dropDuplicatesWithinWatermark

Spark Connect

How to embed Spark in applications?
Up until Spark Connect: Hard to support today's developer experience requirements
Clients: Applications, IDEs/Notebooks, Programming Languages/SDKs
Limitations: No JVM InterOp, Close to REPL, SQL only
Spark's Monolith Driver: Application Logic, Analyzer, Optimizer, Scheduler, Distributed Execution Engine
Modern data application

Spark Connect General Availability
Thin client, with full power of Apache Spark
Spark Connect Client API
Spark's Driver: Application Gateway, Analyzer, Optimizer, Scheduler, Distributed Execution Engine
Applications, IDEs/