What's Next for the Upcoming Apache Spark 4.0 Release?
Xiao Li (gatorsmile), Wenchen Fan (cloud-fan)
Data+AI Summit 2024
© 2024 Databricks Inc. All rights reserved.

Major Features
[Slide diagram: Apache Spark 4.0 features grouped under Python, Streaming, SQL, and More; some items are labeled GA or WIP on the slide.]
Spark Connect, PySpark UDF Unified Profiling, State Data Source Reader, Variant Data Types, Spark K8S operator, Stored Procedures, Structured Logging, ANSI Mode, Arbitrary Stateful Processing V2, New Streaming Doc, Streaming Python Data Sources, pandas 2 API parity, Arrow optimized Python UDF, XML Connectors, Java 21, Error Class Enhancements, Polymorphic Python UDTF, UDF-level Dependency Control, Collation Support, Python Data Source APIs, applyInArrow, DF.toArrow, SQL UDF/UDTF, Execute Immediate, View Evolution

Agenda
New Functionalities: Spark Connect, ANSI Mode, Arbitrary Stateful Processing V2, Collation Support, Variant Data Types, pandas 2.x Support
Extensions: Python Data Source APIs, XML/Databricks Connectors and DSV2 Extension, Delta 4.0
Custom Functions and Procedures: SQL UDFs, SQL Scripting, Python UDTF, Arrow optimized Python UDF, PySpark UDF Unified Profiler
Usability: Structured Logging Framework, Error Class Framework, Behavior Change Process

Spark Connect

How to embed Spark in applications?
Up until Spark 3.4: hard to support today's developer experience requirements.
Modern data application: Applications, IDEs/Notebooks, Programming Languages/SDKs
Limitations: No JVM InterOp, Close to REPL, SQL only
Spark's Monolith Driver: Application Logic, Analyzer, Optimizer, Scheduler, Distributed Execution Engine

Connect to Spark from Any App
Thin client, with full power of Apache Spark
Spark Connect Client API
Spark's Driver: Application Gateway, Analyzer, Optimizer, Scheduler, Distributed Execution Engine
Applicatio
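
To make the thin-client idea concrete, here is a minimal PySpark sketch of an application connecting to a remote Spark driver through Spark Connect. It is an illustration rather than code from the talk. The helper names `connect_url` and `run_example` are hypothetical, and the host and port are assumptions (15002 is the default Spark Connect port). `SparkSession.builder.remote` is the PySpark entry point for Spark Connect, available since Spark 3.4.

```python
def connect_url(host: str = "localhost", port: int = 15002) -> str:
    """Build a Spark Connect URL (hypothetical helper; host and port are assumptions)."""
    return f"sc://{host}:{port}"


def run_example(url: str) -> None:
    """Run a small query through a Spark Connect thin client.

    Assumes pyspark with Spark Connect support is installed and that a
    Spark Connect server is reachable at `url`.
    """
    # Imported lazily so the module still loads where pyspark is not installed.
    from pyspark.sql import SparkSession

    # The client only builds the query plan; analysis, optimization,
    # scheduling, and execution happen in the remote Spark driver.
    spark = SparkSession.builder.remote(url).getOrCreate()
    try:
        df = spark.range(10).filter("id % 2 = 0")
        df.show()
    finally:
        spark.stop()
```

With a Spark Connect server running, calling `run_example(connect_url())` would execute the query on the server and print the result in the client process.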