Introduction to Apache Gluten (3740 - Apache Gluten 简介.pdf, 31 pages)
State of the Union: Apache Gluten (incubator)
Yuan Zhou, Binwei Yang
IBM
Oct/2025
GitHub: https:/

Plugin to Turbocharge Spark SQL Performance
[Figure: inside a Spark task thread, the Gluten plugin checks each operator (Operator 1, Operator 2, Operator 3) with "Op. is native?". If yes, the operator runs in the native library through JNI bindings (Velox, Apache Arrow compute engine, ClickHouse, or a third-party engine). If no, it falls back to the JVM SQL engine (operators, expressions, JIT, whole-stage code gen).]
[Figure: Spark cluster with a driver node and worker nodes; each executor runs tasks and a block manager.]
Spark Scale Out Framework + Optimal Native Library

Community (as of 10/7/2025)
- Incubating in Apache. TLP graduation in 2025 (TBD)
- 170 contributors from 40 companies, 580 forks
- 34 committers, including 23 PMC members from 8 companies
- 4 Velox maintainers
- 1.5k stars and growing
- 7k+ PRs, 3k+ issues
- WeChat group: Gluten 使用者社区 (Gluten user community) (415)
- ASF channel: incubator-gluten (134)

Adoption
- Key Spark providers on cloud: Ali Cloud, Tencent, MSFT Fabric, Google Lightning Engine, IBM WX.data
- Tens of PRC companies
- US: Pinterest (Zaheen), Uber (Arnav)
- More on the way

Coverage
- 31 operators of 37 required by customers
- 268 functions of 289 required by customers
- All data types
- Spark 3.2, 3.3, 3.4, 3.5, 4.0 (experimental)
- File formats: Parquet, ORC, CSV
- Data lake (native support): Iceberg (read, write), Delta Lake (read, PoC for write), COW on Hudi
- Spill

Gluten + Velox Backend Journey
[Timeline figure; year markers: 2019-2020, 2021, 2022, 2023, 2024, 2025]
- 2019-2020: Gazelle development
- 2021 Dec.: Gluten GitHub repo setup
- 2022 Jun.: TPCH passed, 1.61x boost
- 2022 Sep.: TPCDS passed
- 2022 Dec.: TPCH 2.34x, TPCDS 1.35x
- 2023 Apr.: Spark 3.2/3.3 UT passed; Decimal added; TPCH 2.81x, TPCDS 2.1x; 0.5 beta release
- 2023 Jul.: 1.0 release; TPCH 3.18x, TPCDS 2.67x; VCPKG
- 2023 Oct.: Switch to upstream Velox; Spark 3.4 passes TPCH/TPCDS
- 2024 Mar.: Contributed to Apache; Spark 3.4 UT passed; TPCH 3.23x, TPCDS 2.73x; 229 functions, 29 operators
- 2024 Dec.: Data lake support; spill support; UDF framework
- 2025 Oct.: Ph