Memory Management in Apache Gluten
Hongze Zhang / IBM

Gluten Overview

[Diagram labels: Spark Driver; Spark Executor (x3); Spark Logical Plan; Spark Physical Plan; Gluten Physical Plan; Gluten Partitions (x3); Gluten Plugin. Gluten components: Memory Manager, Plan Conversion, Columnar Shuffle, Shim Layer, Fallback, Metric.]

Memory Management Overview

[Diagram labels: Spark Unified Memory Manager; Execution Memory Pool (Off-Heap, On-Heap); Storage Memory Pool (Off-Heap, On-Heap); Velox Execution; Velox IO; Shuffle Buffers; Arrow; Velox Broadcast (Experimental); Velox Broadcast; Spark Broadcast; Shuffle Blocks; Cache; Spark Execution; Velox Global.]

Key Takeaways
- The execution/storage ratio is dynamically adjusted.
- The off-heap/on-heap ratio is fixed.
- All task-wise memory allocations go to the off-heap memory pool.
- All process-wise memory allocations go to the on-heap memory pool.

Memory Consumer Hierarchy

[Diagram labels: Spark Unified Memory Manager; Spark Memory Consumer; Gluten Memory Consumer; Child Arrow Consumer; Java Arrow Allocator; Child Native Consumer; C++ Arrow Allocator; C++ Velox Arbitrator; JNI Bridge; Java side; C++ side.]

Key Takeaways
1. All off-heap allocations are reported.
2. JNI call overhead is minimized.
3. Observability.
4. Task-wise leak detection.

Observability

[Figure: OOM Error Message, listing Peak Usages and Current Usages for Velox, Shuffle, C2R, Vanilla Spark, and Arrow.]

Leak Detection

Gluten checks for memory leaks when:
- every Velox query ends;
- every native task ends (C2R, R2C, Shuffle, etc.);
- every Spark task ends.

Leak detection is also integrated with the Spark configuration spark.unsafe.exceptionOnMemoryLeak.

Leak Prevention

[Diagram labels: C++ Objects (Executor); JNI API; JNI Runtime; Velox Query; Shuffle; C2R; R2C; JNI Memory Manager.]

Key Takeaways
1. Java code is required to create a JNI runtime with a JNI memory manager before making JNI invocations.
2. Memory leak detection is performed when the JNI runtime is closed.
3. Gluten will automatically close the managed resources w
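
The consumer hierarchy and the close-time leak check described in the slides can be illustrated with a small, self-contained Java sketch. It is not Gluten or Spark code: TaskMemoryPool, TrackedConsumer, and LeakDetectionSketch are hypothetical names introduced here, the pool is a plain counter rather than Spark's unified memory manager, and no JNI is involved. It only shows the general idea: child consumers report every allocation to their ancestors, current and peak usage are tracked per consumer, and closing the root reports any bytes that were never released, throwing when a flag in the spirit of spark.unsafe.exceptionOnMemoryLeak is set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these classes are Gluten or Spark APIs.
class LeakDetectionSketch {
    public static void main(String[] args) {
        // A 1024-byte pool that throws on leaks, similar in spirit to
        // setting spark.unsafe.exceptionOnMemoryLeak=true.
        TaskMemoryPool pool = new TaskMemoryPool(1024, true);
        TrackedConsumer root = new TrackedConsumer("gluten", null, pool);
        TrackedConsumer arrow = root.newChild("arrow");
        TrackedConsumer velox = root.newChild("velox");

        arrow.acquire(256);
        velox.acquire(512);
        arrow.release(256);
        System.out.println(root.describe());

        try {
            root.close(); // velox never released its 512 bytes
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}

/** Stand-in for the fixed-size off-heap pool available to one task. */
class TaskMemoryPool {
    private final long capacity;
    private final boolean exceptionOnLeak;
    private long used;

    TaskMemoryPool(long capacity, boolean exceptionOnLeak) {
        this.capacity = capacity;
        this.exceptionOnLeak = exceptionOnLeak;
    }

    void reserve(long bytes) {
        if (used + bytes > capacity) {
            throw new IllegalStateException(
                "pool exhausted: requested " + bytes + ", free " + (capacity - used));
        }
        used += bytes;
    }

    void free(long bytes) { used -= bytes; }
    long used() { return used; }
    boolean exceptionOnLeak() { return exceptionOnLeak; }
}

/** A node in the consumer tree; every allocation is reported to all ancestors. */
class TrackedConsumer implements AutoCloseable {
    private final String name;
    private final TrackedConsumer parent;
    private final TaskMemoryPool pool;
    private final List<TrackedConsumer> children = new ArrayList<>();
    private long current;
    private long peak;

    TrackedConsumer(String name, TrackedConsumer parent, TaskMemoryPool pool) {
        this.name = name;
        this.parent = parent;
        this.pool = pool;
    }

    TrackedConsumer newChild(String childName) {
        TrackedConsumer child = new TrackedConsumer(childName, this, pool);
        children.add(child);
        return child;
    }

    void acquire(long bytes) {
        // Reserve at the root first so a failed reservation changes no counters.
        if (parent == null) pool.reserve(bytes); else parent.acquire(bytes);
        current += bytes;
        peak = Math.max(peak, current);
    }

    void release(long bytes) {
        if (bytes > current) {
            throw new IllegalArgumentException(name + " released more than it holds");
        }
        current -= bytes;
        if (parent == null) pool.free(bytes); else parent.release(bytes);
    }

    long current() { return current; }
    long peak() { return peak; }

    /** Current and peak usage of this consumer and all of its descendants. */
    String describe() {
        StringBuilder sb = new StringBuilder(name)
            .append("(current=").append(current).append(", peak=").append(peak).append(")");
        for (TrackedConsumer c : children) sb.append(" ").append(c.describe());
        return sb.toString();
    }

    @Override
    public void close() {
        long leaked = current;
        if (leaked == 0) return;
        String report = describe(); // capture usage before resetting it
        release(leaked);            // hand the leaked bytes back to the pool
        resetChildren();
        String message = leaked + " bytes leaked: " + report;
        if (pool.exceptionOnLeak()) throw new IllegalStateException(message);
        System.err.println("WARN " + message);
    }

    private void resetChildren() {
        for (TrackedConsumer c : children) { c.current = 0; c.resetChildren(); }
    }
}
```

Because TrackedConsumer implements AutoCloseable, the root can also be opened in a try-with-resources block so the leak check runs even when the task body throws. Passing false as the second TaskMemoryPool argument turns the exception into a warning on standard error.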