WatsonX.Data Native Iceberg Support
Unifying Presto, Velox and Iceberg for fast, transactional lakehouse analytics.
Ying Su
TechXchange, Oct. 6, 2025

Our Story Today
Why: AI ecosystem & data lakehouse challenges.
What: Iceberg V1 to V3 evolution, and how Presto/Gluten/Velox solve these challenges.
How: Engineering details, performance work, and future roadmap.

AI Ecosystem Anatomy
A modern AI ecosystem is a layered environment that brings together data, compute, and model intelligence.

User & Application Layer
LLM frontends and notebooks: Databricks Notebooks, Jupyter, and ChatGPT for interacting with data and models.
Serving & inference systems: Ray Serve, Triton Inference Server, and vLLM for scalable AI application backends.

AI/ML Infrastructure
Training frameworks: PyTorch and TensorFlow for deep learning.
Model orchestration: MLflow, Kubeflow, Ray Train, and Databricks Model Serving for experiment tracking and deployment.
Vector databases: Milvus, Pinecone, and Databricks Vector Search for storing embeddings used in retrieval-augmented generation (RAG), e.g., storing embeddings and performing similarity search for context retrieval.

Data Processing & Query Engines
Batch and streaming frameworks like Apache Spark, Flink, and Presto for ETL, data transformation, and SQL analytics.
Vectorized execution engine libraries like Velox for efficient in-memory compute.

Data Foundation (Storage & Management)
Data lakes/lakehouses (e.g., Delta Lake, Iceberg, Hudi) for large-scale structured and unstructured data.
Data warehouses (e.g., Snowflake, BigQuery) for curated, analytics-ready data.
Feature stores (e.g., Feast, Databricks Feature Store) for managing machine learning features.

Lakehouse SQL engines like Presto and Gluten/Spark are the bridge between data and AI.

Workloads in Model Training Pipelines
Stage | Example Workloads | Common Tools/Engines
Data Ingestion & Preparation | Collect, clean, transfor