Speed and Scale Architecture (速度与规模架构.pptx, 45 pages)
Navigating Data Harmony by Exploring the Power of Apache Iceberg
Zoe Steinkamp

Agenda
- Introduction to Apache Iceberg
- Why it was built + How it works
- Key Benefits of Apache Iceberg
- Migration + Integrations
- Use Cases
- Why InfluxDB is using Iceberg
- Resources

Introduction to Apache Iceberg
Apache Iceberg, an open-source data table format, revolutionizes data management by addressing the inefficiencies of traditional catalogs, improving query performance, and reducing storage costs. It supports ACID transactions, time travel, and SQL-like operations, and integrates seamlessly with frameworks such as Apache Spark and Apache Flink, making it ideal for large-scale data lakes.

What Iceberg is and is not
Iceberg is:
- A table format specification
- APIs and libraries for interacting with that specification
Iceberg is not:
- A storage engine
- An execution engine (for query/compute)
- A service

When Iceberg is not the right fit
- Small datasets: using Iceberg for a small dataset that doesn't necessitate a data lake might be excessive.
- Real-time data ingestion: out of the box, Apache Iceberg does not support real-time data ingestion, because it relies on batch processing.

Why it was built
Case Study: Netflix Atlas Performance
- Hive table with Parquet filters: 400k+ splits per day, not combined; Explain query: 9.6 minutes (planning time)
- Iceberg table with partition data filtering: 15,218 splits, combined; 13 min (wall time) / 10 sec (planning)
- Iceberg table with partition and min/max filtering: 412 splits; 42 sec (wall time) / 25 sec (planning)

How it works
Iceberg Table Format
- Metadata is stored as files in object storage, just like data files.
- Read performance scales with low CPU cost.
- Hierarchical data statistics allow execution engines to efficiently prune metadata and data files.

Catalog
The catalog is the storehouse for the current metadata pointer of each table. Multiple catalog backends exist
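The "hierarchical data statistics" point, and the drop in splits between the two Iceberg rows of the Netflix case study, can be illustrated with a toy query planner. This is a minimal sketch under stated assumptions, not Iceberg's real manifest format or API: the `DataFile` and `Manifest` classes, the `day`/`ts` columns, and `plan_scan` are hypothetical. It loosely mirrors Iceberg's two metadata levels: a partition summary per manifest lets the planner skip whole manifests, and per-file min/max column bounds let it skip individual data files.

```python
# Toy model of two-level pruning; all names are hypothetical, not Iceberg's API.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class DataFile:
    path: str
    day: str     # partition value (ISO date, so string order == date order)
    min_ts: int  # lower bound of column `ts` in this file
    max_ts: int  # upper bound of column `ts` in this file


@dataclass
class Manifest:
    min_day: str  # partition summary: smallest `day` among this manifest's files
    max_day: str  # partition summary: largest `day` among this manifest's files
    files: List[DataFile] = field(default_factory=list)


def plan_scan(manifests: List[Manifest], day: str, ts_lo: int, ts_hi: int) -> Tuple[int, List[DataFile]]:
    """Plan `WHERE day = :day AND ts BETWEEN :ts_lo AND :ts_hi`.

    Returns (number of manifests opened, data files that must be scanned).
    """
    opened = 0
    selected: List[DataFile] = []
    for m in manifests:
        # Level 1: skip the whole manifest if its partition summary rules out `day`.
        if not (m.min_day <= day <= m.max_day):
            continue
        opened += 1
        for f in m.files:
            # Level 2a: partition filter on the file's partition value.
            if f.day != day:
                continue
            # Level 2b: min/max filter; skip only if no row can fall in [ts_lo, ts_hi].
            if f.max_ts < ts_lo or f.min_ts > ts_hi:
                continue
            selected.append(f)
    return opened, selected


manifests = [
    Manifest("2024-01-01", "2024-01-01", [
        DataFile("a.parquet", "2024-01-01", 0, 99),
        DataFile("b.parquet", "2024-01-01", 100, 199),
        DataFile("c.parquet", "2024-01-01", 200, 299),
    ]),
    Manifest("2024-01-02", "2024-01-03", [
        DataFile("d.parquet", "2024-01-02", 0, 99),
        DataFile("e.parquet", "2024-01-03", 100, 199),
    ]),
]

opened, files = plan_scan(manifests, "2024-01-01", 120, 180)
print(opened, [f.path for f in files])  # 1 ['b.parquet']
```

Partition filtering alone would still leave all three files for `2024-01-01`; adding the min/max check narrows the scan to one. That is the same kind of reduction the case study reports between partition-only filtering (15,218 splits) and partition plus min/max filtering (412 splits), though this toy says nothing about real magnitudes.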
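The catalog slide's idea, one per-table pointer to the current metadata file, can be sketched as a compare-and-swap. This is a hypothetical in-memory model, not a real Iceberg catalog API: `InMemoryCatalog`, `TableMetadata`, `append`, `new_location`, and the file-naming scheme are invented for illustration. It assumes, as in Iceberg's optimistic commits, that a writer first creates a new immutable metadata file and then atomically moves the pointer only if it still names the version the writer started from. Older snapshots stay listed in the metadata, which is what makes time travel possible.

```python
# Hypothetical toy catalog; not a real Iceberg catalog implementation or API.
import threading
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TableMetadata:
    """Immutable stand-in for a metadata file; each commit writes a new one."""
    location: str
    # snapshots[i] = data files visible in snapshot i (oldest first)
    snapshots: Tuple[Tuple[str, ...], ...]


class InMemoryCatalog:
    """Maps table name -> location of its current metadata file."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pointers: Dict[str, str] = {}
        self._files: Dict[str, TableMetadata] = {}  # stands in for object storage

    def current(self, table: str) -> TableMetadata:
        return self._files[self._pointers[table]]

    def commit(self, table: str, expected: Optional[str], new: TableMetadata) -> bool:
        # Write the new metadata file first; it stays invisible until the pointer moves.
        self._files[new.location] = new
        with self._lock:
            # Compare-and-swap: succeed only if the pointer still names the version we read.
            if self._pointers.get(table) != expected:
                return False  # another writer won; caller should reload and retry
            self._pointers[table] = new.location
            return True


def new_location(version: int) -> str:
    # Unique name so a losing writer can never overwrite a winner's file.
    return f"metadata/{version:05d}-{uuid.uuid4()}.metadata.json"


def append(catalog: InMemoryCatalog, table: str, data_file: str) -> bool:
    """Add one data file as a new snapshot, based on the current metadata."""
    base = catalog.current(table)
    snapshot = base.snapshots[-1] + (data_file,)
    new = TableMetadata(new_location(len(base.snapshots) + 1), base.snapshots + (snapshot,))
    return catalog.commit(table, base.location, new)


catalog = InMemoryCatalog()
catalog.commit("events", None, TableMetadata(new_location(1), (("a.parquet",),)))
v1 = catalog.current("events").location
append(catalog, "events", "b.parquet")

print(catalog.current("events").snapshots[-1])  # ('a.parquet', 'b.parquet')
print(catalog.current("events").snapshots[0])   # ('a.parquet',)  <- time travel
# A writer still holding the v1 pointer loses the race:
print(catalog.commit("events", v1, TableMetadata(new_location(2), ((),))))  # False
```

Because readers always follow the pointer, they see either the old or the new metadata file, never a half-written table; the losing writer simply reloads the current metadata and retries.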