Floating-Point Time Series Data Compression
Speaker: Ruiyuan Li, Associate Professor
Spatio-Temporal Lab, Chongqing University
July 2025

Floating-Point Compression is Crucial
- Explosion of floating-point time series data in a streaming fashion
  - 10k+ records/s
  - 500 GB data/flight
  - 1T trajectories/day
- Efficient, compact, error-bounded/lossless floating-point compression is vital
  - Reduce bandwidth/storage cost, improve transmission/query efficiency
  - Avoid much information loss (e.g., small errors lead to big problems)
- Application scenarios: Network Transmission, Data Management, Scientific Calculation

Complex Floating-Point Layout
IEEE 754 Double-Precision Floating-Point Layout (exponent bits $e_1, \dots, e_{11}$; mantissa bits $m_1, \dots, m_{52}$)
- Normal Numbers: $\exists i, j \in [1, 11]$, $e_i = 0$ and $e_j = 1$
- Zero: $\forall i \in [1, 11]$, $e_i = 0$ and $\forall j \in [1, 52]$, $m_j = 0$
- Infinity: $\forall i \in [1, 11]$, $e_i = 1$ and $\forall j \in [1, 52]$, $m_j = 0$
- NaN: $\forall i \in [1, 11]$, $e_i = 1$ and $\exists j \in [1, 52]$, $m_j = 1$
- Subnormal Number: $\forall i \in [1, 11]$, $e_i = 0$ and $\exists j \in [1, 52]$, $m_j = 1$
Figure labels: Normal Numbers, Subnormal Numbers
Annotations: "Focus on these two types"; "Can be easily extended to these three types"

Classification of Floating-Point Compression
Minghui Zhu, Zheng Li, Ruiyuan Li*, Chao Chen, Yu Zheng. A Survey of Floating-Point Time Series Data Compression. Journal of Software, 2025.
Floating-Point Compression
- Type: Lossless, Error-Bounded
- Mode: Streaming, Batched
- Application: Network Transmission, Data Management

Framework of Our Works
Floating-Point Compression
- Type: Lossless, Error-Bounded
- Mode: Streaming, Batched
- Our Work

Elf (VLDB 2023)
Ruiyuan Li, Zheng Li, Yi Wu, Chao Chen, Yu Zheng. Elf: Erasing-based Lossless Floating-Point Compression. In The 49th International Conference on Very Large Data Bases, PVLDB, 16(7):1763-1776, 2023. (VLDB 2023, CCF A)

Shortcomings of Existing Solutions
Existing solutions:
- General-purpose methods, e.g., Gzip, Zstd, Xz, Snappy: batched, low efficiency
- Lossy floating-point methods, e.g., ZFP, MDZ: lose some information
- Lossless floating-point methods, e.g., Gorilla, Chimp: unsatisfactory effects

Lossless XORing-Based Time Series Compression Method
- Compression: $v_t \oplus v_{t-1} = xor_t$, then encode $xor_t$ (e.g., #lead + #trail + center bits)
- Decompression: decode $xor_t$, then $xor_t \oplus v_{t-1} = v_t$
- $v_t$ and $v_{t-1}$ vary little, so #lead is large. B