Building a Modern Search & Analytics Database on Top of RocksDB
Igor Canadi
Founding Engineer, Rockset

Apps
  Streaming records + deltas
  Search queries
  Vector search
  Real-time analytics
  SQL
  ?

Rockset
  Search and analytics SQL database
  Real-time
  Cloud-native
  Optimized for applications

RocksDB
  Key-value store based on Log-Structured Merge Trees
  Open sourced by Facebook in 2013
  Wide adoption across the industry
  Not typically the backend for analytics engines

Selected Topics
  Cloud-Native Design
    RocksDB Replication
    Shared Hot Storage
  Analytics on Top of RocksDB

Cloud-Native Design
  Apps
  Streaming records + deltas
  Search queries
  Vector search
  Real-time analytics
  SQL

1. Sharding: scalability

Sharding Choices and Tradeoffs
  [Decision diagram; branch layout not recoverable from the extraction]
  Questions (each with yes/no branches):
    Value-dependent mapping?
    Data + indexes together?
    Can documents change?
  Outcome notes, in slide order:
    Opportunities for larger read I/Os
    Coordination overheads balloon as ingest latency drops
    Unable to support most search and analytics apps
  Doc sharding
    Small read I/Os
    Efficient streaming ingest
    Consistent indexes

Clustering vs Doc Sharding
  [Figure: Clustering | Doc Sharding]

1. Doc Sharding: Scalability + Streaming Ingest

2. Isolation Between Ingest and Query Work

2. Post-Ingest Replication: Isolation + Elasticity
  [Figure: App A, App B]

3. Disaggregated Storage: Efficiency + Elasticity
  [Figure: App A, App B]

What Technology for Disaggregated Storage?
  Cold (AWS S3)
    Cheapest $/GB
    Highly durable
    Built-in RPC API
    High/unpredictable latency
    Expensive $/IOPS
  Hot (EBS or NVMe)
    Cheapest $/IOPS
    Low latency
    Build/run your own RPC service
    More expensive $/GB

Cloud-Native Search + Analytics
1. Doc sharding with indexes
    Converged indexing
    Scalability
    Streaming ingest
2. Post-ingest replication
    Compute:compute separation
    Isolation
    Compute elasticity
3.