A curated paper list of awesome Online Analytical Processing database systems, theory, frameworks, resources, tools and other awesomeness, for database researchers/engineers.
The repository is under construction. New PRs are welcome; please follow the committed rules:
paperName (with PDF link) (alias) [MeetingName Year] GitHub link if the code is open-sourced (optional)
Thanks to all authors of the papers/repositories I cite :D
- Awesome-OLAP-Paper
- Introduction
- Contributing
- Acknowledge
- Table of Content
- Query-Aware Database Generation
- Query Schedule
- Query Optimization
- Query Execution
- Data Dependency Search
- Query Compilation
- Bugs Detection
- Storage
- Proxy
- Data Transfer
- Data Loading
- Database Kernel
- Others
- Star History
- QAGen: Generating Query-Aware Test Databases [SIGMOD 07]
- Generating Targeted Queries for Database Testing [SIGMOD 08]
- Generating Databases for Query Workloads [VLDB 10]
- Data Generation using Declarative Constraints [SIGMOD 11]
- MyBenchmark: generating databases for query workloads [VLDB 14]
- Scalable and Dynamic Regeneration of Big Data Volumes [EDBT 18]
- Touchstone: Generating Enormous Query-Aware Test Databases [ATC 18]
- Synthesizing Linked Data Under Cardinality and Integrity Constraints [SIGMOD 21]
- Projection-Compliant Database Generation [VLDB 22]
- SAM: Database Generation from Query Workloads with Supervised Autoregressive Models [SIGMOD 22]
- Mirage: Generating Enormous Databases for Complex Workloads [ICDE 24]
- Query Aware Database Generation for Match Operators [DASFAA 24]
- Controllable Tabular Data Synthesis Using Diffusion Models [SIGMOD 24]
- A Query-Aware Enormous Database Generator For System Performance Evaluation [SIGMOD 25]
- PrivSyn: Differentially Private Data Synthesis [ATC 21]
- Synthesizing Linked Data Under Cardinality and Integrity Constraints [SIGMOD 21]
- Data Synthesis via Differentially Private Markov Random Fields [VLDB 21]
- PrivLava: Synthesizing Relational Data with Foreign Keys under Differential Privacy [SIGMOD 23]
- Privacy-Enhanced Database Synthesis for Benchmark Publishing [VLDB 25]
- Self-Tuning Query Scheduling for Analytical Workloads [SIGMOD 21]
- Memory Efficient Scheduling of Query Pipeline Execution [CIDR 22]
- LSched: A Workload-Aware Learned Query Scheduler for Analytical Database Systems [SIGMOD 22]
- Rotary: A Resource Arbitration Framework for Progressive Iterative Analytics [ICDE 23]
- Laser: Buffer-Aware Learned Query Scheduling in Master-Standby Databases [VLDB 25]
- Improving DBMS Scheduling Decisions with Accurate Performance Prediction on Concurrent Queries [VLDB 25]
- Sampling-Based Query Re-Optimization [SIGMOD 16]
- Leveraging Re-costing for Online Optimization of Parameterized Queries with Guarantees [SIGMOD 17]
- Adaptive Optimization of Very Large Join Queries [SIGMOD 18]
- Efficient Massively Parallel Join Optimization for Large Queries [SIGMOD 22]
- Leveraging Query Logs and Machine Learning for Parametric Query Optimization [VLDB 22]
- Rethink Query Optimization in HTAP Databases [SIGMOD 24]
- SPQO: Learning to Safely Reuse Cached Plans for Dynamic Workloads [DASFAA 24]
- Optimizing Nested Recursive Queries [SIGMOD 24]
- Efficient Enumeration of Recursive Plans in Transformation-based Query Optimizers [VLDB 24]
- Presto's History-based Query Optimizer [VLDB 24]
- RankPQO: Learning-to-Rank for Parametric Query Optimization [VLDB 25]
- Robust query processing through progressive optimization [SIGMOD 04]
- Robust Query Optimization Methods With Respect to Estimation Errors: A Survey [SIGMOD 15]
- Efficient Query Re-optimization with Judicious Subquery Selections [SIGMOD 23]
- ROME: Robust Query Optimization via Parallel Multi-Plan Execution [SIGMOD 24]
- QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting [VLDB 23]
- SlabCity: Whole-Query Optimization using Program Synthesis [VLDB 23]
- GEqO: ML-Accelerated Semantic Equivalence Detection [SIGMOD 24]
- Proving Query Equivalence Using Linear Integer Arithmetic [SIGMOD 24]
- QED: A Powerful Query Equivalence Decider for SQL [VLDB 24]
- VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints [OOPSLA 24]
- PoneglyphDB: Efficient Non-interactive Zero-Knowledge Proofs for Arbitrary SQL-Query Verification [SIGMOD 25]
- Query Weak Equivalence and Its Verification in Analytical Databases [ICDE 25]
- Proving Cypher Query Equivalence [ICDE 25]
- Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries [None 87]
- Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results [ACM Transactions on Database Systems 93]
- Selectivity Estimation Without the Attribute Value Independence Assumption (MHIST) [SIGMOD 97]
- On Rectangular Partitionings in Two Dimensions: Algorithms, Complexity, and Applications [ICDT 99]
- Approximating multi-dimensional aggregate range queries over real attributes (GENHIST) [SIGMOD 00]
- Independence is good: Dependency-based histogram synopses for high-dimensional data (DBHist) [SIGMOD 01]
- STHoles: a multidimensional workload-aware histogram [SIGMOD 01]
- Selectivity Estimation using Probabilistic Models [SIGMOD 01]
- A multi-dimensional histogram for selectivity estimation and fast approximate query answering [CASCON 03]
- The history of histograms (abridged) [VLDB 03]
- SASH: A Self-Adaptive Histogram Set for Dynamically Changing Workloads [VLDB 03]
- Selectivity estimators for multidimensional range queries over real attributes (GENHIST) [VLDB 03]
- ISOMER: Consistent histogram construction using query feedback [ICDE 06]
- Join Over Histograms [Alberto Dell'Era 07]
- Consistent Histograms In The Presence of Distinct Value Counts [VLDB 08]
- Lightweight Graphical Models for Selectivity Estimation Without Independence Assumptions [VLDB 11]
- Efficiently adapting graphical models for selectivity estimation [VLDB 13]
- Improving Accuracy and Robustness of Self-Tuning Histograms by Subspace Clustering [TKDE 15]
- TKHist: Cardinality Estimation for Join Queries via Histograms with Dominant Attribute Correlation Finding [arXiv 25]
- Two-Level Sampling for Join Size Estimation [SIGMOD 17]
- Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing [SIGMOD 21]
- Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation [SIGMOD 15]
- Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models [VLDB 17]
- QuickSel: Quick Selectivity Learning with Mixture Models [SIGMOD 20]
- LHist: Towards Learning Multidimensional Histogram for Massive Spatial Data [ICDE 21]
- CoLSE: A Lightweight and Robust Hybrid Learned Model for Single-Table Cardinality Estimation using Joint CDF [ICDE 26]
- Access path selection in a relational database management system [SIGMOD 79]
- Plan Bouquets: Query Processing without Selectivity Estimation [SIGMOD 14]
- Exact Cardinality Query Optimization with Bounded Execution Cost [SIGMOD 19]
- JoinSketch: A Sketch Algorithm for Accurate and Unbiased Inner-Product Estimation [SIGMOD 23]
- Efficient and Effective Cardinality Estimation for Skyline Family [SIGMOD 23]
- Cardinality Estimation for Having-Clauses [VLDB 25]
- Faper: Join Tree with Uncertainty Awareness for Faster, More Precise and Robust Cardinality Estimation [PAKDD 25]
- Downsizing Diffusion Models for Cardinality Estimation [arXiv 25]
- Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches [A detailed book published in 2012]
- Preventing bad plans by bounding the impact of cardinality estimation errors [VLDB 09]
- Analyzing the Impact of Cardinality Estimation on Execution Plans in Microsoft SQL Server [VLDB 23]
- The Accuracy of Cardinality Estimators: Unraveling the Evaluation Result Conundrum [VLDB 25]
- Optimal Top-Down Join Enumeration [SIGMOD 07]
- A New, Highly Efficient, and Easy To Implement Top-Down Join Enumeration Algorithm [VLDB 11]
- Counter Strike: Generic Top-Down Join Enumeration for Hypergraphs [VLDB 13]
- Join Order Selection with Deep Reinforcement Learning: Fundamentals, Techniques, and Challenges [VLDB 23]
- Efficiently Computing Join Orders with Heuristic Search [SIGMOD 23]
- Ready to Leap (by Co-Design)? Join Order Optimisation on Quantum Hardware [SIGMOD 23]
- Quantum-Inspired Digital Annealing for Join Ordering [VLDB 24]
- POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance [VLDB 24]
- Sub-optimal Join Order Identification with L1-error [SIGMOD 24]
- DPconv: Super-Polynomially Faster Join Ordering [SIGMOD 25]
- Debunking the Myth of Join Ordering: Toward Robust SQL Analytics [SIGMOD 25]
- AJOSC: Adaptive join order selection for continuous queries on data streams [SIGMOD 25]
- Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems [VLDB 12]
- Leapfrog Triejoin: a worst-case optimal join algorithm [International Conference on Database Theory 12]
- Lightning Fast and Space Efficient Inequality Joins [VLDB 15]
- An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory [SIGMOD 16]
- Worst-Case Optimal Join Algorithms: Techniques, Results, and Open Problems [SIGMOD 18]
- Adopting Worst-Case Optimal Joins in Relational Database Systems [VLDB 20]
- Free Join: Unifying Worst-Case Optimal and Traditional Joins [arXiv 23]
- Reservoir Sampling over Joins [SIGMOD 24]
- Predicate Transfer: Efficient Pre-Filtering on Multi-Join Queries [CIDR 24]
- Efficiently Processing Joins and Grouped Aggregations on GPUs [SIGMOD 25]
- HoneyComb: A Parallel Worst-Case Optimal Join on Multicores [SIGMOD 25]
- SwiftSpatial: Spatial Joins on Modern Hardware [SIGMOD 25]
- Accelerate Distributed Joins with Predicate Transfer [SIGMOD 25]
- LEO – DB2’s LEarning Optimizer [VLDB 01]
- Predicting query execution time: are optimizer cost models really unusable? [ICDE 13]
- Towards Predicting Query Execution Time for Concurrent and Dynamic Database Workloads [VLDB 13]
- Forecasting the cost of processing multi-join queries via hashing for main-memory databases [SoCC 15]
- Query Performance Prediction for Concurrent Queries using Graph Embedding [VLDB 20]
- Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query Workload [arXiv 21]
- Rethinking Learned Cost Models: Why Start from Scratch? [SIGMOD 24]
- Cackle: Analytical Workload Cost and Performance Stability With Elastic Pools [SIGMOD 24]
- How Good Are Query Optimizers, Really? [VLDB 15]
- Cardinality Estimation: An Experimental Survey [VLDB 17]
- Query optimization through the looking glass, and what we found running the Join Order Benchmark [VLDBJ 17]
- A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration [VLDB 21]
- Have query optimizers hit the wall? [VLDBJ 22]
- Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation [VLDB 22]
- Data dependencies for query optimization: a survey [VLDBJ 22]
- Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis [VLDB 23]
- How to Optimize SQL Queries? A Comparison Between Split, Holistic, and Hybrid Approaches [VLDB 25]
- Workload Insights From The Snowflake Data Cloud: What Do Production Analytic Queries Really Look Like? [VLDB 25]
- SQL Server Column Store Indexes [SIGMOD 11]
- The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases [ICDE 13]
- Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation [SIGMOD 18]
- CUBIT: Concurrent Updatable Bitmap Indexing [VLDB 25]
- B-Trees Are Back: Engineering Fast and Pageable Node Layouts [SIGMOD 25]
- MonetDB/X100: Hyper-Pipelining Query Execution [CIDR 05]
- DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing [DaMoN 08]
- Materialization Strategies in the Vertica Analytic Database: Lessons Learned [ICDE 13]
- Adaptive Query Processing in the Looking Glass [CIDR 15]
- Rethinking SIMD Vectorization for In-Memory Databases [SIGMOD 15]
- Efficient Processing of Window Functions in Analytical SQL Queries [VLDB 15]
- Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe? [SIGMOD 17]
- Looking Ahead Makes Query Plans Robust [VLDB 17]
- Building Advanced SQL Analytics From Low-Level Plan Operators [SIGMOD 21]
- SkinnerMT: Parallelizing for Efficiency and Robustness in Adaptive Query Processing on Multicore Platforms [VLDB 22]
- ChainedFilter: Combining Membership Filters by Chain Rule [SIGMOD 24]
- Saving Money for Analytical Workloads in the Cloud [VLDB 24]
- Adaptive and Robust Query Execution for Lakehouses at Scale [VLDB 24]
- DuckDB-SGX2: The Good, The Bad and The Ugly within Confidential Analytical Query Processing [DaMoN 24]
- The Key to Effective UDF Optimization: Before Inlining, First Perform Outlining [VLDB 25]
- High-Performance Query Processing with NVMe Arrays: Spilling without Killing Performance [SIGMOD 25]
- Data Chunk Compaction in Vectorized Execution [SIGMOD 25]
- FAAQP: Fast and Accurate Approximate Query Processing based on Bitmap-augmented Sum-Product Network [SIGMOD 25]
- OLTP in the Cloud: Architectures, Tradeoffs, and Cost [VLDB 25]
- How to Architect a Query Compiler [SIGMOD 16]
- Adaptive Execution of Compiled Queries [ICDE 18]
- Search-Based Test Data Generation for SQL Queries [ICSE 18]
- Finding Bugs in Database Systems via Query Partitioning [OOPSLA 20]
- Detecting Optimization Bugs in Database Engines via Non-Optimizing Reference Engine Construction [FSE 20]
- Testing Database Engines via Query Plan Guidance [ICSE 23]
- GDsmith: Detecting Bugs in Cypher Graph Database Engines [ISSTA 23]
- Snowcat: Efficient Kernel Concurrency Testing using a Learned Coverage Predictor [SOSP 23]
- Detecting Isolation Bugs via Transaction Oracle Construction [ICSE 23]
- Detecting Logic Bugs of Join Optimizations in DBMS [SIGMOD 23 Best Paper]
- Fonte: Finding Bug Inducing Commits from Failures [ICSE 23]
- Detecting Metadata-Related Logic Bugs in Database Systems via Raw Database Construction [VLDB 24]
- CONI: Detecting Database Connector Bugs via State-Aware Test Case Generation [ICSE 24]
- WINGFUZZ: Implementing Continuous Fuzzing for DBMSs [ATC 24]
- Keep It Simple: Testing Databases via Differential Query Plans [SIGMOD 24]
- Plume: Efficient and Complete Black-Box Checking of Weak Isolation Levels [OOPSLA 24]
- DBStorm: Generating Various Effective Workloads for Testing Isolation Levels [ISSTA 24]
- SQLaser: Detecting DBMS Logic Bugs with Clause-Guided Fuzzing [arXiv 24]
- Understanding and Detecting SQL Function Bugs [EuroSys 25]
- Understanding and Reusing Test Suites Across Database Systems [SIGMOD 25]
- Detecting Logic Bugs in Database Engines via Equivalent Expression Transformation [ATC 24]
- THANOS: DBMS Bug Detection via Storage Engine Rotation Based Differential Testing [ICSE 25]
- Semantic Conformance Testing of Relational DBMS [VLDB 25]
- Automatic Database Configuration Debugging using Retrieval-Augmented Language Models [SIGMOD 25]
- Finding Logic Bugs in Spatial Database Engines via Affine Equivalent Input [SIGMOD 25]
- Constant Optimization Driven Database System Testing [SIGMOD 25]
- Blackbox Fuzzing of Distributed Systems with Multi-Dimensional Inputs and Symmetry-Based Feedback Pruning [NDSS 25]
- Finding Logic Bugs in Graph-processing Systems via Graph-cutting [SIGMOD 25]
- Model Checking Guided Incremental Testing for Distributed Systems [ISSTA 25]
- Scaling Automated Database System Testing [arXiv 25]
- Testing Database Systems with Large Language Model Synthesized Fragments [arXiv 25]
- Detecting Schema-Related Logic Bugs in Relational DBMSs via Equivalent Database Construction [VLDB 25]
- Simple Testing Can Expose Most Critical Transaction Bugs: Understanding and Detecting Write-Specific Serializability Violations in Database Systems [VLDB 25]
- Detecting Isolation Anomalies in Relational DBMSs [ISSTA 25]
- Vbox: Efficient Black-Box Serializability Verification [arXiv 25]
- Fucci: Database Transaction Fuzzing via Random Conflict Construction and Multilevel Constraint Solving [VLDB 25]
- DDLUMOS: Understanding and Detecting Atomic DDL Bugs in DBMSs [ATC 25]
- Detecting Logic Bugs in DBMSs via Equivalent Data Construction [SIGMOD 25]
- SRS: Detecting Logic Bugs of Join Implementation in DBMSs via Set Relation Synthesis [SIGMOD 25]
- ARG: Testing Query Rewriters via Abstract Rule Guided Fuzzing [ASE 25]
- Anomaly Pattern-guided Transaction Bug Testing in Relational Databases [SIGMOD 26]
- Sequence-Oriented DBMS Fuzzing [ICDE 23]
- DynSQL: Stateful Fuzzing for Database Management Systems with Complex and Valid SQL Query Generation [ATC 23]
- Sedar: Obtaining High-Quality Seeds for DBMS Fuzzing via Cross-DBMS SQL Transfer [ICSE 24]
- DepState: Detecting Synchronization Failure Bugs in Distributed Database Management Systems [ISSTA 25]
- Fawkes: Finding Data Durability Bugs in DBMSs via Recovered Data State Verification [SOSP 25]
- APOLLO: automatic detection and diagnosis of performance regressions in database systems [VLDB 19]
- CERT: Finding Performance Issues in Database Systems Through the Lens of Cardinality Estimation [ICSE 24]
- PUPPY: Finding Performance Degradation Bugs in DBMSs via Limited-Optimization Plan Construction [ICSE 25]
- Hulk: Exploring Data-Sensitive Performance Anomalies in DBMSs via Data-Driven Analysis [ISSTA 25]
- A Comprehensive Survey on Database Management System Fuzzing: Techniques, Taxonomy and Experimental Comparison [arXiv 23]
- Survey on Database Management System Fuzzing Techniques [Journal of Software 24]
- Substructure-aware Log Anomaly Detection [VLDB 25]
- From Logs to Causal Inference: Diagnosing Large Systems [VLDB 25]
- RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems [VLDB 25]
- OpDiag: Unveiling Database Performance Anomalies through Query Operator Attribution [TKDE 25]
- DBPecker: A Graph-Based Compound Anomaly Diagnosis System for Distributed RDBMSs [VLDB 25]
- Fault Localization via Fine-tuning Large Language Models with Mutation Generated Stack Traces [arXiv 25]
- What Modern NVMe Storage Can Do, And How To Exploit It: High-Performance I/O for High-Performance Storage Engines [VLDB 23]
- An Empirical Evaluation of Columnar Storage Formats [VLDB 24]
- Leco: Lightweight compression via learning serial correlations [SIGMOD 24]
- Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine [SIGMOD 24]
- NULLS! Revisiting Null Representation in Modern Columnar Formats [DaMoN 24]
- Boosting OLTP Performance with Per-Page Logging on NVDIMM [SIGMOD 25]
- Data formats in analytical DBMSs: performance trade-offs and future directions [VLDBJ 25]
- Data chunk compaction in vectorized execution [SIGMOD 25]
- Lance: Efficient Random Access in Columnar Storage through Adaptive Structural Encodings [arXiv 25]
- Anarchy in the Database: A Survey and Evaluation of Database Management System Extensibility [VLDB 25]
- F3: The Open-Source Data File Format for the Future [SIGMOD 26]
- Dissecting, Designing, and Optimizing LSM-based Data Stores [SIGMOD 22 Tutorial]
- Magma: A High Data Density Storage Engine Used in Couchbase [VLDB 22]
- CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage Disaggregated Infrastructure [SIGMOD 24]
- CAMAL: Optimizing LSM-trees via Active Learning [SIGMOD 25]
- Disco: A Compact Index for LSM-trees [SIGMOD 25]
- Randomized Sketches for Quantile in LSM-tree based Store [SIGMOD 25]
- Rethinking The Compaction Policies in LSM-trees [SIGMOD 25]
- DFlush: DPU-Offloaded Flush for Disaggregated LSM-based Key-Value Stores [SIGMOD 25]
- Rethinking LSM-tree based Key-Value Stores: A Survey [arXiv 25]
- How to Grow an LSM-tree? Towards Bridging the Gap Between Theory and Practice [SIGMOD 25]
- Parallel kd-tree with Batch Updates [SIGMOD 25]
- Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics [CIDR 21]
- Disaggregated Database Systems [VLDB 23 Tutorial]
- GPU Database Systems Characterization and Optimization [VLDB 24]
- The Art of Latency Hiding in Modern Database Engines [VLDB 24]
- DoppelGanger++: Towards Fast Dependency Graph Generation for Database Replay [SIGMOD 24]
- Rapid Data Ingestion through DB-OS Co-design [SIGMOD 25]
- Practical DB-OS Co-Design with Privileged Kernel Bypass [SIGMOD 25]
- Low-Latency Transaction Scheduling via Userspace Interrupts: Why Wait or Yield When You Can Preempt? [SIGMOD 25]
- Are database system researchers making correct assumptions about transaction workloads? [SIGMOD 25]
- BPF-DB: A Kernel-Embedded Transactional Database Management System For eBPF Applications [SIGMOD 25]
- Styx: Transactional Stateful Functions on Streaming Dataflows [SIGMOD 25]
- GTX: A Write-Optimized Latch-free Graph Data System with Transactional Support [SIGMOD 25]
- Wait and See: A Delayed Transactions Partitioning Approach in Deterministic Database Systems for Better Performance [SIGMOD 25]
- Moving on From Group Commit: Autonomous Commit Enables High Throughput and Low Latency on NVMe SSDs [SIGMOD 25]
- A Hybrid Approach to Integrating Deterministic and Non-deterministic Concurrency Control in Database Systems [VLDB 25]
- VerIso: Verifiable Isolation Guarantees for Database Transactions [VLDB 25]
- What Goes Around Comes Around... And Around... [SIGMOD 24]
- Cloud-Native Database Systems and Unikernels: Reimagining OS Abstractions for Modern Hardware [VLDB 24]
- Anarchy in the Database: A Survey and Evaluation of Database Management System Extensibility [VLDB 25]
- OLTP Through the Looking Glass 16 Years Later: Communication is the New Bottleneck [CIDR 25]
- Scalable Garbage Collection for In-Memory MVCC Systems [VLDB 13]
- Rethinking serializable multiversion concurrency control [VLDB 15]
- An Empirical Evaluation of In-Memory Multi-Version Concurrency Control [VLDB 17]
- Accelerating Analytical Processing in MVCC using Fine-Granular High-Frequency Virtual Snapshotting [SIGMOD 18]
- Long-lived Transactions Made Less Harmful [SIGMOD 20]
- Rethink the Scan in MVCC Databases [SIGMOD 21]
- Diva: Making MVCC Systems HTAP-Friendly [SIGMOD 22]
- Memory-Optimized Multi-Version Concurrency Control for Disk-Based Database Systems [VLDB 22]
- Scalable and Robust Snapshot Isolation for High-Performance Storage Engines [VLDB 23]
- One-shot Garbage Collection for In-memory OLTP through Temporality-aware Version Storage [SIGMOD 23]
- HyPer: A Hybrid OLTP&OLAP Main Memory Database System Based on Virtual Memory Snapshots [ICDE 12]
- TiDB: A Raft-based HTAP Database [VLDB 20]
- OceanBase Paetica: A Hybrid Shared-Nothing/Shared-Everything Database for Supporting Single Machine and Distributed Cluster [VLDB 23]
- Perseus: Achieving Strong Consistency and High Data Freshness for Scalable Geo-distributed HTAP [SIGMOD 25]
- veDB-HTAP: a Highly Integrated, Efficient and Adaptive HTAP System [VLDB 25]
- BatchDB: Efficient Isolated Execution of Hybrid OLTP+OLAP Workloads for Interactive Applications [SIGMOD 17]
- F1 Lightning: HTAP as a Service [VLDB 20]
- Retrofitting High Availability Mechanism to Tame Hybrid Transaction/Analytical Processing [ATC 21]
- ByteHTAP: ByteDance’s HTAP System with High Data Freshness and Strong Data Consistency [VLDB 22]
- Hermes: Off-the-Shelf Real-Time Transactional Analytics [VLDB 25]
- PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database [VLDB 18]
- PolarDB-IMCI: A Cloud-Native HTAP Database System at Alibaba [SIGMOD 23]
- HTAP Databases: What is New and What is Next [SIGMOD 22]
- Data Sharing Model and Optimization Strategies in HTAP Database Systems [Journal of Software 23]
- HTAP Databases: A Survey [TKDE 24]
- A survey on hybrid transactional and analytical processing [VLDB Journal 24]
- Survey on Benchmarking Ability of HTAP Benchmarks [Journal of Software 24]
- Adaptive HTAP through elastic resource scheduling [SIGMOD 20]
- Hybrid transactional/analytical processing amplifies IO in LSM-trees [IEEE 22]
- Proteus: Autonomous Adaptive Storage for Mixed Workloads [SIGMOD 22]
- TiQuE: Improving the Transactional Performance of Analytical Systems for True Hybrid Workloads [VLDB 23]
- Deploying Computational Storage for HTAP DBMSs Takes More Than Just Computation Offloading [VLDB 23]
- Log Replaying for Real-Time HTAP: An Adaptive Epoch-based Two-Stage Framework [ICDE 24]
- Two Birds With One Stone: Designing a Hybrid Cloud Storage Engine for HTAP [VLDB 24]
- Twisted Twin: A Collaborative and Competitive Memory Management Approach in HTAP Systems [VLDB 25]
- PUSHtap: PIM-based In-Memory HTAP with Unified Data Storage Format [ASPLOS 25]
- DoppelGanger++: Towards Fast Dependency Graph Generation for Database Replay [SIGMOD 24]
- Fast Parallel Recovery for Transactional Stream Processing on Multicores [ICDE 24]
- Surprise Benchmarking: The Why, What, and How [DBTest 24]
- TPCx-AI - An Industry Standard Benchmark for Artificial Intelligence and Machine Learning Systems [VLDB 23]
- Dike: A Benchmark Suite for Distributed Transactional Databases [SIGMOD 23]
- DBPA: A Benchmark for Transactional Database Performance Anomalies [SIGMOD 23]
- TPC-DS, Taking Decision Support Benchmarking to the Next Level [SIGMOD 02]
- Generating Thousands of Benchmark Queries in Seconds [VLDB 04]
- The Making of TPC-DS [VLDB 06]
- Why You Should Run TPC-DS: A Workload Analysis [VLDB 07]
- Introducing Skew into the TPC-H Benchmark [21]
- How Good is My HTAP System? [SIGMOD 22]
- OLxPBench: Real-time, Semantically Consistent, and Domain-specific are Essential in Benchmarking, Designing, and Implementing HTAP Systems [ICDE 22]
- Cloud Analytics Benchmark [VLDB 23]
- PBench: Workload Synthesizer with Real Statistics for Cloud Analytics Benchmarking [VLDB 25]
- CloudyBench: A Testbed for A Comprehensive Evaluation of Cloud-Native Databases [ICDE 25]
- Redbench: Workload Synthesis From Cloud Traces [VLDB 26]
- M2Bench: A Database Benchmark for Multi-Model Analytic Workloads [VLDB 23]
- Pollock: A Data Loading Benchmark [VLDB 23]
- VeriBench: Analyzing the Performance of Database Systems with Verifiability [VLDB 23]
- TSM-Bench: Benchmarking Time Series Database Systems for Monitoring Applications [VLDB 23]
- CDSBen: Benchmarking the Performance of Storage Services in Cloud-native Database System at ByteDance [VLDB 23]
- FEBench: A Benchmark for Real-Time Relational Data Feature Extraction [VLDB 23]
- TPCx-AI - An Industry Standard Benchmark for Artificial Intelligence and Machine Learning Systems [VLDB 23]
- ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems [VLDB 23]
- Why TPC Is Not Enough: An Analysis of the Amazon Redshift Fleet [VLDB 24]
- The LDBC Financial Benchmark: Transaction Workload [VLDB 25]
- Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search [SIGMOD 25]
- MIRAGE-ANNS: Mixed Approach Graph-based Indexing for Approximate Nearest Neighbor Search [SIGMOD 25]
- UNIFY: Unified Index for Range Filtered Approximate Nearest Neighbors Search [VLDB 25]
- Attribute Filtering in Approximate Nearest Neighbor Search: An In-depth Experimental Study [SIGMOD 26]
- TigerVector: Supporting Vector Search in Graph Databases for Advanced RAGs [SIGMOD 26]
- Beyond Relational: Semantic-Aware Multi-Modal Analytics with LLM-Native Query Optimization [arXiv 25]
- Multi-model Databases: A New Journey to Handle the Variety of Data [CSUR 19]
- M2Bench: A Database Benchmark for Multi-Model Analytic Workloads [VLDB 23]
- MMSBench-Net: Scenario-Based Evaluation of Multi-Model Database Systems [23]
- MMDBench: A Benchmark for Hybrid Query in Multimodal Database [24]
- Steiner-Hardness: A Query Hardness Measure for Graph-Based ANN Indexes [VLDB 24]
- Evaluating and Generating Query Workloads for High Dimensional Vector Similarity Search [KDD 25]
- Survey of Filtered Approximate Nearest Neighbor Search over the Vector-Scalar Hybrid Data [arXiv 25]
- SemBench: A Benchmark for Semantic Query Processing Engines [arXiv 25]
- Are There Fundamental Limitations in Supporting Vector Data Management in Relational Databases? A Case Study of PostgreSQL [ICDE 24]
- Survey of Vector Database Management Systems [VLDBJ 24]
- Vector Database Management Techniques and Systems [SIGMOD 24]
- BigVectorBench: Heterogeneous Data Embedding and Compound Queries are Essential in Evaluating Vector Databases [VLDB 25]
- Vector Databases and DB4LLM Techniques (向量数据库及DB4LLM技术) [JOS 25]
- FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework [VLDB 24]
- Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search [SIGMOD 25]
- Consistency in Non-Transactional Distributed Storage Systems [arXiv 15]
- NOC-NOC: Towards Performance-optimal Distributed Transactions [SIGMOD 24]
- Native Distributed Databases: Problems, Challenges and Opportunities [VLDB 24 Tutorial]
- A survey on transactional stream processing [VLDBJ 23]
- Are Database System Researchers Making Correct Assumptions about Transaction Workloads? [SIGMOD 25]