Awesome-OLAP-Paper

Introduction

A curated list of papers on Online Analytical Processing (OLAP) database systems, theory, frameworks, resources, tools and other awesomeness, for database researchers and engineers.

Contributing

The repository is under construction. New PRs are welcome; please follow this format for each entry:

paperName (with PDF link) [MeetingName Year] GitHub link, if the code is open source (optional)
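As a rough illustration of the entry format above, the following Python sketch checks whether a plain-text entry ends with a bracketed venue tag. The names `Entry`, `ENTRY_RE` and `parse_entry` are assumptions made for this example and are not part of the repository; the sketch only looks at the title and the `[MeetingName Year]` tag (plus an optional trailing note such as "Tutorial") and ignores links.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    title: str
    venue: str
    year: str
    note: Optional[str]


# Assumed shape: "<title> [<venue> <year>]" or "<title> [<venue> <year> <note>]".
# The year may be written with two or four digits.
ENTRY_RE = re.compile(
    r"(?P<title>.+?)\s+"
    r"\[(?P<venue>[^\[\]]+?)\s+"
    r"(?P<year>\d{4}|\d{2})"
    r"(?:\s+(?P<note>[^\[\]]+))?\]"
)


def parse_entry(line: str) -> Optional[Entry]:
    """Return the parsed entry, or None if the line has no venue tag."""
    match = ENTRY_RE.fullmatch(line.strip())
    if match is None:
        return None
    return Entry(match["title"], match["venue"], match["year"], match["note"])
```

Under these assumptions, `parse_entry("QAGen: Generating Query-Aware Test Databases [SIGMOD 07]")` yields the title, the venue `SIGMOD` and the year `07`, while a line without a bracketed tag yields `None`.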

Acknowledge

Thanks to all authors of the papers and repositories cited here :D

Table of Content

Awesome-OLAP-Paper
- Introduction
- Contributing
- Acknowledge
- Table of Content
- Query-Aware Database Generation
  - Privacy
  - Survey
- Query Schedule
- Query Optimization
  - Query Rewrite
  - Cardinality Estimation
    - Histogram
    - Sampling
    - Others
    - Survey
  - Join Order
  - Join Algorithms
  - Cost Model
  - View
  - Survey
  - Index
- Query Execution
- Data Dependency Search
- Query Compilation
- Bugs Detection
  - Static Analysis
- Storage
  - LSM-Tree
- Proxy
- Data Loading
- Database Kernel
  - Survey
- Others
  - MVCC
  - HTAP
    - System Architecture
    - Kernel Optimization
  - Result Replay
  - Benchmark
    - OLTP
    - OLAP
    - HTAP
    - Others
    - Multi-Model
  - Time Series
  - Vector Database
    - Survey
  - Algorithm
  - Distributed Systems
  - OLTP
  - AI4DB
  - Industry
- Star History

Query-Aware Database Generation

QAGen: Generating Query-Aware Test Databases [SIGMOD 07]
Generating Targeted Queries for Database Testing [SIGMOD 08]
Generating Databases for Query Workloads [VLDB 10]
Data Generation using Declarative Constraints [SIGMOD 11]
MyBenchmark: generating databases for query workloads [VLDB 14]
Scalable and Dynamic Regeneration of Big Data Volumes [EDBT 18]
Touchstone: Generating Enormous Query-Aware Test Databases [ATC 18]
Synthesizing Linked Data Under Cardinality and Integrity Constraints [SIGMOD 21]
Projection-Compliant Database Generation [VLDB 22]
SAM: Database Generation from Query Workloads with Supervised Autoregressive Models [SIGMOD 22]
Mirage: Generating Enormous Databases for Complex Workloads [ICDE 24]

Privacy

PrivSyn: Differentially Private Data Synthesis [USENIX Security 21]
Synthesizing Linked Data Under Cardinality and Integrity Constraints [SIGMOD 21]
Data Synthesis via Differentially Private Markov Random Fields [VLDB 21]
PrivLava: Synthesizing Relational Data with Foreign Keys under Differential Privacy [SIGMOD 23]
Privacy-Enhanced Database Synthesis for Benchmark Publishing [arXiv 24]

Survey

Synthetic Data Generation for Enterprise DBMS [ICDE 23]

Query Schedule

Self-Tuning Query Scheduling for Analytical Workloads [SIGMOD 21]
Memory Efficient Scheduling of Query Pipeline Execution [CIDR 22]
LSched: A Workload-Aware Learned Query Scheduler for Analytical Database Systems [SIGMOD 22]
Rotary: A Resource Arbitration Framework for Progressive Iterative Analytics [ICDE 23]

Query Optimization

Sampling-Based Query Re-Optimization [SIGMOD 16]
Kepler: Robust Learning for Parametric Query Optimization [SIGMOD 23]
Rethink Query Optimization in HTAP Databases [SIGMOD 24]
Optimizing Nested Recursive Queries [SIGMOD 24]
Efficient Enumeration of Recursive Plans in Transformation-based Query Optimizers [VLDB 24]
ROME: Robust Query Optimization via Parallel Multi-Plan Execution [SIGMOD 24]
Presto’s History-based Query Optimizer [VLDB 24]

Query Rewrite

QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting [VLDB 23]
SlabCity: Whole-Query Optimization using Program Synthesis [VLDB 23]
GEqO: ML-Accelerated Semantic Equivalence Detection [SIGMOD 24]
Proving Query Equivalence Using Linear Integer Arithmetic [SIGMOD 24]
QED: A Powerful Query Equivalence Decider for SQL [VLDB 24]
VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints [OOPSLA 24]

Cardinality Estimation

Survey

Join Order

Join Algorithms

Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems [VLDB 12]
Leapfrog Triejoin: a worst-case optimal join algorithm [International Conference on Database Theory 12]
An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory [SIGMOD 16]
Worst-Case Optimal Join Algorithms: Techniques, Results, and Open Problems [SIGMOD 18]
Adopting Worst-Case Optimal Joins in Relational Database Systems [VLDB 20]
Free Join: Unifying Worst-Case Optimal and Traditional Joins [arXiv 23]
Reservoir Sampling over Joins [SIGMOD 24]

Cost Model

View

Foreign Keys Open the Door for Faster Incremental View Maintenance [SIGMOD 23]

Survey

How Good Are Query Optimizers, Really? [VLDB 15]
Cardinality Estimation: An Experimental Survey [VLDB 17]
A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration [VLDB 21]
Have query optimizers hit the wall? [VLDB Journal 22]
Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation [VLDB 22]
Data dependencies for query optimization: a survey [VLDB Journal 22]
Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis [VLDB 23]

Index

SQL Server Column Store Indexes [SIGMOD 11]
Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation [SIGMOD 18]

Query Execution

MonetDB/X100: Hyper-Pipelining Query Execution [CIDR 05]
Materialization Strategies in the Vertica Analytic Database: Lessons Learned [ICDE 13]
Rethinking SIMD Vectorization for In-Memory Databases [SIGMOD 15]
Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe? [SIGMOD 17]
Building Advanced SQL Analytics From Low-Level Plan Operators [SIGMOD 21]
SkinnerMT: Parallelizing for Efficiency and Robustness in Adaptive Query Processing on Multicore Platforms [VLDB 22]
ChainedFilter: Combining Membership Filters by Chain Rule [SIGMOD 24]
Saving Money for Analytical Workloads in the Cloud [VLDB 24]
Adaptive and Robust Query Execution for Lakehouses at Scale [VLDB 24]

Data Dependency Search

Discovering Functional Dependencies through Hitting Set Enumeration [SIGMOD 24]

Query Compilation

How to Architect a Query Compiler [SIGMOD 16]
Adaptive Execution of Compiled Queries [ICDE 18]

Bugs Detection

APOLLO: automatic detection and diagnosis of performance regressions in database systems [VLDB 19]
Finding Bugs in Database Systems via Query Partitioning [OOPSLA 20]
Detecting Optimization Bugs in Database Engines via Non-Optimizing Reference Engine Construction [FSE 20]
Sequence-Oriented DBMS Fuzzing [ICDE 23]
DynSQL: Stateful Fuzzing for Database Management Systems with Complex and Valid SQL Query Generation [ATC 23]
Detecting Isolation Bugs via Transaction Oracle Construction [ICSE 23]
Detecting Logic Bugs of Join Optimizations in DBMS [SIGMOD 23 Best Paper]
Detecting Metadata-Related Logic Bugs in Database Systems via Raw Database Construction [VLDB 24]
CONI: Detecting Database Connector Bugs via State-Aware Test Case Generation [ICSE 24]
Keep It Simple: Testing Databases via Differential Query Plans [SIGMOD 24]
Sedar: Obtaining High-Quality Seeds for DBMS Fuzzing via Cross-DBMS SQL Transfer [ICSE 24]
Plume: Efficient and Complete Black-Box Checking of Weak Isolation Levels [OOPSLA 24]
CERT: Finding Performance Issues in Database Systems Through the Lens of Cardinality Estimation [ICSE 24]
DBStorm: Generating Various Effective Workloads for Testing Isolation Levels [ISSTA 24]
PUPPY: Finding Performance Degradation Bugs in DBMSs via Limited-Optimization Plan Construction [ICSE 25]
Understanding and Detecting SQL Function Bugs [EuroSys 25]
Understanding and Reusing Test Suites Across Database Systems [SIGMOD 25]
SQLaser: Detecting DBMS Logic Bugs with Clause-Guided Fuzzing [arXiv 24]
THANOS: DBMS Bug Detection via Storage Engine Rotation Based Differential Testing [ICSE 25]
Conformance Testing of Relational DBMS Against SQL Specifications [ICSE 25]
Automatic Database Configuration Debugging using Retrieval-Augmented Language Models [SIGMOD 25]
Constant Optimization Driven Database System Testing [SIGMOD 25]

Static Analysis

Enhancing Static Analysis for Practical Bug Detection: An LLM-Integrated Approach [PACMPL 24]

Storage

LSM-Tree

Dissecting, Designing, and Optimizing LSM-based Data Stores [SIGMOD 22 Tutorial]
Magma: A High Data Density Storage Engine Used in Couchbase [VLDB 22]
CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage Disaggregated Infrastructure [SIGMOD 24]
NULLS! Revisiting Null Representation in Modern Columnar Formats [DaMoN 24]
CAMAL: Optimizing LSM-trees via Active Learning [SIGMOD 25]

Proxy

Tigger: A Database Proxy That Bounces With User-Bypass [VLDB 23]

Data Loading

ConnectorX: Accelerating Data Loading From Databases to Dataframes [VLDB 22]

Database Kernel

Survey

What Goes Around Comes Around... And Around... [SIGMOD 24]

Others

MVCC

Scalable Garbage Collection for In-Memory MVCC Systems [VLDB 13]
Rethinking serializable multiversion concurrency control [VLDB 15]
An Empirical Evaluation of In-Memory Multi-Version Concurrency Control [VLDB 17]
Accelerating Analytical Processing in MVCC using Fine-Granular High-Frequency Virtual Snapshotting [SIGMOD 18]
Long-lived Transactions Made Less Harmful [SIGMOD 20]
Rethink the Scan in MVCC Databases [SIGMOD 21]
Diva: Making MVCC Systems HTAP-Friendly [SIGMOD 22]
Memory-Optimized Multi-Version Concurrency Control for Disk-Based Database Systems [VLDB 22]
Scalable and Robust Snapshot Isolation for High-Performance Storage Engines [VLDB 23]
One-shot Garbage Collection for In-memory OLTP through Temporality-aware Version Storage [SIGMOD 23]

HTAP

System Architecture

Linear Consistency

Sequential Consistency

Session Consistency

Survey

HTAP Databases: What is New and What is Next [SIGMOD 22]
Data Sharing Model and Optimization Strategies in HTAP Database Systems [Journal of Software 23]
HTAP Databases: A Survey [TKDE 24]
A survey on hybrid transactional and analytical processing [VLDB Journal 24]
Survey on Benchmarking Ability of HTAP Benchmarks [Journal of Software 24]

Kernel Optimization

Result Replay

DoppelGanger++: Towards Fast Dependency Graph Generation for Database Replay [SIGMOD 24]

Benchmark

OLTP

Dike: A Benchmark Suite for Distributed Transactional Databases [SIGMOD 23]
DBPA: A Benchmark for Transactional Database Performance Anomalies [SIGMOD 23]

OLAP

Why You Should Run TPC-DS: A Workload Analysis [VLDB 07]
The Making of TPC-DS [VLDB 06]
TPC-DS, Taking Decision Support Benchmarking to the Next Level [SIGMOD 02]
Generating Thousands of Benchmark Queries in Seconds [VLDB 04]

HTAP

Others

Multi-Model

Time Series

An Experimental Evaluation of Anomaly Detection in Time Series [VLDB 24]

Vector Database

Survey

Algorithm

FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework [VLDB 24]

Distributed Systems

Consistency in Non-Transactional Distributed Storage Systems [arXiv 15]
NOC-NOC: Towards Performance-optimal Distributed Transactions [SIGMOD 24]
Native Distributed Databases: Problems, Challenges and Opportunities [VLDB 24 Tutorial]

Wind-Gone/awesome-olap-paper