-
Notifications
You must be signed in to change notification settings - Fork 6.3k
Allocating Some Indexes and Bloom Filters using Huge Page TLB
When the DB uses tons of gigabytes of memory, there is a higher chance that when accessing data in memory, the program will get data TLB miss, as well as cache misses to retrieve the mapping. When using hash table based indexes and bloom filters, provided by some mem tables and table readers, users are more prone to it, because the data locality is worse. Those indexes and blooms are perfect candidates to be allocated in huge page TLB. When you see a high data TLB overheads in data structures where the feature is supported, consider to give it a try.
Currently, it is only supported in Linux.
- You need to pre-reserve huge pages in linux
- Find out the huge page size to allocate
See Linux Documentation/vm/hugetlbpage.txt for details.
Here is where the feature is available to use and how:
-
mem table's bloom filter: set Options.memtable_prefix_bloom_huge_page_tlb_size to be the huge page size.
-
hash linked list mem table's indexes and bloom filters: when calling NewHashLinkListRepFactory() to create a factory object for the mem table, pass huge page size into the parameter huge_page_tlb_size.
-
PlainTableReader's indexes and bloom filters. When creating table factory for plain tables using NewPlainTableFactory() or NewTotalOrderPlainTableFactory(), set the parameter huge_page_tlb_size to be the huge page size.
Contents
- RocksDB Wiki
- Overview
- RocksDB FAQ
- Terminology
- Requirements
- Contributors' Guide
- Release Methodology
- RocksDB Users and Use Cases
- RocksDB Public Communication and Information Channels
-
Basic Operations
- Iterator
- Prefix seek
- SeekForPrev
- Tailing Iterator
- Compaction Filter
- Multi Column Family Iterator (Experimental)
- Read-Modify-Write (Merge) Operator
- Column Families
- Creating and Ingesting SST files
- Single Delete
- Low Priority Write
- Time to Live (TTL) Support
- Transactions
- Snapshot
- DeleteRange
- Atomic flush
- Read-only and Secondary instances
- Approximate Size
- User-defined Timestamp
- Wide Columns
- BlobDB
- Online Verification
- Options
- MemTable
- Journal
- Cache
- Write Buffer Manager
- Compaction
- SST File Formats
- IO
- Compression
- Full File Checksum and Checksum Handoff
- Background Error Handling
- Huge Page TLB Support
- Tiered Storage (Experimental)
- Logging and Monitoring
- Known Issues
- Troubleshooting Guide
- Tests
- Tools / Utilities
-
Implementation Details
- Delete Stale Files
- Partitioned Index/Filters
- WritePrepared-Transactions
- WriteUnprepared-Transactions
- How we keep track of live SST files
- How we index SST
- Merge Operator Implementation
- RocksDB Repairer
- Write Batch With Index
- Two Phase Commit
- Iterator's Implementation
- Simulation Cache
- [To Be Deprecated] Persistent Read Cache
- DeleteRange Implementation
- unordered_write
- Extending RocksDB
- RocksJava
- Lua
- Performance
- Projects Being Developed
- Misc