mv initial level fix

Matthew Von-Maszewski edited this page Jun 29, 2015 · 5 revisions

Status

  • merged to master -
  • code complete - June 17, 2015
  • development started - June 17, 2015

History / Context

Google's Version::PickLevelForMemTableOutput() routine did not benefit Riak's heavily random write loads. In fact, it got in the way of some early development ideas and performance research, so it was disabled in Basho's version of leveldb long ago. Basho is now examining sequential key loading, both in traditional Riak handoff and in new features. The routine does provide a modest benefit if it is made aware of Basho's other leveldb additions, such as multiple overlap levels and tiered storage. This branch reactivates the routine.

Basho's previous code always wrote a newly filled memory table to level-0. This branch examines the possibility of writing that file to level-2 or level-3 based upon various criteria.

Branch Description

The decision as to which level to use, other than the default level-0, is spread across two routines: Version::PickLevelForMemTableOutput() and DBImpl::WriteLevel0Table(). Combined, they evaluate four questions:

  • does the new file overlap key ranges of existing files at level-0, level-1, level-2 or level-3?
  • would its addition to level-2 or level-3 violate the m_MaxGrandParentOverlapBytes rule?
  • are compactions running against any key ranges in level-1, level-2, or level-3?
  • is the destination level, level-2 or level-3, within the "slow tier"?

The combined logic selects the higher of level-2 and level-3 for which the answer to all four questions above is "no". Otherwise, it defaults to adding the file to level-0.
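The selection rule above can be sketched in a few lines. This is a minimal illustration, not the branch's code: the LevelAnswers struct, the AllNo() helper, and PickInitialLevel() are hypothetical names, and the sketch assumes each candidate level is judged independently from precomputed answers. The real routines may walk the levels step by step instead.

```cpp
// Hypothetical summary of the four questions for one candidate level.
// How each answer is computed is outside this sketch.
struct LevelAnswers {
    bool overlaps_existing_files;     // question 1: key range overlap
    bool violates_grandparent_limit;  // question 2: m_MaxGrandParentOverlapBytes rule
    bool compactions_running;         // question 3: compactions on nearby levels
    bool in_slow_tier;                // question 4: destination is in the slow tier
};

// A level qualifies only when every answer is "no".
inline bool AllNo(const LevelAnswers& a) {
    return !a.overlaps_existing_files && !a.violates_grandparent_limit &&
           !a.compactions_running && !a.in_slow_tier;
}

// Returns 3 or 2 when that level qualifies, preferring level-3;
// otherwise falls back to the default level-0.
inline int PickInitialLevel(const LevelAnswers& level2,
                            const LevelAnswers& level3) {
    if (AllNo(level3)) return 3;
    if (AllNo(level2)) return 2;
    return 0;
}
```

Under these assumptions, a memory table whose level-3 destination sits in the slow tier but whose level-2 answers are all "no" would land in level-2, and any "yes" at both candidates sends the file to level-0.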

db/version_set.cc / db/version_set.h

Version::PickLevelForMemTableOutput() is rewritten to consider two of the four questions when picking the initial level for placement of a memory table:

  • does the new file overlap key ranges of existing files at level-0, level-1, level-2 or level-3?
  • would its addition to level-2 or level-3 violate the m_MaxGrandParentOverlapBytes rule?

The logic selects the higher of level-2 and level-3 for which the answer to both questions above is "no". Otherwise, it defaults to adding the file to level-0.
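The two checks above could look roughly like the following. This is a simplified sketch under stated assumptions: FileRange, RangesOverlap(), OverlapsLevel(), OverlapBytes(), and ViolatesGrandparentLimit() are hypothetical names, keys are compared as plain byte strings rather than through leveldb's internal key comparator, and the max_overlap_bytes parameter merely stands in for m_MaxGrandParentOverlapBytes. Which level's files the caller passes in is left open.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a table file's metadata: key range and size.
struct FileRange {
    std::string smallest;  // smallest key in the file
    std::string largest;   // largest key in the file
    uint64_t file_size;    // size in bytes
};

// Two closed key ranges overlap unless one ends before the other begins.
inline bool RangesOverlap(const FileRange& a, const FileRange& b) {
    return !(a.largest < b.smallest || b.largest < a.smallest);
}

// Question 1: does the candidate overlap any file in the given level?
inline bool OverlapsLevel(const FileRange& candidate,
                          const std::vector<FileRange>& level_files) {
    for (const FileRange& f : level_files) {
        if (RangesOverlap(candidate, f)) return true;
    }
    return false;
}

// Total size of the files in a level that overlap the candidate.
inline uint64_t OverlapBytes(const FileRange& candidate,
                             const std::vector<FileRange>& files) {
    uint64_t total = 0;
    for (const FileRange& f : files) {
        if (RangesOverlap(candidate, f)) total += f.file_size;
    }
    return total;
}

// Question 2: would placement exceed the allowed overlap in bytes?
inline bool ViolatesGrandparentLimit(const FileRange& candidate,
                                     const std::vector<FileRange>& files,
                                     uint64_t max_overlap_bytes) {
    return OverlapBytes(candidate, files) > max_overlap_bytes;
}
```

The byte-sum approach mirrors the general idea of limiting how much data a later compaction of the new file would have to touch; the exact accounting in the branch may differ.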

VersionSet::NeighborCompactionsQuiet() is a new function that factors an existing line out of VersionSet::Finalize() so that the same rule is available to both VersionSet::Finalize() and DBImpl::WriteLevel0Table() (in db/db_impl.cc).

db/db_impl.cc

DBImpl::WriteLevel0Table() is the only user of the Version::PickLevelForMemTableOutput() routine. It answers the remaining two of the four questions:

  • are compactions running against any key ranges in level-1, level-2, or level-3?
  • is the destination level, level-2 or level-3, within the "slow tier"?