feat: add dictionary encoding(draft, for discussion only) #3134
+181
−12
This PR tries to support dictionary encoding by integrating it with `MiniBlock PageLayout`.

The general approach here is:

In a `MiniBlock PageLayout`, there is an optional `dictionary` field that stores a dictionary encoding if this `miniblock` has a dictionary.

The rationale for this is that if we dictionary-encode something, its indices will definitely fall into a `MiniBlockLayout`.

By doing this, we don't need a specific `DictionaryEncoding`; it can be any `ArrayEncoding`.

The `Dictionary` and the `indices` are cascaded into another encoding automatically.

Currently, the dictionary is stored inside the page along with `chunk meta data` and `chunk data`; this is not ideal and is a `TODO` task.

This is a draft for discussion of the above idea, so I only supported `FixedWidthDataBlock` with this encoding; the effort to add support for `VariableWidthData` is trivial.

Some performance comparison with Parquet (no Snappy):

TPC-H lineitem table with scale factor 10.

For `l_extendedprice`, dictionary encoding is not applied due to large cardinality.

#3123
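
To make the optional-dictionary idea from the description concrete for discussion, here is a minimal sketch. It is not the code in this PR: `MiniBlockSketch`, `encode`, `decode`, and the `max_cardinality` threshold are hypothetical names chosen for illustration, and plain `Vec<u64>` stands in for a fixed-width data block. It only shows the shape of the idea: the block carries an optional dictionary, the indices are ordinary fixed-width values, and a column with too many distinct values (as with `l_extendedprice`) falls back to the raw data.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a mini-block page layout (not the real type).
#[derive(Debug, PartialEq)]
pub struct MiniBlockSketch {
    /// Distinct values; present only when dictionary encoding was applied.
    pub dictionary: Option<Vec<u64>>,
    /// Raw values when `dictionary` is `None`, otherwise indices into it.
    pub values: Vec<u64>,
}

/// Dictionary-encode `data` unless it has more than `max_cardinality`
/// distinct values, in which case the raw values are kept unchanged.
/// `max_cardinality` is an assumed knob, not a setting from this PR.
pub fn encode(data: &[u64], max_cardinality: usize) -> MiniBlockSketch {
    let mut positions: HashMap<u64, u64> = HashMap::new();
    let mut dictionary: Vec<u64> = Vec::new();
    let mut indices: Vec<u64> = Vec::with_capacity(data.len());

    for &value in data {
        let index = match positions.get(&value) {
            Some(&existing) => existing,
            None => {
                if dictionary.len() == max_cardinality {
                    // Too many distinct values: skip dictionary encoding.
                    return MiniBlockSketch {
                        dictionary: None,
                        values: data.to_vec(),
                    };
                }
                // First time we see this value: append it to the dictionary.
                let new_index = dictionary.len() as u64;
                positions.insert(value, new_index);
                dictionary.push(value);
                new_index
            }
        };
        indices.push(index);
    }

    MiniBlockSketch {
        dictionary: Some(dictionary),
        values: indices,
    }
}

/// Recover the original values, looking indices up in the dictionary
/// when one is present.
pub fn decode(block: &MiniBlockSketch) -> Vec<u64> {
    match &block.dictionary {
        Some(dict) => block.values.iter().map(|&i| dict[i as usize]).collect(),
        None => block.values.clone(),
    }
}
```

In this sketch the indices are themselves a fixed-width buffer, so in principle they could be handed to whatever encoding would otherwise be chosen for fixed-width data; that is the sense in which no dedicated dictionary encoding type is needed.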