-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Do not review. Prefer https://github.com/facebookincubator/velox/pull/8660] Add logic to serialize/deserialize SetAccumulators #8660
Conversation
✅ Deploy Preview for meta-velox canceled.
|
c71d810
to
c8b0c0a
Compare
0cb45bc
to
864d8c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aditi-pandit Read the code for primitive types and strings. Some comments below.
08a0b29
to
e2c8b17
Compare
@mbasmanova : Have updated this code post rebase and addressing comments. PTAL. |
e2c8b17
to
744a64f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments.
velox/exec/SetAccumulator.h
Outdated
auto* rawBuffer = flatResult->getRawStringBufferWithSpace(totalBytes, true); | ||
|
||
auto nullIndexValue = nullIndexSerializationValue(); | ||
memcpy(rawBuffer, &nullIndexValue, kSizeOfVector); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use common::OutputByteStream instead of plain memcpy?
We can add seekp(position) method to it if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel seekp(position) breaks stream (which are contiguous usually) abstraction. You don't think that way ?
velox/exec/SetAccumulator.h
Outdated
void serialize(const VectorPtr& result, vector_size_t index) { | ||
auto* flatResult = result->as<FlatVector<StringView>>(); | ||
auto* rawBuffer = | ||
flatResult->getRawStringBufferWithSpace(stringSetBytes, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may not work well when there are millions of values in the accumulator. Perhaps, we could put some limits to fail cleanly instead of crashing.
Spilling logic needs to calculate spill size for each row before deciding how many rows to spill at once, no? We may need to expose APIs to compute these sizes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some thinking it seems better to serialize into an ArrayVector rather than a single serialized string. I'm trying that approach.
velox/exec/SetAccumulator.h
Outdated
auto* flatResult = result->as<FlatVector<StringView>>(); | ||
auto* rawBuffer = flatResult->getRawStringBufferWithSpace(totalSize, true); | ||
|
||
SerializationStream stream(rawBuffer, totalSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse common::OutputByteStream here and add necessary APIs to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OutputByteStream is in velox/common/base/IOUtils.h which is a much lower level library. I feel adding a dependency of AddressableNonNullValueList which is at velox/exec layer to it is making it less re-usable.
This current code is a reasonable compromise.
wdyt ?
744a64f
to
ba48d90
Compare
804d81f
to
0eda534
Compare
@mbasmanova : Have updated the code to serialize to an ARRAY(VARBINARY) instead of a single String buffer. PTAL. |
80883f7
to
6ea1c43
Compare
@xiaoxmeng : Meng, Would appreciate a round of review. Thanks ! |
e9be986
to
e480a2e
Compare
This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions! |
32ed1be
to
173b5d3
Compare
@xiaoxmeng : Meng, ping for review. Thanks |
173b5d3
to
b8fd99e
Compare
This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions! |
b8fd99e
to
9aa2cef
Compare
This is the second in a set of PRs to add support for spilling distinct aggregations (see full version in #7791).
The logic to serialize/deserialize SetAccumulators is used in the DistinctAggregations for spilling.