Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix double accounting of memory when spilling aggregation is used #413

Merged
merged 5 commits into from
Mar 12, 2019

Conversation

sopel39
Copy link
Member

@sopel39 sopel39 commented Mar 8, 2019

No description provided.

@sopel39 sopel39 requested a review from findepi March 8, 2019 15:26
@cla-bot cla-bot bot added the cla-signed label Mar 8, 2019
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job @sopel39

(everything except last commit) lgtm % minor comments

@@ -272,6 +273,7 @@ public OperatorFactory duplicate()
private final HashCollisionsCounter hashCollisionsCounter;

private HashAggregationBuilder aggregationBuilder;
private LocalMemoryContext aggregationBuilderMemoryContext;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memoryContext?

@@ -368,6 +370,10 @@ public void addInput(Page page)
if (aggregationBuilder == null) {
// TODO: We ignore spillEnabled here if any aggregate has ORDER BY clause or DISTINCT because they are not yet implemented for spilling.
if (step.isOutputPartial() || !spillEnabled || hasOrderBy() || hasDistinct()) {
aggregationBuilderMemoryContext = operatorContext.localUserMemoryContext();
if (useSystemMemory) {
aggregationBuilderMemoryContext = operatorContext.localSystemMemoryContext();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not pass MemoryContext into InMemoryHashAggregationBuilder?

Or, maybe better, why not have this assignment in the constructor?
As it is now, it looks like creating a Mem Context multiple times (which i think is not really the case).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not pass MemoryContext into InMemoryHashAggregationBuilder?

I tried that approach, but since InMemoryHashAggregationBuilder is used within SpillableHashAggregationBuilder it cannot account memory on it's own. This is because SpillableHashAggregationBuilder changes memory pools from revocable to user and InMemoryHashAggregationBuilder doesn't (and shouldn't) know about this flip.
Ideally (TODO), SpillableHashAggregationBuilder should use underlying accumulators and GroupByHash directly and omit InMemoryHashAggregationBuilder.

Or, maybe better, why not have this assignment in the constructor?

That makes sense.

@@ -510,6 +522,10 @@ private void closeAggregationBuilder()
// aggregationBuilder.close() will release all memory reserved in memory accounting.
// The reference must be set to null afterwards to avoid unaccounted memory.
aggregationBuilder = null;
if (aggregationBuilderMemoryContext != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can't be null at this point

true,
useSystemMemory);
() -> {
aggregationBuilderMemoryContext.setBytes(((InMemoryHashAggregationBuilder) aggregationBuilder).getSizeInMemory());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InMemoryHashAggregationBuilder instance gets a callback in its constructor, but the callback refers to that instance. Should InMemoryHashAggregationBuilder use this callback in the ctor to update initial memory occupancy, this would fail.
To fix this, UpdateMemory should take long bytes parameter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix this, UpdateMemory should take long bytes parameter.

GroupByHash would need to be passed LocalMemoryContext as an argument then. I also tried this approach, but it explodes the changes and still doesn't nicely solve problem of flipping memory from revocable to user (e.g: GroupByHash would need to keep getEstimatedSize method, but also accept LocalMemoryContext as arg).

Objects should either:

  • expose it's memory usage via getEstimatedSize method (if they don't "own" the memory)
  • account for memory themselves (via LocalMemoryContext). In this case AggregatedMemoryContext should be passed to constructor of such objects.

Mixing both usage patterns is confusing.

@@ -510,6 +522,10 @@ private void closeAggregationBuilder()
// aggregationBuilder.close() will release all memory reserved in memory accounting.
// The reference must be set to null afterwards to avoid unaccounted memory.
aggregationBuilder = null;
if (aggregationBuilderMemoryContext != null) {
aggregationBuilderMemoryContext.setBytes(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're zeroing memory contexts at the end of this method.
Either add operatorContext.localSystemMemoryContext().setBytes(0); there (instead of this line), or remove the all other .setBytes(0) calls there. Or explain :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Didn't notice that.
Just memoryContext.setBytes(0); at end of this method is enough

// if before building result from hashAggregationBuilder we would convert it to "read only" version.
// Read only version of GroupByHash from hashAggregationBuilder could be compacted by dropping
// most of it's field, freeing up some memory that could be used for sorting.
return hashAggregationBuilder.getSizeInMemory() + hashAggregationBuilder.getGroupIdsSortingSize();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why hashAggregationBuilder.getSizeInMemory() does not include hashAggregationBuilder.getGroupIdsSortingSize()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashAggregationBuilder.getGroupIdsSortingSize() part is only materialized in memory when buildHashSortedResult is called. This happens only when unspilling. Therefore getSizeInMemoryWhenUnspilling is used to estimate aggregation size when unspilling only.

@@ -219,6 +210,15 @@ private boolean shouldMergeWithMemory(long memorySize)
}
}

private long getSizeInMemoryWhenUnspilling()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep the old method name getSizeInMemory.

  • the fact that it's only invoked after spiller is created seems not relevant to semantics of the method
  • the method is invoked before spilling (ie before getFutureValue(spillToDisk());), so it's not "when unspilling"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the method is invoked before spilling (ie before getFutureValue(spillToDisk());), so it's not "when unspilling"

It is used to estimate size of memory during unspilling phase and to decide if we should merge with mem or not. This method has very limited use case and the previous name was confusing

InMemoryHashAggregationBuilder was doing it's own memory
accounting for the purpose of GroupByHash yielding.
When SpillableHashAggregationBuilder is used it also
accounts InMemoryHashAggregationBuilder as a revocable memory.
Therefore memory was accounted twice.

This commit removes memory accoutning from
InMemoryHashAggregationBuilder.
@sopel39 sopel39 merged commit fe4271c into trinodb:master Mar 12, 2019
@sopel39 sopel39 deleted the ks/fix_spill_aggregation branch March 12, 2019 15:10
@findepi findepi added this to the 306 milestone Mar 13, 2019
@sopel39 sopel39 mentioned this pull request Mar 13, 2019
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants