colexecargs: fix recent memory leak #64453

Merged 4 commits into cockroachdb:master on May 5, 2021
Conversation

@yuzefovich (Member) commented Apr 30, 2021

colfetcher: set all unneeded vectors to all nulls

The cFetcher creates a batch with the same schema as the table it is
reading from. In many cases not all columns from the table are needed
for the query, so as an important performance optimization the cFetcher
doesn't decode the data for those columns. As a result, such unneeded
columns are left "unset". This works fine in most cases; however, if we
attempt to materialize such a batch with unset vectors, the conversion
to datums might encounter errors (e.g. UUID values must either be NULL
or be 16 bytes long).

This commit improves this situation slightly by tracking the set of
unneeded columns and setting those vectors to all NULL values. This
will allow us to simplify the planning code a bit in the follow-up
commit.

Release note: None
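
To make the idea concrete, here is a minimal self-contained sketch; the Vec type, setAllNull helper, and notNeededColIdxs are hypothetical stand-ins for the real coldata/cFetcher machinery, not the PR's code:

```go
package main

import "fmt"

// Vec is a simplified stand-in for a columnar vector: decoded values
// plus a null bitmap (here just a bool slice). The real coldata.Vec is
// richer; this sketch only illustrates the idea.
type Vec struct {
	values []int64
	nulls  []bool
}

// setAllNull marks every row NULL so that a later vector-to-datum
// conversion never inspects the (undecoded) values.
func (v *Vec) setAllNull() {
	for i := range v.nulls {
		v.nulls[i] = true
	}
}

func main() {
	// A batch with two columns; column 1 is not needed by the query, so
	// its values were never decoded by the fetcher.
	batch := []*Vec{
		{values: []int64{1, 2, 3}, nulls: make([]bool, 3)},
		{values: make([]int64, 3), nulls: make([]bool, 3)},
	}
	notNeededColIdxs := []int{1}
	for _, colIdx := range notNeededColIdxs {
		batch[colIdx].setAllNull()
	}
	fmt.Println(batch[1].nulls) // [true true true]
}
```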

colbuilder: remove unnecessary complication when wrapping table reader

Previously, we had some custom code for the case when we supported the
table reader core but not the post-processing spec: we attempted to
revert to the state before the core was planned and to plan the whole
table reader with render expressions on top.

Given the previous commit, I think this is no longer necessary, so this
commit removes that special code in favor of the general handling of
only the post-processing spec via a noop processor. This commit was
prompted by complications that this old code caused for the follow-up
commit.

Release note: None

colexecargs: fix recent memory leak

In c3b1617 we introduced a new utility
struct that keeps information about the meta objects in the operator
tree. Those meta objects are tracked by several slices which are
resliced to be of length 0 when the "outer" object is released back to
the corresponding pool. However, the slices still end up holding
references to the old meta objects, preventing them from being
garbage-collected. This behavior results in a memory leak. This commit
fixes the issue by explicitly resetting the slices for reuse.

Fixes: #62320.
Fixes: #64093.

Release note: None
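
For illustration, a self-contained sketch of the bug and the fix; opResult, metaObjects, and opResultPool are hypothetical names standing in for the actual colexecargs types:

```go
package main

import "sync"

type metaObject struct{ buf []byte } // stand-in for a large tracked object

// opResult stands in for the helper struct introduced in c3b1617; the
// name and field are hypothetical.
type opResult struct {
	metaObjects []*metaObject
}

var opResultPool = sync.Pool{New: func() interface{} { return &opResult{} }}

// releaseBuggy reslices to length zero. The backing array still holds
// pointers to the old meta objects, so they stay reachable for as long
// as the pooled opResult does: this is the leak.
func (r *opResult) releaseBuggy() {
	r.metaObjects = r.metaObjects[:0]
	opResultPool.Put(r)
}

// release is the fixed version: every slot is nil'ed out first, so the
// old meta objects become unreachable and eligible for garbage
// collection even while r sits in the pool.
func (r *opResult) release() {
	for i := range r.metaObjects {
		r.metaObjects[i] = nil
	}
	r.metaObjects = r.metaObjects[:0]
	opResultPool.Put(r)
}

func main() {
	r := opResultPool.Get().(*opResult)
	r.metaObjects = append(r.metaObjects, &metaObject{buf: make([]byte, 1<<20)})
	r.release() // the 1 MiB buffer is now collectable
}
```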

sql: audit slice reuse in implementations of the Releasable interface

This commit audits all slices that are kept by components implementing
the execinfra.Releasable interface to make sure that the slices that
might be referencing large objects are deeply reset. (By a deep reset I
mean that all slots are set to nil so that the possibly large objects
can be garbage-collected.) This was prompted by the previous commit,
which fixed a recent regression, but this commit seems like a good idea
on its own, and it might be worth backporting too.

Release note: None

@cockroach-teamcity (Member)

This change is Reviewable

@yuzefovich changed the title from "colexecargs: reduce copies when operating with helper struct" to "colexecargs: reduce allocations when operating with helper struct" on Apr 30, 2021
@yuzefovich force-pushed the fix-meta-info branch 5 times, most recently from f8d1665 to e05af1c, on April 30, 2021 16:30
@yuzefovich requested review from jordanlewis and a team on April 30, 2021 16:30
@yuzefovich (Member, Author)

@nvanbenschoten @RaduBerinde I wonder if you could scrutinize the third commit a bit and possibly come up with an explanation for why it reduces the RAM usage noticeably. For background, without that commit the RAM usage during schemachange/tpcc and tpccbench (and probably other) test runs is noticeably higher (#62320 (comment), #64093 (comment)).

Unfortunately, the heap profiling hasn't been very helpful in tracking that down. I only have guesses for why the third commit improves things (#62320 (comment)).

@jordanlewis (Member) left a comment

The first two commits LGTM; I'll review the third again a bit later.

Reviewed 2 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)


pkg/sql/colfetcher/cfetcher.go, line 1499 at r1 (raw file):

	// We need to set all values in "not needed" vectors to nulls because if the
	// batch is materialized (i.e. values are converted to datums), the
	// conversion of unset values might encounter an error.

bummer... can we instead teach the materializer to notice this case?


pkg/sql/logictest/testdata/logic_test/vectorize_overloads, line 713 at r1 (raw file):

└ Node 1
  └ *rowexec.noopProcessor
    └ *colfetcher.ColBatchScan

Wrong commit, I think.

@yuzefovich (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)


pkg/sql/colfetcher/cfetcher.go, line 1499 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

bummer... can we instead teach the materializer to notice this case?

It'll be a bit harder and less clean IMO - the materializer has the built-in assumption that all vectors need to be converted, and ColBatchScan is the only operator that might produce batches with unneeded vectors. We would need to examine the post-processing spec at the materializer level to learn which vectors are not needed, or we would have to plumb this information from the ColBatchScan somehow. This comes from the fact that only the post-processing spec knows about the projections and renders (the TableReader core being an exception) that encode this "unneededness" information.

I prefer the current approach because we're pushing this exceptional logic into the component that creates the "exceptional" batches, that component already knows about the unneeded columns, and setting all unneeded vectors to all NULLs works well for the conversion in the materializer.


pkg/sql/logictest/testdata/logic_test/vectorize_overloads, line 713 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Wrong commit, I think.

Yeah, I noticed it too - fixed.

@yuzefovich changed the title from "colexecargs: reduce allocations when operating with helper struct" to "colexecargs: fix recent memory leak" on May 2, 2021
@yuzefovich (Member, Author)

Changed the third commit to a smaller, more targeted fix, but I think the first two commits are still worth keeping.

@yuzefovich force-pushed the fix-meta-info branch 3 times, most recently from 1f20f91 to b121d57, on May 3, 2021 21:36
@yuzefovich (Member, Author)

Added another commit auditing our slice reuse in Release implementations, per the discussion with @jordanlewis. I think a couple of places could be concerning, so that commit might be worth backporting.

@RaduBerinde (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)


pkg/sql/colconv/vec_to_datum_tmpl.go, line 103 at r6 (raw file):

	for _, vec := range c.convertedVecs {
		if len(vec) > 0 {
			_ = vec[len(vec)-1]

What are we trying to do here? Why would for i := range vec have bounds checks in the first place? By the way, I believe that for i := range vec { vec[i] = nil } has a fast path in Go and gets converted to a memset.


pkg/sql/colexec/colexecargs/op_creation.go, line 180 at r6 (raw file):

		// objects are still referenced by the corresponding sync.Pools, so the
		// references in r.Releasables will not be the reason for the objects to
		// not be garbage-collected.

Each sync.Pool grows and shrinks as necessary; it is possible that those other pools would shrink and leave this as the only reference.
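
A small runnable illustration of this property of sync.Pool (the 1 MiB buffer size is arbitrary):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	pool := sync.Pool{New: func() interface{} { return new([1 << 20]byte) }}
	buf := pool.Get().(*[1 << 20]byte)
	pool.Put(buf)
	buf = nil // drop our own reference; the pool's copy is now the only one
	// A GC cycle is allowed to drop pooled objects entirely, so nothing
	// should count on a sync.Pool keeping a reference alive. Since Go
	// 1.13, pooled objects survive one GC in a victim cache and are
	// freed on the next, so two cycles clear everything the pool held.
	runtime.GC()
	runtime.GC()
	fmt.Println("the pooled 1 MiB buffer may already be reclaimed")
}
```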

@yuzefovich (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)


pkg/sql/colconv/vec_to_datum_tmpl.go, line 103 at r6 (raw file):

Previously, RaduBerinde wrote…

What are we trying to do here? Why would for i := range vec have bound checks in the first place? By the way, I believe that for i := range vec { vec[i] = nil } has a fast path in Go and gets converted to a memset.

Good point - I was trying to eliminate bounds checks and blindly followed the way we usually do it in other places. The crucial difference is that here vec is a local variable, so the compiler can prove that the loop for i := range vec is always in bounds, whereas usually in other places the thing we iterate over is a field in a struct. I confirmed this here: https://godbolt.org/z/KevehETje. It does use memclrHasPointers, which is likely the optimization you were referring to.

(A side thought: possibly in other places we should follow the same pattern of copying the struct field into a local variable and iterating over that, rather than using our somewhat ugly [len(s)-1] accesses outside of the for loop.)

Refactored.
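
A sketch of the refactored pattern; the converter type here is a hypothetical stand-in for the template's struct:

```go
package main

type converter struct {
	convertedVecs [][]interface{}
}

// reset copies each field element into the local variable vec, so the
// compiler can prove that `for i := range vec` is in bounds without a
// manual hint like `_ = vec[len(vec)-1]`, and the `vec[i] = nil` loop
// body matches the pattern the compiler lowers to
// runtime.memclrHasPointers.
func (c *converter) reset() {
	for _, vec := range c.convertedVecs {
		for i := range vec {
			vec[i] = nil
		}
	}
}

func main() {
	c := &converter{convertedVecs: [][]interface{}{{1, "x", []byte("y")}}}
	c.reset()
}
```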


pkg/sql/colexec/colexecargs/op_creation.go, line 180 at r6 (raw file):

Previously, RaduBerinde wrote…

Each sync.Pool grows and shrinks as necessary, it is possible that those other pools would shrink and leave this as the only reference.

Makes sense, fixed.

@yuzefovich (Member, Author)

@jordanlewis @RaduBerinde I believe that both of you are on board with this PR - can someone give it another look and stamp if satisfied?

@jordanlewis (Member) left a comment

:lgtm_strong:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @yuzefovich)


pkg/sql/colflow/vectorized_flow.go, line 1264 at r7 (raw file):

	// slice). Unset the slot so that we don't keep the reference to the old
	// materializer.
	if len(r.processors) == 1 {

If this comment is true, why do we keep this as a slice at all? Should it just be a pointer?

@yuzefovich (Member, Author) left a comment

TFTRs!

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)


pkg/sql/colflow/vectorized_flow.go, line 1264 at r7 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

If this comment is true, why do we keep this as a slice at all? Should it just be a pointer?

It's because row-based flows can have multiple processors, so we abstracted out a SetProcessors method that takes in a pointer. We do pool the slice as part of the vectorizedFlowCreatorHelper pooling and instantiate it with capacity 1.
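
A hypothetical sketch of that pattern; all names and signatures here are invented for illustration, not the actual colflow code:

```go
package main

type processor interface{ Run() }

// flowCreatorHelper mirrors the pattern under discussion: the slice
// exists so that row-based flows (many processors) and vectorized
// flows (exactly one materializer) can share a single SetProcessors
// entry point, and the vectorized flow pools the helper with a
// 1-capacity slice.
type flowCreatorHelper struct {
	processors []processor
}

func (h *flowCreatorHelper) SetProcessors(p []processor) {
	h.processors = p
}

// release unsets the single slot so that the pooled helper does not
// keep the old materializer reachable.
func (h *flowCreatorHelper) release() {
	if len(h.processors) == 1 {
		h.processors[0] = nil
	}
	h.processors = h.processors[:0]
}

func main() {
	h := &flowCreatorHelper{processors: make([]processor, 0, 1)}
	h.SetProcessors([]processor{nil})
	h.release()
}
```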

@craig (Contributor)

craig bot commented May 5, 2021

Build failed:

@yuzefovich (Member, Author)

Rebased on top of master.

bors r+

@craig (Contributor)

craig bot commented May 5, 2021

Build succeeded:

@craig (bot) merged commit ecf484d into cockroachdb:master on May 5, 2021
@yuzefovich deleted the fix-meta-info branch on May 5, 2021 16:56