colexecargs: fix recent memory leak #64453

Merged 4 commits into cockroachdb:master on May 5, 2021
Conversation

@yuzefovich (Member) commented Apr 30, 2021

colfetcher: set all unneeded vectors to all nulls

The cFetcher creates a batch with the same schema as the table it is
reading from. In many cases not all columns from the table are needed
for the query, so as an important performance optimization the cFetcher
doesn't decode the data for those columns. As a result, such unneeded
columns are left "unset". This works fine in most cases; however, if we
attempt to materialize such a batch with unset vectors, the conversion
to datums might encounter errors (e.g. UUID values must either be NULL
or be 16 bytes long).

This commit improves this situation slightly by tracking the set of
unneeded columns and setting those vectors to all NULL values. This
will allow us to simplify the planning code a bit in the follow-up
commit.

Release note: None
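
To make the idea concrete, here is a minimal self-contained sketch; the Vec type, setAllNull helper, and notNeededColIdxs are hypothetical stand-ins for the real coldata/cFetcher machinery, not the PR's code:

```go
package main

import "fmt"

// Vec is a simplified stand-in for a columnar vector: decoded values
// plus a null bitmap (here just a bool slice). The real coldata.Vec is
// richer; this sketch only illustrates the idea.
type Vec struct {
	values []int64
	nulls  []bool
}

// setAllNull marks every row NULL so that a later vector-to-datum
// conversion never inspects the (undecoded) values.
func (v *Vec) setAllNull() {
	for i := range v.nulls {
		v.nulls[i] = true
	}
}

func main() {
	// A batch with two columns; column 1 is not needed by the query, so
	// its values were never decoded by the fetcher.
	batch := []*Vec{
		{values: []int64{1, 2, 3}, nulls: make([]bool, 3)},
		{values: make([]int64, 3), nulls: make([]bool, 3)},
	}
	notNeededColIdxs := []int{1}
	for _, colIdx := range notNeededColIdxs {
		batch[colIdx].setAllNull()
	}
	fmt.Println(batch[1].nulls) // [true true true]
}
```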

colbuilder: remove unnecessary complication when wrapping table reader

Previously, we had some custom code for the case when we supported the
table reader core but not the post-processing spec: we attempted to
revert to the state before the core was planned and to plan the whole
table reader with render expressions on top.

Given the previous commit, I think this is no longer necessary, so this
commit removes that special code in favor of the general handling of
only the post-processing spec via a noop processor. This commit was
prompted by complications that this old code caused for the follow-up
commit.

Release note: None

colexecargs: fix recent memory leak

In c3b1617 we introduced a new utility
struct that keeps information about the meta objects in the operator
tree. Those meta objects are tracked by several slices which are
resliced to be of length 0 when the "outer" object is released back to
the corresponding pool. However, the slices still end up holding
references to the old meta objects, preventing them from being
garbage-collected. This behavior results in a memory leak. This commit
fixes the issue by explicitly resetting the slices for reuse.

Fixes: #62320.
Fixes: #64093.

Release note: None
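
For illustration, a self-contained sketch of the bug and the fix; opResult, metaObjects, and opResultPool are hypothetical names standing in for the actual colexecargs types:

```go
package main

import "sync"

type metaObject struct{ buf []byte } // stand-in for a large tracked object

// opResult stands in for the helper struct introduced in c3b1617; the
// name and field are hypothetical.
type opResult struct {
	metaObjects []*metaObject
}

var opResultPool = sync.Pool{New: func() interface{} { return &opResult{} }}

// releaseBuggy reslices to length zero. The backing array still holds
// pointers to the old meta objects, so they stay reachable for as long
// as the pooled opResult does: this is the leak.
func (r *opResult) releaseBuggy() {
	r.metaObjects = r.metaObjects[:0]
	opResultPool.Put(r)
}

// release is the fixed version: every slot is nil'ed out first, so the
// old meta objects become unreachable and eligible for garbage
// collection even while r sits in the pool.
func (r *opResult) release() {
	for i := range r.metaObjects {
		r.metaObjects[i] = nil
	}
	r.metaObjects = r.metaObjects[:0]
	opResultPool.Put(r)
}

func main() {
	r := opResultPool.Get().(*opResult)
	r.metaObjects = append(r.metaObjects, &metaObject{buf: make([]byte, 1<<20)})
	r.release() // the 1 MiB buffer is now collectable
}
```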

sql: audit slice reuse in implementations of the Releasable interface

This commit audits all slices that are kept by components implementing
the execinfra.Releasable interface to make sure that the slices that
might be referencing large objects are deeply reset. (By a deep reset I
mean that all slots are set to nil so that the possibly large objects
can be garbage-collected.) This was prompted by the previous commit,
which fixed a recent regression, but this commit seems like a good idea
on its own, and it might be worth backporting too.

Release note: None

@cockroach-teamcity (Member)

This change is Reviewable

@yuzefovich changed the title from "colexecargs: reduce copies when operating with helper struct" to "colexecargs: reduce allocations when operating with helper struct" on Apr 30, 2021
@yuzefovich force-pushed the fix-meta-info branch 5 times, most recently from f8d1665 to e05af1c, on April 30, 2021 16:30
@yuzefovich requested review from jordanlewis and a team on April 30, 2021 16:30
@yuzefovich (Member, Author)

@nvanbenschoten @RaduBerinde I wonder if you could scrutinize the third commit a bit and possibly come up with an explanation for why it reduces the RAM usage noticeably. For background, without that commit the RAM usage during schemachange/tpcc and tpccbench (and probably other) test runs is noticeably higher (#62320 (comment), #64093 (comment)).

Unfortunately, the heap profiling hasn't been very helpful in tracking that down. I only have guesses for why the third commit improves things (#62320 (comment)).

@jordanlewis (Member) left a comment

The first two commits LGTM; I'll review the third again a bit later.

Reviewed 2 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)


pkg/sql/colfetcher/cfetcher.go, line 1499 at r1 (raw file):

	// We need to set all values in "not needed" vectors to nulls because if the
	// batch is materialized (i.e. values are converted to datums), the
	// conversion of unset values might encounter an error.

bummer... can we instead teach the materializer to notice this case?


pkg/sql/logictest/testdata/logic_test/vectorize_overloads, line 713 at r1 (raw file):

└ Node 1
  └ *rowexec.noopProcessor
    └ *colfetcher.ColBatchScan

Wrong commit, I think.

@yuzefovich (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)


pkg/sql/colfetcher/cfetcher.go, line 1499 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

bummer... can we instead teach the materializer to notice this case?

It'll be a bit harder and less clean IMO - the materializer has the built-in assumption that all vectors need to be converted, and ColBatchScan is the only operator that might produce batches with unneeded vectors. We would need to examine the post-processing spec at the materializer level to learn which vectors are not needed, or we would have to plumb this information from the ColBatchScan somehow. This comes from the fact that only the post-processing spec knows about the projections and renders (the TableReader core being an exception) that encode this "unneededness" information.

I prefer the current approach because we're pushing this exceptional logic into the component that creates the "exceptional" batches, that component already knows about the unneeded columns, and setting all unneeded vectors to all NULLs works well for the conversion in the materializer.


pkg/sql/logictest/testdata/logic_test/vectorize_overloads, line 713 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Wrong commit, I think.

Yeah, I noticed it too - fixed.

@yuzefovich changed the title from "colexecargs: reduce allocations when operating with helper struct" to "colexecargs: fix recent memory leak" on May 2, 2021
@yuzefovich (Member, Author)

Changed the third commit to a smaller, more targeted fix, but I think the first two commits are still worth keeping.

@yuzefovich force-pushed the fix-meta-info branch 3 times, most recently from 1f20f91 to b121d57, on May 3, 2021 21:36
@yuzefovich (Member, Author)

Added another commit auditing our slice reuse in Release implementations, per the discussion with @jordanlewis. I think a couple of places could be concerning, so that commit might be worth backporting.

@RaduBerinde (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)


pkg/sql/colconv/vec_to_datum_tmpl.go, line 103 at r6 (raw file):

	for _, vec := range c.convertedVecs {
		if len(vec) > 0 {
			_ = vec[len(vec)-1]

What are we trying to do here? Why would for i := range vec have bounds checks in the first place? By the way, I believe that for i := range vec { vec[i] = nil } has a fast path in Go and gets converted to a memset.


pkg/sql/colexec/colexecargs/op_creation.go, line 180 at r6 (raw file):

		// objects are still referenced by the corresponding sync.Pools, so the
		// references in r.Releasables will not be the reason for the objects to
		// not be garbage-collected.

Each sync.Pool grows and shrinks as necessary; it is possible that those other pools would shrink and leave this as the only reference.
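
A small runnable illustration of this property of sync.Pool (the 1 MiB buffer size is arbitrary):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	pool := sync.Pool{New: func() interface{} { return new([1 << 20]byte) }}
	buf := pool.Get().(*[1 << 20]byte)
	pool.Put(buf)
	buf = nil // drop our own reference; the pool's copy is now the only one
	// A GC cycle is allowed to drop pooled objects entirely, so nothing
	// should count on a sync.Pool keeping a reference alive. Since Go
	// 1.13, pooled objects survive one GC in a victim cache and are
	// freed on the next, so two cycles clear everything the pool held.
	runtime.GC()
	runtime.GC()
	fmt.Println("the pooled 1 MiB buffer may already be reclaimed")
}
```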

@yuzefovich (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)


pkg/sql/colconv/vec_to_datum_tmpl.go, line 103 at r6 (raw file):

Previously, RaduBerinde wrote…

What are we trying to do here? Why would for i := range vec have bound checks in the first place? By the way, I believe that for i := range vec { vec[i] = nil } has a fast path in Go and gets converted to a memset.

Good point - I was trying to eliminate bounds checks and blindly followed the way we usually do it in other places. The crucial difference is that here vec is a local variable, so the compiler can prove that the loop for i := range vec is always in bounds, whereas usually in other places the thing we iterate over is a field in a struct. I confirmed this here: https://godbolt.org/z/KevehETje. It does use memclrHasPointers, which is likely the optimization you were referring to.

(A side thought: possibly in other places we should follow the same pattern of copying the struct field into a local variable and iterating over that, rather than using our somewhat ugly [len(s)-1] accesses outside of the for loop.)

Refactored.
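
A sketch of the refactored pattern; the converter type here is a hypothetical stand-in for the template's struct:

```go
package main

type converter struct {
	convertedVecs [][]interface{}
}

// reset copies each field element into the local variable vec, so the
// compiler can prove that `for i := range vec` is in bounds without a
// manual hint like `_ = vec[len(vec)-1]`, and the `vec[i] = nil` loop
// body matches the pattern the compiler lowers to
// runtime.memclrHasPointers.
func (c *converter) reset() {
	for _, vec := range c.convertedVecs {
		for i := range vec {
			vec[i] = nil
		}
	}
}

func main() {
	c := &converter{convertedVecs: [][]interface{}{{1, "x", []byte("y")}}}
	c.reset()
}
```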


pkg/sql/colexec/colexecargs/op_creation.go, line 180 at r6 (raw file):

Previously, RaduBerinde wrote…

Each sync.Pool grows and shrinks as necessary, it is possible that those other pools would shrink and leave this as the only reference.

Makes sense, fixed.

@yuzefovich (Member, Author)

@jordanlewis @RaduBerinde I believe that both of you are on board with this PR - can someone give it another look and stamp if satisfied?

@jordanlewis (Member) left a comment

:lgtm_strong:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @yuzefovich)


pkg/sql/colflow/vectorized_flow.go, line 1264 at r7 (raw file):

	// slice). Unset the slot so that we don't keep the reference to the old
	// materializer.
	if len(r.processors) == 1 {

If this comment is true, why do we keep this as a slice at all? Should it just be a pointer?

@yuzefovich (Member, Author) left a comment

TFTRs!

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)


pkg/sql/colflow/vectorized_flow.go, line 1264 at r7 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

If this comment is true, why do we keep this as a slice at all? Should it just be a pointer?

It's because row-based flows can have multiple processors, so we abstracted out a SetProcessors method that takes in a pointer. We do pool the slice as part of the vectorizedFlowCreatorHelper pooling and instantiate it with capacity 1.
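
A hypothetical sketch of that pattern; all names and signatures here are invented for illustration, not the actual colflow code:

```go
package main

type processor interface{ Run() }

// flowCreatorHelper mirrors the pattern under discussion: the slice
// exists so that row-based flows (many processors) and vectorized
// flows (exactly one materializer) can share a single SetProcessors
// entry point, and the vectorized flow pools the helper with a
// 1-capacity slice.
type flowCreatorHelper struct {
	processors []processor
}

func (h *flowCreatorHelper) SetProcessors(p []processor) {
	h.processors = p
}

// release unsets the single slot so that the pooled helper does not
// keep the old materializer reachable.
func (h *flowCreatorHelper) release() {
	if len(h.processors) == 1 {
		h.processors[0] = nil
	}
	h.processors = h.processors[:0]
}

func main() {
	h := &flowCreatorHelper{processors: make([]processor, 0, 1)}
	h.SetProcessors([]processor{nil})
	h.release()
}
```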

@craig (Contributor)

craig bot commented May 5, 2021

Build failed:

@yuzefovich (Member, Author)

Rebased on top of master.

bors r+

@craig (Contributor)

craig bot commented May 5, 2021

Build succeeded:

@craig (bot) merged commit ecf484d into cockroachdb:master on May 5, 2021
@yuzefovich deleted the fix-meta-info branch on May 5, 2021 16:56