
changefeedccl: use ScanRequest instead of ExportRequest during backfills #44663

Merged

Conversation

@ajwerner (Contributor) commented Feb 3, 2020

This PR is motivated by the desire to get the memory usage of CDC under
control in the presence of much larger ranges. Currently, when a changefeed
decides it needs to do a backfill, it breaks the spans up along range
boundaries and then fetches the data (with some parallelism) for the backfill.

The memory overhead was somewhat bounded by the range size. If we want to make
the range size dramatically larger, the memory usage would become a function of
that new, much larger range size.

Fortunately, we don't have much need for these ExportRequests any more.
Another fascinating revelation of late is that the ScanResponse does indeed
include MVCC timestamps (not that we necessarily needed them, but it's a good
idea to keep them for compatibility).

The ScanRequest currently permits a limit on NumRows, which this commit
utilizes. I wanted to get this change typed up in anticipation of #44341, which
will provide a limit on NumBytes.
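
A minimal sketch of what this limited-scan read path looks like, assuming the kv client API of this era (package paths and some details may differ; `scanSpan` and `rowLimit` are illustrative names, not code from the PR):

```go
package main

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/internal/client"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/storage/engine/enginepb"
)

// scanSpan reads one span at the txn's timestamp, at most rowLimit rows per
// request, resuming until the span is exhausted.
func scanSpan(
	ctx context.Context, txn *client.Txn, span roachpb.Span, rowLimit int64,
) error {
	remaining := &span
	for remaining != nil {
		b := txn.NewBatch()
		// MaxSpanRequestKeys is the NumRows limit described above; #44341 is
		// expected to add an analogous byte limit.
		b.Header.MaxSpanRequestKeys = rowLimit
		r := &roachpb.ScanRequest{
			RequestHeader: roachpb.RequestHeader{
				Key: remaining.Key, EndKey: remaining.EndKey,
			},
			// BATCH_RESPONSE returns packed key/value bytes that retain the
			// MVCC timestamp of each row version.
			ScanFormat: roachpb.BATCH_RESPONSE,
		}
		b.AddRawRequest(r)
		if err := txn.Run(ctx, b); err != nil {
			return err
		}
		res := b.RawResponse().Responses[0].GetScan()
		for _, repr := range res.BatchResponses {
			for len(repr) > 0 {
				var kv roachpb.KeyValue
				var err error
				// The decoded timestamp is the version's MVCC timestamp.
				kv.Key, kv.Value.Timestamp, kv.Value.RawBytes, repr, err =
					enginepb.ScanDecodeKeyValue(repr)
				if err != nil {
					return err
				}
				_ = kv // hand off to the event buffer here
			}
		}
		// A non-nil ResumeSpan means the row limit was hit mid-span.
		remaining = res.ResumeSpan
	}
	return nil
}
```

The ResumeSpan loop is what makes the memory footprint a function of the row (and eventually byte) limit rather than of the range size.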

I retained the existing parallelism, as ScanRequests with limits are not
executed in parallel.
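
A sketch of that retained fan-out, continuing the file above (plus `util/ctxgroup`): DistSender will not parallelize a ScanRequest that carries a limit, so the caller keeps its own bounded worker pool. `scanSpans` and `maxConcurrentScans` are illustrative names.

```go
const maxConcurrentScans = 8 // illustrative; the real pool size is a tuning knob

func scanSpans(ctx context.Context, db *client.DB, spans []roachpb.Span) error {
	sem := make(chan struct{}, maxConcurrentScans) // bounds in-flight scans
	g := ctxgroup.WithContext(ctx)
	for _, sp := range spans {
		sp := sp // capture a per-iteration copy for the goroutine
		g.GoCtx(func(ctx context.Context) error {
			select {
			case sem <- struct{}{}: // acquire a worker slot
			case <-ctx.Done():
				return ctx.Err()
			}
			defer func() { <-sem }()
			txn := db.NewTxn(ctx, "changefeed backfill")
			return scanSpan(ctx, txn, sp, 1<<18 /* rowLimit */)
		})
	}
	return g.Wait()
}
```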

I would like to do some benchmarking, but I feel pretty okay about the
testing we have in place already. @danhhz what do you want to see here?

Relates to #39717.

Release note: None.

@cockroach-teamcity (Member)

This change is Reviewable

@danhhz (Contributor) left a comment


:lgtm: if the roachtests are happy, i'm happy. cdc/initial-scan is designed to stress exactly this, i'd be curious how long it takes before and after this change

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/ccl/changefeedccl/poller.go, line 453 at r1 (raw file):

	ctx context.Context, span roachpb.Span, ts hlc.Timestamp, withDiff bool,
) error {
	txn := p.db.NewTxn(ctx, "poller backfill")

nit: i'd think "changefeed backfill" would be more evocative


pkg/ccl/changefeedccl/poller.go, line 478 at r1 (raw file):

		afterScan := timeutil.Now()
		res := b.RawResponse().Responses[0].GetScan()
		for _, br := range res.BatchResponses {

i'd still keep the decode+AddKV part as a separate method for readability but ymmv
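
For concreteness, a helper along the lines being suggested might look like the following sketch (same file and imports as the earlier sketch, plus `util/hlc`); `slurpScanResponse`, the `buffer` type, and its `AddKV` signature are assumptions, not code from the diff:

```go
// slurpScanResponse decodes one BATCH_RESPONSE payload and hands each KV to
// the event buffer.
func slurpScanResponse(
	ctx context.Context,
	buf *buffer,
	res *roachpb.ScanResponse,
	backfillTS hlc.Timestamp,
	withDiff bool,
) error {
	for _, repr := range res.BatchResponses {
		for len(repr) > 0 {
			var kv roachpb.KeyValue
			var err error
			kv.Key, kv.Value.Timestamp, kv.Value.RawBytes, repr, err =
				enginepb.ScanDecodeKeyValue(repr)
			if err != nil {
				return err
			}
			var prevVal roachpb.Value
			if withDiff {
				// The scanned value doubles as the previous value; see the
				// prevVal note in the next thread.
				prevVal = kv.Value
			}
			if err := buf.AddKV(ctx, kv, prevVal, backfillTS); err != nil {
				return err
			}
		}
	}
	return nil
}
```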


pkg/ccl/changefeedccl/poller.go, line 488 at r1 (raw file):

				var prevVal roachpb.Value
				if withDiff {
					prevVal = kv.Value

looks like you lost a (somewhat important) comment here
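
The gist of the lost comment, reconstructed here as a paraphrase of the withDiff subtlety rather than the original text:

```go
var prevVal roachpb.Value
if withDiff {
	// A backfill scan returns exactly one version of each row, so the same
	// value stands in for both the "before" and "after" halves of the diff;
	// downstream code interprets the "before" copy as of the timestamp just
	// prior to the backfill. (Paraphrase of the intent, not the restored
	// comment's exact text.)
	prevVal = kv.Value
}
```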

@ajwerner ajwerner force-pushed the ajwerner/scan-requests-in-cdc-backfill branch from cd5f1c0 to 7c594dc on February 4, 2020 20:03
@ajwerner (Contributor, Author) left a comment


I ran it with the 8M row limit and got roughly the same as last night. Running it again. I'm 99% sure the sink is the bottleneck. For one run, I accidentally ran it without calling slurp, and the whole scan took 6s. I think it's all sort of in the noise. It's possible that the scan at 1<<18 rows is a hair slower, but it doesn't seem statistically significant. Anyway, I intend to make it use something like 16MB when we get the size limit, at which point it should be better.

I think I'd like to merge this, given it doesn't seem egregiously worse.

| Read path | Row limit | Duration |
| --- | --- | --- |
| ExportRequest | n/a | 8m12.375818885s |
| ExportRequest | n/a | 9m43.823838146s |
| ExportRequest | n/a | 9m17.891210167s |
| ExportRequest | n/a | 8m34.513046008s |
| ScanRequest | 8<<20 (8,388,608) | 8m37.686091157s |
| ScanRequest | 1<<20 (1,048,576) | 9m3.046564286s |
| ScanRequest | 1<<18 (262,144) | 8m28.071550859s |
| ScanRequest | 1<<18 (262,144) | 9m49.689186991s |
| ScanRequest | 1<<18 (262,144) | 11m20.302953236s |
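
Once the byte limit from #44341 exists, the swap would presumably be a small change to the batch header in the earlier sketch (`TargetBytes` is the field proposed there; treat this fragment as an assumption, not code from either PR):

```go
// Trade the row cap for a ~16 MiB byte cap on the same batch header.
b.Header.MaxSpanRequestKeys = 0 // drop the row limit...
b.Header.TargetBytes = 16 << 20 // ...in favor of a byte limit
```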

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/ccl/changefeedccl/poller.go, line 453 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

nit: i'd think "changefeed backfill" would be more evocative

Done.


pkg/ccl/changefeedccl/poller.go, line 478 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

i'd still keep the decode+AddKV part as a separate method for readability but ymmv

Done.


pkg/ccl/changefeedccl/poller.go, line 488 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

looks like you lost a (somewhat important) comment here

Done.

@ajwerner ajwerner force-pushed the ajwerner/scan-requests-in-cdc-backfill branch from 7c594dc to 78a3ee8 on February 4, 2020 20:37
@danhhz (Contributor) commented Feb 4, 2020

As long as the performance is in the same ballpark, it's fine by me. Those numbers look okay, merge away!

@ajwerner (Contributor, Author) commented Feb 4, 2020

Flaked on #41941

@ajwerner ajwerner marked this pull request as ready for review February 4, 2020 21:03
@ajwerner (Contributor, Author) commented Feb 4, 2020

TFTR!

bors r=danhhz

craig bot pushed a commit that referenced this pull request Feb 4, 2020
44663: changefeedccl: use ScanRequest instead of ExportRequest during backfills r=danhhz a=ajwerner


44719: sql: add telemetry for uses of alter primary key r=otan a=rohany

Fixes #44716.

This PR adds a telemetry counter for uses
of the alter primary key command.

Release note (sql change): This PR adds collected telemetry
from clusters upon using the alter primary key command.

44721: vendor: bump pebble to 89adc50375ffd11c8e62f46f1a5c320012cffafe r=petermattis a=petermattis

* db: additional tweak to the sstable boundary generation
* db: add memTable.logSeqNum
* db: force flushing of overlapping queued memtables during ingestion
* tool: lsm visualization tool
* db: consistently handle key decoding failure
* cmd/pebble: fix lint for maxOpsPerSec's type inference
* tool: add "find" command
* cmd/pebble: fix --rate flag
* internal/metamorphic: use the strict MemFS and add an operation to reset the DB
* db: make DB.Close() wait for any ongoing deletion of obsolete files
* sstable: encode varints directly into buf in blockWriter.store
* sstable: micro-optimize Writer.addPoint()
* sstable: minor cleanup of Writer/blockWriter
* sstable: micro-optimize blockWriter.store

Fixes #44631 

Release note: None

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
Co-authored-by: Peter Mattis <petermattis@gmail.com>
@craig (craig bot) commented Feb 5, 2020

Build succeeded

@craig craig bot merged commit 78a3ee8 into cockroachdb:master Feb 5, 2020