Support multi-engine per table (batching) #113
Hi contributor, thanks for your PR. This patch needs to be approved by one of the admins. They should reply with "/ok-to-test" to accept this PR for running tests automatically.
/run-all-tests
Removed `[tikv-importer] batch-size` to avoid confusion. Removed `[mydumper] min-region-size` since it is useless now.
bb752cc to 54de1aa
/run-all-tests
uint32 status = 3;
int64 alloc_base = 4;
repeated EngineCheckpointModel engines = 6;
Why not use a map? That seems the more natural way.
Engine IDs are assigned sequentially, so making it an array is more natural I think (a map would be just mapping 0, 1, 2, 3, ... to engines).
You must make sure engine IDs are assigned sequentially; this logic should be written in the design document.
OK, updated the design doc.
if err := engineRows.Scan(&engineID, &status); err != nil {
    return errors.Trace(err)
}
for len(cp.Engines) <= engineID {
Why not use a map here too?
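For context on the slice-versus-map question, here is a minimal sketch of the growth loop shown in the diff above. It assumes engine IDs are dense and start at 0, as stated earlier in the thread. `EngineCheckpoint` and `ensureEngine` are hypothetical names used only for illustration, not the PR's actual types.

```go
package main

import "fmt"

// EngineCheckpoint is a hypothetical stand-in for the per-engine checkpoint record.
type EngineCheckpoint struct {
	Status uint32
}

// ensureEngine appends empty records until index engineID is valid,
// then returns the (possibly grown) slice. It never shrinks the slice.
func ensureEngine(engines []*EngineCheckpoint, engineID int) []*EngineCheckpoint {
	for len(engines) <= engineID {
		engines = append(engines, &EngineCheckpoint{})
	}
	return engines
}

func main() {
	var engines []*EngineCheckpoint
	// Rows may arrive in any order; the slice still ends up dense.
	rows := []struct {
		id     int
		status uint32
	}{{2, 30}, {0, 10}, {1, 20}}
	for _, row := range rows {
		engines = ensureEngine(engines, row.id)
		engines[row.id].Status = row.status
	}
	fmt.Println(len(engines), engines[0].Status, engines[2].Status)
}
```

With sequential IDs the slice index doubles as the engine ID, so a map would only restate the mapping 0, 1, 2, ... to engines.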
@@ -0,0 +1 @@
create database cpeng;
What's `cpeng`? 😄
Check-Point — Engines 🤔
    &value.Chunk.Offset, &value.Chunk.EndOffset, &value.Chunk.PrevRowIDMax, &value.Chunk.RowIDMax,
    &kvcBytes, &kvcKVs, &kvcChecksum,
); err != nil {
    return errors.Trace(err)
}
value.Checksum = verify.MakeKVChecksum(kvcBytes, kvcKVs, kvcChecksum)
cp.Chunks = append(cp.Chunks, value)
cp.Engines[engineID].Chunks = append(cp.Engines[engineID].Chunks, value)
Are the chunks sorted, as the comment at L101 says?
Yes... a subslice of a sorted slice is still sorted 😁
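The claim can be checked with a small sketch: cut a sorted slice into contiguous pieces and confirm each piece is still sorted. `Chunk`, `sortedByOffset`, and `splitAt` are hypothetical, simplified names. The real chunk checkpoint carries more fields, and the real sort key may differ from the single `Offset` assumed here.

```go
package main

import (
	"fmt"
	"sort"
)

// Chunk is a hypothetical, simplified chunk record keyed by its start offset.
type Chunk struct {
	Offset int64
}

// sortedByOffset reports whether chunks are in ascending offset order.
func sortedByOffset(chunks []Chunk) bool {
	return sort.SliceIsSorted(chunks, func(i, j int) bool {
		return chunks[i].Offset < chunks[j].Offset
	})
}

// splitAt cuts chunks into contiguous subslices at the given boundaries.
// The cuts are assumed to be ascending and within range.
func splitAt(chunks []Chunk, cuts ...int) [][]Chunk {
	var parts [][]Chunk
	start := 0
	for _, cut := range cuts {
		parts = append(parts, chunks[start:cut])
		start = cut
	}
	return append(parts, chunks[start:])
}

func main() {
	all := []Chunk{{0}, {100}, {200}, {300}, {400}}
	for i, part := range splitAt(all, 2, 3) {
		fmt.Println(i, len(part), sortedByOffset(part))
	}
}
```

This holds only because each engine receives a contiguous run of the sorted chunks. If chunks were distributed to engines out of order, each engine's list would need its own sort.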
54de1aa to 294f228
/run-all-tests
59f529c to a2274b8
a2274b8 to 92e8dad
LGTM
/run-all-tests
LGTM
I'll merge after confirming whether 10G is a good default size. Maybe this needs to be larger (the previous default was 500G).
* mydump: non-uniform batch size
* *: make the `batch-size-scale` configurable
* *: implemented the optimized non-uniform strategy
* tests: due to change of strategy, checkpoint_engines count becomes 4 again
* mydump/region: slightly adjust the batch size computation
  * Use the exact result of 1/Beta(N, R) instead of an approximation
  * When the number of engines is small and the total engine size of the first (table-concurrency) batches exceeds the table size, the last batch was truncated, disrupting the pipeline. Now in these cases we reduce the batch size to avoid this disruption.
* restore: log the SQL size and KV size of each engine for debugging
* config: change default batch size and ratio given experiment result
* config: added more explanation about batch-import-ratio
What problem does this PR solve?
Implements RFC 3 (a.k.a. Batching).
What is changed and how it works?
Decoupled "1 table = 1 engine". Now one table can produce multiple engines, partitioned by a `batch-size`, allowing a table to be imported partially. See the design document above for details.
Check List
Tests
Code changes
Side effects
Related changes
tidb-ansible repository
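As a rough illustration of the partitioning described under "What is changed and how it works?", the sketch below assigns a table's chunks to sequential engine IDs using one fixed byte threshold. `assignEngines` and its parameters are hypothetical. The commit message indicates the merged code uses a non-uniform batch size, which this sketch does not attempt to reproduce.

```go
package main

import "fmt"

// assignEngines maps each chunk (given by its size in bytes) to an engine ID.
// A new engine is started once the current one has accumulated at least
// batchSize bytes, so engine IDs are sequential and start at 0.
func assignEngines(chunkSizes []int64, batchSize int64) []int {
	ids := make([]int, len(chunkSizes))
	engineID, filled := 0, int64(0)
	for i, size := range chunkSizes {
		if filled >= batchSize {
			engineID++
			filled = 0
		}
		ids[i] = engineID
		filled += size
	}
	return ids
}

func main() {
	// Five chunks of 40 bytes with a 100-byte batch size.
	fmt.Println(assignEngines([]int64{40, 40, 40, 40, 40}, 100)) // prints [0 0 0 1 1]
}
```

Because an engine closes only after it has reached the threshold, an engine can exceed `batchSize` by up to one chunk. That is a choice made in this sketch, not something taken from the PR.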