Skip to content
This repository has been archived by the owner on Dec 8, 2021. It is now read-only.

Support multi-engine per table (batching) #113

Merged
merged 13 commits into from
Jan 14, 2019
Merged

Support multi-engine per table (batching) #113

merged 13 commits into from
Jan 14, 2019

Conversation

kennytm
Copy link
Collaborator

@kennytm kennytm commented Dec 31, 2018

What problem does this PR solve?

Implements RFC 3 (a.k.a. Batching).

What is changed and how it works?

Decoupled 1 table = 1 engine. Now one table can produce multiple engines, partitioned by a batch-size, allowing import of table partially. See the design document above for details.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
    • Tested on the 10T machines

Code changes

  • Has exported function/method change
  • Has exported variable/fields change
  • Has interface methods change
  • Has persistent data change

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

  • Need to update the documentation
  • Need to update the tidb-ansible repository
  • Need to be included in the release note

@kennytm kennytm added status/WIP Work in progress Should Update Docs Should update docs after this PR is merged. Remove this label once the docs are updated priority/important type/feature New feature labels Dec 31, 2018
@sre-bot
Copy link

sre-bot commented Dec 31, 2018

Hi contributor, thanks for your PR.

This patch needs to be approved by someone of admins. They should reply with "/ok-to-test" to accept this PR for running test automatically.

@kennytm
Copy link
Collaborator Author

kennytm commented Dec 31, 2018

/run-all-tests

@kennytm kennytm changed the title [WIP] Support multi-engine per table (batching) Support multi-engine per table (batching) Dec 31, 2018
@kennytm kennytm added status/PTAL This PR is ready for review. Add this label back after committing new changes and removed status/WIP Work in progress labels Dec 31, 2018
@kennytm
Copy link
Collaborator Author

kennytm commented Jan 2, 2019

/run-all-tests

lightning/kv/importer.go Show resolved Hide resolved
lightning/mydump/region.go Outdated Show resolved Hide resolved
uint32 status = 3;
int64 alloc_base = 4;
repeated EngineCheckpointModel engines = 6;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use map, a nature way

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Engine IDs are assigned sequentially, so making it an array is more natural I think (a map would be just mapping 0, 1, 2, 3, ... to engines).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you must make sure Engine IDs are assigned sequentially, this logic should be written in design document

Copy link
Collaborator Author

@kennytm kennytm Jan 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, Updated the design doc.

if err := engineRows.Scan(&engineID, &status); err != nil {
return errors.Trace(err)
}
for len(cp.Engines) <= engineID {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use map too?

tests/checkpoint_engines/config.toml Show resolved Hide resolved
@@ -0,0 +1 @@
create database cpeng;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's cpeng? 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check-Point — Engines 🤔

&value.Chunk.Offset, &value.Chunk.EndOffset, &value.Chunk.PrevRowIDMax, &value.Chunk.RowIDMax,
&kvcBytes, &kvcKVs, &kvcChecksum,
); err != nil {
return errors.Trace(err)
}
value.Checksum = verify.MakeKVChecksum(kvcBytes, kvcKVs, kvcChecksum)
cp.Chunks = append(cp.Chunks, value)
cp.Engines[engineID].Chunks = append(cp.Engines[engineID].Chunks, value)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is chunks sorted as comment at L101?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... a subslice of a sorted slice is still sorted 😁

@kennytm
Copy link
Collaborator Author

kennytm commented Jan 2, 2019

/run-all-tests

lightning/kv/importer.go Outdated Show resolved Hide resolved
@kennytm kennytm added status/LGT1 One reviewer already commented LGTM (LGTM1) and removed status/PTAL This PR is ready for review. Add this label back after committing new changes labels Jan 3, 2019
@lonng
Copy link
Contributor

lonng commented Jan 3, 2019

LGTM

@lonng lonng added status/LGT2 Two reviewers already commented LGTM, ready for merge (LGTM2) status/LGT1 One reviewer already commented LGTM (LGTM1) and removed status/LGT1 One reviewer already commented LGTM (LGTM1) status/LGT2 Two reviewers already commented LGTM, ready for merge (LGTM2) labels Jan 3, 2019
@IANTHEREAL
Copy link
Collaborator

/run-all-tests

@IANTHEREAL
Copy link
Collaborator

LGTM

@IANTHEREAL IANTHEREAL added status/LGT2 Two reviewers already commented LGTM, ready for merge (LGTM2) and removed status/LGT1 One reviewer already commented LGTM (LGTM1) labels Jan 3, 2019
@kennytm
Copy link
Collaborator Author

kennytm commented Jan 3, 2019

I'll merge after confirming whether 10G is a good default size. Maybe this needs to be larger (the previous default was 500G).

* mydump: non-uniform batch size

* *: make the `batch-size-scale` configurable

* *: implemented the optimized non-uniform strategy

* tests: due to change of strategy, checkpoint_engines count becomes 4 again

* mydump/region: slightly adjust the batch size computation

* Use the exact result of 1/Beta(N, R) instead of an approximation
* When the number of engines is small and the total engine size of the
  first (table-concurrency) batches exceed the table size, the last batch
  was truncated, and disrupt the pipeline. Now in these case we will
  reduce the batch size to avoid this disruption.

* restore: log the SQL size and KV size of each engine for debugging

* config: change default batch size and ratio given experiment result

* config: added more explanation about batch-import-ratio
@kennytm kennytm merged commit f73d0f9 into master Jan 14, 2019
@kennytm kennytm deleted the kennytm/batching branch January 14, 2019 12:17
@kennytm kennytm removed the Should Update Docs Should update docs after this PR is merged. Remove this label once the docs are updated label Mar 11, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
status/LGT2 Two reviewers already commented LGTM, ready for merge (LGTM2) type/feature New feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants