optimization: Unified Sorter (#972) #1122

Merged on Nov 26, 2020 (4 commits)

ti-srebot

cherry-pick #972 to release-4.0


What problem does this PR solve?

#945

  • Improves serialization performance.
  • Supports concurrent sorting.
  • Falls back to disk automatically when there is not enough memory for sorting.
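
As a rough illustration of the concurrent-sorting goal, the sketch below sorts several chunks in parallel goroutines and then combines them with a k-way heap merge. It is not taken from the UnifiedSorter code: the `entry`, `cursor`, `sortChunks`, and `mergeChunks` names are hypothetical, and the chunks stay in memory here, whereas a disk-backed sorter would read the sorted chunks back from temporary files.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
	"sync"
)

// entry is a hypothetical stand-in for an event; only the timestamp is used for ordering.
type entry struct {
	ts  uint64
	val string
}

// sortChunks sorts every chunk in place, one goroutine per chunk.
func sortChunks(chunks [][]entry) {
	var wg sync.WaitGroup
	for _, c := range chunks {
		wg.Add(1)
		go func(c []entry) {
			defer wg.Done()
			sort.Slice(c, func(i, j int) bool { return c[i].ts < c[j].ts })
		}(c)
	}
	wg.Wait()
}

// cursor tracks the next unread entry of one sorted chunk.
type cursor struct {
	chunk []entry
	pos   int
}

// mergeHeap orders cursors by the timestamp of their next entry.
type mergeHeap []*cursor

func (h mergeHeap) Len() int            { return len(h) }
func (h mergeHeap) Less(i, j int) bool  { return h[i].chunk[h[i].pos].ts < h[j].chunk[h[j].pos].ts }
func (h mergeHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *mergeHeap) Push(x interface{}) { *h = append(*h, x.(*cursor)) }
func (h *mergeHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// mergeChunks performs a k-way merge over chunks that are already sorted.
func mergeChunks(chunks [][]entry) []entry {
	h := &mergeHeap{}
	total := 0
	for _, c := range chunks {
		if len(c) > 0 {
			*h = append(*h, &cursor{chunk: c})
			total += len(c)
		}
	}
	heap.Init(h)
	out := make([]entry, 0, total)
	for h.Len() > 0 {
		cur := (*h)[0] // cursor holding the smallest pending timestamp
		out = append(out, cur.chunk[cur.pos])
		cur.pos++
		if cur.pos == len(cur.chunk) {
			heap.Pop(h) // chunk exhausted
		} else {
			heap.Fix(h, 0) // restore heap order after advancing
		}
	}
	return out
}

func main() {
	chunks := [][]entry{
		{{5, "e"}, {1, "a"}},
		{{4, "d"}, {2, "b"}},
		{{3, "c"}},
	}
	sortChunks(chunks)
	for _, e := range mergeChunks(chunks) {
		fmt.Println(e.ts, e.val)
	}
}
```

Sorting per chunk keeps each goroutine independent, so the only coordination needed is the final merge.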

What is changed and how it works?

  • Implemented UnifiedSorter.
  • Made UnifiedSorter the default sorter.
  • Moved mounting to after sorting, which makes it easier to serialize entries to temporary files.

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has exported function/method change

Side effects

  • Possible performance regression
  • Increased code complexity

Benchmarks

Method

We generate mock KV entries, each carrying 1KB of data, using 256 goroutines in parallel. No resolved event is emitted until all data has been fed to the sorter.
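
A minimal sketch of what such a parallel generator could look like is shown below. It is not the benchmark code from this PR: the `mockEntry` type and `generate` function are hypothetical, random timestamps are an assumption, and the example emits only a handful of entries per goroutine.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// mockEntry is a hypothetical stand-in for a KV entry fed to the sorter.
type mockEntry struct {
	ts    uint64
	value []byte
}

// generate starts numWorkers goroutines, each emitting perWorker entries
// with a valueSize-byte payload. The channel is closed once all are done.
func generate(numWorkers, perWorker, valueSize int) <-chan mockEntry {
	out := make(chan mockEntry, 1024)
	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func(seed int64) {
			defer wg.Done()
			// Each worker owns its generator, so no locking is needed.
			rng := rand.New(rand.NewSource(seed))
			for i := 0; i < perWorker; i++ {
				value := make([]byte, valueSize)
				rng.Read(value)
				out <- mockEntry{ts: rng.Uint64(), value: value}
			}
		}(int64(w))
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	count, totalBytes := 0, 0
	// 256 goroutines and 1KB payloads, as in the benchmark description.
	for e := range generate(256, 4, 1024) {
		count++
		totalBytes += len(e.value)
	}
	fmt.Println(count, totalBytes)
}
```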

Results

Data     Time
50GB     7 min
100GB    16 min
200GB    32 min
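
For a rough sense of scale, the figures above can be converted to throughput. The sketch below assumes "GB" means GiB and that the reported time covers the whole run. Under those assumptions the three runs come out on the order of 100 MiB/s. The `throughputMiBps` helper is hypothetical and exists only for this calculation.

```go
package main

import "fmt"

// throughputMiBps converts a data size in GB (treated as GiB, an assumption)
// and a duration in minutes into MiB per second.
func throughputMiBps(sizeGB, minutes float64) float64 {
	return sizeGB * 1024 / (minutes * 60)
}

func main() {
	runs := []struct{ sizeGB, minutes float64 }{
		{50, 7},
		{100, 16},
		{200, 32},
	}
	for _, r := range runs {
		fmt.Printf("%.0fGB in %.0f min: about %.0f MiB/s\n",
			r.sizeGB, r.minutes, throughputMiBps(r.sizeGB, r.minutes))
	}
}
```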

Machine specifications

CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Memory: 16GB DDR4 2133MHz * 8
Disk: Intel Corporation NVMe Datacenter SSD

Sorter configuration

{
	NumConcurrentWorker:  16,
	ChunkSizeLimit:       1 * 1024 * 1024 * 1024,   // 1GB
	MaxMemoryPressure:    60,
	MaxMemoryConsumption: 16 * 1024 * 1024 * 1024,  // 16GB
}
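
One way to read these settings is sketched below, under stated assumptions: `MaxMemoryPressure` is taken to be a percentage of system memory in use, and `MaxMemoryConsumption` a byte limit on what the sorter itself may hold. The `sorterConfig` type name and the `shouldSpill` helper are hypothetical and are not part of this PR. Only the field names come from the configuration above.

```go
package main

import "fmt"

// sorterConfig mirrors the field names listed above. The type name and the
// field semantics described in the comments are assumptions.
type sorterConfig struct {
	NumConcurrentWorker  int
	ChunkSizeLimit       uint64 // bytes per chunk
	MaxMemoryPressure    int    // assumed: percentage of system memory in use
	MaxMemoryConsumption uint64 // assumed: bytes the sorter may hold in memory
}

// shouldSpill reports whether a sorter would fall back to disk under the
// assumed semantics: either limit being exceeded triggers the fallback.
func shouldSpill(cfg sorterConfig, memoryPressure int, sorterBytes uint64) bool {
	return memoryPressure > cfg.MaxMemoryPressure || sorterBytes > cfg.MaxMemoryConsumption
}

func main() {
	cfg := sorterConfig{
		NumConcurrentWorker:  16,
		ChunkSizeLimit:       1 * 1024 * 1024 * 1024,
		MaxMemoryPressure:    60,
		MaxMemoryConsumption: 16 * 1024 * 1024 * 1024,
	}
	fmt.Println(shouldSpill(cfg, 45, 2<<30))  // false: both limits respected
	fmt.Println(shouldSpill(cfg, 75, 2<<30))  // true: memory pressure above 60
	fmt.Println(shouldSpill(cfg, 45, 20<<30)) // true: sorter holds more than 16GB
}
```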

Release note

  • No release note

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot

/run-all-tests

ti-srebot added the component/puller, status/ptal, subject/performance, and type/4.0-cherry-pick labels on Nov 25, 2020
ti-srebot added this to the v4.0.9 milestone on Nov 25, 2020
liuzix commented Nov 25, 2020

/run-all-tests

liuzix commented Nov 25, 2020

/run-all-tests

amyangfei merged commit 3221296 into pingcap:release-4.0 on Nov 26, 2020