scheduler: hotspot: solution-based key-rate-aware balance solver #2141

Merged
merged 31 commits, Feb 26, 2020

Luffbee
Contributor

@Luffbee Luffbee commented Feb 18, 2020

What problem does this PR solve?

The second step of #2139.

What is changed and how it works?

Introduce a solution-based balance solver.

Check List

Tests

  • Unit test

@Luffbee Luffbee added the type/enhancement and component/schedule labels Feb 18, 2020
@Luffbee Luffbee added this to the v4.0.0-beta.1 milestone Feb 18, 2020
@Luffbee Luffbee modified the milestones: v4.0.0-beta.1, v4.0.0-rc Feb 20, 2020
@codecov-io
codecov-io commented Feb 21, 2020

Codecov Report

Merging #2141 into master will decrease coverage by 0.22%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2141      +/-   ##
==========================================
- Coverage   76.27%   76.05%   -0.23%     
==========================================
  Files         195      195              
  Lines       20577    20577              
==========================================
- Hits        15696    15650      -46     
- Misses       3703     3734      +31     
- Partials     1178     1193      +15
Impacted Files Coverage Δ
server/id/id.go 77.27% <ø> (ø) ⬆️
server/grpc_service.go 58.36% <ø> (+0.21%) ⬆️
server/config_manager/config_manager.go 77.01% <ø> (ø) ⬆️
server/statistics/region.go 100% <ø> (ø) ⬆️
server/core/region_tree.go 93.27% <ø> (ø) ⬆️
server/api/member.go 60.83% <ø> (ø) ⬆️
server/schedule/placement/rule_manager.go 86.45% <ø> (ø) ⬆️
tests/config.go 95.91% <ø> (ø) ⬆️
server/schedulers/shuffle_region.go 84.28% <ø> (ø) ⬆️
server/statistics/store.go 70.87% <ø> (ø) ⬆️
... and 118 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4600833...4ac422a.

@Luffbee Luffbee changed the title from "scheduler: hotspot: solution based balance solver" to "scheduler: hotspot: solution-based key-rate-aware balance solver" Feb 22, 2020
@nolouch
Contributor

nolouch commented Feb 24, 2020

PTAL @disksing @rleungx

server/schedulers/hot_region.go
    if len(ops) > 0 {
        return ops
    }
    // prefer to balance by peer
Member

Will removing the retry loop affect the speed of hot-spot scheduling?

Contributor Author

I believe it won't. The statistics data does not change between iterations, and the new balanceSolver tries almost all possible solutions.

    }
} else {
    keyDecRatio := (dstLd.KeyRate + peer.GetKeyRate()) / (srcLd.KeyRate + 1)
    keyHot := peer.GetKeyRate() >= 10
Member

How about using a constant?

Contributor Author

It won't be used anywhere else, so I don't think it is necessary.

Member

A constant is easier to maintain and makes the meaning much clearer.

Contributor Author

Added two constants: minHotByteRate and minHotKeyRate.
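For readers following this thread, here is a minimal sketch of what replacing the literal with named constants could look like. Only the names minHotByteRate and minHotKeyRate and the threshold 10 for the key rate come from the conversation; the value given to minHotByteRate and the helper functions isKeyHot and keyDecRatio are assumptions made for illustration, not the code that was merged.

```go
package main

import "fmt"

// Thresholds below which a peer is not treated as hot in a dimension.
// minHotKeyRate mirrors the literal 10 in the reviewed line; the value of
// minHotByteRate is a placeholder assumed for this sketch.
const (
	minHotByteRate = 100 // hypothetical value
	minHotKeyRate  = 10
)

// isKeyHot reports whether a peer's key rate reaches the hot threshold.
// (Hypothetical helper; the diff computes this inline.)
func isKeyHot(keyRate float64) bool {
	return keyRate >= minHotKeyRate
}

// keyDecRatio follows the expression quoted from the diff: the destination's
// key rate after receiving the peer, divided by the source's key rate plus one.
func keyDecRatio(srcKeyRate, dstKeyRate, peerKeyRate float64) float64 {
	return (dstKeyRate + peerKeyRate) / (srcKeyRate + 1)
}

func main() {
	fmt.Println(isKeyHot(9.5), isKeyHot(10))
	// Source at 99 keys/s, destination at 20 keys/s, peer contributes 30 keys/s.
	fmt.Println(keyDecRatio(99, 20, 30))
}
```

With the numbers in main, the ratio is (20 + 30) / (99 + 1) = 0.5, so the destination would end up at half of the source's current key rate.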

    case read:
        return readLeader
    }
    return resourceTypeLen
Member

Will it cause a panic?

Contributor Author

It will only panic when the arguments are invalid. I think this is acceptable.

Member

Wouldn't it be better to panic inside this function rather than when its return value is used?

Contributor Author

Fixed.
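As an illustration of the suggestion above, the sketch below panics inside the mapping function instead of returning the resourceTypeLen sentinel. The identifiers read, write, transferLeader, readLeader, and resourceTypeLen appear in the quoted diff; the function name toResourceType, the type names, and the identifiers movePeer, writePeer, and writeLeader are assumptions introduced here, so this is not necessarily how the fix was written.

```go
package main

import "fmt"

type rwType int
type opType int
type resourceType int

const (
	write rwType = iota
	read
)

const (
	movePeer opType = iota // assumed name
	transferLeader
)

const (
	writePeer   resourceType = iota // assumed name
	writeLeader                     // assumed name
	readLeader
	resourceTypeLen
)

// toResourceType maps a (read/write, operation) pair to a resource type.
// An invalid pair panics here, at the point of the mistake, rather than
// returning resourceTypeLen and failing later when it is used as an index.
func toResourceType(rw rwType, op opType) resourceType {
	switch rw {
	case write:
		switch op {
		case movePeer:
			return writePeer
		case transferLeader:
			return writeLeader
		}
	case read:
		return readLeader
	}
	panic(fmt.Sprintf("invalid arguments: rwTy = %v, opTy = %v", rw, op))
}

func main() {
	fmt.Println(toResourceType(write, transferLeader) == writeLeader)
	fmt.Println(toResourceType(read, movePeer) == readLeader)
}
```

Panicking in the function keeps the error message close to the invalid arguments, which is usually easier to debug than an out-of-range index somewhere downstream.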

server/cluster/coordinator.go
server/schedulers/hot_region.go
if bs.rwTy == write && bs.opTy == transferLeader {
    lpCmp = sliceLPCmp(
        minLPCmp(negLoadCmp(sliceLoadCmp(
            stLdRankCmp(stLdCount, stepRank(bs.maxSrc.Count, bs.rankStep.Count)),
Contributor

What is the meaning of rankStep? It seems to always be the same in the rank comparison.

Contributor Author

It is used for discretization: it maps a float64 to an int64. For all values in the interval [rank0 + i * step, rank0 + (i+1) * step), the rank is i.

Contributor

@nolouch nolouch Feb 26, 2020

Got it. If I understand correctly, the rank i (the return value of stepRank) can only be 0 or 1. Can this distinguish the stores well enough?

Contributor Author

No, rank i = int64((value - rank0) / step) can be any int64.

Contributor

@nolouch nolouch Feb 26, 2020

Oh, sorry. I missed the stepRatio. Can you add some comments about the step ratio?

Contributor Author

Comments added.
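The discretization discussed in this thread can be sketched as follows. The formula int64((value - rank0) / step) is quoted from the conversation; the exact signature of stepRank (here a function that returns a ranking closure) is an assumption inferred from how it is called in the diff, and the sample numbers are made up.

```go
package main

import "fmt"

// stepRank returns a function that maps a float64 to the index of the
// step-wide bucket it falls in, counted from rank0. Values in
// [rank0 + i*step, rank0 + (i+1)*step) get rank i for i >= 0.
// (Signature assumed for this sketch.)
func stepRank(rank0, step float64) func(float64) int64 {
	return func(value float64) int64 {
		return int64((value - rank0) / step)
	}
}

func main() {
	rank := stepRank(100, 10)
	for _, v := range []float64{100, 109.9, 110, 135, 95, 80} {
		fmt.Println(v, rank(v))
	}
}
```

Two stores whose loads fall in the same bucket compare as equal in that dimension, so small differences do not decide the ordering, and the rank is not limited to 0 or 1. Note that Go's float-to-integer conversion truncates toward zero, so in this sketch a value slightly below rank0 (such as 95) also gets rank 0.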

Contributor

@nolouch nolouch left a comment

lgtm.

@nolouch
Contributor

nolouch commented Feb 26, 2020

/merge

@sre-bot sre-bot added the status/can-merge label Feb 26, 2020
@sre-bot
Contributor

sre-bot commented Feb 26, 2020

/run-all-tests

@sre-bot sre-bot merged commit c8589fc into tikv:master Feb 26, 2020
@Luffbee Luffbee deleted the solution-based branch February 26, 2020 07:51
Labels: component/schedule, status/can-merge, type/enhancement
6 participants