[UCP] Noise reduction support when cluster auto scaling #2307
Thanks for your contribution. If your PR gets merged, you will be rewarded 2000 points.
Force-pushed from 52cc533 to de74eb8.
The code looks generally good to me. Please address the comment.
I have one more question about the time-window noise reduction. It seems the TiDB autoscaler must keep producing the same result (ScaleIn or ScaleOut) for a certain period before it actually scales. If the Operator is down during that time window, would noise reduction fail?
For example:
0:01s The TiDB autoscaler produces a ScaleOut result, and the status becomes ScaleOut.
0:02s The Operator goes down.
4:01s The Operator comes back up.
4:02s The TiDB autoscaler produces a ScaleOut result again, and the autoscaler allows TiDB to scale out.
In this case, I think noise reduction fails.
Force-pushed from 5140412 to 90e23cf.
Force-pushed from 305220a to dce8456.
/run-e2e-tests
Force-pushed from 5b69b83 to 579ad1b.
@cofyc @Yisaer @DanielZhangQD PTAL
Force-pushed from 579ad1b to 49a70b3.
/run-e2e-in-kind
Almost LGTM
// If not set, the default ReadyToScaleThresholdSeconds will be set to 300.
// +optional
ReadyToScaleThresholdSeconds *int32 `json:"readyToScaleThresholdSeconds,omitempty"`
300 is about 10 times the re-sync duration; I think it might be too long for a default value. What about 30, since that only requires one re-sync duration?
pkg/autoscaler/autoscaler/util.go (outdated)
if tac.Spec.TiKV.ReadyToScaleThresholdSeconds == nil {
	tac.Spec.TiKV.ReadyToScaleThresholdSeconds = pointer.Int32Ptr(300)
}
300 is too large here as well (same as above).
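A minimal sketch of the nil-pointer defaulting discussed in these two comments, with the default passed in so both the current 300 and the suggested 30 can be tried. TiKVSpec is a trimmed-down hypothetical stand-in for the real spec type, int32Ptr plays the role of pointer.Int32Ptr, and the 30s re-sync period behind resyncRounds is inferred from the review comment (300 being about 10 re-syncs), not read from the code.

```go
package main

import "fmt"

// TiKVSpec is a hypothetical, trimmed-down stand-in for the autoscaler spec;
// only the field discussed in this review is kept.
type TiKVSpec struct {
	ReadyToScaleThresholdSeconds *int32 `json:"readyToScaleThresholdSeconds,omitempty"`
}

// int32Ptr is a local helper equivalent to pointer.Int32Ptr.
func int32Ptr(v int32) *int32 { return &v }

// defaultThreshold fills in the threshold only when the user left it unset,
// so an explicit value (including 0) is preserved.
func defaultThreshold(spec *TiKVSpec, def int32) {
	if spec.ReadyToScaleThresholdSeconds == nil {
		spec.ReadyToScaleThresholdSeconds = int32Ptr(def)
	}
}

// resyncRounds is how many operator re-syncs fit in the threshold,
// assuming the re-sync period implied in the review.
func resyncRounds(thresholdSeconds, resyncSeconds int32) int32 {
	return thresholdSeconds / resyncSeconds
}

func main() {
	unset := TiKVSpec{}
	defaultThreshold(&unset, 30)
	fmt.Println(*unset.ReadyToScaleThresholdSeconds) // 30

	explicit := TiKVSpec{ReadyToScaleThresholdSeconds: int32Ptr(120)}
	defaultThreshold(&explicit, 30)
	fmt.Println(*explicit.ReadyToScaleThresholdSeconds) // 120

	fmt.Println(resyncRounds(300, 30), resyncRounds(30, 30)) // 10 1
}
```

Using *int32 with omitempty is what lets the operator tell "unset" apart from an explicit 0.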
pkg/label/label.go (outdated)
AnnTiKVReadyToScaleTimestamp = "tikv.tidb.pingcap.com/ready-to-scale-timestamp"

// AnnLastSyncingTimestamp records last sync timestamp
AnnLastSyncingTimestamp = "auto-scaling.tidb.pingcap.com/last-syncing-timestamp"
I think "auto-scaling" is unnecessary here; this annotation could be shared by all API objects.
Force-pushed from e3e2a7a to f620f6e.
* add AnnLastSyncingTimestamp timestamp
* add ReadyToScaleThresholdSeconds in AutoScalerSpec
* add AnnTiKVReadyToScaleTimestamp timestamp labels to record AutoScalerPhase
* add three AutoScalerPhase values: Normal, ReadyToScaleOut and ReadyToScaleIn
* add checkStsReadyAutoScalingTimestamp to check the AutoScalerPhase timestamp, only for TiKV
* add checkStsLastSyncTimestamp to check the maximum thresholdSec allowed before resetting the phase to Normal, only for TiKV
* add checkStsAutoScaling, which combines checkStsLastSyncTimestamp, checkStsReadyAutoScalingTimestamp and checkStsAutoScalingInterval
* add unit tests
* add integration e2e tests
* update doc
Force-pushed from f620f6e to 2747929.
/run-e2e-in-kind
/merge
Your auto merge job has been accepted, waiting for:
Team qidelongdongqiang completed task #2241 and got 2500 points; current score: 3979.
/run-cherry-picker
Signed-off-by: sre-bot <sre-bot@pingcap.com>
Cherry-pick to release-1.1 in PR #2568.
What problem does this PR solve?
UCP #2241
What is changed and how does it work?
Add a Phase for TidbClusterAutoScaler with three possible values: Normal, ReadyToScaleOut, and ReadyToScaleIn. Add a user-defined autoscaling threshold.
Using TiDB as an example (TiKV should behave the same way):
1. Check phase
After the autoscaler calculates the target replica count, if the target equals the current replica count, set the autoscaler phase to Normal and return. Otherwise, check whether the phase is already ReadyToScaleOut or ReadyToScaleIn; if not, update the phase and record the timestamp. Go to step 2.
2. Check timestamp threshold
Check whether the recorded timestamp is older than the user-defined threshold. If not, return; otherwise, go to step 3.
3. Do autoscale
Set the phase back to Normal and perform normal autoscaling.
Design for e2e tests:
After mocking the Prometheus response, autoscaling would start within at most 30s without noise reduction. With noise reduction, we expect the cluster to keep its replica count for at least 300s (ReadyToScaleThresholdSeconds). After ReadyToScaleThresholdSeconds has elapsed, the cluster should start normal autoscaling and the autoscaler phase should return to Normal.
Check List
Tests
Code changes
Side effects
n/a
Related changes
Does this PR introduce a user-facing change?: