-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip recalculating the rate in MaxReplicationLagModule when it can't be done #12620
Skip recalculating the rate in MaxReplicationLagModule when it can't be done #12620
Conversation
…be done This defends against lag records with nil stats which can lead to segfaults. See vitessio#12619 Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
If a new flag is being introduced:
If a workflow is added or modified:
Bug fixes
Non-trivial changes
New/Existing features
Backward compatibility
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we should have a unit test to repro this. If that's a pain we can merge without one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ejortegau can you please add a new unit test here: https://github.com/vitessio/vitess/blob/main/go/vt/throttler/max_replication_lag_module_test.go
This new unit test can ensure that we don't crash or otherwise fail if any of the relevant structs are nil — at least lagRecordNow.stats
(guessing there's others too, but maybe not). It really doesn't have to be much. I'll then merge this.
Thank you for yet another contribution! ❤️
Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
Added one (very simple) one. |
Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
Hi, @mattlord I don't know if you saw, but I added unit testing as requested. Is there anything else that is missing before this can be merged? Thanks! |
expectPanic bool | ||
}{ | ||
{ | ||
name: "Zero lag", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Isn't this zero time, not zero lag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I called it Zero lag because it tests the case here trigerring a panic when lagRecordNow.isZero()
is True
.
time: time.Time{}, | ||
TabletHealth: discovery.TabletHealth{Stats: nil}, | ||
}, | ||
expectPanic: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not in this PR, but we should revisit all the panics in the throttler code base. Better to return a well-known error and exit cleanly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I was very surprised when I saw that this code just panics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deepthi / @ejortegau: panic
s are cleaned up in this PR: #12901 👍. Reviews appreciated!
I believe the test failures are because we upgraded the go version. A merge from main should fix them. |
I just synced my fork and merged |
…be done (vitessio#12620) * Skip recalculating the rate in MaxReplicationLagModule when it can't be done This defends against lag records with nil stats which can lead to segfaults. See vitessio#12619 Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com> * Address PR comments. Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com> * Make linter happy Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com> --------- Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
…be done (vitessio#12620) * Skip recalculating the rate in MaxReplicationLagModule when it can't be done This defends against lag records with nil stats which can lead to segfaults. See vitessio#12619 Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com> * Address PR comments. Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com> * Make linter happy Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com> --------- Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
* Skip recalculating the rate in MaxReplicationLagModule when it can't be done (vitessio#12620) * Skip recalculating the rate in MaxReplicationLagModule when it can't be done This defends against lag records with nil stats which can lead to segfaults. See vitessio#12619 Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com> * Address PR comments. Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com> * Make linter happy Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com> --------- Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com> * Throttled transactions return MySQL error code 1041 ER_OUT_OF_RESOURCES (vitessio#12949) This error code seems better suited to represent the fact that transactions are being throttled by the server due to some form of resource contention than the current code 1203 ER_TOO_MANY_USER_CONNECTIONS. Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com> * MaxReplicationLagModule.recalculateRate no longer fills the log (vitessio#14875) Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com> --------- Signed-off-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com> Co-authored-by: Eduardo J. Ortega U <5791035+ejortegau@users.noreply.github.com>
Description
This PR defends against lag records with nil stats which can lead to segfaults.
Related Issue(s)
#12619
Checklist
Deployment Notes
N/A