server: handle kill signal during write result to connection #52882
Conversation
Hi @wshwsh12. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
Codecov Report
Attention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #52882 +/- ##
=================================================
- Coverage 74.5303% 56.0530% -18.4773%
=================================================
Files 1507 1657 +150
Lines 358289 635446 +277157
=================================================
+ Hits 267034 356187 +89153
- Misses 71847 255021 +183174
- Partials 19408 24238 +4830
Flags with carried forward coverage won't be shown. Click here to find out more.
/retest
1 similar comment
/retest
Force-pushed from ec4937d to 1587299 (compare)
/retest
Others LGTM
pkg/executor/adapter.go
Outdated
executor exec.Executor
fields []*ast.ResultField
executor exec.Executor
// `Fields` maybe call after `Close`, and executor will clear in `Close` function, so we need to store the schema in recordSet to avoid null pointer exception.
Suggested change:
- // `Fields` maybe call after `Close`, and executor will clear in `Close` function, so we need to store the schema in recordSet to avoid null pointer exception.
+ // The `Fields` method may be called after `Close`, and the executor is cleared in the `Close` function.
+ // Therefore, we need to store the schema in `recordSet` to avoid a null pointer exception when calling `executor.Schema()`.
// If kill sql fails after 60 seconds, the current SQL may be stuck in the write packet network stack.
// Now, we can reclaim the resource by calling `Finish` and start looking for the next SQL with large memory usage.
Suggested change:
- // If kill sql fails after 60 seconds, the current SQL may be stuck in the write packet network stack.
- // Now, we can reclaim the resource by calling `Finish` and start looking for the next SQL with large memory usage.
+ // If the SQL cannot be terminated after 60 seconds, it may be stuck in the network stack while writing packets to the client,
+ // encountering some bugs that cause it to hang, or failing to detect the kill signal.
+ // In this case, the resources can be reclaimed by calling the `Finish` method, and then we can start looking for the next SQL with the largest memory usage.
if seconds := time.Since(s.killStartTime) / time.Second; seconds >= 60 {
// If kill sql fails after 60 seconds, the current SQL may be stuck in the write packet network stack.
// Now, we can reclaim the resource by calling `Finish` and start looking for the next SQL with large memory usage.
logutil.BgLogger().Warn(fmt.Sprintf("global memory controller failed to kill the top-consumer in %ds, try to close the executors force", seconds))
Suggested change:
- logutil.BgLogger().Warn(fmt.Sprintf("global memory controller failed to kill the top-consumer in %ds, try to close the executors force", seconds))
+ logutil.BgLogger().Warn(fmt.Sprintf("global memory controller failed to kill the top consumer in %d seconds. Attempting to force close the executors.", seconds))
pkg/util/sqlkiller/sqlkiller.go
Outdated
@@ -41,11 +41,47 @@ const (
type SQLKiller struct {
Signal killSignal
ConnID uint64
Finish func()
// InWriteResultSet is used to mark whether the query is calling clientConn.writeResultSet().
Suggested change:
- // InWriteResultSet is used to mark whether the query is calling clientConn.writeResultSet().
+ // InWriteResultSet is used to indicate whether the query is currently calling clientConn.writeResultSet().
+ // If the query is in writeResultSet and Finish() can acquire rs.finishLock, we can assume the query is waiting for the client to receive data from the server over network I/O.
pkg/util/sqlkiller/sqlkiller.go
Outdated
if atomic.CompareAndSwapUint32(&killer.Signal, 0, reason) {
status := atomic.LoadUint32(&killer.Signal)
err := killer.getKillError(status)
logutil.BgLogger().Warn("kill query started", zap.Uint64("conn", killer.ConnID), zap.String("reason", err.Error()))
logutil.BgLogger().Warn("kill query initiated", zap.Uint64("connection ID", killer.ConnID), zap.String("reason", err.Error()))
pkg/util/sqlkiller/sqlkiller.go
Outdated
// FinishResultSet is used to finish the result set.
// If the cancel signal is received and SQL is waiting for network IO, resource released can be performed first.
// FinishResultSet is used to close the result set.
// If a kill signal is sent but the SQL query is stuck in the network stack while writing packets to the client,
// encountering some bugs that cause it to hang, or failing to detect the kill signal, we can call Finish to release resources used during the SQL execution process.
pkg/server/server.go
Outdated
@@ -907,6 +907,12 @@ func (s *Server) Kill(connectionID uint64, query bool, maxExecutionTime bool) {
// Mark the client connection status as WaitShutdown, when clientConn.Run detect
// this, it will end the dispatch loop and exit.
conn.setStatus(connStatusWaitShutdown)
if conn.bufReadConn != nil {
// When we try `kill connection` and tidb is stuck in the write packet network stack, we can quickly exit the network stack and end the SQL by setting WriteDeadline.
Suggested change:
- // When we try `kill connection` and tidb is stuck in the write packet network stack, we can quickly exit the network stack and end the SQL by setting WriteDeadline.
+ // When attempting to 'kill connection' and TiDB is stuck in the network stack while writing packets,
+ // we can quickly exit the network stack and terminate the SQL execution by setting WriteDeadline.
pkg/executor/adapter.go
Outdated
status := a.stmt.Ctx.GetSessionVars().SQLKiller.GetKillSignal()
inWriteResultSet := a.stmt.Ctx.GetSessionVars().SQLKiller.InWriteResultSet.Load()
if status > 0 && inWriteResultSet {
logutil.BgLogger().Warn("kill query, this SQL may be stuck in the network I/O stack.", zap.Uint64("conn", a.stmt.Ctx.GetSessionVars().ConnectionID))
Suggested change:
- logutil.BgLogger().Warn("kill query, this SQL may be stuck in the network I/O stack.", zap.Uint64("conn", a.stmt.Ctx.GetSessionVars().ConnectionID))
+ logutil.BgLogger().Warn("kill query, this SQL might be stuck in the network stack while writing packets to the client.", zap.Uint64("connection ID", a.stmt.Ctx.GetSessionVars().ConnectionID))
pkg/executor/adapter.go
Outdated
// finishLock is a mutex used to synchronize access to the `Next` and `Finish` function of the adapter.
// It ensures that only one goroutine can access to the `Next` and `Finish` function at a time, preventing race conditions.
// When we terminate the current SQL externally(e.g. kill query), we may use an additional goroutine to call the Finish function.
Suggested change:
- // finishLock is a mutex used to synchronize access to the `Next` and `Finish` function of the adapter.
- // It ensures that only one goroutine can access to the `Next` and `Finish` function at a time, preventing race conditions.
- // When we terminate the current SQL externally(e.g. kill query), we may use an additional goroutine to call the Finish function.
+ // finishLock is a mutex used to synchronize access to the `Next` and `Finish` functions of the adapter.
+ // It ensures that only one goroutine can access the `Next` and `Finish` functions at a time, preventing race conditions.
+ // When we terminate the current SQL externally (e.g., kill query), an additional goroutine would be used to call the `Finish` function.
pkg/executor/adapter.go
Outdated
status := a.stmt.Ctx.GetSessionVars().SQLKiller.GetKillSignal()
inWriteResultSet := a.stmt.Ctx.GetSessionVars().SQLKiller.InWriteResultSet.Load()
if status > 0 && inWriteResultSet {
logutil.BgLogger().Warn("kill query, this SQL may be stuck in the network I/O stack.", zap.Uint64("conn", a.stmt.Ctx.GetSessionVars().ConnectionID))
Will `kill connection` trigger this log? If so, how do we distinguish `kill query` and `kill connection` from the log?
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: guo-shaoge, XuHuaiyu. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
[LGTM Timeline notifier] Timeline:
What problem does this PR solve?
Issue Number: close #44009
Problem Summary:
What changed and how does it work?
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.