
tidb panic after injection network partition between two AZ #54335

Closed
Lily2025 opened this issue Jul 1, 2024 · 2 comments · Fixed by #54390

Lily2025 commented Jul 1, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

1. Run the mussel workload.
2. Inject a network partition between two AZs.

2. What did you expect to see? (Required)

no panic

3. What did you see instead (Required)

tidb panic
2024-06-30 10:15:40 log="\n"
2024-06-30 10:15:40 log="/tidb-server --store=tikv --advertise-address=tc-tidb-1.tc-tidb-peer.endless-ha-test-airbnb-tps-7510461-1-362.svc --host=0.0.0.0 --path=tc-pd:2379 --config=/etc/tidb/tidb.toml\n"
2024-06-30 10:15:40 log="start tidb-server ...\n"
2024-06-30 10:15:37 log="\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/cmd/tidb-server/main.go:905 +0x37c\n"
2024-06-30 10:15:37 log="created by main.createServer in goroutine 1\n"
2024-06-30 10:15:37 log="\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/util/expensivequery/expensivequery.go:98 +0xaa8\n"
2024-06-30 10:15:37 log="github.com/pingcap/tidb/pkg/util/expensivequery.(*Handle).Run(0xc003868408)\n"
2024-06-30 10:15:37 log="\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/server.go:918 +0x515\n"
2024-06-30 10:15:37 log="github.com/pingcap/tidb/pkg/server.(*Server).Kill(0xc003cd3b00, 0x1241bfee, 0x1, 0x0?)\n"
2024-06-30 10:15:37 log="\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/server.go:950 +0x2ca\n"
2024-06-30 10:15:37 log="github.com/pingcap/tidb/pkg/server.killQuery(0xc1a069e680, 0x1)\n"
2024-06-30 10:15:37 log="\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/util/sqlkiller/sqlkiller.go:84\n"
2024-06-30 10:15:37 log="github.com/pingcap/tidb/pkg/util/sqlkiller.(*SQLKiller).FinishResultSet(...)\n"
2024-06-30 10:15:37 log="\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:2064 +0x1c\n"
2024-06-30 10:15:37 log="github.com/pingcap/tidb/pkg/server.(*clientConn).handleStmt.func1()\n"
2024-06-30 10:15:37 log="\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/internal/resultset/resultset.go:69 +0x33\n"
2024-06-30 10:15:37 log="github.com/pingcap/tidb/pkg/server/internal/resultset.(*tidbResultSet).Finish(0xa2b2e20?)\n"
2024-06-30 10:15:37 log="\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/session.go:2358 +0x14\n"
2024-06-30 10:15:37 log="github.com/pingcap/tidb/pkg/session.(*execStmtResult).Finish(0xc0032f59a0?)\n"
2024-06-30 10:15:37 log="goroutine 12862 [running]:\n"
2024-06-30 10:15:37 log="\n"
2024-06-30 10:15:37 log="[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x540d174]\n"
2024-06-30 10:15:37 log="panic: runtime error: invalid memory address or nil pointer dereference\n"
2024-06-30 10:15:37 log="[conn.go:1162] [\"command dispatched failed\"] [conn=306298908] [session_alias=] [connInfo=\"id:306298908, addr:10.233.108.10:51888 status:10, collation:utf8mb4_general_ci, user:root\"] [command=Query] [status=\"inTxn:0, autocommit:1\"] [sql=\"select /*+ max_execution_time(400), set_var(tikv_client_read_timeout=100) */ pk, sk, ts, v from t1 as of timestamp now() - interval 10 second where pk = '151832271' and sk = 'y4_16' and ts >= '2024-06-30 02:14:36.647382' and ts < '2024-06-30 02:15:26.647382' order by ts desc limit 5\"] [txn_mode=PESSIMISTIC] [timestamp=450812634988544000] [err=\"context canceled\\ngithub.com/pingcap/errors.AddStack\\n\\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20240318064555-6bd07397691f/errors.go:178\\ngithub.com/pingcap/errors.Trace\\n\\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20240318064555-6bd07397691f/juju_adaptor.go:15\\ngithub.com/pingcap/tidb/pkg/store/copr.(*copIterator).Next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/store/copr/coprocessor.go:1095\\ngithub.com/pingcap/tidb/pkg/distsql.(*selectResult).fetchResp\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/distsql/select_result.go:318\\ngithub.com/pingcap/tidb/pkg/distsql.(*selectResult).Next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/distsql/select_result.go:384\\ngithub.com/pingcap/tidb/pkg/executor.(*tableResultHandler).nextChunk\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/table_reader.go:607\\ngithub.com/pingcap/tidb/pkg/executor.(*TableReaderExecutor).Next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/table_reader.go:330\\ngithub.com/pingcap/tidb/pkg/executor/internal/exec.Next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/internal/exec/executor.go:410\\ngithub.com/pingcap/tidb/pkg/executor.(*LimitExec).Next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/executor.go:1366\\ngithub.com/pingcap/tidb/pkg/executor/internal/exec.Next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/internal/exec/executor.go:410\\ngithub.com/pingcap/tidb/pkg/executor.(*ExecStmt).next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:1250\\ngithub.com/pingcap/tidb/pkg/executor.(*recordSet).Next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:175\\ngithub.com/pingcap/tidb/pkg/server/internal/resultset.(*tidbResultSet).Next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/internal/resultset/resultset.go:64\\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeChunks\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:2332\\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeResultSet\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:2275\\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).handleStmt\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:2068\\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).handleQuery\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1785\\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).dispatch\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1359\\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1125\\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/server.go:739\\nruntime.goexit\\n\\t/usr/local/go/src/runtime/asm_amd64.s:1650\"]"
2024-06-30 10:15:37 log="[conn.go:1162] [\"command dispatched failed\"] [conn=306298862] [session_alias=] [connInfo=\"id:306298862, addr:10.233.119.153:60770 status:10, collation:utf8mb4_general_ci, user:root\"] [command=Query] [status=\"inTxn:0, autocommit:1\"] [sql=\"select /*+ max_execution_time(400), set_var(tikv_client_read_timeout=100) */ pk, sk, ts, v from t1 as of timestamp now() - interval 10 second where pk = '151832271' and sk = 'y4_16' and ts >= '2024-06-30 02:14:36.648815' and ts < '2024-06-30 02:15:26.648815' order by ts desc limit 5\"] [txn_mode=PESSIMISTIC] [timestamp=450812634988544000] [err=\"context canceled\\ngithub.com/pingcap/errors.AddStack\\n\\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20240318064555-6bd07397691f/errors.go:178\\ngithub.com/pingcap/errors.Trace\\n\\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20240318064555-6bd07397691f/juju_adaptor.go:15\\ngithub.com/pingcap/tidb/pkg/store/copr.(*copIterator).Next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/store/copr/coprocessor.go:1095\\ngithub.com/pingcap/tidb/pkg/distsql.(*selectResult).fetchResp\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/distsql/select_result.go:318\\ngithub.com/pingcap/tidb/pkg/distsql.(*selectResult).Next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/distsql/select_result.go:384\\ngithub.com/pingcap/tidb/pkg/executor.(*tableResultHandler).nextChunk\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/table_reader.go:607\\ngithub.com/pingcap/tidb/pkg/executor.(*TableReaderExecutor).Next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/table_reader.go:330\\ngithub.com/pingcap/tidb/pkg/executor/internal/exec.Next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/internal/exec/executor.go:410\\ngithub.com/pingcap/tidb/pkg/executor.(*LimitExec).Next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/executor.go:1366\\ngithub.com/pingcap/tidb/pkg/executor/internal/exec.Next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/internal/exec/executor.go:410\\ngithub.com/pingcap/tidb/pkg/executor.(*ExecStmt).next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:1250\\ngithub.com/pingcap/tidb/pkg/executor.(*recordSet).Next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:175\\ngithub.com/pingcap/tidb/pkg/server/internal/resultset.(*tidbResultSet).Next\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/internal/resultset/resultset.go:64\\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeChunks\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:2332\\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeResultSet\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:2275\\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).handleStmt\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:2068\\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).handleQuery\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1785\\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).dispatch\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1359\\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1125\\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\\n\\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/server.go:739\\nruntime.goexit\\n\\t/usr/local/go/src/runtime/asm_amd64.s:1650\"]"
2024-06-30 10:15:37 log="[sqlkiller.go:112] [\"kill finished\"] [conn=306298908]"

4. What is your TiDB version? (Required)

./tidb-server -V
Release Version: v8.2.0-alpha
Edition: Community
Git Commit Hash: 7df4f66
Git Branch: heads/refs/tags/v8.2.0-alpha
UTC Build Time: 2024-06-29 11:47:18
GoVersion: go1.21.10
Race Enabled: false
Check Table Before Drop: false
Store: unistore
2024-06-30T09:57:02.114+0800

Lily2025 added the type/bug label (The issue is confirmed as a bug.) on Jul 1, 2024

Lily2025 commented Jul 1, 2024

/severity major
/assign zyguan


zyguan commented Jul 1, 2024

This might be related to this PR: it's possible to call tidbResultSet.Finish after the trs has been closed (trs.recordSet is nil after close). @wshwsh12 Could you PTAL?

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x540d174]

goroutine 12862 [running]:
github.com/pingcap/tidb/pkg/session.(*execStmtResult).Finish(0xc0032f59a0?)
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/session.go:2358 +0x14
github.com/pingcap/tidb/pkg/server/internal/resultset.(*tidbResultSet).Finish(0xa2b2e20?)
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/internal/resultset/resultset.go:69 +0x33
github.com/pingcap/tidb/pkg/server.(*clientConn).handleStmt.func1()
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:2064 +0x1c
github.com/pingcap/tidb/pkg/util/sqlkiller.(*SQLKiller).FinishResultSet(...)
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/util/sqlkiller/sqlkiller.go:84
github.com/pingcap/tidb/pkg/server.killQuery(0xc1a069e680, 0x1)
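
The pattern described in the comment above (a Finish call arriving after the result set was closed and its inner record set set to nil) can be illustrated with a minimal sketch. All names here (innerResult, guardedResultSet) are hypothetical stand-ins, not TiDB types, and the nil guard shown is only one possible way to make a late Finish safe; it is not necessarily what the fix in #54390 does.

```go
package main

import (
	"fmt"
	"sync"
)

// innerResult is a hypothetical stand-in for the wrapped record set.
type innerResult struct {
	finished bool
}

// Finish writes a field, so calling it on a nil *innerResult would panic
// with a nil pointer dereference.
func (r *innerResult) Finish() error {
	r.finished = true
	return nil
}

// guardedResultSet is a hypothetical wrapper: Close drops the inner result,
// and Finish may still arrive later from a separate "killer" goroutine.
type guardedResultSet struct {
	mu    sync.Mutex
	inner *innerResult
}

// Close releases the inner result, leaving the field nil.
func (g *guardedResultSet) Close() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.inner = nil
}

// Finish becomes a no-op once the set is closed instead of dereferencing nil.
func (g *guardedResultSet) Finish() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.inner == nil {
		return nil
	}
	return g.inner.Finish()
}

func main() {
	g := &guardedResultSet{inner: &innerResult{}}

	// Close and Finish race, as they might between a connection goroutine
	// and a query killer; the mutex makes either ordering safe.
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); g.Close() }()
	go func() { defer wg.Done(); _ = g.Finish() }()
	wg.Wait()

	// A late Finish after Close must not panic.
	fmt.Println("late Finish error:", g.Finish())
}
```

Without the nil check in Finish, the late call would dereference a nil pointer, which is the same class of failure as the SIGSEGV in the trace.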
