Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport 11325 - PlannedReparentShard/vstream hang fix #138

Merged

Conversation

tanjinx
Copy link

@tanjinx tanjinx commented Oct 3, 2023

Description

backport vitessio#11325
DemotePrimary is hanging on vstreamer open:

1 @ 0x43a4b6 0x44bbef 0x44bbc6 0x469546 0x4781a5 0x12d4852 0x12d4832 0x1393e54 0x1393aad 0x13921b8 0x1391d2c 0x139cf25 0x14d2be4 0x14d293c 0x14ed8c2 0x1496498 0xa647e3 0xb8333a 0xbec647 0xb8333a 0xb831de 0x1496358 0xa4b94f 0xa4fe76 0xa494f8 0x46d8e1
#	0x469545	sync.runtime_SemacquireMutex+0x25									runtime/sema.go:77
#	0x4781a4	sync.(*Mutex).lockSlow+0x164										sync/mutex.go:171
#	0x12d4851	sync.(*Mutex).Lock+0x51											sync/mutex.go:90
#	0x12d4831	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer.(*Engine).Open+0x31				vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/engine.go:153
#	0x1393e53	vitess.io/vitess/go/vt/vttablet/tabletserver.(*stateManager).connect+0x73				vitess.io/vitess/go/vt/vttablet/tabletserver/state_manager.go:506
#	0x1393aac	vitess.io/vitess/go/vt/vttablet/tabletserver.(*stateManager).unservePrimary+0x4c			vitess.io/vitess/go/vt/vttablet/tabletserver/state_manager.go:451
#	0x13921b7	vitess.io/vitess/go/vt/vttablet/tabletserver.(*stateManager).execTransition+0x97			vitess.io/vitess/go/vt/vttablet/tabletserver/state_manager.go:258
#	0x1391d2b	vitess.io/vitess/go/vt/vttablet/tabletserver.(*stateManager).SetServingType+0x26b			vitess.io/vitess/go/vt/vttablet/tabletserver/state_manager.go:220
#	0x139cf24	vitess.io/vitess/go/vt/vttablet/tabletserver.(*TabletServer).SetServingType+0x44			vitess.io/vitess/go/vt/vttablet/tabletserver/tabletserver.go:368
#	0x14d2be3	vitess.io/vitess/go/vt/vttablet/tabletmanager.(*TabletManager).demotePrimary+0x203			vitess.io/vitess/go/vt/vttablet/tabletmanager/rpc_replication.go:542
#	0x14d293b	vitess.io/vitess/go/vt/vttablet/tabletmanager.(*TabletManager).DemotePrimary+0x5b			vitess.io/vitess/go/vt/vttablet/tabletmanager/rpc_replication.go:493
#	0x14ed8c1	vitess.io/vitess/go/vt/vttablet/grpctmserver.(*server).DemotePrimary+0x1a1				vitess.io/vitess/go/vt/vttablet/grpctmserver/server.go:462
#	0x1496497	vitess.io/vitess/go/vt/proto/tabletmanagerservice._TabletManager_DemotePrimary_Handler.func1+0x77	vitess.io/vitess/go/vt/proto/tabletmanagerservice/tabletmanagerservice_grpc.pb.go:1647
#	0xa647e2	github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1+0x402				github.com/opentracing-contrib/go-grpc@v0.0.0-20180928155321-4b5a12d3ff02/server.go:57
#	0xb83339	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x39				github.com/grpc-ecosystem/go-grpc-middleware@v1.1.0/chain.go:25
#	0xbec646	vitess.io/vitess/go/vt/servenv.authenticatingUnaryInterceptor+0x86					vitess.io/vitess/go/vt/servenv/grpc_server.go:300
#	0xb83339	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x39				github.com/grpc-ecosystem/go-grpc-middleware@v1.1.0/chain.go:25
#	0xb831dd	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1+0xbd				github.com/grpc-ecosystem/go-grpc-middleware@v1.1.0/chain.go:34
#	0x1496357	vitess.io/vitess/go/vt/proto/tabletmanagerservice._TabletManager_DemotePrimary_Handler+0x137		vitess.io/vitess/go/vt/proto/tabletmanagerservice/tabletmanagerservice_grpc.pb.go:1649
#	0xa4b94e	google.golang.org/grpc.(*Server).processUnaryRPC+0xcee							google.golang.org/grpc@v1.45.0/server.go:1282
#	0xa4fe75	google.golang.org/grpc.(*Server).handleStream+0xa35							google.golang.org/grpc@v1.45.0/server.go:1619
#	0xa494f7	google.golang.org/grpc.(*Server).serveStreams.func1.2+0x97						google.golang.org/grpc@v1.45.0/server.go:921

for vstream/vschema stacks:

1 @ 0x43a4b6 0x44aafe 0xa11a30 0xa118ea 0xa5aa3c 0x146e80b 0x147a9ce 0x12e20ae 0x12e6b29 0x12e6104 0x12e5785 0x12e54d4 0x12e3e45 0x12d517d 0x13a1e25 0x147a851 0x146e793 0xa6518d 0xb8365a 0xbec559 0xb8365a 0xb834fe 0xa4e69b 0xa4fe30 0xa494f8 0x46d8e1
#	0xa11a2f	google.golang.org/grpc/internal/transport.(*writeQuota).get+0x2ef				google.golang.org/grpc@v1.45.0/internal/transport/flowcontrol.go:59
#	0xa118e9	google.golang.org/grpc/internal/transport.(*http2Server).Write+0x1a9				google.golang.org/grpc@v1.45.0/internal/transport/http2_server.go:1084
#	0xa5aa3b	google.golang.org/grpc.(*serverStream).SendMsg+0x1bb						google.golang.org/grpc@v1.45.0/stream.go:1512
#	0x146e80a	vitess.io/vitess/go/vt/proto/queryservice.(*queryVStreamServer).Send+0x2a			vitess.io/vitess/go/vt/proto/queryservice/queryservice_grpc.pb.go:1144
#	0x147a9cd	vitess.io/vitess/go/vt/vttablet/grpcqueryservice.(*query).VStream.func1+0x8d			vitess.io/vitess/go/vt/vttablet/grpcqueryservice/server.go:334
#	0x12e20ad	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer.newUVStreamer.func1+0xed			vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/uvstreamer.go:109
#	0x12e6b28	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer.(*vstreamer).parseEvents.func1+0x3c8	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/vstreamer.go:235
#	0x12e6103	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer.(*vstreamer).parseEvents+0x803		vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/vstreamer.go:355
#	0x12e5784	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer.(*vstreamer).replicate+0x124		vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/vstreamer.go:189
#	0x12e54d3	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer.(*vstreamer).Stream+0x1b3		vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/vstreamer.go:168
#	0x12e3e44	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer.(*uvstreamer).Stream+0x2e4		vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/uvstreamer.go:429
#	0x12d517c	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer.(*Engine).Stream+0x1bc			vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/engine.go:239
#	0x13a1e24	vitess.io/vitess/go/vt/vttablet/tabletserver.(*TabletServer).VStream+0xc4			vitess.io/vitess/go/vt/vttablet/tabletserver/tabletserver.go:1071
#	0x147a850	vitess.io/vitess/go/vt/vttablet/grpcqueryservice.(*query).VStream+0x1d0				vitess.io/vitess/go/vt/vttablet/grpcqueryservice/server.go:333
#	0x146e792	vitess.io/vitess/go/vt/proto/queryservice._Query_VStream_Handler+0xd2				vitess.io/vitess/go/vt/proto/queryservice/queryservice_grpc.pb.go:1131
#	0xa6518c	github.com/opentracing-contrib/go-grpc.OpenTracingStreamServerInterceptor.func1+0x36c		github.com/opentracing-contrib/go-grpc@v0.0.0-20180928155321-4b5a12d3ff02/server.go:114
#	0xb83659	github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1+0x39			github.com/grpc-ecosystem/go-grpc-middleware@v1.1.0/chain.go:49
#	0xbec558	vitess.io/vitess/go/vt/servenv.authenticatingStreamInterceptor+0xd8				vitess.io/vitess/go/vt/servenv/grpc_server.go:291
#	0xb83659	github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1+0x39			github.com/grpc-ecosystem/go-grpc-middleware@v1.1.0/chain.go:49
#	0xb834fd	github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1+0xbd			github.com/grpc-ecosystem/go-grpc-middleware@v1.1.0/chain.go:58
#	0xa4e69a	google.golang.org/grpc.(*Server).processStreamingRPC+0xf1a					google.golang.org/grpc@v1.45.0/server.go:1548
#	0xa4fe2f	google.golang.org/grpc.(*Server).handleStream+0x9ef						google.golang.org/grpc@v1.45.0/server.go:1623
#	0xa494f7	google.golang.org/grpc.(*Server).serveStreams.func1.2+0x97					google.golang.org/grpc@v1.45.0/server.go:921
1 @ 0x43a4b6 0x44aafe 0x12e41a9 0x12e414a 0x12d700a 0x11068a8 0x1104e70 0x1106575 0x46d8e1
#	0x12e41a8	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer.(*vstreamer).SetVSchema+0x128	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/vstreamer.go:139
#	0x12e4149	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer.(*uvstreamer).SetVSchema+0xc9	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/uvstreamer.go:452
#	0x12d7009	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer.(*Engine).setWatch.func1+0x3a9	vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/engine.go:383
#	0x11068a7	vitess.io/vitess/go/vt/srvtopo.(*SrvVSchemaWatcher).WatchSrvVSchema.func1+0x47		vitess.io/vitess/go/vt/srvtopo/watch_srvvschema.go:78
#	0x1104e6f	vitess.io/vitess/go/vt/srvtopo.(*watchEntry).update+0x1ef				vitess.io/vitess/go/vt/srvtopo/watch.go:164
#	0x1106574	vitess.io/vitess/go/vt/srvtopo.NewSrvVSchemaWatcher.func1+0xd4				vitess.io/vitess/go/vt/srvtopo/watch_srvvschema.go:43

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on the CI
  • Documentation was added or is not required

Deployment Notes

@tanjinx tanjinx requested a review from a team as a code owner October 3, 2023 21:24
@tanjinx tanjinx changed the title Backport 11325 - PRS/vstream hang fix Backport 11325 - PlannedReparentShard/vstream hang fix Oct 3, 2023
@tanjinx
Copy link
Author

tanjinx commented Oct 4, 2023

ok  	vitess.io/vitess/go/test/endtoend/reparent/plannedreparent	448.122s
2023/10/04 22:27:50 local.plannedreparent: PASSED in 7m29s
2023/10/04 22:27:50 ============================================================
2023/10/04 22:27:50 local.emergencyreparent                 	PASS
2023/10/04 22:27:50 local.newfeatures                       	FAIL (1 tries)
2023/10/04 22:27:50 local.plannedreparent                   	PASS
2023/10/04 22:27:50 ============================================================
2023/10/04 22:27:50 2 PASSED, 0 FLAKY, 1 FAILED, 0 SKIPPED
2023/10/04 22:27:50 Total time: 16m31.6s
exit status 1

like a test issue:

I1004 21:46:04.389288   67562 utils.go:624] 	zone2-0000000201 ks 0 replica localhost:18816 localhost:18818 [] <null> MySQL56/678c8876-62ff-11ee-ba15-6045bd78fab5:1-16 0
    reparent_test.go:168: 
        	Error Trace:	/home/runner/work/vitess/vitess/go/test/endtoend/reparent/newfeaturetest/reparent_test.go:168
        	Error:      	Not equal: 
        	            	expected: "FilePos/vt-0000000101-bin.000001"
        	            	actual  : "FilePos/vt-0000000101-bin.000001:"
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-FilePos/vt-0000000101-bin.000001
        	            	+FilePos/vt-0000000101-bin.000001:
        	Test:       	TestFullStatus
--- FAIL: TestFullStatus (18.94s)
FAIL
FAIL	vitess.io/vitess/go/test/endtoend/reparent/newfeaturetest	109.140s
FAIL
2023/10/04 21:46:16 local.newfeatures: saving test output to _test/20231004-213540.9121/local.newfeatures-1.1.log
2023/10/04 21:46:16 local.newfeatures: FAILED (try 1/1) in 1m50.1s: exit status 1
2023/10/04 21:46:16 local.newfeatures: retry limit exceeded

@tanjinx tanjinx merged commit e12e86e into slack-vitess-r14.0.5-prod-patch Oct 4, 2023
237 of 238 checks passed
@timvaillancourt
Copy link
Member

ok  	vitess.io/vitess/go/test/endtoend/reparent/plannedreparent	448.122s
2023/10/04 22:27:50 local.plannedreparent: PASSED in 7m29s
2023/10/04 22:27:50 ============================================================
2023/10/04 22:27:50 local.emergencyreparent                 	PASS
2023/10/04 22:27:50 local.newfeatures                       	FAIL (1 tries)
2023/10/04 22:27:50 local.plannedreparent                   	PASS
2023/10/04 22:27:50 ============================================================
2023/10/04 22:27:50 2 PASSED, 0 FLAKY, 1 FAILED, 0 SKIPPED
2023/10/04 22:27:50 Total time: 16m31.6s
exit status 1

like a test issue:

I1004 21:46:04.389288   67562 utils.go:624] 	zone2-0000000201 ks 0 replica localhost:18816 localhost:18818 [] <null> MySQL56/678c8876-62ff-11ee-ba15-6045bd78fab5:1-16 0
    reparent_test.go:168: 
        	Error Trace:	/home/runner/work/vitess/vitess/go/test/endtoend/reparent/newfeaturetest/reparent_test.go:168
        	Error:      	Not equal: 
        	            	expected: "FilePos/vt-0000000101-bin.000001"
        	            	actual  : "FilePos/vt-0000000101-bin.000001:"
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-FilePos/vt-0000000101-bin.000001
        	            	+FilePos/vt-0000000101-bin.000001:
        	Test:       	TestFullStatus
--- FAIL: TestFullStatus (18.94s)
FAIL
FAIL	vitess.io/vitess/go/test/endtoend/reparent/newfeaturetest	109.140s
FAIL
2023/10/04 21:46:16 local.newfeatures: saving test output to _test/20231004-213540.9121/local.newfeatures-1.1.log
2023/10/04 21:46:16 local.newfeatures: FAILED (try 1/1) in 1m50.1s: exit status 1
2023/10/04 21:46:16 local.newfeatures: retry limit exceeded

@tanjinx this PR fixes this vitessio#13778. Feel free to backport it or I will soon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants