roachtest: schemachange/index/tpcc/w=1000 failed #76230

Closed
cockroach-teamcity opened this issue Feb 8, 2022 · 6 comments
Labels
C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot.

Comments

@cockroach-teamcity
Member

roachtest.schemachange/index/tpcc/w=1000 failed with artifacts on release-21.2 @ 79c1f34aa3c91977ed6784338d7383931733d0b6:

The test failed on branch=release-21.2, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/schemachange/index/tpcc/w=1000/run_1
	monitor.go:128,tpcc.go:279,schemachange.go:312,test_runner.go:777: monitor failure: unexpected node event: 4: dead (exit status 137)
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
		  | main.(*monitorImpl).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:279
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.makeIndexAddTpccTest.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/schemachange.go:312
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:777
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 4: dead (exit status 137)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1296,context.go:89,cluster.go:1284,test_runner.go:866: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-4325255-1644304598-79-n5cpu16 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		4: dead (exit status 137)
		1: 12377
		3: 11753
		2: 11564
		Error: UNCLASSIFIED_PROBLEM: 4: dead (exit status 137)
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/roachprod.Monitor
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/roachprod/roachprod.go:596
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:569
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1170
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:255
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (3) 4: dead (exit status 137)
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError
Reproduce

See: roachtest README

Same failure on other branches

/cc @cockroachdb/sql-schema

This test on roachdash | Improve this report!

@cockroach-teamcity cockroach-teamcity added branch-release-21.2 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Feb 8, 2022
@blathers-crl blathers-crl bot added the T-sql-schema-deprecated Use T-sql-foundations instead label Feb 8, 2022
@ajwerner
Contributor

ajwerner commented Feb 8, 2022

@ajwerner to put artifacts in the cloud for future investigation.

@ajwerner ajwerner self-assigned this Feb 8, 2022
@ajwerner
Contributor

ajwerner commented Feb 8, 2022

@ajwerner ajwerner removed their assignment Feb 8, 2022
@ajwerner
Contributor

So, the node dies: no sign of an OOM, no heap profile, no panic, nothing. What in the world! The VM doesn't die, just the process.

I visualized the timeseries data and nothing stands out. I'm going to say we have no clue.

@ajwerner ajwerner removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-schema-deprecated Use T-sql-foundations instead labels Feb 14, 2022
@ajwerner
Contributor

I should add that the exit file does say:

cockroach exited with code 137: Tue Feb 8 13:12:04 UTC 2022

We know that 137 = 128 + 9, so the process was killed by signal 9 (SIGKILL). :/ Not by the OOM killer, though. Who killed my process?
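
The arithmetic above follows the usual shell convention that a process terminated by signal N is reported with exit status 128 + N. A minimal sketch of that decoding, assuming a POSIX system; the function name `describe_exit_status` is made up for illustration and is not part of roachtest or roachprod. Note that a status above 128 is only a strong hint, since a program can also call exit with such a value itself.

```python
import signal


def describe_exit_status(code: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper).

    Shells report "killed by signal N" as 128 + N, so 137 maps to signal 9.
    """
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            name = signal.Signals(signum).name
        except ValueError:
            # The number does not correspond to a known signal on this platform.
            return f"killed by unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited normally with status {code}"


if __name__ == "__main__":
    print(describe_exit_status(137))
```

This only identifies which signal ended the process, not who sent it; that still has to come from the system logs.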

@ajwerner
Contributor

Oh, I found something interesting in journalctl:

Feb 08 13:12:04 teamcity-4325255-1644304598-79-n5cpu16-0004 sshd[12431]: Received disconnect from 68.173.127.158 port 48079:11: disconnected by user
Feb 08 13:12:04 teamcity-4325255-1644304598-79-n5cpu16-0004 sshd[12431]: Disconnected from user ubuntu 68.173.127.158 port 48079
Feb 08 13:12:04 teamcity-4325255-1644304598-79-n5cpu16-0004 sshd[12355]: pam_unix(sshd:session): session closed for user ubuntu
Feb 08 13:12:04 teamcity-4325255-1644304598-79-n5cpu16-0004 systemd-logind[1378]: Session 47 logged out. Waiting for processes to exit.
Feb 08 13:12:04 teamcity-4325255-1644304598-79-n5cpu16-0004 bash[11451]: ./cockroach.sh: line 76: 11462 Killed                  "${BINARY}" "${START_CMD}" "${ARGS[@]}" >> "${LOG_DIR}/cockroach.stdout.log" 2>> "${LOG_DIR}/cockroach.stderr.log"
Feb 08 13:12:04 teamcity-4325255-1644304598-79-n5cpu16-0004 bash[12443]: cockroach exited with code 137: Tue Feb  8 13:12:04 UTC 2022
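
The pattern in this journal excerpt (an SSH session closing in the same second as the "Killed" line from cockroach.sh) can be searched for mechanically. The following is a rough sketch, not an existing tool: the regular expression is an assumption about the syslog-style prefix seen above, the function name `kills_near_session_close` is hypothetical, and matching on identical timestamps is a heuristic that suggests a correlation without proving cause.

```python
import re

# Assumed line shape, based on the excerpt above:
# "Feb 08 13:12:04 <host> <unit>[<pid>]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<unit>[^\[\s]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)

# Lines copied from the journalctl output in this thread.
SAMPLE = """
Feb 08 13:12:04 teamcity-4325255-1644304598-79-n5cpu16-0004 sshd[12355]: pam_unix(sshd:session): session closed for user ubuntu
Feb 08 13:12:04 teamcity-4325255-1644304598-79-n5cpu16-0004 systemd-logind[1378]: Session 47 logged out. Waiting for processes to exit.
Feb 08 13:12:04 teamcity-4325255-1644304598-79-n5cpu16-0004 bash[11451]: ./cockroach.sh: line 76: 11462 Killed                  "${BINARY}" "${START_CMD}" "${ARGS[@]}" >> "${LOG_DIR}/cockroach.stdout.log" 2>> "${LOG_DIR}/cockroach.stderr.log"
Feb 08 13:12:04 teamcity-4325255-1644304598-79-n5cpu16-0004 bash[12443]: cockroach exited with code 137: Tue Feb  8 13:12:04 UTC 2022
"""


def kills_near_session_close(lines):
    """Return (timestamp, unit, message) for 'Killed' lines that share a
    timestamp with a logged session close or logout."""
    # Lines that do not fit the assumed prefix are silently skipped.
    entries = [m.groupdict() for m in map(LINE_RE.match, lines) if m]
    close_times = {
        e["ts"]
        for e in entries
        if "session closed" in e["msg"] or "logged out" in e["msg"]
    }
    return [
        (e["ts"], e["unit"], e["msg"])
        for e in entries
        if " Killed " in e["msg"] and e["ts"] in close_times
    ]


if __name__ == "__main__":
    for ts, unit, msg in kills_near_session_close(SAMPLE.splitlines()):
        print(ts, unit, msg[:60])
```

Run against a full journal dump, a filter like this would narrow the search to kills that coincide with a session teardown, which is the situation described here.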

@ajwerner
Contributor

Given that this looks like a broken TCP connection from TeamCity, I'm going to close this as unactionable.
