-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data Race in Pipelinerun reconciler tests #1124
Comments
Not able to repro. I just tried with 1000: $ go test -run "^TestReconcilePropagateLabels" ./pkg/reconciler/v1alpha1/pipelinerun/ -count 1000
ok github.com/tektoncd/pipeline/pkg/reconciler/v1alpha1/pipelinerun 1.164s |
hrm, another data point, I'm able to repro. go version 1.12.5
Tried with different settings of GOMAXPROCS but no change to the result, it panics each time I run it. Edit: updated to go 1.12.7 and see the panic there as well. |
Yikes! im putting this into our current milestone |
So, looks like its less of a flake and more of a data race:
|
I think I'm seeing this also when trying to get nightly releases running that run the unit tests (#860 ): [unit-tests : build-step-unit-test] ==================
[unit-tests : build-step-unit-test] WARNING: DATA RACE
[unit-tests : build-step-unit-test] Read at 0x00c0004b0743 by goroutine 178:
[unit-tests : build-step-unit-test] testing.(*common).logDepth()
[unit-tests : build-step-unit-test] /usr/local/go/src/testing/testing.go:629 +0x132
[unit-tests : build-step-unit-test] testing.(*common).Logf()
[unit-tests : build-step-unit-test] /usr/local/go/src/testing/testing.go:614 +0x90
[unit-tests : build-step-unit-test] testing.(*T).Logf()
[unit-tests : build-step-unit-test] <autogenerated>:1 +0x75
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest.testingWriter.Write()
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest/logger.go:130 +0x11f
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest.(*testingWriter).Write()
[unit-tests : build-step-unit-test] <autogenerated>:1 +0xa9
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*ioCore).Write()
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/core.go:90 +0x1c4
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*CheckedEntry).Write()
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/entry.go:215 +0x1e7
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/vendor/go.uber.org/zap.(*SugaredLogger).log()
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:234 +0x142
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).setTimer()
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:138 +0x5ac
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).waitRun()
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:257 +0x2da
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).WaitPipelineRun()
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:243 +0xbd
[unit-tests : build-step-unit-test]
[unit-tests : build-step-unit-test] Previous write at 0x00c0004b0743 by main goroutine:
[unit-tests : build-step-unit-test] testing.tRunner.func1()
[unit-tests : build-step-unit-test] /usr/local/go/src/testing/testing.go:856 +0x354
[unit-tests : build-step-unit-test] testing.tRunner()
[unit-tests : build-step-unit-test] /usr/local/go/src/testing/testing.go:869 +0x17f
[unit-tests : build-step-unit-test] testing.runTests()
[unit-tests : build-step-unit-test] /usr/local/go/src/testing/testing.go:1155 +0x523
[unit-tests : build-step-unit-test] testing.(*M).Run()
[unit-tests : build-step-unit-test] /usr/local/go/src/testing/testing.go:1072 +0x2eb
[unit-tests : build-step-unit-test] main.main()
[unit-tests : build-step-unit-test] _testmain.go:126 +0x334
[unit-tests : build-step-unit-test]
[unit-tests : build-step-unit-test] Goroutine 178 (running) created at:
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun.(*Reconciler).Reconcile()
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun/pipelinerun.go:144 +0x1a16
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun.TestReconcileWithFailingConditionChecks()
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun/pipelinerun_test.go:1376 +0x16aa
[unit-tests : build-step-unit-test] testing.tRunner()
[unit-tests : build-step-unit-test] /usr/local/go/src/testing/testing.go:865 +0x163
[unit-tests : build-step-unit-test] ==================
[unit-tests : build-step-unit-test] panic: Log in goroutine after TestReconcileWithFailingConditionChecks has completed
[unit-tests : build-step-unit-test]
[unit-tests : build-step-unit-test] goroutine 274 [running]:
[unit-tests : build-step-unit-test] testing.(*common).logDepth(0xc00030d900, 0xc000684210, 0xae, 0x3)
[unit-tests : build-step-unit-test] /usr/local/go/src/testing/testing.go:634 +0x51a
[unit-tests : build-step-unit-test] testing.(*common).log(...)
[unit-tests : build-step-unit-test] /usr/local/go/src/testing/testing.go:614
[unit-tests : build-step-unit-test] testing.(*common).Logf(0xc00030d900, 0x2167d8f, 0x2, 0xc0004f8f00, 0x1, 0x1)
[unit-tests : build-step-unit-test] /usr/local/go/src/testing/testing.go:649 +0x91
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest.testingWriter.Write(0x258ca60, 0xc00030d900, 0x0, 0xc0004fb400, 0xaf, 0x400, 0xc0000c5180, 0xc0004f8ef0, 0xc0003a4e00)
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest/logger.go:130 +0x120
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*ioCore).Write(0xc00065a510, 0x0, 0xbf53cb5bece4ca42, 0x1242dbcf, 0x3710be0, 0x2196fe3, 0x27, 0xc0006c2960, 0x46, 0x1, ...)
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/core.go:90 +0x1c5
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc000a6c6e0, 0x0, 0x0, 0x0)
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/entry.go:215 +0x1e8
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/vendor/go.uber.org/zap.(*SugaredLogger).log(0xc0004fc198, 0x0, 0x217e12f, 0x15, 0xc0000d5dd0, 0x1, 0x1, 0x0, 0x0, 0x0)
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:234 +0x143
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/vendor/go.uber.org/zap.(*SugaredLogger).Infof(...)
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:138
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).setTimer(0xc0000bdc40, 0x2531300, 0xc000737b80, 0x346308fec8d, 0xc0003c6e10)
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:281 +0x5ad
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).waitRun(0xc0000bdc40, 0x2531300, 0xc000737b80, 0x34630b8a000, 0xc00067c660, 0xc0003c6e10)
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:257 +0x2db
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).WaitPipelineRun(0xc0000bdc40, 0xc000737b80, 0xc00067c660)
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:243 +0xbe
[unit-tests : build-step-unit-test] created by github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun.(*Reconciler).Reconcile
[unit-tests : build-step-unit-test] /workspace/src/github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun/pipelinerun.go:144 +0x1a17
[unit-tests : build-step-unit-test] FAIL github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun 0.347s |
This Pipeline will be triggered via prow over in the tektoncd/plumbing repo every night. It will create releases of all images normally released when doing official releases, plus also the image used for building with ko, and tag them with the date and commit they were built at, and will create the release.yaml as well. This Pipeline is missing a few things that are in the manual release Pipeline - due to tektoncd#1124 unit tests have a race condition, due to tektoncd#1205 the linting is flakey and it would be frustrating to lose a whole nightly release, and finally due to using v0.3.1 it's not possible to use workingDir, which is required by the golang build Task. The Pipelines and Tasks have been updated to work with Tekton Pipelines v0.3.1 because that's what we're using in our official cluster (since currently Prow requires it). Made release instructions more oriented toward someone actually making a release vs. a random person trying to run the same pipeline against their own infrastructure. Removed example Runs b/c it's much simpler to invoke via `tkn`, or Prow (these were falling out of date with how we were actually using the Pipelines/Tasks as well). Fixes tektoncd#860
This Pipeline will be triggered via prow over in the tektoncd/plumbing repo every night. It will create releases of all images normally released when doing official releases, plus also the image used for building with ko, and tag them with the date and commit they were built at, and will create the release.yaml as well. This Pipeline is missing a few things that are in the manual release Pipeline - due to tektoncd#1124 unit tests have a race condition, due to tektoncd#1205 the linting is flakey and it would be frustrating to lose a whole nightly release, and finally due to using v0.3.1 it's not possible to use workingDir, which is required by the golang build Task. The Pipelines and Tasks have been updated to work with Tekton Pipelines v0.3.1 because that's what we're using in our official cluster (since currently Prow requires it). Made release instructions more oriented toward someone actually making a release vs. a random person trying to run the same pipeline against their own infrastructure. Removed example Runs b/c it's much simpler to invoke via `tkn`, or Prow (these were falling out of date with how we were actually using the Pipelines/Tasks as well). Removed the `gcs-uploader-image` PipelineResource which is no longer being used. Fixes tektoncd#860
I've done some testing around this one.
If I enable
If I enable
So perhaps there are two different issues happening here? |
Okay so I think I've got a fix for this and there seem to be ~3 different things going wrong:
docker run --entrypoint go gcr.io/tekton-releases/tests/test-runner@sha256:a4a64b2b70f85a618bbbcc6c0b713b313b2e410504dee24c9f90ec6fe3ebf63f version
go version go1.11.4 linux/amd64 So my suggested plan of action:
|
My guess here was that in some test cases the timeout_handler subroutine is not released and it tries to log after the test has finished - but I was not able to prove this yet - nor pinpoint the specific cause for the subroutine to stay alive and logging. We can try removing logging at all from the timeout handler, but we might send logging back via a channel perhaps, to avoid having zero logging from the timeout handler.
Good catch. I'm not sure why TestReconcilePropagateLabels or TestReconcile_InvalidPipelineRuns are causing issue specifically though.
For the logging issue at least I read that go used to eat log messages silently before, now it's complaining if a subroutine logs something that will never be logged.
Sounds good! |
Created #1307 to add thread safe logging back in |
The function `GetLogMessages` isn't used anywhere. I had tried to remove it to see if it was causing the data race in tektoncd#1124 - it _isnt_ but still it's not being used anywhere so why not remove :)
Logging in the timeout handler was added as part of tektoncd#731 cuz it helped us debug when the timeout handler didn't work as expected. Unfortunately it looks like the logger we're using can't be used in multiple go routines (uber-go/zap#99 may be related). Removing this logging to fix tektoncd#1124, hopefully can find a safe way to add logging back in tektoncd#1307.
GetRunKey is accessed in goroutines via the timeout_handler; accessing attributes of an object in a goroutine is not threadsafe. This is used as a key in a map, so for now replacing this with a value that should be unique but also threadsafe to fix tektoncd#1124
Created tektoncd/plumbing#71 to update the base image we're using with Prow |
The function `GetLogMessages` isn't used anywhere. I had tried to remove it to see if it was causing the data race in #1124 - it _isnt_ but still it's not being used anywhere so why not remove :)
Logging in the timeout handler was added as part of #731 cuz it helped us debug when the timeout handler didn't work as expected. Unfortunately it looks like the logger we're using can't be used in multiple go routines (uber-go/zap#99 may be related). Removing this logging to fix #1124, hopefully can find a safe way to add logging back in #1307.
GetRunKey is accessed in goroutines via the timeout_handler; accessing attributes of an object in a goroutine is not threadsafe. This is used as a key in a map, so for now replacing this with a value that should be unique but also threadsafe to fix #1124
This Pipeline will be triggered via prow over in the tektoncd/plumbing repo every night. It will create releases of all images normally released when doing official releases, plus also the image used for building with ko, and tag them with the date and commit they were built at, and will create the release.yaml as well. This Pipeline is missing a few things that are in the manual release Pipeline - due to #1124 unit tests have a race condition, due to #1205 the linting is flakey and it would be frustrating to lose a whole nightly release, and finally due to using v0.3.1 it's not possible to use workingDir, which is required by the golang build Task. The Pipelines and Tasks have been updated to work with Tekton Pipelines v0.3.1 because that's what we're using in our official cluster (since currently Prow requires it). Made release instructions more oriented toward someone actually making a release vs. a random person trying to run the same pipeline against their own infrastructure. Removed example Runs b/c it's much simpler to invoke via `tkn`, or Prow (these were falling out of date with how we were actually using the Pipelines/Tasks as well). Removed the `gcs-uploader-image` PipelineResource which is no longer being used. Fixes #860
The reconciler tests seem to occasionally fail with a panic. Looks like https://github.com/tektoncd/pipeline/blob/master/pkg/reconciler/timeout_handler.go#L265 is being called after the test exits. (My assumption is that the
defer cancel()
should ensure that the timeout goroutines get cleaned up but I understand very little about how this works)Additional Information
go test -run "^TestReconcilePropagateLabels" ./pkg/reconciler/v1alpha1/pipelinerun/ -count 10
The text was updated successfully, but these errors were encountered: