Data Race in Pipelinerun reconciler tests #1124

Closed
dibyom opened this issue Jul 24, 2019 · 10 comments · Fixed by #1308
Labels: area/testing (Issues or PRs related to testing), kind/failing-test (Categorizes issue or PR as related to a consistently or frequently failing test)

Comments

@dibyom
Member

dibyom commented Jul 24, 2019

The reconciler tests occasionally fail with a panic. It looks like the log call at https://github.com/tektoncd/pipeline/blob/master/pkg/reconciler/timeout_handler.go#L265 runs after the test has exited. (My assumption is that the defer cancel() should ensure that the timeout goroutines get cleaned up, but I understand very little about how this works.)

Additional Information

go test -run "^TestReconcilePropagateLabels" ./pkg/reconciler/v1alpha1/pipelinerun/ -count 10

panic: Log in goroutine after TestReconcilePropagateLabels/without_pipelinetask_name has completed

goroutine 101 [running]:
testing.(*common).logDepth(0xc000222300, 0xc000786240, 0xbd, 0x3)
        /usr/lib/google-golang/src/testing/testing.go:634 +0x3a6
testing.(*common).log(...)
        /usr/lib/google-golang/src/testing/testing.go:614
testing.(*common).Logf(0xc000222300, 0x18b328b, 0x2, 0xc0003b4080, 0x1, 0x1)
        /usr/lib/google-golang/src/testing/testing.go:649 +0x7f
github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest.testingWriter.Write(0x1bd0060, 0xc000222300, 0x0, 0xc0007a8400, 0xbe, 0x400, 0xc0007a8400, 0x1635f40, 0xc0003b4070)
        /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest/logger.go:130 +0xfc
github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*ioCore).Write(0xc0003d7050, 0x0, 0xbf46480bc791818a, 0x10edee7, 0x2b0e160, 0xc0004fcf00, 0x36, 0xc000358000, 0x42, 0x1, ...)
        /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/core.go:90 +0x107
github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc000457550, 0x0, 0x0, 0x0)
        /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/entry.go:215 +0x119
github.com/tektoncd/pipeline/vendor/go.uber.org/zap.(*SugaredLogger).log(0xc0003de388, 0x0, 0x18c895c, 0x15, 0xc000719de8, 0x1, 0x1, 0x0, 0x0, 0x0)
        /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:234 +0x101
github.com/tektoncd/pipeline/vendor/go.uber.org/zap.(*SugaredLogger).Infof(...)
        /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:138
github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).setTimer(0xc000445c80, 0x1b7b400, 0xc00023e780, 0x34630b755cc, 0xc0002de3b0)
        /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:265 +0x4d3
github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).waitRun(0xc000445c80, 0x1b7b400, 0xc00023e780, 0x34630b8a000, 0xc0000f4640, 0xc0002de3b0)
        /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:241 +0x29d
github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).WaitPipelineRun(0xc000445c80, 0xc00023e780, 0xc0000f4640)
        /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:227 +0x68
created by github.com/tektoncd/pipeline/pkg/reconciler/v1alpha1/pipelinerun.(*Reconciler).Reconcile
        /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/pkg/reconciler/v1alpha1/pipelinerun/pipelinerun.go:141 +0x1224
FAIL    github.com/tektoncd/pipeline/pkg/reconciler/v1alpha1/pipelinerun        0.221s
@vdemeester added the area/testing and kind/flake labels Jul 25, 2019
@dlorenc
Contributor

dlorenc commented Jul 30, 2019

Not able to repro. I just tried with 1000:

$ go test -run "^TestReconcilePropagateLabels" ./pkg/reconciler/v1alpha1/pipelinerun/ -count 1000
ok      github.com/tektoncd/pipeline/pkg/reconciler/v1alpha1/pipelinerun        1.164s

@ghost

ghost commented Jul 30, 2019

Hrm, another data point: I'm able to repro. Go version 1.12.5.

sbws@sbws18:~/dev/go/src/github.com/tektoncd/pipeline
 (master) λ go test -run "^TestReconcilePropagateLabels" ./pkg/reconciler/v1alpha1/pipelinerun/ -count 10
panic: Log in goroutine after TestReconcilePropagateLabels/with_pipelinetask_name has completed [recovered]
	panic: Log in goroutine after TestReconcilePropagateLabels/with_pipelinetask_name has completed

goroutine 35 [running]:
testing.tRunner.func1(0xc000373900)
	/usr/lib/google-golang/src/testing/testing.go:830 +0x392
panic(0x1639d20, 0xc0002b91b0)
	/usr/lib/google-golang/src/runtime/panic.go:522 +0x1b5
testing.(*common).logDepth(0xc0002c2400, 0xc0001068c0, 0xd3, 0x3)
	/usr/lib/google-golang/src/testing/testing.go:634 +0x3a6
testing.(*common).log(...)
	/usr/lib/google-golang/src/testing/testing.go:614
testing.(*common).Logf(0xc0002c2400, 0x18bee2d, 0x2, 0xc0002b91a0, 0x1, 0x1)
	/usr/lib/google-golang/src/testing/testing.go:649 +0x7f
github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest.testingWriter.Write(0x1bdaa00, 0xc0002c2400, 0x0, 0xc000167c00, 0xd4, 0x400, 0xc000167c00, 0x1639d20, 0xc0002b9190)
	/usr/local/google/home/sbws/dev/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest/logger.go:130 +0xfc
github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*ioCore).Write(0xc0001c2420, 0xff, 0xbf4831fee116d9cc, 0x189ff75, 0x2b29400, 0xc00004f040, 0x47, 0xc0005dd460, 0x1a, 0x1, ...)
	/usr/local/google/home/sbws/dev/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/core.go:90 +0x107
github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc000667ef0, 0x0, 0x0, 0x0)
	/usr/local/google/home/sbws/dev/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/entry.go:215 +0x119
github.com/tektoncd/pipeline/vendor/go.uber.org/zap.(*SugaredLogger).log(0xc0000105f0, 0xc0000b57ff, 0x0, 0x0, 0xc0000b5750, 0x1, 0x1, 0x0, 0x0, 0x0)
	/usr/local/google/home/sbws/dev/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:234 +0x101
github.com/tektoncd/pipeline/vendor/go.uber.org/zap.(*SugaredLogger).Debug(...)
	/usr/local/google/home/sbws/dev/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:97
github.com/tektoncd/pipeline/pkg/reconciler.NewBase(0x1c0fb40, 0xc0000d88c0, 0x1baf900, 0xc0000d8820, 0x0, 0x0, 0x1ba7600, 0xc0000c2af0, 0xc000010038, 0x0, ...)
	/usr/local/google/home/sbws/dev/go/src/github.com/tektoncd/pipeline/pkg/reconciler/reconciler.go:95 +0x3db
github.com/tektoncd/pipeline/pkg/reconciler/v1alpha1/pipelinerun.NewController(0x1bcbac0, 0xc000089b80, 0x1ba7600, 0xc0000c2af0, 0xc0002b90e0)
	/usr/local/google/home/sbws/dev/go/src/github.com/tektoncd/pipeline/pkg/reconciler/v1alpha1/pipelinerun/controller.go:68 +0x3af
github.com/tektoncd/pipeline/pkg/reconciler/v1alpha1/pipelinerun.getPipelineRunController(0xc000373900, 0xc000079e38, 0x1, 0x1, 0xc000079e40, 0x1, 0x1, 0x0, 0x0, 0x0, ...)
	/usr/local/google/home/sbws/dev/go/src/github.com/tektoncd/pipeline/pkg/reconciler/v1alpha1/pipelinerun/pipelinerun_test.go:54 +0x173
github.com/tektoncd/pipeline/pkg/reconciler/v1alpha1/pipelinerun.TestReconcilePropagateLabels.func1(0xc000373900)
	/usr/local/google/home/sbws/dev/go/src/github.com/tektoncd/pipeline/pkg/reconciler/v1alpha1/pipelinerun/pipelinerun_test.go:748 +0x410
testing.tRunner(0xc000373900, 0xc0002b8e70)
	/usr/lib/google-golang/src/testing/testing.go:865 +0xc0
created by testing.(*T).Run
	/usr/lib/google-golang/src/testing/testing.go:916 +0x35a
FAIL	github.com/tektoncd/pipeline/pkg/reconciler/v1alpha1/pipelinerun	0.218s

I tried different settings of GOMAXPROCS, but the result does not change: it panics each time I run it.

Edit: updated to go 1.12.7 and see the panic there as well.

@bobcatfish
Collaborator

Yikes! I'm putting this into our current milestone.

@dibyom changed the title from "Pipelinerun reconciler tests seem to be flaky" to "Data Race in Pipelinerun reconciler tests" Aug 30, 2019
@dibyom
Member Author

dibyom commented Aug 30, 2019

So, it looks like it's less of a flake and more of a data race:
go test -race ./...

ok  	github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun/resources	(cached)
==================
WARNING: DATA RACE
Read at 0x00c0002fb843 by goroutine 141:
  testing.(*common).logDepth()
      /usr/lib/google-golang/src/testing/testing.go:629 +0x132
  testing.(*common).Logf()
      /usr/lib/google-golang/src/testing/testing.go:614 +0x90
  testing.(*T).Logf()
      <autogenerated>:1 +0x75
  github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest.testingWriter.Write()
      /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest/logger.go:130 +0x11f
  github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest.(*testingWriter).Write()
      <autogenerated>:1 +0xa9
  github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*ioCore).Write()
      /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/core.go:90 +0x1c4
  github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*CheckedEntry).Write()
      /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/entry.go:215 +0x1e7
  github.com/tektoncd/pipeline/vendor/go.uber.org/zap.(*SugaredLogger).log()
      /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:234 +0x142
  github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).setTimer()
      /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:138 +0x5ac
  github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).waitRun()
      /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:257 +0x2da
  github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).WaitTaskRun()
      /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:230 +0xbd

Previous write at 0x00c0002fb843 by goroutine 29:
  testing.tRunner.func1()
      /usr/lib/google-golang/src/testing/testing.go:856 +0x354
  testing.tRunner()
      /usr/lib/google-golang/src/testing/testing.go:869 +0x17f

Goroutine 141 (running) created at:
  github.com/tektoncd/pipeline/pkg/reconciler/taskrun.(*Reconciler).reconcile()
      /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/pkg/reconciler/taskrun/taskrun.go:298 +0x20a7
  github.com/tektoncd/pipeline/pkg/reconciler/taskrun.(*Reconciler).Reconcile()
      /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/pkg/reconciler/taskrun/taskrun.go:141 +0xb07
  github.com/tektoncd/pipeline/pkg/reconciler/taskrun.TestReconcile.func1()
      /usr/local/google/home/dibyajyoti/dev/go/src/github.com/tektoncd/pipeline/pkg/reconciler/taskrun/taskrun_test.go:1055 +0x86f
  testing.tRunner()
      /usr/lib/google-golang/src/testing/testing.go:865 +0x163

Goroutine 29 (finished) created at:
  testing.(*T).Run()
      /usr/lib/google-golang/src/testing/testing.go:916 +0x65a
  testing.runTests.func1()
      /usr/lib/google-golang/src/testing/testing.go:1159 +0xa8
  testing.tRunner()
      /usr/lib/google-golang/src/testing/testing.go:865 +0x163
  testing.runTests()
      /usr/lib/google-golang/src/testing/testing.go:1157 +0x523
  testing.(*M).Run()
      /usr/lib/google-golang/src/testing/testing.go:1074 +0x2f5
  main.main()
      _testmain.go:80 +0x222
==================
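The report above shows a read in testing.(*common).logDepth from a reconciler goroutine racing with a write in testing.tRunner.func1, which runs as the test finishes. One common way to avoid this kind of race is to put a lock-protected guard between background goroutines and the test logger, so that late messages are dropped instead of reaching the logger. The sketch below is an assumption-laden illustration, not the change made in #1308; guardedLogger and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// guardedLogger forwards messages to write until Close is called and
// silently drops (but counts) anything logged afterwards.
type guardedLogger struct {
	mu      sync.Mutex
	closed  bool
	dropped int
	// write is the real sink. In a test it could wrap t.Log; that wiring
	// is an assumption and is not shown here.
	write func(msg string)
}

// Logf formats and forwards a message unless the logger is closed.
// Holding the mutex while calling write means Close cannot return
// while a write is still in flight.
func (g *guardedLogger) Logf(format string, args ...interface{}) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.closed {
		g.dropped++
		return
	}
	g.write(fmt.Sprintf(format, args...))
}

// Close marks the logger as closed. After it returns, write is never
// called again.
func (g *guardedLogger) Close() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.closed = true
}

// Dropped reports how many messages arrived after Close.
func (g *guardedLogger) Dropped() int {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.dropped
}

func main() {
	var out []string
	g := &guardedLogger{write: func(msg string) { out = append(out, msg) }}

	g.Logf("timer set for %s", "pipelinerun-a")
	g.Close() // in a test this would be deferred before the test returns
	g.Logf("late message from %s", "pipelinerun-a")

	fmt.Println("forwarded:", len(out), "dropped:", g.Dropped())
}
```

This hides late log lines rather than stopping the goroutines that produce them, so it trades lost diagnostics for a quiet test; waiting for the goroutines to exit addresses the cause more directly.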

@dibyom added the kind/failing-test label and removed the kind/flake label Aug 30, 2019
@bobcatfish
Collaborator

I think I'm also seeing this when trying to get nightly releases running that run the unit tests (#860):

[unit-tests : build-step-unit-test] ==================
[unit-tests : build-step-unit-test] WARNING: DATA RACE
[unit-tests : build-step-unit-test] Read at 0x00c0004b0743 by goroutine 178:
[unit-tests : build-step-unit-test]   testing.(*common).logDepth()
[unit-tests : build-step-unit-test]       /usr/local/go/src/testing/testing.go:629 +0x132
[unit-tests : build-step-unit-test]   testing.(*common).Logf()
[unit-tests : build-step-unit-test]       /usr/local/go/src/testing/testing.go:614 +0x90
[unit-tests : build-step-unit-test]   testing.(*T).Logf()
[unit-tests : build-step-unit-test]       <autogenerated>:1 +0x75
[unit-tests : build-step-unit-test]   github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest.testingWriter.Write()
[unit-tests : build-step-unit-test]       /workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest/logger.go:130 +0x11f
[unit-tests : build-step-unit-test]   github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest.(*testingWriter).Write()
[unit-tests : build-step-unit-test]       <autogenerated>:1 +0xa9
[unit-tests : build-step-unit-test]   github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*ioCore).Write()
[unit-tests : build-step-unit-test]       /workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/core.go:90 +0x1c4
[unit-tests : build-step-unit-test]   github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*CheckedEntry).Write()
[unit-tests : build-step-unit-test]       /workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/entry.go:215 +0x1e7
[unit-tests : build-step-unit-test]   github.com/tektoncd/pipeline/vendor/go.uber.org/zap.(*SugaredLogger).log()
[unit-tests : build-step-unit-test]       /workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:234 +0x142
[unit-tests : build-step-unit-test]   github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).setTimer()
[unit-tests : build-step-unit-test]       /workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:138 +0x5ac
[unit-tests : build-step-unit-test]   github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).waitRun()
[unit-tests : build-step-unit-test]       /workspace/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:257 +0x2da
[unit-tests : build-step-unit-test]   github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).WaitPipelineRun()
[unit-tests : build-step-unit-test]       /workspace/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:243 +0xbd
[unit-tests : build-step-unit-test] 
[unit-tests : build-step-unit-test] Previous write at 0x00c0004b0743 by main goroutine:
[unit-tests : build-step-unit-test]   testing.tRunner.func1()
[unit-tests : build-step-unit-test]       /usr/local/go/src/testing/testing.go:856 +0x354
[unit-tests : build-step-unit-test]   testing.tRunner()
[unit-tests : build-step-unit-test]       /usr/local/go/src/testing/testing.go:869 +0x17f
[unit-tests : build-step-unit-test]   testing.runTests()
[unit-tests : build-step-unit-test]       /usr/local/go/src/testing/testing.go:1155 +0x523
[unit-tests : build-step-unit-test]   testing.(*M).Run()
[unit-tests : build-step-unit-test]       /usr/local/go/src/testing/testing.go:1072 +0x2eb
[unit-tests : build-step-unit-test]   main.main()
[unit-tests : build-step-unit-test]       _testmain.go:126 +0x334
[unit-tests : build-step-unit-test] 
[unit-tests : build-step-unit-test] Goroutine 178 (running) created at:
[unit-tests : build-step-unit-test]   github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun.(*Reconciler).Reconcile()
[unit-tests : build-step-unit-test]       /workspace/src/github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun/pipelinerun.go:144 +0x1a16
[unit-tests : build-step-unit-test]   github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun.TestReconcileWithFailingConditionChecks()
[unit-tests : build-step-unit-test]       /workspace/src/github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun/pipelinerun_test.go:1376 +0x16aa
[unit-tests : build-step-unit-test]   testing.tRunner()
[unit-tests : build-step-unit-test]       /usr/local/go/src/testing/testing.go:865 +0x163
[unit-tests : build-step-unit-test] ==================
[unit-tests : build-step-unit-test] panic: Log in goroutine after TestReconcileWithFailingConditionChecks has completed
[unit-tests : build-step-unit-test] 
[unit-tests : build-step-unit-test] goroutine 274 [running]:
[unit-tests : build-step-unit-test] testing.(*common).logDepth(0xc00030d900, 0xc000684210, 0xae, 0x3)
[unit-tests : build-step-unit-test] 	/usr/local/go/src/testing/testing.go:634 +0x51a
[unit-tests : build-step-unit-test] testing.(*common).log(...)
[unit-tests : build-step-unit-test] 	/usr/local/go/src/testing/testing.go:614
[unit-tests : build-step-unit-test] testing.(*common).Logf(0xc00030d900, 0x2167d8f, 0x2, 0xc0004f8f00, 0x1, 0x1)
[unit-tests : build-step-unit-test] 	/usr/local/go/src/testing/testing.go:649 +0x91
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest.testingWriter.Write(0x258ca60, 0xc00030d900, 0x0, 0xc0004fb400, 0xaf, 0x400, 0xc0000c5180, 0xc0004f8ef0, 0xc0003a4e00)
[unit-tests : build-step-unit-test] 	/workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest/logger.go:130 +0x120
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*ioCore).Write(0xc00065a510, 0x0, 0xbf53cb5bece4ca42, 0x1242dbcf, 0x3710be0, 0x2196fe3, 0x27, 0xc0006c2960, 0x46, 0x1, ...)
[unit-tests : build-step-unit-test] 	/workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/core.go:90 +0x1c5
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc000a6c6e0, 0x0, 0x0, 0x0)
[unit-tests : build-step-unit-test] 	/workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/entry.go:215 +0x1e8
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/vendor/go.uber.org/zap.(*SugaredLogger).log(0xc0004fc198, 0x0, 0x217e12f, 0x15, 0xc0000d5dd0, 0x1, 0x1, 0x0, 0x0, 0x0)
[unit-tests : build-step-unit-test] 	/workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:234 +0x143
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/vendor/go.uber.org/zap.(*SugaredLogger).Infof(...)
[unit-tests : build-step-unit-test] 	/workspace/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:138
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).setTimer(0xc0000bdc40, 0x2531300, 0xc000737b80, 0x346308fec8d, 0xc0003c6e10)
[unit-tests : build-step-unit-test] 	/workspace/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:281 +0x5ad
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).waitRun(0xc0000bdc40, 0x2531300, 0xc000737b80, 0x34630b8a000, 0xc00067c660, 0xc0003c6e10)
[unit-tests : build-step-unit-test] 	/workspace/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:257 +0x2db
[unit-tests : build-step-unit-test] github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).WaitPipelineRun(0xc0000bdc40, 0xc000737b80, 0xc00067c660)
[unit-tests : build-step-unit-test] 	/workspace/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:243 +0xbe
[unit-tests : build-step-unit-test] created by github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun.(*Reconciler).Reconcile
[unit-tests : build-step-unit-test] 	/workspace/src/github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun/pipelinerun.go:144 +0x1a17
[unit-tests : build-step-unit-test] FAIL	github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun	0.347s

@bobcatfish self-assigned this Sep 3, 2019
bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Sep 4, 2019
This Pipeline will be triggered via prow over in the tektoncd/plumbing
repo every night. It will create releases of all images normally
released when doing official releases, plus also the image used for
building with ko, and tag them with the date and commit they were built
at, and will create the release.yaml as well.

This Pipeline is missing a few things that are in the manual release
Pipeline - due to tektoncd#1124 unit tests have a race condition, due to tektoncd#1205
the linting is flakey and it would be frustrating to lose a whole
nightly release, and finally due to using v0.3.1 it's not possible to
use workingDir, which is required by the golang build Task.

The Pipelines and Tasks have been updated to work with Tekton Pipelines
v0.3.1 because that's what we're using in our official cluster (since
currently Prow requires it).

Made release instructions more oriented toward someone actually
making a release vs. a random person trying to run the same pipeline
against their own infrastructure.

Removed example Runs b/c it's much simpler to invoke
via `tkn`, or Prow (these were falling out of date with how we were
actually using the Pipelines/Tasks as well).

Removed the `gcs-uploader-image` PipelineResource which is no longer
being used.

Fixes tektoncd#860
@afrittoli
Member

afrittoli commented Sep 11, 2019

I've done some testing around this one.
Only looking at pipelinerun_test.go, it seems that the following tests generate issues:

-func TestReconcile_InvalidPipelineRuns(t *testing.T) {
-func TestReconcilePropagateLabels(t *testing.T) {
-func TestReconcileWithConditionChecks(t *testing.T) {
-func TestReconcileWithFailingConditionChecks(t *testing.T) {

If I enable TestReconcileWithFailingConditionChecks or TestReconcileWithConditionChecks I get a panic but no data race:

$ go test -race  ./pkg/reconciler/pipelinerun/
E0911 16:40:28.050423   72124 event.go:259] Could not construct reference to: '&v1alpha1.PipelineRun{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"test-pipeline-run-with-timeout", GenerateName:"", Namespace:"foo", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"tekton.dev/pipeline":"test-pipeline"}, Annotations:map[string]string{}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v1alpha1.PipelineRunSpec{PipelineRef:v1alpha1.PipelineRef{Name:"test-pipeline", APIVersion:""}, Resources:[]v1alpha1.PipelineResourceBinding(nil), Params:[]v1alpha1.Param(nil), ServiceAccount:"test-sa", ServiceAccounts:[]v1alpha1.PipelineRunSpecServiceAccount(nil), Results:(*v1alpha1.Results)(nil), Status:"", Timeout:(*v1.Duration)(0xc00011b4c0), PodTemplate:v1alpha1.PodTemplate{NodeSelector:map[string]string(nil), Tolerations:[]v1.Toleration(nil), Affinity:(*v1.Affinity)(nil), SecurityContext:(*v1.PodSecurityContext)(nil), Volumes:[]v1.Volume(nil)}, NodeSelector:map[string]string(nil), Tolerations:[]v1.Toleration(nil), Affinity:(*v1.Affinity)(nil)}, Status:v1alpha1.PipelineRunStatus{Status:v1beta1.Status{ObservedGeneration:0, Conditions:v1beta1.Conditions{apis.Condition{Type:"Succeeded", Status:"False", Severity:"", LastTransitionTime:apis.VolatileTime{Inner:v1.Time{Time:time.Time{wall:0xbf56633b0300ff48, ext:71671574, loc:(*time.Location)(0x4314120)}}}, Reason:"PipelineRunTimeout", Message:"PipelineRun \"test-pipeline-run-with-timeout\" failed to finish within \"&Duration{Duration:12h0m0s,}\""}}}, Results:(*v1alpha1.Results)(nil), StartTime:(*v1.Time)(0xc00000e720), CompletionTime:(*v1.Time)(nil), TaskRuns:map[string]*v1alpha1.PipelineRunTaskRunStatus{}}}' due to: 'selfLink was empty, 
can't make reference'. Will not report event: 'Warning' 'Failed' 'PipelineRun "test-pipeline-run-with-timeout" failed to finish within "&Duration{Duration:12h0m0s,}"'
E0911 16:40:28.061250   72124 event.go:259] Could not construct reference to: '&v1alpha1.PipelineRun{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"test-pipeline-retry-run-with-timeout", GenerateName:"", Namespace:"foo", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"tekton.dev/pipeline":"test-pipeline-retry"}, Annotations:map[string]string{}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v1alpha1.PipelineRunSpec{PipelineRef:v1alpha1.PipelineRef{Name:"test-pipeline-retry", APIVersion:""}, Resources:[]v1alpha1.PipelineResourceBinding(nil), Params:[]v1alpha1.Param(nil), ServiceAccount:"test-sa", ServiceAccounts:[]v1alpha1.PipelineRunSpecServiceAccount(nil), Results:(*v1alpha1.Results)(nil), Status:"", Timeout:(*v1.Duration)(0xc00011a2a0), PodTemplate:v1alpha1.PodTemplate{NodeSelector:map[string]string(nil), Tolerations:[]v1.Toleration(nil), Affinity:(*v1.Affinity)(nil), SecurityContext:(*v1.PodSecurityContext)(nil), Volumes:[]v1.Volume(nil)}, NodeSelector:map[string]string(nil), Tolerations:[]v1.Toleration(nil), Affinity:(*v1.Affinity)(nil)}, Status:v1alpha1.PipelineRunStatus{Status:v1beta1.Status{ObservedGeneration:0, Conditions:v1beta1.Conditions{apis.Condition{Type:"Succeeded", Status:"False", Severity:"", LastTransitionTime:apis.VolatileTime{Inner:v1.Time{Time:time.Time{wall:0xbf56633b03a66af0, ext:82513790, loc:(*time.Location)(0x4314120)}}}, Reason:"PipelineRunTimeout", Message:"PipelineRun \"test-pipeline-retry-run-with-timeout\" failed to finish within \"&Duration{Duration:12h0m0s,}\""}}}, Results:(*v1alpha1.Results)(nil), StartTime:(*v1.Time)(0xc0006a63e0), CompletionTime:(*v1.Time)(nil), 
TaskRuns:map[string]*v1alpha1.PipelineRunTaskRunStatus{"hello-world-1":(*v1alpha1.PipelineRunTaskRunStatus)(0xc0006a6400)}}}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Warning' 'Failed' 'PipelineRun "test-pipeline-retry-run-with-timeout" failed to finish within "&Duration{Duration:12h0m0s,}"'
E0911 16:40:28.063814   72124 event.go:259] Could not construct reference to: '&v1alpha1.PipelineRun{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"test-pipeline-retry-run-with-timeout", GenerateName:"", Namespace:"foo", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"tekton.dev/pipeline":"test-pipeline-retry"}, Annotations:map[string]string{}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v1alpha1.PipelineRunSpec{PipelineRef:v1alpha1.PipelineRef{Name:"test-pipeline-retry", APIVersion:""}, Resources:[]v1alpha1.PipelineResourceBinding(nil), Params:[]v1alpha1.Param(nil), ServiceAccount:"test-sa", ServiceAccounts:[]v1alpha1.PipelineRunSpecServiceAccount(nil), Results:(*v1alpha1.Results)(nil), Status:"", Timeout:(*v1.Duration)(0xc00011b2b8), PodTemplate:v1alpha1.PodTemplate{NodeSelector:map[string]string(nil), Tolerations:[]v1.Toleration(nil), Affinity:(*v1.Affinity)(nil), SecurityContext:(*v1.PodSecurityContext)(nil), Volumes:[]v1.Volume(nil)}, NodeSelector:map[string]string(nil), Tolerations:[]v1.Toleration(nil), Affinity:(*v1.Affinity)(nil)}, Status:v1alpha1.PipelineRunStatus{Status:v1beta1.Status{ObservedGeneration:0, Conditions:v1beta1.Conditions{apis.Condition{Type:"Succeeded", Status:"False", Severity:"", LastTransitionTime:apis.VolatileTime{Inner:v1.Time{Time:time.Time{wall:0xbf56633b03cd9260, ext:85079769, loc:(*time.Location)(0x4314120)}}}, Reason:"PipelineRunTimeout", Message:"PipelineRun \"test-pipeline-retry-run-with-timeout\" failed to finish within \"&Duration{Duration:12h0m0s,}\""}}}, Results:(*v1alpha1.Results)(nil), StartTime:(*v1.Time)(0xc0006a76e0), CompletionTime:(*v1.Time)(nil), 
TaskRuns:map[string]*v1alpha1.PipelineRunTaskRunStatus{"hello-world-1":(*v1alpha1.PipelineRunTaskRunStatus)(0xc0006a7700)}}}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Warning' 'Failed' 'PipelineRun "test-pipeline-retry-run-with-timeout" failed to finish within "&Duration{Duration:12h0m0s,}"'
PASS
panic: Log in goroutine after TestReconcileWithFailingConditionChecks has completed

goroutine 210 [running]:
testing.(*common).logDepth(0xc00068c400, 0xc00067a240, 0xb2, 0x3)
	/usr/local/Cellar/go/1.12.4/libexec/src/testing/testing.go:634 +0x51a
testing.(*common).log(...)
	/usr/local/Cellar/go/1.12.4/libexec/src/testing/testing.go:614
testing.(*common).Logf(0xc00068c400, 0x2d6df2f, 0x2, 0xc000726ac0, 0x1, 0x1)
	/usr/local/Cellar/go/1.12.4/libexec/src/testing/testing.go:649 +0x91
github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest.testingWriter.Write(0x3191660, 0xc00068c400, 0x0, 0xc00086e000, 0xb3, 0x400, 0xc00000efa0, 0xc000726ab0, 0xc00072cc00)
	/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest/logger.go:130 +0x120
github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*ioCore).Write(0xc0001e87b0, 0x0, 0xbf56633b048c1fa0, 0x5d0c539, 0x4314120, 0x2d9c8b3, 0x27, 0xc0002244b0, 0x46, 0x1, ...)
	/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/core.go:90 +0x1c5
github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc00057af20, 0x0, 0x0, 0x0)
	/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/entry.go:215 +0x1e8
github.com/tektoncd/pipeline/vendor/go.uber.org/zap.(*SugaredLogger).log(0xc0001301a8, 0x0, 0x2d83fa0, 0x15, 0xc000081dd0, 0x1, 0x1, 0x0, 0x0, 0x0)
	/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:234 +0x143
github.com/tektoncd/pipeline/vendor/go.uber.org/zap.(*SugaredLogger).Infof(...)
	/go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:138
github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).setTimer(0xc0000517c0, 0x3135e80, 0xc000777680, 0x34630b69302, 0xc00038fb40)
	/go/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:281 +0x5ad
github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).waitRun(0xc0000517c0, 0x3135e80, 0xc000777680, 0x34630b8a000, 0xc00092e900, 0xc00038fb40)
	/go/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:257 +0x2db
github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).WaitPipelineRun(0xc0000517c0, 0xc000777680, 0xc00092e900)
	/go/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:243 +0xbe
created by github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun.(*Reconciler).Reconcile
	/go/src/github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun/pipelinerun.go:144 +0x17b9
FAIL	github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun	0.264s

If I enable TestReconcilePropagateLabels or TestReconcile_InvalidPipelineRuns I get a DATA race:

E0911 16:47:00.388462   80223 event.go:259] Could not construct reference to: '&v1alpha1.PipelineRun{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"test-pipeline-run-with-timeout", GenerateName:"", Namespace:"foo", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"tekton.dev/pipeline":"test-pipeline"}, Annotations:map[string]string{}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v1alpha1.PipelineRunSpec{PipelineRef:v1alpha1.PipelineRef{Name:"test-pipeline", APIVersion:""}, Resources:[]v1alpha1.PipelineResourceBinding(nil), Params:[]v1alpha1.Param(nil), ServiceAccount:"test-sa", ServiceAccounts:[]v1alpha1.PipelineRunSpecServiceAccount(nil), Results:(*v1alpha1.Results)(nil), Status:"", Timeout:(*v1.Duration)(0xc0001195f0), PodTemplate:v1alpha1.PodTemplate{NodeSelector:map[string]string(nil), Tolerations:[]v1.Toleration(nil), Affinity:(*v1.Affinity)(nil), SecurityContext:(*v1.PodSecurityContext)(nil), Volumes:[]v1.Volume(nil)}, NodeSelector:map[string]string(nil), Tolerations:[]v1.Toleration(nil), Affinity:(*v1.Affinity)(nil)}, Status:v1alpha1.PipelineRunStatus{Status:v1beta1.Status{ObservedGeneration:0, Conditions:v1beta1.Conditions{apis.Condition{Type:"Succeeded", Status:"False", Severity:"", LastTransitionTime:apis.VolatileTime{Inner:v1.Time{Time:time.Time{wall:0xbf56639d17270c38, ext:73004131, loc:(*time.Location)(0x430d100)}}}, Reason:"PipelineRunTimeout", Message:"PipelineRun \"test-pipeline-run-with-timeout\" failed to finish within \"&Duration{Duration:12h0m0s,}\""}}}, Results:(*v1alpha1.Results)(nil), StartTime:(*v1.Time)(0xc0006ebb00), CompletionTime:(*v1.Time)(nil), TaskRuns:map[string]*v1alpha1.PipelineRunTaskRunStatus{}}}' due to: 'selfLink was empty, 
can't make reference'. Will not report event: 'Warning' 'Failed' 'PipelineRun "test-pipeline-run-with-timeout" failed to finish within "&Duration{Duration:12h0m0s,}"'
==================
WARNING: DATA RACE
Read at 0x00c0006e8b43 by goroutine 123:
  testing.(*common).logDepth()
      /usr/local/Cellar/go/1.12.4/libexec/src/testing/testing.go:629 +0x132
  testing.(*common).Logf()
      /usr/local/Cellar/go/1.12.4/libexec/src/testing/testing.go:614 +0x90
  testing.(*T).Logf()
      <autogenerated>:1 +0x75
  github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest.testingWriter.Write()
      /go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest/logger.go:130 +0x11f
  github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zaptest.(*testingWriter).Write()
      <autogenerated>:1 +0xa9
  github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*ioCore).Write()
      /go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/core.go:90 +0x1c4
  github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore.(*CheckedEntry).Write()
      /go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/zapcore/entry.go:215 +0x1e7
  github.com/tektoncd/pipeline/vendor/go.uber.org/zap.(*SugaredLogger).log()
      /go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:234 +0x142
  github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).setTimer()
      /go/src/github.com/tektoncd/pipeline/vendor/go.uber.org/zap/sugar.go:138 +0x5ac
  github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).waitRun()
      /go/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:257 +0x2da
  github.com/tektoncd/pipeline/pkg/reconciler.(*TimeoutSet).WaitPipelineRun()
      /go/src/github.com/tektoncd/pipeline/pkg/reconciler/timeout_handler.go:243 +0xbd

Previous write at 0x00c0006e8b43 by goroutine 108:
  testing.tRunner.func1()
      /usr/local/Cellar/go/1.12.4/libexec/src/testing/testing.go:856 +0x354
  testing.tRunner()
      /usr/local/Cellar/go/1.12.4/libexec/src/testing/testing.go:869 +0x17f

Goroutine 123 (running) created at:
  github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun.(*Reconciler).Reconcile()
      /go/src/github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun/pipelinerun.go:144 +0x17b8
  github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun.TestReconcilePropagateLabels.func1()
      /go/src/github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun/pipelinerun_test.go:918 +0x582
  testing.tRunner()
      /usr/local/Cellar/go/1.12.4/libexec/src/testing/testing.go:865 +0x163

Goroutine 108 (finished) created at:
  testing.(*T).Run()
      /usr/local/Cellar/go/1.12.4/libexec/src/testing/testing.go:916 +0x65a
  testing.runTests.func1()
      /usr/local/Cellar/go/1.12.4/libexec/src/testing/testing.go:1157 +0xa8
  testing.tRunner()
      /usr/local/Cellar/go/1.12.4/libexec/src/testing/testing.go:865 +0x163
  testing.runTests()
      /usr/local/Cellar/go/1.12.4/libexec/src/testing/testing.go:1155 +0x523
  testing.(*M).Run()
      /usr/local/Cellar/go/1.12.4/libexec/src/testing/testing.go:1072 +0x2eb
  main.main()
      _testmain.go:68 +0x222
==================
E0911 16:47:00.406244   80223 event.go:259] Could not construct reference to: '&v1alpha1.PipelineRun{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"test-pipeline-retry-run-with-timeout", GenerateName:"", Namespace:"foo", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"tekton.dev/pipeline":"test-pipeline-retry"}, Annotations:map[string]string{}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v1alpha1.PipelineRunSpec{PipelineRef:v1alpha1.PipelineRef{Name:"test-pipeline-retry", APIVersion:""}, Resources:[]v1alpha1.PipelineResourceBinding(nil), Params:[]v1alpha1.Param(nil), ServiceAccount:"test-sa", ServiceAccounts:[]v1alpha1.PipelineRunSpecServiceAccount(nil), Results:(*v1alpha1.Results)(nil), Status:"", Timeout:(*v1.Duration)(0xc00024e380), PodTemplate:v1alpha1.PodTemplate{NodeSelector:map[string]string(nil), Tolerations:[]v1.Toleration(nil), Affinity:(*v1.Affinity)(nil), SecurityContext:(*v1.PodSecurityContext)(nil), Volumes:[]v1.Volume(nil)}, NodeSelector:map[string]string(nil), Tolerations:[]v1.Toleration(nil), Affinity:(*v1.Affinity)(nil)}, Status:v1alpha1.PipelineRunStatus{Status:v1beta1.Status{ObservedGeneration:0, Conditions:v1beta1.Conditions{apis.Condition{Type:"Succeeded", Status:"False", Severity:"", LastTransitionTime:apis.VolatileTime{Inner:v1.Time{Time:time.Time{wall:0xbf56639d183697d8, ext:90799949, loc:(*time.Location)(0x430d100)}}}, Reason:"PipelineRunTimeout", Message:"PipelineRun \"test-pipeline-retry-run-with-timeout\" failed to finish within \"&Duration{Duration:12h0m0s,}\""}}}, Results:(*v1alpha1.Results)(nil), StartTime:(*v1.Time)(0xc000437c80), CompletionTime:(*v1.Time)(nil), 
TaskRuns:map[string]*v1alpha1.PipelineRunTaskRunStatus{"hello-world-1":(*v1alpha1.PipelineRunTaskRunStatus)(0xc000437ca0)}}}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Warning' 'Failed' 'PipelineRun "test-pipeline-retry-run-with-timeout" failed to finish within "&Duration{Duration:12h0m0s,}"'
E0911 16:47:00.408880   80223 event.go:259] Could not construct reference to: '&v1alpha1.PipelineRun{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"test-pipeline-retry-run-with-timeout", GenerateName:"", Namespace:"foo", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"tekton.dev/pipeline":"test-pipeline-retry"}, Annotations:map[string]string{}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v1alpha1.PipelineRunSpec{PipelineRef:v1alpha1.PipelineRef{Name:"test-pipeline-retry", APIVersion:""}, Resources:[]v1alpha1.PipelineResourceBinding(nil), Params:[]v1alpha1.Param(nil), ServiceAccount:"test-sa", ServiceAccounts:[]v1alpha1.PipelineRunSpecServiceAccount(nil), Results:(*v1alpha1.Results)(nil), Status:"", Timeout:(*v1.Duration)(0xc0001184e0), PodTemplate:v1alpha1.PodTemplate{NodeSelector:map[string]string(nil), Tolerations:[]v1.Toleration(nil), Affinity:(*v1.Affinity)(nil), SecurityContext:(*v1.PodSecurityContext)(nil), Volumes:[]v1.Volume(nil)}, NodeSelector:map[string]string(nil), Tolerations:[]v1.Toleration(nil), Affinity:(*v1.Affinity)(nil)}, Status:v1alpha1.PipelineRunStatus{Status:v1beta1.Status{ObservedGeneration:0, Conditions:v1beta1.Conditions{apis.Condition{Type:"Succeeded", Status:"False", Severity:"", LastTransitionTime:apis.VolatileTime{Inner:v1.Time{Time:time.Time{wall:0xbf56639d185ee058, ext:93440706, loc:(*time.Location)(0x430d100)}}}, Reason:"PipelineRunTimeout", Message:"PipelineRun \"test-pipeline-retry-run-with-timeout\" failed to finish within \"&Duration{Duration:12h0m0s,}\""}}}, Results:(*v1alpha1.Results)(nil), StartTime:(*v1.Time)(0xc00000fb40), CompletionTime:(*v1.Time)(nil), 
TaskRuns:map[string]*v1alpha1.PipelineRunTaskRunStatus{"hello-world-1":(*v1alpha1.PipelineRunTaskRunStatus)(0xc00000fb60)}}}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Warning' 'Failed' 'PipelineRun "test-pipeline-retry-run-with-timeout" failed to finish within "&Duration{Duration:12h0m0s,}"'
FAIL
FAIL	github.com/tektoncd/pipeline/pkg/reconciler/pipelinerun	0.263s

So perhaps there are two different issues happening here?

@bobcatfish
Collaborator

Okay so I think I've got a fix for this and there seem to be ~3 different things going wrong:

  1. Using the logger in the timeout_handler at all seems to cause races (I'm guessing it's not threadsafe?)
  2. The timeout_handler depends on calling GetRunKey on TaskRuns and PipelineRuns, accessing properties of those objects is also not threadsafe
  3. The tests are being run with -race but we're not catching the error. Looks like the image we're using has version 1.11 of go; I'm guessing that newer versions are better at catching races?
docker run --entrypoint go gcr.io/tekton-releases/tests/test-runner@sha256:a4a64b2b70f85a618bbbcc6c0b713b313b2e410504dee24c9f90ec6fe3ebf63f version      
go version go1.11.4 linux/amd64

So my suggested plan of action:

  • Add quick fixes for (1) and (2), then open an issue to look into threadsafe logging from the timeout handler
  • Update gcr.io/tekton-releases/tests/test-runner to use a newer version of go

@afrittoli
Member

Okay so I think I've got a fix for this and there seem to be ~3 different things going wrong:

1. Using the logger in the timeout_handler at all seems to cause races (I'm guessing it's not threadsafe?)

My guess here was that in some test cases the timeout_handler goroutine is not released and tries to log after the test has finished, but I have not been able to prove this yet, nor pinpoint what keeps the goroutine alive and logging.

We can try removing logging from the timeout handler altogether, but perhaps we could send log messages back via a channel instead, to avoid having zero logging from the timeout handler.
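To make the channel idea concrete, here is a minimal sketch. It is not code from tektoncd/pipeline: `logRelay`, `newLogRelay`, `Go` and `Drain` are hypothetical names invented for illustration. The worker goroutines never touch the test logger; they only send formatted strings on a channel, and the goroutine that owns the logger drains it.

```go
package main

import (
	"fmt"
	"sync"
)

// logRelay is a hypothetical helper (not part of tektoncd/pipeline).
// Worker goroutines send formatted messages on msgs instead of calling
// the logger themselves; only the owner of the relay drains them.
type logRelay struct {
	msgs chan string
	wg   sync.WaitGroup
}

func newLogRelay(buffer int) *logRelay {
	return &logRelay{msgs: make(chan string, buffer)}
}

// Go runs work in a tracked goroutine. The logf passed to work formats
// the message and sends it on the channel; it blocks if the buffer is full.
func (r *logRelay) Go(work func(logf func(format string, args ...interface{}))) {
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		work(func(format string, args ...interface{}) {
			r.msgs <- fmt.Sprintf(format, args...)
		})
	}()
}

// Drain passes every message to sink on the caller's goroutine and returns
// once all tracked goroutines have finished. It must be called exactly once,
// and Go must not be called after it.
func (r *logRelay) Drain(sink func(msg string)) {
	go func() {
		r.wg.Wait()   // all workers have returned, so no more sends
		close(r.msgs) // lets the range loop below terminate
	}()
	for msg := range r.msgs {
		sink(msg)
	}
}
```

In a test this might be used as `relay.Drain(func(m string) { t.Log(m) })` just before the test returns, so every log call happens on the test goroutine while the test is still running. The trade-off is that messages are only visible once someone drains the channel.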

2. The timeout_handler depends on calling `GetRunKey` on TaskRuns and PipelineRuns, accessing properties of those objects is also not threadsafe

Good catch. I'm not sure why TestReconcilePropagateLabels or TestReconcile_InvalidPipelineRuns specifically are causing issues, though.

3. The tests _are_ being run with `-race` but we're not catching the error. Looks like the image we're using has version 1.11 of `go`, I'm guessing that newer versions are better at catching races?
docker run --entrypoint go gcr.io/tekton-releases/tests/test-runner@sha256:a4a64b2b70f85a618bbbcc6c0b713b313b2e410504dee24c9f90ec6fe3ebf63f version      
go version go1.11.4 linux/amd64

For the logging issue at least, I read that Go used to silently drop such log messages; now it complains if a goroutine logs after the test has completed, since that output would never be reported.
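If the panic really comes from a goroutine outliving its test, one general remedy is to make shutdown wait for the goroutines rather than only signal them. Cancelling a context (or closing a stop channel) tells a goroutine to exit but does not block until it has. The sketch below is an assumption-laden illustration, not the actual timeout_handler: `timeoutWatcher`, `Watch` and `Stop` are hypothetical names.

```go
package main

import (
	"sync"
	"time"
)

// timeoutWatcher is a hypothetical stand-in for a timeout handler
// (not the real one from tektoncd/pipeline).
type timeoutWatcher struct {
	stop     chan struct{}
	stopOnce sync.Once
	wg       sync.WaitGroup
}

func newTimeoutWatcher() *timeoutWatcher {
	return &timeoutWatcher{stop: make(chan struct{})}
}

// Watch starts a goroutine that calls onTimeout after d, unless Stop is
// called first.
func (w *timeoutWatcher) Watch(d time.Duration, onTimeout func()) {
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		timer := time.NewTimer(d)
		defer timer.Stop()
		select {
		case <-timer.C:
			onTimeout()
		case <-w.stop:
			// Stopped before the deadline: exit without calling back.
		}
	}()
}

// Stop signals every watcher goroutine to exit and then blocks until all
// of them have returned. It is safe to call more than once.
func (w *timeoutWatcher) Stop() {
	w.stopOnce.Do(func() { close(w.stop) })
	w.wg.Wait()
}
```

With something like `defer w.Stop()` in a test, no watcher goroutine can still be running (and so none can call `t.Logf`) once the test function has returned.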

So my suggested plan of action:

* Add quick fixes for (1) and (2), then open an issue to look into threadsafe logging from the timeout handler

* Update `gcr.io/tekton-releases/tests/test-runner` to use a newer version of `go`

Sounds good!

@bobcatfish
Collaborator

Created #1307 to add thread safe logging back in

bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Sep 13, 2019
The function `GetLogMessages` isn't used anywhere. I had tried to remove
it to see if it was causing the data race in tektoncd#1124 - it _isn't_ but still
it's not being used anywhere so why not remove :)
bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Sep 13, 2019
Logging in the timeout handler was added as part of tektoncd#731 cuz it helped
us debug when the timeout handler didn't work as expected. Unfortunately
it looks like the logger we're using can't be used in multiple go
routines (uber-go/zap#99 may be related).
Removing this logging to fix tektoncd#1124, hopefully can find a safe way to add
logging back in tektoncd#1307.
bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Sep 13, 2019
GetRunKey is accessed in goroutines via the timeout_handler; accessing
attributes of an object in a goroutine is not threadsafe. This is used
as a key in a map, so for now replacing this with a value that should be
unique but also threadsafe to fix tektoncd#1124
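
As an illustration of the general idea in this commit message (not necessarily what the commit itself does), one way to keep a goroutine from reading an object's fields is to compute the map key on the calling goroutine and pass only the resulting string into the goroutine. Everything below is a hypothetical sketch: `run`, `runKey` and `tracker` are invented stand-ins, not types from tektoncd/pipeline.

```go
package main

import (
	"fmt"
	"sync"
)

// run is a hypothetical stand-in for a TaskRun or PipelineRun.
type run struct {
	Kind      string
	Namespace string
	Name      string
}

// runKey reads the run's fields, so it should only be called while no
// other goroutine may be writing to the object.
func runKey(r *run) string {
	return fmt.Sprintf("%s/%s/%s", r.Kind, r.Namespace, r.Name)
}

// tracker records which keys have been handled; the map is guarded by mu.
type tracker struct {
	mu   sync.Mutex
	done map[string]bool
	wg   sync.WaitGroup
}

func newTracker() *tracker {
	return &tracker{done: make(map[string]bool)}
}

// watch snapshots the key before starting the goroutine. The goroutine
// captures only the string, never the *run, so later writes to the run
// by the caller cannot race with it.
func (t *tracker) watch(r *run) {
	key := runKey(r) // fields are read here, on the caller's goroutine
	t.wg.Add(1)
	go func() {
		defer t.wg.Done()
		t.mu.Lock()
		defer t.mu.Unlock()
		t.done[key] = true
	}()
}

// wait blocks until every goroutine started by watch has returned.
func (t *tracker) wait() { t.wg.Wait() }

func (t *tracker) isDone(key string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.done[key]
}
```

Because strings are immutable in Go, the snapshotted key stays valid even if the caller modifies the run immediately after `watch` returns.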
@bobcatfish
Collaborator

Created tektoncd/plumbing#71 to update the base image we're using with Prow

tekton-robot pushed a commit that referenced this issue Sep 15, 2019
The function `GetLogMessages` isn't used anywhere. I had tried to remove
it to see if it was causing the data race in #1124 - it _isn't_ but still
it's not being used anywhere so why not remove :)
tekton-robot pushed a commit that referenced this issue Sep 15, 2019
Logging in the timeout handler was added as part of #731 cuz it helped
us debug when the timeout handler didn't work as expected. Unfortunately
it looks like the logger we're using can't be used in multiple go
routines (uber-go/zap#99 may be related).
Removing this logging to fix #1124, hopefully can find a safe way to add
logging back in #1307.
tekton-robot pushed a commit that referenced this issue Sep 15, 2019
GetRunKey is accessed in goroutines via the timeout_handler; accessing
attributes of an object in a goroutine is not threadsafe. This is used
as a key in a map, so for now replacing this with a value that should be
unique but also threadsafe to fix #1124
tekton-robot pushed a commit that referenced this issue Sep 16, 2019
This Pipeline will be triggered via prow over in the tektoncd/plumbing
repo every night. It will create releases of all images normally
released when doing official releases, plus also the image used for
building with ko, and tag them with the date and commit they were built
at, and will create the release.yaml as well.

This Pipeline is missing a few things that are in the manual release
Pipeline - due to #1124 unit tests have a race condition, due to #1205
the linting is flakey and it would be frustrating to lose a whole
nightly release, and finally due to using v0.3.1 it's not possible to
use workingDir, which is required by the golang build Task.

The Pipelines and Tasks have been updated to work with Tekton Pipelines
v0.3.1 because that's what we're using in our official cluster (since
currently Prow requires it).

Made release instructions more oriented toward someone actually
making a release vs. a random person trying to run the same pipeline
against their own infrastructure.

Removed example Runs b/c it's much simpler to invoke
via `tkn`, or Prow (these were falling out of date with how we were
actually using the Pipelines/Tasks as well).

Removed the `gcs-uploader-image` PipelineResource which is no longer
being used.

Fixes #860