Skip to content

Commit

Permalink
Repro race causing tektoncd#2815
Browse files Browse the repository at this point in the history
This commit hacks in a test that tries to reproduce the issue in tektoncd#2815.
Even then, it only reproduced the issue every once in a while - running
doit.sh will usually cause the test to fail eventually (but it's running
a test that spawns 200 subtests 1000 times!)

(The other way to more easily make this failure happen is to introduce
a sleep into InformedWatcher.addConfigMapEvent, i.e. delay processing of
the config map by the config store such that the test code always
executes first.)

I believe that what is happening is:
1. When the config store informed watcher starts, it starts go routines
   to handle incoming events
2. When this all starts for the first time, the shared informer will
   trigger "on changed" (added) events for the config maps so that
   the config store will process their values.
3. There is a race: if the goroutines started in (1) happen after the
   assertions in the test, then the config store will not be updated
   with the config map values and will resort to the defaults
   (i.e. not the CONFIGURED default in the configmap, but the harded
   coded fallback default, literally `default` in the case of the
   service account

What this means is that there is a (very small) chance that if a TaskRun
is created immediately after a controller starts up, the config store
might not have yet loaded the default values and will create the TaskRun
using the fall back hard coded default values and not whatever is
configured in the config maps (this would apply to all values
configurable via config maps).

I think this might be the same problem that is causing the knative/pkg
folks to see a flakey test knative/pkg#1907

If you look at the code that fails in that case:
https://github.com/knative/pkg/blob/c5296cd61e9f34e183f8010a40dc866f895252b9/configmap/informed_watcher_test.go#L86-L92

The comment says:

> When Start returns the callbacks should have been called with the version of the objects that is available.

So it seems the intention is that when Start returns, the callbacks have
been called (and in the config store case, this means the callbacks that
process the config map), but the flakes in this test and the knative
test are showing that this isn't the case.

I think the fix is to introduce that guarantee; i.e. that after Start
has returned, the callbacks have been called with the available version
of the objects.
  • Loading branch information
bobcatfish committed Nov 24, 2020
1 parent 7b5b2fa commit 1fbc388
Show file tree
Hide file tree
Showing 5 changed files with 49 additions and 3 deletions.
6 changes: 6 additions & 0 deletions doit.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
#!/bin/bash
set -xe

for i in `seq 1 1000`; do
go test -count=1 ./pkg/reconciler/taskrun/... -run Test_UseConfigMapQuickly -mod=vendor
done

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

30 changes: 30 additions & 0 deletions pkg/reconciler/taskrun/taskrun_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -399,6 +399,36 @@ func eventFromChannelUnordered(c chan string, wantEvents []string) error {
return fmt.Errorf("too many events received")
}

func Test_UseConfigMapQuickly(t *testing.T) {
defaultSAName := "pipelines"
d := test.Data{
ConfigMaps: []*corev1.ConfigMap{
{
ObjectMeta: metav1.ObjectMeta{Name: config.GetDefaultsConfigName(), Namespace: system.GetNamespace()},
Data: map[string]string{
"default-service-account": defaultSAName,
"default-timeout-minutes": "60",
"default-managed-by-label-value": "tekton-pipelines",
},
},
},
}
names.TestingSeed()
for i := 0; i < 200; i++ {
t.Run(fmt.Sprintf("quickly-%d", i), func(t *testing.T) {
t.Parallel()
testAssets, _ := getTaskRunController(t, d)
c := testAssets.Controller
ctx := c.Reconciler.GetConfigStoreContext(testAssets.Ctx)
cfg := config.FromContextOrDefaults(ctx)
if cfg.Defaults.DefaultServiceAccount != defaultSAName {
t.Log(cfg.Defaults)
t.Errorf("The configMap configured default service account %s has not been loaded, %s is being used instead; probably there has been a race! Please add this repro to #2815", defaultSAName, cfg.Defaults.DefaultServiceAccount)
}
})
}
}

func TestReconcile_ExplicitDefaultSA(t *testing.T) {
taskRunSuccess := tb.TaskRun("test-taskrun-run-success", tb.TaskRunNamespace("foo"), tb.TaskRunSpec(
tb.TaskRunTaskRef(simpleTask.Name, tb.TaskRefAPIVersion("a1")),
Expand Down
3 changes: 3 additions & 0 deletions vendor/knative.dev/pkg/configmap/informed_watcher.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions vendor/knative.dev/pkg/controller/controller.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

0 comments on commit 1fbc388

Please sign in to comment.