Skip to content

Commit

Permalink
Fix TestDAGPipelineRun flakiness.
Browse files Browse the repository at this point in the history
_See also the linked issue for a detailed explanation of the issue this fixes._

This change alters the DAG tests in two meaningful ways:
1. Have the tasks sleep, to actually increase the likelihood of task execution overlap,
2. Use the sleep duration for the minimum delta in start times.

These changes combine should guarantee that the tasks *actually* executed in parallel,
but the second part also enables this test to be less flaky on busy clusters where
`5s` may not be sufficient for the task to start.

A fun anecdote to note here is that the Kubernetes [SLO for Pod startup
latency](https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md#definition)
is `5s` at `99P`, which means Tekton has effectively zero room for overhead.

Fixes: #4418
  • Loading branch information
mattmoor authored and tekton-robot committed Dec 14, 2021
1 parent 50300d9 commit 1a15022
Show file tree
Hide file tree
Showing 2 changed files with 17 additions and 4 deletions.
10 changes: 8 additions & 2 deletions test/dag_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,8 @@ import (
knativetest "knative.dev/pkg/test"
)

const sleepDuration = 30 * time.Second

// TestDAGPipelineRun creates a graph of arbitrary Tasks, then looks at the corresponding
// TaskRun start times to ensure they were run in the order intended, which is:
// |
Expand Down Expand Up @@ -73,9 +75,13 @@ spec:
steps:
- image: busybox
script: 'echo $(params["text"])'
- image: busybox
# Sleep for N seconds so that we can check that tasks that
# should be run in parallel have overlap.
script: 'sleep %d'
- image: busybox
script: 'ln -s $(resources.inputs.repo.path) $(resources.outputs.repo.path)'
`, namespace))
`, namespace, int(sleepDuration.Seconds())))
if _, err := c.TaskClient.Create(ctx, echoTask, metav1.CreateOptions{}); err != nil {
t.Fatalf("Failed to create echo Task: %s", err)
}
Expand Down Expand Up @@ -242,7 +248,7 @@ func verifyExpectedOrder(ctx context.Context, t *testing.T, c clientset.TaskRunI
s1 := taskRuns[1].Status.StartTime.Time
s2 := taskRuns[2].Status.StartTime.Time
absDiff := time.Duration(math.Abs(float64(s2.Sub(s1))))
if absDiff > (time.Second * 5) {
if absDiff > sleepDuration {
t.Errorf("Expected parallel tasks to execute more or less at the same time, but they were %v apart", absDiff)
}
}
11 changes: 9 additions & 2 deletions test/v1alpha1/dag_test.go
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
//go:build e2e
// +build e2e

/*
Expand Down Expand Up @@ -34,6 +35,8 @@ import (
knativetest "knative.dev/pkg/test"
)

const sleepDuration = 30 * time.Second

// TestDAGPipelineRun creates a graph of arbitrary Tasks, then looks at the corresponding
// TaskRun start times to ensure they were run in the order intended, which is:
// |
Expand Down Expand Up @@ -75,9 +78,13 @@ spec:
steps:
- image: busybox
script: 'echo $(params["text"])'
- image: busybox
# Sleep for N seconds so that we can check that tasks that
# should be run in parallel have overlap.
script: 'sleep %d'
- image: busybox
script: 'ln -s $(resources.inputs.repo.path) $(resources.outputs.repo.path)'
`, namespace))
`, namespace, int(sleepDuration.Seconds())))
if _, err := c.TaskClient.Create(ctx, echoTask, metav1.CreateOptions{}); err != nil {
t.Fatalf("Failed to create echo Task: %s", err)
}
Expand Down Expand Up @@ -244,7 +251,7 @@ func verifyExpectedOrder(ctx context.Context, t *testing.T, c clientset.TaskRunI
s1 := taskRuns[1].Status.StartTime.Time
s2 := taskRuns[2].Status.StartTime.Time
absDiff := time.Duration(math.Abs(float64(s2.Sub(s1))))
if absDiff > (time.Second * 5) {
if absDiff > sleepDuration {
t.Errorf("Expected parallel tasks to execute more or less at the same time, but they were %v apart", absDiff)
}
}

0 comments on commit 1a15022

Please sign in to comment.