Fix TestDAGPipelineRun flakiness #4419
Conversation
Paging @dlorenc on the CLA bit 😅
awesome... tabs vs. spaces 🤦
_See also the linked issue for a detailed explanation of the issue this fixes._

This change alters the DAG tests in two meaningful ways:
1. Have the tasks sleep, to actually increase the likelihood of task execution overlap.
2. Use the sleep duration for the minimum delta in start times.

These changes combined should guarantee that the tasks *actually* executed in parallel, but the second part also makes this test less flaky on busy clusters where `5s` may not be sufficient for the task to start. A fun anecdote to note here is that the Kubernetes [SLO for Pod startup latency](https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md#definition) is `5s` at `99P`, which means Tekton has effectively zero room for overhead.

Fixes: tektoncd#4418
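To make the second change concrete, here is a minimal sketch of the kind of start-time check the description implies. The helper name and the sleep value are illustrative assumptions, not the actual code in `test/v1alpha1/dag_test.go`:

```go
package dag_test

import (
	"testing"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Illustrative value; the review discussion below settles on a smaller number.
const sleepDuration = 15 * time.Second

// assertRanInParallel is a hypothetical helper. If every task sleeps for
// sleepDuration, two tasks that truly overlapped must have started within
// sleepDuration of each other; otherwise the earlier one would already
// have finished before the later one began.
func assertRanInParallel(t *testing.T, first, second metav1.Time) {
	t.Helper()
	delta := second.Time.Sub(first.Time)
	if delta < 0 {
		delta = -delta
	}
	if delta >= sleepDuration {
		t.Fatalf("tasks started %v apart, want less than %v", delta, sleepDuration)
	}
}
```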
/test check-pr-has-kind-label
/lgtm
test/v1alpha1/dag_test.go
Outdated
@@ -34,6 +35,8 @@ import (
	knativetest "knative.dev/pkg/test"
)

const sleepDuration = 30 * time.Second
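For context, one way a constant like this could be kept in sync with both the task's sleep and the assertion window; the helper name and shape are illustrative assumptions, not the test's actual fixture:

```go
package dag_test

import (
	"fmt"
	"time"
)

const sleepDuration = 30 * time.Second

// sleepScript is a hypothetical helper that renders the shell command a
// task step would run. Deriving it from sleepDuration ties the time each
// task sleeps and the maximum allowed start-time delta to one constant.
func sleepScript() string {
	return fmt.Sprintf("sleep %d", int(sleepDuration.Seconds()))
}
```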
30s feels like a long time to wait in a test?
Waiting for a rerun of the e2e tests is a lot longer 😉
I'm happy to lower this, but not sure what you are comfortable with. 5s is the 99P scheduling latency on K8s when there's available capacity, and running these tests with `t.Parallel()` on KinD, things get busy quickly. I've seen 9s apart in recent memory, but think I've seen up to 13s.
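(For anyone skimming: `t.Parallel()` is the standard Go testing call that lets tests in a package run concurrently, which is what makes the cluster busy. A trivial illustration with a made-up test name:)

```go
package dag_test

import "testing"

// TestExampleDAG is a made-up name. Calling t.Parallel() marks the test to
// run concurrently with other parallel tests in the package, so many pods
// can be scheduled at once and startup latency on a busy KinD cluster can
// stretch well past 5s.
func TestExampleDAG(t *testing.T) {
	t.Parallel()
	// ... create the PipelineRun and poll for completion here ...
}
```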
(to be clear, the `9s` and `13s` are failures with what's at HEAD)
I reduced this to 15s, which is larger than any gap I can recall seeing this fail with.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dlorenc
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment
/test check-pr-has-kind-label
/kind bug
Submitter Checklist
As the author of this PR, please check off the items in this checklist:
Release Notes