jenkinsfiles: Increase VM boot timeout #19458

Merged
merged 1 commit into from
Apr 18, 2022

Conversation

pchaigno
Member

This pull request increases the VM boot timeout while decreasing the overall timeout :mindblown:

We currently run the vagrant-ci-start.sh script with a 15m timeout and retry twice if it fails. That takes up to 45m in total if all attempts fail, as is frequently happening in CI right now. In particular, if the script fails simply because it takes more than 15m on average, then it is likely to fail all three times.

This pull request instead increases the timeout from 15m to 25m and removes the retries. The goal is obviously to succeed on the first try :p
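
The worst-case arithmetic behind that claim can be checked with a short sketch. The function name is hypothetical and not part of the Jenkinsfiles, and the model assumes every attempt runs until its timeout expires, ignoring any overhead between retries.

```python
def worst_case_minutes(timeout_minutes: int, attempts: int) -> int:
    """Upper bound on time spent if every attempt hits its timeout.

    Hypothetical helper for illustration; ignores setup and teardown
    time between attempts.
    """
    return timeout_minutes * attempts


# Before: 15m timeout, one initial attempt plus two retries.
before = worst_case_minutes(15, 3)
# After: 25m timeout, no retries.
after = worst_case_minutes(25, 1)

print(f"before: {before}m, after: {after}m")
```

So a single attempt gets 10m more to finish, while the total time spent on a VM that never boots drops by 20m.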

Ideally, we would investigate why it is now taking longer to start the VM. But this issue has been happening for a long time. And because of the retries, we probably didn't even notice the increase at the beginning: if it takes on average 15m, it might fail half the time and the test might still succeed most of the time. That is, the retries help hide the increase.
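
To illustrate how retries can mask a slowdown, here is a rough sketch. It assumes each attempt independently finishes within the timeout with the same probability, which is a simplification: as noted above, real failures tend to be correlated. The function name is hypothetical.

```python
def overall_success(p_attempt: float, attempts: int) -> float:
    """Probability that at least one of `attempts` tries succeeds.

    Assumes attempts are independent with identical success
    probability `p_attempt` (an illustrative simplification).
    """
    return 1.0 - (1.0 - p_attempt) ** attempts


# If the boot time hovers around the timeout, a single attempt may
# succeed only about half the time.
single = overall_success(0.5, 1)
with_retries = overall_success(0.5, 3)

print(f"single attempt: {single:.3f}, three attempts: {with_retries:.3f}")
```

Under these assumptions, a step that fails half the time per attempt still passes in 87.5% of runs, so the regression is easy to miss.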

Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno pchaigno added area/CI Continuous Integration testing issue or flake release-note/ci This PR makes changes to the CI. labels Apr 15, 2022
@pchaigno pchaigno requested a review from a team as a code owner April 15, 2022 14:19
@pchaigno pchaigno requested a review from nebril April 15, 2022 14:19
@pchaigno
Member Author

/test-jenkins

Member

@joestringer joestringer left a comment

I've often seen three failures in a row because the appropriate image couldn't be pulled, so in those cases this should fail earlier, or hopefully not fail at all, given that the single attempt now allows more time for the download. So this makes sense to me to try out 🚀

@pchaigno
Member Author

Marked for backports given we're hitting this in all stable branches, even v1.9.

@pchaigno
Member Author

k8s-1.21-kernel-5.4 failed with known flakes #16122 and #16852. Other Jenkins tests are passing. We have two reviews to cover what should be a fairly minor change, so I think it's okay to skip the review from @cilium/ci-structure. Marking ready to merge.

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 18, 2022
@joestringer joestringer merged commit cfec27a into cilium:master Apr 18, 2022
@pchaigno pchaigno deleted the fix-netnext-vm-provisioning branch April 18, 2022 17:37
@tklauser tklauser added backport-pending/1.10 backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. and removed needs-backport/1.10 labels Apr 19, 2022
Labels
area/CI Continuous Integration testing issue or flake backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.

5 participants