-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
increase nodeadm network resilience #1699
Conversation
491ee9c
to
f29fbe1
Compare
so.MaxAttempts = 15 | ||
so.MaxBackoff = 1 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by default imds.New
uses the standard retryer but sets MaxBackoff
as 1
and does not change the MaxAttempts
from 3
, so we're just mimicking this and bumping the attempts
/ci |
@ndbaker1 the workflow that you requested has completed. 🎉
|
/ci |
@ndbaker1 the workflow that you requested has completed. 🎉
|
bumped some account limits and reran the ci :) |
After=network-online.service | ||
Wants=network-online.service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious how much of a bottleneck this will create in the systemd unit tree. Can you grab a before and after?
systemd-analyze plot > tree.svg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm mostly interested in whether cloud-init.service
already has some kind of dependency on one of the network targets:
systemctl list-dependencies cloud-init
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
silly me got the network-online.target
swapped with network-online.service
🙃,
getting the data now, will follow up offline
f29fbe1
to
f6b4f87
Compare
@Issacwww I don't disagree, we'd have to move the definition for this dimension of the matrix out of the workflow file into a variable passed by the bot. I'll see what I can do 👍 |
* increase retries and total retry window for imds client * set nodeadm config step to wait for `network-online`
f6b4f87
to
c1053a1
Compare
the $ systemctl show network-online.target -p After
After=network.target systemd-networkd-wait-online.service cloud-init.service so per the network online docs we should just exercise normal retries. this PR should be more resilient than the existing AL2 retry, which is 10 tries with a linear 1s backoff |
Issue #, if available:
Description of changes:
increase retries and total retry window for imds client
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Testing Done
See this guide for recommended testing for PRs. Some tests may not apply. Completing tests and providing additional validation steps are not required, but it is recommended and may reduce review time and time to merge.