kubelet.service: Wait for network-online.target #1250

Merged

justaugustus
Member

What type of PR is this?

/kind bug

What this PR does / why we need it:

(Cherry pick of #1249.)

Whenever kubeadm detects a system that has systemd-resolved running, it
provisions the kubelet on the local node with a resolv.conf override:
/run/systemd/resolve/resolv.conf.

However, some kubeadm users have discovered an issue during system boot.
The kubelet can end up in a race with the systemd-resolved service and
start with an empty or incorrect resolv.conf file.

The race occurs because the kubelet.service file does not declare a
dependency on network-online.target.

To fix this, we add network-online.target as a dependency, so that systemd
waits for the target to be reached before starting the kubelet.
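
For reviewers less familiar with systemd: a dependency like this is usually declared in the [Unit] section with Wants= (pull the target into the boot transaction) plus After= (order the unit behind it). The fragment below is a minimal sketch under that assumption and is not copied from this PR's diff; the Description= value is illustrative only.

```ini
[Unit]
# Illustrative description; not necessarily the value in the packaged unit.
Description=kubelet: The Kubernetes Node Agent
# Pull network-online.target into the boot transaction.
Wants=network-online.target
# Do not start the kubelet until network-online.target has been reached.
After=network-online.target
```

Note that network-online.target only delays startup when a wait-online service (for example systemd-networkd-wait-online.service or NetworkManager-wait-online.service) is enabled on the host.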

Which issue(s) this PR fixes:

Closes #1248
ref: kubernetes/kubeadm#2111

Special notes for your reviewer:

/cc @neolit123
/cc @kubernetes/release-engineering @kubernetes/build-admins
/cc @kubernetes/sig-node-bugs
/assign @tpepper @saschagrunert @hasheddan
/priority important-longterm

Does this PR introduce a user-facing change?

systemd now starts the kubelet service only after network-online.target is reached.

Signed-off-by: Stephen Augustus <saugustus@vmware.com>
Co-authored-by: Rostislav M. Georgiev <rostislavg@vmware.com>
@k8s-ci-robot added the following labels on Apr 24, 2020:
release-note (Denotes a PR that will be considered when it comes time to generate release notes.)
sig/node (Categorizes an issue or PR as relevant to SIG Node.)
kind/bug (Categorizes issue or PR as related to a bug.)
priority/important-longterm (Important over the long term, but may not be staffed and/or may need multiple releases to complete.)
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
size/XS (Denotes a PR that changes 0-9 lines, ignoring generated files.)
area/release-eng (Issues or PRs related to the Release Engineering subproject.)
sig/release (Categorizes an issue or PR as relevant to SIG Release.)
approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
Contributor

@hasheddan left a comment

/lgtm

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Apr 24, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hasheddan, justaugustus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@justaugustus
Member Author

/retest

1 similar comment
@justaugustus
Member Author

/retest

@saschagrunert
Member

/test pull-release-cluster-up

@saschagrunert
Member

I don't think that the failures are related to that change 🤔
/retest

@neolit123
Member

i do not know the purpose of the build-admins branch, but LGTM.

@justaugustus
Member Author

/retest

3 similar comments
@justaugustus
Member Author

/retest

@justaugustus
Member Author

/retest

@justaugustus
Member Author

/retest

@justaugustus
Member Author

/retest
@hasheddan @kubernetes/ci-signal -- Does this resemble any of the recent k/k failures?

@hasheddan
Contributor

@justaugustus this doesn’t look familiar off the top of my head, but I will do some digging

@justaugustus
Member Author

/retest

@justaugustus
Member Author

/test pull-release-cluster-up
(failure fixed in #1314.)

@k8s-ci-robot merged commit 7eb9247 into kubernetes:build-admins on May 27, 2020
6 participants