sdn: only sync HostPorts when we need to #16692

Merged

Conversation

dcbw
Contributor

@dcbw dcbw commented Oct 5, 2017

Which is the first time a pod is started, when there will be active hostports, or when there are currently active hostports. Otherwise the syncer runs iptables-restore for no good reason.

@openshift/networking @knobunc @danwinship

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 5, 2017
@openshift-merge-robot openshift-merge-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 5, 2017
@danwinship
Contributor

The commit message:

Which is the first time a pod is started, or when there is a transition
between 0 and >0 mappings. Otherwise the syncer runs iptables-restore
for no good reason.

doesn't match what the code actually does:

+	// Only sync if this is the first time syncing (to clear out stale mappings
+	// if kubelet crashed), or we used to have pod mappings, or we have new
+	// pod mappings. Otherwise don't bother. 

And neither seems right; unless I'm misunderstanding, we should resync at startup, when a pod-with-hostports is added, or when a pod-with-hostports is deleted.

@dcbw
Contributor Author

dcbw commented Oct 5, 2017

And neither seems right; unless I'm misunderstanding, we should resync at startup, when a pod-with-hostports is added, or when a pod-with-hostports is deleted.

@danwinship I can clarify. It's actually when the first pod is started; we never sync hostports on startup (neither does kube...). Basically, if we crashed, then we'll make sure to resync when the node gets its first sandbox setup request, even if the pod being set up doesn't have any hostports.

@dcbw
Contributor Author

dcbw commented Oct 5, 2017

doesn't match what the code actually does:

@danwinship I don't follow... which part doesn't match what the code does? The "first time" is hostportsSynced = false, while the transitions are m.activeHostports and newActiveHostports.
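
In code terms, the check being described is roughly the following (a minimal sketch: hostportsSynced, activeHostports and newActiveHostports are the names from this comment, while the surrounding type and method are assumed):

```go
// Minimal sketch of the check described here, not the exact code.
// hostportsSynced, activeHostports and newActiveHostports are the names
// used in this comment; the surrounding type and method are assumed.
type podManager struct {
	hostportsSynced bool // flipped to true after the first sync
	activeHostports bool // whether the previous sync had any hostport mappings
}

func (m *podManager) shouldSyncHostports(newActiveHostports bool) bool {
	// First sandbox request since the process started ("the first time a pod
	// is started"): sync to clear out stale mappings from a kubelet crash.
	if !m.hostportsSynced {
		return true
	}
	// Otherwise only bother when hostport mappings existed before this
	// request or will exist after it.
	return m.activeHostports || newActiveHostports
}
```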

@dcbw dcbw force-pushed the sdn-sync-hostports-less branch from b2eb393 to 2f58b6c on October 5, 2017 15:53
@danwinship
Contributor

It's not the "first time"/"startup" thing I was talking about, it was the other clause.

The commit message says "first time, or when we transition from 0 to > 0 mappings". The code seems to implement "first time, or whenever there are > 0 mappings". E.g., the commit message implies that if there is already a pod with a hostport, and we add a second pod with a hostport, then we wouldn't resync (because we're going from 1 mapping to 2), but the code as written would resync in this case.

And in fact, the code as written would resync in the case where we have a pod with a hostport, and add a second pod without a hostport. But I don't see why we should resync in that case, since the hostports aren't actually changing.

Or else I'm totally misreading the code...
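
To make the distinction concrete, the two behaviours can be written as two predicates; this is purely illustrative, with made-up names:

```go
// Illustrative only; these helpers and their parameter names are made up.

// What the commit message describes: resync on the first sync, or when the
// number of hostport mappings transitions between 0 and >0.
func syncPerCommitMessage(firstSync bool, oldCount, newCount int) bool {
	return firstSync || (oldCount > 0) != (newCount > 0)
}

// What the code as written does: resync on the first sync, or whenever any
// hostport mappings exist before or after the change. Going from 1 mapping
// to 2, or adding a hostport-less pod while one mapping exists, still
// triggers a resync even though the installed hostports do not change.
func syncPerCode(firstSync bool, oldCount, newCount int) bool {
	return firstSync || oldCount > 0 || newCount > 0
}
```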

Which is the first time a pod is started, when there will be active
hostports, or when there used to be active hostports. Otherwise the
syncer runs iptables-restore for no good reason.
@dcbw dcbw force-pushed the sdn-sync-hostports-less branch from 2f58b6c to b3fc39c on October 5, 2017 21:08
@dcbw
Contributor Author

dcbw commented Oct 5, 2017

Or else I'm totally misreading the code...

@danwinship I see what you mean; updated the code comments to match intent; no code changes. Essentially we want to never sync hostports unless there are hostports to sync, except that we do want to sync the first time a pod is started/killed on the node to ensure there aren't stale entries from a crash.

@dcbw
Contributor Author

dcbw commented Oct 5, 2017

@danwinship to your IRC question about why we bother syncing when the current pod doesn't have hostports, the answer is that if we start removing locking then runningPods could have changed and we'd want to get the latest port mappings. But that could also become irrelevant if we switched to the HostportManager instead of the HostportSyncer from upstream kube.

Except that, while that means less code for us, the HostportManager actually runs iptables-save, parses the output, adds/removes the current pod's mappings, and then restores. And of course iptables could have changed in the meantime, so...

@danwinship
Contributor

if we start removing locking then runningPods could have changed and we'd want to get the latest port mappings

What locking? If we removed the serialization in podManager then everything would completely fall apart; e.g., the SyncHostports() call for an earlier state could end up being called after the SyncHostports() call for a later state, reverting its changes.

And at any rate, we're not currently removing locking, so why not take advantage of that now, and if something changes later that requires us to SyncHostports more often, we can add back the extra syncing then?

@dcbw
Contributor Author

dcbw commented Oct 10, 2017

And at any rate, we're not currently removing locking, so why not take advantage of that now, and if something changes later that requires us to SyncHostports more often, we can add back the extra syncing then?

@danwinship that is true. However, SyncHostports() always operates on runningPods. So as long as that's kept up-to-date we won't run into the problem you describe. What I mean by removing locking is making it more granular, so that only runningPods is protected rather than the whole processCNIRequest() function. That would ensure that SyncHostports() is consistent with the current runningPods state.
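
A rough sketch of that more granular locking, assuming a lock that guards only runningPods (everything except runningPods, SyncHostports() and processCNIRequest() is invented for illustration):

```go
// Hypothetical sketch; only runningPods, SyncHostports() and
// processCNIRequest() are names from this thread. The lock, the types and
// their fields are assumed for illustration.
import "sync"

type runningPod struct {
	// per-pod state, including its hostport mappings
}

type podManager struct {
	runningPodsLock sync.Mutex
	runningPods     map[string]*runningPod
}

// snapshotRunningPods copies the current pod set while holding only the
// runningPods lock. SyncHostports() would then operate on this snapshot,
// keeping it consistent with runningPods even if the rest of
// processCNIRequest() were no longer serialized.
func (m *podManager) snapshotRunningPods() map[string]*runningPod {
	m.runningPodsLock.Lock()
	defer m.runningPodsLock.Unlock()

	snapshot := make(map[string]*runningPod, len(m.runningPods))
	for key, pod := range m.runningPods {
		snapshot[key] = pod
	}
	return snapshot
}
```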

@danwinship
Contributor

OK, dcbw is mostly not around for a while and there's a benefit to Online in committing at least the part that is already fixed here.
/lgtm
/test

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 10, 2017
@openshift-merge-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, dcbw

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@danwinship
Contributor

/test all

@danwinship
Contributor

(oh, ha ha, go and add a comment right before I claim that you're not around :-P)

@knobunc
Contributor

knobunc commented Oct 10, 2017

/retest

1 similar comment
@eparis
Member

eparis commented Oct 10, 2017

/retest

@openshift-merge-robot
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@openshift-ci-robot

openshift-ci-robot commented Oct 10, 2017

@dcbw: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/openshift-jenkins/origin/unit b3fc39c link /test origin-ut
ci/openshift-jenkins/origin/verify b3fc39c link /test origin-verify
ci/openshift-jenkins/extended_networking_minimal b3fc39c link /test extended_networking_minimal

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot
Contributor

Automatic merge from submit-queue (batch tested with PRs 16741, 16692).

@openshift-merge-robot openshift-merge-robot merged commit e610f2a into openshift:master Oct 10, 2017
@jupierce
Contributor

jupierce commented Oct 11, 2017

@danwinship Can/should this be backported to 3.6?

@danwinship
Contributor

It could be... but the problem it's trying to fix is not known to affect anyone except Online. (And we don't know for sure that it will fix things for them yet.)

knobunc pushed a commit to knobunc/origin that referenced this pull request Oct 11, 2017
Automatic merge from submit-queue (batch tested with PRs 16741, 16692).

sdn: only sync HostPorts when we need to

Which is the first time a pod is started, when there will be active hostports, or when there are currently active hostports. Otherwise the syncer runs iptables-restore for no good reason.

@openshift/networking @knobunc @danwinship
@knobunc knobunc deleted the sdn-sync-hostports-less branch October 18, 2017 12:27
openshift-merge-robot added a commit that referenced this pull request Oct 25, 2017
Automatic merge from submit-queue (batch tested with PRs 17032, 17027).

Fix the check for "pod has HostPorts"

#16692 didn't work because `kubehostport.PodPortMapping.PortMappings` includes *all* ports that a pod declares, not just hostports, so `shouldSyncHostports` will return true if any pod declares that it listens on any port (i.e., always).

This fixes that, which should make the code work in the way #16692 intended for it to work ("only call SyncHostPorts if there is at least one pod on the node that uses HostPorts"). It doesn't attempt to change it to "only call SyncHostPorts if the set of active HostPorts actually changed" because that would be larger and less suitable as a last-minute fix.

@dcbw FYI
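
The gist of that follow-up fix, sketched under the assumption that the helper is named podRequestsHostports (the PodPortMapping type and its PortMappings/HostPort fields are the upstream kubelet hostport types referenced above):

```go
// Sketch of the corrected check; the helper name is assumed, and the import
// path is the upstream kubelet hostport package as vendored at the time.
// A declared containerPort with no hostPort comes through with HostPort == 0,
// so a pod only needs a hostport sync if at least one mapping binds a host port.
import kubehostport "k8s.io/kubernetes/pkg/kubelet/network/hostport"

func podRequestsHostports(pod *kubehostport.PodPortMapping) bool {
	for _, pm := range pod.PortMappings {
		if pm.HostPort != 0 {
			return true
		}
	}
	return false
}
```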
sjenning pushed a commit to sjenning/origin that referenced this pull request Jan 5, 2018
Which is the first time a pod is started, when there will be active hostports, or when there are currently active hostports. Otherwise the syncer runs iptables-restore for no good reason.

Backport of openshift#16692 to 3.6 branch