Fix race condition in GetCurrentNS temporarily #1131

Merged (1 commit) on Aug 21, 2020

@tnqn (Member) commented Aug 21, 2020

In GetCurrentNS, if there is a context switch between
getCurrentThreadNetNSPath and GetNS, another goroutine may run on
the original OS thread and change that thread's network namespace.
The original goroutine would then get the updated network namespace,
which can lead to unexpected behavior, especially when GetCurrentNS
is used to get the host network namespace in netNS.Do.
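A minimal sketch of the racy two-step pattern, assuming Linux and the Go standard library only. `threadNetNSPath` and `racyCurrentNS` are hypothetical names for illustration; the CNI plugins library builds a similar `/proc/<pid>/task/<tid>/ns/net` path in `getCurrentThreadNetNSPath` and opens it in `GetNS`, but this is not its actual code. The point is that the path names one specific OS thread, while nothing keeps the goroutine on that thread between the two steps.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// threadNetNSPath (hypothetical) builds the path of a specific OS thread's
// network namespace. It identifies a thread by its ID, not a goroutine.
func threadNetNSPath(pid, tid int) string {
	return fmt.Sprintf("/proc/%d/task/%d/ns/net", pid, tid)
}

// racyCurrentNS (hypothetical) computes the path and then opens it without
// locking the goroutine to its OS thread. Between step 1 and step 2 the Go
// scheduler may move this goroutine to another thread, and a different
// goroutine may run on the original thread and switch its namespace (setns).
// The file opened in step 2 would then refer to that new namespace.
func racyCurrentNS() (*os.File, error) {
	path := threadNetNSPath(os.Getpid(), syscall.Gettid()) // step 1
	// Race window: runtime.LockOSThread is not held here.
	return os.Open(path) // step 2
}

func main() {
	f, err := racyCurrentNS()
	if err != nil {
		fmt.Println("open failed:", err)
		return
	}
	defer f.Close()
	fmt.Println("opened", f.Name())
}
```

Running it once will almost always appear to work; the failure only shows up under concurrency, when another goroutine changes the namespace of the thread this goroutine was just running on.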

This patch fixes it by getting the hostNS in advance with the OS
thread locked, instead of using the netns argument provided to the
callback, which might not be the real host network namespace.
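A minimal sketch of fetching the host namespace with the OS thread locked, under the same assumptions (Linux, standard library only). `currentThreadNetNS` is a hypothetical helper, not the code merged in this PR. Per the Go documentation, while a goroutine holds `runtime.LockOSThread`, it always executes on that thread and no other goroutine executes there, so the thread ID used to build the path and the thread whose namespace is opened cannot diverge.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"syscall"
)

// currentThreadNetNS (hypothetical) opens the network namespace of the
// calling goroutine's OS thread while the goroutine is locked to it.
// No other goroutine can run on (and setns) this thread until the
// deferred UnlockOSThread runs, which happens after os.Open returns.
func currentThreadNetNS() (*os.File, error) {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	path := fmt.Sprintf("/proc/%d/task/%d/ns/net", os.Getpid(), syscall.Gettid())
	return os.Open(path)
}

func main() {
	// Take the host namespace handle up front, before any code that
	// switches namespaces runs, and keep using this handle afterwards.
	hostNS, err := currentThreadNetNS()
	if err != nil {
		fmt.Println("failed to open host netns:", err)
		return
	}
	defer hostNS.Close()
	fmt.Println("host netns handle:", hostNS.Name())
}
```

In the workaround's terms, a handle obtained this way before entering the container namespace is what the callback treats as the host network namespace, rather than the `hostNS` argument it is given.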

Fixes #1113

@antrea-bot (Collaborator) commented:

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-whole-conformance: to trigger all conformance tests on Linux.
  • /skip-whole-conformance: to skip all conformance tests on Linux.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger Windows conformance tests.
  • /skip-windows-conformance: to skip Windows conformance tests.
  • /test-windows-networkpolicy: to trigger Windows networkpolicy tests.
  • /skip-windows-networkpolicy: to skip Windows networkpolicy tests.
  • /test-hw-offload: to trigger the OVS hardware offload test.
  • /skip-hw-offload: to skip the OVS hardware offload test.
  • /test-all: to trigger all tests (except whole conformance).
  • /skip-all: to skip all tests (except whole conformance).

@tnqn (Member, Author) commented Aug 21, 2020:

/test-all

@tnqn requested review from antoninbas and jianjuns on August 21, 2020, 12:44
@tnqn (Member, Author) commented Aug 21, 2020:

@antoninbas This should also fix the issue without replacing the module in go.mod. Please let me know if it makes sense to you.

@tnqn (Member, Author) commented Aug 21, 2020:

/test-all

@tnqn (Member, Author) commented Aug 21, 2020:

/test-windows-networkpolicy

@antoninbas (Contributor) left a comment:

LGTM

if err := ns.WithNetNSPath(containerNetNS, func(hostNS ns.NetNS) error {

// This is a workaround for issue #1113, which is caused by https://github.com/containernetworking/plugins/issues/524.
// Instead of using the provided netns argument, which might not be the real hostNS, it fixes it by getting the
Contributor (review comment):

"it fixes by getting" -> "this fix gets"

Contributor (reply):

@jianjuns I think "it fixes it" is correct here, even though I like your suggestion better. However, since this is your only comment and Quan may have started his weekend, I will merge this and proceed with the 0.9.1 release.

Contributor (reply):

No problem for me.

@jianjuns (Contributor) left a comment:

Just a suggestion for comments.

@antoninbas merged commit 115a4b0 into antrea-io:master on Aug 21, 2020
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request Aug 21, 2020
antoninbas pushed a commit that referenced this pull request Aug 21, 2020
tnqn added a commit to tnqn/antrea that referenced this pull request Aug 27, 2020
GraysonWu pushed a commit to GraysonWu/antrea that referenced this pull request Sep 22, 2020
Development: successfully merging this pull request may close these issues:

  • Some pods don't have L2 connectivity after all worker nodes are rebooted
5 participants