Tap server fails if kernel interfaces already exist #315

Closed
Bolodya1997 opened this issue Jul 16, 2021 · 4 comments

Bolodya1997 commented Jul 16, 2021

Expected Behavior

Tap chain element shouldn't fail if kernel interfaces already exist.

Current Behavior

The tap chain element starts failing when it has already created the kernel interfaces and a Request arrives again from the same NSMgr with a different Connection.Id.

Steps to Reproduce

  1. The Client Requests NSM.
  2. The Request succeeds in the Forwarder and the response starts returning.
  3. The Request fails with a timeout.
  4. The Client Requests NSM again with another Request (but with the same id in path segment 0).
  5. The tap server fails with VPPApiError: netlink error (-145).
  6. Steps 4-5 repeat for as long as the Client is running.

Failure Logs

NSMgr logs
VPP Forwarder logs

@Bolodya1997
Author

This is actually related to networkservicemesh/sdk#1020, but it can probably be solved in some other way on the tap chain element side.

@edwarnicke
Member

@Bolodya1997 is this resolved by networkservicemesh/sdk#1014 ?

@Bolodya1997
Author

@edwarnicke
It looks like there is the following issue:

  1. The Client performs a Request, which reaches the Forwarder as id-1. The Forwarder creates a tap interface with the id-client name and responds with it.
  2. The Request times out before the response reaches the Client. At this point no one will call Close on the Forwarder (Resources leak until timeout if response fails to return to the Client sdk#1020).
  3. The Client performs another Request, which reaches the Forwarder as id-2. The Forwarder tries to create a tap interface with the id-client name and fails because an interface with that name already exists.
  4. Step 3 repeats on every subsequent Request until the Forwarder cleans up the tap interface on timeout.

So networkservicemesh/sdk#1014 doesn't solve this issue.

Actually, the problem here is that both id-1 and id-2 request the same tap interface. Normally this shouldn't happen, because:

  1. If the Client restarts, it fetches the old path, so the Request reaches the Forwarder as id-1.
  2. If the Endpoint dies, id-1 gets Closed during healing, so id-2 is OK.
  3. If the NSMgr restarts, the Client restores the Connection with the old path, so the Request reaches the Forwarder as id-1.

So the problem is the following: even if we reuse the existing id-client interface for id-2, the timeout for id-1 will still fire and delete this interface. So we either need to somehow close id-1 without waiting for the timeout, or create some refcount(?) for the tap interface.
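
A rough sketch of what the refcount option could look like, in Go. Everything here is hypothetical (`refcountedTaps`, `acquire`, `release`, and the `tap-client` name are not from sdk-vpp): the registry tracks which connection ids use each interface name, tells the first user to create the interface, and tells the last one to delete it.

```go
package main

import (
	"fmt"
	"sync"
)

// refcountedTaps is a hypothetical registry mapping an interface name to the
// set of connection ids that currently use it.
type refcountedTaps struct {
	mu    sync.Mutex
	users map[string]map[string]struct{}
}

func newRefcountedTaps() *refcountedTaps {
	return &refcountedTaps{users: make(map[string]map[string]struct{})}
}

// acquire registers connID as a user of name. It returns true only for the
// first user, who is then responsible for creating the interface.
func (r *refcountedTaps) acquire(name, connID string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	ids, exists := r.users[name]
	if !exists {
		ids = make(map[string]struct{})
		r.users[name] = ids
	}
	ids[connID] = struct{}{} // idempotent for refresh Requests
	return !exists
}

// release drops connID from name. It returns true only when the last user is
// gone, meaning the caller should delete the interface.
func (r *refcountedTaps) release(name, connID string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	ids, exists := r.users[name]
	if !exists {
		return false
	}
	if _, used := ids[connID]; !used {
		return false
	}
	delete(ids, connID)
	if len(ids) > 0 {
		return false
	}
	delete(r.users, name)
	return true
}

func main() {
	taps := newRefcountedTaps()
	fmt.Println(taps.acquire("tap-client", "id-1")) // true: create the interface
	fmt.Println(taps.acquire("tap-client", "id-2")) // false: reuse it
	fmt.Println(taps.release("tap-client", "id-1")) // false: id-2 still uses it
	fmt.Println(taps.release("tap-client", "id-2")) // true: delete the interface
}
```

With this shape, the timeout for id-1 would only drop its reference, so the interface reused by id-2 would survive. Tracking a set of ids rather than a bare counter keeps repeated Requests from the same connection from inflating the count.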

Thoughts?

@glazychev-art
Contributor

We don't see this problem anymore
