no such device from LinkSetNsPid for veth #184

ghost · 2016-12-19T18:12:24Z

I create a veth interface for my custom network configuration, and sometimes I see "no such device" errors when I call netlink.LinkSetNsPid on the newly created interface.

The interesting thing is that I was able to obtain the link handle via LinkByName or LinkList before I called LinkSetNsPid, so the device was created successfully. I can also see the created devices with ip link.

I started debugging and added a LinkList call right after the error, plus another one after a 1 second delay. What I see consistently (whenever the issue occurs) is pretty weird: the first output usually contains only a few records with incorrect device index values, while the second one has all the records I have configured on the host, with correct indexes.

1st output (just after the failure); the LinkList result has size 3:

time="2016-12-19T10:43:40Z" level=warning msg="=== debug interfaces, 1st attempt ==="
time="2016-12-19T10:43:40Z" level=debug msg="=== current interfaces list (count 3) ==="
time="2016-12-19T10:43:40Z" level=debug msg="interface record -> name: lo idx: 1"
time="2016-12-19T10:43:40Z" level=debug msg="interface record -> name: vethbada13B idx: 2"
time="2016-12-19T10:43:40Z" level=debug msg="interface record -> name: vethbada13A idx: 3"
time="2016-12-19T10:43:40Z" level=debug msg="=== end current interface list ==="

2nd output (after 1 second); the LinkList result has size 23 (expected):

time="2016-12-19T10:43:41Z" level=warning msg="=== debug interfaces, 2nd attempt ==="
time="2016-12-19T10:43:41Z" level=debug msg="=== current interfaces list (count 23) ==="
time="2016-12-19T10:43:41Z" level=debug msg="interface record -> name: lo idx: 1"
time="2016-12-19T10:43:41Z" level=debug msg="interface record -> name: eth0 idx: 2"
time="2016-12-19T10:43:41Z" level=debug msg="interface record -> name: docker0 idx: 3"
time="2016-12-19T10:43:41Z" level=debug msg="interface record -> name: eth1 idx: 4"
time="2016-12-19T10:43:41Z" level=debug msg="interface record -> name: vetha434e0B idx: 1036"
time="2016-12-19T10:43:41Z" level=debug msg="interface record -> name: vetha434e0A idx: 1037"

We start a lot of containers, and this issue happens in only about 0.5-1% of launches (100 crashes out of 20000 tasks last night). We run on AWS and our instances are recycled every few days, but even so, the issue tends to recur on the same subset of hosts where it has already happened once (3-5 hosts out of 200).

@vishvananda I'm not sure whether it's a Go netlink bug or some kernel issue. If you have any recommendations on what to look at next, that would be super helpful. I can add more debug output to netlink as well.

Our configuration is:

Linux mainvpc-r3.8xlarge-i-00e3fe8d965d0c577 3.19.0-26-generic #28~14.04.1-Ubuntu SMP Wed Aug 12 14:09:17 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux


vishvananda · 2017-01-05T21:16:09Z

When we setns, it applies only to the current thread. It looks like your code is sometimes running on a thread in a different namespace, giving you different results. First, make sure you are using LockOSThread for the duration of the namespace get/set. The other possibility is that the runtime is starting a new OS thread at some point (the lock doesn't prevent that), which will be in an incorrect namespace. If that is the issue, it appears that the only safe way to do it is to exec a new process to do the work you need: lock the thread, do the work, then exit. See the discussion here: vishvananda/netns#17

ghost · 2017-01-05T22:53:25Z

Thanks. We did exactly what you recommended (separate process).
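For readers looking for the shape of the separate-process approach, here is a rough sketch. It is not the code actually used in this thread: the helper argument, the function names, and the interface name are all hypothetical, and the actual netlink call is replaced by a placeholder print.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"strconv"
)

// helperArg marks a re-executed child process (hypothetical name).
const helperArg = "netns-helper"

// helperCmd builds the command that re-executes the binary at path self in
// helper mode, passing the target pid and interface name as arguments.
func helperCmd(self string, pid int, ifname string) *exec.Cmd {
	cmd := exec.Command(self, helperArg, strconv.Itoa(pid), ifname)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

// runHelper is the child side: lock the thread, do the work, then exit.
func runHelper(args []string) error {
	runtime.LockOSThread()
	if len(args) != 2 {
		return fmt.Errorf("expected <pid> <ifname>, got %d args", len(args))
	}
	pid, err := strconv.Atoi(args[0])
	if err != nil {
		return fmt.Errorf("bad pid %q: %w", args[0], err)
	}
	// Placeholder: the real helper would do the namespace and link work here.
	fmt.Printf("would move %s into the namespace of pid %d\n", args[1], pid)
	return nil
}

func main() {
	// Child path: started by the parent with helperArg as the first argument.
	if len(os.Args) > 1 && os.Args[1] == helperArg {
		if err := runHelper(os.Args[2:]); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		return
	}
	// Parent path: re-exec ourselves so the work runs in a fresh process.
	self, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// "veth-example" and the parent's own pid are illustrative values only.
	if err := helperCmd(self, os.Getpid(), "veth-example").Run(); err != nil {
		fmt.Fprintln(os.Stderr, "helper failed:", err)
		os.Exit(1)
	}
}
```

The point of the design is that the child is short-lived and does nothing except the namespace work, so a thread left in an unexpected namespace cannot affect the long-running parent.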

ghost closed this as completed Jan 5, 2017
