Overlay driver appears to be leaking veth devices #984

Closed · thomasem opened this issue Mar 2, 2016 · 6 comments · Fixed by #995

thomasem commented Mar 2, 2016

Using Docker Swarm v1.1.0, Docker 1.10.1

From https://github.com/getcarina/feedback/issues/40:

⚠️ NOTE: You must create more than one container on the overlay
network to see this problem.

Steps to recreate:

  1. Start with a fresh one-segment cluster and get a baseline for already
    existing devices on the segment:

    $ docker run -it --rm --net=host cirros sh -c 'ip link'
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop
        link/ipip 0.0.0.0 brd 0.0.0.0
    3: ip_vti0@NONE: <NOARP> mtu 1428 qdisc noop
        link/ipip 0.0.0.0 brd 0.0.0.0
    4: sit0@NONE: <NOARP> mtu 1480 qdisc noop
        link/sit 0.0.0.0 brd 0.0.0.0
    5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
        link/ether 02:42:8a:78:65:5e brd ff:ff:ff:ff:ff:ff
    167: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether bc:76:4e:20:ce:6b brd ff:ff:ff:ff:ff:ff
    169: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether bc:76:4e:20:0e:ad brd ff:ff:ff:ff:ff:ff
    
  2. Create a new overlay network:

    $ docker network create --driver overlay test-veth
    7f06c598a5a3100f3731a1254fe013515b5462f87914a19aaef91226436996fa
    
  3. Spin up two new containers on this network:

    $ docker run -d --net=test-veth cirros sh -c 'while true; do echo "hello" && sleep 300; done'
    9913d13615c21a5a38fb317d2a03832ea502a58a8666f5179ee369da69f8214c
    
    $ docker run -d --net=test-veth cirros sh -c 'while true; do echo "hello" && sleep 300; done'
    cfb78dc596da1f6a4735de4831a1b43501ab2fd1a44fa55a5b867a82e749f560
    
    $ docker ps
    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
    cfb78dc596da        cirros              "sh -c 'while true; d"   1 seconds ago       Up 1 seconds                            6e1ac9f2-f779-42ea-9821-cd7b32bb6b7c-n1/sick_mestorf
    9913d13615c2        cirros              "sh -c 'while true; d"   5 seconds ago       Up 4 seconds                            6e1ac9f2-f779-42ea-9821-cd7b32bb6b7c-n1/boring_hugle
    1b11bf2f171f        carina/consul       "/bin/consul agent -b"   6 minutes ago       Up 6 minutes                            6e1ac9f2-f779-42ea-9821-cd7b32bb6b7c-n1/carina-svcd
    
  4. Check the segment's network namespace to see the newly created devices,
    docker_gwbridge, vethb5edef0, and veth79df616:

    $ docker run -it --rm --net=host cirros sh -c 'ip link'
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop
        link/ipip 0.0.0.0 brd 0.0.0.0
    3: ip_vti0@NONE: <NOARP> mtu 1428 qdisc noop
        link/ipip 0.0.0.0 brd 0.0.0.0
    4: sit0@NONE: <NOARP> mtu 1480 qdisc noop
        link/sit 0.0.0.0 brd 0.0.0.0
    5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
        link/ether 02:42:8a:78:65:5e brd ff:ff:ff:ff:ff:ff
    20: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
        link/ether 02:42:4a:2d:fe:8c brd ff:ff:ff:ff:ff:ff
    22: veth79df616: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge
        link/ether 46:e9:29:dc:37:ba brd ff:ff:ff:ff:ff:ff
    26: vethb5edef0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge
        link/ether c2:29:be:52:3b:78 brd ff:ff:ff:ff:ff:ff
    167: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether bc:76:4e:20:ce:6b brd ff:ff:ff:ff:ff:ff
    169: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether bc:76:4e:20:0e:ad brd ff:ff:ff:ff:ff:ff
    
  5. Delete one of the containers that was created and attached to the
    test-veth network:

    $ docker rm -fv 9913d
    9913d
    
  6. Check the segment's network namespace again. The list still has the same
    number of veth devices as before, when it should have one fewer: veth79df616
    is gone, but a new device, veth1d6abe9, has appeared in its place:

    $ docker run -it --rm --net=host cirros sh -c 'ip link'
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop
        link/ipip 0.0.0.0 brd 0.0.0.0
    3: ip_vti0@NONE: <NOARP> mtu 1428 qdisc noop
        link/ipip 0.0.0.0 brd 0.0.0.0
    4: sit0@NONE: <NOARP> mtu 1480 qdisc noop
        link/sit 0.0.0.0 brd 0.0.0.0
    5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
        link/ether 02:42:8a:78:65:5e brd ff:ff:ff:ff:ff:ff
    18: veth1d6abe9: <BROADCAST,MULTICAST> mtu 1450 qdisc noop
        link/ether 02:42:0a:00:00:02 brd ff:ff:ff:ff:ff:ff
    20: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
        link/ether 02:42:4a:2d:fe:8c brd ff:ff:ff:ff:ff:ff
    26: vethb5edef0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge
        link/ether c2:29:be:52:3b:78 brd ff:ff:ff:ff:ff:ff
    167: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether bc:76:4e:20:ce:6b brd ff:ff:ff:ff:ff:ff
    169: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether bc:76:4e:20:0e:ad brd ff:ff:ff:ff:ff:ff
    
  7. Delete the last container that was spun up on the test-veth overlay
    network:

    $ docker rm -fv cfb78
    cfb78
    
  8. Check the segment's network namespace again and observe the same behavior,
    except this time vethb5edef0 is replaced by veth0ad699c:

    $ docker run -it --rm --net=host cirros sh -c 'ip link'
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop
        link/ipip 0.0.0.0 brd 0.0.0.0
    3: ip_vti0@NONE: <NOARP> mtu 1428 qdisc noop
        link/ipip 0.0.0.0 brd 0.0.0.0
    4: sit0@NONE: <NOARP> mtu 1480 qdisc noop
        link/sit 0.0.0.0 brd 0.0.0.0
    5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
        link/ether 02:42:8a:78:65:5e brd ff:ff:ff:ff:ff:ff
    18: veth1d6abe9: <BROADCAST,MULTICAST> mtu 1450 qdisc noop
        link/ether 02:42:0a:00:00:02 brd ff:ff:ff:ff:ff:ff
    19: veth0ad699c: <BROADCAST,MULTICAST> mtu 1450 qdisc noop
        link/ether ce:d4:cd:f3:92:86 brd ff:ff:ff:ff:ff:ff
    20: docker_gwbridge: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
        link/ether 02:42:4a:2d:fe:8c brd ff:ff:ff:ff:ff:ff
    167: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether bc:76:4e:20:ce:6b brd ff:ff:ff:ff:ff:ff
    169: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether bc:76:4e:20:0e:ad brd ff:ff:ff:ff:ff:ff
    
  9. Double-check that all containers created on that network were deleted:

    $ docker ps
    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
    1b11bf2f171f        carina/consul       "/bin/consul agent -b"   16 minutes ago      Up 16 minutes                           6e1ac9f2-f779-42ea-9821-cd7b32bb6b7c-n1/carina-svcd
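
For anyone who wants to check for the leak without eyeballing the full ip link
output, here is a rough sketch that automates the cycle above (it assumes the
cirros image and the test-veth overlay network from the steps already exist):

    #!/bin/sh
    # Count veth devices visible in the host's network namespace.
    count_veths() {
        docker run --rm --net=host cirros sh -c 'ip -o link' | grep -c veth
    }

    before=$(count_veths)

    # Create two containers on the overlay network, then remove them both.
    c1=$(docker run -d --net=test-veth cirros sleep 300)
    c2=$(docker run -d --net=test-veth cirros sleep 300)
    docker rm -fv "$c1" "$c2" > /dev/null

    sleep 5    # give the daemon a moment to tear the interfaces down
    after=$(count_veths)
    echo "veth devices before: $before, after: $after"
    # On an affected host, "after" ends up higher than "before".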
    
thomasem commented Mar 2, 2016

I suspect this is in the overlay driver, since this isn't the case when using the default bridge driver.
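
A quick control along those lines (just a sketch, reusing the cirros image from
above) is to run the identical create/delete cycle on a user-defined bridge
network and confirm the veth count drops back to its baseline:

    # Same cycle on a local bridge network instead of the overlay.
    docker network create --driver bridge test-bridge
    cid=$(docker run -d --net=test-bridge cirros sleep 300)
    docker rm -fv "$cid"

    # The container's veth pair should be gone again here, whereas with
    # the overlay driver a device lingers in DOWN state:
    ip -o link | grep veth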

dvenza commented Mar 6, 2016

I see this issue too and had to reboot all Swarm hosts to get rid of the 300+ veth devices per host that accumulated and were driving the monitoring system crazy. Now they are building up again.
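
If it helps to quantify the buildup for monitoring, a one-liner along these
lines (a sketch, not official tooling) reports how many leaked-looking veth
devices a host currently has:

    # Count veth devices stuck in DOWN state; on an affected host this
    # number grows each time a container on an overlay network is removed.
    ip -o link show | grep veth | grep -c 'state DOWN'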

blufor commented Mar 7, 2016

You can run this as a quick fix:

    ip -4 -o l l | awk '/veth.*state DOWN/ { sub(/@.*$/,"",$2); print $2 }' | xargs -L1 ip l del

It cleans up all downed veth interfaces on your Docker host.
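
One caution worth adding (my assumption, not something verified against the
driver code): a veth can be legitimately DOWN for a moment while a container is
starting, so it may be safer to print the candidates first and delete only once
you are sure nothing is mid-startup:

    # Dry run: print the DOWN veth devices the one-liner above would delete.
    ip -4 -o l l | awk '/veth.*state DOWN/ { sub(/@.*$/,"",$2); print $2 }'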


aboch commented Mar 7, 2016

@thomasem
If you reproduce with daemon logs enabled, can you confirm you see the following:

DEBU[0753] Failed to retrieve interface (vethxxxxxxx)'s link on endpoint (<eid>) delete: Link not found
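
For anyone else reproducing this: one way to get debug-level daemon logs on a
1.10-era engine (a sketch; how the daemon is launched varies by distro) is:

    # Run the daemon in the foreground with debug logging
    # (1.10-era syntax; later releases use `dockerd -D`):
    docker daemon -D

    # Or, on systemd hosts, leave the service running and read its logs:
    journalctl -u docker | grep "Failed to retrieve interface"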


mavenugo commented Mar 7, 2016

@thomasem also, can you paste the docker info & docker version output (on both Swarm and the engine)?
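
For reference, one way to collect both sets of output, assuming the Swarm
manager and the engine listen on TCP (the hostnames and ports below are
placeholders for your actual endpoints):

    # Against the Swarm manager:
    docker -H tcp://swarm-manager:3376 version
    docker -H tcp://swarm-manager:3376 info

    # Against an individual engine:
    docker -H tcp://node-01:2375 version
    docker -H tcp://node-01:2375 info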


thomasem commented Mar 8, 2016

Awesome. Thanks @aboch!
