Docker swarm problems with Mac OS X as nodes #38618

Closed
user121216 opened this issue Jan 22, 2019 · 6 comments
Labels
area/networking, kind/bug, platform/desktop

Comments

@user121216

Description

With a CentOS host as manager, only one Mac OS X (10.13) node can be joined to the swarm at a time; as soon as a second Mac joins, the Docker engine on the other Mac crashes (stack trace below).

Steps to reproduce the issue:

  1. Init the swarm on CentOS: docker swarm init --advertise-addr 192.168.5.10:2377
  2. Copy & paste the token on one Mac OS X machine: docker swarm join --token #### 192.168.5.10:2377
  3. Copy & paste the token on another Mac OS X machine: docker swarm join --token #### 192.168.5.10:2377
  4. After the second Mac joins, the Docker engine on the other Mac OS X machine crashes

Describe the results you received:

2019-01-22T13:08:24Z docker github.com/docker/docker/vendor/github.com/docker/libnetwork/networkdb.(*NetworkDB).rejoinClusterBootStrap(0xc421143320)
2019-01-22T13:08:24Z docker 	/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/networkdb/cluster.go:305 +0x2d0
2019-01-22T13:08:24Z docker github.com/docker/docker/vendor/github.com/docker/libnetwork/networkdb.(*NetworkDB).(github.com/docker/docker/vendor/github.com/docker/libnetwork/networkdb.rejoinClusterBootStrap)-fm()
2019-01-22T13:08:24Z docker 	/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/networkdb/cluster.go:175 +0x2c
2019-01-22T13:08:24Z docker github.com/docker/docker/vendor/github.com/docker/libnetwork/networkdb.(*NetworkDB).triggerFunc(0xc421143320, 0xdf8475800, 0xc4211f9320, 0xc420c38230)
2019-01-22T13:08:24Z docker 	/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/networkdb/cluster.go:256 +0x134
2019-01-22T13:08:24Z docker created by github.com/docker/docker/vendor/github.com/docker/libnetwork/networkdb.(*NetworkDB).clusterInit
2019-01-22T13:08:24Z docker 	/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/networkdb/cluster.go:178 +0x8e9
2019-01-22T13:08:24Z docker + failed_to_start
2019-01-22T13:08:24Z docker + tail /var/log/docker.log

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Server

Client:
 Version:	18.03.0-ce
 API version:	1.37
 Go version:	go1.9.4
 Git commit:	0520e24
 Built:	Wed Mar 21 23:09:15 2018
 OS/Arch:	linux/amd64
 Experimental:	false
 Orchestrator:	swarm

Server:
 Engine:
  Version:	18.03.0-ce
  API version:	1.37 (minimum version 1.12)
  Go version:	go1.9.4
  Git commit:	0520e24
  Built:	Wed Mar 21 23:13:03 2018
  OS/Arch:	linux/amd64
  Experimental:	false

Mac OS X (other Mac OS X similar)

Client: Docker Engine - Community
 Version:           18.09.1
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        4c52b90
 Built:             Wed Jan  9 19:33:12 2019
 OS/Arch:           darwin/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.1
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       4c52b90
  Built:            Wed Jan  9 19:41:49 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Server

Containers: 3
 Running: 3
 Paused: 0
 Stopped: 0
Images: 3
Server Version: 18.03.0-ce
Storage Driver: devicemapper
 Pool Name: docker-thinpool
 Pool Blocksize: 524.3kB
 Base Device Size: 10.74GB
 Backing Filesystem: xfs
 Udev Sync Supported: true
 Data Space Used: 843.1MB
 Data Space Total: 20.63GB
 Data Space Available: 19.79GB
 Metadata Space Used: 262.1kB
 Metadata Space Total: 213.9MB
 Metadata Space Available: 213.6MB
 Thin Pool Minimum Free Space: 2.063GB
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.140-RHEL7 (2017-05-03)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: i8e27383jrjxzc177ajyb75dv
 Is Manager: true
 ClusterID: 6xocuwj7cq44msuesxgy1z1hg
 Managers: 1
 Nodes: 7
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 192.168.5.10
 Manager Addresses:
  192.168.5.10:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfd04396dc68220d1cecbe686a6cc3aa5ce3667c
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.21.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.859GiB
Name: mac-os-swarm
ID: WD2E:56Q3:BOH6:FYL2:KVLY:RSY5:GRRL:XB5O:N3R2:Y2AA:K6NS:QSXM
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Mac OS X (other Mac OS X similar)

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 18.09.1
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 96ec2177ae841256168fcf76954f7177af9446eb
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.125-linuxkit
Operating System: Docker for Mac
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 9.743GiB
Name: linuxkit-025000000001
ID: HQI4:UTCZ:2RMS:IPQE:JEAQ:WAWM:E6KL:H2JM:RXJ5:QCLC:35JV:7464
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 24
 Goroutines: 54
 System Time: 2019-01-22T18:46:08.717538097Z
 EventsListeners: 2
HTTP Proxy: gateway.docker.internal:3128
HTTPS Proxy: gateway.docker.internal:3129
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

Additional environment details (AWS, VirtualBox, physical, etc.):

The Docker Name (linuxkit-025000000001) is the same on all Mac OS X installations. Perhaps identical names do not work well together?

For testing I added two CentOS nodes, and these work fine (their Name also matches the real hostname).

@selansen
Contributor

I will take a look at the code with the stack trace.
In the meantime, can you test with the same Docker version on both sides (the Mac as well as the server nodes)?

@thaJeztah added the kind/bug and area/networking labels and removed the area/swarm label on Jan 23, 2019
@user121216
Author

I updated to the latest docker-ce version on the server. No change.

@thaJeztah
Member

Thanks for reporting! We were discussing this issue internally; the panic/crash happens here: https://github.com/docker/libnetwork/blob/b10559f6c05e73cc34b7221784ef09f6cf2c9a7a/networkdb/cluster.go#L291

This could happen if node.ID is not yet present in the list (possibly a race condition), or if the node ID is empty (which may be an issue in Docker for Mac). The daemon should print a message "New memberlist node ..." in the logs, which contains the NodeID: https://github.com/docker/libnetwork/blob/b10559f6c05e73cc34b7221784ef09f6cf2c9a7a/networkdb/networkdb.go#L261

@user121216 could you check if you can find that message in the daemon logs?

A pull request (moby/libnetwork#2325) was opened to prevent the daemon from crashing in that situation and to retry at the next opportunity, but it would be good to know what leads to that situation.
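
To illustrate the failure mode described above, here is a minimal Go sketch of a lookup that would panic when the local node's entry is missing, together with a guard that skips the round so a later attempt can succeed. This is not the libnetwork code and not the patch from the pull request: the types membership and node, the function bootstrapPeers, the error value, and the addresses and ports are all hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical stand-in for a cluster member record.
type node struct {
	ID   string
	Addr string
}

// membership is a hypothetical stand-in for the table of known nodes,
// keyed by node ID.
type membership struct {
	selfID string
	nodes  map[string]*node
}

// errSelfNotRegistered signals that the local node cannot be found yet.
var errSelfNotRegistered = errors.New("local node not yet in member list")

// bootstrapPeers returns the bootstrap addresses other than our own.
// An unguarded version would read m.nodes[m.selfID].Addr directly and
// panic with a nil pointer dereference if the entry is missing, which
// is the case both for an unknown ID and for an empty ID.
func (m *membership) bootstrapPeers(bootstrap []string) ([]string, error) {
	self, ok := m.nodes[m.selfID]
	if !ok || self == nil {
		// Report the condition instead of crashing; the caller can retry.
		return nil, errSelfNotRegistered
	}
	var peers []string
	for _, addr := range bootstrap {
		if addr != self.Addr {
			peers = append(peers, addr)
		}
	}
	return peers, nil
}

func main() {
	// Assumed starting state: empty node ID and empty member list.
	m := &membership{selfID: "", nodes: map[string]*node{}}
	bootstrap := []string{"192.168.5.10:7946"}

	// First round: the local entry is missing, so the rejoin is skipped.
	if _, err := m.bootstrapPeers(bootstrap); err != nil {
		fmt.Println("skipping rejoin this round:", err)
	}

	// Later round: the local node has been registered in the meantime.
	m.selfID = "node-a"
	m.nodes["node-a"] = &node{ID: "node-a", Addr: "192.168.5.21:7946"}
	peers, err := m.bootstrapPeers(bootstrap)
	fmt.Println("peers:", peers, "err:", err)
}
```

Returning an error rather than dereferencing the missing entry keeps the daemon alive, but it only hides the symptom: why the entry is missing or the ID is empty would still need to be found in the logs.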

@user121216
Author

Hi,
the problem is still there for 18.09.2.

Where can I find the daemon logs on Mac OS X?
~/Library/Containers/com.docker.docker/Data/vms/0/log is empty.

@thaJeztah
Member

The fix for the panic was included in 18.09.4; see docker-archive#169

@user121216
Author

It is fixed in the current Mac OS X edge release (which contains Docker 19.03.0-beta3).
Thanks!

4 participants