Collect Docker Swarm metrics in docker input plugin #3141
Can you add unit tests before I review?
closes #3125
@danielnelson I have added the unit test cases. Could you please review this?
Will do, but it might take me a bit to get to it.
@danielnelson Sure. Thanks
@danielnelson Any update?
@adityacs Give me a couple of weeks, as I'm trying to focus on bugs right now for the 1.4 release.
@danielnelson Sure.
@danielnelson Could you please review this?
plugins/inputs/docker/docker.go
fields["swarm_service_mode"] = "global"
fields["swarm_tasks_running"] = running[service.ID]
fields["swarm_tasks_desired"] = tasksNoShutdown[service.ID]
}
In case `Replicated.Replicas` is nil or another mode is added, we should have an `else` branch that continues to the next service and perhaps logs (depending on whether `Replicas` being nil is an error or not).
Handled this with a `log` call.
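A minimal sketch of the fallback branch discussed above, assuming simplified local stand-ins for the Swarm service-mode types (`ServiceMode`, `ReplicatedService`, `GlobalService` here mirror the shape of the Docker API types but are defined locally, and `serviceMode` is a hypothetical helper, not code from this PR). Anything that is neither a replicated service with a replica count nor a global service is logged and skipped rather than emitted as a partial metric.

```go
package main

import (
	"fmt"
	"log"
)

// Simplified stand-ins for the Swarm service-mode types (assumption: the
// real plugin uses the Docker client library's swarm types instead).
type ReplicatedService struct {
	Replicas *uint64
}

type GlobalService struct{}

type ServiceMode struct {
	Replicated *ReplicatedService
	Global     *GlobalService
}

// serviceMode reports the mode name, or ok=false (after logging) when the
// mode is unknown or a replicated service has no replica count.
func serviceMode(serviceName string, mode ServiceMode) (name string, ok bool) {
	if mode.Replicated != nil && mode.Replicated.Replicas != nil {
		return "replicated", true
	}
	if mode.Global != nil {
		return "global", true
	}
	log.Printf("unknown or incomplete mode for service %q, skipping", serviceName)
	return "", false
}

func main() {
	replicas := uint64(3)
	modes := []ServiceMode{
		{Replicated: &ReplicatedService{Replicas: &replicas}},
		{Global: &GlobalService{}},
		{Replicated: &ReplicatedService{}}, // Replicas is nil
	}
	for i, m := range modes {
		name, ok := serviceMode(fmt.Sprintf("svc%d", i), m)
		if !ok {
			continue // skip the service instead of emitting a partial metric
		}
		fmt.Println(name)
	}
}
```

Whether the nil-`Replicas` case deserves a warning or just a quiet skip is the judgment call the reviewer raises; the sketch logs both cases the same way for simplicity.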
plugins/inputs/docker/docker.go
tags := map[string]string{}
fields := make(map[string]interface{})
now := time.Now()
tags["swarm_service_id"] = service.ID
We should consider not including this, since it looks to be a random identifier string, which can cause high cardinality depending on how quickly it changes.
Each service has a unique ID. Since services don't change as frequently as containers do, I think we can still keep this.
plugins/inputs/docker/docker.go
tasksNoShutdown[task.ServiceID]++
}

if _, nodeActive := activeNodes[task.NodeID]; nodeActive && task.Status.State == swarm.TaskStateRunning {
I'm new to Docker Swarm, but why do you check if the node is not in the shutdown state, instead of just recording the running status? It seems like almost always the task will not be running if the node is down.
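To make the question concrete, here is a minimal sketch of the counting loop under discussion, assuming a hypothetical flattened `Task` record with plain string states (the real code iterates over the Swarm API's task list and compares against `swarm.TaskState` constants). It counts tasks not slated for shutdown and tasks observed running on an active node, so the effect of the extra node check is visible in isolation.

```go
package main

import "fmt"

// Task is a hypothetical, simplified task record; the real code uses the
// Swarm API's task type.
type Task struct {
	ServiceID    string
	NodeID       string
	DesiredState string // e.g. "running" or "shutdown"
	State        string // observed state, e.g. "running" or "failed"
}

// countTasks returns, per service ID, the number of tasks running on an
// active node and the number of tasks whose desired state is not "shutdown".
func countTasks(tasks []Task, activeNodes map[string]struct{}) (running, notShutdown map[string]int) {
	running = make(map[string]int)
	notShutdown = make(map[string]int)
	for _, t := range tasks {
		if t.DesiredState != "shutdown" {
			notShutdown[t.ServiceID]++
		}
		// A running task on a node that is not in activeNodes is not counted.
		if _, ok := activeNodes[t.NodeID]; ok && t.State == "running" {
			running[t.ServiceID]++
		}
	}
	return running, notShutdown
}

func main() {
	active := map[string]struct{}{"node1": {}, "node2": {}}
	tasks := []Task{
		{"web", "node1", "running", "running"},
		{"web", "node2", "running", "failed"},
		{"web", "node3", "running", "running"}, // node3 is not active
		{"web", "node1", "shutdown", "complete"},
	}
	running, desired := countTasks(tasks, active)
	fmt.Println(running["web"], desired["web"]) // 1 3
}
```

The only case where the node check changes the result is the third task: its last reported state is "running" but its node is not active, which is the situation the reviewer suspects is rare in practice.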
plugins/inputs/docker/docker.go
tags["swarm_service_id"] = service.ID
tags["swarm_service_name"] = service.Spec.Name
if service.Spec.Mode.Replicated != nil && service.Spec.Mode.Replicated.Replicas != nil {
fields["swarm_service_mode"] = "replicated"
I think `swarm_service_mode` should be a tag.
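A minimal sketch of what moving the mode into the tag set looks like, keeping this PR's `swarm_` names; `buildServiceMetric` is a hypothetical helper, not code from the PR. In InfluxDB, tags are indexed strings suited to grouping and filtering, while fields carry the measured values, so a low-cardinality label such as the mode fits naturally as a tag and the task counts stay as fields.

```go
package main

import "fmt"

// buildServiceMetric (hypothetical) puts the service mode in the tag set so
// it can be used for grouping and filtering; the task counts remain fields.
func buildServiceMetric(id, name, mode string, running, desired int) (map[string]string, map[string]interface{}) {
	tags := map[string]string{
		"swarm_service_id":   id,
		"swarm_service_name": name,
		"swarm_service_mode": mode,
	}
	fields := map[string]interface{}{
		"swarm_tasks_running": running,
		"swarm_tasks_desired": desired,
	}
	return tags, fields
}

func main() {
	tags, fields := buildServiceMetric("abc123", "web", "replicated", 2, 3)
	fmt.Println(tags["swarm_service_mode"], fields["swarm_tasks_running"]) // replicated 2
}
```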
plugins/inputs/docker/docker.go
} else if service.Spec.Mode.Global != nil {
fields["swarm_service_mode"] = "global"
fields["swarm_tasks_running"] = running[service.ID]
fields["swarm_tasks_desired"] = tasksNoShutdown[service.ID]
I don't understand why the number of non-shutdown tasks is the desired number of tasks. Shouldn't this equal the number of nodes, since a global service runs on every node?
There is a chance that the task is not running on one or a few of the nodes. When the mode is "global", Swarm tries to deploy containers (tasks, in Swarm service terms) on all nodes. However, on any given node the container might not start, for reasons such as the registry not being accessible from that node or the /var/lib/docker/images directory being corrupted.
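Following the author's reasoning, a minimal sketch of how desired and running counts for a global service can be compared to find the nodes where the task did not start. It assumes a hypothetical `nodeTask` record with plain string states and treats every task whose desired state is not "shutdown" as one the scheduler still wants running; it is an illustration of the idea, not the PR's code.

```go
package main

import "fmt"

// nodeTask is a hypothetical, simplified view of one task of a global
// service: its assigned node, its desired state and its observed state.
type nodeTask struct {
	NodeID       string
	DesiredState string
	State        string
}

// globalServiceStatus counts tasks whose desired state is not "shutdown",
// how many of those are actually running, and which nodes they are not running on.
func globalServiceStatus(tasks []nodeTask) (desired, running int, notRunning []string) {
	for _, t := range tasks {
		if t.DesiredState == "shutdown" {
			continue
		}
		desired++
		if t.State == "running" {
			running++
		} else {
			notRunning = append(notRunning, t.NodeID)
		}
	}
	return desired, running, notRunning
}

func main() {
	tasks := []nodeTask{
		{"node1", "running", "running"},
		{"node2", "running", "rejected"}, // e.g. the image could not be pulled on this node
		{"node3", "running", "running"},
	}
	desired, running, notRunning := globalServiceStatus(tasks)
	fmt.Println(desired, running, notRunning) // 3 2 [node2]
}
```

The reviewer's alternative would use the node count as the denominator instead; the two agree only when every eligible node has exactly one non-shutdown task for the service.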
plugins/inputs/docker/docker.go
tags := map[string]string{}
fields := make(map[string]interface{})
now := time.Now()
tags["swarm_service_id"] = service.ID
Since the measurement name is `docker_swarm`, I think we should name the tags and fields without the `swarm` prefix: `service_name`, `service_mode`.
plugins/inputs/docker/docker.go
@@ -82,6 +85,9 @@ var sampleConfig = `
## To use environment variables (ie, docker-machine), set endpoint = "ENV"
endpoint = "unix:///var/run/docker.sock"

## Set to true to collect Swarm metrics (desired_replicas, running_replicas)
swarm_enabled = false
Maybe this should be named `gather_services = true` or `gather = ["services"]`, since the similar docker command is simply `docker service`.
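A minimal sketch of the first naming option, assuming the plugin's config struct maps TOML keys through `toml` struct tags (the `Docker` struct below is a hypothetical, trimmed-down stand-in, and `sources` is a hypothetical helper). Service collection stays off unless `gather_services` is set. The `gather = ["services"]` alternative would instead be a `[]string` field checked for membership.

```go
package main

import "fmt"

// Docker is a hypothetical, trimmed-down plugin config struct using the
// option name suggested in review; the toml tags name the config keys.
type Docker struct {
	Endpoint       string `toml:"endpoint"`
	GatherServices bool   `toml:"gather_services"`
}

// sources returns which metric groups a gather pass would collect:
// containers always, Swarm services only when gather_services is true.
func (d *Docker) sources() []string {
	out := []string{"containers"}
	if d.GatherServices {
		out = append(out, "services")
	}
	return out
}

func main() {
	d := &Docker{Endpoint: "unix:///var/run/docker.sock", GatherServices: true}
	fmt.Println(d.sources()) // [containers services]
}
```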
This PR adds support for collecting Swarm service metrics from the Docker Swarm manager.
The following metrics will be collected.