Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect Docker Swarm metrics in docker input plugin #3141

Merged
merged 14 commits into from
Oct 3, 2017

Conversation

adityacs
Copy link
Contributor

@adityacs adityacs commented Aug 20, 2017

This PR adds support for collecting swarm service metrics from docker swarm manager.

The following metrics will be collected.

  1. Swarm service name
  2. Swarm service Id
  3. Desired replicas for the service
  4. Running replicas for the service
  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

@danielnelson danielnelson added this to the 1.5.0 milestone Aug 21, 2017
@danielnelson
Copy link
Contributor

Can you add unit tests before I review?

@danielnelson
Copy link
Contributor

closes #3125

@danielnelson danielnelson added area/docker feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin labels Aug 22, 2017
@adityacs
Copy link
Contributor Author

@danielnelson I have added the unit test cases. Could you please review this?

@danielnelson
Copy link
Contributor

Will do but it might take me a bit to get to it.

@adityacs
Copy link
Contributor Author

@danielnelson Sure. Thanks

@adityacs
Copy link
Contributor Author

@danielnelson Any update?

@danielnelson
Copy link
Contributor

@adityacs Give me a couple weeks, as I'm trying focus on bugs right now for the 1.4 release.

@adityacs
Copy link
Contributor Author

@danielnelson Sure.

@adityacs
Copy link
Contributor Author

@danielnelson Could you please review this?

fields["swarm_service_mode"] = "global"
fields["swarm_tasks_running"] = running[service.ID]
fields["swarm_tasks_desired"] = tasksNoShutdown[service.ID]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case the Replicated.Replicas is nil or another Mode is added, we should have an else condition that continues and perhaps logs (depending on if Replicas being nil is an error or not).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled this with "log"

tags := map[string]string{}
fields := make(map[string]interface{})
now := time.Now()
tags["swarm_service_id"] = service.ID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider not including this since looks to be a random identifier string, which can cause high cardinality depending on how quickly it changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For each service there will be unique ID. Since services are not something which frequently changes as in containers I think we can still keep this.

tasksNoShutdown[task.ServiceID]++
}

if _, nodeActive := activeNodes[task.NodeID]; nodeActive && task.Status.State == swarm.TaskStateRunning {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm new to Docker Swarm, but why do you check if the node is not in the shutdown state, instead of just recording the running status? It seems like almost always the task will not be running if the node is down.

tags["swarm_service_id"] = service.ID
tags["swarm_service_name"] = service.Spec.Name
if service.Spec.Mode.Replicated != nil && service.Spec.Mode.Replicated.Replicas != nil {
fields["swarm_service_mode"] = "replicated"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think swarm_service_mode should be a tag.

} else if service.Spec.Mode.Global != nil {
fields["swarm_service_mode"] = "global"
fields["swarm_tasks_running"] = running[service.ID]
fields["swarm_tasks_desired"] = tasksNoShutdown[service.ID]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why non-shutdown tasks are the desired number of tasks. Shouldn't this be equal to the number of Nodes since global services are on every node?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is chance that the on one of the nodes or on few nodes the task is not running. When the mode is "global", swarm tries to deploy containers(for swarm service it is tasks) on all nodes. However, there is a chance that on any of the node the container might not get started due to reasons like registry is not accessible from that node, /var/lib/docker/images directory is corrupted etc..

tags := map[string]string{}
fields := make(map[string]interface{})
now := time.Now()
tags["swarm_service_id"] = service.ID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the measurement name is docker_swarm, I think we should call the tags and fields without the swarm prefix: service_name, service_mode.

@@ -82,6 +85,9 @@ var sampleConfig = `
## To use environment variables (ie, docker-machine), set endpoint = "ENV"
endpoint = "unix:///var/run/docker.sock"

## Set to true to collect Swarm metrics(desired_replicas, running_replicas)
swarm_enabled = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be named gather_services = true or gather = ["services"] since the similar docker command is simply docker service.

@danielnelson danielnelson merged commit dd4299e into influxdata:master Oct 3, 2017
@adityacs adityacs mentioned this pull request Oct 28, 2017
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docker feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants