chore(grafana): Update dashboards for new telegraf #156

Merged
jchauncey merged 1 commit into deis:master from the update-dashboards branch
Oct 25, 2016

Conversation

jchauncey
Member

@jchauncey jchauncey commented Oct 24, 2016

Tags for data collected by the kubernetes telegraf plugin changed after the plugin was merged, so we needed to update the dashboards.

Test Steps:

  • clone pr
  • cd telegraf && make build push upgrade
  • cd ../grafana && make build push upgrade
  • open a browser, go to http://grafana.mydomain.com, and log in using admin/admin
  • The dashboards should show memory and CPU information

It is also useful to verify that we are no longer leaking connections, since this PR ships the new telegraf binary that contains the fix.

To check for that, ssh onto one of your worker nodes that is running the telegraf daemonset and run netstat -tan | grep 10255 there. You should see only 1 or 2 connections open, and that number should never grow.
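
A minimal sketch of how that check could be turned into a number you can watch over time. The helper name count_kubelet_conns is hypothetical and not part of this PR; it assumes netstat prints the local address in column 4, the foreign address in column 5, and the state in the last column, as in typical Linux output.

```shell
# Hypothetical helper (not part of this PR): count ESTABLISHED connections
# that involve the kubelet read-only port. Reads `netstat -tan` output on stdin.
# $1 is the port to look for; it defaults to 10255.
count_kubelet_conns() {
  port="${1:-10255}"
  awk -v port="$port" '
    # Match the port at the end of either the local or the foreign address.
    $NF == "ESTABLISHED" && ($4 ~ (":" port "$") || $5 ~ (":" port "$")) { n++ }
    END { print n + 0 }
  '
}

# Usage on a worker node:
#   netstat -tan | count_kubelet_conns
```

Running it a few times several minutes apart should keep printing 1 or 2 if connections are not leaking.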

@jchauncey jchauncey added this to the v2.8 milestone Oct 24, 2016
@jchauncey jchauncey self-assigned this Oct 24, 2016
@jchauncey
Member Author

will fix #153

@mboersma
Member

I'm testing this as described above but running into a telegraf error:

$ kd logs -f deis-monitor-telegraf-kxpcn
Node Name set (minikube)
Node IP set (192.168.99.100)
Creating topic with URL: http://10.0.0.36:4151/topic/create?topic=metrics
Setting KUBERNETES_URL: http://192.168.99.100:10255
Building config.toml!
Finished building toml...
###########################################
...
# Set Service Input Configuration
[[inputs.nsq_consumer]]
  server = "10.0.0.36:4150"
  topic = "metrics"
  channel = "consumer"
  max_in_flight = 100
  data_format = "influx"
###########################################
###########################################
/usr/bin/telegraf: line 1: syntax error near unexpected token `<'
/usr/bin/telegraf: line 1: `<?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Access denied.</Message><Details>Anonymous users does not have storage.objects.get access to object telegraf/telegraf.</Details></Error>'

@mboersma
Member

It appears telegraf failed to be downloaded into the container:

root@7f3ec01de15b:/# which telegraf
/usr/bin/telegraf
root@7f3ec01de15b:/# cat /usr/bin/telegraf 
<?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Access denied.</Message><Details>Anonymous users does not have storage.objects.get access to object telegraf/telegraf.</Details></Error>
root@7f3ec01de15b:/# 

@felixbuenemann

felixbuenemann commented Oct 25, 2016

The calls in the telegraf Dockerfile should be using curl -fsSL instead of curl -sSL so that they return a bad exit code on errors:

curl -sSL https://storage.googleapis.com/telegraf/telegraf ; echo -e "\nExit code: $?"
<?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Access denied.</Message><Details>Anonymous users does not have storage.objects.get access to object telegraf/telegraf.</Details></Error>
Exit code: 0

curl -fsSL https://storage.googleapis.com/telegraf/telegraf ; echo -e "\nExit code: $?"
curl: (22) The requested URL returned error: 403 Forbidden

Exit code: 22
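
As a rough sketch of how a build step could guard against this, the download can use -f and then refuse a file that starts with an XML declaration. The function names fetch_binary and is_xml_error_page are hypothetical and do not come from the telegraf Dockerfile; the XML check is only a heuristic based on the error page shown above.

```shell
# Hypothetical helpers (not taken from the telegraf Dockerfile).

# Succeeds if the file ($1) starts with an XML declaration, which suggests
# a storage error page was saved instead of a binary.
is_xml_error_page() {
  [ "$(head -c 5 "$1")" = "<?xml" ]
}

# Download URL ($1) to destination ($2). With -f, curl exits non-zero on
# HTTP errors such as 403 instead of saving the error body.
fetch_binary() {
  curl -fsSL -o "$2" "$1" || return 1
  if is_xml_error_page "$2"; then
    echo "download of $1 looks like an error page" >&2
    return 1
  fi
  chmod +x "$2"
}

# Usage in a build step:
#   fetch_binary https://storage.googleapis.com/telegraf/telegraf /usr/bin/telegraf
```

The second check is redundant when -f is in place, but it would also catch an error body returned with a 200 status.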

@mboersma
Member

Tested as recommended, all dashboards show updating CPU/memory stats and open connections look stable:

$ netstat -tan | grep 10255
tcp        0      0 :::10255                :::*                    LISTEN      
tcp        0      0 ::ffff:192.168.99.100:10255 ::ffff:172.17.0.9:55846 ESTABLISHED 

@mboersma mboersma added the LGTM1 label Oct 25, 2016
Tags for data collected by the kubernetes telegraf plugin changed after the plugin was merged, so we needed to update the dashboards.
This PR also removes the kubernetes_health dashboard since we no longer
capture those metrics with the new kubernetes telegraf plugin.
@kmala kmala added the LGTM2 label Oct 25, 2016
@jchauncey jchauncey merged commit df1a2c0 into deis:master Oct 25, 2016
@jchauncey jchauncey deleted the update-dashboards branch October 25, 2016 19:44