Log structured monitoring metrics #5915
Merged: ruflin merged 2 commits into elastic:master from andrewkroh:feature/libbeat/use-structured-logger on Dec 20, 2017
- Use structured logging for the metrics that are periodically logged.
- Add `beat.info.uptime.ms` to the list of gauges so that the total value is always reported rather than a difference.
- Ensure that only non-zero counter values are reported at shutdown (this was a bug introduced in my last refactoring).

Note that zero-value gauges are still reported, which makes the "Total non-zero metrics" message somewhat misleading.

Log samples:

```
2017-12-18T13:07:50.311-0500 INFO [monitoring] log/log.go:80 Starting metrics logging every 5s
2017-12-18T13:07:55.312-0500 INFO [monitoring] log/log.go:107 Non-zero metrics in the last 5s {"monitoring": {"metrics": {"beat.info.uptime.ms": 5018, "beat.memstats.gc_next": 5089280, "beat.memstats.memory_alloc": 2587160, "beat.memstats.memory_total": 190951808, "libbeat.config.module.running": 4, "libbeat.config.module.starts": 4, "libbeat.config.reloads": 1, "libbeat.output.type": "elasticsearch", "libbeat.pipeline.clients": 8, "libbeat.pipeline.events.active": 41, "libbeat.pipeline.events.filtered": 1, "libbeat.pipeline.events.published": 41, "libbeat.pipeline.events.retry": 82, "libbeat.pipeline.events.total": 42, "metricbeat.docker.info.events": 1, "metricbeat.docker.info.success": 1, "metricbeat.system.cpu.events": 1, "metricbeat.system.cpu.success": 1, "metricbeat.system.filesystem.events": 5, "metricbeat.system.filesystem.success": 5, "metricbeat.system.fsstat.events": 1, "metricbeat.system.fsstat.success": 1, "metricbeat.system.load.events": 1, "metricbeat.system.load.success": 1, "metricbeat.system.memory.events": 1, "metricbeat.system.memory.success": 1, "metricbeat.system.network.events": 20, "metricbeat.system.network.success": 20, "metricbeat.system.process.events": 10, "metricbeat.system.process.success": 10, "metricbeat.system.process_summary.events": 1, "metricbeat.system.process_summary.success": 1, "metricbeat.system.uptime.events": 1, "metricbeat.system.uptime.success": 1}}}
2017-12-18T13:07:58.156-0500 INFO [monitoring] log/log.go:115 Total non-zero metrics {"monitoring": {"metrics": {"beat.info.uptime.ms": 7862, "beat.memstats.gc_next": 5089280, "beat.memstats.memory_alloc": 2621032, "beat.memstats.memory_total": 190985680, "libbeat.config.module.running": 4, "libbeat.config.module.starts": 4, "libbeat.config.reloads": 1, "libbeat.output.type": "elasticsearch", "libbeat.pipeline.clients": 8, "libbeat.pipeline.events.active": 41, "libbeat.pipeline.events.filtered": 1, "libbeat.pipeline.events.published": 41, "libbeat.pipeline.events.retry": 82, "libbeat.pipeline.events.total": 42, "metricbeat.docker.info.events": 1, "metricbeat.docker.info.success": 1, "metricbeat.system.cpu.events": 1, "metricbeat.system.cpu.success": 1, "metricbeat.system.filesystem.events": 5, "metricbeat.system.filesystem.success": 5, "metricbeat.system.fsstat.events": 1, "metricbeat.system.fsstat.success": 1, "metricbeat.system.load.events": 1, "metricbeat.system.load.success": 1, "metricbeat.system.memory.events": 1, "metricbeat.system.memory.success": 1, "metricbeat.system.network.events": 20, "metricbeat.system.network.success": 20, "metricbeat.system.process.events": 10, "metricbeat.system.process.success": 10, "metricbeat.system.process_summary.events": 1, "metricbeat.system.process_summary.success": 1, "metricbeat.system.uptime.events": 1, "metricbeat.system.uptime.success": 1}}}
2017-12-18T13:07:58.156-0500 INFO [monitoring] log/log.go:116 Uptime: 7.867012418s
2017-12-18T13:07:58.156-0500 INFO [monitoring] log/log.go:93 Stopping metrics logging.
```
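The counter/gauge distinction described above can be illustrated with a small sketch. This is not the libbeat implementation: the `gauges` set, the `delta` helper, and the values in `main` are made up for illustration. It only shows the reporting rule as stated in this PR: gauges are logged at their current value (even when zero), while counters are logged as the change since the previous snapshot and dropped when that change is zero.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gauges is a hypothetical set of metric names that are reported as totals.
// The real list lives in libbeat and is not reproduced here.
var gauges = map[string]bool{
	"beat.info.uptime.ms":      true,
	"libbeat.pipeline.clients": true,
}

// delta builds the metrics for one periodic report.
// Gauges keep their current value, including zero.
// Counters are reported as cur-prev and omitted when the difference is zero.
func delta(prev, cur map[string]int64) map[string]int64 {
	out := make(map[string]int64)
	for name, value := range cur {
		if gauges[name] {
			out[name] = value
			continue
		}
		if d := value - prev[name]; d != 0 {
			out[name] = d
		}
	}
	return out
}

func main() {
	// Illustrative snapshots; the numbers are invented for this example.
	prev := map[string]int64{
		"beat.info.uptime.ms":           5000,
		"libbeat.pipeline.events.total": 40,
		"libbeat.config.reloads":        1,
	}
	cur := map[string]int64{
		"beat.info.uptime.ms":           10000,
		"libbeat.pipeline.events.total": 45,
		"libbeat.config.reloads":        1,
		"libbeat.pipeline.clients":      0,
	}

	payload := map[string]interface{}{
		"monitoring": map[string]interface{}{"metrics": delta(prev, cur)},
	}
	b, err := json.Marshal(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

In this sketch `libbeat.config.reloads` is omitted because it did not change, while `libbeat.pipeline.clients` appears with value 0 because it is treated as a gauge, which is the same behaviour that makes the "non-zero" wording slightly inaccurate.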
Instead of using flattened key names, write the metrics in "nested" format. There is less redundancy and it will be more machine friendly.

Sample logs:

```
2017-12-19T11:57:21.086-0500 INFO [monitoring] log/log.go:79 Starting metrics logging every 3s
2017-12-19T11:57:24.087-0500 INFO [monitoring] log/log.go:106 Non-zero metrics in the last 3s {"monitoring": {"metrics": {"beat":{"info":{"uptime":{"ms":3004}},"memstats":{"gc_next":4194304,"memory_alloc":2298976,"memory_total":12648056}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"elasticsearch"},"pipeline":{"clients":1,"events":{"active":1,"published":1,"total":1}}},"metricbeat":{"file_integrity":{"file":{"events":1,"success":1}}}}}}
2017-12-19T11:57:25.305-0500 INFO [monitoring] log/log.go:114 Total non-zero metrics {"monitoring": {"metrics": {"beat":{"info":{"uptime":{"ms":4222}},"memstats":{"gc_next":4194304,"memory_alloc":3336376,"memory_total":20548168}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"elasticsearch"},"pipeline":{"clients":0,"events":{"active":1,"published":1,"total":1}}}}}}
2017-12-19T11:57:25.305-0500 INFO [monitoring] log/log.go:115 Uptime: 4.226912514s
2017-12-19T11:57:25.305-0500 INFO [monitoring] log/log.go:92 Stopping metrics logging.
```
@ruflin I'm going back on what I said and have "unflattened" the key names.
ruflin approved these changes on Dec 20, 2017
@andrewkroh I rebased and merged, as I assume that is what you had in mind. One thing I don't like about having multiple commits in one PR is that CI only tests the last commit and shows the others as just green. So if the second commit is reverted later, it's not clear whether the first commit would work without it.
To show the key structure more clearly, here is a pretty-printed version of the logged JSON metrics.
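A view like this can be produced from any of the logged lines. The sketch below is one way to do it with the Go standard library, and it is not part of the PR. The `prettyMetrics` helper is hypothetical, and it assumes the JSON payload starts at the first `{` in the line. The input is an abridged version of a sample line from this PR.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// prettyMetrics extracts the JSON payload from a log line and indents it.
// Assumption: the payload begins at the first '{' and runs to the end of the line.
func prettyMetrics(line string) (string, error) {
	i := strings.Index(line, "{")
	if i < 0 {
		return "", fmt.Errorf("no JSON payload found in log line")
	}
	var buf bytes.Buffer
	if err := json.Indent(&buf, []byte(line[i:]), "", "  "); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Abridged from the nested-format sample; most metrics are left out for brevity.
	line := `2017-12-19T11:57:24.087-0500 INFO [monitoring] log/log.go:106 Non-zero metrics in the last 3s {"monitoring": {"metrics": {"beat":{"info":{"uptime":{"ms":3004}}},"libbeat":{"output":{"type":"elasticsearch"}}}}}`

	out, err := prettyMetrics(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```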