-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NATS Monitoring Input Plugin #3186
Conversation
PR /cc @bjflanne |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look really good @levex, maybe we should name it just "nats" in the same way we did with rabbitmq and nsq. Here are a few other quick observations, let me know when you are ready for a final review:
plugins/inputs/nats_top/nats_top.go
Outdated
resp, err := http.Get(theServer) | ||
if err != nil { | ||
log.Printf("E! nats-top: Failed to HTTP GET %s", theServer) | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you can just return the error, Telegraf will log it, increment the internal stats regarding failed metrics, and also print the name of the plugin for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, will fix!
plugins/inputs/nats_top/nats_top.go
Outdated
return "Provides metrics about the state of a NATS server" | ||
} | ||
|
||
func (n *NatsTop) Start(acc telegraf.Accumulator) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is nothing to do in the Start and Stop functions you should probably not implement them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
plugins/inputs/nats_top/nats_top.go
Outdated
|
||
var sampleConfig = ` | ||
## The address of the monitoring end-point of the NATS server | ||
server = "http://localhost:1337" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TLS support could be nice, with the normal HTTP options such as in the apache input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great idea, will add.
@danielnelson thanks for the quick review, i'll address your comments and get back to you. |
57cd848
to
0c05127
Compare
Signed-off-by: Levente Kurusa <levex@linux.com>
0c05127
to
480211f
Compare
Renamed it to |
The TLS options would only be needed for client certificate verification, it's not critical that it be added though so don't worry about adding it if you don't have the setup. |
I think I'll add it to my TODO list for now, I agree it'd be nice to have so I'll get back to it someday. |
Hi @danielnelson, any updates on this one? |
I'll try to review soon |
Thanks! No rush, just was confused about the TLS thing.
…On Monday, September 18, 2017, Daniel Nelson ***@***.***> wrote:
I'll try to review soon
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3186 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAz09IGVBMnHTivUrL5GVbbKAHoVHWiCks5sjt3WgaJpZM4PGYYL>
.
|
Don't worry about the TLS client cert support, we can add it later on. |
Any word on this being merged soon? I'd like to try it out. |
map[string]interface{}{ | ||
"in_msgs": stats.InMsgs, | ||
"out_msgs": stats.OutMsgs, | ||
"uptime": time.Since(stats.Start).Seconds(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's store this in nanoseconds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I've also made this more correct by using stats.Now
to calculate the uptime instead of time.Since
. Then uptime
is the uptime at the point the metrics were generated, not when Gather
was called. This also simplifies testing.
"out_bytes": stats.OutBytes, | ||
"mem": stats.Mem, | ||
"subscriptions": stats.Subscriptions, | ||
}, nil, time.Now()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a tag for the server using the url in the configuration, this will allow the plugin to be used multiple times if needed without conflicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
"connections": stats.Connections, | ||
"total_connections": stats.TotalConnections, | ||
"in_bytes": stats.InBytes, | ||
"cpu_usage": stats.CPU, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this the number of cpu's used? If so I would call it either cpu
to match upstream, or cpu_count
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CPU
is a float32 containing CPU utilization (Cores
has the number of CPU cores). I've renamed the metric to cpu
to match upstream and the mem
metric (memory usage).
return err | ||
} | ||
|
||
acc.AddFields("nats", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we call the measurement nats_varz
in case we later want to support the other monitoring urls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return err | ||
} | ||
|
||
acc.AddFields("nats", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not add in the missing numeric values from the /vars
endpoint or at least the ones that change over time (no need to add port). In particular, what about slow_consumers
, routes
, cores
, remotes
? I am guessing about what some of these mean so only the ones that seem like time series data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea: slow_consumers, routes, cores, and remotes added. None of the other fields in Varz
seem relevant/interesting.
theServer := fmt.Sprintf("%s/varz", n.Server) | ||
|
||
/* download the page we are intereted in */ | ||
resp, err := http.Get(theServer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add in a custom transport for this plugin? The best example to follow is the apache input. You don't have to add SSL support or user configurable timeouts, but make sure there is a timeout on the http.Client of 5 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (exposed the timeout as an option too)
@@ -33,6 +33,7 @@ github.com/kballard/go-shellquote d8ec1a69a250a17bb0e419c386eac1f3711dc142 | |||
github.com/matttproud/golang_protobuf_extensions c12348ce28de40eed0136aa2b644d0ee0650e56c | |||
github.com/miekg/dns 99f84ae56e75126dd77e5de4fae2ea034a468ca1 | |||
github.com/naoina/go-stringutil 6b638e95a32d0c1131db0e7fe83775cbea4a0d0b | |||
github.com/nats-io/gnatsd 393bbb7c031433e68707c8810fda0bfcfbe6ab9b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add this to docs/LICENSE_OF_DEPENDENCIES.md
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Thanks for the review @danielnelson, I aim to address your comments this week. (ideally, tomorrow) |
I'm also interested in getting NATS support for telegraf merged, in fact I had made an input plugin before finding this PR. @levex do you want some help in addressing the review? or should I submit another PR? |
@levex is no longer working on this but I'm working with the same employer he was working for. I'll take over and work to get it merged. I'll respond to the outstanding items here and then propose a new PR. |
#3674 is the new PR which addresses the issues raised here. This PR can be closed. |
Hello,
I have a NATS bus in my setup and I'm looking to monitor its state via Grafana via InfluxDB, so I wrote a very simple Telegraf input plugin that would let me get information about the bus.
I feel the work is far from complete, but putting it here in case it gets more feedback!
Thanks!
Required for all PRs: