-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(plugins): Datadog for metrics collection #5372
Conversation
CI error not related to this PR: https://github.com/apache/apisix/runs/4064450756?check_suite_focus=true#step:14:680 |
t/plugin/datadog.t
Outdated
message received: apisix.upstream.latency | ||
message received: apisix.apisix.latency | ||
message received: apisix.ingress.size | ||
message received: apisix.egress.size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add 3 more tests:
- request twice & check the dump
- configure namespace in the metadata and check the different namespace
- configure tags in the metadata and check the difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
apisix/plugins/datadog.lua
Outdated
end | ||
|
||
local function generate_tag(conf, const_tags) | ||
local tags = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it faster if we can use table.concat to concat the string instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right,
Lua strings are immutable
effectively it will be O(N^2) instead of O(N) without a table. updating accordingly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimized
local entry = fetch_log(ngx, {}) | ||
entry.upstream_latency = ctx.var.upstream_response_time * 1000 | ||
entry.balancer_ip = ctx.balancer_ip or "" | ||
entry.route_name = ctx.route_name or "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should fall back to route_id if the name is missing. And need a test for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code block for fallback logic is at L86
Okay, adding tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use route_name even there is only route_id? Therefore the name of tag is consistent between routes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfectly makes sense, Updated accordingly. Thanks for pointing it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spacewander ecommend using route name, won't the same routes name be confusing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! @tzssangglass thanks for the honest concern. Yeah, in a badly designed system, it might be an issue to segregate metrics of multiple routes having the same route_name. But users can always pick multiple tags for grouping (for eg. route_name:abc, service_id:12), provided they are designing APISIX infra with service level abstraction (just one example, they can use "balancer_ip, consumer" also for filtering).
@spacewander recommended using name over id because it's more readable to the user and there was this requirement in the Prometheus plugin (#5149). Do you think that we should add a param prefer_name
at the plugin schema or we should defer it for the upcoming versions?
Please do let me know. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get. I think it is up to the community to decide if prefer_name
is needed like the prometheus plugin.I suggest adding a note about that it could be misleading when route name could be duplicated for multiple.
- **route_name**: Name specified in the route schema definition. If not present, it will fall back to the route id value. | ||
- Note: If multiple routes have the same name duplicated, we suggest you to visualize graphs on the Datadog dashboard over multiple tags that could compositely pinpoint a particular route/service. If it's still insufficient for your needs, feel free to drop a feature request at [apisix/issues](https://github.com/apache/apisix/issues). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tzssangglass I hope this would be sufficient. WDYT?
| ----------- | ------ | ----------- | ------- | ----- | ---------------------------------------------------------------------- | | ||
| host | string | optional | "127.0.0.1" | | The DogStatsD server host address | | ||
| port | integer | optional | 8125 | | The DogStatsD server host port | | ||
| namespace | string | optional | "apisix" | | Prefix for all the custom metrics sent by APISIX agent. Useful for finding entities for metric graph. e.g. (apisix.request.counter) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bisakhmondal please add doc and examples for how to configure host
and port
, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @moonming do you mean how to update datadog metadata by calling the respective endpoint? Or Is it regarding the setup of a local dogstatsd agent?
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I means how to set the datadog endpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, take this as an example: If I'm a newbie to use Apache APISIX and the datadog plugin, I'd like to know all the necessary steps after Apache APISIX runs successfully.
Like when I should update its metadata, and how I should setup its data with a Route.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got your point. Thanks for the elaborate explanation.
The plugin has a default metadata schema that assumes dogstatsd is available at localhost:8125. If someone needs to reconfigure that I have covered that on the blog which is soon going to be exported at apisix blog
Here is the link: https://blog.bisakh.com/blog/Monitoring-with-Datadog-in-Apache-APISIX
Do you think, we should add it here (the Steps to Run Datadog Agent
section & Custom Configuration
)? Thanks guys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya, I have read your blog[1], I think the Custom Configuration
part is what @moonming and I need 😄
[1] https://blog.bisakh.com/blog/Monitoring-with-Datadog-in-Apache-APISIX
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool Cool, I'll open a PR with the code snippet and some extra details. Thanks, @juzhiyuan @moonming
What this PR does / why we need it:
This PR proves the backbone for integrating Datadog plugin with Apache APISIX.
At present it provides the following metrics:
To enable datadog plugin via route, just run the following snippet
Pre-submission checklist: