Add base xDS REST SD and kuma_sd implementation #8844
I think the go version in |
We did not update go.mod since we do not require any Go 1.15- or 1.16-specific features. We do not guarantee that Prometheus works properly with Go 1.14 or earlier.
@roidelapluie – makes sense, thanks! |
I have started reviewing this but I have a major question first:
Why do you use refresh? Refresh is used for simple service discoveries. Fully fledged service discoveries like this should directly implement the discoverer interface and not rely on refresh. Look at consul for an example.
Thanks for taking a look! I used refresh as I thought it was preferred for simple polling-based SDs, like triton, docker, AWS, etc. Reading through the Consul SD now it seems like it's doing some sort of persistent watching + rpc calls on a few channels, sort of like a gRPC stream? Is that correct? Since the xDS HTTP API is polling based, refresh seemed applicable. Are you concerned about the applicability of the "skip refresh" functionality I've added here? |
Consul uses HTTP long polling, which sounds like what I read in the xDS doc:
That seems to be like the watch ID in consul. |
That makes a lot of sense – is this set up via the
If so, that shouldn't be a huge change. If it's more complicated, happy to look more into it. |
Long polling means that you make an HTTP request and you get the answer when the server decides (e.g. when there is an update). It means that we do not need to do a request every 30s, and if there is an update after 10s, we get it immediately. Then we start another query, which might return only after 2 minutes (at the next xDS change).
Got it – ok, I'll look into whether any changes are necessary to support this on the xDS server side and then update this to use long polling. Thanks for the feedback – please let me know of any other questions that come up in the meantime.
@roidelapluie I've updated this PR to use HTTP long polling, which is available in the Kuma 1.2.0 release. Please have another look when you have a chance. Thanks! |
I want to thank you already! I will get to it ASAP! |
@roidelapluie I think I have addressed all your comments, namely:
Thanks again + let me know if anything else shows up in your testing :) |
Hey @roidelapluie, have you had time to continue with your testing? |
Here are a few reviews. I am unsure we should change the metrics_path, but I am aware that some kubernetes SD change the scheme so I guess it is okay.
In general, I have the impression that there is a lot of debug level logging. We might want to tone this down a bit.
Please verify that all comments start with a cap and end with a full stop.
Thanks, we are getting very close
docs/configuration/configuration.md
```yaml
# Address of the Kuma Control Plane's MADS xDS server.
server: <string>
# An arbitrary identifier to send to the MADS server, which should be unique to this instance.
```
From my understanding we could remove this parameter for now and not expose it to the user.
discovery/xds/xds.go
prometheus.MustRegister(kumaFetchDuration, kumaFetchSkipUpdateCount, kumaFetchFailuresCount)

// Register protobuf types that need to be marshalled/ unmarshalled.
_ = protoTypes.RegisterMessage((&v3.DiscoveryRequest{}).ProtoReflect().Type())
Do we need the empty assignment? If it is an error, should we deal with it?
True, this should be more like a MustRegister..
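The "MustRegister" idea can be sketched as below: instead of discarding the error with `_ =`, a wrapper panics so that a failed registration is caught at start-up. This is only an illustration of the pattern. The `registry` type and its methods are hypothetical stand-ins and are not the protobuf registry API used in this PR.

```go
package main

import "fmt"

// registry is a hypothetical stand-in for a type registry whose Register
// call can fail, for example on a duplicate name.
type registry struct {
	names map[string]struct{}
}

func newRegistry() *registry {
	return &registry{names: make(map[string]struct{})}
}

// Register returns an error if name is already registered.
func (r *registry) Register(name string) error {
	if _, ok := r.names[name]; ok {
		return fmt.Errorf("type %q is already registered", name)
	}
	r.names[name] = struct{}{}
	return nil
}

// MustRegister wraps Register and panics on failure, so a registration
// problem surfaces immediately instead of being silently dropped.
func (r *registry) MustRegister(name string) {
	if err := r.Register(name); err != nil {
		panic(err)
	}
}
```

Panicking suits this case because the registration runs once during initialization, where a failure indicates a programming error rather than a runtime condition to recover from.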
Signed-off-by: austin ce <austin.cawley@gmail.com>
Thanks for your patience and good reviews @roidelapluie – I think I've addressed all the feedback now. |
Thanks! |
Thank you so much @roidelapluie ! |
- Do not generate the __meta_server label, since it is unavailable in Prometheus.
- Add a link to https://docs.victoriametrics.com/sd_configs.html#kuma_sd_configs to docs/CHANGELOG.md, so users can click it and read the docs without having to search for them.
- Remove the kumaTarget struct, since it is easier to generate labels for discovered targets directly from the response returned by Kuma. This simplifies the code.
- Store the generated labels for discovered targets inside atomic.Value. This allows reading them from concurrent goroutines without the need for a mutex.
- Use synchronous requests to Kuma instead of long polling, since there is little sense in long polling when the Kuma server may return a 304 Not Modified response every -promscrape.kumaSDCheckInterval.
- Remove the -promscrape.kuma.waitTime command-line flag, since it is no longer needed when long polling isn't used.
- Set the default value for -promscrape.kumaSDCheckInterval to 30s in order to be consistent with Prometheus.
- Remove unnecessary indirections for string literals that are used only once, in order to improve code readability.
- Remove unused fields from discoveryRequest and discoveryResponse.
- Update tests.
- Document why the fetch_timeout and refresh_interval options are missing in kuma_sd_config.
- Add docs to discoveryutils.RequestCallback and discoveryutils.ResponseCallback, since these are public types.
Side notes: it is weird that the Prometheus implementation for kuma_sd_configs sets the `instance` label, since usually this label is set by Prometheus itself to __address__ after the relabeling phase. See https://www.robustperception.io/life-of-a-label/
Updates #3389
See prometheus/prometheus#7919 and prometheus/prometheus#8844 as a reference implementation in Prometheus
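As a rough illustration of the atomic.Value point in the commit message above, a label snapshot can be published and read without a mutex along these lines. This is a sketch under assumptions, not the VictoriaMetrics code: `labelStore` and `targetLabels` are hypothetical names.

```go
package main

import "sync/atomic"

// targetLabels holds one label set per discovered target (hypothetical type).
type targetLabels []map[string]string

// labelStore publishes the most recent label snapshot. Readers never block
// writers. Stored snapshots must be treated as immutable after Set.
type labelStore struct {
	v atomic.Value
}

// Set atomically replaces the current snapshot. The same concrete type is
// always stored, which atomic.Value requires.
func (s *labelStore) Set(ls targetLabels) {
	s.v.Store(ls)
}

// Get returns the latest snapshot, or nil if Set has not been called yet.
func (s *labelStore) Get() targetLabels {
	ls, _ := s.v.Load().(targetLabels)
	return ls
}
```

The refresh goroutine would build a complete new `targetLabels` value and call `Set` once per update, while scrape-side goroutines call `Get` whenever they need the current targets.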
Implementation of a Kuma Service Discovery mechanism via the xDS REST protocol.
Since this is new code, the protobuf library used is the official google.golang.org/protobuf lib, which is where the community would like to migrate to, as per this ML thread. The usage is entirely isolated from the prompb package, where the gogo/proto package is currently used. When this transition is complete in Prometheus, it may make sense to include the Kuma MADS v1 .proto and compile it here, but since this is a "frozen" API, and the dependency on the official client already exists transitively, it did not seem too bad to include the compiled version here in the meantime.
Closes #7919
Screenshots
Target synchronization
Targets in Prometheus
Scraped Envoy metrics