Add base xDS REST SD and kuma_sd implementation #8844

austince · 2021-05-18T20:49:29Z

Implementation of a Kuma Service Discovery mechanism via the xDS REST protocol.

Since this is new code, the protobuf library used is the official google.golang.org/protobuf lib, which is where the community would like to migrate to, as per this ML thread. The usage is entirely isolated from the prompb package, where the gogo/proto package is currently used. When this transition is complete in Prometheus, it may make sense to include the Kuma MADS v1 .proto and compile it here, but since this is a "frozen" API, and the dependency on the official client already exists transitively, it did not seem too bad to include the compiled version here in the meantime.

Closes #7919

Screenshots (images not preserved): target synchronization, targets in Prometheus, scraped Envoy metrics.

austince · 2021-05-18T21:37:06Z

I think the go version in go.mod is out of date (1.14) – I had to tidy with go 1.16 to pass the CI check. Did I miss this note somewhere?

roidelapluie · 2021-05-18T22:57:56Z

> I think the go version in go.mod is out of date (1.14) – I had to tidy with go 1.16 to pass the CI check. Did I miss this note somewhere?

We did not update go.mod since we do not require specific go 1.15 or 1.16 features.

We do not guarantee that Prometheus works properly with go 1.14 or less.
We also guarantee that some packages like TSDB can still be used with go 1.15.

austince · 2021-05-18T23:05:13Z

@roidelapluie – makes sense, thanks!

roidelapluie

I have started reviewing this but I have a major question first:

Why do you use refresh? Refresh is used for simple service discoveries. Fully fledged service discoveries like this should directly implement the discoverer interface and not rely on refresh. Look at consul for an example.
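To make the distinction concrete, here is a minimal sketch of a discoverer that implements the interface directly and pushes updates whenever the upstream reports one, rather than on a fixed refresh interval. The `Group` type, `fetchFunc`, `watchDiscoverer`, and `collect` are hypothetical stand-ins written for this illustration; only the shape of `Run(ctx, up)` is meant to mirror the Prometheus `discovery.Discoverer` interface. It is not the code from this PR or from the Consul SD.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Group is a simplified stand-in for Prometheus' targetgroup.Group.
type Group struct {
	Source  string
	Targets []string
}

// Discoverer mirrors the shape of Prometheus' discovery.Discoverer interface.
type Discoverer interface {
	Run(ctx context.Context, up chan<- []*Group)
}

// fetchFunc is a hypothetical callback that blocks until the upstream has
// something new to report (for example, a long-polling HTTP request).
type fetchFunc func(ctx context.Context) ([]*Group, error)

// watchDiscoverer implements Discoverer directly instead of being driven by
// a fixed refresh interval.
type watchDiscoverer struct {
	fetch   fetchFunc
	backoff time.Duration // wait between attempts after a failed fetch
}

// Run fetches in a loop and forwards every result until ctx is cancelled.
func (d *watchDiscoverer) Run(ctx context.Context, up chan<- []*Group) {
	for {
		groups, err := d.fetch(ctx)
		if err != nil {
			select {
			case <-ctx.Done():
				return
			case <-time.After(d.backoff):
				continue
			}
		}
		select {
		case <-ctx.Done():
			return
		case up <- groups:
		}
	}
}

// collect runs the discoverer against a fake upstream and returns the first
// n updates it sends.
func collect(n int) [][]*Group {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	calls := 0
	var d Discoverer = &watchDiscoverer{
		backoff: 10 * time.Millisecond,
		fetch: func(ctx context.Context) ([]*Group, error) {
			calls++
			addr := fmt.Sprintf("10.0.0.%d:9090", calls)
			return []*Group{{Source: "demo", Targets: []string{addr}}}, nil
		},
	}

	up := make(chan []*Group)
	go d.Run(ctx, up)

	var out [][]*Group
	for i := 0; i < n; i++ {
		out = append(out, <-up)
	}
	return out
}

func main() {
	for _, groups := range collect(3) {
		fmt.Println(groups[0].Source, groups[0].Targets)
	}
}
```

The loop has no ticker: how often updates arrive is decided entirely by how long `fetch` blocks, which is the property a long-polling SD relies on.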

discovery/refresh/refresh.go

austince · 2021-05-19T21:50:53Z

> I have started reviewing this but I have a major question first:
>
> Why do you use refresh? Refresh is used for simple service discoveries. Fully fledged service discoveries like this should directly implement the discoverer interface and not rely on refresh. Look at consul for an example.

Thanks for taking a look!

I used refresh as I thought it was preferred for simple polling-based SDs, like triton, docker, AWS, etc. Reading through the Consul SD now it seems like it's doing some sort of persistent watching + rpc calls on a few channels, sort of like a gRPC stream? Is that correct?

Since the xDS HTTP API is polling based, refresh seemed applicable. Are you concerned about the applicability of the "skip refresh" functionality I've added here?

roidelapluie · 2021-05-19T21:56:32Z

Consul uses HTTP long polling, which sounds like what I read in the xDS docs: "Synchronous (long) polling via REST endpoints is also available for the xDS singleton APIs."

roidelapluie · 2021-05-19T21:58:34Z

		VersionInfo:   rc.latestVersion,
		ResponseNonce: rc.latestNonce,

That seems to be like the watch ID in consul.
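A rough sketch of how the last seen version and nonce could be echoed back on the next request, much like a watch index. The `discoveryRequest` and `discoveryResponse` structs here are hand-written stand-ins with a few field names taken from the xDS v3 messages; the real client in this PR uses the generated protobuf types. `watchState` and the `type.example/Resource` type URL are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// discoveryRequest is a simplified stand-in for a few fields of the xDS v3
// DiscoveryRequest message.
type discoveryRequest struct {
	VersionInfo   string `json:"version_info,omitempty"`
	ResponseNonce string `json:"response_nonce,omitempty"`
	TypeURL       string `json:"type_url,omitempty"`
}

// discoveryResponse is a simplified stand-in for the matching response.
type discoveryResponse struct {
	VersionInfo string `json:"version_info"`
	Nonce       string `json:"nonce"`
}

// watchState remembers what the server last told us, so the next request
// can say "I already have this version".
type watchState struct {
	latestVersion string
	latestNonce   string
}

// request builds the next request from the remembered version and nonce.
func (s *watchState) request(typeURL string) discoveryRequest {
	return discoveryRequest{
		VersionInfo:   s.latestVersion,
		ResponseNonce: s.latestNonce,
		TypeURL:       typeURL,
	}
}

// observe records a response and reports whether the version changed.
func (s *watchState) observe(resp discoveryResponse) bool {
	changed := resp.VersionInfo != s.latestVersion
	s.latestVersion = resp.VersionInfo
	s.latestNonce = resp.Nonce
	return changed
}

func main() {
	s := &watchState{}
	first, _ := json.Marshal(s.request("type.example/Resource"))
	fmt.Println(string(first))

	s.observe(discoveryResponse{VersionInfo: "v1", Nonce: "n1"})
	second, _ := json.Marshal(s.request("type.example/Resource"))
	fmt.Println(string(second))
}
```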

austince · 2021-05-19T22:03:17Z

> Consul uses http long polling, which sounds like I read from the xds doc: Synchronous (long) polling via REST endpoints is also available for the xDS singleton APIs.

That makes a lot of sense – is this set up via the http.Transport? Never used long polling before.

IdleConnTimeout: 2 * time.Duration(watchTimeout),
DialContext: conntrack.NewDialContextFunc(
  conntrack.DialWithTracing(),
  conntrack.DialWithName("consul_sd"),
),

If so, that shouldn't be a huge change. If it's more complicated, happy to look more into it.

roidelapluie · 2021-05-19T22:07:32Z

Long polling means that you make an HTTP request and you get the answer when the server decides (e.g. when there is an update).

It means that we do not need to do a request every 30s, and if there is an update after 10s, we directly get the update. Then, we start another query, which might return only after 2 minutes (the next xDS change).
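As a minimal sketch of that request cycle, assuming a toy server: the handler below holds each request for a short delay (standing in for "until something changes") and then answers with a new version, while the client simply issues the next request as soon as the previous one returns. `newLongPollServer`, `pollOnce`, `watch`, and `demo` are made-up names for this illustration and are not part of Prometheus, Consul, or Kuma.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// newLongPollServer returns a test server that holds every request for
// delay (simulating "wait until there is an update") before replying.
func newLongPollServer(delay time.Duration) *httptest.Server {
	var version int64
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(delay): // pretend an update just happened
		case <-r.Context().Done(): // client went away
			return
		}
		fmt.Fprintf(w, "v%d", atomic.AddInt64(&version, 1))
	}))
}

// pollOnce sends one request and blocks until the server decides to answer.
func pollOnce(ctx context.Context, c *http.Client, url string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	resp, err := c.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// watch issues back-to-back long polls until it has seen n updates.
// There is no sleep between requests: the server paces the loop.
func watch(ctx context.Context, url string, n int) ([]string, error) {
	// The client timeout must be longer than the server's hold time.
	c := &http.Client{Timeout: 5 * time.Second}
	var seen []string
	for len(seen) < n {
		v, err := pollOnce(ctx, c, url)
		if err != nil {
			return seen, err
		}
		seen = append(seen, v)
	}
	return seen, nil
}

// demo wires the toy server and client together.
func demo() ([]string, error) {
	srv := newLongPollServer(20 * time.Millisecond)
	defer srv.Close()
	return watch(context.Background(), srv.URL, 3)
}

func main() {
	versions, err := demo()
	fmt.Println(versions, err)
}
```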

austince · 2021-05-19T22:13:47Z

Got it – ok, I'll look into whether any changes are necessary to support this on the xDS server side and then update this to use long polling. Thanks for the feedback – please let me know of any other questions that come up in the meantime.

austince · 2021-06-17T21:38:48Z

@roidelapluie I've updated this PR to use HTTP long polling, which is available in the Kuma 1.2.0 release. Please have another look when you have a chance. Thanks!

roidelapluie · 2021-06-21T22:59:37Z

I want to thank you already! I will get to it ASAP!

austince · 2021-06-30T21:52:17Z

@roidelapluie I think I have addressed all your comments, namely:

- removed unnecessary labels (versions, server)
- removed version config options so that only protocol v3 and MADS v1 are implemented
  - the few places where ProtocolVersion is kept internally are mostly placeholders to not completely lock the implementation in to v3, though I can remove those too if you'd like
- defaulted client_id to the hostname if not specified
- wrapped errors in kuma_sd and removed the confusing xds_sd prefixes
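For illustration, the client_id default and the error wrapping could look roughly like the sketch below. The `clientID` helper, its injected `hostname` parameter, and `failingHostname` are hypothetical and only show the idea; they are not the functions added in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// clientID returns the configured ID, or falls back to the hostname when
// none was configured. The hostname lookup is passed in so it can be faked.
func clientID(configured string, hostname func() (string, error)) (string, error) {
	if configured != "" {
		return configured, nil
	}
	h, err := hostname()
	if err != nil {
		// Wrap with context so the caller can tell where the failure came from.
		return "", fmt.Errorf("kuma_sd: looking up hostname for client ID: %w", err)
	}
	return h, nil
}

// failingHostname is a fake lookup used to exercise the error path.
func failingHostname() (string, error) {
	return "", errors.New("no hostname available")
}

func main() {
	id, err := clientID("", os.Hostname)
	fmt.Println(id, err)

	_, err = clientID("", failingHostname)
	fmt.Println(err)
}
```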

Thanks again + let me know if anything else shows up in your testing :)

austince · 2021-07-06T15:39:23Z

Hey @roidelapluie, have you had time to continue with your testing?

roidelapluie

Here are a few reviews. I am unsure we should change the metrics_path, but I am aware that some kubernetes SD change the scheme so I guess it is okay.

In general, I have the impression that there is a lot of debug level logging. We might want to tone this down a bit.

Please verify that all comments start with a cap and end with a full stop.

discovery/xds/xds.go

discovery/xds/kuma.go

discovery/xds/client_test.go

discovery/xds/client.go

roidelapluie

Thanks, we are getting very close

util/osutil/hostname.go

roidelapluie · 2021-07-20T21:15:28Z

docs/configuration/configuration.md

+```yaml
+# Address of the Kuma Control Plane's MADS xDS server.
+server: <string>
+# An arbitrary identifier to send to the MADS server, which should be unique to this instance.


From my understanding we could remove this parameter for now and not expose it to the user.

discovery/xds/xds_test.go

discovery/xds/xds.go

roidelapluie · 2021-07-20T21:17:57Z

discovery/xds/xds.go

+	prometheus.MustRegister(kumaFetchDuration, kumaFetchSkipUpdateCount, kumaFetchFailuresCount)
+
+	// Register protobuf types that need to be marshalled/ unmarshalled.
+	_ = protoTypes.RegisterMessage((&v3.DiscoveryRequest{}).ProtoReflect().Type())


Do we need the empty assignment? If it is an error, should we deal with it?

True, this should be more like a MustRegister..
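A small sketch of the "Must" pattern being suggested, using a toy `registry` type as a stand-in for the protobuf type registry (the real call in the diff is `protoTypes.RegisterMessage`). `registry`, `mustRegister`, and `panics` are hypothetical names for this example only.

```go
package main

import "fmt"

// registry is a toy stand-in for a type registry whose Register call can fail.
type registry struct {
	names map[string]bool
}

// Register records a name and fails if it is already present.
func (r *registry) Register(name string) error {
	if r.names[name] {
		return fmt.Errorf("type %q is already registered", name)
	}
	r.names[name] = true
	return nil
}

// mustRegister panics instead of silently discarding the error, which is
// reasonable for registrations done once at package initialization.
func mustRegister(r *registry, name string) {
	if err := r.Register(name); err != nil {
		panic(err)
	}
}

// panics reports whether calling f panics.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	r := &registry{names: map[string]bool{}}
	mustRegister(r, "DiscoveryRequest")
	fmt.Println(panics(func() { mustRegister(r, "DiscoveryRequest") }))
}
```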

discovery/xds/kuma.go

discovery/xds/client.go

Signed-off-by: austin ce <austin.cawley@gmail.com>

austince · 2021-07-21T18:31:53Z

Thanks for your patience and good reviews @roidelapluie – I think I've addressed all the feedback now.

roidelapluie · 2021-07-23T07:46:43Z

Thanks!

austince · 2021-07-23T14:12:16Z

Thank you so much @roidelapluie !

- Do not generate the __meta_server label, since it is unavailable in Prometheus.
- Add a link to https://docs.victoriametrics.com/sd_configs.html#kuma_sd_configs to docs/CHANGELOG.md, so users can click it and read the docs without having to search for them.
- Remove the kumaTarget struct, since it is easier to generate labels for discovered targets directly from the response returned by Kuma. This simplifies the code.
- Store the generated labels for discovered targets inside atomic.Value. This allows reading them from concurrent goroutines without a mutex.
- Use synchronous requests to Kuma instead of long polling, since there is little sense in long polling when the Kuma server may return a 304 Not Modified response every -promscrape.kumaSDCheckInterval.
- Remove the -promscrape.kuma.waitTime command-line flag, since it is no longer needed when long polling isn't used.
- Set the default value for -promscrape.kumaSDCheckInterval to 30s in order to be consistent with Prometheus.
- Remove unnecessary indirections for string literals that are used only once, in order to improve code readability.
- Remove unused fields from discoveryRequest and discoveryResponse.
- Update tests.
- Document why the fetch_timeout and refresh_interval options are missing in kuma_sd_config.
- Add docs to discoveryutils.RequestCallback and discoveryutils.ResponseCallback, since these are public types.

Side notes: it is weird that the Prometheus implementation of kuma_sd_configs sets the `instance` label, since usually this label is set by Prometheus itself to __address__ after the relabeling phase. See https://www.robustperception.io/life-of-a-label/

Updates #3389

See prometheus/prometheus#7919 and prometheus/prometheus#8844 as a reference implementation in Prometheus
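The atomic.Value point in the commit message above can be sketched as follows. `labelStore` is a hypothetical type for this example, not the structure used in either implementation; it assumes one writer replaces the whole label set at once and readers treat the returned slice as read-only.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// labelStore holds the most recently generated label sets.
// The stored value is always of type []map[string]string.
type labelStore struct {
	v atomic.Value
}

// Set replaces the whole label set in one atomic step.
func (s *labelStore) Set(labels []map[string]string) {
	s.v.Store(labels)
}

// Get returns the current label set, or nil if nothing was stored yet.
// Callers must not modify the returned slice or maps.
func (s *labelStore) Get() []map[string]string {
	labels, _ := s.v.Load().([]map[string]string)
	return labels
}

func main() {
	var s labelStore
	s.Set([]map[string]string{{"__address__": "10.0.0.1:9090"}})

	// Concurrent readers need no mutex.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = s.Get()
		}()
	}
	wg.Wait()
	fmt.Println(len(s.Get()))
}
```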

austince changed the title ~~Feat/discovery xds~~ Add base xDS REST SD and kuma_sd implementation May 18, 2021

austince force-pushed the feat/discovery-xds branch 5 times, most recently from f2a4b33 to 0344ae4 Compare May 18, 2021 21:23

austince force-pushed the feat/discovery-xds branch from 0344ae4 to 91af23e Compare May 18, 2021 21:41

roidelapluie reviewed May 19, 2021

austince marked this pull request as draft May 19, 2021 22:13

This was referenced Jun 7, 2021

feat(mads) add support for HTTP long polling kumahq/kuma#2121

Merged

Add WithIdleConnTimeout HTTP client option prometheus/common#308

Merged

austince force-pushed the feat/discovery-xds branch 4 times, most recently from f915a0b to 222c82a Compare June 17, 2021 21:14

austince marked this pull request as ready for review June 17, 2021 21:37

austince force-pushed the feat/discovery-xds branch from 222c82a to cc939ac Compare June 17, 2021 23:36

austince requested a review from roidelapluie June 18, 2021 12:06

austince force-pushed the feat/discovery-xds branch from cc939ac to e381025 Compare June 28, 2021 13:59

austince force-pushed the feat/discovery-xds branch 2 times, most recently from 0547e44 to 56ee89f Compare June 30, 2021 21:00

austince force-pushed the feat/discovery-xds branch from 56ee89f to 42e32b3 Compare July 6, 2021 15:39

austince requested a review from roidelapluie July 6, 2021 15:59

roidelapluie reviewed Jul 6, 2021

austince force-pushed the feat/discovery-xds branch 2 times, most recently from af8aa55 to 5743d4b Compare July 7, 2021 20:16

austince force-pushed the feat/discovery-xds branch 2 times, most recently from 9348a27 to 1aab11d Compare July 20, 2021 20:20

roidelapluie reviewed Jul 20, 2021

austince force-pushed the feat/discovery-xds branch 2 times, most recently from 27342db to 9617595 Compare July 21, 2021 16:48

austince added 5 commits July 21, 2021 12:55

Extract and export GetFQDN()

5bdfba1

Signed-off-by: austin ce <austin.cawley@gmail.com>

Add base xDS discovery and kuma SD implementation

d0ffe2e

Signed-off-by: austin ce <austin.cawley@gmail.com>

Add tests for xDS discovery

0544bdd

Signed-off-by: austin ce <austin.cawley@gmail.com>

Add config tests for kuma SD

bbc951f

Signed-off-by: austin ce <austin.cawley@gmail.com>

Add documentation for kuma_sd configuration

3593b20

Signed-off-by: austin ce <austin.cawley@gmail.com>

austince force-pushed the feat/discovery-xds branch from 9617595 to 3593b20 Compare July 21, 2021 16:55

austince requested a review from roidelapluie July 21, 2021 18:31

roidelapluie approved these changes Jul 23, 2021

roidelapluie merged commit 79d354a into prometheus:main Jul 23, 2021

austince deleted the feat/discovery-xds branch July 23, 2021 14:12

austince mentioned this pull request Aug 2, 2021

Provide even more native integration with Prometheus kumahq/kuma#961

Closed
