Make ECK IPv6 compatible #3654

pebrc · 2020-08-24T09:46:04Z

Summary of changes

put literal IPv6 addresses in brackets when used in URLs or literal form (e.g in readiness probes)
switch to correct variant for well known IPs like loopback or zero-valued IP
allow IPv6 in SAN validation and CSR
switch from IP base network.publish_host to DNS based:
- Reason: we used to inject the Pod IP via the downward API, this API does not correctly bracket IPv6 addresses leading to invalid configuration
- we were discussing to switch to DNS based publish_host anyway
- sticking to IPs would require us to change the publish_host once the Pod gets its IP and force another delete/create cycle for every node, which seems worse than just using a DNS name to begin with
Kibana and Enterprise Search are sensitive about the correct specification of the 'unspecified' ie. all-zero IP address (INADDR_ANY) to bind to all addresses on the local machine
- this PR introduces some form of IPv6 autodetection on operator startup by injecting the operator IP into the environment
- this information is then passed on into the configuration generation code for both applications to insert the correct IP
- we could amend this mechanism in the future by for example introducing an IPFamily attribute on the corresponding CRDs to allow users to express a preference in dual-stack environments or add a global flag on the operator level to do the same.
59c217f changes the discovery mechanism from IPs to DNS names. This change is not strictly necessary to support IPv6 I only changed it for consistency (see also Anya's comment Make ECK IPv6 compatible #3654 (comment)) It might carry some risk for slow cluster formation due to cached negative DNS lookups of not (yet) existing Pods and I am more than happy to revert it. We discussed out-of-band and the only potential benefit is that we might be able to remove the IP addresses from the certificates we generate and reduce the number of times we have to generate the certificates. At the same time this benefit is probably limited as certificates are one of the things Elasticsearch can reload on the fly. @nkvoll what are your thoughts in favour or against DNS based discovery?

Implications of switch to DNS names

@sebgl 's questions from #2830 still apply:

can DNS caching be an issue for nodes to discover themselves?

yes, negative lookup caching can be an issue: if a Pod was just created we might start seeing a delay until its name can be resolved if some entity did a negative lookup just before it was created

are there cases (security constraints) where Pods are not allowed/not supposed to rely on DNS?

Idk

any issue with the DNS server will lead to Elasticsearch not behaving correctly: are we ok with that?

I would assume that any issues with k8s DNS server will have more widespread effects than just Elasticsearch not behaving correctly, so I would say yes

what's the correct DNS name to use? , .

I believe it is <pod-name>.<statefulset-name> based on this

Breaking?

Should we mark this as breaking because we now include non-ready Pods in the headless service. I am on the fence because I believe the headless service should be considered an implementation detail and it is not meant for "public consumption". Having said that I am sure that there are users already relying on it, so at the very least we should highlight it in the release notes.

Testing

You need to run this against kind under Linux to test the IPv6 part (or set up your own vanilla IPv6 k8s from scratch).
I used the following spec:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: ipv6
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker

after reconfiguring Docker to use IPv6
cat /etc/docker/daemon.json

{
  "ipv6": true,
  "fixed-cidr-v6": "2001:db8:1::/64"
}

I am thinking we should introduce another test job to run the e2e tests against a IPv6 kind cluster (or change the existing job to IPv6) Will be a separate PR though

anyasabo · 2020-08-26T14:40:36Z

pkg/controller/elasticsearch/nodespec/statefulset.go

@@ -54,6 +54,8 @@ func HeadlessService(es *esv1.Elasticsearch, ssetName string) corev1.Service {
 					Port:     network.HTTPPort,
 				},
 			},
+			// allow nodes to discover themselves via DNS while they are booting up ie. are not ready yet
+			PublishNotReadyAddresses: true,


I think we will want to flag this in the release notes since it could cause issues if people were relying on the old behavior

pkg/controller/elasticsearch/settings/masters_test.go

anyasabo · 2020-08-26T14:50:35Z

This approach LGTM 👍

Question: Should we use DNS names in discovery as well to be consistent?

I think yes? But I haven't thought about it too much.

This PR also allows us to pick up #2833 since it adds DNS names to the cert

anyasabo · 2020-08-27T14:52:20Z

I was curious what the potential effects of negative caching might have and it became a bit complicated. CoreDNS by default has a 5s TTL. BUT the default config shipped with k8s has a 30s TTL (to match the dnsmasq based kubedns).

If you run coredns in cache mode (as you would for node local DNS), the default is 30 minutes for nxdomain responses. But in GKE, they configure node local DNS to have a 5s TTL for nxdomain responses.

Openshift (v4 at least) also runs with a 30s TTL.

It appears like most configs would have a 5-30s TTL, which seems fine to me.

pebrc · 2020-08-27T14:55:06Z

Elasticsearch/the JVM caches negative lookups for 10 seconds and positive ones for 60 in recent versions https://www.elastic.co/guide/en/elasticsearch/reference/current/networkaddress-cache-ttl.html

barkbay

I'm still doing some tests. I just left a minor comment but feel free to ignore it as I'm not an IPv6 expert.

barkbay · 2020-08-31T09:03:47Z

pkg/utils/net/ip_test.go

+			args: args{
+				ipStr: "::FFFF:129.144.52.38",
+			},
+			want: corev1.IPv4Protocol,


nit: I must confess I spent some time trying to understand the differences between IPv4-embedded IPv6 and IPv4-mapped IPv6, and wondering if this is what I would expect from ToIPFamily since ::FFFF:129.144.52.38 is an ipv6 address.

But I think I'm 👍 with the current code.

Yes I found that confusing too, but I am following net.ParseIP 's behaviour here.

barkbay · 2020-08-31T09:36:00Z

pkg/utils/net/ip.go

-func MaybeIPTo4(ipAddress net.IP) net.IP {
-	if ip := ipAddress.To4(); ip != nil {
+// IPToRFCForm normalizes the IP address given to fit the expected network byte order octet form described in
+// https://tools.ietf.org/html/rfc5280#section-4.2.1.6


IIUC in addition to the byte ordering requirement we also want to avoid compressed zero form of IPv6 addresses using "::" ?
Maybe we can also explain this in the comment ?

Well it is not removing the compressed form actually. Which is fine , with the exception of Enterprise Search which has currently an issue with compressed IPv6 IPs. I missed this because I was testing with manually configured overrides 😞 So thanks for pointing this out, will fix in another commit.

nkvoll · 2020-09-01T10:03:20Z

I think DNS based discovery makes sense. ES wants to discover a node with a name (which also usually has a PVC attached). The Pod is ephemeral and doesn't matter, and thus the same ephemeralness applies to its IP. Having bad TTLs for positive and negative hits seem like an operator error that also would cause issues for other services.

I would vote for creating another specific issue/PR for the switch to DNS-based discovery so it gets a little more (deserved) visibility.

This reverts commit 59c217f.

sebgl · 2020-09-04T08:40:48Z

cmd/manager/main.go

@@ -440,6 +441,7 @@ func startOperator(stopChan <-chan struct{}) error {
 	}
 	params := operator.Parameters{
 		Dialer:            dialer,
+		IPFamily:          net.ToIPFamily(os.Getenv(settings.EnvPodIP)),


I'm wondering if we should include that as an operator config setting, that defaults to autodetect if empty.
eg. --ip-family=ipv4|ipv6|""
So users can override it if for some reasons the IP family of the operator is not the ip family they want to use for the managed workload.

I thought this would be a future addition. But I think you are right it would be good to have an escape hatch right from the start. Will add it.

sebgl · 2020-09-04T08:41:35Z

pkg/controller/enterprisesearch/config.go

+	if ipFamily == corev1.IPv4Protocol {
+		return net.IPv4zero.String()
+	}
+	// Enterprise Search even in its most recent version 7.9.0 cannot properly handle contracted IPv6 addresses like "::"


sounds like we should open an issue in the enterprise search repo

Already done.

barkbay

I did some tests, deployed Elasticsearch,Kibana,APM Server, Enterprise Search and filebeats on Kind as suggested, it worked as expected.

Also 👍 to add a dedicated e2e test pipeline

Regarding questions around DNS caching I'm not sure I have a good answer. It might happen at the infra level, I guess understanding how it would disrupt Pods being restarted requires to understand how publish_host is used by the stack and the clients ?
Should we do some tests on an infra which makes an aggressive use of DNS caching to understand the edge cases ?

pkg/controller/elasticsearch/nodespec/readiness_probe.go

pkg/controller/kibana/config_settings.go

pebrc added 7 commits August 21, 2020 14:51

Support IPv6 in discovery configuration

1875fe8

Allow IPv6 in CSR (no tests)

8679497

experimental bracketing

fcdb4b9

experimental DNS based network.publish_host

ad05b50

experimental addition of pods DNS name to certs DNSNames

7f79e5d

fix unit tests

e71bea3

remove IPv4 form from validations

d55c409

botelastic bot added the triage label Aug 24, 2020

pebrc added the >bug Something isn't working label Aug 24, 2020

botelastic bot removed the triage label Aug 24, 2020

pebrc added the v1.3.0 label Aug 24, 2020

pebrc added 5 commits August 26, 2020 11:57

Autodetect IP family and use for Kibana and Enterprise Search

bbb183d

update readiness probes to use correct loopback address

b161280

fix missing fieldRef in manifest-generator

ecc9ed6

turn globbing off in readiness script to allow for unescaped brackets

cb550f5

turn globbing off in readiness script to allow for unescaped brackets

8a9b1e8

anyasabo reviewed Aug 26, 2020

View reviewed changes

pkg/controller/elasticsearch/settings/masters_test.go Show resolved Hide resolved

anyasabo added the release-highlight Candidate for the ECK release highlight summary label Aug 26, 2020

pebrc added 3 commits August 26, 2020 21:18

test fixes and test additions

6861f42

test IPv6 path in Kibana config

283aa79

Move discovery to DNS names

59c217f

pebrc marked this pull request as ready for review August 27, 2020 12:58

pebrc added >breaking and removed >breaking labels Aug 27, 2020

barkbay reviewed Aug 31, 2020

View reviewed changes

pebrc added 2 commits September 1, 2020 13:40

workaround Enteprise Search IPv6 issue

e59e3aa

Revert "Move discovery to DNS names"

d481ca5

This reverts commit 59c217f.

sebgl reviewed Sep 4, 2020

View reviewed changes

Add ip-family flag

c169ac4

barkbay approved these changes Sep 4, 2020

View reviewed changes

pkg/controller/elasticsearch/nodespec/readiness_probe.go Outdated Show resolved Hide resolved

pkg/controller/kibana/config_settings.go Outdated Show resolved Hide resolved

pebrc added 2 commits September 4, 2020 14:45

Reorg imports

eb3d488

Merge remote-tracking branch 'upstream/master' into ipv6

19af6d6

pebrc merged commit 8cf990a into elastic:master Sep 4, 2020

pebrc mentioned this pull request Sep 4, 2020

Move discovery to DNS names #3712

Closed

david-kow added the >breaking label Nov 5, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make ECK IPv6 compatible #3654

Make ECK IPv6 compatible #3654

pebrc commented Aug 24, 2020 •

edited

Loading

anyasabo Aug 26, 2020

anyasabo commented Aug 26, 2020 •

edited

Loading

anyasabo commented Aug 27, 2020

pebrc commented Aug 27, 2020

barkbay left a comment

barkbay Aug 31, 2020 •

edited

Loading

pebrc Sep 1, 2020

barkbay Aug 31, 2020

pebrc Sep 1, 2020

nkvoll commented Sep 1, 2020

sebgl Sep 4, 2020

pebrc Sep 4, 2020

sebgl Sep 4, 2020

pebrc Sep 4, 2020

barkbay left a comment

Make ECK IPv6 compatible #3654

Make ECK IPv6 compatible #3654

Conversation

pebrc commented Aug 24, 2020 • edited Loading

Summary of changes

Implications of switch to DNS names

Breaking?

Testing

Choose a reason for hiding this comment

anyasabo commented Aug 26, 2020 • edited Loading

anyasabo commented Aug 27, 2020

pebrc commented Aug 27, 2020

barkbay left a comment

Choose a reason for hiding this comment

barkbay Aug 31, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nkvoll commented Sep 1, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

barkbay left a comment

Choose a reason for hiding this comment

pebrc commented Aug 24, 2020 •

edited

Loading

anyasabo commented Aug 26, 2020 •

edited

Loading

barkbay Aug 31, 2020 •

edited

Loading