Make ECK IPv6 compatible #3654
Conversation
```go
@@ -54,6 +54,8 @@ func HeadlessService(es *esv1.Elasticsearch, ssetName string) corev1.Service {
			Port: network.HTTPPort,
		},
	},
	// allow nodes to discover themselves via DNS while they are booting up ie. are not ready yet
	PublishNotReadyAddresses: true,
```
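For context, the consumer-visible effect on the generated headless Service is roughly the following (a sketch; the name and omitted fields are illustrative, not ECK's exact output):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-es-default        # illustrative name
spec:
  clusterIP: None                 # headless
  publishNotReadyAddresses: true  # DNS records even for not-yet-ready Pods
  ports:
    - port: 9200                  # network.HTTPPort
```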
I think we will want to flag this in the release notes since it could cause issues if people were relying on the old behavior
This approach LGTM 👍
I think yes? But I haven't thought about it too much. This PR also allows us to pick up #2833 since it adds DNS names to the cert
I was curious what the potential effects of negative caching might be, and it became a bit complicated. CoreDNS by default has a 5s TTL. BUT the default config shipped with k8s has a 30s TTL (to match the dnsmasq-based kube-dns). If you run CoreDNS in cache mode (as you would for node-local DNS), the default is 30 minutes for NXDOMAIN responses. But in GKE, they configure node-local DNS to have a 5s TTL for NXDOMAIN responses. OpenShift (v4 at least) also runs with a 30s TTL. It appears most configs would have a 5-30s TTL, which seems fine to me.
Elasticsearch/the JVM caches negative lookups for 10 seconds and positive ones for 60 seconds in recent versions: https://www.elastic.co/guide/en/elasticsearch/reference/current/networkaddress-cache-ttl.html
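For reference, the linked page documents JVM system properties for tuning those TTLs via jvm.options; a sketch (the values shown are just the defaults it describes):

```
-Des.networkaddress.cache.ttl=60
-Des.networkaddress.cache.negative.ttl=10
```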
I'm still doing some tests. I just left a minor comment but feel free to ignore it as I'm not an IPv6 expert.
```go
args: args{
	ipStr: "::FFFF:129.144.52.38",
},
want: corev1.IPv4Protocol,
```
nit: I must confess I spent some time trying to understand the differences between IPv4-embedded IPv6 and IPv4-mapped IPv6 addresses, and wondering if this is what I would expect from `ToIPFamily`, since `::FFFF:129.144.52.38` is an IPv6 address. But I think I'm 👍 with the current code.
Yes, I found that confusing too, but I am following `net.ParseIP`'s behaviour here.
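To make that behaviour concrete, here is a minimal sketch of a `ToIPFamily` helper that mirrors `net.ParseIP`/`To4` semantics (an illustration, not necessarily the PR's exact implementation):

```go
import (
	"net"

	corev1 "k8s.io/api/core/v1"
)

// ToIPFamily guesses the IP family of an address string. Like net.IP.To4,
// it treats IPv4-mapped IPv6 addresses such as "::FFFF:129.144.52.38" as
// IPv4, since they carry a plain IPv4 address in their low 32 bits.
// Anything else (including unparseable input, a simplification in this
// sketch) is reported as IPv6.
func ToIPFamily(ipStr string) corev1.IPFamily {
	if ip := net.ParseIP(ipStr); ip.To4() != nil {
		return corev1.IPv4Protocol
	}
	return corev1.IPv6Protocol
}
```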
```go
func MaybeIPTo4(ipAddress net.IP) net.IP {
	if ip := ipAddress.To4(); ip != nil {
		return ip
	}
	return ipAddress
}
// IPToRFCForm normalizes the IP address given to fit the expected network byte order octet form described in
// https://tools.ietf.org/html/rfc5280#section-4.2.1.6
```
IIUC, in addition to the byte ordering requirement we also want to avoid the compressed zero form of IPv6 addresses using "::"? Maybe we can also explain this in the comment?
Well, it is not actually removing the compressed form, which is fine, with the exception of Enterprise Search, which currently has an issue with compressed IPv6 IPs. I missed this because I was testing with manually configured overrides 😞 So thanks for pointing this out, will fix in another commit.
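A hypothetical reading of what such a normalization looks like (assumed from the comment above, not the PR's verbatim code); note it operates on raw bytes, and IPv6 zero compression only exists in the string rendering, so compression is indeed untouched here:

```go
import "net"

// IPToRFCForm returns 4 octets for IPv4 (including IPv4-mapped
// addresses) and 16 octets for IPv6, in the network byte order form
// RFC 5280 section 4.2.1.6 expects for SAN iPAddress entries.
func IPToRFCForm(ip net.IP) net.IP {
	if v4 := ip.To4(); v4 != nil {
		return v4
	}
	return ip.To16()
}
```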
I think DNS-based discovery makes sense. ES wants to discover a node by name (which also usually has a PVC attached). The Pod is ephemeral and doesn't matter, and the same ephemeralness thus applies to its IP. Having bad TTLs for positive and negative hits seems like an operator error that would also cause issues for other services. I would vote for creating another specific issue/PR for the switch to DNS-based discovery so it gets a little more (deserved) visibility.
This reverts commit 59c217f.
cmd/manager/main.go
```go
@@ -440,6 +441,7 @@ func startOperator(stopChan <-chan struct{}) error {
	}
	params := operator.Parameters{
		Dialer:   dialer,
		IPFamily: net.ToIPFamily(os.Getenv(settings.EnvPodIP)),
```
I'm wondering if we should include that as an operator config setting which defaults to autodetect if empty, e.g. `--ip-family=ipv4|ipv6|""`, so users can override it if for some reason the IP family of the operator is not the IP family they want to use for the managed workload.
I thought this would be a future addition, but I think you are right: it would be good to have an escape hatch right from the start. Will add it.
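A sketch of what that escape hatch could look like (the function name is hypothetical, and `ToIPFamily` is the autodetection helper used in the diff above): an explicit flag value wins, an empty value falls back to autodetecting from the operator's own Pod IP.

```go
import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// resolveIPFamily maps the --ip-family flag to an IP family, falling
// back to autodetection from the operator Pod's IP when it is empty.
func resolveIPFamily(flagValue, podIP string) (corev1.IPFamily, error) {
	switch strings.ToLower(flagValue) {
	case "ipv4":
		return corev1.IPv4Protocol, nil
	case "ipv6":
		return corev1.IPv6Protocol, nil
	case "":
		return ToIPFamily(podIP), nil // autodetect, as before
	default:
		return "", fmt.Errorf("invalid --ip-family %q", flagValue)
	}
}
```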
```go
if ipFamily == corev1.IPv4Protocol {
	return net.IPv4zero.String()
}
// Enterprise Search even in its most recent version 7.9.0 cannot properly handle contracted IPv6 addresses like "::"
```
sounds like we should open an issue in the enterprise search repo
Already done.
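The workaround then presumably emits the uncompressed IPv6 spelling rather than the contracted form; a sketch (the helper name is hypothetical):

```go
import (
	"net"

	corev1 "k8s.io/api/core/v1"
)

// inAddrAnyFor returns the wildcard bind address for the given IP
// family, spelled uncompressed for IPv6 so that Enterprise Search does
// not trip over the contracted "::" form.
func inAddrAnyFor(ipFamily corev1.IPFamily) string {
	if ipFamily == corev1.IPv4Protocol {
		return net.IPv4zero.String() // "0.0.0.0"
	}
	return "0:0:0:0:0:0:0:0" // uncompressed spelling of net.IPv6zero ("::")
}
```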
I did some tests: deployed Elasticsearch, Kibana, APM Server, Enterprise Search and Filebeat on Kind as suggested, and it worked as expected.
Also 👍 to add a dedicated e2e test pipeline. Regarding the questions around DNS caching, I'm not sure I have a good answer. It might happen at the infra level; I guess understanding how it would disrupt Pods being restarted requires understanding how `publish_host` is used by the stack and the clients? Should we do some tests on an infra which makes aggressive use of DNS caching to understand the edge cases?
Fixes #3649

Summary of changes

- switched `network.publish_host` to DNS based:
  - nodes are addressed via `publish_host` anyway
  - the alternative would be to update `publish_host` once the Pod gets its IP and force another delete/create cycle for every node, which seems worse than just using a DNS name to begin with
- use the unspecified address (`INADDR_ANY`) to bind to all addresses on the local machine
- future work: add an `IPFamily` attribute on the corresponding CRDs to allow users to express a preference in dual-stack environments, or add a global flag on the operator level to do the same

Implications of switch to DNS names
@sebgl's questions from #2830 still apply:

- yes, negative lookup caching can be an issue: if a Pod was just created we might start seeing a delay until its name can be resolved if some entity did a negative lookup just before it was created
- Idk
- I would assume that any issues with the k8s DNS server will have more widespread effects than just Elasticsearch not behaving correctly, so I would say yes
- I believe it is `<pod-name>.<statefulset-name>`, based on this

Breaking?
Should we mark this as breaking because we now include non-ready Pods in the headless service? I am on the fence because I believe the headless service should be considered an implementation detail and is not meant for "public consumption". Having said that, I am sure there are users already relying on it, so at the very least we should highlight it in the release notes.
Testing
You need to run this against kind under Linux to test the IPv6 part (or set up your own vanilla IPv6 k8s from scratch).
I used the following spec:
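A minimal kind spec that enables IPv6 looks like this (sketched here assuming kind's v1alpha4 config API; the exact spec used may have differed):

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: ipv6
```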
after reconfiguring Docker to use IPv6
cat /etc/docker/daemon.json
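An illustrative `/etc/docker/daemon.json` enabling IPv6 (the `fixed-cidr-v6` prefix below is a documentation-range placeholder, not the value actually used):

```json
{
  "ipv6": true,
  "fixed-cidr-v6": "2001:db8:1::/64"
}
```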
I am thinking we should introduce another test job to run the e2e tests against an IPv6 kind cluster (or change the existing job to IPv6). Will be a separate PR though.