Switch transport TLS certs verification to full #2833

Open
sebgl opened this issue Apr 8, 2020 · 0 comments
Labels
>enhancement Enhancement of existing functionality

sebgl commented Apr 8, 2020

Related to #2823.

We experimented with "full" TLS verification in #2659, then switched back to "certificate" in #2831.

A bug in Elasticsearch causes a node to present an outdated certificate with the wrong IP address during and after rolling upgrades.
When the bug occurs with full TLS verification enabled, the IP address in the certificate does not match the node's actual address, so nodes cannot trust each other.

We eventually want to reintroduce full verification, but we first need to make sure we handle the bug above correctly. One option is to rely on DNS names instead of IP addresses; another is to check the IP address in the certificate before starting Elasticsearch.
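The second option could look roughly like the sketch below: parse the certificate and confirm that the Pod's current IP is among its IP subject alternative names before Elasticsearch starts. This is a minimal illustration using only the Go standard library, not the actual ECK implementation. The function names `certCoversIP` and `selfSignedCert` and the sample IPs are assumptions made for the example.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"net"
	"time"
)

// certCoversIP reports whether the first PEM-encoded certificate in certPEM
// lists podIP among its IP subject alternative names.
// Hypothetical helper: the name and signature are assumptions for this sketch.
func certCoversIP(certPEM []byte, podIP string) (bool, error) {
	ip := net.ParseIP(podIP)
	if ip == nil {
		return false, fmt.Errorf("invalid IP %q", podIP)
	}
	block, _ := pem.Decode(certPEM)
	if block == nil || block.Type != "CERTIFICATE" {
		return false, errors.New("no PEM certificate found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return false, err
	}
	for _, san := range cert.IPAddresses {
		if san.Equal(ip) {
			return true, nil
		}
	}
	return false, nil
}

// selfSignedCert builds a throwaway certificate with a single IP SAN.
// It exists only to exercise certCoversIP in this example.
func selfSignedCert(ip string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "example-node"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		IPAddresses:  []net.IP{net.ParseIP(ip)},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	// The certificate was issued for 10.0.0.5; the second IP simulates a Pod
	// that came back with a new address but still holds the old certificate.
	certPEM, err := selfSignedCert("10.0.0.5")
	if err != nil {
		panic(err)
	}
	for _, podIP := range []string{"10.0.0.5", "10.0.0.9"} {
		ok, err := certCoversIP(certPEM, podIP)
		if err != nil {
			panic(err)
		}
		fmt.Printf("pod IP %s covered by certificate: %v\n", podIP, ok)
	}
}
```

A check like this would presumably run before the Elasticsearch process starts, and would wait or fail while the certificate on disk does not yet match the Pod IP.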

When we reintroduce full verification, we must make sure we don't break existing clusters impacted by the bug, even if a workaround for that bug is in place. The scenario we want to avoid is the following:

  • the change to full TLS verification (together with the TLS cert bugfix) triggers a rolling upgrade of the cluster
  • the first Pod restarts with full verification enabled: it cannot contact the other nodes in the cluster that are still impacted by the TLS cert bug
  • the rolling upgrade can never progress

We may want to perform a double rolling upgrade instead, where we first upgrade all nodes to include the TLS certs bugfix, then upgrade all nodes to enable full TLS verification:

  • a cluster was created with ECK 1.0; it may or may not include Pods impacted by the bug, which serve a certificate with the wrong IP
  • ECK is upgraded to a version that includes the init container fix
  • the cluster gets a rolling upgrade; once it is done, we are fairly confident no Pod is impacted by the TLS certs bug. At this point TLS verification is still set to "certificate".
  • the cluster gets a second rolling upgrade to set TLS verification to "full", because we know no Pod is impacted by the TLS certs bug.
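The gating logic behind this double rolling upgrade can be sketched as follows: the cluster stays on "certificate" verification until every Pod is known to carry the TLS certs fix, and only then moves to "full". This is an illustration of the idea rather than a proposed implementation. The `pod` type, its `hasCertFix` field and the `verificationMode` function are hypothetical, and how ECK would actually track whether a Pod has the fix is left open.

```go
package main

import "fmt"

// pod is a hypothetical, simplified view of an Elasticsearch Pod.
type pod struct {
	name       string
	hasCertFix bool // assumption: some marker records that the Pod runs with the TLS certs fix
}

// verificationMode returns the transport TLS verification mode to apply.
// It returns "full" only when every Pod already has the fix, so the first
// restarted Pod is never the only one enforcing full verification against
// nodes that may still serve a certificate with the wrong IP.
// An empty cluster has no impacted Pod, so it gets "full" directly.
func verificationMode(pods []pod) string {
	for _, p := range pods {
		if !p.hasCertFix {
			return "certificate"
		}
	}
	return "full"
}

func main() {
	// First rolling upgrade still in progress: one Pod lacks the fix.
	midUpgrade := []pod{{"es-0", true}, {"es-1", false}, {"es-2", true}}
	fmt.Println(verificationMode(midUpgrade))

	// First rolling upgrade done: the second one can switch to full verification.
	upgraded := []pod{{"es-0", true}, {"es-1", true}, {"es-2", true}}
	fmt.Println(verificationMode(upgraded))
}
```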