Skip to content
This repository has been archived by the owner on May 16, 2023. It is now read-only.

[elasticsearch] make service configurable #123

Merged
merged 12 commits into from
Jun 12, 2019
3 changes: 3 additions & 0 deletions elasticsearch/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,9 @@ helm install --name elasticsearch elastic/elasticsearch --version 7.0.1-alpha1 -
| `protocol` | The protocol that will be used for the readinessProbe. Change this to `https` if you have `xpack.security.http.ssl.enabled` set | `http` |
| `httpPort` | The http port that Kubernetes will use for the healthchecks and the service. If you change this you will also need to set [http.port](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-http.html#_settings) in `extraEnvs` | `9200` |
| `transportPort` | The transport port that Kubernetes will use for the service. If you change this you will also need to set [transport port configuration](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-transport.html#_transport_settings) in `extraEnvs` | `9300` |
| `service.type` | Type of elasticsearch service. [Service Types](https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types) | `ClusterIP` |
| `service.annotations` | Annotations that Kubernetes will use for the service. This will configure load balancer if `service.type` is `LoadBalancer` [Annotations](https://kubernetes.io/docs/concepts/services-networking/service/#ssl-support-on-aws) | `{}` |

kimxogus marked this conversation as resolved.
Show resolved Hide resolved
| `updateStrategy` | The [updateStrategy](https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#updating-statefulsets) for the statefulset. By default Kubernetes will wait for the cluster to be green after upgrading each pod. Setting this to `OnDelete` will allow you to manually delete each pod during upgrades | `RollingUpdate` |
| `maxUnavailable` | The [maxUnavailable](https://kubernetes.io/docs/tasks/run-application/configure-pdb/#specifying-a-poddisruptionbudget) value for the pod disruption budget. By default this will prevent Kubernetes from having more than 1 unhealthy pod in the node group | `1` |
| `fsGroup` | The Group ID (GID) for [securityContext.fsGroup](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/) so that the Elasticsearch user can read from the persistent volume | `1000` |
Expand Down
13 changes: 10 additions & 3 deletions elasticsearch/templates/service.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,15 @@ kind: Service
apiVersion: v1
metadata:
name: {{ template "uname" . }}
labels:
heritage: {{ .Release.Service | quote }}
release: {{ .Release.Name | quote }}
chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
app: "{{ template "uname" . }}"
annotations:
{{ toYaml .Values.service.annotations | indent 4 }}
spec:
type: {{ .Values.service.type }}
selector:
heritage: {{ .Release.Service | quote }}
release: {{ .Release.Name | quote }}
Expand All @@ -26,11 +34,10 @@ metadata:
release: {{ .Release.Name | quote }}
chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
app: "{{ template "uname" . }}"
annotations:
# Create endpoints also if the related pod isn't ready
service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
Copy link
Contributor Author

@kimxogus kimxogus May 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This annotation is deprecated kubernetes/kubernetes#63742

spec:
clusterIP: None # This is needed for statefulset hostnames like elasticsearch-0 to resolve
# Create endpoints also if the related pod isn't ready
publishNotReadyAddresses: true
kimxogus marked this conversation as resolved.
Show resolved Hide resolved
selector:
app: "{{ template "uname" . }}"
ports:
Expand Down
4 changes: 2 additions & 2 deletions elasticsearch/templates/statefulset.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -241,8 +241,8 @@ spec:

cleanup () {
while true ; do
local master="$(http "/_cat/master?h=node")"
if [[ $master == "{{ template "uname" . }}"* && $master != "${NODE_NAME}" ]]; then
local master="$(http "/_cat/master?h=node" || echo "")"
kimxogus marked this conversation as resolved.
Show resolved Hide resolved
if [[ $master && $master != "${NODE_NAME}" ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this? It just seems less safe than what was already in there. Yes it will fail if there is an empty string, however if the API returns anything else (or some kind of weird error) then it is going to be a non-empty string and this will pass.

Did you find an issue with the current logic that this fixes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I migrated es cluster from a cluster(all nodes have master, ingest and data role) to this chart. Previous master was not prefixed by uname, so I had to wait several minutes during rollout caused by configuration change when nodes from both old and new cluster exists before exclude old nodes from cluster shard allocation.
I think this is unnecessary because --fail option is passed to curl(https://github.com/elastic/helm-charts/blob/master/elasticsearch/templates/statefulset.yaml#L239), script will exit if an error is returned from API($master won't have a value).
As h=node query param is passed, $master string will only contain node name of master node if it's not empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was my explanation not enough? Then I can restore this because this only happens in specific migration case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is unnecessary because --fail option is passed to curl(https://github.com/elastic/helm-charts/blob/master/elasticsearch/templates/statefulset.yaml#L239), script will exit if an error is returned from API($master won't have a value).

Thanks for noticing this. This combination of set -e and curl --fail will mean that this script might exit early if the call fails. Ideally we want this loop to keep on running until we can see that a new master exists that isn't the current pod. Exiting early is not what we want to happen here.

My concern is whether or not the API will return an error when there is no master or not. If the API returns a weird message like "no master yet" with a 200 then it's possible for the script to exit too early. Looking at the code or Elasticsearch it looks like it will actually return a dash (-) if there is no master found.

Then I can restore this because this only happens in specific migration case.

Did it just mean that Kubernetes waited for the 120 second timeout while stopping each of the masters?

I think what actually makes a lot more sense is to check that it starts with {{ template "masterService" . }}. That way this will always point to the prefix for the right master even during migrations like in https://github.com/elastic/helm-charts/blob/master/elasticsearch/examples/migration/README.md

Was my explanation not enough?

Sorry for the slow reply. I'm currently on a work trip and only have a few quiet moments to sneak some work in. I'll be at Kubecon in Barcelona next week too so things will be slow for a little while. If you are at Kubecon come and say hi at the Elastic booth :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did it just mean that Kubernetes waited for the 120 second timeout while stopping each of the masters?

Yes, I had to wait 120 seconds per each master node.

I think what actually makes a lot more sense is to check that it starts with {{ template "masterService" . }}. That way this will always point to the prefix for the right master even during migrations like in /elasticsearch/examples/migration/README.md@master

Changed to {{ template "masterService" . }}!

I'm currently on a work trip and only have a few quiet moments to sneak some work in. I'll be at Kubecon in Barcelona next week too so things will be slow for a little while. If you are at Kubecon come and say hi at the Elastic booth :)

I'm not going to this Kubecon, but I hope to visit and say hi someday :)

echo "This node is not master."
break
fi
Expand Down
3 changes: 3 additions & 0 deletions elasticsearch/tests/elasticsearch_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -152,6 +152,8 @@ def test_defaults():
# Service
s = r['service'][uname]
assert s['metadata']['name'] == uname
assert s['metadata']['annotations'] == {}
assert s['spec']['type'] == 'ClusterIP'
assert len(s['spec']['ports']) == 2
assert s['spec']['ports'][0] == {
'name': 'http', 'port': 9200, 'protocol': 'TCP'}
Expand All @@ -161,6 +163,7 @@ def test_defaults():
# Headless Service
h = r['service'][uname + '-headless']
assert h['spec']['clusterIP'] == 'None'
assert h['spec']['publishNotReadyAddresses'] == true
assert h['spec']['ports'][0]['name'] == 'http'
assert h['spec']['ports'][0]['port'] == 9200
assert h['spec']['ports'][1]['name'] == 'transport'
Expand Down
4 changes: 4 additions & 0 deletions elasticsearch/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -113,6 +113,10 @@ protocol: http
httpPort: 9200
transportPort: 9300

service:
type: ClusterIP
annotations: {}

updateStrategy: RollingUpdate

# This is the max unavailable setting for the pod disruption budget
Expand Down