
[Autoscaling] Add Elasticsearch autoscaling controller #4173

Merged (21 commits) on Feb 11, 2021

Conversation

@barkbay (Contributor) commented on Jan 29, 2021

This PR introduces the Elasticsearch autoscaling controller.

This new controller manages the autoscaling policies, periodically polls the _autoscaling/capacity API, and automatically adjusts the resources of the node sets.
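
For reference, you can query that API directly to see what the controller consumes. A minimal sketch, using the same ${ES} and ${PASSWORD} placeholders as the Rally command in the Testing section below:

# show required and current capacity per autoscaling policy, as reported by the deciders
curl -sk -u "elastic:${PASSWORD}" "https://${ES}:9200/_autoscaling/capacity?pretty"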

Some parts of the code are not optimized yet; this PR is already large, so I tried to keep the implementation easy to understand.
I also tried to add comments where needed to help the reader/reviewer.

The reconcileInternal function in pkg/controller/autoscaling/elasticsearch/driver.go might be a good starting point for a review.

The following improvements will be added in dedicated PRs:

  • Add a warning in case of overlapping roles.
  • Manage limits using ratios; only resource requests are supported for now.
  • Add a stabilization period.
  • Make the poll period a setting; the current default is 60s.

Testing

This PR relies on the ES autoscaling client and requires activating a trial (or a valid license).
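
For example, one way to enable a trial with ECK is to create an enterprise trial license secret. This is only a sketch: it assumes the operator runs in the default elastic-system namespace, and the secret name is arbitrary (the operator matches on the label and EULA annotation); see the ECK licensing docs for the authoritative steps.

# create a trial license secret the operator can pick up (name is arbitrary)
kubectl create secret generic eck-trial-license -n elastic-system
kubectl label secret eck-trial-license -n elastic-system license.k8s.elastic.co/type=enterprise_trial
kubectl annotate secret eck-trial-license -n elastic-system elastic.co/eula=accepted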

Deciders are documented here.

Data tier

Unfortunately I don't have strong advice for testing the storage deciders.
You can try to use Rally, but note that it will only exercise the reactive decider:

esrally race --pipeline=benchmark-only --track=eventdata --track-repository=eventdata --challenge=index-logs-fixed-daily-volume \
             --track-params="bulk_size:100,number_of_replicas:1,bulk_indexing_clients:1,daily_logging_volume:1GB" \
             --target-hosts=https://${ES}:9200 \
             --client-options="use_ssl:true,verify_certs:false,basic_auth_user:'elastic',basic_auth_password:$PASSWORD"

You can also deploy some agents to create data streams.
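
If you prefer not to deploy agents, you can also create a data stream by hand and index into it repeatedly to generate volume. An illustrative sketch using standard Elasticsearch APIs; the template and data stream names are arbitrary:

# index template enabling a data stream for the (arbitrary) pattern below
curl -sk -u "elastic:${PASSWORD}" -H 'Content-Type: application/json' \
     -XPUT "https://${ES}:9200/_index_template/logs-autoscaling-test" \
     -d '{"index_patterns":["logs-autoscaling-test*"],"data_stream":{}}'

# index documents (repeat/loop to generate enough volume for the storage deciders)
curl -sk -u "elastic:${PASSWORD}" -H 'Content-Type: application/json' \
     -XPOST "https://${ES}:9200/logs-autoscaling-test/_doc" \
     -d '{"@timestamp":"2021-02-10T00:00:00Z","message":"autoscaling test"}'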

ML

You can use the following API call to create an ML job (it requires the flights sample data). Then run POST _ml/data_frame/analytics/mljob/_start to start the job.
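
The original call is not reproduced above; a minimal illustrative equivalent (assuming the kibana_sample_data_flights sample data is loaded, and reusing the mljob name from the start command) would be:

# illustrative outlier detection job on the flights sample data
curl -sk -u "elastic:${PASSWORD}" -H 'Content-Type: application/json' \
     -XPUT "https://${ES}:9200/_ml/data_frame/analytics/mljob" \
     -d '{"source":{"index":"kibana_sample_data_flights"},"dest":{"index":"mljob-dest"},"analysis":{"outlier_detection":{}}}'

# then start it
curl -sk -u "elastic:${PASSWORD}" -XPOST "https://${ES}:9200/_ml/data_frame/analytics/mljob/_start"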

Logs and status

The expected resources for the node sets, along with some useful messages, are printed in the logs and also stored in the elasticsearch.alpha.elastic.co/autoscaling-status annotation. You can use the following command to retrieve it:

kubectl get -n <namespace> elasticsearch.elasticsearch.k8s.elastic.co/mycluster -o jsonpath='{.metadata.annotations.elasticsearch\.alpha\.elastic\.co/autoscaling-status}' | jq
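
To follow the log side, a sketch assuming a default ECK install, where the operator runs as the elastic-operator StatefulSet in the elastic-system namespace:

# tail the operator logs and filter for autoscaling messages
kubectl logs -n elastic-system statefulset/elastic-operator -f | grep -i autoscaling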

@barkbay added the >feature, v1.5.0, and autoscaling labels on Jan 29, 2021
@barkbay force-pushed the autoscaling-pr/controller branch from 172cc8a to 0e99651 on January 29, 2021 at 15:22
@pebrc (Collaborator) left a comment:

I am still working through this, but I thought I'd share a first set of comments. All minor.

Review comments (outdated, resolved) on pkg/controller/autoscaling/elasticsearch/controller.go, pkg/controller/autoscaling/elasticsearch/status/actual.go, and pkg/controller/autoscaling/elasticsearch/driver.go.
@charith-elastic (Contributor) left a comment:

It's a lot to digest and I won't claim to have understood everything. But, overall LGTM 👍🏼

Review comments (outdated, resolved) on pkg/controller/autoscaling/elasticsearch/controller.go and pkg/controller/autoscaling/elasticsearch/driver.go.
@pebrc (Collaborator) left a comment:

I think I am 2/3 through ...

Review comments (outdated, resolved) on pkg/controller/elasticsearch/driver/autoscaling.go, including this snippet:
requiredCPUCapacityAsMilli := cpuRange.Min.MilliValue() + requiredAdditionalCPUCapacity

// Round up CPU to the next full core (1000 millicores)
requiredCPUCapacityAsMilli = roundUp(requiredCPUCapacityAsMilli, 1000)
@pebrc (Collaborator):

Should we allow CPU capacity in smaller increments than one core?

@barkbay (Contributor, author):

I don't have a strong opinion. I have always preferred rounded values because it makes it easier to reason about the performance model (e.g. how do you size thread pools in the JVM with a fractional number of cores?). But I agree that the user might want more accurate resource requests.
@elastic/cloud-k8s I'm curious if there are other opinions.

Reply:

For Elasticsearch or Kibana we would round up CPU capacity to an integer, but that might be too much for lighter components, for example Beats.
Also, if you just want to run the whole ECK stack on a laptop, a less granular option to define resource limits would be preferable, just to enable local testing.

@pebrc (Collaborator) left a comment:

Impressive work! I think I went through everything once, but I haven't tested it as this depends on the API PR, IIUC.

@barkbay (author) commented on Feb 10, 2021

Thanks all for the time you spent on this PR; I know it is not an easy one 🙇

Quick update:

@pebrc (Collaborator) left a comment:

LGTM! I only ran a few tests overnight, and storage autoscaling worked as expected. I had hoped to do some more testing, but I believe we can merge this as is and iterate if necessary.
