[Autoscaling] Add Elasticsearch autoscaling controller #4173
Conversation
I am still working through this, but I thought I'd share a first set of comments. All minor.
It's a lot to digest and I won't claim to have understood everything, but overall LGTM 👍🏼
I think I am 2/3 through ...
requiredCPUCapacityAsMilli := cpuRange.Min.MilliValue() + requiredAdditionalCPUCapacity

// Round up CPU to the next core
requiredCPUCapacityAsMilli = roundUp(requiredCPUCapacityAsMilli, 1000)
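For readers following along without the full diff: roundUp itself is not shown in this hunk. A minimal sketch of what such a helper could look like, taking the name and signature from the call above (the body is an assumption, not necessarily the PR's implementation):

// roundUp rounds value up to the next multiple of factor,
// e.g. roundUp(2300, 1000) == 3000. Hypothetical implementation.
func roundUp(value, factor int64) int64 {
	remainder := value % factor
	if remainder == 0 {
		return value
	}
	return value + factor - remainder
}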
Should we allow CPU capacity in smaller increments than one core?
I don't have a strong opinion. I have always preferred rounded values because they make it easier to reason about the performance model (e.g. how do you size thread pools in the JVM with a fractional number of cores?). But I agree that the user might want more accurate resource requests.
@elastic/cloud-k8s I'm curious if there are other opinions.
For Elasticsearch or Kibana we would round up CPU capacity to an integer, but that might be too much for lighter components, for example Beats.
Also, if you just want to run the whole ECK stack on a laptop, users would prefer a finer-grained option for defining resource limits, just to enable them to test locally.
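For illustration, the Kubernetes API already models sub-core CPU as millicores, so finer increments are representable; a small, self-contained sketch (the quantities are made up for the example):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// A fractional CPU request expressed in millicores (2.5 cores).
	cpu := resource.MustParse("2500m")
	fmt.Println(cpu.MilliValue()) // 2500

	// Rounding up to the next full core, as the code under review does.
	rounded := ((cpu.MilliValue() + 999) / 1000) * 1000
	fmt.Println(resource.NewMilliQuantity(rounded, resource.DecimalSI)) // 3
}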
Impressive work! I think I went through everything once, but I haven't tested it since this depends on the API PR, IIUC.
Thanks all for the time you spent on this PR, I know it is not an easy one 🙇 Quick update:
LGTM! I only did a few tests overnight and storage autoscaling worked as expected. I had hoped to do some more tests, but I believe we can merge this as is and iterate if necessary.
This PR introduces the Elasticsearch autoscaling controller.
This new controller manages the autoscaling policies, periodically polls the
_autoscaling/capacity
API, and automatically adjusts the resources in the node sets. Some parts of the code are not optimized; this PR is large, so I tried to keep the implementation easy to understand.
I also tried to add comments where needed to help the reader/reviewer.
The
reconcileInternal
function in pkg/controller/autoscaling/elasticsearch/driver.go
might be a good starting point for a review.

The following improvements will be added in dedicated PRs:
Testing
This PR relies on the ES autoscaling client and requires activating a trial (or a valid license).
Deciders are documented here.
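If you want to inspect what the controller is reacting to, you can poll the same capacity API by hand. A rough sketch with the Go standard library (the endpoint, credentials and generic decoding are assumptions for local testing, not the client code from this PR; it assumes the cluster is reachable, e.g. via kubectl port-forward, and that its certificate is trusted):

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Hypothetical local endpoint and credentials.
	req, err := http.NewRequest(http.MethodGet, "https://localhost:9200/_autoscaling/capacity", nil)
	if err != nil {
		panic(err)
	}
	req.SetBasicAuth("elastic", "changeme")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Decode generically and pretty-print; the interesting part is the
	// required capacity reported for each autoscaling policy.
	var body map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		panic(err)
	}
	out, _ := json.MarshalIndent(body, "", "  ")
	fmt.Println(string(out))
}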
Data tier
Unfortunately I don't have strong advice for testing the storage deciders.
You can try to use Rally, but note that it will only exercise the reactive decider.
You can also deploy some agents to create data streams.
ML
You can use the following API call to create an ML job (it requires the flights sample data). Then do a
POST _ml/data_frame/analytics/mljob/_start
to start the job.

Logs and status
The expected resources for the node sets and some useful messages are printed in the logs and also stored in the
elasticsearch.alpha.elastic.co/autoscaling-status
annotation. You can use the following command to retrieve it:

kubectl get -n elasticsearch.elasticsearch.k8s.elastic.co/mycluster -o jsonpath='{.metadata.annotations.elasticsearch\.alpha\.elastic\.co/autoscaling-status}' | jq