Elasticsearch should auto-determine appropriate machine memory settings #65025
Labels
:Core/Infra/Core
Core issues without another label
:Delivery/Cloud
Cloud-specific packaging and deployment
Team:Core/Infra
Meta label for core/infra team
Team:Delivery
Meta label for Delivery team
Background
Currently we expect our on-prem users to appropriately set the size of the heap and the allocation of memory to ML. We also have Cloud explicitly set the heap based on a number of factors. Both of these cases create a bad experience. Our users should not need to accrue knowledge of the internal memory needs of a node, including which features consume heap vs. native memory. This places a heavy burden on users getting started in production and upgrading, and it is an unrealistic ask of the users we want to adopt Elasticsearch.
Since Elasticsearch itself has knowledge of the typical demands of various node roles on heap vs. native memory, it should set the heap size and other relevant memory settings itself, depending on the node role and the memory available.
Additionally, to make autoscaling successful, we need the cluster to be able to ask Cloud for nodes with a specified total memory. To calculate the total memory required for ML nodes, we need to start from the memory required by the native process to satisfy the existing jobs and work back to the total memory required for the container.
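To make the "work back" step concrete, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that the heap takes a fixed percentage of container memory and that there is a fixed overhead on top of the native process; `HEAP_PERCENT`, `OVERHEAD_BYTES`, and the class and method names are hypothetical and not taken from Elasticsearch.

```java
public final class MlNodeSizingSketch {
    // Hypothetical placeholder values, chosen only to make the arithmetic visible.
    static final long HEAP_PERCENT = 40;                     // assumed heap share of container memory
    static final long OVERHEAD_BYTES = 200L * 1024 * 1024;   // assumed fixed non-heap, non-ML overhead

    /** Heap size implied by a given container size under the assumed rule (rounded down). */
    static long heapBytes(long containerBytes) {
        return containerBytes * HEAP_PERCENT / 100;
    }

    /**
     * Works back from the native memory the ML jobs need to the container size.
     * From total = native + overhead + (HEAP_PERCENT / 100) * total it follows that
     * total = (native + overhead) * 100 / (100 - HEAP_PERCENT), rounded up.
     */
    static long requiredContainerBytes(long nativeBytes) {
        if (nativeBytes < 0) {
            throw new IllegalArgumentException("nativeBytes must be non-negative");
        }
        long numerator = (nativeBytes + OVERHEAD_BYTES) * 100;
        long denominator = 100 - HEAP_PERCENT;
        return (numerator + denominator - 1) / denominator; // integer ceiling division
    }

    public static void main(String[] args) {
        long nativeBytes = 1024L * 1024 * 1024; // 1 GiB needed by the native process
        long total = requiredContainerBytes(nativeBytes);
        System.out.println("container=" + total + " heap=" + heapBytes(total));
    }
}
```

Because the container size is rounded up and the heap is rounded down, the memory left after the heap is never smaller than what the native process plus the assumed overhead requires.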
Proposal
Quite simply, the appropriate memory settings for heap and ML are a function of the role(s) of the node and the available system/container memory. The latter information we have (or can easily get) already, but our JVM ergonomics code currently runs before settings parsing, so we do not know which roles are applied to the given node. The existing settings parsing logic is complex and rather heavyweight, as it requires loading all installed plugins (since they might register their own settings). This is overkill for our purposes, so the way forward should be to implement the minimum logic required to parse `elasticsearch.yml` and pull out the node roles.
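As a rough illustration of what "minimum required logic" could look like, the sketch below reads only a top-level `node.roles` entry, in either flow form (`[ data, ml ]`) or block-list form, and then picks a heap size from the roles and the available memory. It is not a full YAML parser (it ignores nested `node:` mappings and `#` characters inside quotes), the class and method names are hypothetical, and the percentages are placeholders rather than values proposed in this issue.

```java
import java.util.ArrayList;
import java.util.List;

public final class NodeRolesSketch {
    private static final String KEY = "node.roles:";

    /** Returns the roles listed under node.roles, or null if the setting is absent. */
    static List<String> parseRoles(List<String> yamlLines) {
        List<String> roles = null;
        boolean inBlockList = false;
        for (String raw : yamlLines) {
            String line = stripComment(raw);
            if (line.trim().isEmpty()) {
                continue;
            }
            if (inBlockList) {
                String item = line.trim();
                if (item.startsWith("-")) {
                    roles.add(unquote(item.substring(1).trim()));
                    continue;
                }
                inBlockList = false; // first non-item line ends the block list
            }
            if (line.startsWith(KEY)) {
                roles = new ArrayList<>();
                String value = line.substring(KEY.length()).trim();
                if (value.isEmpty()) {
                    inBlockList = true; // roles follow as "- role" lines
                } else if (value.startsWith("[") && value.endsWith("]")) {
                    for (String item : value.substring(1, value.length() - 1).split(",")) {
                        String role = unquote(item.trim());
                        if (!role.isEmpty()) {
                            roles.add(role);
                        }
                    }
                } else {
                    roles.add(unquote(value));
                }
            }
        }
        return roles;
    }

    /** Simplification: treats every '#' as the start of a comment. */
    private static String stripComment(String line) {
        int hash = line.indexOf('#');
        return hash >= 0 ? line.substring(0, hash) : line;
    }

    private static String unquote(String s) {
        boolean quoted = s.length() >= 2
            && (s.startsWith("\"") && s.endsWith("\"") || s.startsWith("'") && s.endsWith("'"));
        return quoted ? s.substring(1, s.length() - 1) : s;
    }

    /** Placeholder rule: a dedicated ML node keeps more memory free for native processes. */
    static long heapBytes(List<String> roles, long availableBytes) {
        boolean mlOnly = roles != null && roles.size() == 1 && roles.contains("ml");
        long percent = mlOnly ? 25 : 50; // hypothetical percentages
        return availableBytes * percent / 100;
    }
}
```

For example, `parseRoles` applied to the single line `node.roles: [ data, ml ]` yields the two roles `data` and `ml`, and a file without the setting yields `null`, which a caller could treat as "default roles".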