-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Autodetermine heap settings based on node roles and total system memory #65905
Autodetermine heap settings based on node roles and total system memory #65905
Conversation
This commit expands our JVM egonomics to also automatically determine appropriate heap size based on the total available system memory as well as the roles assigned to the node. Role determination is done via a naive parsing of elasticsearch.yml. No settings validation is done and only the 'node.roles' setting is taken into consideration. For heap purposes a node falls into one of four (4) categories: 1. A 'master-only' node. This is a node with only the 'master' role. 2. A 'ml-only' node. Similarly, a node with only the 'ml' role. 3. A 'data' node. This is basically the 'other' case. A node with any set of roles other than only master or only ml is considered a 'data' node, to include things like coordinating-only or "tie-breaker" nodes. 4. Unknown. This is the case if legacy settings are used. In this scenario we fallback to the old default heap options of 1GB. In all cases we short-circuit if a user provides explicit heap options so we only ever auto-determine heap if no existing heap options exist. Starting with this commit the default heap settings (1GB) are now removed from the default jvm.options which means we'll start auto- setting heap as the new default.
Pinging @elastic/es-delivery (Team:Delivery) |
distribution/src/config/jvm.options
Outdated
@@ -36,8 +36,8 @@ | |||
# Xms represents the initial size of the JVM heap | |||
# Xmx represents the maximum size of the JVM heap | |||
|
|||
-Xms${heap.min} | |||
-Xmx${heap.max} | |||
# -Xms${heap.min} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to think about how to change this. Should we remove this section entirely? Continue to include it also recommend folks stick with auto-determined heap size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to leave it in with a note, since until now our users have always had to configure it, and our training courses will expect it to exist. Maybe we can remove it in master but retain it with a comment in 7.x
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's tricky. Either way (removing, or commenting out), we are going to potentially break automation here. For example, if there's automation in place that downloads our tar archive, explodes it, and uses sed
to replace the heap size.
That leads me to wonder out loud if we should not break users here. I'd propose the following: in 7.x, we leave this as-is in the jvm.options file. Perhaps we provide a warning when the heap size is set manually by a user, pointing them to documentation for machine-dependent heap sizes. In Cloud, we still want to activate the machine-dependent heap sizes, so we'd need deployments there to remove the -Xms/-Xmx
flags from their jvm.options file.
In 8.0, we remove this from the default jvm.options file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the concern, though I'm not sure about a warning. I don't think we intend to disallow setting the heap manually right? Yet a warning for setting the heap would seem like just that. Such a sed replacement in practice would seem fragile. It could break on any change to the default.
I'm conflicted. I don't see a way to warn some users without annoying others. But I do think this is something we can followup on, so in this PR I will restore the original value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion was:
Perhaps we provide a warning when the heap size is set manually by a user, pointing them to documentation for machine-dependent heap sizes.
Not a warning that we will disallow setting the heap size manually, but a permanent warning that the heap size is being set manually and we recommend not for the optimal experience.
But I do think this is something we can followup on, so in this PR I will restore the original value.
Are you saying then that the default experience will be to not get machine-dependent heap sizing? That seems contrary to our goals here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we leave the jvm.options as is, the default will remain not getting machine-dependent heap sizing. If we emit a warning, any user who chooses to not want to use machine-dependent heap sizing will get an annoying warning they cannot use. Do we agree on those statements?
My suggestion was to leave the jvm.options as is (so as not to break the theoretical users that could be using sed to replace the heap size), and then follow up with whatever we decide to do regarding helping users convert within 7.x. This suggestion is because cloud will not rely on the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we leave the jvm.options as is, the default will remain not getting machine-dependent heap sizing. If we emit a warning, any user who chooses to not want to use machine-dependent heap sizing will get an annoying warning they cannot use. Do we agree on those statements?
Why can't they use it? The point of the warning is to tell them to stop setting the heap size manually, let Elasticsearch do it for them. The warning is useful to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, not sure how "cannot use" got in my response, it was meant to be "cannot remove". That is it is a persistent warning that the only way to remove is to go with ES's memory choices. The warning is useful in that they find out they could be using our automatic heap sizing, but are we saying users will never tune heap anymore? I personally get annoyed if there are warnings I cannot remove with normal usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jason and I spoke and came to an agreement on the above discussion. @jasontedor Let me know if I missed or misunderstood anything.
We should add a warning when heap size is manually specified, since our goal is to have users use the auto heap. There are however some issues that prevent us from doing that in this PR. First is a technical issue: how do we pass back the message when the only output of the jvm options parser is the final jvm options?. The second issue is allowing cloud time to adapt to using auto sizing so that we do not spew errors across all cloud deployments. I created #66308 to ensure we followup with adding this warning.
Regarding whether to leave the heap options or remove them in a minor, we think the auto heap sizing is important enough that we don't want to have any more new users specifying heap. So, we should remove the Xmx/Xms from jvm.options
. Users should no longer be relying on modifying jvm.options
, but instead specifying these in jvm.options.d
files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mark-vieira @jasontedor I've removed Xmx/Xms in 88a81a1, but left in a comment explaining where the heap size should be set if it needs to be.
@rjernst I wasn't able to get as far as adding packaging tests as fixing existing tests took priority. I think I've got the linux tests passing but there's a Windows test still failing. I think the root cause is the service blows up trying to start because it can't allocate enough heap, probably due to other contention on the machine, for example, the Gradle daemon is using 25G by default 😆 We might need to put in some conservative defaults across the board here so that running concurrent tests doesn't blow up memory. These new defaults are somewhat aggressive and assume the current host is single-purpose and there's not a lot of other contention for resources. Other than that the only remain item is documentation and that simply depends on to what degree we intend to document this. I suspect for 7.11 it will be minimal. |
The only packaging tests now failing are the Windows test classes that aren't applicable (no tests are run) but it's failing in the cleanup code due to a process still running it seems. Perhaps this is another side effect of us spinning up a JVM with a very large heap and that affecting sensitive timings in these tests. |
|
||
public class MachineDependentHeapTests extends LaunchersTestCase { | ||
|
||
public void testDetermineHeapSize() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Javadoc comments on the test cases help to ensure that the implementation of test matches its intent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree this can be helpful on complex tests, I think it is overkill on simple tests, such as these, that the name of the class and method generally make obvious what is being tested. In this particular case, I've renamed the method to testDefaultHeapSize()
since it is testing the default when no configuration exists.
|
||
public void testMasterOnlyNode() { | ||
MachineDependentHeap.MachineNodeRole nodeRole = parseConfig(sb -> sb.append("node.roles: [master]")); | ||
assertEquals(nodeRole, MASTER_ONLY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference is to use assertThat
pretty much everywhere these days. Not saying you should change it, but I find the messages it generates to be more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to assertThat.
import static org.junit.Assert.assertNotEquals; | ||
import static org.junit.Assert.fail; | ||
|
||
public class NodeRoleParserTests extends LaunchersTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have tests for garbage input and invalid YAML too? I was going to suggest a JSON config, but the parser ought to parse that quite happily 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test for invalid yaml (which is the same as garbage)
@@ -311,7 +311,7 @@ public void test73CustomJvmOptionsDirectoryFilesWithoutOptionsExtensionIgnored() | |||
startElasticsearch(); | |||
|
|||
final String nodesResponse = makeRequest(Request.Get("http://localhost:9200/_nodes")); | |||
assertThat(nodesResponse, containsString("\"heap_init_in_bytes\":1073741824")); | |||
assertThat(nodesResponse, not(containsString("\"heap_init_in_bytes\":536870912"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes makes me uncomfortable. Is it not possible that we might happen to calculate the heap to be this size, causing the test to fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rewritten the test to pass a bogus option, so that java should fail to start if the option is used.
* @throws IOException if unable to load elasticsearch.yml | ||
*/ | ||
public List<String> determineHeapSettings(Path configDir, List<String> userDefinedJvmOptions) throws IOException { | ||
if (userDefinedJvmOptions.stream().anyMatch(s -> s.startsWith("-Xms") || s.startsWith("-Xmx"))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think check is sufficient. A user can also set -XX:MaxHeapSize
and -XX:MinHeapSize
for which -Xmx
and -Xms
are synonyms.
Note that in JvmErgonomics#choose
we start up a second JVM and extract whether or not a setting was passed in on the command line. We might want to reuse that here and then check if MinHeapSize
or MaxHeapSize
are passed on the command line. This might be more reliable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved the method to find the final options so it is reused here.
// Strangely formatted config, so just return defaults and let startup settings validation catch the problem | ||
return MachineNodeRole.UNKNOWN; | ||
} catch (YAMLException ex) { | ||
throw new IllegalStateException("Unable to parse elasticsearch.yml:", ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should contain the full path? Or if we should return UNKNOWN and let startup fail us here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like returning UNKOWN and letting Elasticsearch handle the misconfiguration.
roles = (List<String>) map.get("node.roles"); | ||
} | ||
} catch (ClassCastException ex) { | ||
throw new IllegalStateException("Unable to parse elasticsearch.yml. Expected 'node.roles' to be a list."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return UNKNOWN and let Elasticsearch handle it?
...on/tools/launchers/src/main/java/org/elasticsearch/tools/launchers/MachineDependentHeap.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good. I left some comments, but I also didn't review all the details, I'll leave the detailed review to someone else.
@rjernst Are you picking up this work?
Yes, I am picking this up. I will address all the comments soon. |
Since elastic#65905 Elasticsearch has determined the Java heap settings from node roles and total system memory. This change allows the total system memory used in that calculation to be overridden with a user-specified value. This is intended to be used when Elasticsearch is running on a machine where some other software that consumes a non-negligible amount of memory is running. For example, a user could tell Elasticsearch to assume it was running on a machine with 3GB of RAM when actually it was running on a machine with 4GB of RAM. The system property is `es.total_memory_bytes`, so, for example, could be specified using `-Des.total_memory_bytes=3221225472`. (It is specified in bytes rather than using a unit, because it needs to be parsed by startup code that does not have access to the utility classes that interpret byte size units.)
Since #65905 Elasticsearch has determined the Java heap settings from node roles and total system memory. This change allows the total system memory used in that calculation to be overridden with a user-specified value. This is intended to be used when Elasticsearch is running on a machine where some other software that consumes a non-negligible amount of memory is running. For example, a user could tell Elasticsearch to assume it was running on a machine with 3GB of RAM when actually it was running on a machine with 4GB of RAM. The system property is `es.total_memory_bytes`, so, for example, could be specified using `-Des.total_memory_bytes=3221225472`. (It is specified in bytes rather than using a unit, because it needs to be parsed by startup code that does not have access to the utility classes that interpret byte size units.)
... where is this formula actually documented? https://www.elastic.co/guide/en/elasticsearch/reference/7.15/important-settings.html#heap-size-settings gives zero descriptive info on how the sizes are actually determined. Perhaps at the very least this PR could be linked to so those who are even a little bit curious about the determination formula don't have to go digging for it? |
The actual formulas are considered an implementation detail, so they are expressly not documented since we don't want folks to rely on these things as they are likely to change in the future. If you're curious as to the details the formulas are defined here. |
That's fair I guess. IMO slightly misconstruing the term "implementation detail." Documenting that, e.g. "data nodes get 50% of total system memory when greater than 1 gigabyte up to a maximum of 31 gigabytes", maybe in a table format, is not an implementation detail. It is something that ES admins would want to know when deciding whether they should actually explicitly specify heap ops or let them be auto-determined. Exposing this back-of-the-envelope calculation side of the equation in documentation is not an implementation detail. The implementation detail in this case is how that determination is made such as how specifically machine resources are fetched/read. But if the human-readable definitions of what will be auto-determined should "not be relied upon", then I guess it does come close to meeting the definition of implementation detail. |
The motivation with this change is that admins shouldn't need to do this, they should defer to the auto-determination, which may change across releases. Realistically, the deciding factor on whether heap should be explicitly specified is if you encounter problematic performance with the automatic settings.
This is an accurate description of the intent here. Essentially heap size is either "whatever ES decides" (will obviously always be less than available system memory) or "whatever you explicitly set it to", choose one. |
Hi @mark-vieira 👋 greetings from Germany. I was wondering why you chose to assign same heap memory ratio to |
Thanks for the comment @ersinbyrktr. I've opened #88395 to have our performance team investigate improvements here. |
This commit expands our JVM egonomics to also automatically determine
appropriate heap size based on the total available system memory as well
as the roles assigned to the node. Role determination is done via a
naive parsing of elasticsearch.yml. No settings validation is done and
only the 'node.roles' setting is taken into consideration.
For heap purposes a node falls into one of four (4) categories:
set of roles other than only master or only ml is considered a 'data'
node, to include things like coordinating-only or "tie-breaker" nodes.
scenario we fallback to the old default heap options of 1GB.
In all cases we short-circuit if a user provides explicit heap options
so we only ever auto-determine heap if no existing heap options exist.
Starting with this commit the default heap settings (1GB) are now
removed from the default jvm.options which means we'll start auto-
setting heap as the new default.