JDK-8293114: GC should trim the native heap #10085
Conversation
👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request.
@tstuefe The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.
/label remove shenandoah
@tstuefe Nice work! I also looked into memory usage recently; here are some numbers I captured.
Hi Zhengyu,
Thanks :)
A small problem with "C-Heap Retained" is that it also contains blocks that are either trimmed or have never been (fully) paged in, meaning that number may not relate 1:1 to RSS. Glibc observability is really annoying.

About the "AtExit": in our SAP JVM I added a simple "DumpInfoAtExit", which prints out what "jcmd VM.info" would give you; basically, a hs-err file without the err part. That contains a whole lot of information, including NMT and process memory information. That may be a better way than adding individual XXAtExit flags, what do you think?

Another thing I would like to do is to enable peak usage numbers in NMT for release builds too. That way one can see at VM exit how much memory each subsystem used.

Interesting, since it means malloc peaks are more common than we think. All the more reason to have a feature like this, to trim the C-heap. Cheers, Thomas
/label hotspot-gc
I am not sure, because it overlaps many other XXAtExit options?
Another question: does it have to create another thread? Can it piggyback on e.g. the ServiceThread?
It does so on Shenandoah, but the problem is the time trimming takes. As I wrote, it can take anywhere from under 1 ms up to almost a second. You would then block the service thread for that time. So, even on Shenandoah, I am not sure I shouldn't use a different thread.
Yea, you probably should not use the ServiceThread for this.
Down Bot. Down.
Do you need help moving this forward, @tstuefe?
Agreed, I think you should open a new PR.
Okay, cool. The reason I am asking is that glibc "memory leaks" are not uncommon in production cases. There are quite a few libraries that churn native memory (looks at Netty), even with internal pooling. Having something that is backportable to 21u and 17u would be a plus.
Thanks, nice to see a confirmation that this is useful. This patch started out simple, and then I fear I started to seriously over-engineer the GC part of it. I'll give it a look next week to see if I can dumb it down.
My main concern with this change is increased latency. You wrote "..concurrent malloc/frees are usually not blocked while trimming if they are satisfied from the local arena..". I am not sure what "usually" means here, nor how many mallocs are satisfied from a local arena. But introducing pauses of up to a second seems significant for some applications. The other question is that I still don't understand whether glibc malloc will ever trim on its own.
The trim performed automatically on some free() calls is one done in the arena of the chunk you were freeing. I share your concern.
@simonis @robehn Thanks for thinking this through.
From looking at the sources (glibc 2.31), malloc_trim will inconvenience concurrent reallocs, rarely frees, and allocations that cause arena stealing or the creation of new arenas. I may have missed some cases, but it makes sense that glibc attempts to avoid locking as much as possible. About the "up to a second": this was measured on my machine with ~32GB of reclaimable memory. Having that much floating garbage in the C-heap would hopefully be rare.

From looking at the sources, glibc also trims on free, but only under certain conditions, so automatic trim only happens sometimes. I did experiments with mallocing, then freeing, 64K blocks 30000 times.

Unfortunately, most C-heap allocations are a lot finer-grained than 64K. Update: I see we also lock on the malloc path if we don't pull the chunk from the tcache.
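(As an aside for readers: a minimal reproducer in the spirit of that experiment might look like the sketch below. This is not the original test; the block size and count follow the description above, and whether glibc retains the memory after the frees depends on its version and tunables.)

```c++
// Allocate a burst of 64K blocks, free them all, then trim explicitly.
// Assumes glibc on Linux; malloc_trim is a glibc extension from <malloc.h>.
#include <malloc.h>   // malloc_trim
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

int main() {
  constexpr size_t kBlockSize = 64 * 1024;
  constexpr int kCount = 30000;            // ~1.9 GB peak footprint
  std::vector<void*> blocks(kCount);
  for (int i = 0; i < kCount; i++) {
    blocks[i] = malloc(kBlockSize);
    memset(blocks[i], 1, kBlockSize);      // touch pages so they count toward RSS
  }
  for (void* p : blocks) {
    free(p);                               // glibc trims here only sometimes
  }
  // RSS typically stays elevated at this point; an explicit trim releases
  // the free chunks back to the OS.
  int released = malloc_trim(0);
  printf("malloc_trim returned %d (1 = memory released)\n", released);
  return 0;
}
```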
So I see that malloc_trim takes a pad argument. Does current glibc honor that argument at all? Can we use it to control the incrementality of the trim?
This was one of the first things I tried last year. It has very little impact. IIRC it only applies to the main arena (the one using sbrk) and limits the amount by which the break is lowered. I may remember the details wrong, but my tests also showed very little effect. The bulk of memory reclamation comes from glibc MADV_DONTNEED'ing free chunks. I think it does that without any bookkeeping, so for subsequent trims, it has no idea how much memory of that range was paged in. So I think there is no way to implement a limiting trim via the size parameter. I thought about it, and I think the only way to implement a limited trim would be to add a new API, since you cannot use the existing one without breaking compatibility. I always meant to ask Florian about this. I will tomorrow. In any case, this would only be a solution for future glibcs, not for the ones that are around.
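(For reference, this is the interface under discussion, summarized from the malloc_trim(3) man page; the behavioral notes restate the points made above:)

```c++
// Prototype, as declared in glibc's <malloc.h>:
//
//   int malloc_trim(size_t pad);
//
// Returns 1 if memory was released back to the OS, 0 otherwise. pad is the
// amount of free space to leave untrimmed at the top of the heap; as
// discussed above, it effectively applies only to the main (sbrk) arena and
// does not limit the MADV_DONTNEED pass over free chunks in other arenas.
```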
@robehn @zhengyu123 @shipilev @simonis
So, in very few words, this patch now trims the native heap periodically from a dedicated thread, and pauses trimming around GC operations.
I'll do some more benchmarks over the next few days, but I honestly don't expect to see this rise above background noise. If I have time, I will also simulate heavy C-heap activity to give the trim something to do.
Cursory review follows.
Generally, does the issue synopsis reflect what is going on correctly? This is not about "GC should trim" anymore, as we have a perfectly separate thread for this. In fact, we very specifically do not trim during GC :)
Related question: do we want to use the "gc, trim" tag for this, or just "trim"?
src/hotspot/os/aix/os_aix.cpp (Outdated)

```c++
@@ -2986,3 +2986,8 @@ bool os::supports_map_sync() {
}

void os::print_memory_mappings(char* addr, size_t bytes, outputStream* st) {}

// stubbed-out trim-native support
```
I think these comments should be more succinct. Example: "Native heap trimming is not implemented yet." (this tells readers it can be implemented in the future)
src/hotspot/os/linux/os_linux.cpp (Outdated)

```c++
  int fordblks;
  int keepcost;
};
typedef struct glibc_mallinfo (*mallinfo_func_t)(void);
```
Should be os::Linux::glibc_mallinfo for consistency?
src/hotspot/os/linux/os_linux.cpp (Outdated)

```c++
out->arena = (int) mi.arena;
out->ordblks = (int) mi.ordblks;
out->smblks = (int) mi.smblks;
out->hblks = (int) mi.hblks;
out->hblkhd = (int) mi.hblkhd;
out->usmblks = (int) mi.usmblks;
out->fsmblks = (int) mi.fsmblks;
out->uordblks = (int) mi.uordblks;
out->fordblks = (int) mi.fordblks;
out->keepcost = (int) mi.keepcost;
```
Style: please indent it so that the "=" signs are in the same column?
src/hotspot/os/linux/os_linux.cpp (Outdated)

```c++
#ifdef __GLIBC__
  return true;
#else
  return false; // musl
```
Let's avoid comments like "// musl" -- it might mislead if we ever go for e.g. uClibc and friends?
```c++
//
// The mode is set as argument to GCTrimNative::initialize().

class NativeTrimmer : public ConcurrentGCThread {
```
It is a bit ugly that it pretends to be a ConcurrentGCThread. Can it be just a NamedThread?
```c++
class NativeTrimmerThread : public ConcurrentGCThread {

  Monitor* _lock;
```
const?
```c++
@@ -174,6 +174,7 @@ class outputStream;
  f(full_gc_heapdump_post, " Post Heap Dump") \
  \
  f(conc_uncommit, "Concurrent Uncommit") \
  f(conc_trim, "Concurrent Trim") \
```
Looks like a leftover.
```c++
STATIC_ASSERT(_num_pools == 4);
return !_pools[0].empty() || !_pools[1].empty() ||
       !_pools[2].empty() || !_pools[3].empty();
```

Suggested change (a loop instead of the hardcoded four pools, keeping the same result):

```c++
for (int i = 0; i < _num_pools; i++) {
  if (!_pools[i].empty()) return true;
}
return false;
```
```c++
static void clean() {
  ThreadCritical tc;
  for (int i = 0; i < _num_pools; i++) {
    _pools[i].prune();
```
Why do you need ThreadCritical here? I would have thought PauseMark handles everything, right?
Unrelated to the pause. I introduced an empty check on all pools, but that has to happen under lock protection too. So I moved the four ThreadCriticals up from the prune functions into this function. The point is to avoid pausing if there is nothing to do, which is most of the time. Also, instead of four ThreadCritical sections, there is just one.
```c++
// Execute the native trim, log results.
void execute_trim_and_log() const {
  assert(os::can_trim_native_heap(), "Unexpected");
  const int64_t tnow = now();
```
This line looks redundant.
Maybe it would be cleaner to close this PR and open a new one, now that the feature took another turn.
Closing this PR in favour of a new, slightly renamed one, as Aleksey suggested.
Any chance this feature can be backported to JDK 11 and 17?
This feature (different PR, see #14781) has already been backported to JDK 17. JDK 11 is possible, but not a priority, and would need to be negotiated with the maintainers.
(Updated 2023-07-05 to reflect the current state of the patch)
This RFE adds an option to auto-trim the Glibc heap as part of the GC cycle. If the VM process suffers high temporary malloc spikes (whether from JVM or user code), this can recover significant amounts of memory.
We discussed this a year ago [1], but the item got pushed to the bottom of my work pile, so it took longer than I thought.
Motivation
Glibc is reluctant to return memory to the OS, more so than other allocators. Temporary malloc spikes often carry over as permanent RSS increases. Note that C-heap retention is difficult to observe: since it is freed memory, it does not show up in NMT; it is just part of RSS.
This is, effectively, caching, and a performance tradeoff made by glibc. It makes a lot of sense for applications that cause high traffic on the C-heap (the typical native application). The JVM, however, clusters allocations and, for many use cases, rolls its own memory management via mmap. And an app's malloc load can fluctuate wildly, with temporary spikes and long idle periods.
To help, Glibc exports an API to trim the C-heap: malloc_trim(3). With JDK 18 [2], SAP contributed a new jcmd command to manually trim the C-heap on Linux. This RFE adds a complementary way to trim automatically.

Is this even a problem?
Yes.
The JVM clusters most native memory allocations and satisfies them via mmap. But there are enough C-heap allocations left to cause malloc spikes that are subject to memory retention; one example is HotSpot arenas themselves.
But many of the high-memory-retention cases I have seen were caused by third-party JNI code: libraries allocating large temporary buffers via malloc. In fact, since we introduced the jcmd System.trim_native_heap, some of our customers have started calling this command periodically from scripts to counter these issues.
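Such a script boils down to running one diagnostic command per JVM process, e.g.:

```
jcmd <pid> System.trim_native_heap
```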
How trimming works
Trimming is done via malloc_trim(3). malloc_trim iterates over all arenas and trims each one in turn. While doing that, it locks the arena, which may cause some (but not all) concurrent actions on the same arena to block. Glibc also trims automatically on free, but that is very limited (see #10085 (comment) for details).

malloc_trim offers almost no way to control its behavior; in particular, there is no way to limit its runtime. Its runtime depends on the amount of reclaimed memory: reclaiming nothing is very fast (sub-ms), while reclaiming very large memory sections (many GB) may take considerably longer.

When to trim?
We cannot use the ServiceThread to trim, since the time a trim takes is unknown. Therefore we do it fully concurrently in a dedicated thread.

We trim at regular intervals, but pause trimming during "important" phases: STW GC phases, or when doing bulk heap operations. Note that "pausing" here means we delay the start of the next trim until after the pause; if a trim is already running, there is no way to stop it.
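(To illustrate the scheduling idea, here is a deliberately simplified sketch; it is not the PR code, and all names are made up. A dedicated thread wakes up once per interval and trims unless a pause is in effect; pausing only delays the start of the next trim, since a trim in flight cannot be interrupted.)

```c++
#include <malloc.h>               // malloc_trim (glibc extension)
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class NativeTrimmerSketch {
  std::mutex _mu;
  std::condition_variable _cv;
  int _paused = 0;                // > 0 while inside an "important" phase
  bool _stopped = false;
  const std::chrono::seconds _interval;
  std::thread _thread;            // declared last: starts after other members

  void loop() {
    std::unique_lock<std::mutex> lk(_mu);
    while (!_stopped) {
      _cv.wait_for(lk, _interval);     // wake up once per interval
      if (_stopped) break;
      if (_paused > 0) continue;       // delay the next trim until unpaused
      lk.unlock();
      malloc_trim(0);                  // unknown duration: sub-ms up to ~1 s
      lk.lock();
    }
  }

public:
  explicit NativeTrimmerSketch(std::chrono::seconds interval)
    : _interval(interval), _thread(&NativeTrimmerSketch::loop, this) {}

  // Bracket STW GC phases or bulk heap operations with pause()/unpause().
  void pause()   { std::lock_guard<std::mutex> g(_mu); _paused++; }
  void unpause() { std::lock_guard<std::mutex> g(_mu); _paused--; }

  void stop_and_join() {
    { std::lock_guard<std::mutex> g(_mu); _stopped = true; }
    _cv.notify_all();
    _thread.join();
  }
};
```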
How it works:
The patch adds new options (experimental for now, and shared among all GCs): GCTrimNativeHeap and GCTrimNativeHeapInterval. GCTrimNativeHeap is off by default; if enabled, it causes the VM to trim the native heap on full GCs as well as periodically, with the period defined by GCTrimNativeHeapInterval.
.Examples:
This is an artificial test that causes two high malloc spikes with long idle periods. Observe how RSS recovers with trim but stays up without trim. The trim interval was set to 15 seconds for the test, and no GC was invoked here; this is purely periodic trimming. (See the run script for parameters.)
Spring Petclinic boots up, then idles; once with, once without trim, with the trim interval at its default of 60 seconds. Of course, if it were actually doing something instead of idling, the trim effects would be smaller. But the point of trimming is to recover memory in idle periods. (See the run script for parameters.)
Tests
Tested with an older Glibc (2.31) and a newer Glibc (2.35), i.e. mallinfo() vs. mallinfo2(), on Linux x64.

The rest of the tests will be done by GHA and in our SAP nightlies.
Remarks
How about other allocators?
I have seen this retention problem mainly with Glibc and the AIX libc. Musl returns memory to the OS more eagerly. I also tested with jemalloc and found that it, too, reclaims more aggressively, so I don't think macOS or the BSDs are affected much by retention either.
Trim costs?
Trim-native is a tradeoff between memory and performance: we pay with the runtime of the trim itself, plus some potential lock contention on the arenas being trimmed.
glibc.malloc.trim_threshold?
Glibc has a tunable that looks like it could influence its willingness to return memory to the OS: the trim threshold. In practice, I could not get it to do anything useful.
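(For completeness, a sketch of the two standard ways to set that threshold on glibc; as noted above, neither had much effect in these experiments:)

```c++
#include <malloc.h>

int main() {
  // Equivalent to starting the process with the environment tunable
  //   GLIBC_TUNABLES=glibc.malloc.trim_threshold=65536
  // Asks glibc to trim automatically whenever at least 64 KB of contiguous
  // free space accumulates at the top of the main (sbrk) arena.
  mallopt(M_TRIM_THRESHOLD, 64 * 1024);
  return 0;
}
```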
Reviewing
Using git

Checkout this PR locally:

```
$ git fetch https://git.openjdk.org/jdk.git pull/10085/head:pull/10085
$ git checkout pull/10085
```

Update a local copy of the PR:

```
$ git checkout pull/10085
$ git pull https://git.openjdk.org/jdk.git pull/10085/head
```
Using Skara CLI tools

Checkout this PR locally:

```
$ git pr checkout 10085
```

View PR using the GUI difftool:

```
$ git pr show -t 10085
```
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10085.diff