JDK-8293114: GC should trim the native heap #10085
Conversation
👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request.
@tstuefe The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.
/label remove shenandoah
@tstuefe Nice work! I also looked into memory usage recently; here are some numbers I captured.
Hi Zhengyu,
Thanks :)
A small problem with "C-Heap Retained" is that it also contains blocks that are either trimmed or have never been (fully) paged in, meaning that number may not relate 1:1 to RSS. Glibc observability is really annoying.

About the "AtExit": in our SAP JVM I added a simple "DumpInfoAtExit", which prints out what "jcmd VM.info" would give you; basically, a hs-err file without the err part. That contains a whole lot of information, including NMT and process memory information. That may be a better way than adding individual XXAtExit flags, what do you think?

Another thing I would like to do is to enable peak usage numbers in NMT for release builds too. That way one can see at VM exit how much memory each subsystem used.

Interesting, since it means malloc peaks are more common than we think. All the more reason to have a feature like this, to trim the C-heap. Cheers, Thomas
/label hotspot-gc
I am not sure, because it overlaps many other XXAtExit options?
Another question: does it have to create another thread? Can it piggyback on e.g. the ServiceThread?
It does so on Shenandoah, but the problem is the time trimming takes. As I wrote, it can take anywhere from under 1 ms up to almost a second. You would then block the service thread for that time. So, even on Shenandoah, I am not sure I shouldn't use a different thread.
Yea, you probably should not use the ServiceThread for this.
Down Bot. Down.
Do you need help moving this forward, @tstuefe?
Agreed, I think you should open a new PR.
Okay, cool. The reason I am asking is that glibc "memory leaks" are not uncommon in production cases. There are quite a few libraries that churn native memory (looks at Netty), even with internal pooling. Having something that is backportable to 21u and 17u would be a plus.
Thanks, nice to see a confirmation that this is useful. This patch started out simple, and then I fear I started to seriously over-engineer the GC part of it. I'll give it a look next week to see if I can dumb it down.
My main concern with this change is increased latency. You wrote "..concurrent malloc/frees are usually not blocked while trimming if they are satisfied from the local arena..". I am not sure what "usually" means here, nor how many mallocs are satisfied from a local arena. But introducing pauses of up to a second seems significant for some applications. The other question is that I still don't understand whether glibc malloc will ever trim on its own.
The trim performed automatically on some free() calls is one done in the arena of the chunk you were freeing. I share your concern.
@simonis @robehn Thanks for thinking this through.
From looking at the sources (glibc 2.31), malloc_trim will inconvenience concurrent reallocs, rarely frees, and allocations that cause arena stealing or the creation of new arenas. I may have missed some cases, but it makes sense that glibc attempts to avoid locking as much as possible. About the "up to a second": this was measured on my machine with ~32GB of reclaimable memory. Having that much floating garbage in the C-heap would hopefully be rare.

From looking at the sources, glibc also trims on free, but only under certain conditions, so automatic trim only happens sometimes. I did experiments with mallocing, then freeing, 64K blocks 30000 times.

Unfortunately, most C-heap allocations are a lot finer-grained than 64K. Update: I see we also lock on the malloc path if we don't pull the chunk from the tcache.
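(As an aside for readers: a minimal reproducer in the spirit of that experiment might look like the sketch below. This is not the original test; the block size and count follow the description above, and whether glibc retains the memory after the frees depends on its version and tunables.)

```c++
// Allocate a burst of 64K blocks, free them all, then trim explicitly.
// Assumes glibc on Linux; malloc_trim is a glibc extension from <malloc.h>.
#include <malloc.h>   // malloc_trim
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

int main() {
  constexpr size_t kBlockSize = 64 * 1024;
  constexpr int kCount = 30000;            // ~1.9 GB peak footprint
  std::vector<void*> blocks(kCount);
  for (int i = 0; i < kCount; i++) {
    blocks[i] = malloc(kBlockSize);
    memset(blocks[i], 1, kBlockSize);      // touch pages so they count toward RSS
  }
  for (void* p : blocks) {
    free(p);                               // glibc trims here only sometimes
  }
  // RSS typically stays elevated at this point; an explicit trim releases
  // the free chunks back to the OS.
  int released = malloc_trim(0);
  printf("malloc_trim returned %d (1 = memory released)\n", released);
  return 0;
}
```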
So I see that malloc_trim takes a pad argument. Does current glibc honor that argument at all? Can we use it to control the incrementality of the trim?
This was one of the first things I tried last year. It has very little impact. IIRC it only applies to the main arena (the one using sbrk) and limits the amount by which the break is lowered. I may remember the details wrong, but my tests also showed very little effect. The bulk of memory reclamation comes from glibc MADV_DONTNEED'ing free chunks. I think it does that without any bookkeeping, so for subsequent trims, it has no idea how much memory of that range was paged in. So I think there is no way to implement a limiting trim via the size parameter. I thought about it, and I think the only way to implement a limited trim would be to add a new API, since you cannot use the existing one without breaking compatibility. I always meant to ask Florian about this. I will tomorrow. In any case, this would only be a solution for future glibcs, not for the ones that are around.
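(For reference, this is the interface under discussion, summarized from the malloc_trim(3) man page; the behavioral notes restate the points made above:)

```c++
// Prototype, as declared in glibc's <malloc.h>:
//
//   int malloc_trim(size_t pad);
//
// Returns 1 if memory was released back to the OS, 0 otherwise. pad is the
// amount of free space to leave untrimmed at the top of the heap; as
// discussed above, it effectively applies only to the main (sbrk) arena and
// does not limit the MADV_DONTNEED pass over free chunks in other arenas.
```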
@robehn @zhengyu123 @shipilev @simonis
So, in very few words, this patch now trims the native heap periodically from a dedicated thread, and pauses trimming around GC operations.
I'll do some more benchmarks over the next few days, but I honestly don't expect to see this rise above background noise. If I have time, I will also simulate heavy C-heap activity to give the trim something to do.
Cursory review follows.
Generally, does the issue synopsis reflect what is going on correctly? This is not about "GC should trim" anymore, as we have a perfectly separate thread for this. In fact, we very specifically do not trim during GC :)
Related question: do we want to use the "gc, trim" tag for this, or just "trim"?
src/hotspot/os/aix/os_aix.cpp (Outdated)

```c++
@@ -2986,3 +2986,8 @@ bool os::supports_map_sync() {
}

void os::print_memory_mappings(char* addr, size_t bytes, outputStream* st) {}

// stubbed-out trim-native support
```
I think these comments should be more succinct. Example: "Native heap trimming is not implemented yet." (this tells readers it can be implemented in the future)
src/hotspot/os/linux/os_linux.cpp (Outdated)

```c++
  int fordblks;
  int keepcost;
};
typedef struct glibc_mallinfo (*mallinfo_func_t)(void);
```
Should be os::Linux::glibc_mallinfo for consistency?
src/hotspot/os/linux/os_linux.cpp (Outdated)

```c++
out->arena = (int) mi.arena;
out->ordblks = (int) mi.ordblks;
out->smblks = (int) mi.smblks;
out->hblks = (int) mi.hblks;
out->hblkhd = (int) mi.hblkhd;
out->usmblks = (int) mi.usmblks;
out->fsmblks = (int) mi.fsmblks;
out->uordblks = (int) mi.uordblks;
out->fordblks = (int) mi.fordblks;
out->keepcost = (int) mi.keepcost;
```
Style: please indent it so that the "=" signs are in the same column?
src/hotspot/os/linux/os_linux.cpp (Outdated)

```c++
#ifdef __GLIBC__
  return true;
#else
  return false; // musl
```
Let's avoid comments like "// musl" -- it might mislead if we ever go for e.g. uClibc and friends?
```c++
//
// The mode is set as argument to GCTrimNative::initialize().

class NativeTrimmer : public ConcurrentGCThread {
```
It is a bit ugly that it pretends to be a ConcurrentGCThread. Can it be just a NamedThread?
```c++
class NativeTrimmerThread : public ConcurrentGCThread {

  Monitor* _lock;
```
const?
```c++
@@ -174,6 +174,7 @@ class outputStream;
  f(full_gc_heapdump_post, " Post Heap Dump") \
  \
  f(conc_uncommit, "Concurrent Uncommit") \
  f(conc_trim, "Concurrent Trim") \
```
Looks like a leftover.
```c++
STATIC_ASSERT(_num_pools == 4);
return !_pools[0].empty() || !_pools[1].empty() ||
       !_pools[2].empty() || !_pools[3].empty();
```

Suggested change (a loop instead of the hardcoded four pools, keeping the same result):

```c++
for (int i = 0; i < _num_pools; i++) {
  if (!_pools[i].empty()) return true;
}
return false;
```
```c++
static void clean() {
  ThreadCritical tc;
  for (int i = 0; i < _num_pools; i++) {
    _pools[i].prune();
```
Why do you need ThreadCritical here? I would have thought PauseMark handles everything, right?
Unrelated to the pause. I introduced an empty check on all pools, but that has to happen under lock protection too. So I moved the four ThreadCriticals up from the prune functions into this function. The point is to avoid pausing if there is nothing to do, which is most of the time. Also, instead of four ThreadCritical sections, there is just one.
```c++
// Execute the native trim, log results.
void execute_trim_and_log() const {
  assert(os::can_trim_native_heap(), "Unexpected");
  const int64_t tnow = now();
```
This line looks redundant.
Maybe it would be cleaner to close this PR and open a new one, now that the feature took another turn.
Closing this PR in favour of a new, slightly renamed one, as Aleksey suggested.
Any chance this feature can be backported to JDK 11 and 17?
This feature (different PR, see #14781) has already been backported to JDK 17. JDK 11 is possible, but not a priority, and would need to be negotiated with the maintainers.
(Updated 2023-07-05 to reflect the current state of the patch)
This RFE adds an option to auto-trim the Glibc heap as part of the GC cycle. If the VM process suffers high temporary malloc spikes (whether from JVM or user code), this can recover significant amounts of memory.
We discussed this a year ago [1], but the item got pushed to the bottom of my work pile, so it took longer than I thought.
Motivation
Glibc is reluctant to return memory to the OS, more so than other allocators. Temporary malloc spikes often carry over as permanent RSS increases. Note that C-heap retention is difficult to observe: since it is freed memory, it does not show up in NMT; it is just part of RSS.
This is, effectively, caching, and a performance tradeoff made by glibc. It makes a lot of sense for applications that cause high traffic on the C-heap (the typical native application). The JVM, however, clusters allocations and, for many use cases, rolls its own memory management via mmap. And an app's malloc load can fluctuate wildly, with temporary spikes and long idle periods.
To help, Glibc exports an API to trim the C-heap: malloc_trim(3). With JDK 18 [2], SAP contributed a new jcmd command to manually trim the C-heap on Linux. This RFE adds a complementary way to trim automatically.

Is this even a problem?
Yes.
The JVM clusters most native memory allocations and satisfies them via mmap. But there are enough C-heap allocations left to cause malloc spikes that are subject to memory retention; one example is HotSpot arenas themselves.
But many of the high-memory-retention cases I have seen were caused by third-party JNI code: libraries allocating large temporary buffers via malloc. In fact, since we introduced the jcmd System.trim_native_heap, some of our customers have started calling this command periodically from scripts to counter these issues.
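Such a script boils down to running one diagnostic command per JVM process, e.g.:

```
jcmd <pid> System.trim_native_heap
```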
How trimming works
Trimming is done via malloc_trim(3). malloc_trim iterates over all arenas and trims each one in turn. While doing that, it locks the arena, which may cause some (but not all) concurrent actions on the same arena to block. Glibc also trims automatically on free, but that is very limited (see #10085 (comment) for details).

malloc_trim offers almost no way to control its behavior; in particular, there is no way to limit its runtime. Its runtime depends on the amount of reclaimed memory: reclaiming nothing is very fast (sub-ms), while reclaiming very large memory sections (many GB) may take considerably longer.

When to trim?
We cannot use the ServiceThread to trim, since the time a trim takes is unknown. Therefore we do it fully concurrently in a dedicated thread.

We trim at regular intervals, but pause trimming during "important" phases: STW GC phases, or when doing bulk heap operations. Note that "pausing" here means we delay the start of the next trim until after the pause; if a trim is already running, there is no way to stop it.
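(To illustrate the scheduling idea, here is a deliberately simplified sketch; it is not the PR code, and all names are made up. A dedicated thread wakes up once per interval and trims unless a pause is in effect; pausing only delays the start of the next trim, since a trim in flight cannot be interrupted.)

```c++
#include <malloc.h>               // malloc_trim (glibc extension)
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class NativeTrimmerSketch {
  std::mutex _mu;
  std::condition_variable _cv;
  int _paused = 0;                // > 0 while inside an "important" phase
  bool _stopped = false;
  const std::chrono::seconds _interval;
  std::thread _thread;            // declared last: starts after other members

  void loop() {
    std::unique_lock<std::mutex> lk(_mu);
    while (!_stopped) {
      _cv.wait_for(lk, _interval);     // wake up once per interval
      if (_stopped) break;
      if (_paused > 0) continue;       // delay the next trim until unpaused
      lk.unlock();
      malloc_trim(0);                  // unknown duration: sub-ms up to ~1 s
      lk.lock();
    }
  }

public:
  explicit NativeTrimmerSketch(std::chrono::seconds interval)
    : _interval(interval), _thread(&NativeTrimmerSketch::loop, this) {}

  // Bracket STW GC phases or bulk heap operations with pause()/unpause().
  void pause()   { std::lock_guard<std::mutex> g(_mu); _paused++; }
  void unpause() { std::lock_guard<std::mutex> g(_mu); _paused--; }

  void stop_and_join() {
    { std::lock_guard<std::mutex> g(_mu); _stopped = true; }
    _cv.notify_all();
    _thread.join();
  }
};
```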
How it works:
The patch adds new options (experimental for now, and shared among all GCs): GCTrimNativeHeap and GCTrimNativeHeapInterval. GCTrimNativeHeap is off by default; if enabled, it causes the VM to trim the native heap on full GCs as well as periodically, with the period defined by GCTrimNativeHeapInterval.
.Examples:
This is an artificial test that causes two high malloc spikes with long idle periods. Observe how RSS recovers with trim but stays up without trim. The trim interval was set to 15 seconds for the test, and no GC was invoked here; this is purely periodic trimming. (See the run script for parameters.)
Spring Petclinic boots up, then idles; once with, once without trim, with the trim interval at its default of 60 seconds. Of course, if it were actually doing something instead of idling, the trim effects would be smaller. But the point of trimming is to recover memory in idle periods. (See the run script for parameters.)
Tests
Tested with an older Glibc (2.31) and a newer Glibc (2.35), i.e. mallinfo() vs. mallinfo2(), on Linux x64.

The rest of the tests will be done by GHA and in our SAP nightlies.
Remarks
How about other allocators?
I have seen this retention problem mainly with Glibc and the AIX libc. Musl returns memory to the OS more eagerly. I also tested with jemalloc and found that it, too, reclaims more aggressively, so I don't think macOS or the BSDs are affected much by retention either.
Trim costs?
Trim-native is a tradeoff between memory and performance: we pay with the runtime of the trim itself, plus some potential lock contention on the arenas being trimmed.
glibc.malloc.trim_threshold?
Glibc has a tunable that looks like it could influence its willingness to return memory to the OS: the trim threshold. In practice, I could not get it to do anything useful.
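(For completeness, a sketch of the two standard ways to set that threshold on glibc; as noted above, neither had much effect in these experiments:)

```c++
#include <malloc.h>

int main() {
  // Equivalent to starting the process with the environment tunable
  //   GLIBC_TUNABLES=glibc.malloc.trim_threshold=65536
  // Asks glibc to trim automatically whenever at least 64 KB of contiguous
  // free space accumulates at the top of the main (sbrk) arena.
  mallopt(M_TRIM_THRESHOLD, 64 * 1024);
  return 0;
}
```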
Reviewing
Using git

Checkout this PR locally:

```
$ git fetch https://git.openjdk.org/jdk.git pull/10085/head:pull/10085
$ git checkout pull/10085
```

Update a local copy of the PR:

```
$ git checkout pull/10085
$ git pull https://git.openjdk.org/jdk.git pull/10085/head
```
Using Skara CLI tools

Checkout this PR locally:

```
$ git pr checkout 10085
```

View PR using the GUI difftool:

```
$ git pr show -t 10085
```
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10085.diff