Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zip filesystems consume a lot of memory on bootstrap #35280

Open
gsmet opened this issue Aug 9, 2023 · 26 comments
Open

Zip filesystems consume a lot of memory on bootstrap #35280

gsmet opened this issue Aug 9, 2023 · 26 comments

Comments

@gsmet
Copy link
Member

gsmet commented Aug 9, 2023

In dev mode, when bootstrapping the application, the Zip filesystems opened for the jar archives during bootstrap are consuming a lot of memory because they are keeping a byte[] header around.

I'm not completely sure at which stage of the execution I'm at and if it's normal to have all these file systems open. But in any case, the fact that we are keeping a large byte[] around is problematic.

For reference:

    // Reads zip file central directory. Returns the file position of first
    // CEN header, otherwise returns -1 if an error occurred. If zip->msg != NULL
    // then the error was a zip format error and zip->msg has the error text.
    // Always pass in -1 for knownTotal; it's used for a recursive call.
    private byte[] initCEN() throws IOException {

And even more because we apparently keep several copies of them opened.

For the groovy jar, we end up with 2 instances of this byte array:

Screenshot from 2023-08-09 11-23-06
and
Screenshot from 2023-08-09 11-49-44

For the quarkus-bootstrap-gradle-resolver jar, it's even worse, it looks like we have 5 of them:

Screenshot from 2023-08-09 11-57-40
Screenshot from 2023-08-09 11-57-51
Screenshot from 2023-08-09 11-58-02
Screenshot from 2023-08-09 11-58-11
Screenshot from 2023-08-09 11-58-24

I can't upload the dump file but I can make it available if people are interested.

@gsmet
Copy link
Member Author

gsmet commented Aug 9, 2023

@geoand @aloubyansky @stuartwdouglas I thought this might be of interest to you

I'm working on another issue for which I will open something soon.

@geoand
Copy link
Contributor

geoand commented Aug 9, 2023

Nice investigation.

This is probably something we have never optimized, so I'm optimistic there are improvements to be had

@aloubyansky
Copy link
Member

Perhaps we could do a trick similar to the one in RunnerClassLoader, i.e. keeping a fixed number of those (depending on usage) open.

@geoand
Copy link
Contributor

geoand commented Aug 9, 2023

We might also try deduplicating jars

@aloubyansky
Copy link
Member

Right, we could try that first.

@gsmet
Copy link
Member Author

gsmet commented Aug 9, 2023

Also, I'm not completely sure if all these instances should be kept around and if we don't have some leak somewhere.

@gsmet
Copy link
Member Author

gsmet commented Aug 9, 2023

Btw, here is how I reproduce the issue and get the information (you might need to lower the memory limit once the other patch is in).

Apply:

diff --git a/integration-tests/maven/pom.xml b/integration-tests/maven/pom.xml
index b40f5ad482b..7b4e3b07514 100644
--- a/integration-tests/maven/pom.xml
+++ b/integration-tests/maven/pom.xml
@@ -97,6 +97,9 @@
             <plugin>
                 <groupId>org.apache.maven.plugins</groupId>
                 <artifactId>maven-surefire-plugin</artifactId>
+                <configuration>
+                    <skip>true</skip>
+                </configuration>
             </plugin>
 
             <plugin>
diff --git a/test-framework/maven/src/main/java/io/quarkus/maven/it/RunAndCheckMojoTestBase.java b/test-framework/maven/src/main/java/io/quarkus/maven/it/RunAndCheckMojoTestBase.java
index 80bf2c3118d..943d6632aaf 100644
--- a/test-framework/maven/src/main/java/io/quarkus/maven/it/RunAndCheckMojoTestBase.java
+++ b/test-framework/maven/src/main/java/io/quarkus/maven/it/RunAndCheckMojoTestBase.java
@@ -101,7 +101,7 @@ protected void run(boolean performCompile, LaunchMode mode, boolean skipAnalytic
         //running at once, if they add default to 75% of total mem we can easily run out
         //of physical memory as they will consume way more than what they need instead of
         //just running GC
-        args.add("-Djvm.args=-Xmx192m");
+        args.add("-Djvm.args='-Xmx128m -XX:+HeapDumpOnOutOfMemoryError'");
         running.execute(args, Map.of());
     }
 

Then run: mvn clean install -Dit.test=DevMojoIT#testThatTheApplicationIsReloadedMultiModule. You should get an OOM. Look for the generated .hprof file.

@aloubyansky
Copy link
Member

Just launching dev mode on a simple project, see the ArchivePathTree's being opened only once for each JAR.
When I enable continuous testing I see some JARs being opened twice, since it's a new app with its own hierarchy of classloaders.
In a simple example I don't see JARs being opened more than twice at least.

@aloubyansky
Copy link
Member

JARs could be opened outside of our application classloaders though. I've just checked it from the QuarkusClassLoader perspective.

@gsmet
Copy link
Member Author

gsmet commented Aug 10, 2023

@aloubyansky it's not always an ArchivePathTree. In the first example, the second one is not an ArchivePathTree.

In the second example, I'm not exactly sure what's happening. It looks as if there was several CL involved. Plus the same pattern as for the first one.
From the list, it seems to be a common pattern among the jars.

Note that it's running a test that uses live reload, so the app has been restarted a few times.

@aloubyansky
Copy link
Member

Oh, I see we end up creating more than one instance of ArchivePathTree's for the some JARs and then we open them once or twice.

@aloubyansky
Copy link
Member

Here is quick attempt to reduce the number of JARs being opened by our classloaders, still limited to ArchivePathTrees but the test already passes.

@aloubyansky
Copy link
Member

@Postremus
Copy link
Member

@aloubyansky
I also have an old branch of quarkus-fs-util implementing caching:
Postremus/quarkus-fs-util@1f712b8

This one requires though that the zfs is opened using the ZipUtils helper class.

@aloubyansky
Copy link
Member

Thanks @Postremus, that's good to know. We may want to have more control over which JARs should remain opened and cached though.

@stuartwdouglas
Copy link
Member

It seems like the compilation support is also holding onto these: stuartwdouglas@175665d

This is made worse now that we have one each for continuous testing and normal live reload (as the class paths are different).

I'm not sure what we can do here, its basically a trade off between hot reload time and memory use.

@aloubyansky
Copy link
Member

Interesting, thanks @stuartwdouglas, I'll give your changes a try too.
I haven't checked the effects of my changes on the boot time, but it reduces the number of ZIP FS node instances by 20% and the overall heap size by ~8% for a simple REST app. I'll refactor it a bit before opening a PR though.

@stuartwdouglas
Copy link
Member

I don't have any changes, the code I linked was the original PR that added this caching, it claimed a 100ms decrease in hot reload times due to caching.

@aloubyansky
Copy link
Member

Ah, I misunderstood then.

@geoand
Copy link
Contributor

geoand commented Aug 14, 2023 via email

@aloubyansky
Copy link
Member

JARs of all the non-reloadable dependencies? That could work theoretically. We'd also need to unpack the JARs applying all the classloading filtering config, since all the classpath elements will merge into one. It'll most probably slow down the initial boot and will be quite a change in the codebase though.

@geoand
Copy link
Contributor

geoand commented Aug 14, 2023

Yeah, I'm thinking of the non reloadable stuff and maybe even only a certain set of them we know are used very frequently and lead to enlarged memory consumption

@gsmet
Copy link
Member Author

gsmet commented Aug 14, 2023

I don’t think it would be a good trade off to slow down dev mode to save some memory. Dev mode needs to be fast.
I think avoiding opening several times the same zip would be beneficial for both so Alexey’s experiment is probably worth it.

@geoand
Copy link
Contributor

geoand commented Aug 14, 2023 via email

@aloubyansky
Copy link
Member

With the PR I opened the minimum Xmx for the test dropped down to 95m from 115m. It still remains a pretty simple strategy, which is pretty much "keep an open ZIP FS as a long there is at least one consumer holding on to it and close it when the last consumer is done with it".

@Sanne
Copy link
Member

Sanne commented Aug 18, 2023

With the PR I opened the minimum Xmx for the test dropped down to 95m from 115m. It still remains a pretty simple strategy, which is pretty much "keep an open ZIP FS as a long there is at least one consumer holding on to it and close it when the last consumer is done with it".

That's great! Very cool.

On a slight tangent - the size of those ZIP related buffers was the reason for which we did 266af76 , althought that only improved memory for the classloader used in production.

I still have it on my wishlist that we could improve the memory consumption and bootstrap times further if we were to reduce the number of jars that the bootstrap classloader needs to open (as it's not using such optimisations since it's the JVM boostrap classloader).

With this issue you're now making me think that such optimisation would also help memory consumption in dev-mode; perhaps someone hase time to do it? We'd only need to shade the jars from quarkus core to ensure there's only one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants