Classes loaded by Pods are leaked #268

Closed
stephenbrady opened this issue Aug 7, 2015 · 20 comments
@stephenbrady
Contributor

This repo reproduces a class leak with each invocation of the test task under boot watch.

https://github.com/stephenbrady/boot-pod-leak

In writing custom boot tasks, I see the same kind of class leak when using either boot.pod/make-pod with boot.pod/destroy-pod or boot.pod/pod-pool with :refresh.
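
To make that concrete, here is a minimal, illustrative sketch of the kind of task I mean (not the actual code from the repo above):

(require '[boot.core :as core]
         '[boot.pod  :as pod])

(core/deftask pod-per-pass
  "Illustrative task: make a fresh pod on every build pass, then destroy it."
  []
  (core/with-pre-wrap fileset
    (let [p (pod/make-pod (core/get-env))]
      (pod/with-eval-in p
        (+ 1 1))                       ; stand-in for the task's real work
      (pod/destroy-pod p))
    fileset))

Running boot watch pod-per-pass then churns through one pod per file change, which is where the class count climbs.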

@martinklepsch
Member

Hey @stephenbrady thanks for the report!

How exactly did you detect this memory leak? Did you get java.lang.OutOfMemoryError: Java heap space when your heap filled up or did you get java.lang.OutOfMemoryError: PermGen space errors?

@stephenbrady
Contributor Author

Sure: by profiling in jvisualvm. With each invocation of test I can see the number of classes in memory grow by roughly the number of new classes you would expect from loading the Clojure library again. The biggest symptom is how long each successive invocation of test takes: as the class count grows, everything in the JVM slows down.

This is with JDK 8, so there are no PermGen memory issues.
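
For what it's worth, the same numbers can be watched without a profiler; a quick, generic sketch using the standard JMX class-loading bean (nothing boot-specific):

(let [clb (java.lang.management.ManagementFactory/getClassLoadingMXBean)]
  ;; loaded = classes currently in memory; total/unloaded show churn over time
  (println "loaded:"   (.getLoadedClassCount clb)
           "total:"    (.getTotalLoadedClassCount clb)
           "unloaded:" (.getUnloadedClassCount clb)))

If loaded keeps climbing across invocations of test while unloaded stays flat, classes are being retained.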

@martinklepsch
Member

@stephenbrady Thanks, that confirms my understanding.

@martinklepsch
Member

@alandipert:

@martinklepsch, @micha re: mem core.async might be a culprit, https://github.com/projectodd/shimdandy#preventing-memory-leaks

I read that page, but Boot does not use or depend on core.async, and searching for classes in the YourKit class browser doesn't list any classes containing "async", so I assumed core.async isn't the culprit here.

The program I use for testing is also very minimal and does not load any additional deps:

#!/usr/bin/env boot

(require '[boot.pod :as pod])

(defn -main [& args]
  (loop [i 0]
    (let [c-pod (pod/make-pod (get-env))]
      (boot.util/info "New pod taken: %s\n"  c-pod)
      (pod/with-eval-in c-pod
        (boot.util/info "Hi from pod no. %s.\n" ~i))
      (pod/destroy-pod c-pod)
      (recur (inc i)))))

Used with these $BOOT_JVM_OPTIONS, the program quits after about 16 loops:

-Xmx100m -XX:MaxMetaspaceSize=200m -client -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -Xverify:none

@tobias
Contributor

tobias commented Oct 23, 2015

I believe I have figured out what is causing this: boot is triggering
a call to either clojure.lang.Var.cloneThreadBindingFrame
(which will happen if clojure.core/binding-conveyor-fn is called,
either via a call to clojure.core/future or one of the
clojure.core/send* fns), or clojure.lang.Var.getThreadBindings
(which will happen if clojure.core/get-thread-bindings is called,
which would most likely be via clojure.core/bound-fn). If either of
those Var methods are called, Var.dvals.get() is called, which
will set a ThreadLocal on the calling thread. In the test case,
we're doing all of the pod work on a single thread (the main thread),
so we end up creating those thread locals on that thread. If we had a
single copy of clojure.lang.Var loaded, this would be fine, since
there would only be one thread local, but we end up with one per pod,
which causes retention of the pod's class loader, preventing any
classes loaded in the pod from being unloaded.

Code that triggers either Var method call is perfectly fine Clojure,
and we can't prevent it (nor do we want to). One fix for this would be
to wrap each shim in a thread, where all invokes against the shim
occur on that thread. Then we dispose of the thread once we are done
with the shim.

The following test case demonstrates that, and runs indefinitely for
me:

#!/usr/bin/env boot

(require '[boot.pod :as pod])

(defn -main [& args]
  (loop [i 0]
    (doto
        (Thread.
          #(let [c-pod (pod/make-pod (get-env))]
             (boot.util/info "New pod taken: %s\n"  c-pod)
             (pod/with-eval-in c-pod
               (boot.util/info "Hi from pod no. %s.\n" ~i))
             (pod/destroy-pod c-pod)))
      .start
      .join)
    (recur (inc i))))

The reason I haven't seen this with ShimDandy before is likely because
I never created many shims on the same thread.

I think I can fix this internally in ShimDandy by adding a thread per
shim, with a queue passing actions to that thread for each invoke. I
don't want to add that context-switch overhead to all usages
of ShimDandy, but could add an option to newRuntime() that allows
you to use the thread-isolated implementation. I'll also need to add a
shutdown() method to the shim interface that stops the shim's
thread. I'd probably add a call to clojure.core/shutdown-agents
there as well, just so users don't have to do that themselves.
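
To make the thread-isolation idea concrete, here is a rough Clojure sketch of the same pattern (illustrative only; it is not ShimDandy's implementation, and the helper name is made up): every operation on the pod is funneled through one dedicated thread, and that thread is discarded together with the pod, taking its ThreadLocals with it.

(import '(java.util.concurrent Executors ExecutorService Callable))

(require '[boot.pod :as pod])

(defn call-in-isolated-pod
  "Make a pod, run (f pod) against it, and destroy it, all on one dedicated
  thread, so any ThreadLocals created along the way die with that thread."
  [f]
  (let [exec (Executors/newSingleThreadExecutor)
        run  (fn [g] (.get (.submit ^ExecutorService exec ^Callable g)))]
    (try
      (run #(let [p (pod/make-pod (boot.core/get-env))]
              (try (f p)
                   (finally (pod/destroy-pod p)))))
      (finally
        (.shutdown ^ExecutorService exec)))))

;; e.g. (call-in-isolated-pod (fn [p] (pod/with-eval-in p (+ 1 1))))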

@micha
Contributor

micha commented Oct 23, 2015

@tobias that sounds great, thanks for the detailed explanation! In the destroy-pod function in boot we also call .close() on the shim's classloader; I guess that's not really necessary?

@tobias
Contributor

tobias commented Oct 23, 2015

@micha I'll see if I can get out a ShimDandy release today. It's probably a good idea to call .close(), since it closes any open files (according to the URLClassLoader javadoc).
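
For reference, a simplified sketch of that close call from the caller's side (it assumes the shim exposes its class loader via a getClassLoader accessor, which is what destroy-pod goes through):

(let [p (pod/make-pod (boot.core/get-env))]
  (pod/with-eval-in p (+ 1 1))        ; use the pod
  ;; closing the URLClassLoader releases any jar files it still holds open
  (.. p getClassLoader close))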

@tobias
Contributor

tobias commented Oct 23, 2015

#314 should fix it. I went with the simpler solution of ensuring Var.dvals was removed when we are done with an invoke call. That PR also uses the new .close() method on the shim, which invokes shutdown-agents for you.
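
To illustrate the mechanism that fix relies on (illustrative only, not the actual ShimDandy code): drop the calling thread's entry in the pod's copy of Var.dvals once an invoke is done, so the pod's class loader is no longer strongly reachable from that thread. Evaluated against one of the pods from the scripts above (c-pod), it would look roughly like:

;; run inside the pod so clojure.lang.Var resolves to the pod's own copy
(pod/with-eval-in c-pod
  (let [fld (doto (.getDeclaredField clojure.lang.Var "dvals")
              (.setAccessible true))]
    (.remove ^ThreadLocal (.get fld nil))))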

martinklepsch pushed a commit that referenced this issue Oct 24, 2015
1.2.0 removes an avenue for leaking memory, and provides a close method
to call `shutdown-agents` on your behalf.
@martinklepsch
Member

Closed via #314

martinklepsch pushed a commit that referenced this issue Oct 26, 2015
micha added a commit that referenced this issue Oct 27, 2015
@stephenbrady
Contributor Author

Thanks for the attention on this, and sorry I didn't follow up before the release. I tested this again with boot 2.4.2. The class leak has been significantly mitigated but not resolved: new classes are still being created at a rate of ~2,000 per watch invocation, compared to ~10,000 before. The minimal cases above probably help isolate aspects of the leak, but they don't fully model the case I reported, which is of the form boot watch foo-task, where foo-task creates and destroys a pod (whether via a pod pool or not).

Please see this repo again: https://github.com/stephenbrady/boot-pod-leak.

The difference between the reproduction cases suggests that watch may be directly or indirectly involved.

@martinklepsch
Member

@stephenbrady unless you supply a JVM option like -XX:MaxMetaspaceSize=100m, the number of loaded classes will grow until it hits the system's natural memory limits.

Probably good to document this on the JVM Options page.
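
For example, something along these lines in $BOOT_JVM_OPTIONS caps Metaspace and lets the collector unload classes (values are illustrative only):

BOOT_JVM_OPTIONS="-Xmx2g -XX:MaxMetaspaceSize=256m -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled"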


@stephenbrady
Contributor Author

Sorry, but that's not what I observe in the repo I published. All setting that JVM option does is trigger an OOM exception in Java 8 after enough classes have been created and not released. I still see significant class leakage: GC is not releasing those classes when under pressure. Since the JVM chooses to OOM instead of pausing to GC those classes, there must be a class leak.

@martinklepsch
Member

I can replicate the behaviour I described (i.e. classes are GCd) with your project (ref 9a1593a).

Also, just to double-check: is the output of boot -V correctly reporting v2.4.2?


EDIT: No need to try the stuff below unless you want to test without having to run touch a bunch of times.

Can you try putting this into a file pod-loop and running chmod +x pod-loop; ./pod-loop?

#!/usr/bin/env boot

(require '[boot.pod :as pod])

(defn -main [& args]
  (println "Hello")
  (loop [i 0]
    (let [c-pod (pod/make-pod (get-env))]
      (boot.util/info "New pod taken: %s\n"  c-pod)
      (pod/with-eval-in c-pod
        (boot.util/info "Hi from pod no. %s.\n" ~i))
      (pod/destroy-pod c-pod)
      (recur (inc i)))))

@stephenbrady
Contributor Author

Hmm, an old version of boot was sneaking in as I was switching between repos. Sigh...sorry for the churn.

@martinklepsch
Member

No worries! 👍

@tobias
Contributor

tobias commented Nov 23, 2015

@stephenbrady - I ran your test app with a while loop in the shell that touched core.clj every half second; it quickly reached a steady state and did not run out of memory (at least not yet):

(screenshot "boot-leak": memory graph settling into a sawtooth steady state)

This was with:

$ boot --version
#https://github.com/boot-clj/boot
#Mon Nov 23 17:51:23 EST 2015
BOOT_JVM_OPTIONS=-client -XX\:+TieredCompilation -XX\:TieredStopAtLevel\=1 -Xmx2g -XX\:+UseConcMarkSweepGC -XX\:+CMSClassUnloadingEnabled -Xverify\:none
BOOT_CLOJURE_NAME=org.clojure/clojure
BOOT_VERSION=2.4.2
BOOT_CLOJURE_VERSION=1.7.0
#App version: 2.4.2

$ java -version
openjdk version "1.8.0_65"
OpenJDK Runtime Environment (build 1.8.0_65-b17)
OpenJDK 64-Bit Server VM (build 25.65-b01, mixed mode)

I'll let it run for a few hours and see if the behavior changes. Is there anything else I could try to help recreate it?

@tobias
Contributor

tobias commented Nov 23, 2015

Oops, I see y'all have figured it out. Ignore my last comment then :)

@stephenbrady
Contributor Author

Yep, I finally got the sawtooth graph too after I pinned down the boot configuration on my machine. Thanks again.

@arichiardi
Contributor

Hello folks, the problem is still there when it comes to tasks that use pods (like boot-test) and don't properly clean up executor threads and friends. In the linked PR, Micha and I were thinking of adding destroy-hooks in order to allow custom code to run at destroy time.

Does it sound like a good idea?

The problem for me happens when I run boot watch test in my app: I quickly reach OutOfMemoryError: Metaspace.
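
Purely as a sketch of what I have in mind (these names are hypothetical; nothing like this exists in boot.pod today): a destroy-hook is just code the task registers to run inside the pod right before it is torn down, e.g. to stop executors and schedulers.

(require '[boot.pod :as pod])

(defn destroy-pod-with-hooks
  "Hypothetical helper: run each hook (a fully qualified symbol naming a
  zero-arg fn) inside the pod, then destroy the pod."
  [p hooks]
  (doseq [hook hooks]
    (pod/with-eval-in p (~hook)))      ; e.g. 'my.app/stop-executors!
  (pod/destroy-pod p))

A task like boot-test could then register its cleanup fn when it sets up its pod pool.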
