-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop running finalizers on exit #51466
base: master
Are you sure you want to change the base?
Conversation
Do we need to run a GC before then disabling it? |
Given the following backtrace, I don't know if running GC here could also freeze? That's why I added this disable at this point.
|
The comment in
is suspicious |
Ah. So it looks like after |
A GC-unsafe region is a place where allocations are safe
When threads are torn down (if they do get torn down at all) they end in such a way that GC will ignore them. If you wanted to ensure they get torn down correctly, there will be a new argument to |
No, that was only a theory as to why the thread whose backtrace I posted might have frozen in stop-the-world. The main issue is that the thread froze. If you look at the linked issue, the other backtraces suggest that some threads were stuck in This PR basically tries to duck the problem by turning off GC altogether, because we're exiting anyway. |
Those threads aren't stuck (sigprocmask can't get stuck as it is just a syscall), but they are spending most of their time there. The |
So just for some context me and @vchuravy believe this might be similar or exactly the same as the hangs we see atexit on the libuv thread pr. https://buildkite.com/julialang/julia-master/builds/27576#018a85fd-4452-4f20-bdbe-3cade4f7d0d1 |
This is getting a bit confusing.
Okay, 50914 looks pretty neat. But in this case, I'm not actually concerned about the threads shutting down gracefully or otherwise. I just don't want to freeze on exit.
It's not in 1.9, yeah. Looks like I should try to backport #41616 and also #47393? And that might prevent this freeze in stop-the-world on exit? Cc: @vchuravy for confirmation.
So @gbaraldi, you don't think that 41616 and 47393 will help? |
So looking at this issue and a couple others, I believe that running all finalizers like this at exit is not really tenable. Java and .NET went away from this because they were encountering the same issues as us. .net added something similar to our Their reasons were, you need to block all threads before running finalizers, because if you execute the finalizer of a live object it's now undefined. But blocking all threads is a syncronization issue, the decision was made to just not do it, and say that it's the users responsibility to ensure that resources get cleaned up before exiting. |
No longer running finalizers on exit is a more dramatic change that probably needs more thought/discussion. What this PR is doing is a much smaller change, no? |
What I'm saying is that I don't think this is enough. We are still running finalizers of objects that threads can still reach |
Running arbitrary code with GC disabled is certainly dangerous. |
Okay. So what's a good solution here? Stop running finalizers at exit as .NET did? |
Yeah, if people need to run things at ext they should use |
424e07d
to
719a8e0
Compare
I've changed this to remove the call to |
Will this be backported (to 1.10 at least)? |
Bump. |
Can/should we run PkgEval on this to see if there's an impact? |
@nanosoldier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a docs update + news
Grepping for jl_gc_run_all_finalizers
makes it show up in a few places. That document the behaviour.
Since we no longer run finalizers at exit.
Relies on finalizers running at exit.
045dc2a
to
5dde085
Compare
Add a `compat` note that as of Julia 1.11, finalizers will not be run on exit.
@vchuravy yeah agreed, re: "unpublicized" things. Some careful documentation can help, but it would still be better if it could be enforced in some way, to truly be a testing-only tool. Anyway, like I said, not worth holding this up. |
I agree that we should not run finalizers at exit. A finalizer runs when the following two conditions are met:
For (1), exiting does not necessarily mean we have reached that event. For (2), it's our decision when to run GC, and on exit we're dropping the whole heap so we can certainly decide not to run it. So neither of these conditions are met, so we should not run finalizers. |
commit 4796390 run all finalizers on exit. fixes #3624 |
(I guess disabling this won't re-break #3624 because now we have atexit_hooks right?) |
Disabling this will reopen #3624 because it will revert the fix for it |
Unfortunately, that seems to be true. E.g. one of the test failures on this PR is, I think, because of this (shown below)? function Process(cmd::Cmd, handle::Ptr{Cvoid})
this = new(cmd, handle, devnull, devnull, devnull,
typemin(fieldtype(Process, :exitcode)),
typemin(fieldtype(Process, :termsignal)),
ThreadSynchronizer())
finalizer(uvfinalize, this)
return this
end In this case (but not in all cases), I think we want to do the same thing at exit as we do when a Suppose we add: function atexit(f, x)
_x = WeakRef(x)
atexit() do
_x.value !== nothing && f(x)
end
nothing
end Then, would adding |
But another thread could be running a task that is working with that We could prevent that by suspending threads in Maybe we throw an |
We can have arbitrary many open files, and not flushed, so also flushing and thus closing can take an arbitrary long time, and thus exit(), if it done by triggering finalizers (or explicitly)?! That doesn't seems good. Also does the order of flusing matter (I think it might, at least sometimes), and do finalizers run in that wanted order? If they don't or can't then I don't think we should use them for this.
Why do those get away with not doing it? Don't they flush? All open files (file handles) are known, can we keep e.g. a queue of say last 10 not-yet flushed file handles and close them in the right order on exit? If we write to a file we add it to the front of the queue and flush the 11th file that then drops off it. On exit this then is a bounded (and very quick) problem. We could additionally do run( Maybe finalizers do this in effect, I'm not sure, but they are for more (more than just files of any kind), and also for open "files" i.e. file handles to the web. I'm not sure if it's worth flushing those, nor do I know if we have a good way of knowing which "files" are really files and local. Unclear what we should do for remote filesystems. |
They likely just don't flush? The C and C++ standards require running these finalizers (aka destructors) because stdout is flushed by them, and so yes, they will block on exit until that aspect of finalization is complete. But for many GC languages (e.g. Java in that list), failing to call flush and/or close may result in the file itself getting truncated anyways due to finalizer ordering. |
I was only asking for (those) GC languages, why they get away with it, and you seem to confirm they put the responsibility on the user, i.e. even if they had run finalizers (e.g. on exit) that would not have worked (always). So I think we should just not run the finalizers on exit. Destructors are a different story (in C++, they run unless you abort(), you can also do that, to exit...), C doesn't have them so responsibility is fully on the developer, and not on any standard to to it for them, just suggest what you should do. Mojo has destructors I believe, but goes further than C++, it deallocates (heap) memory earlier than C++ can, so I assume if closes (and implicitly flushes) files earlier too. We should likely emulate that for files, and for heap deallocation, when possible. |
C accidentally does have them, since it requires that |
I'm confused, Julia's exit calls jl_exit which calls C's exit, and it should flush files, thus no need to run finalizers for that?! Or is it about the wrong order, and C does it wrong? No, C doesn't have the destructor (concept), what I meant by "doesn't have", though file flushing/closing is sort of similar, but restricted to that, unlike in C++ where they are responsible for all cleanup (by default at least). In some cases we call C's _exit(1) (or even _exit(0) which doesn't flush, as designed, are we incorrectly calling that?). https://pubs.opengroup.org/onlinepubs/9699919799/functions/exit.html
It does though have that CX footnote:
Do we for sure use POSIX.1-2017 (extension?), is that something we can and should opt into? But do not do and miss out? https://pubs.opengroup.org/onlinepubs/9699919799/functions/_Exit.html Line 236 in a3effa9
This seems to be correct: Line 448 in a3effa9
but I'm not sure this is always correct: Line 236 in a3effa9
e.g. for SIGTERM too, i.e. exit code 143 = 128+15? Open issue for such here: https://stackoverflow.com/questions/18499497/how-to-process-sigterm-signal-gracefully |
It seems to me that the most pragmatic solutions here are:
I think that the atexit hook proposal is very straightforward and should be possible to implement. Maybe we can proceed with that, until/unless we get more information on using C's atexit hooks? Kiran: Rather than registering a new atexit hook for each file, you could also add the files to a per-thread WeakKeySet (WeakKeyDict{File,Nothing}) of open files, and then have a single atexit hook that flushes all the files in those dicts? This could reduce contention on the atexit hook lock, and also prevent a memory leak of those callbacks building up for every file that was ever opened. |
C's
Implementing the |
btw, I think packages like Arrow and CSV rely on finalizers running on exit to release mmap'd memory, since they mmap inputs by default |
One thought I had the other day. Currently it is assumed in a few places that finalizers run in a certain sequence. E.g. |
That might be possible, because we schedule the whole list without much care |
We should run them in the reverse of the order they were added (c.f. jl_gc_run_finalizers_in_list) if they both die at the same time. If two unrelated objects die around the same time, their order is unpredictable (since we may schedule the finalizers for one, and then, before running it, then schedule and run the finalizers for the other). Having a separate, single thread for finalizers to run on might fix some of that? |
Reverse order may help, but you can easily run into a situation where a younger object holds a reference to something older and depends on that state to still be valid. This reinforces my belief that we should do this PR instead of promising that finalizers will run before the end of the program. |
How would you construct that situation? Either they die at the same time, or the younger one dies first. The older one cannot die first, since there is a reference to it |
Isn't that the point? By running finalizers at exit (and not just those that are provable dead) we can construct such a situation? |
Closes #50038.
We've observed that when GC is triggered during exit, Julia can freeze during stop-the-world. This is a symptom of a general problem with running finalizers at exit -- program threads are still running and the various live objects are potentially in use. Pausing threads before running finalizers would possibly introduce other problems.
For this reason, as observed below, other languages (Java, .NET, Go) do not run finalizers on exit and with this PR, Julia will also stop running finalizers on exit.