Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API review: During shutdown, revisit finalization and provide a way to clean up resources #16028

Closed
kouvel opened this issue Jan 6, 2016 · 59 comments
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Milestone

Comments

@kouvel
Copy link
Member

kouvel commented Jan 6, 2016

Running finalizers on reachable objects during shutdown is currently unreliable. This is a proposal to fix that and provide a way to clean up resources on shutdown in a reliable way.

Issues observed on shutdown

Currently, a best-effort attempt is made to run finalizers for all finalizable objects during shutdown, including reachable objects. Running finalizers for reachable objects is not reliable, as the objects are in an undefined state.

  • In order to finalize reachable objects, threads must be blocked, since objects that are still reachable cannot be used during or after finalization. Later, threads are terminated without running any more user code.
  • Running user code in finalizers after blocking other threads is unreliable, as those threads may be blocked at an inopportune point, and may cause finalizers to block indefinitely or result in undefined behavior due to the undefined state of the object
    • Example from a user code perspective. Consider an object that writes to a network stream using some stateful communication protocol. The finalizer would write a termination value to the stream and close the stream. Suppose that the termination value indicates to the receiving end that all data has been written, while abruptly closing the stream without writing the termination value would indicate incomplete transmission due to disconnection or some other reason. Writing the termination value in the finalizer assumes that there are no more references to the object, indicating that all data has been written. Suppose that a background thread is using the object, writing data to the pipe. During shutdown, the background thread is blocked at some arbitrary point, and the object is still referenced. Writing the termination value to the pipe in the finalizer at that point may be invalid according to the protocol.
    • Example from a runtime perspective. If a thread is blocked during GC, and a finalizer tries to allocate something, it may block waiting for GC to complete, which will never happen. The finalizer itself may not even allocate anything, but even just jitting the finalizer method will trigger allocation. While this particular issue can be fixed separately, it demonstrates the unreliability of the current design not just from a user code perspective but from a runtime perspective.
  • Effectively, the best-effort attempt to run finalizers for reachable finalizable objects is not reliable.

Proposal

  • Don't run finalizers on shutdown (for reachable or unreachable objects)
  • Don't block threads on shutdown
  • Don't do a GC on shutdown (no change from current behavior)
    • Under this proposal, it is not guaranteed that all finalizable objects will be finalized before shutdown.
    • Doing a GC on shutdown and running finalizers for unreachable objects can guarantee that objects that are deterministically unreachable by the time of shutdown will be finalized. However, such objects should also be deterministically disposed before shutdown. For cases that require this, the user can trigger a GC explicitly and wait for finalizers before shutdown.
    • When there are background threads that are still running, there would be no guarantee on how many objects will be finalized anyway
  • Provide a public AssemblyLoadContext.Unloading event
    • An AssemblyLoadContext manages the lifetime of assemblies loaded under that context
    • Code should register for the event in the AssemblyLoadContext instance associated with the assembly
    • The event is raised when the GC determines that the AssemblyLoadContext instance is no longer referenced. For the default AssemblyLoadContext instance and for a custom instance installed as the default load context, the event will be raised before normal shutdown.
      • Abrupt shutdown due to unhandled exception, process kill, etc., will not raise any further Unloading events
      • As unloading an AssemblyLoadContext is not yet implemented, instances that have been used to load assemblies will currently live until the end of the process
    • No timeout. The timeout on waiting for finalizers to complete on shutdown was removed in CoreCLR some time back. In favor of treating blocking issues as program errors, no timeout will be used for this event either.
    • Event handler exceptions will crash the process. Any exception propagating out of an event handler will be treated as an unhandled exception.
    • Since other threads are not blocked before this, and may continue to run for a short period after the event is raised, event handlers may need to handle concurrency, and safeguard from using cleaned up resources from other threads

Behavioral change

public static void Main()
{
    var obj = new MyFinalizable();
}

private class MyFinalizable
{
    ~MyFinalizable()
    {
        Console.WriteLine("~MyFinalizable");
    }
}

Previous output:
~MyFinalizable

Typical output with the proposal above (running the finalizer is not guaranteed, but may run if a GC is triggered):
(empty)

Proposed API

namespace System.Runtime.Loader
{
    public abstract class AssemblyLoadContext
    {
        public event Action<AssemblyLoadContext> Unloading;
    }
}

Example

public class Logger
{
    private static readonly object s_lock = new object();
    private static bool s_isClosed = false;

    static Logger()
    {
        var currentAssembly = typeof(Loader).GetTypeInfo().Assembly;
        AssemblyLoadContext.GetLoadContext(currentAssembly).Unloading += OnAssemblyLoadContextUnloading;

        // Create log file based on configuration
    }

    private static void OnAssemblyLoadContextUnloading(AssemblyLoadContext sender)
    {
        // This may be called concurrently with WriteLine
        Close();
    }

    private static void Close()
    {
        lock (s_lock)
        {
            if (s_isClosed)
                return;
            s_isClosed = true;

            // Write remaining in-memory log messages to log file and close log file
        }
    }

    public static void WriteLine(string message)
    {
        lock (s_lock)
        {
            if (s_isClosed)
                return;

            // Save log message in memory, if buffer is full, write messages to log file
        }
    }
}
@kouvel kouvel self-assigned this Jan 6, 2016
@kouvel
Copy link
Member Author

kouvel commented Jan 6, 2016

CC @jkotas @stephentoub @terrajobst

@jkotas
Copy link
Member

jkotas commented Jan 7, 2016

The proposed AssemblyLoadContext.ProcessExit API is essentially exposing AppDomain.ProcessExit from full .NET Framework under different name.

It may be worth noting that the fundamental problem with finalization during shutdown came from inspiration by Java designs in the early days of .NET. Java have realized and fixed the fundamental problem many years ago by deprecating (disabling by default) finalization during shutdown. It is time to fix it for .NET Core as well.

Uncommenting the call to GC.Collect() would produce the previous output:

This is actually not guaranteed. The JIT is allowed to extend the lifetime of the local variable till the end of the method, and so the call to GC.Collect may still see MyFinalizable object as live.

@kouvel
Copy link
Member Author

kouvel commented Jan 7, 2016

This is actually not guaranteed. The JIT is allowed to extend the lifetime of the local variable till the end of the method, and so the call to GC.Collect may still see MyFinalizable object as live.

Oh right, I had meant to null it out, but I guess there might still be a reference somewhere. I'll just remove that part.

@clrjunkie
Copy link

For instance, if a thread is blocked during GC, and a finalizer tries to allocate something, it may block waiting for GC to complete, which will never happen. The finalizer itself may not even allocate anything, but even just jitting the finalizer method will trigger allocation.

"if a thread is blocked during GC", are you referring to the finalizer thread? Is that because the GC does not reclaim any memory during shutdown and if the finalizer thread attempts to make an allocation that would trigger a GC it would block waiting to be notified that the GC completed its work - a notification that would never occur since at this point GC requests would simply be ignored?

Provide a public ProcessExit event that is raised during shutdown.

What is the execution order of this event with respect to finalize methods execution?

The event would be raised on the finalizer thread

Will the ProcessExit event retain the same timeout semantics as “regular” finalization
2 sec timeout per finalize method (kill process on first timeout) 40 sec total timeout

Looking at AssemblyLoadContext, it appears that it's sole responsibility is well loading assemblies. Why not than simply call it AssemblyLoader (“Context” in .NET is usually used to model temporary state, especially during Async operations)

Why not put this event in System.Environment along HashShutdownStarted like in the full .NET?

@kouvel
Copy link
Member Author

kouvel commented Jan 8, 2016

"if a thread is blocked during GC", are you referring to the finalizer thread?

Finalizers are run in a separate finalizer thread. Here I'm referring to other threads running managed code. Other threads are blocked, then finalizers run on the finalizer thread.

Is that because the GC does not reclaim any memory during shutdown and if the finalizer thread attempts to make an allocation that would trigger a GC it would block waiting to be notified that the GC completed its work - a notification that would never occur since at this point GC requests would simply be ignored?

When there are background threads running managed code, and those threads are blocked, one of them may be blocked while doing a GC. When finalizers run on the finalizer thread, an allocation may get blocked until GC is complete but the thread doing the GC was blocked so it would never complete.

What is the execution order of this event with respect to finalize methods execution?

The event is raised before the final round of running finalizers.

Will the ProcessExit event retain the same timeout semantics as “regular” finalization
2 sec timeout per finalize method (kill process on first timeout) 40 sec total timeout

The timeout for waiting for finalizers had been disabled some time back in CoreCLR. If a finalizer blocks indefinitely, it should not be due to a CLR issue, so a timeout should not be necessary. Similarly, I'm proposing that a timeout is not necessary for the ProcessExit event. If it blocks indefinitely, then it should be due to a program error.

Looking at AssemblyLoadContext, it appears that it's sole responsibility is well loading assemblies. Why not than simply call it AssemblyLoader (“Context” in .NET is usually used to model temporary state, especially during Async operations)

There can be multiple contexts under which assemblies are loaded, each with custom load behavior. Loaded assemblies are associated with a context so that they can be unloaded (based on ref count) in batch for a particular context. I suppose "context" here may refer to the behavior with which assemblies are loaded.

Why not put this event in System.Environment along HashShutdownStarted like in the full .NET?

Environment may also be a good place to put it. AssemblyLoadContext was chosen because the ProcessExit event is the equivalent counterpart of AppDomain.ProcessExit in the desktop CLR, and AssemblyLoadContext takes some of the responsibilities of AppDomain in CoreCLR. Thoughts?

@clrjunkie
Copy link

@kouvel

Finalizers are run in a separate finalizer thread. Here I'm referring to other threads running managed code.

I know this. I'm just trying to understand your example, specifically in the context of shutdown.

When there are background threads running managed code, and those threads are blocked, one of them may be blocked while doing a GC. When finalizers run on the finalizer thread, an allocation may get blocked until GC is complete but the thread doing the GC was blocked so it would never complete.

Please help me understand the exact chain of events:

Let's assume we have three "regular" user code threads T1, T2, T3 and the Finalizer thread FT.

So:

  1. T1 executes some long running user code.
  2. T3 calls either Environment.Exit(0) or Process.Kill followed by WaitForExit();
  3. T2 executes user code that calls newobj which in turn calls GC code.
  4. GC Code executed by T2 suspends execution of T1 and possibly also T3 (if it is still executing a little bit after calling either shutdown methods).
  5. T1 and optionally T3 are now blocked at some arbitrary point, each in their respective code paths.
  6. GC Code executed by T2 completes and releases/starts FT which will 1) Append object references listed in the Finalization List on to the F-reachable Queue 2) Dequeue each object from F-reachable Queue and call it's Finalize method.
  7. GC Code executed by T2 resumes execution of T1 and T3, unwinds and continues executing it’s user code. (T1,T2,T3,FT are all running)

Questions:

a) What happens next? I assume the CLR will simply wait for T1, T2, T3 and FT to complete and exit the process.

b) What blocking issues can arise here? I mean even if FT executes a Finalize method that triggers a GC, I would expect that the FT Thread will continue and execute the GC code which will ultimately augment the F-reachable Queue, unwind and continue executing the current Finalize method.

Point being I don't expect FT and T2 to be dependent one on another.

c) How all this play out when GC Code is invoked asynchronously (e.g concurrent GC)

The event is raised before the final round of running finalizers.

What do you mean by "final round”? This should be documented in the spec.

The timeout for waiting for finalizers had been disabled some time back in CoreCLR.

This should be documented as well.

Similarly, I'm proposing that a timeout is not necessary for the ProcessExit event. If it blocks indefinitely, then it should be due to a program error.

I agree, and in a way it's better because on the full .NET there is no alert if the finalizer thread gets killed, which makes debugging very hard.

Nothing is written to event log when this happens:

class Program
{
    static WorkItem WorkItem = new WorkItem();
    static void Main(string[] args)
    {
        WorkItem.DoWork();
    }
}

class WorkItem
{
    public void DoWork() 
    {
        Console.WriteLine("Done");            
    }

    ~WorkItem()
    {            
        Console.WriteLine("Finalizing");
        Thread.Sleep(5000);
        Console.WriteLine("Finalized");
    }
}

Therefore, I suggest writing an error to the event log or stderr when such a situation occurs that includes some context on the blocking finalize method.

Writing robust graceful shutdown code is a hard problem. Debugging it in production is even harder.

Loaded assemblies are associated with a context so that they can be unloaded (based on ref count) in batch for a particular context.

Is this something new? As I recall once an Assembly is loaded into an AppDomain it can't be unloaded, you can only discard the AppDomain.. Any written references?

I suppose "context" here may refer to the behavior with which assemblies are loaded.

Then "context" is more of a loading parameter per "type of load" in any case I don't see how appending the word "Context" to the class helps to clarify this.

AppDomain.ProcessExit in the desktop CLR, and AssemblyLoadContext takes some of the responsibilities of AppDomain in CoreCLR. Thoughts?

I think "AppDomain" and "Assembly" are two different concepts that should be encapsulated separately.

@kouvel
Copy link
Member Author

kouvel commented Jan 8, 2016

@clrjunkie,

  1. GC Code executed by T2 completes and releases/starts FT which will 1) Append object references listed in the Finalization List on to the F-reachable Queue 2) Dequeue each object from F-reachable Queue and call it's Finalize method.

a) What happens next? I assume the CLR will simply wait for T1, T2, T3 and FT to complete and exit the process.

b) What blocking issues can arise here? I mean even if FT executes a Finalize method that triggers a GC, I would expect that the FT Thread will continue and execute the GC code which will ultimately augment the F-reachable Queue, unwind and continue executing the current Finalize method.

After FT enters its shutdown sequence, it tries to block T1, T2, and T3 before running finalizers. Suppose T2 has just started a GC (see gc.cpp, GarbageCollectGeneration, GC is now flagged as started), then as part of trying to disable preemptive GC, it detects the shutdown case and blocks for shutdown (see https://github.com/dotnet/coreclr/blob/a3a2d323653e08b7592ed70070f13833b2994828/src/vm/threadsuspend.cpp#L3423).

A finalizer that does allocation and needs to allocate more space will go into gc.cpp, try_allocate_more_space, where it will block forever waiting for GC to complete. The thread running shutdown code would wait forever for the finalizer thread to signal completion, leading to a hang during shutdown.

While this and other similar occurrences could technically be considered bugs that could be fixed separately, the issue is more about blocking T1, T2, and T3 before running finalizers. Finalizers can't be reliably expected to complete successfully when some threads are arbitrarily blocked. Finalizers may rely on running in a state where the object is not referenced anymore, which is not the case here. So the state of the object may not match expectations and anything could happen as a result, especially when unmanaged resources are involved. Although, it probably wouldn't be a common case that something bad would happen.

Point being I don't expect FT and T2 to be dependent one on another.

They shouldn't be dependent on one another, but FT blocking T2 creates the issue.

What do you mean by "final round”? This should be documented in the spec.

I'll add this and a general note about timeouts to the issue description.

After FT enters its shutdown sequence, it blocks other threads, and does one last round of calling finalizers (see https://github.com/dotnet/coreclr/blob/a3a2d323653e08b7592ed70070f13833b2994828/src/vm/finalizerthread.cpp#L843).

Therefore, I suggest writing an error to the event log or stderr when such a situation occurs that includes some context on the blocking finalize method.

I believe this is by design on desktop CLR. The timeout is documented. In any case, the timeout was removed in CoreCLR, and with this proposal, the finalizer is not guaranteed to run at all. The expectation is that such cases will either use the dispose pattern, or use the ProcessExit event to clean up resources in cases where it is not clear when the process is about to exit (such as returning from Main).

Is this something new? As I recall once an Assembly is loaded into an AppDomain it can't be unloaded, you can only discard the AppDomain.. Any written references?

I'm not sure if unloading AssemblyLoadContexts has been implemented, but I believe that was the idea. The association between loaded assemblies and an AssemblyLoadContext is already there. See dotnet/coreclr#1684.

Then "context" is more of a loading parameter per "type of load" in any case I don't see how appending the word "Context" to the class helps to clarify this.

I think "AppDomain" and "Assembly" are two different concepts that should be encapsulated separately.

The name doesn't seem so bad to me, conceptually I think of it as "an AppDomain is an AssemblyLoadContext".

@kouvel
Copy link
Member Author

kouvel commented Jan 8, 2016

Made a couple of changes:

  • Added notes about there being no timeouts on running finalizers (already the case), and on processing the new event
  • Changed the static ProcessExit event to an instance Unload event. For the time being, it will be raised at the same place as AppDomain.ProcessExit. Once AssemblyLoadContext unloading is implemented, it will move to a more appropriate spot.

@clrjunkie
Copy link

After FT enters its shutdown sequence, it tries to block T1, T2, and T3 before running finalizers.

FT enters its shutdown sequence because T3 called either Environment.Exit(0) or Process.Kill followed by WaitForExit(); before T2 called into GC Code, Correct?

Suppose T2 has just started a GC (see gc.cpp, GarbageCollectGeneration, GC is now flagged as started), then as part of trying to disable preemptive GC, it detects the shutdown case and blocks for shutdown

Again, because T3 has established the shutdown case, Correct?

So conceptually :

  1. T3 sets shutdown flag to true and keeps running
  2. T2 starts calling into GC Code detects that shutdown flag = true and blocks
  3. Because T3 set shutdown flag FT was resumed, checked the flag and started to execute shutdown code that attempts to block all other threads, since T2 is blocked FT can;t block it ..

Is that what you mean?

A finalizer that does allocation and needs to allocate more space will go into gc.cpp, try_allocate_more_space, where it will block forever waiting for GC to complete.

Why would the GC not complete?

This works

class WorkItem
{
public void DoWork()
{
Console.WriteLine("Done");
}

    ~WorkItem()
    {
        Console.WriteLine("Finalizing");

        var b = new byte[10000000];
        b[0] = 1;
        Console.WriteLine("Finalized " + b[0]);
    }

The thread running shutdown code would wait forever for the finalizer thread to signal completion,

Are you taking about a different "special" thread dedicated to cleanup, who is this "thread running shutdown code" ? and why would it wait forever for the finelizer to complete, why would finilerzer get stuck (aside from programmer bugs) ?

@clrjunkie
Copy link

The name doesn't seem so bad to me, conceptually I think of it as "an AppDomain is an AssemblyLoadContext".

So maybe AssemblyLoadContext should be abstract and AppDomain should subclass it?

@clrjunkie
Copy link

Than it would make sense to call methods to load and unload dll's on (into) AppDomains..

@clrjunkie
Copy link

Maybe I miss understood the sequence:

T3 sets shutdown = true which will cause FT to start running BUT T2 is faster then FT it calls GC Code which sets "gc-running" = true and starts shutdown code.. now FT checks "gc-running" and
waits for it to become false which will never happen because T2 is shutting down and won't reset the flag...

Is that what's happening?

@kouvel
Copy link
Member Author

kouvel commented Jan 9, 2016

Here's a test case that hangs on shutdown.

using System;
using System.Threading;

public class FinalizeTimeout
{
    public static int Main(string[] args)
    {
        Console.WriteLine("Main start");

        // Start a bunch of threads that allocate continuously, to increase the chance that when Main returns, one of the
        // threads will be blocked for shutdown while holding one of the GC locks
        for (int i = 0; i < Environment.ProcessorCount - 1; ++i)
        {
            var t = new Thread(ThreadMain);
            t.IsBackground = true;
            t.Start();
        }

        // Run the finalizer at least once to have its code be jitted
        BlockingFinalizerOnShutdown finalizableObject;
        do
        {
            finalizableObject = new BlockingFinalizerOnShutdown();
        } while (!BlockingFinalizerOnShutdown.finalizerCompletedOnce);

        Console.WriteLine("Main end");

        // Create another finalizable object, and immediately return from Main to have finalization occur during shutdown
        finalizableObject = new BlockingFinalizerOnShutdown() { isLastObject = true };
        return 100;
    }

    private static void ThreadMain()
    {
        byte[] b;
        while (true)
            b = new byte[1024];
    }

    private class BlockingFinalizerOnShutdown
    {
        public static bool finalizerCompletedOnce = false;
        public bool isLastObject = false;

        ~BlockingFinalizerOnShutdown()
        {
            if (finalizerCompletedOnce && !isLastObject)
                return;

            Console.WriteLine("Finalizer start");

            // Allocate in the finalizer for long enough to try allocation after one of the background threads blocks for
            // shutdown while holding one of the GC locks, to deadlock the finalizer. The main thread should eventually time
            // out waiting for the finalizer thread to complete, and the process should exit cleanly.
            TimeSpan timeout = isLastObject ? TimeSpan.FromMilliseconds(500) : TimeSpan.Zero;
            TimeSpan elapsed = TimeSpan.Zero;
            var start = DateTime.Now;
            int i = -1;
            object o;
            do
            {
                o = new object();
            } while ((++i & 0xff) != 0 || (elapsed = DateTime.Now - start) < timeout);

            Console.WriteLine("Finalizer end");
            finalizerCompletedOnce = true;
        }
    }
}

This test case has a lot of background threads doing allocation, so the chance of blocking during GC is high. After blocking background threads, the finalizer allocates as well and blocks on GC. The test case occasionally completes successfully on my machines, and hangs forever most of the time. If the test did not wait for the finalizer to run once before triggering shutdown, then allocation while jitting the finalizer function (or one of the library functions called inside) will block.

Is that what you mean?

Yes, to this and questions above it

Why would the GC not complete?

T2 blocked for shutdown after starting GC, it will not resume after that point.

So maybe AssemblyLoadContext should be abstract and AppDomain should subclass it?

Perhaps, regarding the subclassing but I'm not too familiar with them. I think AppDomain is not exposed in .NET Core, although I could be wrong.

Is that what's happening?

(From your last comment). Yes that is what happens in one case.

@clrjunkie
Copy link

Ok, I think understand the scenario.

So the question is:

T2 blocked for shutdown after starting GC, it will not resume after that point.

Why should T2 blocked for shutdown after starting GC? Why not wait for T2 to finish GC and than suspend it? The scenario is destructive by definition (in clean shutdown all user threads would be finished by the application) why is it important to hold the GC lock during shutdown which will cause FT to deadlock? Why not release GC Lock, and than suspend the thread and once all threads are suspended than run TF.

@kouvel
Copy link
Member Author

kouvel commented Jan 11, 2016

This particular issue can be fixed as you suggested, but as I mentioned above, the issue is more about the expectation that finalizers for live objects run to completion while other threads are blocked at some random point.

While this and other similar occurrences could technically be considered bugs that could be fixed separately, the issue is more about blocking T1, T2, and T3 before running finalizers. Finalizers can't be reliably expected to complete successfully when some threads are arbitrarily blocked. Finalizers may rely on running in a state where the object is not referenced anymore, which is not the case here. So the state of the object may not match expectations and anything could happen as a result, especially when unmanaged resources are involved. Although, it probably wouldn't be a common case that something bad would happen.

@clrjunkie
Copy link

the issue is more about blocking T1, T2, and T3 before running finalizers. Finalizers can't be reliably expected to complete successfully when some threads are arbitrarily blocked.

Why? Finalizers execute cleanup code of unreferenced objects by design.. How does this relate to blocked threads?

Finalizers may rely on running in a state where the object is not referenced anymore, which is not the case here.

Why? What is the case here?

So the state of the object may not match expectations and anything could happen as a result, especially when unmanaged resources are involved.

For Example?

@clrjunkie
Copy link

By "Live" you mean you run the finializer methods of Referenced objects??

What's the point? Everything "referenced" is in an undefined state during shutdown.

Is that your point ?

@kouvel
Copy link
Member Author

kouvel commented Jan 11, 2016

Why? Finalizers execute cleanup code of unreferenced objects by design.. How does this relate to blocked threads?

By "Live" you mean you run the finializer methods of Referenced objects??

Yes we are running finalizers for objects that are still referenced. To prevent continued use of those still live objects, threads are blocked for shutdown.

Consider an object that writes to a network stream using some stateful communication protocol. The finalizer would write a termination value to the stream and close the stream. Suppose that the termination value indicates to the receiving end that all data has been written, while abruptly closing the stream without writing the termination value would indicate incomplete transmission due to disconnection or some other reason. Writing the termination value in the finalizer assumes that there are no more references to the object, indicating that all data has been written. Suppose that a background thread is using the object, writing data to the pipe. During shutdown, the background thread is blocked at some arbitrary point, and the object is still referenced. Writing the termination value to the pipe in the finalizer at that point may be invalid according to the protocol.

Blocking threads arbitrarily and running finalizers for referenced objects can cause unintended behavior.

@clrjunkie
Copy link

I think the whole "thread blocking" aspect doesn't make the scenario any more or any less problematic.
The problem is that currently Finilizer code is executed not when it's supposed too, that's the problem. whether threads are blocking or not is irrelevant.

The problem statement is:

"Finalizers of referenced objects execute during shutdown" period.

There nothing more to explain.

If you agree we can move to discuss your solution.

@kouvel
Copy link
Member Author

kouvel commented Jan 11, 2016

Yes that's the core of the problem, blocking threads is only a natural side-effect of it.

@clrjunkie
Copy link

Ok, so I would include this as the sole problem statement and remove the whole GC lock scenario as it is more of a bug - nothing to learn for this.

Proposal:

Don't block threads before running finalizers

So how would you terminate the threads if the user did not ensure they exit cleanly before returning from Unload?

Run finalizers only for dead objects

What do you mean? finilizers run only for dead objects..Do you mean ensure all finilizers of dead objects execute? How would you ensure this?

Don't do a GC on shutdown to maximize the number of dead objects that can be finalized.

I don't understand the whole point is when GC completes it "gives work" to the finalizer thread (appending to the freachable queue new items) how does not doing GC maximize the number of dead objects that can be finalized?

Doing a GC on shutdown can guarantee that objects that are deterministically dead by the time of shutdown will be finalized.

You mean who ever does the GC also waits for the finalizer thread to exit? Who waits?

However, such objects should also be deterministically disposed before shutdown. For cases that require this, the user can trigger a GC explicitly before shutdown.

You mean the user should call GC.Collect() ?

When there are background threads that are still running, there would be no guarantee on how many objects will be finalized anyway

I don't think background threads are relevant, again if finalization was not performed it was not performed.

Provide a public AssemblyLoadContext.Unload event. Registered event handlers can clean up global resources. ◦The event is raised on the default AssemblyLoadContext during shutdown for the time being. Once unloading AssemblyLoadContext is implemented, raising the event would move to an appropriate point.

OK.

Since other threads are not blocked before this, and may continue to run for a short period after the event is raised, event handlers may need to handle concurrency, and safeguard from using cleaned up resources from other threads

I would delete this. You are providing a notification, whatever synchronization requirements are needed to coordinate with user logic is not of this proposal concern.

@kouvel
Copy link
Member Author

kouvel commented Jan 12, 2016

Thanks for your feedback, I took some of your suggestions and updated the proposal.

Ok, so I would include this as the sole problem statement and remove the whole GC lock scenario as it is more of a bug - nothing to learn for this.

I agree that this is not an irrefutable example of the problem. I added the example above to the proposal.

However, there is definitely something to learn from this example. It demonstrates some of the extent of unreliability - the bug you refer to is rather a manifestation of the core problem in the runtime, something that should not be tolerated.

For instance, we could also consider running finalizers for live objects without blocking other threads. This would lead to a whole set of other problems and is clearly not feasible, thus demonstrating that there is no feasible solution involving running finalizers for live objects.

So how would you terminate the threads if the user did not ensure they exit cleanly before returning from Unload?

Threads marked as IsBackground = false, are waited upon. Threads marked as IsBackground = true, are aborted. See https://github.com/dotnet/coreclr/blob/master/Documentation/botr/threading.md for more info.

What do you mean? finilizers run only for dead objects..Do you mean ensure all finilizers of dead objects execute? How would you ensure this?

Updated the wording to clarify. I meant "don't run finalizers for live objects".

I don't understand the whole point is when GC completes it "gives work" to the finalizer thread (appending to the freachable queue new items) how does not doing GC maximize the number of dead objects that can be finalized?

Updated the wording to clarify. I meant "don't do a GC on shutdown, where doing a GC would increase the number of dead objects that can be finalized".

You mean who ever does the GC also waits for the finalizer thread to exit? Who waits?

Finalizers are run before shutdown, the waiting is done by the thread running shutdown code. See ceemain.cpp, EEShutdownHelper.

You mean the user should call GC.Collect() ?

Not that they should, but that they could. I would only suggest this as a workaround for problematic legacy code that may be affected by the change.

I don't think background threads are relevant, again if finalization was not performed it was not performed.

This emphasizes that doing a GC on shutdown does not make any additional guarantees, I believe it is relevant, as the core problem only manifests when there are background threads running during shutdown, and it is a reason for the parent suggestion in the proposal. I would consider a program that shuts down without gracefully terminating its own background work to be a bad program, but it can and does happen.

I would delete this. You are providing a notification, whatever synchronization requirements are needed to coordinate with user logic is not of this proposal concern.

Especially during such complicated shutdown scenarios, I believe it is important to classify the circumstances under which a new API operates. Regardless, it doesn't hurt to detail an API's contract in any case, and I wish it were done more often.

@clrjunkie
Copy link

However, there is definitely something to learn from this example. It demonstrates some of the extent of unreliability - the bug you refer to is rather a manifestation of the core problem in the runtime, something that should not be tolerated

Ahh, no problem. I meant to say "nothing to learn from" as a user :)

Threads marked as IsBackground = false, are waited upon. Threads marked as IsBackground = true, are aborted. See https://github.com/dotnet/coreclr/blob/master/Documentation/botr/threading.md for more info.

Ok.. no change in this behavior.

Finalizers are run before shutdown, the waiting is done by the thread running shutdown code. See ceemain.cpp, EEShutdownHelper.

Didn't we agree that finalizers must not run on shutdown ?@!

Don't do a GC on shutdown. Doing a GC on shutdown would increase the number of dead objects that will be finalized.

So what, these are already dead objects, why not let them cleanup?

I understand that the notification gives the user an opportunity to properly stop running/referenced things,but if something is already dead why not let it die gracefully? This would actually be a valid user expectation.. What do I know when a finializer get's invoked.. I mean from a user perspective, I except that *once I release a reference to an object * any cleanup code I specified in its destructor would be executed sooner or later.. Now I need to think what happens when the program exits and there are still dead objects that didn't have their finalizer run ... big mess.

@kouvel
Copy link
Member Author

kouvel commented Jan 13, 2016

Didn't we agree that finalizers must not run on shutdown ?@!

Finalizers for live objects must not be run on shutdown. I was thinking finalizers would still run for dead objects, but it seems like there is no point in running any finalizers. I have updated the proposal to reflect that. So in that case, yes a GC.Collect() would need to be followed by a GC.WaitForPendingFinalizers().

So what, these are already dead objects, why not let them cleanup?

Some objects may be already dead but that won't be discovered until a GC is done. The reason to not do a GC before shutdown is the same as the reason to not run finalizers before shutdown.

I understand that the notification gives the user an opportunity to properly stop running/referenced things,but if something is already dead why not let it die gracefully? This would actually be a valid user expectation.. What do I know when a finializer get's invoked.. I mean from a user perspective, I except that *once I release a reference to an object * any cleanup code I specified in its destructor would be executed sooner or later.. Now I need to think what happens when the program exits and there are still dead objects that didn't have their finalizer run ... big mess.

Guaranteeing that dead objects will be finalized doesn't seem very useful. Consider the premise where there are a set of finalizable objects. They have finalizers because they don't have a clear end of life where they can be explicitly disposed. When the program enters the shutdown sequence, some of those objects may be dead, and some alive. What use is it to guarantee that the dead ones will be finalized, when the live ones are not? Finalizable objects referenced by static fields would also not be finalized. Ultimately, under this proposal, the finalizer is not guaranteed to run. To do proper cleanup, a class may need to handle the Unload event in addition to or instead of a finalizer.

@kouvel
Copy link
Member Author

kouvel commented Jan 13, 2016

Edited inline above

@clrjunkie
Copy link

When the program enters the shutdown sequence, some of those objects may be dead, and some alive. What use is it to guarantee that the dead ones will be finalized, when the live ones are not?

I agree with your reasoning, nevertheless I don't think it's obvious; therefore, I highly recommend to document the shutdown contract in a way that specifies the chain of events .

Something like:

In .NET Core, the CLR will not run Garbage Collection after raising an Unload event and any future calls to GC.Collect() will be discarded; therefore, once an Unload event occurs any finalization code of either referenced or unreferenced objects will not be executed by the finalizer thread.

Q1: Can an Unload event happen during an active GC? I assume it will get invoked after GC Completes, Correct?

Q2: Under what "shutdown circumstances" will the Unload event be raised?

  1. Main thread exit
  2. Environment.Exit
  3. After Process.Kill, before WaitForExit() or After?
  4. After which Linux/Unix Signal levels?
  5. Unhandled Exception in Managed Thread?
  6. Unhandled Exception in Native I/O Completion Thread?
  7. As far as the future AssemblyLoadContext (e.g AppDomain) unloading feature.. How to distinguish between an Unload event for the entire process and one targeted for a Context instance? any dependency between the two? In what order are Unload events raised for multiple AssemblyLoadContext ? Registration order?

@clrjunkie
Copy link

Extending the previous question:

Q1: Can an Unload event happen during an active GC? I assume it will get invoked after GC Completes, will it get invoked after all finalizers complete for that GC Cycle or will these finalizers be simply discarded

Q3: What happens if an exception occurs in an Unload event handler and there other subscribers will they get invoked?

@clrjunkie
Copy link

Q4: Which thread will actually invoke the Unload event? Is it a dedicated thread spawned by the CLR? Will it have a name? Will the same thread be used to run event handlers for all AssemblyLoadContexts?

@davidfowl
Copy link
Member

AssemblyLoadContext is the wrong place for this method. Environment makes more sense. Are these calls actually tied to the AppDomain or the LoadContext?

kouvel referenced this issue in kouvel/corefx Jan 26, 2016
This change is dependent on dotnet/coreclr#2867, which actually implements the Unloading event.

API review: https://github.com/dotnet/corefx/issues/5205
@kouvel
Copy link
Member Author

kouvel commented Jan 27, 2016

@clrjunkie

Sorry for the delay in responding.

However, I still see a need to state the message more clearly mainly because there is a fine line between “undefined behavior after a certain point” and “anything goes”.

The runtime is not limited to any particular order of events here, and I don't see the necessity to specify and enforce a particular order of events around the Unload event. Simply, it is raised during normal program execution shortly before the AssemblyLoadContext is unloaded, whenever that is. In the initial implementation, since unloading of AssemblyLoadContext is not implemented yet, this would be at some point before shutdown, but this is likely to change and should not be depended upon. The only additional guarantee being given is that it's called on the finalizer thread, so as to be sure that it won't run concurrently with finalizers.

In general, I think people would reasonably except their finalizers to execute during the “lifetime of the program” and I don’t think this proposal should imply otherwise.

It is not reasonable to expect that finalizers for reachable objects run during shutdown. Hence, it is not reasonable to expect that finalizers run at all. The best way to ensure that resources are cleaned up is to use the dispose pattern to take control of the lifetime of those resources. If the resource lifetime is left up to the GC with a finalizer, then the program does not know when the object will be unreachable, implying that the program cannot reasonably expect the finalizers to run for any of those objects since they may be reachable at the time of shutdown.

What’s new in this proposal is that the “lifetime of the program” (e.g AssemblyLoadContext) is effectively shortened with respect to finalizers to the point when the Unload event occurs.

Finalizers may run after the Unload event. They wouldn't in the initial implementation, but only because the Unload events would all be raised just before shutdown, since unloading AssemblyLoadContexts is not implemneted yet. But that would not be the case once unloading AssemblyLoadContexts is implemented.

Than I would remove:
Because if there is no change in behavior than there is no need to restate the current behavior in the context of what's being changed.

Whether finalizers would run for unreachable objects, and whether a GC would be done to maximize the number of those objects, is a natural question pertaining to the proposal. It also describes why finalizers for unreachable objects are not run on shutdown, which I think is relevant.

an AssemblyLoadContext may not live until the end of the process, so its Unload event would be raised before shutdown
Don’t understand.

Once unloading AssemblyLoadContexts is implemented, an AssemblyLoadContext and assemblies loaded with it may be unloaded at an arbitrary point as requested by the user or as determined by the GC. Finalizers for finalizable objects in those assemblies soon to be unloaded may continue to run after the Unload event is raised. The process may not necessarily shut down soon after this.

(abrupt shutdown due to unhandled exception, process kill, etc., will not raise the event).
Suggest to put as a separate bullet point.

Agreed, done

But isn’t there a Primary one - the one first created when the application launches? I would except at least the primary one to raise last.

Why should the Unload event on the default AssemblyLoadContext be raised last? There may be multiple AssemblyLoadContext instances that live until the end of the process. Code handling the event is expected to be cleaning up resources (or doing something) relevant to its assembly since the event indicates end-of-life of the set of assemblies associated with the same load context, and any particular assembly's lifetime is associated with only one AssemblyLoadContext, so I don't see a need to specify/enforce any particular ordering. I think what you're looking for is a ProcessExit event (from the original proposal), it would make sense to raise this event after all of the Unload events. But do you have a scenario in mind where something needs to run after all other Unload event handlers?

I also suggest to replace the term "live" object with "referenced" object as it is better understood in .net speak.

I have replaced with "reachable" and "unreachable" to be more explicit, as a referenced object may still be collected if it's not reachable.

dropping a small 'thanks' for the effort put in to make this proposal clear and ensure important items are not missing as well as giving a heads-up to the DR would have been appropriate.

I apologize if I hadn't mentioned before, but I do appreciate the feedback you have given and the time you have taken for this. Thank you very much!

But that's already true today. During shutdown, the CLR runs all the finalizers, in an undefined order. If any of them crashes, the runtime crashes. If any of them hang, the CLR waits some time (AFAIK 40 seconds) and then terminates.

clrjunkie commented 14 days ago:

I think the whole "thread blocking" aspect doesn't make the scenario any more or any less problematic.
The problem is that currently Finilizer code is executed not when it's supposed too, that's the problem. whether threads are blocking or not is irrelevant.
The problem statement is:
"Finalizers of referenced objects execute during shutdown" period.
There nothing more to explain.
If you agree we can move to discuss your solution.

kouvel commented 14 days ago:

Yes that's the core of the problem, blocking threads is only a natural side-effect of it.

Don't think I got your point, can you elaborate?

kouvel referenced this issue in kouvel/corefx Jan 27, 2016
This change is dependent on dotnet/coreclr#2867, which actually implements the Unloading event.

API review: https://github.com/dotnet/corefx/issues/5205
kouvel referenced this issue in kouvel/coreclr Jan 27, 2016
API review: dotnet/corefx#5205

This change implements the proposal in the API review above:
- Don't block threads for shutdown
- Don't run finalizers on shutdown (for both reachable and unreachable objects)
- Provide a public AssemblyLoadContext.Unloading event that can be handled to clean up global resources in an assembly
  - As unloading an AssemblyLoadContext is not yet implemented, the event will for the time being be raised shortly prior to shutdown
@clrjunkie
Copy link

However, I still see a need to state the message more clearly mainly because there is a fine line between “undefined behavior after a certain point” and “anything goes.

I wrote this in response to your statement:

I believe it would be simplest to state that finalizers are not guaranteed to run.

I should have initially quoted that to make it clear. Nevertheless, I think making such statement is a mistake. Finalizers are user facing API’s thus they must have a contract. Your statement would be analogous to a sale of house where the seller adds a clause to the contract that he or she reserves the right to cancel the deal at any time – which makes the whole contract meaningless. More technically, this would be same as saying that a “Thread” is not guaranteed to run – what does that mean?? Sure, like any other contract there are exceptions (e.g shutdown, unhandled expectations etc..) and there is no need to note everything like fatal conditions or pulling out the plug, but I don’t understand how you can expose a public API and say in a sweeping way that it’s not guaranteed to work.

clrjunkie wrote:

In general, I think people would reasonably except their finalizers to execute during the “lifetime of the program” and I don’t think this proposal should imply otherwise.

Kouvel wrote:

It is not reasonable to expect that finalizers for reachable objects run during shutdown.

I agree.

Hence, it is not reasonable to expect that finalizers run at all.

I don’t understand this sweeping conclusion.

The best way to ensure that resources are cleaned up is to use the dispose pattern to take control of the lifetime of those resources. If the resource lifetime is left up to the GC with a finalizer, then the program does not know when the object will be unreachable, implying that the program cannot reasonably expect the finalizers to run for any of those objects since they may be reachable at the time of shutdown.

But that’s exactly an example of one disclaimer in the API contract. Shutdown is a user-initiated process (even if done by the o/s) – thus users must be made aware of what consequence this has on finalizers and what he or she can do about it, like calling GC.Collect and GC.WaitForPendingFinalizers to invoke clean up of objects that perhaps became unreachable long before the shutdown signal took place but still didn't have their finalizer run. What your saying is because of this edge case users shouldn’t except their finalizers to run in general which in my opinion is wrong. The general assumption is that the program is in running state thus, when finalizers run is irrelevant to the general expectation that at some point they will, unless edge cases occur (e.g shutdown, unhandled exception etc...).

Finalizers may run after the Unload event. They wouldn't in the initial implementation, but only because the Unload events would all be raised just before shutdown, since unloading AssemblyLoadContexts is not implemneted yet. But that would not be the case once unloading AssemblyLoadContexts is implemented.

I don’t understand. The whole point of the Unload event is to signal that shutdown has started; thus any finalizer of current unreachable object is not guaranteed to execute after it returns – that’s clear shutdown semantics as they apply to finalizers. Saying that, users may opt to call GC.Collect() and GC.WaitForPendingFinalizers to ensure all unreachable objects finalizers execute before the Unload method returns. When you now say that “Finalizers may run after the Unload event” it appears that you now introduce something new, then define what’s “Shutdown” and be specific on how users should reason about it with respect finalizer.

Once unloading AssemblyLoadContexts is implemented, an AssemblyLoadContext and assemblies loaded with it may be unloaded at an arbitrary point as requested by the user or as determined by the GC. Finalizers for finalizable objects in those assemblies soon to be unloaded may continue to run after the Unload event is raised. The process may not necessarily shut down soon after this.

Why would finalizers continue to run after the Unload Event? Since Unload is raised on the finalizer thread, why would that thread go and continue executing finalizer code of finalizable objects after the event handler returns? Seems counter to the whole idea of not running finalizers at shutdown.
I mean this would only be valid if the user called GC.Collect() when inside the event handler at which point I would expect the user to call GC.WaitForPendingFinalizers to guarantee that all finalizers complete by the time the event handler returns. If the user called GC.Collect() without calling GC.WaitForPendingFinalizers than it’s reasonable to state that finializer execution is undefined, and that’s now a user problem because Unload has been raised and there is no guarantee to running finalizer code to completion once the method returns.

But isn’t there a Primary one - the one first created when the application launches? I would except at least the primary one to raise last.
Why should the Unload event on the default AssemblyLoadContext be raised last?

I was thinking more about AppDomains since I don’t know the spec’ for AssemblyLoadContext or what problem exactly it’s trying solve. However, in the full .NET AppDomains can establish a Proxy / Stub relationship in which case I can imagine a scenario where the “default” (e.g Server) AppDomain would need to clean last after all “Client” AppDomains terminate even for the sake of avoiding handling the case where an in-proc “Server” disappears to the “Client”.
Then again I don’t know what is an AssemblyLoadContext or more importantly what’s its purpose.

I have replaced with "reachable" and "unreachable" to be more explicit, as a referenced object may still be collected if it's not reachable.

Agree.

Thank you very much!

No worries, your welcome!

But that's already true today. During shutdown, the CLR runs all the finalizers, in an undefined order. If any of them crashes, the runtime crashes. If any of them hang, the CLR waits some time (AFAIK 40 seconds) and then terminates.
clrjunkie commented 14 days ago:
I think the whole "thread blocking" aspect doesn't make the scenario any more or any less problematic.
The problem is that currently Finilizer code is executed not when it's supposed too, that's the problem. whether threads are blocking or not is irrelevant.
The problem statement is:
"Finalizers of referenced objects execute during shutdown" period.
There nothing more to explain.
If you agree we can move to discuss your solution.
kouvel commented 14 days ago:
Yes that's the core of the problem, blocking threads is only a natural side-effect of it.
Don't think I got your point, can you elaborate?

The point is that calling finalizer code during shutdown is problematic, especially for reachable objects as you just reaffirmed:

It is not reasonable to expect that finalizers for reachable objects run during shutdown.

Therefore, assuming that inventing a purpose dedicated “Shutdown Finalizer” that's would be called in no particular order would just follow an existing good behavior that’s already in the Full .NET FX is incorrect.

kouvel referenced this issue in kouvel/coreclr Jan 27, 2016
API review: dotnet/corefx#5205

This change implements the proposal in the API review above:
- Don't block threads for shutdown
- Don't run finalizers on shutdown (for both reachable and unreachable objects)
- Provide a public AssemblyLoadContext.Unloading event that can be handled to clean up global resources in an assembly
  - As unloading an AssemblyLoadContext is not yet implemented, the event will for the time being be raised shortly prior to shutdown
@kouvel
Copy link
Member Author

kouvel commented Jan 27, 2016

I don’t understand how you can expose a public API and say in a sweeping way that it’s not guaranteed to work.

This is a change in contract, it's not discarding the contract outright. I don't think this change is nearly as extreme as you make it out to be, especially considering that the previous finalizer contract during shutdown was flawed.

Hence, it is not reasonable to expect that finalizers run at all.
I don’t understand this sweeping conclusion.

Sorry, I meant to say it is not reasonable to expect that finalizers always run (due to the point mentioned before this).

But that’s exactly an example of one disclaimer in the API contract. Shutdown is a user-initiated process (even if done by the o/s) – thus users must be made aware of what consequence this has on finalizers and what he or she can do about it, like calling GC.Collect and GC.WaitForPendingFinalizers to invoke clean up of objects that perhaps became unreachable long before the shutdown signal took place but still didn't have their finalizer run. What your saying is because of this edge case users shouldn’t except their finalizers to run in general which in my opinion is wrong. The general assumption is that the program is in running state thus, when finalizers run is irrelevant to the general expectation that at some point they will, unless edge cases occur (e.g shutdown, unhandled exception etc...).

Consider a finalizable object. Either a program knows the lifetime of the object, or it doesn't.

  • Program knows the lifetime
    • Use dispose pattern
  • Program doesn't know the lifetime
    • Program cannot expect the object to be unreachable at shutdown time
    • Program should implement a finalizer to clean up resources early, for the case when it is determined by the GC that it is unreachable
    • Program may need to handle the case where the object is not finalized at shutdown time (object may be reachable)
      • Handle the Unload event to do cleanup
      • As the object may still be reachable, the program may expect the object to be used during and after the Unload event handler, and may need to account for that.

While it is a lot more tedious for a program to cleanly handle the case where the program doesn't know the object's lifetime, it is possible under this changed contract. Is there a gap here, or do you have a scenario that does not work under this changed contract?

Calling GC.WaitForPendingFinalizers would be a no-op in the Unload event handler, as it is called on the finalizer thread. When I mentioned GC.Collect and GC.WaitForPendingFinalizers, I mentioned it as a workaround for the case where the program initiates shutdown. In this case, ideally the program would shut down its background work, dispose its finalizable objects (since the end of lifetime is clear here), and shut down. Or as a workaround, the program could remove finalizable object references and call Collect/WaitForPendingFinalizers. The former is preferred, and sometimes may be necessary. For instance, if SIGTERM initiates shutdown, the expectation is that the program handles the Unload event to dispose its finalizable objects, but the Collect/WaitForPendingFinalizers workaround would not be available here, and I'm asserting that it should not be necessary in the first place to use this workaround. I could be wrong though, I'd like to get some more feedback on this before making further changes.

We're starting with something simple here, with a recommendation on how to cleanly handle shutdown. We're looking for feedback to see if there are gaps people are running into, and we can look into filling those gaps using any appropriate means, such as implementing a particular order, enforcing additional specification, adding a new API like ProcessExit, etc.

I was thinking more about AppDomains since I don’t know the spec’ for AssemblyLoadContext or what problem exactly it’s trying solve. However, in the full .NET AppDomains can establish a Proxy / Stub relationship in which case I can imagine a scenario where the “default” (e.g Server) AppDomain would need to clean last after all “Client” AppDomains terminate even for the sake of avoiding handling the case where an in-proc “Server” disappears to the “Client”.
Then again I don’t know what is an AssemblyLoadContext or more importantly what’s its purpose.

There is no inherent dependency between AssemblyLoadContexts that I'm aware of, outside of managing assembly lifetimes. It's up to the program to handle any dependencies that it creates by enforcing a particular order of cleanup.

Therefore, assuming that inventing a purpose dedicated “Shutdown Finalizer” that's would be called in no particular order would just follow an existing good behavior that’s already in the Full .NET FX is incorrect.

I agree that this has much the same problems as regular finaliers. But it may be a problem that needs to be solved somehow. I'll leave this for a separate issue.

@clrjunkie
Copy link

This is a change in contract, it's not discarding the contract outright. I don't think this change is nearly as extreme as you make it out to be, especially considering that the previous finalizer contract during shutdown was flawed.

It's not a question of whether the change is extreme or not, it's a question of clarity. I've been working with .NET since 2002 and believe me people who are doing anything "system-level" will debate this forever if it's not made clear.

Consider a finalizable object. Either a program knows the lifetime of the object, or it doesn't.

Can you please phrase this again from a user point of view.

Program doesn't know the lifetime

What exactly do you mean? That the developer released all references to the objects?

Program cannot expect the object to be unreachable at shutdown time

What exactly cannot be expected, that the object is collected, finializer run ?

Is there a gap here, or do you have a scenario that does not work under this changed contract?

Here is one:

Calling GC.WaitForPendingFinalizers would be a no-op in the Unload event handler, as it is called on the finalizer thread.

How would one know that? What the finelizer thread can or cannot do is internal CLR detail. Now it runs Unload, tomorrow maybe something else. it's now a fact that this thread doesn't run only finalization code.

Thinking about this a bit more, the Unload event is really needed when there is an external signal (like o/s shutdown) since programs don't just "shutdown" and the developer obviously has total control of the shutdown when it's initiated by the app itself. So why run Unload on the Finalizer thread and not on a different thread dedicated for that purpose. That way the users can ensure (if they choose) that all finalizers of unreachable objects run to completion by doing Collect/WaitForPendingFinalizers and explicitly dispose reachable objects, this seems like the most straight forward way to go about this whole process. It's exactly as if the app itself initiated the shutdown just from a different angle.

@kouvel
Copy link
Member Author

kouvel commented Jan 27, 2016

It's not a question of whether the change is extreme or not, it's a question of clarity. I've been working with .NET since 2002 and believe me people who are doing anything "system-level" will debate this forever if it's not made clear.

I'm all for clarity, where it is necessary to develop for a scenario to make it work as expected. We can argue about clarity but I'm not yet convinced that this needs further specification.

Can you please phrase this again from a user point of view.

Considering an object, a developer can choose to control the lifetime of the object, or the developer can choose to leave the lifetime up to the GC.

What exactly do you mean? That the developer released all references to the objects?

I mean in the case where the developer leaves the object's lifetime up to the GC

What exactly cannot be expected, that the object is collected, finializer run ?

I didn't intend on implying anything by the statement. As stated, it follows directly from the premise above the statement that it cannot be expected that the object is unreachable by the time of shutdown. If an object can be expected to be unreachable, then the program knows the lifetime of the object and that falls into the first case, use the dispose pattern.

How would one know that? What the finelizer thread can or cannot do is internal CLR detail. Now it runs Unload, tomorrow maybe something else. it's now a fact that this thread doesn't run only finalization code.

Thinking about this a bit more, the Unload event is really needed when there is an external signal (like o/s shutdown) since programs don't just "shutdown" and the developer obviously has total control of the shutdown when it's initiated by the app itself. So why run Unload on the Finalizer thread and not on a different thread dedicated for that purpose. That way the users can ensure (if they choose) that all finalizers of unreachable objects run to completion by doing Collect/WaitForPendingFinalizers and explicitly dispose reachable objects, this seems like the most straight forward way to go about this whole process. It's exactly as if the app itself initiated the shutdown just from a different angle.

Yea that is a possibility, and I agree it would be more consistent with normal shutdown. Could consider this change later based on further feedback. As such, I have removed the specification in the proposal that the event is raised on the finalizer thread (let's not assume that for now).

But doing this just to support Collect/Wait I don't think is a strong reason, for such objects the recommendation is to use the dispose pattern.

@clrjunkie
Copy link

But doing this just to support Collect/Wait I don't think is a strong reason, for such objects the recommendation is to use the dispose pattern.

But you can only use the dispose pattern on reachable objects. What if you use a library that allocates and de-allocates files or connections under the cover and uses finalizers to clean those up (as in your proposal example). In such case if you want to ensure that all those are properly teardown once Unload occurs than your only option would be to do Collect/Wait since that library left the lifetime of those objects up to the GC. You don't know what people write into the Destructor. I would argue that if you use a third-party library that does any kind of persistence or networking it would actually be good practice to do Collect/Wait before explicit shutdown or in a dedicated Unload thread. BTW, the motivating scenario for CriticalFinalizers was specifically FileStream to allow flushing buffers in "Regular Finalizer" and close handle in "Critical Finalizer" (see first link in my reply to @terrajobst)

@clrjunkie
Copy link

Furthermore, it's important to state that doing Collect/Wait is currently a no-op in the Unload event, since people would reasonably expect this work like in any other method call.

@kouvel
Copy link
Member Author

kouvel commented Jan 27, 2016

What if you use a library that allocates and de-allocates files or connections under the cover and uses finalizers to clean those up (as in your proposal example).

Based on this, I'm assuming the library is leaving the lifetime of these objects to the GC.

In such case if you want to ensure that all those are properly teardown once Unload occurs than your only option would be to do Collect/Wait since that library left the lifetime of those objects up to the GC.

With the GC managing the lifetime of these objects, relying on Collect/Wait in this case is not sufficient - it would not call finalizers for the reachable objects.

On the other hand, if you're saying some work will be done prior to calling Collect/Wait to ensure that these objects will not be reachable, then that work should also include disposing these objects.

I would argue that if you use a third-party library that does any kind of persistence or networking it would actually be good practice to do Collect/Wait before explicit shutdown or in a dedicated Unload thread.

Any number of consumers may handle the Unload event. For performance reasons, it would not be a good idea for many of them to call Collect/Wait. It should not be necessary, and I would recommend against this.

BTW, the motivating scenario for CriticalFinalizers was specifically FileStream to allow flushing buffers in "Regular Finalizer" and close handle in "Critical Finalizer" (see first link in my reply to @terrajobst)

Yes I intend on filing a separate issue to discuss SafeHandles, critical finalizers, etc. Some further fixes will need to be made for smooth cleanup of these.

Furthermore, it's important to state that doing Collect/Wait is currently a no-op in the Unload event, since people would reasonably expect this work like in any other method call.

If this turns out to be important in practice, I'd rather treat it as a bug.

@kouvel
Copy link
Member Author

kouvel commented Jan 27, 2016

With the GC managing the lifetime of these objects, relying on Collect/Wait in this case is not sufficient - it would not call finalizers for the reachable objects.

Adding to this, the library also does not know which of those objects are reachable.

jkotas referenced this issue in dotnet/coreclr Jan 28, 2016
API review: https://github.com/dotnet/corefx/issues/5205

This change implements the proposal in the API review above:
- Don't block threads for shutdown
- Don't run finalizers on shutdown (for both reachable and unreachable objects)
- Provide a public AssemblyLoadContext.Unloading event that can be handled to clean up global resources in an assembly
  - As unloading an AssemblyLoadContext is not yet implemented, the event will for the time being be raised shortly prior to shutdown

[tfs-changeset: 1569451]
@clrjunkie
Copy link

With the GC managing the lifetime of these objects, relying on Collect/Wait in this case is not sufficient - it would not call finalizers for the reachable objects.
On the other hand, if you're saying some work will be done prior to calling Collect/Wait to ensure that these objects will not be reachable, then that work should also include disposing these objects

clrjunkie wrote:

That way the users can ensure (if they choose) that all finalizers of unreachable objects run to completion by doing Collect/WaitForPendingFinalizers and explicitly dispose reachable objects

Any number of consumers may handle the Unload event. For performance reasons, it would not be a good idea for many of them to call Collect/Wait.

True, but that's a user concern... obviously, since events are ordered, one would call Collect/Wait in the last subscriber, which BTW is a motivation for having the default AssemblyLoadContext raise last.

Yes I intend on filing a separate issue to discuss SafeHandles, critical finalizers, etc. Some further fixes will need to be made for smooth cleanup of these.

I think the main takeaway is the user-scenario: people release FileStreams to the GC so there is no explicit way to call Dispose afterwards

If this turns out to be important in practice, I'd rather treat it as a bug.

I suggest you reconsider...

@clrjunkie
Copy link

Adding to this, the library also does not know which of those objects are reachable.

The library doesn't care about this, it delegates all cleanup concerns to the GC and I don't think the proposal should make any judgements on such library implementation, it's really out of the proposal scope. The proposal should not limit dealing with what may be considered as "non-optimal" library implementation or usage. If there is a need to do Collect/Wait during shutdown then the runtime needs to allow it.

@kouvel
Copy link
Member Author

kouvel commented Feb 2, 2016

I think the main takeaway is the user-scenario: people release FileStreams to the GC so there is no explicit way to call Dispose afterwards

Perhaps they should not be doing that. They should be disposing the FileStreams.

The library doesn't care about this, it delegates all cleanup concerns to the GC and I don't think the proposal should make any judgements on such library implementation, it's really out of the proposal scope.

So you're stating, but what if the library does care about this? I'm not trying to judge one way or another, I'm just stating that for objects whose lifetime is left up to the GC, the library does not know which of those are reachable or unreachable at the time of shutdown.

The proposal should not limit dealing with what may be considered as "non-optimal" library implementation or usage.

[Corrected] I would generally agree, on top of saying that the proposal should also not go out of its way to support non-optimal scenarios. While it would be nice to support Collect/Wait, it really would only make sense in the optimal scenarios where the program already takes control of lifetimes of objects that need to do some cleanup work, only using Collect/Wait instead of dispose. It's not a sufficient solution in the situation you described, where the program leaves these objects' lifetimes up to the GC.

If there is a need to do Collect/Wait during shutdown then the runtime needs to allow it.

I agree, however I'm questioning the need to do Collect/Wait. I don't see it.

@clrjunkie
Copy link

Perhaps they should not be doing that. They should be disposing the FileStreams.

Perhaps, but maybe some don’t. Tell me why C++ “Smart Pointer” was invented? People forget to call ‘delete’ (e.g dispose). It’s a fact. Putting aside JIT optimizations why SafeHandle was invented? Why Critical Finalizers was invented? Why Finalizers was invented? Exactly for similar reasons.

but what if the library does care about this?

Then there is no problem. The library provides a dispose method to clean-up after itself. I don’t understand your point.

I'm not trying to judge one way or another

Yes you are:

Perhaps they should not be doing that. They should be disposing the FileStreams.

I'm just stating that for objects whose lifetime is left up to the GC, the library does not know which of those are reachable or unreachable at the time of shutdown.

Here are the cases that I see as far as letting the Finalizer/GC end the lifetime of the handle, what you’re saying look like (c)

(a) Library delegates ‘handle’ close to the GC through finalizers – Library DOES NOT hold static references to objects holding open handles during shutdown

(b) Library delegates all ‘handle’ close to the GC through finalizers – Library DOES hold static references to some objects holding open handles during shutdown BUT provides a dispose method to explicitly dispose these reachable objects

(c) Library delegates all ‘handle’ close to the GC through finalizers – Library DOES hold static references to some objects holding open handles but DOES NOT provide a dispose method to explicitly dispose these reachable objects

My argument is that case (a) and (b) must be considered as possible use-cases and Collect/Wait is viable solution to those cases, case (c) is not interesting since it’s obviously a bug because there is no way close to the resource period.

on top of saying that the proposal should also not go out of its way to support non-optimal scenarios. While it would be nice to support Collect/Wait, it really would only make sense in the optimal scenarios where the program already takes control of lifetimes of objects that need to do some cleanup work.

The whole point of SafeHandles/CrtiticalFinalizer/Finlizers is to deal with “non-optimal” scenarios. I suggest you read the links I posted which considered these as “The most important features in .NET”

Now, the shutdown implementations does not allow those to be explicitly executed during shutdown which can actually be considered a regression from .NET FX if disallowing a "manual" invocation (e.g Collect/Wait)

only using Collect/Wait instead of dispose. It's not a sufficient solution in the situation you described, where the program leaves these objects' lifetimes up to the GC.

It does, for case (a) and after calling dispose in case (b).

I'm questioning the need to do Collect/Wait. I don't see it.

The whole point is about cases that you don't see. If there was, maybe it could be changed. The point is about not disabling an API that would otherwise be expected to work as within any other method call. The question is about symmetry and dealing with non-optimal situations.

@clrjunkie
Copy link

Just to make clear: case (b) involves a mix of some objects that are released to the 'GC' and some others that are reachable through static reference. Doing Collect/Wait would obviously impact only pending "unreachable" objects for finalization and can only supplement proper shutdown. The user is still responsible to call Stop/Shutdown explicitly on the Library. However; doing Collect/Wait ensures that all released objects by that library are properly destructed before the process terminates.

@clrjunkie
Copy link

Furthermore, consider your own example:

The finalizer would write a termination value to the stream and close the stream. Suppose that the termination value indicates to the receiving end that all data has been written, while abruptly closing the stream without writing the termination value would indicate incomplete transmission due to disconnection or some other reason.

How can a user ensure that this finalizer code is invoked before the process terminates without Collect/Wait? I mean obviously if you add a Dispose method and expect the user to call it then there would be no need for a Finalizer, unless you acknowledge that people do release objects to the GC for finalization or simply forget which is the whole point of the finalizer to begin with.

@kouvel
Copy link
Member Author

kouvel commented Feb 2, 2016

Got your feedback, thanks. As I agreed before, it would be good to raise the event on a different thread (not on the finalizer thread). As for Collect/Wait, I think our argument is going in circles, but I do see your point. At the very least, it would be somewhat of an easy workaround for programs that use finalizers in this way. I'll discuss with a few more people to determine what should be done if anything.

Thanks again for your time and patience, much appreciated!

@clrjunkie
Copy link

Your welcome.

MichalStrehovsky referenced this issue in MichalStrehovsky/corert Feb 19, 2016
Shutdown finalization was abandoned for .NET core.

See discussion in pull request dotnet#889 and dotnet/corefx#5205.
MichalStrehovsky referenced this issue in MichalStrehovsky/corert Feb 19, 2016
Shutdown finalization was abandoned for .NET core.

See discussion in pull request dotnet#889 and dotnet/corefx#5205.
@kouvel kouvel closed this as completed Feb 23, 2016
dotnet-bot referenced this issue in dotnet-bot/corert Apr 19, 2016
Remove shutdown finalization. It makes CoreRT/PN consistent with .NETCore / CoreCLR. The long story is in https://github.com/dotnet/corefx/issues/5205. The shutdown finalization was not used by UWP apps because they are never shutdown gracefully.

[tfs-changeset: 1597160]
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 1.0.0-rtm milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Projects
None yet
Development

No branches or pull requests

6 participants