Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

g3log - Process hangs "because" of the LogWorker destructor #520

Closed
RoyalcraftDev opened this issue May 12, 2024 · 11 comments
Closed

g3log - Process hangs "because" of the LogWorker destructor #520

RoyalcraftDev opened this issue May 12, 2024 · 11 comments
Labels

Comments

@RoyalcraftDev
Copy link

Context

I was trying to fix an issue where my process would stay on even after executing. In short, my app uses methods from a native library I created and obviously uses g3log. Though, I didn't have in mind at all it would be the cause. For long hours, I was trying different things to get rid of the issue, but nothing really worked. I ended up using Process Hacker to find more information about that zombie process. In the process details, it showed one thread left. I naturally checked the stack and saw things related to a wait state. Then after reading further, I saw lines related to g3log, more precisely these lines were related to the destructor of the LogWorker.

Thread State: Wait:WrAlertByThreadId

0, wow64cpu.dll!TurboDispatchJumpAddressEnd+0x544
1, wow64cpu.dll!TurboDispatchJumpAddressEnd+0x421
2, wow64cpu.dll!BTCpuSimulate+0x9
3, wow64.dll!Wow64KiUserCallbackDispatcher+0x4b9
4, wow64.dll!Wow64LdrpInitialize+0x12d
5, ntdll.dll!LdrInitShimEngineDynamic+0x31f1
6, ntdll.dll!LdrInitializeThunk+0x1db
7, ntdll.dll!LdrInitializeThunk+0x63
8, ntdll.dll!LdrInitializeThunk+0xe
9, ntdll.dll!ZwWaitForAlertByThreadId+0xc (No unwind info)
10, KernelBase.dll!SleepConditionVariableSRW+0x23 (No unwind info)
11, msvcp140d.dll!std::_Winerror_message+0x213 (No unwind info)
12, msvcp140d.dll!std::_Winerror_message+0x1d5 (No unwind info)
13, msvcp140d.dll!Cnd_wait+0x32 (No unwind info)
14, libNative.dll!std::condition_variable::wait+0x2c
15, libNative.dll!std::_Associated_state<int>::_Wait+0x76
16, libNative.dll!std::_State_manager<int>::wait+0x5f (No unwind info)
17, libNative.dll!g3::LogWorker::~LogWorker+0x81 (No unwind info)
18, libNative.dll!g3::LogWorker::`scalar deleting destructor'+0x16
19, libNative.dll!std::default_delete<g3::LogWorker>::operator()+0x34
20, libNative.dll!std::unique_ptr<g3::LogWorker,std::default_delete<g3::LogWorker> >::~unique_ptr<g3::LogWorker,std::default_delete<g3::LogWorker> >+0x2b
21, libNative.dll!`ECHO::Log::init'::`2'::`dynamic atexit destructor for 'logWorker''+0xd

Afterwards, I checked in the source code of g3log and looked at the destructor of it and found these lines:

auto bg_clear_sink_call = [this] { _impl._sinks.clear(); };
auto token_cleared = g3::spawn_task(bg_clear_sink_call, _impl._bg.get());
token_cleared.wait();

(My) Solution

I didn't really think too much about it, I just commented the lines, and after recompiling the library, the process was correctly exiting!

But...

I doubt that commenting lines of code like that is very nice, it feels more like a hack than anything else. I'd like to know if there is maybe something that I did wrong that resulted in this behavior or just a bug inside the library.

Additional Context

Platform: Windows 10
Language Version: C++14
Compiler: MSVC

Note: The codebase of the native library is quite old, around 2011 and so does the version of g3log used. After checking I saw files from 2011-2015 of g3log

@KjellKod
Copy link
Owner

KjellKod commented May 12, 2024

  1. Does it happen for normal exits also or just during crashes?

At g3log exit, it flushes all sinks (default, homemade or through g3sinks) until all messages are taken care of.

  1. Have you observed if messages are still pouring into your log file during the "hang"? If so, that would mean it's trying to empty the queues before exiting

  2. G3log is also slowing down a fatal exit by the same behavior. Technically most actions after a fatal exit signal is in the domain of undefined behavior - it might be defined for a particular system/compiler etc but that's not knowledge handled within g3log. It COULD be this that causes it - it can also be a plethora of other things like improper thread initialization in DLLs, improper forking - thread handling, weird system library linking etc.

  3. Regardless, yes commenting out the wait will lead to faster exit of the system as it's not waiting for the sink to properly shut down. (Could your sink be stuck? Or even be the place where it's crashing?)
    --

Assuming the system has used g3log for a substantial amount of time - it's "unusual" that something like this would surface now unless other things in the system setup changed.

Just my thoughts about it.

@RoyalcraftDev
Copy link
Author

RoyalcraftDev commented May 13, 2024

This issue happens during a normal exit. When a crash occurs in the DLL it will crash the whole application as well.

I call a native destroy method which shuts down everything inside the native library including the logger. It will not log anything else after I destroyed it.

I want to also note that I have a version which is an executable by itself instead of a native library for tests. So it handles basically all the calls I was doing in my .NET app in C++, here I don't really have issues, when the code reaches the end, it properly exits.

@RoyalcraftDev
Copy link
Author

RoyalcraftDev commented May 13, 2024

Essentially, a better solution would be to completely free the library out of my process' memory. This was my initial attempt to fix the issue but even if I use the FreeLibrary from kernel32.dll, it doesn't really do much except denying me from using the library unless I call back the LoadLibrary function.

@RoyalcraftDev
Copy link
Author

RoyalcraftDev commented May 13, 2024

Essentially, a better solution would be to completely free the library out of my process' memory. This was my initial attempt to fix the issue but even if I use the FreeLibrary from kernel32.dll, it doesn't really do much except denying me from using the library unless I call back the LoadLibrary function.

After destroying the Logger in the native code and "releasing" the library, the handle of the log file remains until the process exits. Which could mean that it is in some way still active.
Log Handle remains

@KjellKod
Copy link
Owner

KjellKod commented May 13, 2024

... a version which is an executable by itself instead of a native 
library for tests. So it handles basically all the calls I was doing
 in .NET app in C++, here I don't really have issues, when the code
 reaches the end, it properly exits.

And this version is what was running successfully with g3log, I gather?

And now there's an attempt to change the setup and using dlls which is now experiencing issues?

**
A Quick Look at closed issues (there's more) show these common culprits for dlls and g3log setup
**

If you search for closed issues and DLL you can find some common issues with DLLs and Windows when working with g3log

Issue #486 incompatible versions debug release

One thing I’ve seen before is using DLLs in a way that breaks windows standard. Initialization should happpen in the main executable and not in the DLL

Another issue I’ve seen is when a static instantiations that are internal to g3log is attempted to be shared shared across DLL. DLL static initialization and destruction are a mess in Windows and most devs struggle to get this right

Other issues I’ve seen

  1. The DLL is owning the g3log instance through a static setup —> Fix: the main program needs to own the G3log instance

  2. The G3log (thread) initialization call is handled in the “pre” startup hook routine —> on all platforms (OSX, Linux,Windows) this is unsafe as these pre start routines must never have threaded construct / mutex etc in them

  3. The G3log shutdown call or destruction of the instance is handled in the “post” hook routine. Same issue here as in the 2) above

  4. DllMain(...) use and threads
    Windows DLL environment cannot be used at all for setup and breakdown of threaded constructs through DllMain(...)

  5. Although not a DLL issue per se, it’s similar in nature. Using forking (I've never done this in Windows but assume is possible) and threaded structures is a recipe for disaster

@RoyalcraftDev
Copy link
Author

RoyalcraftDev commented May 13, 2024

In fact, I also have an Android version of the native code, compiled as a shared object and made to be ran by an android application and once again, I don't have any issues here either. This case isn't much different I'm reproducing what the android platform does by creating a DLL interface and using C# for actually running the native methods.

So can't really instantiate the logger within the managed code. It might just as well be a platform issue by the fact that I use a DLL instead of an executable though I really feel like this shouldn't happen since a DLL isn't much different from an executable though I might be wrong on this.

@KjellKod
Copy link
Owner

If I understand you correctly you are instantiating g3log inside the DLL.

How and when and where are you instantiating the logger object?

@RoyalcraftDev
Copy link
Author

RoyalcraftDev commented May 14, 2024

Initializing the logger is almost the first thing it does.
image

@KjellKod
Copy link
Owner

KjellKod commented May 14, 2024 via email

@ghost
Copy link

ghost commented May 14, 2024

I think I understand what's going on.

g3log relies on proper RAII initialization for destruction. Here the logger instance is hidden as a static object within a function, so at process exit the destructor is never called, and therefore the logging sinks aren't flushed.

How static instances are destroyed is AFAIK defined per each specific system.

Some windows specific work around that specifically touched this are mentioned in

#88 (comment)
#91

@KjellKod
Copy link
Owner

Seems like the root cause for this was that initialization and destruction of the logger was not conforming to expected usage. https://kjellkod.github.io/g3log/g3log_usage.html#adding-and-removing-sinks Windows dll work arounds as handled in #88 (comment) and #91 were given.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants