Clang's LLD is broken, again! #6126

Closed
HolyBlackCat opened this issue Jan 12, 2020 · 19 comments · Fixed by #6353

Comments

@HolyBlackCat

After the last update of the llvm packages (9.0.0-3), LLD no longer works.
ld.lld.exe hangs indefinitely. (It miraculously worked once for me, but no luck since then.)

You can reproduce it simply by running

echo -e '#include <iostream>\nint main() {std::cout << "Hello!\\n";}' >a.cpp
clang++ -fuse-ld=lld a.cpp

This closely resembles bug #5231 that was already fixed.

@HolyBlackCat HolyBlackCat changed the title Clang's LDD is broken, again! Clang's LLD is broken, again! Jan 12, 2020
@mati865
Collaborator

mati865 commented Jan 12, 2020

Ah, I remember now: that's the same thing as in mstorsjo/llvm-mingw#11.
Backporting llvm/llvm-project@564481a should fix it; otherwise #6098 has to be reverted.

cc @adrpo

@adrpo
Contributor

adrpo commented Jan 12, 2020

Ouch! I guess I sort of expected some issues with allowing duplicate symbols when linking.
I will see if I can patch with llvm/llvm-project@564481a.
If I can't do that, I will try to statically link only lld.

@mati865
Collaborator

mati865 commented Jan 12, 2020

I guess I sort of expected some issues with allowing duplicate symbols when linking.

This time it's a bug introduced somewhere around LLVM 8/9.

adrpo added a commit to adrpo/MINGW-packages that referenced this issue Jan 13, 2020
- apply patch from:
  llvm/llvm-project@564481a
- lld doesn't seem to hangs up anymore
adrpo added a commit to adrpo/MINGW-packages that referenced this issue Jan 13, 2020
- apply patch from:
  llvm/llvm-project@564481a
- lld doesn't seem to hangs up anymore
- fix CRLF to LF and update sums again
Alexpux added a commit that referenced this issue Jan 13, 2020
@lazka
Member

lazka commented Jan 15, 2020

@HolyBlackCat Can you test again now?

@HolyBlackCat
Author

@lazka Works for me!

@lazka
Member

lazka commented Jan 15, 2020

Thanks!

@lazka lazka closed this as completed Jan 15, 2020
@mati865
Collaborator

mati865 commented Jan 31, 2020

Unfortunately it's not fully fixed.
While building LLVM as part of the Rust build, ld.lld sometimes hangs (for over an hour).
Attaching a debugger to the hung process unhangs it before I can get a stack trace.

@HolyBlackCat
Author

HolyBlackCat commented Feb 4, 2020

@mati865 It sometimes hangs for me too, but not very often.

@mncat77

mncat77 commented Feb 8, 2020

Can confirm that ld.lld still hangs unpredictably for me, roughly one invocation in three.

@HolyBlackCat
Author

@lazka This probably needs to be reopened.

@lazka lazka reopened this Feb 11, 2020
@HolyBlackCat
Author

@adrpo What's the plan? It still hangs sometimes even with the patch, so the only option is to switch to static linking for LLD?

@mati865
Collaborator

mati865 commented Feb 18, 2020

FYI there were more fixes in LLVM master for hangs on Windows but I don't have time to find them. They will be included in LLVM 10 and hopefully fix it for good.

@adrpo
Contributor

adrpo commented Feb 18, 2020

@HolyBlackCat, I will have a look at whether it is possible to statically link just LLD.
@mati865, thanks for the info, I'll see if I can find them.

@mati865
Collaborator

mati865 commented Feb 27, 2020

FTR, here is the stack trace:

00000000`017be978 00007ffc`f5458ba3 : 00000000`000b0000 00007ffc`f7a8a867 00000000`000d0101 00000000`03890000 : ntdll!NtWaitForSingleObject+0x14
00000000`017be980 00000000`64942f6f : 00000000`00000002 00000000`000d0560 00000000`00000000 00000000`000002bc : KERNELBASE!WaitForSingleObjectEx+0x93
00000000`017bea20 00000000`64944c0f : 00000000`000b0000 00007ffc`f7a86139 00000000`00000000 00000000`64944ad3 : libwinpthread_1!pthread_mutex_lock+0xaf
00000000`017bea70 00000000`64946473 : 00000000`000b0000 00000000`000b0000 00000000`00000000 00000000`000bfa20 : libwinpthread_1!_pth_gpointer_locked+0x1f
00000000`017beab0 00000000`039d8b37 : 00000000`000cb0a0 00000000`053dfef0 00000000`7051b000 00000000`05f61a30 : libwinpthread_1!pthread_join+0x13
00000000`017beb00 00000000`6c45ad05 : 00000000`000bfa20 00000000`017bec98 00000000`00000000 00007ffc`e1419de0 : libstdc___6!ZNSt6thread4joinEv+0x17
00000000`017beb30 00000000`6c45b1ea : 00000000`00000003 00007ffc`f7a72bff 00000000`00000001 00000000`7ffe0385 : libLLVM!ZN4llvm14OptionRegistry8instanceEv+0x5d5
00000000`017bec20 00000000`6e777ac5 : 00000000`70577328 00000000`6e769963 00000000`053dff00 00000000`7051b000 : libLLVM!ZN4llvm14OptionRegistry8instanceEv+0xaba
00000000`017bec50 00000000`6c3c117c : 00000000`0000000d 00007ffc`f7a64ef7 00000000`00000000 00000000`00000001 : libLLVM!ZN4llvm16windows_manifest21WindowsManifestMerger25WindowsManifestMergerImpl13getParseErrorEv+0xfc25
00000000`017bec90 00000000`6c3c125d : 00000000`00000001 00007ffc`f4b8849e 00000000`00000000 00000000`00000000 : libLLVM+0x117c
00000000`017becf0 00007ffc`f7a650a1 : 00007ffc`00000000 00000000`00000000 00000000`00000001 00000000`7ffe0385 : libLLVM+0x125d
00000000`017bed40 00007ffc`f7aaab02 : 00000000`000b45a0 00000000`6c3c0000 00000000`00000000 00000000`6e768c70 : ntdll!LdrpCallInitRoutine+0x65
00000000`017bedb0 00007ffc`f7aaa9ad : 00000000`00000000 00000000`053e7ee0 00000000`00000035 00000000`00000000 : ntdll!LdrShutdownProcess+0x132
00000000`017beeb0 00007ffc`f603cd8a : 00000000`00000000 00000000`053e7ee0 00000000`6c49e250 00000000`00000035 : ntdll!RtlExitUserProcess+0xad
00000000`017beee0 00007ffc`f5fba245 : 00000000`7051b000 00000000`00000000 00000000`053e7ee0 00000000`00000001 : kernel32!ExitProcessImplementation+0xa
00000000`017bef10 00007ffc`f5fba8b5 : 00000000`00000001 00000000`6c4a0046 00000000`00000001 00000000`053e7ee0 : msvcrt!_crtExitProcess+0x15
00000000`017bef40 00000000`005cfc9f : 00000000`00000001 00000000`053e7ee0 00000000`00000000 00000000`053e8510 : msvcrt!doexit+0x171
00000000`017befb0 00000000`004128af : 00000000`053ecd40 00000000`006eb155 00000000`00000505 00000000`017bf1a0 : ld_lld+0x1cfc9f
00000000`017befe0 00000000`005503c1 : 00000000`0079e058 00000000`005d5e98 00000000`017bf410 00000000`017bf550 : ld_lld+0x128af
00000000`017bf070 00000000`0071f443 : 00000000`03c0d000 00000000`0073c893 00000000`00000000 00000000`00000000 : ld_lld+0x1503c1
00000000`017bfc80 00000000`004013b4 : 00000000`00000039 00000000`053d1820 00000000`007a6bb0 00000000`00000000 : ld_lld+0x31f443
00000000`017bfe30 00000000`0040150b : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ld_lld+0x13b4
00000000`017bff00 00007ffc`f6037bd4 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ld_lld+0x150b
00000000`017bff30 00007ffc`f7aaced1 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0x14
00000000`017bff60 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21

Probably a mingw-w64 bug.

@jeremyd2019
Member

jeremyd2019 commented Oct 30, 2020

Here is a slightly better-symbolicated stack trace of the lld hang when dynamically linked to libLLVM.dll:

 # Child-SP          RetAddr               Call Site
00 0000009f`7effe9c8 00007ffb`299826ee     ntdll!NtWaitForSingleObject+0x14
01 0000009f`7effe9d0 00007ffb`1efd2f3f     KERNELBASE!WaitForSingleObjectEx+0x8e
02 0000009f`7effea70 00007ffb`1efd69b3     libwinpthread_1!pthread_mutex_lock+0xaf
03 0000009f`7effeac0 00007ffb`116259e7     libwinpthread_1!pthread_join+0x23
04 0000009f`7effeb20 00007ffa`e95118fd     libstdc___6!ZNSt6thread4joinEv+0x17
05 0000009f`7effeb50 00007ffa`e9511dba     libLLVM!atexit+0xa058d
06 0000009f`7effec40 00007ffa`eb9dbeb5     libLLVM!atexit+0xa0a4a
07 0000009f`7effec70 00007ffa`e947117c     libLLVM!_execute_onexit_table+0x55 [C:\_\M\mingw-w64-crt-git\src\mingw-w64\mingw-w64-crt\misc\onexit_table.c @ 67] 
08 0000009f`7effecb0 00007ffa`e947125d     libLLVM!_CRT_INIT+0x16c [C:\_\M\mingw-w64-crt-git\src\mingw-w64\mingw-w64-crt\crt\crtdll.c @ 141] 
09 0000009f`7effed10 00007ffb`2c087b3d     libLLVM!__DllMainCRTStartup+0x5d [C:\_\M\mingw-w64-crt-git\src\mingw-w64\mingw-w64-crt\crt\crtdll.c @ 205] 
0a 0000009f`7effed60 00007ffb`2c0b3de1     ntdll!LdrpCallInitRoutine+0x61
0b 0000009f`7effedd0 00007ffb`2c0b3c7d     ntdll!LdrShutdownProcess+0x141
0c 0000009f`7effeed0 00007ffb`2ae2e0ab     ntdll!RtlExitUserProcess+0xad
0d 0000009f`7effef00 00007ffb`2b95a155     KERNEL32!ExitProcessImplementation+0xb
0e 0000009f`7effef30 00007ffb`2b95a7c5     msvcrt!_crtExitProcess+0x15
0f 0000009f`7effef60 00007ff7`e643acf3     msvcrt!doexit+0x171
10 0000009f`7effefd0 00007ff7`e62444c0     ld_lld!atexit+0x2097f3
11 0000009f`7efff000 00007ff7`e63bdeb5     ld_lld!atexit+0x12fc0
12 0000009f`7efff090 00007ff7`e65980d2     ld_lld!atexit+0x18c9b5
13 0000009f`7efff4a0 00007ff7`e62313c1     ld_lld!__p__fmode+0x1510b2
14 0000009f`7efff650 00007ff7`e62314f6     ld_lld!__tmainCRTStartup+0x231 [C:\_\M\mingw-w64-crt-git\src\mingw-w64\mingw-w64-crt\crt\crtexe.c @ 335] 
15 0000009f`7efff720 00007ffb`2ae27034     ld_lld!mainCRTStartup+0x16 [C:\_\M\mingw-w64-crt-git\src\mingw-w64\mingw-w64-crt\crt\crtexe.c @ 214] 
16 0000009f`7efff750 00007ffb`2c09cec1     KERNEL32!BaseThreadInitThunk+0x14
17 0000009f`7efff780 00000000`00000000     ntdll!RtlUserThreadStart+0x21

Now, as for why this happens... This is pretty much the same as https://devblogs.microsoft.com/oldnewthing/20120427-00/?p=7763, except for the part where it says "The only options are to hang or crash. I think I’ll crash"... this implementation decided to hang instead 😛

No really... Process shutdown terminates all other threads. One of those threads must have been holding the mutex around winpthreads' thread id table. Now when the remaining thread tries to call pthread_join from its global destructors, winpthread tries to acquire that mutex, and since it was never released it waits forever.

The solution? "The customer needs to restructure the program so that it either cleans up its thread pool work before the ExitProcess, or it can simply skip all thread pool operations when the reason for the DLL_PROCESS_DETACH is process termination." I'm not entirely sure how to do the latter... one hint I found is that the lpvReserved parameter will be non-NULL on DLL_PROCESS_DETACH if the process is exiting.

@lhmouse any thoughts on whether this is a bug in mingw-w64, or in LLVM?

@jeremyd2019
Member

An interesting comment on that MSDN page on DllMain:

When handling DLL_PROCESS_DETACH, a DLL should free resources such as heap memory only if the DLL is being unloaded dynamically (the lpReserved parameter is NULL). If the process is terminating (the lpvReserved parameter is non-NULL), all threads in the process except the current thread either have exited already or have been explicitly terminated by a call to the ExitProcess function, which might leave some process resources such as heaps in an inconsistent state. In this case, it is not safe for the DLL to clean up the resources. Instead, the DLL should allow the operating system to reclaim the memory.

I wonder if this is a suggestion to only call the destructors (_execute_onexit_table) if lpvReserved is NULL?
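
To make the lpvReserved check concrete, here is a minimal C++ sketch of how a DLL author might apply it in their own DllMain. The helper name should_run_cleanup and the k-prefixed constants are hypothetical; the constants only mirror the documented DLL_PROCESS_DETACH (0) and DLL_PROCESS_ATTACH (1) values so that the helper compiles on any platform. This is not the mingw-w64 CRT code, and whether the CRT itself should skip _execute_onexit_table in this way is still the open question.

```cpp
#include <cstdint>

// Values of DllMain's fdwReason argument as documented for <windows.h>,
// redeclared here (hypothetical names) so the sketch builds everywhere.
constexpr std::uint32_t kDllProcessDetach = 0;  // DLL_PROCESS_DETACH
constexpr std::uint32_t kDllProcessAttach = 1;  // DLL_PROCESS_ATTACH

// Hypothetical helper: returns true only for DLL_PROCESS_DETACH with a
// NULL lpvReserved, i.e. the DLL is being unloaded dynamically.
// A non-NULL lpvReserved means the process is terminating, so other
// threads may already have been killed while holding locks.
constexpr bool should_run_cleanup(std::uint32_t reason, const void* reserved) {
    return reason == kDllProcessDetach && reserved == nullptr;
}

#ifdef _WIN32
#include <windows.h>

// Assumed shape of a user DLL entry point that applies the check.
BOOL WINAPI DllMain(HINSTANCE, DWORD reason, LPVOID reserved) {
    if (should_run_cleanup(reason, reserved)) {
        // Dynamic unload: joining threads or freeing heap memory is
        // reasonable here.
    }
    // On process termination, leave resources for the OS to reclaim.
    return TRUE;
}
#endif
```

The trade-off is that objects with static storage duration inside the DLL would not have their destructors run at process exit under this scheme.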

@lhmouse
Contributor

lhmouse commented Oct 30, 2020

On Linux, if a process calls exit() or returns from main(), the other threads are still running. The process will eventually exit despite them. It is not necessary to join with them, but it is crucial that they do not access destroyed (static) objects.

On Windows, if a process calls exit() or returns from main(), the other threads are terminated at unpredictable locations before TLS callbacks. This places significant limits on what DllMain() and TLS callbacks can do and what they must not do.

Your suggestion to bypass destructors upon process exit doesn't seem correct to me, as running them is required by the Itanium ABI. My opinion is that LLD should wait for worker threads explicitly before returning from main(), or, if joining those threads is really out of the question, call ::TerminateProcess(::GetCurrentProcess(), exit_code) instead.
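
A minimal, portable C++ sketch of the first option (joining workers explicitly before main() returns) might look like the following. WorkerPool, shutdown() and run_and_join() are hypothetical names used only for illustration; this is not LLD's actual threading code. The idea is that once shutdown() has run, the destructor has nothing left to join, so no pthread_join happens during process teardown.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical worker pool; each worker just bumps a shared counter.
class WorkerPool {
public:
    WorkerPool(unsigned n, std::atomic<int>& counter) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([&counter] { counter.fetch_add(1); });
    }

    // Join every worker. Intended to be called explicitly while the
    // process is still fully alive, e.g. at the end of main().
    void shutdown() {
        for (auto& t : workers_)
            if (t.joinable()) t.join();
        workers_.clear();
    }

    // Falls back to joining here, but after an explicit shutdown()
    // there are no threads left, so this does no waiting.
    ~WorkerPool() { shutdown(); }

private:
    std::vector<std::thread> workers_;
};

// Starts n workers, joins them explicitly, and returns how many ran.
int run_and_join(unsigned n) {
    std::atomic<int> counter{0};
    WorkerPool pool(n, counter);
    pool.shutdown();  // explicit join before leaving the scope
    return counter.load();
}
```

In a real program the pool would typically have static storage duration, and main() would call shutdown() just before returning.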

@adrpo
Contributor

adrpo commented Nov 2, 2020

I actually tried:

#if defined(__MINGW32__)
  TerminateProcess(GetCurrentProcess(), ret);
#endif

and that hangs as well.

@adrpo
Contributor

adrpo commented Nov 2, 2020

I actually tried:

#if defined(__MINGW32__)
  TerminateProcess(GetCurrentProcess(), ret);
#endif

and that hangs as well.

Never mind, the control flow in lld was weird, it didn't even get to my code.

adrpo added a commit to adrpo/MINGW-packages that referenced this issue Nov 3, 2020
- use the patch from jeremyd2019
- remove the changes to patch 302
- update check sums
adrpo added a commit to adrpo/MINGW-packages that referenced this issue Nov 3, 2020
- use the patch from jeremyd2019
- remove the changes to patch 302
- update check sums
- resolve ambigous wasm name by prefixing it with lld::
adrpo added a commit to adrpo/MINGW-packages that referenced this issue Nov 4, 2020
- use the patch from jeremyd2019
- remove the changes to patch 302
- update check sums
- resolve ambigous wasm name by prefixing it with lld::
- bump pgkrel, remove !strip
jeremyd2019 pushed a commit to adrpo/MINGW-packages that referenced this issue Nov 4, 2020
- use the patch from jeremyd2019
- remove the changes to patch 302
- update check sums
- resolve ambigous wasm name by prefixing it with lld::
- bump pgkrel, remove !strip

Fixes msys2#6126, fixes msys2#7190
lazka added a commit that referenced this issue Nov 5, 2020
fix lld hang (#6126) and dynlink clang to reduce size (#7190)