SIGFPE in private __libc_early_init in glibc 2.34+ #5437
This is what was reported on the list at https://groups.google.com/g/DynamoRIO-Users/c/CKQD11eXyfs and was about to be filed, so this will serve as the tracking issue. Re: dr$sim online hanging on a pipe: be sure there isn't a stale pipe file from a prior aborted run, which can cause such a hang.
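Checking for and removing a stale pipe before a new online run can be automated. The sketch below is a minimal C example that assumes the pipe is a named FIFO at a path you already know; `remove_stale_fifo` is a hypothetical helper, and you should check your tool's actual pipe name and location rather than rely on any path shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: removes a leftover named pipe (FIFO) at `path`.
 * Returns 1 if a FIFO was removed, 0 if nothing exists at `path`,
 * and -1 on error, including when `path` exists but is not a FIFO
 * (so a regular file with the same name is never deleted by mistake). */
static int remove_stale_fifo(const char *path)
{
    struct stat st;
    if (lstat(path, &st) != 0)
        return errno == ENOENT ? 0 : -1; /* nothing there is not an error */
    if (!S_ISFIFO(st.st_mode))
        return -1; /* refuse to touch non-FIFO files */
    return unlink(path) == 0 ? 1 : -1;
}
```

Call it with the pipe path before launching the online run; refusing to delete anything that is not a FIFO keeps a mistyped path from removing an unrelated file.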
Oh, I only searched through the Github issues.
I made sure to delete the old pipe with …
Pasting key info from https://groups.google.com/g/DynamoRIO-Users/c/CKQD11eXyfs: Tested on commit 5e13602, Arch Linux x86_64 (glibc package version 2.35-3) and Ubuntu 21.10 x86_64 (glibc-bin package version 2.34-0ubuntu3.2). It looks like the problem is related to #5134. The output below is from commit 5e13602 on Ubuntu 21.10.
Probably dl_tls_static_size and dl_tls_static_align (used at glibc/nptl/nptl-stack.h:58 in the 2.34-0ubuntu3.2 glibc source) are zero.
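Why zero values would produce SIGFPE rather than a segfault: assuming the code at that line rounds dl_tls_static_size up to a multiple of dl_tls_static_align the way glibc's roundup macro does, a zero alignment becomes an integer division by zero. The minimal C sketch below shows the same arithmetic with a guard; `roundup_checked` is a hypothetical helper for illustration, not glibc code.

```c
#include <stddef.h>

/* Same arithmetic as glibc's roundup(x, y) macro from <sys/param.h>:
 * ((x + (y - 1)) / y) * y. If y (the alignment) is 0, that division is an
 * integer divide by zero; on x86 this raises a divide-error fault that Linux
 * delivers as SIGFPE. This version checks for 0 instead of dividing. */
static size_t roundup_checked(size_t size, size_t align, int *ok)
{
    if (align == 0) {
        *ok = 0; /* the unchecked macro would trap here */
        return 0;
    }
    *ok = 1;
    return ((size + (align - 1)) / align) * align;
}
```

For example, roundup_checked(100, 64, &ok) returns 128 with ok set to 1, while an alignment of 0 reports failure where the unchecked expression would raise SIGFPE.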
So without the __libc_early_init call, glibc 2.32 crashes; but with it, 2.34 crashes? Can't win. There must be some other magic hardcoded initialization done specially for libc in 2.34 by ld.so??
Sorry for the delay in filing the issue. I tested on an Ubuntu 22.04 daily build, but drrun gives SIGFPE.
That comment is running without any client: the SIGFPE only happens with a client that imports from libc. It does reproduce on Ubuntu 22.04 with a libc-importing client.
It looks like glibc 2.34 added a line which is … This article says glibc 2.34 removed libpthread and merged it into libc.so.6, so DynamoRIO's loader will fail to load libpthread.so.
Unfortunately glibc is going the direction of Android with Bionic with tight integration between the loader and libpthread with hardcoded, private dependences between them such that the loader cannot easily be replaced for private loading as it no longer uses clean public interfaces to load libc and libpthread. This is why Android support breaks with each release as they change the internal TLS layout: #3543, #3683 |
Here is a proposal for avoiding DR having to perform custom undocumented …
Instead of DR being the private ld.so and loading the client lib and all …
We create a "client executable" ELF file ("cliex") with ifuncs for the …
How is the client lib loaded: dynamically by the cliex, and the cliex …
What about client libs with no dependences other than libdynamorio? Do we …
Xref #1285 Mac private loader: though there is an issue with having …
@johnfxgalea @abhinav92003 looking for some feedback on which way to go here; the 3rd solution is to drop private library support completely and try to provide …
Yeah, redirection and copying our own library versions (to avoid resource conflicts and reentrancy issues). I don't think dropping support for the private loader is the best solution... AFAIR, DR has limited support for disabling the private loader, but one still has to deal with gcc xflags. I like your proposed cliex solution, although I'm not sure how resolving ifuncs on Windows would work nicely.
Are you concerned about the performance of the proposed solution here? My first impression was to always use the cliex approach in the long run to help with maintainability, but I'm not really sure whether to keep both.
In case it wasn't clear, the proposal is not to eliminate private library copies isolated from app libraries, but to eliminate DR as the loader of those private libraries and instead use a private copy of ld.so.
This would be only for Linux + Android.
It would be simpler with one approach, but part of me likes having the fallback of a scheme that has no dependence on changes in ld.so/libc/libpthread for no-dep clients. Maybe those are so rare nowadays that it's not worth the maintenance burden.
For workarounds until a long-term solution is developed: for some of the simpler C clients, setting …
It seems DynamoRIO doesn't run on Ubuntu 22.04. This is a major bummer, as our summer research interns were hoping to use DynamoRIO on a modern Ubuntu 22.04. I'd rather not have to fall back to Ubuntu 20.04.
Please consider helping to solve the problem; having more contributors and active maintainers in the community helps tremendously. Also note that core DR and no-external-library clients should work fine on 22.04 (see #5437 (comment)).
My apologies for the comment yesterday. We'll try to make DR play for our needs on 22.04. Anything involving glibc triggers repressed nightmares from decades ago. |
Adds a workaround for the SIGFPE in glibc 2.34+ __libc_early_init() by setting two ld.so globals located via hardcoded offsets, which makes it fragile, so it is considered temporary. Tested on glibc 2.34, where every libc-using client crashes with SIGFPE but works with this fix. Adds an Ubuntu22 GA CI run; if it has failures for other reasons, the plan is to drastically shrink the set of tests run, or to abandon the run if that is too much work right now. Issue: #5437
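A minimal sketch of what "setting two ld.so globals located via hardcoded offsets" amounts to, assuming the two globals are presumably the dl_tls_static_size and dl_tls_static_align fields discussed earlier and that the base address of ld.so's read-only globals (the GLRO() block) has already been found. The offset values are placeholders invented for illustration, not the real glibc layout, and `patch_static_tls_globals` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL placeholder offsets into ld.so's read-only globals block;
 * the real values differ per glibc build and per 32-bit vs 64-bit. */
#define HYPOTHETICAL_TLS_STATIC_SIZE_OFFSET 0x2a8
#define HYPOTHETICAL_TLS_STATIC_ALIGN_OFFSET 0x2b0

/* Writes a static TLS size and alignment into the two fields if the
 * alignment is still zero, so a later round-up of size to align no longer
 * divides by zero. memcpy avoids any unaligned-access assumptions. */
static void patch_static_tls_globals(unsigned char *glro_base, size_t size, size_t align)
{
    size_t cur_align;
    memcpy(&cur_align, glro_base + HYPOTHETICAL_TLS_STATIC_ALIGN_OFFSET, sizeof cur_align);
    if (cur_align != 0)
        return; /* already initialized: leave it alone */
    memcpy(glro_base + HYPOTHETICAL_TLS_STATIC_SIZE_OFFSET, &size, sizeof size);
    memcpy(glro_base + HYPOTHETICAL_TLS_STATIC_ALIGN_OFFSET, &align, sizeof align);
}
```

Writing only when the alignment is still zero is a choice made for this sketch so that an already-initialized ld.so is not overwritten; the thread does not say whether DR's workaround does this.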
I'm surprised nobody else has put effort into solving this. Today I tried writing a reasonable value into the two ld.so variables identified above, which fixes the SIGFPE and lets the clients I tested work as they did before. This is rather hacky, as hardcoded values are needed for the GLRO variable offsets. Does anyone know of a cleaner way to find them? (Decoding some exported function to find an offset would be a little better.) This is PR #5695. Summarizing the situation:
Adds a workaround for the SIGFPE in glibc 2.34+ __libc_early_init() by setting two ld.so globals located via hardcoded offsets, making this fragile and considered temporary. (Improvements might include decoding __libc_early_init or other functions to find the offsets, which is also fragile; making runtime options to set them for a non-rebuild fix; disabling the call to __libc_early_init which doesn't seem to be needed for 2.34). Tested on glibc 2.34 where every libc-using client crashes with SIGFPE but they work with this fix. Adds an Ubuntu22 GA CI run but it has many failures due to the rseq issue #5431. Adds a workaround for this by having drrun set -disable_rseq if it detects glibc 2.35+. Even with this we have a number of test failures so for now we use a label to just run 4 sanity-check tests. This should be enough to detect glibc changes that break the offsets here. Issue: #5437, #5431
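The drrun check for glibc 2.35+ could look roughly like the sketch below. How drrun actually obtains and compares the version is not described here; this assumes a "major.minor" string such as the one returned by glibc's gnu_get_libc_version() (declared in <gnu/libc-version.h>), and `glibc_version_at_least` is a hypothetical helper that takes the string as a parameter so it can be tested anywhere.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if `version` ("major.minor...") is at
 * least want_major.want_minor, 0 if it is older, and -1 if it cannot be
 * parsed. */
static int glibc_version_at_least(const char *version, int want_major, int want_minor)
{
    int major, minor;
    if (version == NULL || sscanf(version, "%d.%d", &major, &minor) != 2)
        return -1;
    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}
```

On a glibc system, glibc_version_at_least(gnu_get_libc_version(), 2, 35) == 1 would be the condition for adding -disable_rseq; treating an unparsable string separately (-1) lets the caller choose a safe default.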
Updates DR to cacb5424e for workarounds for 2 Ubuntu22 issues (glibc SIGFPE and rseq failure). Issue: DynamoRIO/dynamorio#5437, DynamoRIO/dynamorio#5431
I tried to run my client with the proposed workaround. Now I get a hang on the release build on Ubuntu 22.04 instead of a crash. The debug build reports the following error:
GDB backtrace:
P.S. I commented out all the code in my client except the Boost option parsing, which I link statically. That works. I'll continue uncommenting the code part by part to determine what breaks the client.
As you can see, there is a SIGSEGV. The assert in synch.c on processing the SIGSEGV is a secondary effect. I would suggest getting a callstack of the SIGSEGV point and debugging from there, as well as callstacks of all threads for the release build hang. If this is not related to the private loader (the crash/hang is not in a private library), please open a separate issue.
Now I am getting a SIGFPE with a 32-bit client:
Adds the same workaround for the SIGFPE in glibc 2.34+ __libc_early_init() as for 64-bit in PR #5695: we hardcode the 32-bit offsets of the two globals written by the workaround. Tested on glibc 2.34 where every libc-using client crashes with SIGFPE but they work with this fix. Adds an Ubuntu22 GA CI 32-bit run. Issue: #5437
@derekbruening, thank you! The workaround in #5902 resolved my issue.
Feedback from #6693 (comment) on wanting to keep the use of the system glibc:
Updates the __libc_early_init offsets for 32-bit glibc 2.38 for the private loader workaround of a glibc initialization issue. Tested locally. Issue: #5437
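Because the hardcoded offsets differ between 32-bit and 64-bit builds and between glibc releases, one way to organize them is a small table keyed by word size and glibc version, where a miss tells the caller to fall back to another method. Every offset value below is a placeholder, and the table layout and `lookup_offsets` name are hypothetical, not DR's code.

```c
#include <stddef.h>

/* HYPOTHETICAL offset table: all offset values are placeholders,
 * not real glibc layouts. */
struct tls_offsets {
    unsigned ptr_bytes; /* 4 for 32-bit, 8 for 64-bit */
    int glibc_minor;    /* glibc 2.<minor> */
    size_t size_offset;
    size_t align_offset;
};

static const struct tls_offsets known_offsets[] = {
    { 8, 34, 0x2a8, 0x2b0 },
    { 4, 34, 0x1a4, 0x1a8 },
    { 4, 38, 0x1b4, 0x1b8 },
};

/* Returns the exact match, or NULL so the caller can fall back
 * (for example, to decoding __libc_early_init). */
static const struct tls_offsets *lookup_offsets(unsigned ptr_bytes, int glibc_minor)
{
    for (size_t i = 0; i < sizeof known_offsets / sizeof known_offsets[0]; i++) {
        if (known_offsets[i].ptr_bytes == ptr_bytes &&
            known_offsets[i].glibc_minor == glibc_minor)
            return &known_offsets[i];
    }
    return NULL;
}
```

Callers would pass (unsigned)sizeof(void *) for the word size; requiring an exact version match means any untested glibc release falls through to the fallback instead of silently reusing stale offsets.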
Adds decoding of __libc_early_init to find the glibc private loader workaround offsets we need. The heuristic is to look for the last large (>0x100) load offset before the first DIV. Tested on 2.38 64-bit and 32-bit. Also updates the __libc_early_init offsets for 32-bit glibc 2.38 as a fallback. Tested these locally. Issue: #5437
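The decoding heuristic (take the last load with an offset larger than 0x100 that occurs before the first DIV) can be sketched over an already-decoded instruction list. The `struct insn` representation is a hypothetical simplification; a real implementation would decode __libc_early_init's machine code with a disassembler. The PR needs two offsets, but it does not say how the second one is derived, so this sketch only returns the one the stated heuristic identifies.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, HYPOTHETICAL view of a decoded instruction. */
enum insn_kind { INSN_OTHER, INSN_LOAD, INSN_DIV };

struct insn {
    enum insn_kind kind;
    int64_t disp; /* memory displacement for INSN_LOAD, else 0 */
};

#define LARGE_OFFSET_THRESHOLD 0x100

/* Returns 1 and sets *offset to the displacement of the last load with a
 * displacement > 0x100 that precedes the first DIV. Returns 0 if there is
 * no DIV or no such load before it. */
static int find_offset_before_first_div(const struct insn *insns, size_t n, int64_t *offset)
{
    int found = 0;
    for (size_t i = 0; i < n; i++) {
        if (insns[i].kind == INSN_DIV)
            return found; /* stop at the first DIV */
        if (insns[i].kind == INSN_LOAD && insns[i].disp > LARGE_OFFSET_THRESHOLD) {
            *offset = insns[i].disp; /* keep overwriting: we want the last one */
            found = 1;
        }
    }
    return 0; /* no DIV seen: the heuristic does not apply */
}
```

Loads after the first DIV are ignored, and small displacements (such as stack or struct-field accesses) are filtered out by the 0x100 threshold.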
Updates DR to the latest to get private loader fixes for recent glibc versions. Issue: DynamoRIO/dynamorio#5437 Fixes #2507
Hello again! Kubuntu 24.04.01 glibc version: 2.39
I put it in a container because of compatibility problems. When developing, it is preferable to use glibc-3.3 or lower. For details, read: DynamoRIO/dynamorio#5437
Continue past the SIGILL to get to the SIGSEGV to get a callstack (if it's a SIGSEGV (or SIGBUS) on a safe read, continue past that as well: sometimes there are multiple in debug builds). See https://dynamorio.org/page_debugging.html#autotoc_md142.

On Sat, Sep 28, 2024 at 5:32 AM Kafanov Stepan wrote:
Hello again!
I'm from issue #7008.
I tried the latest DR version (10.93.19987) on my system, and I still get an error on Kubuntu.
On Fedora 40 in a container, this problem was fixed!
This time the error looks like this:
[Screenshot: https://github.com/user-attachments/assets/4a790f67-0a8b-4432-922f-43f59dd66e76]
but inside GDB it's the same:
[Screenshot: https://github.com/user-attachments/assets/f4bf964c-8bcd-43ce-a34f-1f7df3aa0653]
BTW, in the level-4 logs I can see an "os_file_exists failed: 0xfffffffffffffffe" error.
Kubuntu 24.04.01 glibc version: 2.39
Fedora 40 glibc version: 2.39
Fedora 39 glibc version: 2.38
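Assuming the logged value is the raw result of the failing system call (the usual Linux convention of returning -errno), it can be decoded with a few lines of C. `errno_from_raw_result` is a hypothetical helper, and whether DR's log prints exactly this encoding is an assumption.

```c
#include <stdint.h>

/* Linux system calls report failure as -errno, in the range -4095..-1.
 * Printed as an unsigned 64-bit number, -2 shows up as 0xfffffffffffffffe.
 * Returns the positive errno value, or 0 if `raw` is not an error encoding. */
static int errno_from_raw_result(uint64_t raw)
{
    /* Converting an out-of-range unsigned value to int64_t is
     * implementation-defined in C; GCC and Clang wrap it modulo 2^64. */
    int64_t value = (int64_t)raw;
    if (value < 0 && value >= -4095)
        return (int)-value;
    return 0;
}
```

Under that assumption, errno_from_raw_result(0xfffffffffffffffe) returns 2, which is ENOENT ("No such file or directory") on Linux.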
Describe the bug
This bug may affect not only drcachesim but also drmemory, drcpusim, and probably other clients.
When I run drcachesim like this:
./drrun -disable_rseq -t drcachesim -offline -- ls
I get a SIGFPE:
[1] 2736 floating point exception (core dumped) ./drrun -disable_rseq -t drcachesim -offline -- ls
To Reproduce
Steps to reproduce the behavior:
1. ls should work.
2. ./drrun -disable_rseq -t drcachesim -offline -- ls
Please also answer these questions:
What happens when you run without any client?
Without any client, it works (thanks to -disable_rseq).
What happens when you run with debug build ("-debug" flag to drrun/drconfig/drinject)?
Same behaviour
Expected behavior
No crash
Screenshots or Pasted Text
Versions
What version of DynamoRIO are you using?
current master (562e797) and also 9.0.1
Does the latest build from https://github.com/DynamoRIO/dynamorio/releases solve the problem?
No
What operating system version are you running on?
Manjaro Linux (derivative of Arch Linux)
Is your application 32-bit or 64-bit?
64-bit
Additional context
This time, I wasn't able to test glibc 2.33, so it's not clear whether this is specific to glibc 2.35.
Logs:
log.0.3045.txt
ls.0.3045.txt
When I run without -offline, another issue occurs: DynamoRIO hangs while waiting on a pipe:
I will eventually also create an issue for this.