NativeAOT symbols are broken on linux when publishing Release #77407
Do you have repro steps? I just debugged a coredump in #77522 (comment) and stacks looked fine. I can't find where we would be passing
Repro is in an Ubuntu 22.04 WSL instance with RC2, publishing helloworld using … Afterwards, I open the exe in GDB and get the following:
Function names are still around, but line number and file information is lost, as is the symbol table. You're right about
Ah, my mistake: it's not the linking command that makes a difference, it's passing
The DWARF specification states that an exprloc form consists of an unsigned LEB128 length followed by exactly that many bytes of encoded location expression. For some reason we were adding one to the length value being emitted, which looks incorrect to me. The calculation above for REG-REG (a variable stored in two registers) already accounts for the length of each register type tag, the size of the interleaved PIECE tags, and the size of the notation for each register, so the extra byte looks wrong. I've tested this locally and it appears to resolve dotnet/runtime#77407. Unfortunately, it also causes llvm-dwarfdump --verify to repeatedly complain about missing base addresses. I can't confirm at the moment, but I suspect this is revealing an existing bug. Even if this change somehow introduces a new bug, I think the resulting symbols are better than the alternative (no working symbols at all).
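As a hedged illustration of the length rule, here is a minimal Python sketch (not the ObjWriter code, which is C++) that builds a DW_FORM_exprloc value for a variable held in two registers. It assumes each half is encoded as a register operation followed by DW_OP_piece; the opcode values come from the DWARF specification, while the helper names, register numbers, and piece sizes are hypothetical. The ULEB128 prefix counts exactly the expression bytes, with no extra byte.

```python
# Opcode values from the DWARF specification.
DW_OP_REG0 = 0x50   # DW_OP_reg0..DW_OP_reg31 are 0x50..0x6f
DW_OP_REGX = 0x90   # register number follows as ULEB128
DW_OP_PIECE = 0x93  # piece size in bytes follows as ULEB128


def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def reg_op(regno: int) -> bytes:
    """Register location: one byte for registers 0-31, DW_OP_regx otherwise."""
    if regno < 32:
        return bytes([DW_OP_REG0 + regno])
    return bytes([DW_OP_REGX]) + uleb128(regno)


def reg_reg_exprloc(reg1: int, size1: int, reg2: int, size2: int) -> bytes:
    """Hypothetical helper: exprloc for a value split across two registers."""
    expr = (reg_op(reg1) + bytes([DW_OP_PIECE]) + uleb128(size1)
            + reg_op(reg2) + bytes([DW_OP_PIECE]) + uleb128(size2))
    # exprloc = ULEB128 length of the expression, then exactly that many bytes.
    # The bug described above effectively wrote len(expr) + 1 here.
    return uleb128(len(expr)) + expr


encoded = reg_reg_exprloc(0, 8, 1, 8)
print(encoded.hex(" "))
```

With registers 0 and 1 and 8-byte pieces, the expression is six bytes long, so the prefix is 0x06; the behavior described in the commit message would have written 0x07 instead.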
Cherry-picked from commit b85b64b.
Reopening to track the servicing fix.
The fix is in the LLVM objwriter release branch and we picked up the change in release/7.0 of this repo. I think this can be closed.
Did this fix make it into .NET 7.0.1? I'm still running into this problem in a non-trivial project with the 7.0.1 tooling. (Interestingly, upgrading from 7.0.0 to 7.0.1 did fix this issue for a simple Hello World app, but I'm not sure if it's a direct result of this change or a complete fluke.)
I think it will only be in 7.0.2. 7.0.1 happened in mid-November (https://github.com/dotnet/runtime/milestone/107?closed=1), and this change only landed afterwards: dotnet/llvm-project#321
* Apply llvm.patch
  Taken from https://github.com/dotnet/runtime/blob/7ab969c84ef05ba948c0075392716ce335b47744/src/coreclr/tools/aot/ObjWriter/llvm.patch.
* Add objwriter library
  * Taken from https://github.com/dotnet/runtime/tree/7ab969c84ef05ba948c0075392716ce335b47744/src/coreclr/tools/aot/ObjWriter.
  * Updated README.md
  * Updated CMakeLists.txt to remove reference to CORECLR_INCLUDE_DIR.
  * Added cordebuginfo.h, cvconst.h, cfi.h from coreclr/inc at the above commit.
* Build the ObjWriter package
* Add ObjWriter API to set DWARF version (#161)
  Contributes to https://github.com/dotnet/runtimelab/issues/1738.
* Add `.note.GNU-stack` section to produced executables (#162)
  Do this unconditionally because there's no scenario where we would need executable stack for managed code.
* Remove Darwin workaround (#163)
  This caught my attention as I was looking at the ObjWriter. LLVM no longer emits a `LC_VERSION_MIN_MACOSX` load command unless we explicitly set a version. I don't see a difference in `llvm-objdump -macho -x foo.o` with/without these lines (I didn't bother myself to boot into macOS to run `otool`).
* Fix llvm-dwarfdump warnings (#164)
  Fixes https://github.com/dotnet/runtimelab/issues/1535. No warnings left with llvm-dwarfdump from LLVM 12.
* Revert "Fix llvm-dwarfdump warnings (#164)" (#218)
  This reverts commit afc9070.
* Add new NuGet package, `Microsoft.NETCore.Runtime.JIT.Tools`, includes `FileCheck` and `llvm-mca` (#256)
  https://github.com/dotnet/runtime is wanting to start writing assembly (x64/ARM64) verification tests. Instead of building our own tool to support writing those kinds of tests, we want to leverage LLVM's `FileCheck`. We also want to include `llvm-mca` at the request of @EgorBo.
  This PR creates a new NuGet package for `dotnet/runtime` to consume which we named `Microsoft.NETCore.Runtime.JIT.Tools`. So far, this package only includes LLVM's `FileCheck` and `llvm-mca` tools.
* [ObjWriter] Enable DWARF debug information emitting for Mach-O (#269)
* Account for GOT VariantKind on osx-arm64 (#185)
* Add API for emitting compact unwind encoding, enforce DWARF encoding if not explicitly overridden
* Add comment
* Update ObjWriter to LLVM 14 API
* Add support for generating uninitialized sections (#306)
  We support `.bss` but not custom sections that are bss-like. This adds such support.
* Do not indiscriminately create text section (#312)
  If we ended up with nothing in the text section, this line would error LLVM out in https://github.com/dotnet/llvm-project/blob/3db8d68195c17386557f1a258312bbae4051dc05/llvm/lib/MC/ELFObjectWriter.cpp#L1458-L1459, because we generate a reference to the empty text section in the `aranges` section. I double checked and debugging on Linux still works fine without this.
  `SetCodeSectionAttribute` is an objwriter API and we have access to it from the managed side. We should be calling it from there if it's needed for something that I didn't realize (we do call it from the managed side for the `.managed` section, but that one actually has debug information generated, unlike `.text`).
* Fix off-by-one error in DWARF reg-reg location (#317)
* Setting context object file info
* Add verbosity to linux x64 pipeline
  In order to understand what is happening with std path error.
* Revert "Add verbosity to linux x64 pipeline"
  This reverts commit 5c4636e.
* Upgrading linux build image
* [Temporary] Adding verbosity to get more pipeline error info
* Update image name for linux x64
* Fix Linux x64 build
* Revert "[Temporary] Adding verbosity to get more pipeline error info"
  This reverts commit 9d76b36.
* Updating Build_Linux_musl timeout
* Update linux-musl Docker images
* Fix linux-musl-x64 build
* Setting clang/++ version 15 for linux musl
* Copying clang/clang++ vars to unix-like OS
* Fix cut & paste error
* Fix objcopy and strip path in cross-compilation
* Update azure-pipelines.yml
  $(ClangVersion) $(ClangPlusVersion) weren't defined for OSX and should be defined for every Linux
* Bump timeout for Linux musl build
* Clean up .gitignore
* Consolidate Clang[Plus]Version into ClangVersionArg
* Move CLANG_TARGET from environment into build parameter
  Always quote _BuildConfig on command line so empty value is not accidentally using next parameter as the value
* Update URL in cordebuginfo.h to point to dotnet/runtime
* Bump Windows build timeout to 210
* Fix a typo in compiler name
* Revert $(_BuildConfig) -> "$(_BuildConfig)" change
* Change ClangTarget to ClangTargetArg since apparently it gets propagated as environment variable into wrong steps
* Fix inadvertent change
* Bump timeout everywhere

---------

Co-authored-by: Michal Strehovský <MichalStrehovsky@users.noreply.github.com>
Co-authored-by: Andy Gocke <andy@commentout.net>
Co-authored-by: Will Smith <lol.tihan@gmail.com>
Co-authored-by: Adeel Mujahid <3840695+am11@users.noreply.github.com>
Co-authored-by: Brian Bohe <brianbohe@gmail.com>
Co-authored-by: Alexander Köplinger <alex.koeplinger@outlook.com>
I would expect symbols to be worse, but not broken, in this configuration.
It looks like the change is caused by passing `-O` to clang when linking the binaries. Without `-O`, debugging symbols are present and functional. With `-O`, all symbols and stacks are broken. It's something in the DWARF info emitted by ILC.
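One plausible way a single wrong length byte can break far more than one variable: a consumer reads the ULEB128 length of an exprloc, skips exactly that many bytes, and assumes the next attribute begins there. The Python sketch below uses a made-up byte stream (one exprloc attribute followed by a one-byte data1 attribute); real consumers such as GDB also rely on the abbreviation table, so treat this as an illustration of the off-by-one exprloc length described in the commit message above, not as a model of any actual parser. The function names and byte values are hypothetical.

```python
from __future__ import annotations


def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode unsigned LEB128 starting at data[pos]; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:  # high bit clear: last byte of the value
            return result, pos
        shift += 7


def read_exprloc_then_data1(data: bytes) -> tuple[bytes, int]:
    """Hypothetical DIE layout: one exprloc attribute, then one data1 attribute."""
    length, pos = read_uleb128(data, 0)
    expr = data[pos:pos + length]  # trust the length prefix
    pos += length
    data1 = data[pos]              # the next attribute starts right after
    return expr, data1


expr = bytes([0x50, 0x93, 0x08, 0x51, 0x93, 0x08])  # reg0 piece 8, reg1 piece 8
next_attr = 0x2A                                    # arbitrary data1 value

good = bytes([len(expr)]) + expr + bytes([next_attr, 0x00])
bad = bytes([len(expr) + 1]) + expr + bytes([next_attr, 0x00])

print(read_exprloc_then_data1(good))  # expression intact, data1 is 0x2a
print(read_exprloc_then_data1(bad))   # swallows 0x2a, then reads 0x00
```

With the correct prefix the data1 value is read as 0x2a; with the prefix inflated by one, the reader swallows that byte into the expression, and in this toy layout every later attribute is shifted by one byte.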