llvm17/rust: random variations in binaries #72206

Open
bmwiedemann opened this issue Nov 14, 2023 · 19 comments
Labels
incomplete Issue not complete (e.g. missing a reproducer, build arguments, etc.)

Comments

@bmwiedemann

While working on reproducible builds for openSUSE, I found that
various rust packages produced different binaries in every build,
even when trying to run builds as similar as possible.

versions:

  • openSUSE Tumbleweed 20231108
  • libLLVM17-17.0.4
  • rust-1.73

affected packages:

  • contrast
  • difftastic
  • rage-encryption
  • sad
  • tiny

Diffs (from filterdiff strings {1,2}/usr/bin/sad) look like this:

 _ZN12aho_corasick6packed5teddy7compile7Builder5build17h1ec13fda1b911f51E
-_ZN71_$LT$sad..argparse..Arguments$u20$as$u20$clap_builder..derive..Args$GT$12augment_args17hb63754f2b3f23d81E.llvm.7060706947469140631
 _ZN9hashbrown11rustc_entry62_$LT$impl$u20$hashbrown..map..HashMap$LT$K$C$V$C$S$C$A$GT$$GT$11rustc_entry17h3404a67c5ebadd16E
...
 _ZN4core3ptr37drop_in_place$LT$sad..types..Fail$GT$17h4a94ee46810be113E.llvm.8642040941601995225
+_ZN71_$LT$sad..argparse..Arguments$u20$as$u20$clap_builder..derive..Args$GT$12augment_args17hb63754f2b3f23d81E.llvm.5074700338430727270
 _ZN5tokio7runtime9scheduler6Handle5spawn17he9d3e7b13daae14fE
...
 anon.f8976536234e6da06679650d2fbaaa4b.3.llvm.2679463096021464399
-anon.cf22f61d4b59a658db753519f084dfc2.29.llvm.7060706947469140631
 anon.cdafd13cf3071da50e39cc55ae689763.5.llvm.15702125857252448827
@@ -40685,6 +40684,7 @@
 anon.865f8e79028a90f3ec901722c5bc9a01.1751.llvm.141406229195693483
 anon.a3ee08d29b32c89866319fa543dc03d1.3.llvm.17006684412476199951
 _ZN61_$LT$std..io..stdio..StdoutLock$u20$as$u20$std..io..Write$GT$5flush17ha20f2e5d3ef392cdE
+anon.cf22f61d4b59a658db753519f084dfc2.29.llvm.5074700338430727270
 rust_panic

So it seems a random number is appended to some symbol names, and that in turn changes their ordering.
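
To check whether the numeric suffix is the only thing that differs, the two symbol lists can be compared with the suffix masked and the order ignored. The following is a minimal sketch, not part of the original report: the helper names `normalize` and `same_symbols_modulo_suffix` are made up for illustration, and the sample symbols are copied from the diff above.

```python
import re
from collections import Counter

# Matches the numeric suffix seen in the diff, e.g. ".llvm.7060706947469140631".
LLVM_SUFFIX = re.compile(r"\.llvm\.\d+")


def normalize(symbol: str) -> str:
    """Replace the digits of a '.llvm.<digits>' suffix with a fixed placeholder."""
    return LLVM_SUFFIX.sub(".llvm.N", symbol)


def same_symbols_modulo_suffix(old: list[str], new: list[str]) -> bool:
    """True if both lists hold the same symbols once suffix values and order are ignored."""
    return Counter(map(normalize, old)) == Counter(map(normalize, new))


# Two tiny excerpts modelled on the diff above (hypothetical ordering).
old = [
    "anon.cf22f61d4b59a658db753519f084dfc2.29.llvm.7060706947469140631",
    "rust_panic",
]
new = [
    "rust_panic",
    "anon.cf22f61d4b59a658db753519f084dfc2.29.llvm.5074700338430727270",
]

print(same_symbols_modulo_suffix(old, new))
```

If this reports a match for the full `strings` output, the visible differences are limited to the suffix values and the ordering they induce. It does not say why the suffix changed.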

asl added the "incomplete" label and removed the "new issue" label on Nov 14, 2023
@asl
Collaborator

asl commented Nov 14, 2023

Have you reported this issue to Rust? Are you sure it's an LLVM problem and not a Rust frontend problem?

@bmwiedemann
Author

bmwiedemann commented Nov 14, 2023

Not yet. It was not clear to me how to distinguish LLVM problems from Rust problems, and most past Rust problems turned out to be in libLLVM, e.g. rust-lang/rust#57041.

@bmwiedemann
Author

bmwiedemann commented Nov 14, 2023

By the way, rebuilding LLVM 17.0.4 itself produces similar diffs, with differing random suffixes, in one file:

--- old /usr/lib64/libomp.so (disasm)
+++ new /usr/lib64/libomp.so (disasm)
@@ -1793,22 +1793,22 @@
        mov    offset(%rip),%r15        #   <kmp_e_debug@@VERSION-0xae3c>
        cmpl   $something,(%r15)
        jl     <__kmp_initialize_bget + ofs>
-       lea    -offset(%rip),%rdi        #   <.str.76.llvm.10039950945441166082>
-       lea    -offset(%rip),%rdx        #   <.str.1.llvm.10039950945441166082>
+       lea    -offset(%rip),%rdi        #   <.str.76.llvm.10444269058502487673> 
+       lea    -offset(%rip),%rdx        #   <.str.1.llvm.10444269058502487673>

@saethlin

saethlin commented Aug 6, 2024

tiny has actual nondeterminism in its builds because it iterates over a std::collections::HashMap to produce the output of a proc macro. The standard library HashMap's hasher is seeded with system entropy.
https://github.com/osa1/tiny/blob/ee8615a55256b242b02e6f8a2350ca7f39aca517/crates/term_input_macros/src/tree.rs#L6
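
The general pattern can be sketched as follows. This is a Python analogue, not tiny's actual code: CPython randomizes string hashes per process unless `PYTHONHASHSEED` is fixed, so iterating a `set` of strings plays the role of iterating an entropy-seeded `HashMap`. The function names and the generated `fn ...();` lines are hypothetical.

```python
def emit_in_set_order(names: set[str]) -> list[str]:
    """Generate one line per name in whatever order the set yields them.

    The order depends on string hashes, which CPython randomizes per process
    by default, so two runs may emit the same lines in a different order.
    """
    return [f"fn {name}();" for name in names]


def emit_sorted(names: set[str]) -> list[str]:
    """Generate the same lines, but in sorted order, independent of hashing."""
    return [f"fn {name}();" for name in sorted(names)]


keys = {"ctrl_c", "enter", "backspace"}

# Same content either way; only the sorted variant has a stable order.
print(sorted(emit_in_set_order(keys)) == emit_sorted(keys))
print(emit_sorted(keys))
```

In Rust the usual equivalents are sorting the keys before emitting, or using a `BTreeMap`, which iterates in key order.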

Also I've been unable to locate the source for this sad package. Which project exactly are you referring to?

@thesamesam
Member

thesamesam commented Aug 6, 2024

Looks like it's https://github.com/ms-jpq/sad, looking at the opensuse package.

@saethlin

saethlin commented Aug 6, 2024

Thanks! The nondeterminism there seems to come from its build script, which executes python3 and asks for system entropy that way: https://github.com/ms-jpq/sad/blob/6c65d52211e79298c0e1b1496a7287e98cb8e813/build.rs#L18

I would not be surprised if all the above crates are causing you trouble because of implementation decisions in their source, not issues with the toolchain. I will not be debugging any more here.

@bmwiedemann
Author

I filed rust-lang/rust#128675 yesterday and the rust people say it might be the build tools' fault. Which it apparently was for sad + tiny, but that was not obvious from the diff that had llvm random/hash IDs all over the place.

That still leaves the part about llvm's own libomp.so that should be independent of rust. Should I file a new issue for that one?

@workingjubilee
Contributor

As documented with the description of symbol-mangling, this is done by LLVM during LTO: https://github.com/rust-lang/rust/blob/b586701f78a6d5c7f618b76e7ae3cace9a6fbf37/src/doc/rustc/src/symbol-mangling/v0.md?plain=1#L996-L1013

What LTO settings are you using, and what linker?

@bmwiedemann
Author

https://github.com/bmwiedemann/openSUSE/blob/master/packages/l/llvm18/llvm18.spec#L1091 says we use -DLLVM_ENABLE_LTO=Thin, probably with -Wl,--thinlto-jobs=8.

The build log shows the line that creates libomp as:

/home/abuild/rpmbuild/BUILD/llvm-18.1.6.src/stage1/bin/clang++ -fPIC -fno-plt -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -flto=thin -Wall -fcolor-diagnostics -Wcast-qual -Wformat-pedantic -Wimplicit-fallthrough -Wsign-compare -Wno-enum-constexpr-conversion -Wno-extra -Wno-pedantic -fno-semantic-interposition -fdata-sections -std=c++11 -O2 -g -DNDEBUG  -Wl,--build-id=sha1   -Wl,--as-needed -Wl,--no-undefined -Wl,-z,now -Wl,-z,defs -Wl,-z,nodelete -flto=thin -shared -Wl,-soname,libompd.so -o lib64/libompd.so projects/openmp/libompd/src/CMakeFiles/ompd.dir/TargetValue.cpp.o projects/openmp/libompd/src/CMakeFiles/ompd.dir/omp-debug.cpp.o projects/openmp/libompd/src/CMakeFiles/ompd.dir/omp-state.cpp.o projects/openmp/libompd/src/CMakeFiles/ompd.dir/omp-icv.cpp.o  lib64/libomp.so  -lm  -ldl

@workingjubilee
Contributor

That's not LLVM 17?

Can you verify that the libomp.so output is non-reproducible when the same build command is passed each time?
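
One way to answer that is to run the identical build step several times and hash the artifact after each run. The sketch below assumes the build is wrapped in a Python callable; `distinct_digests` and `fake_build` are hypothetical names, and `fake_build` merely stands in for the real link command.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def distinct_digests(build: Callable[[Path], None], output: Path, runs: int = 2) -> set[str]:
    """Run `build` repeatedly and collect the SHA-256 digest of `output` after each run."""
    digests = set()
    for _ in range(runs):
        if output.exists():
            output.unlink()  # make sure each run really recreates the artifact
        build(output)
        digests.add(hashlib.sha256(output.read_bytes()).hexdigest())
    return digests


def fake_build(output: Path) -> None:
    """Stand-in for the real build step; always writes the same bytes."""
    output.write_bytes(b"deterministic artifact\n")


with tempfile.TemporaryDirectory() as tmp:
    result = distinct_digests(fake_build, Path(tmp) / "libexample.so", runs=3)

# More than one digest would mean the step is not reproducible.
print(len(result))
```

For a real check, `build` would invoke the exact linker command from the log (for example through `subprocess.run`) in an otherwise unchanged build tree.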

@bmwiedemann
Author

I always compare with the same build tools. The libomp issue just affects several llvm versions in the same way.

Here is the log from llvm17:

[7985/9562] : && /home/abuild/rpmbuild/BUILD/llvm-17.0.6.src/stage1/bin/clang -fPIC -fno-plt -fPIC -fno-semantic-interposition -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -flto=thin -Wall -Wcast-qual -Wformat-pedantic -Wimplicit-fallthrough -Wsign-compare -Wno-enum-constexpr-conversion -Wno-extra -Wno-pedantic -O2 -g -DNDEBUG  -Wl,--build-id=sha1 --ld-path=/home/abuild/rpmbuild/BUILD/llvm-17.0.6.src/stage1/bin/ld.lld  -Wl,--as-needed -Wl,--no-undefined -Wl,-z,now -Wl,-z,defs -Wl,-z,nodelete -Wl,--color-diagnostics -flto=thin  -Wl,--as-needed -Wl,--version-script=/home/abuild/rpmbuild/BUILD/llvm-17.0.6.src/projects/openmp/runtime/src/exports_so.txt -static-libgcc -Wl,-z,noexecstack -shared -Wl,-soname,libomp.so -o lib64/libomp.so projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_alloc.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_atomic.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_csupport.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_debug.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_itt.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_environment.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_error.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_global.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_i18n.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_io.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_runtime.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_settings.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_str.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_tasking.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_threadprivate.cpp.o 
projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_utility.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_barrier.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_wait_release.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_affinity.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_dispatch.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_lock.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_sched.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_collapse.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/z_Linux_util.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_gsupport.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/thirdparty/ittnotify/ittnotify_static.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_taskdeps.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_cancel.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_ftn_cdecl.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_ftn_extra.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_version.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/ompt-general.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/ompd-specific.cpp.o projects/openmp/runtime/src/CMakeFiles/omp.dir/z_Linux_asm.S.o  -lm  -ldl && cd /home/abuild/rpmbuild/BUILD/llvm-17.0.6.src/build/lib64 && /usr/bin/cmake -E create_symlink libomp.so libgomp.so && /usr/bin/cmake -E create_symlink libomp.so libiomp5.so

@workingjubilee
Contributor

I see. Does altering the number of ThinLTO jobs or not using ThinLTO change this?

@workingjubilee
Contributor

...hm. Apparently the build winds up passing -flto=thin twice?

/home/abuild/rpmbuild/BUILD/llvm-18.1.6.src/stage1/bin/clang++
-fPIC
-fno-plt
-fPIC
-fno-semantic-interposition
-fvisibility-inlines-hidden
-Wall
-Wextra
-Wno-unused-parameter
-Wwrite-strings
-Wcast-qual
-Wmissing-field-initializers
-pedantic
-Wno-long-long
-Wc++98-compat-extra-semi
-Wimplicit-fallthrough
-Wno-noexcept-type
-Wnon-virtual-dtor
-Wdelete-non-virtual-dtor
-Wsuggest-override
-Wstring-conversion
-Wmisleading-indentation
-Wctad-maybe-unsupported
-fdiagnostics-color
-ffunction-sections
-fdata-sections
-flto=thin
-Wall
-fcolor-diagnostics
-Wcast-qual
-Wformat-pedantic
-Wimplicit-fallthrough
-Wsign-compare
-Wno-enum-constexpr-conversion
-Wno-extra
-Wno-pedantic
-fno-semantic-interposition
-fdata-sections
-std=c++11
-O2
-g
-DNDEBUG
-Wl,--build-id=sha1
-Wl,--as-needed
-Wl,--no-undefined
-Wl,-z,now
-Wl,-z,defs
-Wl,-z,nodelete
-flto=thin
-shared
-Wl,-soname,libompd.so
-o
lib64/libompd.so
projects/openmp/libompd/src/CMakeFiles/ompd.dir/TargetValue.cpp.o
projects/openmp/libompd/src/CMakeFiles/ompd.dir/omp-debug.cpp.o
projects/openmp/libompd/src/CMakeFiles/ompd.dir/omp-state.cpp.o
projects/openmp/libompd/src/CMakeFiles/ompd.dir/omp-icv.cpp.o
lib64/libomp.so
-lm
-ldl

and both -pedantic and -Wno-pedantic?
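
Spotting such repeats by eye in a long command line is tedious, so a small script can do it. This is a rough sketch: the function names are made up, the sample command is an abbreviated selection of flags from the log above, and the "conflict" rule (a `-Wno-X` flag alongside `-WX` or `-X`) is a simple heuristic rather than anything the compiler defines.

```python
import shlex
from collections import Counter


def repeated_flags(command: str) -> list[str]:
    """Return flags (tokens starting with '-') that occur more than once."""
    counts = Counter(tok for tok in shlex.split(command) if tok.startswith("-"))
    return sorted(flag for flag, n in counts.items() if n > 1)


def conflicting_warnings(command: str) -> list[tuple[str, str]]:
    """Return (enabling, disabling) pairs such as ('-pedantic', '-Wno-pedantic')."""
    flags = set(shlex.split(command))
    conflicts = []
    for flag in sorted(flags):
        if flag.startswith("-Wno-"):
            name = flag[len("-Wno-"):]
            for positive in (f"-W{name}", f"-{name}"):
                if positive in flags:
                    conflicts.append((positive, flag))
    return conflicts


# Abbreviated excerpt of the libompd.so link line quoted above.
command = (
    "clang++ -fPIC -fno-plt -fPIC -Wall -Wextra -pedantic -flto=thin "
    "-Wall -Wno-extra -Wno-pedantic -fdata-sections -flto=thin -shared"
)

print(repeated_flags(command))
print(conflicting_warnings(command))
```

On this excerpt the script also flags `-Wextra` together with `-Wno-extra`, which appear in the full command as well. Whether any of this repetition affects reproducibility is not established in this thread.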

@saethlin

saethlin commented Aug 6, 2024

> Which it apparently was for sad + tiny, but that was not obvious from the diff that had llvm random/hash IDs all over the place.

If there is fault, it lies with the authors of those codebases. Not the build tools.

I debugged both of those examples very quickly by diffing the disassembly with diff <(objdump -d sad1) <(objdump -d sad2), looking up the instruction where things started changing, then reading the source code for those functions. I suspect a similar approach would let you either fix all the rest of these cases, or at least it would eliminate one cause.
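
The same approach can be scripted so that differing `.llvm.<digits>` suffixes do not hide the first real change. This is a sketch under assumptions: `disassemble` and `first_divergence` are hypothetical helper names, `objdump` is assumed to be on `PATH`, and the sample instruction lines are invented apart from the symbol names taken from this thread.

```python
import re
import subprocess
from itertools import zip_longest
from typing import Optional

LLVM_SUFFIX = re.compile(r"\.llvm\.\d+")


def disassemble(path: str) -> list[str]:
    """Run `objdump -d` on a binary and return its output as a list of lines."""
    proc = subprocess.run(
        ["objdump", "-d", path], check=True, capture_output=True, text=True
    )
    return proc.stdout.splitlines()


def mask(line: Optional[str]) -> Optional[str]:
    """Hide the numeric ThinLTO-style suffix so it does not count as a difference."""
    return None if line is None else LLVM_SUFFIX.sub(".llvm.N", line)


def first_divergence(a: list[str], b: list[str]):
    """Return (index, line_a, line_b) for the first differing line, or None."""
    for i, (x, y) in enumerate(zip_longest(a, b)):
        if mask(x) != mask(y):
            return i, x, y
    return None


# Hypothetical excerpts; real input would be disassemble("sad1") and disassemble("sad2").
a = ["lea 0x10(%rip),%rdi # <.str.76.llvm.10039950945441166082>", "mov $0x1,%eax"]
b = ["lea 0x10(%rip),%rdi # <.str.76.llvm.10444269058502487673>", "mov $0x2,%eax"]

print(first_divergence(a, b))
```

With real `objdump` output the header line names the input file, so it is worth skipping that line or giving both binaries the same path in separate directories.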

@bmwiedemann
Author

I think building without LTO gives a clearer view of the diff, which makes debugging the actual issue much easier. For example, I was looking at https://rb.zq1.de/compare.factory-20240731/diffs/sad-compare.out, which had 25 unrelated diffs from LTO and Rust's codegen-units=16 in the disassembly.

@bmwiedemann
Author

Without LTO, libomp.so becomes reproducible.

@workingjubilee
Contributor

There seems to be a -use-source-filename-for-promoted-locals flag that makes the promoted names deterministic, but I can't find any documentation for it or where you would pass that flag.

@bmwiedemann
Author

This is the only hint I can find: https://github.com/llvm/llvm-project/blob/main/llvm/test/ThinLTO/X86/promote-local-name.ll#L14

 llvm-lto -use-source-filename-for-promoted-locals -thinlto-action=import %t.bc -thinlto-index=%t2.bc -o - ...

@workingjubilee
Contributor

It seems the names added by ThinLTO are based on a hash of the module's IR, which suggests to me that it should be more deterministic than it actually seems to be? It seems like it would be best to refocus your efforts on that.
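
The reasoning can be made concrete with a toy model. If the suffix really is a pure function of the module contents, identical input must give an identical suffix, so a changed suffix points at changed input somewhere upstream. The sketch below is illustrative only: SHA-256 and the 64-bit truncation are stand-ins chosen here, not LLVM's actual scheme, and `promoted_name` is a hypothetical helper.

```python
import hashlib


def promoted_name(local_name: str, module_contents: bytes) -> str:
    """Append a 64-bit id derived from a hash of the module (toy model, not LLVM's)."""
    digest = hashlib.sha256(module_contents).digest()
    module_id = int.from_bytes(digest[:8], "little")
    return f"{local_name}.llvm.{module_id}"


module_v1 = b"define void @f() { ret void }\n"
module_v2 = b"define void @g() { ret void }\n"  # one upstream difference

# Same input gives the same suffix on every run.
print(promoted_name(".str.76", module_v1) == promoted_name(".str.76", module_v1))
# Any change to the input changes the suffix on every promoted name in the module.
print(promoted_name(".str.76", module_v1) == promoted_name(".str.76", module_v2))
```

Under this model the differing suffixes are a symptom, and the thing to look for is whatever makes the hashed input differ between two builds.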


5 participants