Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gcc: upgrade aarch64-linux default from 9 to 11 #167726

Closed

Conversation

ajs124
Copy link
Member

@ajs124 ajs124 commented Apr 7, 2022

Description of changes

Closes #108305

This isn't my expertise, so idk if this is how and where this should be implemented. Also, this should probably be conditional, the way fedora does. Which is why the if is there, but I don't know the best place to put this conditional…

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.05 Release Notes (or backporting 21.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@tpwrules
Copy link
Contributor

tpwrules commented Apr 10, 2022

So I took this PR out for a spin and unfortunately it seems like the issues which got the GCC 9 -> 10 upgrade reverted have not been solved. For example:

$ nix-build -A pkgs.icu
building '/nix/store/l8spxjiwyv6z3bp65vrp427h15vy18nf-icu4c-70.1.drv'...
[...]
config.status: creating tools/makeconv/makeconv.1
g++ -O2 -W -Wall -pedantic -Wpointer-arith -Wwrite-strings -Wno-long-long -std=c++11     -o ../../bin/makeconv gencnvex.o genmbcs.o makeconv.o ucnvstat.o -L../../lib -licutu -L../../lib -licui18n -L../../lib -licuuc -L../../stubdata -licudata -lpthread -lm  
/nix/store/v8djiq0kavq89k4wf5badp4iapihk808-binutils-2.38/bin/ld: ../../bin/makeconv: hidden symbol `__aarch64_ldadd4_acq_rel' in /nix/store/l6xpkxvvwv8vi1r2qqs1zdak8dm2hhrx-gcc-11.2.0/lib/gcc/aarch64-unknown-linux-gnu/11.2.0/libgcc.a(ldadd_4_4.o) is referenced by DSO
/nix/store/v8djiq0kavq89k4wf5badp4iapihk808-binutils-2.38/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:81: ../../bin/makeconv] Error 1
make[2]: Leaving directory '/build/icu/source/tools/makeconv'
make[1]: *** [Makefile:47: all-recursive] Error 2
make[1]: Leaving directory '/build/icu/source/tools'
make: *** [Makefile:153: all-recursive] Error 2

I also noticed this error trying to build the multiplatform GCC, which may provide a clue as I have seen that libgcc_s.so file referenced in some of the investigations:

$ nix-build -A pkgsCross.aarch64-multiplatform.buildPackages.gcc
building '/nix/store/yh2am5sqhz3qw58md8bh45jqw5if03q7-aarch64-unknown-linux-gnu-stage-static-gcc-11.2.0.drv'...
[...]
/nix/store/hyb7yk84m9h9x5h2dz8fdbzva1mi1w8r-bash-5.1-p16/bin/bash ../../../gcc-11.2.0/libgcc/../mkinstalldirs /nix/store/gax8cz16mw3nl2hw9jssc6kwlvhsqn7x-aarch64-unknown-linux-gnu-stage-static-gcc-11.2.0/lib/gcc/aarch64-unknown-linux-gnu/11.2.0/include
/nix/store/z24k8xi96925v4h0m8s8291q3k207ryj-coreutils-9.0/bin/install -c -m 644 unwind.h /nix/store/gax8cz16mw3nl2hw9jssc6kwlvhsqn7x-aarch64-unknown-linux-gnu-stage-static-gcc-11.2.0/lib/gcc/aarch64-unknown-linux-gnu/11.2.0/include
/nix/store/hyb7yk84m9h9x5h2dz8fdbzva1mi1w8r-bash-5.1-p16/bin/bash ../../../gcc-11.2.0/libgcc/../mkinstalldirs /nix/store/gax8cz16mw3nl2hw9jssc6kwlvhsqn7x-aarch64-unknown-linux-gnu-stage-static-gcc-11.2.0/lib/gcc/aarch64-unknown-linux-gnu/11.2.0/include
/nix/store/z24k8xi96925v4h0m8s8291q3k207ryj-coreutils-9.0/bin/install -c -m 644 ../../../gcc-11.2.0/libgcc/gcov.h /nix/store/gax8cz16mw3nl2hw9jssc6kwlvhsqn7x-aarch64-unknown-linux-gnu-stage-static-gcc-11.2.0/lib/gcc/aarch64-unknown-linux-gnu/11.2.0/include
make[2]: Leaving directory '/build/build/aarch64-unknown-linux-gnu/libgcc'
make[1]: Leaving directory '/build/build'
/nix/store/c80rclsar7sb49rxh9djmqi8dfyrhn5a-builder.sh: line 232: /nix/store/qkllm9wkyp38wh6lvjjy03srahimfx97-aarch64-unknown-linux-gnu-stage-static-gcc-11.2.0-lib/lib/libgcc_s.so: No such file or directory

On the other hand, the semi-random selection of packages I tried all built fine, with the exception of ones that gave that same sort of DSO error and link failure for aarch64 atomic symbols.

I also wonder if this should wait until #148539 is merged, assuming that can be done in a timely manner, so that we can avoid introducing potential problems on aarch64 where gfortran's version != gcc's version.

@ajs124
Copy link
Member Author

ajs124 commented Apr 12, 2022

I could have sworn I tested exactly that, but now it's failing for me, as well. Thanks for giving this a try though. I'll see if I can figure something out.

@vcunat
Copy link
Member

vcunat commented Apr 13, 2022

Note: (default?) gcc changes planned for 22.05 freeze in four days now.

@ajs124 ajs124 mentioned this pull request Apr 22, 2022
13 tasks
@Gaelan
Copy link
Contributor

Gaelan commented Apr 25, 2022

Alright, I figured out the bug:

This correctly patches gcc to include a linker script as libgcc_s.so. However, that's only one copy: glibc also has a libgcc_s.so, which is still a symlink. Horrifyingly, this one is copied straight from the bootstrap tools - see #36947.

@Gaelan
Copy link
Contributor

Gaelan commented Apr 25, 2022

See also #136574 (comment), which is the same issue for Clang.

@tpwrules tpwrules mentioned this pull request May 1, 2022
13 tasks
@Madouura
Copy link
Contributor

Madouura commented May 1, 2022

I feel between this and #170991 we may want to consider https://github.com/nixos/rfcs/blob/master/rfcs/0046-platform-support-tiers.md and default to GCC 11 for stdenv (tier 1), and sparingly use special cases such as this instead for tier 2 and higher platforms.

@tpwrules
Copy link
Contributor

Is this possible to test now? It would be great also to see a writeup about what the issue really was and how it is fixed. The commit history and guts of GCC are not clear to me.

@ajs124
Copy link
Member Author

ajs124 commented Oct 22, 2022

I think it's still broken, not exactly sure.
@Ma27 was looking into it and I asked @Ericson2314 some questions about it, but I sadly still don't really grasp the whole issue myself.

@vcunat
Copy link
Member

vcunat commented Oct 22, 2022

I tried building protobuf, but I didn't get there due to failing tests in dejagnu.

  • as seen in references, protobuf was one of packages where the issue was manifesting
  • I used the shared aarch64.nixos.community machine; binaries should remain there for some time

@tpwrules
Copy link
Contributor

Well I was able to build pkgs.icu as store path /nix/store/7w2jkljibv9z2jr0w8pszq4wmag7mh4p-icu4c-71.1 as above on my personal aarch64 machine, so progress is being made. The version did change though. I'll try later to build with GCC 11 with no bootstrap changes and see if it breaks again.

@Gaelan can you provide some context for the changes?

@tpwrules
Copy link
Contributor

I am able to confirm that these commits specifically fix pkgs.icu. If I just change GCC's version it does not build and gives the same errors I posted earlier. I will give protobuf a try too. The stdenv built fine for me, not sure exactly how dejagnu relates. Is this GCC's own test suite failing? Is that run by default?

@tpwrules
Copy link
Contributor

Unfortunately, it looks like the problem with dejagnu is not that the tests fail, but that the expect tool it uses for testing itself aborts during shutdown. I briefly investigated with GDB and the abort occurs as a result of pthread_exit() calling __libc_fatal().

This suggests there's still something fundamentally wrong with the approach proposed here.

@tpwrules
Copy link
Contributor

tpwrules commented Oct 23, 2022

I did some more investigation and expect dies with the secret message "libgcc_s.so.1 must be installed for pthread_exit to work" (evidently stderr is closed by the time it's printed; it's visible in strace though). If I add NIX_LDFLAGS = [ "-lgcc_s" ]; to expect, then dejagnu builds and the tests work and protobuf builds too.

So it looks like the crime isn't really that we copy libgcc_s.so into glibc, as it's necessary for when glibc needs to dynamically load it, but that it ends up with the bootstrap version. Can we extend the bootstrap to build a glibc with the right one? I'd guess that not including it and fixing all the packages that secretly need -lgcc_s in some cases is a path to great sadness.

@Ma27
Copy link
Member

Ma27 commented Oct 23, 2022

I'm happy to look into it ~next week, but no guarantees though.

@wegank wegank mentioned this pull request Dec 21, 2022
13 tasks
@ajs124 ajs124 closed this Jan 12, 2023
@ajs124 ajs124 deleted the fix/aarch64-linux-gcc10 branch January 12, 2023 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants