Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Object files can only be linked with legacy clang linker on Apple #8730

Open
sezna opened this issue Jun 1, 2024 · 20 comments
Open

Object files can only be linked with legacy clang linker on Apple #8730

sezna opened this issue Jun 1, 2024 · 20 comments

Comments

@sezna
Copy link

sezna commented Jun 1, 2024

Hello, thank you very much for all of the incredible software y'all are writing. I'm currently using Cranelift to generate native binaries for my project, and I'm continually impressed at the level of detail and effort that goes into these things. Not to mention the super helpful Zulip chat.

In the aforementioned chat, I was receiving help from @bjorn3 and @cfallin on generating executables for an Apple silicon Mac. The following code does not successfully execute on an M-series processor Mac: https://gist.github.com/sezna/e358a9167d4ef3063f6cfe4ce4c2ad37

Using the legacy linker, that is, adding -Wl -ld_classic to the clang invocation, does work. However, as the legacy linker will likely be going away someday, I'm filing this issue in the hopes that we can figure out why it doesn't work with the current linker. Note that cg_clif also uses -Wl -ld_classic, so when the legacy linker is dropped, cg_clif could also have issues.

@philipc
Copy link
Contributor

philipc commented Jun 2, 2024

I've reproduced the issue with arm64. If I change to use x86_64 it works with either linker.

The linker error is:

0  0x100fa2f2c  __assert_rtn + 72
1  0x100ee9ec4  ld::InputFiles::SliceParser::parseObjectFile(mach_o::Header const*) const + 22976
2  0x100ef6404  ld::InputFiles::parseAllFiles(void (ld::AtomFile const*) block_pointer)::$_7::operator()(unsigned long, ld::FileInfo const&) const + 420
3  0x1975dc440  _dispatch_client_callout2 + 20
4  0x1975f1544  _dispatch_apply_invoke_and_wait + 224
5  0x1975f084c  _dispatch_apply_with_attr_f + 1180
6  0x1975f0a38  dispatch_apply + 96
7  0x100f741e0  ld::AtomFileConsolidator::parseFiles(bool) + 292
8  0x100f12b08  main + 9252
ld: Assertion failed: (pattern[0].addrMode == addr_other), function addFixupFromRelocations, file Relocations.cpp, line 700.
clang: error: linker command failed with exit code 1 (use -v to see invocation)

clang generates relocations like this:

      18: 90000000     	adrp	x0, 0x0 <ltmp0+0x18>
		0000000000000018:  ARM64_RELOC_PAGE21	l_.str
      1c: 91000000     	add	x0, x0, #0
		000000000000001c:  ARM64_RELOC_PAGEOFF12	l_.str
      20: 94000000     	bl	0x20 <ltmp0+0x20>
		0000000000000020:  ARM64_RELOC_BRANCH26	_puts

cranelift generates:

      1c: 90000000     	adrp	x0, 0x0 <_main+0xc>
		000000000000001c:  ARM64_RELOC_GOT_LOAD_PAGE21	_hello_world
      20: f9400000     	ldr	x0, [x0]
		0000000000000020:  ARM64_RELOC_GOT_LOAD_PAGEOFF12	_hello_world
      24: 90000002     	adrp	x2, 0x0 <_main+0x14>
		0000000000000024:  ARM64_RELOC_GOT_LOAD_PAGE21	_puts
      28: f9400042     	ldr	x2, [x2]
		0000000000000028:  ARM64_RELOC_GOT_LOAD_PAGEOFF12	_puts
      2c: d63f0040     	blr	x2

If I hack cranelift to generate this:

      1c: 90000000     	adrp	x0, 0x0 <_main+0xc>
		000000000000001c:  ARM64_RELOC_PAGE21	_hello_world
      20: 91000000     	add	x0, x0, #0
		0000000000000020:  ARM64_RELOC_PAGEOFF12	_hello_world
      24: 90000002     	adrp	x2, 0x0 <_main+0x14>
		0000000000000024:  ARM64_RELOC_PAGE21	_puts
      28: 91000042     	add	x2, x2, #0
		0000000000000028:  ARM64_RELOC_PAGEOFF12	_puts
      2c: d63f0040     	blr	x2

then the linker error changes to:

ld: invalid use of ADRP in '_main' to '_puts'

If I hack cranelift to generate this:

      1c: 90000000     	adrp	x0, 0x0 <_main+0xc>
		000000000000001c:  ARM64_RELOC_PAGE21	_hello_world
      20: 91000000     	add	x0, x0, #0
		0000000000000020:  ARM64_RELOC_PAGEOFF12	_hello_world
      24: 94000000     	bl	0x24 <_main+0x14>
		0000000000000024:  ARM64_RELOC_BRANCH26	_puts

then it links and runs successfully.

Complete hack:

diff --git a/cranelift/codegen/src/isa/aarch64/abi.rs b/cranelift/codegen/src/isa/aarch64/abi.rs
index 922e4ebfc..ea0bc22f3 100644
--- a/cranelift/codegen/src/isa/aarch64/abi.rs
+++ b/cranelift/codegen/src/isa/aarch64/abi.rs
@@ -1027,7 +1027,8 @@ impl ABIMachineSpec for AArch64MachineDeps {
     ) -> SmallVec<[Inst; 2]> {
         let mut insts = SmallVec::new();
         match &dest {
-            &CallDest::ExtName(ref name, RelocDistance::Near) => insts.push(Inst::Call {
+            //&CallDest::ExtName(ref name, RelocDistance::Near) => insts.push(Inst::Call {
+            &CallDest::ExtName(ref name, _) => insts.push(Inst::Call {
                 info: Box::new(CallInfo {
                     dest: name.clone(),
                     uses,
@@ -1039,6 +1040,7 @@ impl ABIMachineSpec for AArch64MachineDeps {
                     callee_pop_size,
                 }),
             }),
+/*
             &CallDest::ExtName(ref name, RelocDistance::Far) => {
                 insts.push(Inst::LoadExtName {
                     rd: tmp,
@@ -1058,6 +1060,7 @@ impl ABIMachineSpec for AArch64MachineDeps {
                     }),
                 });
             }
+*/
             &CallDest::Reg(reg) => insts.push(Inst::CallInd {
                 info: Box::new(CallIndInfo {
                     rn: *reg,
diff --git a/cranelift/codegen/src/isa/aarch64/inst/emit.rs b/cranelift/codegen/src/isa/aarch64/inst/emit.rs
index f9df1a8d2..0f89903a4 100644
--- a/cranelift/codegen/src/isa/aarch64/inst/emit.rs
+++ b/cranelift/codegen/src/isa/aarch64/inst/emit.rs
@@ -3177,6 +3177,7 @@ impl MachInstEmit for Inst {
                     let inst = Inst::Adrp { rd, off: 0 };
                     inst.emit(sink, emit_info, state);
 
+/*
                     // ldr rd, [rd, :got_lo12:X]
                     sink.add_reloc(Reloc::Aarch64Ld64GotLo12Nc, &**name, 0);
                     let inst = Inst::ULoad64 {
@@ -3185,6 +3186,17 @@ impl MachInstEmit for Inst {
                         flags: MemFlags::trusted(),
                     };
                     inst.emit(sink, emit_info, state);
+*/
+                    // add rd, rd, :lo12:X
+                    sink.add_reloc(Reloc::Aarch64Ld64GotLo12Nc, &**name, 0);
+                    Inst::AluRRImm12 {
+                        alu_op: ALUOp::Add,
+                        size: OperandSize::Size64,
+                        rd,
+                        rn: rd.to_reg(),
+                        imm12: Imm12::maybe_from_u64(0).unwrap(),
+                    }
+                    .emit(sink, emit_info, state);
                 } else {
                     // With absolute offsets we set up a load from a preallocated space, and then jump
                     // over it.
diff --git a/cranelift/object/src/backend.rs b/cranelift/object/src/backend.rs
index 2d48e9b61..921c5273e 100644
--- a/cranelift/object/src/backend.rs
+++ b/cranelift/object/src/backend.rs
@@ -764,7 +764,8 @@ impl ObjectModule {
                     r_type: object::elf::R_AARCH64_ADR_GOT_PAGE,
                 },
                 object::BinaryFormat::MachO => RelocationFlags::MachO {
-                    r_type: object::macho::ARM64_RELOC_GOT_LOAD_PAGE21,
+                    //r_type: object::macho::ARM64_RELOC_GOT_LOAD_PAGE21,
+                    r_type: object::macho::ARM64_RELOC_PAGE21,
                     r_pcrel: true,
                     r_length: 2,
                 },
@@ -775,7 +776,8 @@ impl ObjectModule {
                     r_type: object::elf::R_AARCH64_LD64_GOT_LO12_NC,
                 },
                 object::BinaryFormat::MachO => RelocationFlags::MachO {
-                    r_type: object::macho::ARM64_RELOC_GOT_LOAD_PAGEOFF12,
+                    //r_type: object::macho::ARM64_RELOC_GOT_LOAD_PAGEOFF12,
+                    r_type: object::macho::ARM64_RELOC_PAGEOFF12,
                     r_pcrel: false,
                     r_length: 2,
                 },

So it appears that cranelift isn't generating the correct relocations for ARM64 on macOS. I'll let those who know more about this stuff fix it properly.

@bjorn3
Copy link
Contributor

bjorn3 commented Jun 27, 2024

A simple

#[inline(never)]
pub unsafe fn foo() -> (unsafe extern "C" fn(), &'static u8) {
    bar(); (bar, &FOO)
}

extern "C" {
    fn bar();
    static FOO: u8;
}

produces

rust_out.o:     file format mach-o arm64

Disassembly of section __TEXT,__text:

0000000000000000 <ltmp0>:
       0: a9bf7bfd      stp     x29, x30, [sp, #-16]!
       4: 910003fd      mov     x29, sp
       8: 94000000      bl      0x8 <ltmp0+0x8>
                0000000000000008:  ARM64_RELOC_BRANCH26 _bar
       c: 90000000      adrp    x0, 0x0 <ltmp0+0xc>
                000000000000000c:  ARM64_RELOC_GOT_LOAD_PAGE21  _bar
      10: f9400000      ldr     x0, [x0]
                0000000000000010:  ARM64_RELOC_GOT_LOAD_PAGEOFF12       _bar
      14: 90000001      adrp    x1, 0x0 <ltmp0+0x14>
                0000000000000014:  ARM64_RELOC_GOT_LOAD_PAGE21  _FOO
      18: f9400021      ldr     x1, [x1]
                0000000000000018:  ARM64_RELOC_GOT_LOAD_PAGEOFF12       _FOO
      1c: a8c17bfd      ldp     x29, x30, [sp], #16
      20: d65f03c0      ret

with the LLVM backend. If I try to link it I get a linker error as expected:

Undefined symbols for architecture arm64:
  "_FOO", referenced from:
      rust_out::foo::he5a9b1ed9cb33bc0 in rust_out.o
  "_bar", referenced from:
      rust_out::foo::he5a9b1ed9cb33bc0 in rust_out.o
      rust_out::foo::he5a9b1ed9cb33bc0 in rust_out.o
  "_main", referenced from:
      <initial-undefines>
ld: symbol(s) not found for architecture arm64

If I apply the following patch to Cranelift:

diff --git a/cranelift/codegen/src/isa/aarch64/abi.rs b/cranelift/codegen/src/isa/aarch64/abi.rs
index 922e4ebfc..6d43c1db6 100644
--- a/cranelift/codegen/src/isa/aarch64/abi.rs
+++ b/cranelift/codegen/src/isa/aarch64/abi.rs
@@ -1027,7 +1027,7 @@ impl ABIMachineSpec for AArch64MachineDeps {
     ) -> SmallVec<[Inst; 2]> {
         let mut insts = SmallVec::new();
         match &dest {
-            &CallDest::ExtName(ref name, RelocDistance::Near) => insts.push(Inst::Call {
+            &CallDest::ExtName(ref name, _/*RelocDistance::Near*/) => insts.push(Inst::Call {
                 info: Box::new(CallInfo {
                     dest: name.clone(),
                     uses,

Relocations that print identical by objdump -dr are generated by Cranelift:

rust_out.o:     file format mach-o arm64

Disassembly of section __TEXT,__text:

0000000000000000 <__ZN8rust_out3foo17he5a9b1ed9cb33bc0E>:
       0: a9bf7bfd      stp     x29, x30, [sp, #-16]!
       4: 910003fd      mov     x29, sp
       8: d10043ff      sub     sp, sp, #16
       c: 94000000      bl      0xc <__ZN8rust_out3foo17he5a9b1ed9cb33bc0E+0xc>
                000000000000000c:  ARM64_RELOC_BRANCH26 _bar
      10: 90000000      adrp    x0, 0x0 <__ZN8rust_out3foo17he5a9b1ed9cb33bc0E+0x10>
                0000000000000010:  ARM64_RELOC_GOT_LOAD_PAGE21  _bar
      14: f9400000      ldr     x0, [x0]
                0000000000000014:  ARM64_RELOC_GOT_LOAD_PAGEOFF12       _bar
      18: 90000007      adrp    x7, 0x0 <__ZN8rust_out3foo17he5a9b1ed9cb33bc0E+0x18>
                0000000000000018:  ARM64_RELOC_GOT_LOAD_PAGE21  _FOO
      1c: f94000e7      ldr     x7, [x7]
                000000000000001c:  ARM64_RELOC_GOT_LOAD_PAGEOFF12       _FOO
      20: 910003e8      mov     x8, sp
      24: f9000107      str     x7, [x8]
      28: 910003e8      mov     x8, sp
      2c: f9400101      ldr     x1, [x8]
      30: 910043ff      add     sp, sp, #16
      34: a8c17bfd      ldp     x29, x30, [sp], #16
      38: d65f03c0      ret

Yet the object file produced by LLVM gives a regular linker error due to undefined symbols, while the one produced by Cranelift crashes the linker:

0  0x100e02074  __assert_rtn + 72
1  0x100d3edb8  ld::InputFiles::SliceParser::parseObjectFile(mach_o::Header const*) const + 22712
2  0x100d4b830  ld::InputFiles::parseAllFiles(void (ld::AtomFile const*) block_pointer)::$_8::operator()(unsigned long, ld::FileInfo const&) const + 440
3  0x189fa6428  _dispatch_client_callout2 + 20
4  0x189fba850  _dispatch_apply_invoke3 + 336
5  0x189fa63e8  _dispatch_client_callout + 20
6  0x189fa7c68  _dispatch_once_callout + 32
7  0x189fbaeec  _dispatch_apply_invoke_and_wait + 372
8  0x189fb9e9c  _dispatch_apply_with_attr_f + 1212
9  0x189fba08c  dispatch_apply + 96
10  0x100dd0564  ld::AtomFileConsolidator::parseFiles(bool) + 292
11  0x100d6bee8  main + 9532
ld: Assertion failed: (pattern[0].addrMode == addr_other), function addFixupFromRelocations, file Relocations.cpp, line 701.

This means there has to be a difference somewhere. If I run bingrep on both binaries, the relocations are shown in opposite order for both executable. For LLVM:

Relocations(5):
  Segment   Section   Count  
  __TEXT    __text    5      
         Type                             Offset   Length   PIC      Extern   SymbolNum   Symbol  
         ARM64_RELOC_GOT_LOAD_PAGEOFF12   0x18     2        false    true     0x2         _FOO    
         ARM64_RELOC_GOT_LOAD_PAGE21      0x14     2        true     true     0x2         _FOO    
         ARM64_RELOC_GOT_LOAD_PAGEOFF12   0x10     2        false    true     0x3         _bar    
         ARM64_RELOC_GOT_LOAD_PAGE21      0xc      2        true     true     0x3         _bar    
         ARM64_RELOC_BRANCH26             0x8      2        true     true     0x3         _bar    

and for Cranelift:

Relocations(5):
  Segment   Section   Count  
  __TEXT    __text    5      
         Type                             Offset   Length   PIC      Extern   SymbolNum   Symbol  
         ARM64_RELOC_BRANCH26             0xc      2        true     true     0x2         _bar    
         ARM64_RELOC_GOT_LOAD_PAGE21      0x10     2        true     true     0x2         _bar    
         ARM64_RELOC_GOT_LOAD_PAGEOFF12   0x14     2        false    true     0x2         _bar    
         ARM64_RELOC_GOT_LOAD_PAGE21      0x18     2        true     true     0x1         _FOO    
         ARM64_RELOC_GOT_LOAD_PAGEOFF12   0x1c     2        false    true     0x1         _FOO    

As hack I tried to reverse the order of relocations emitted by Cranelift too, which worked.

diff --git a/cranelift/codegen/src/isa/aarch64/abi.rs b/cranelift/codegen/src/isa/aarch64/abi.rs
index 922e4ebfc..6d43c1db6 100644
--- a/cranelift/codegen/src/isa/aarch64/abi.rs
+++ b/cranelift/codegen/src/isa/aarch64/abi.rs
@@ -1027,7 +1027,7 @@ impl ABIMachineSpec for AArch64MachineDeps {
     ) -> SmallVec<[Inst; 2]> {
         let mut insts = SmallVec::new();
         match &dest {
-            &CallDest::ExtName(ref name, RelocDistance::Near) => insts.push(Inst::Call {
+            &CallDest::ExtName(ref name, _/*RelocDistance::Near*/) => insts.push(Inst::Call {
                 info: Box::new(CallInfo {
                     dest: name.clone(),
                     uses,
diff --git a/cranelift/object/src/backend.rs b/cranelift/object/src/backend.rs
index 2d48e9b61..c5a84f14c 100644
--- a/cranelift/object/src/backend.rs
+++ b/cranelift/object/src/backend.rs
@@ -501,7 +501,7 @@ impl ObjectModule {
                 ref name,
                 flags,
                 addend,
-            } in &symbol.relocs
+            } in symbol.relocs.iter().rev()
             {
                 let target_symbol = self.get_symbol(name);
                 self.object

@bjorn3
Copy link
Contributor

bjorn3 commented Jun 27, 2024

Looks like the change to cranelift/codegen/src/isa/aarch64/abi.rs isn't even necessary at all. Only the relocation order reversing is required.

@philipc
Copy link
Contributor

philipc commented Jun 27, 2024

That sounds better. I was comparing clang output instead of looking at what rustc with llvm backend did, so you can probably ignore my results.

@bjorn3
Copy link
Contributor

bjorn3 commented Jun 27, 2024

What would be the best place to add this relocation reversing code? object or cranelift-object? On the one hand it is something that would affect other projects too, so object makes more sense, on the other hand reversing the relocation order right before writing would require anyone who wants to round trip an object file through the object crate to reverse relocation order again before the add_relocation calls to produce a working object file.

@philipc
Copy link
Contributor

philipc commented Jun 27, 2024

LLVM does the reverse in its object writer. I'm not certain which place is best, for the same reasons you give, but I'm inclined to fix it in object, since its primary use is for generating objects, not for round trips. It already does at least one other change that round trips need to account for (symbol name decorations).

@bjorn3
Copy link
Contributor

bjorn3 commented Jun 27, 2024

Opened gimli-rs/object#702

@bjorn3
Copy link
Contributor

bjorn3 commented Jun 29, 2024

This has been fixed in object 0.36.1.

@sezna
Copy link
Author

sezna commented Jun 29, 2024

Thank you all who looked into this!

@bjorn3
Copy link
Contributor

bjorn3 commented Jun 29, 2024

This issue can be closed now, right?

@camelid
Copy link
Contributor

camelid commented Jun 29, 2024

I'm now getting ld: warning: no platform load command found in '/path/to/cranelift/output.o', assuming: macOS. This happens only with the default modern linker, but not the legacy linker. The executable still works, but it's unfortunate to get this warning. It seems like Julia encountered a similar issue and fixed it by including the macOS version number in the OS segment of the triple: JuliaLang/julia#51830. I'm using target_lexicon::Triple::host, so it could be an issue with that crate rather than cranelift?

@bjorn3
Copy link
Contributor

bjorn3 commented Jun 29, 2024

We will need to call set_macho_build_version in cranelift-object to emit the LC_BUILD_VERSION load command.

@camelid
Copy link
Contributor

camelid commented Jun 30, 2024

It seems like rustc has implemented a fix for the same issue: rust-lang/rust#111384.

@camelid
Copy link
Contributor

camelid commented Jun 30, 2024

It's unclear to me if cranelift-object should be exposing some function to set the macho build version or if it should handle it automatically. Do we have enough information in cranelift-object to target the exact version?

@madsmtm
Copy link

madsmtm commented Sep 28, 2024

I've looked into this recently, see rust-lang/rust#129432 for the broader Rust issue of deployment targets and SDKs.

I'm pretty sure the correct behaviour for Cranelift would be to set the values on MachOBuildVersion as follows:

  • platform, depending on the target triple's OperatingSystem and Environment.
  • minos (deployment target), extracted from target_lexicon::OperatingSystem::MacOSX.
    • Will need some work in target_lexicon to better support iOS/tvOS/watchOS/visionOS here too.
    • I think you might need to error if the version isn't present in the given triple?
      • The system tooling (linker, assembler, otool etc.) requires a version here.
      • At the very least don't emit the LC_BUILD_VERSION command in this case, and warn very strongly.
  • sdk (SDK version) to 0. This is also what LLVM does by default if no SDK version is given, and the tooling shows n/a in that case.

Note that this will not support older tooling that uses LC_VERSION_MIN_IPHONEOS, LC_VERSION_MIN_MACOSX etc. (we're talking somewhere around Xcode 9 or 10), but that's probably fine, we still support older OS versions the linker is going to patch it up such that you can still run the binaries on older OS versions.

@madsmtm
Copy link

madsmtm commented Sep 28, 2024

It's unclear to me if cranelift-object should be exposing some function to set the macho build version or if it should handle it automatically. Do we have enough information in cranelift-object to target the exact version?

So to answer this directly, Cranelift should have enough information to set this if it requires the deployment target / minimum OS version to be set in the target triple (e.g. that users pass aarch64-apple-macosx12.0.0, and not just aarch64-apple-macosx).

@bjorn3
Copy link
Contributor

bjorn3 commented Sep 28, 2024

Cranelift uses target-lexicon for target triples, which aims to math rustc. Rustc target triples do not allow passing in the OS version. Instead rustc has a default for each target and additionally allows setting the same env vars that clang reads. There is no way currently to pass this information to Cranelift.

@madsmtm
Copy link

madsmtm commented Sep 28, 2024

Cranelift uses target-lexicon for target triples, which aims to math rustc. Rustc target triples do not allow passing in the OS version. Instead rustc has a default for each target and additionally allows setting the same env vars that clang reads. There is no way currently to pass this information to Cranelift.

Hmm, isn't rustc passing the LLVM target to Cranelift? https://github.com/rust-lang/rust/blob/612796c42077605fdd3c6f7dda05745d8f4dc4d8/compiler/rustc_codegen_cranelift/src/lib.rs#L262

@bjorn3
Copy link
Contributor

bjorn3 commented Sep 28, 2024

Yes, but as it turns out target-lexicon is just lenient with parsing to parse most LLVM triples fine too. I'm pretty sure it would parse aarch64-apple-macosx12.0.0 as having the OS field set to macosx12.0.1, which is not an OS known to rustc or target-lexicon and thus would fail parsing.

@madsmtm
Copy link

madsmtm commented Sep 29, 2024

Yes, but as it turns out target-lexicon is just lenient with parsing to parse most LLVM triples fine too. I'm pretty sure it would parse aarch64-apple-macosx12.0.0 as having the OS field set to macosx12.0.1, which is not an OS known to rustc or target-lexicon and thus would fail parsing.

Nope, they do parse the version number, see https://github.com/bytecodealliance/target-lexicon/blob/7c80d459a9fdd121e9f23feb680c3db13c1baa39/src/targets.rs#L1393-L1421.

In any case, I have opened bytecodealliance/target-lexicon#111 for figuring out how to resolve the rustc/LLVM discrepancy in target-lexicon.


But if it is as you say, that Cranelift wants to accept rustc target triples, then it needs some way to specify the deployment target on Apple platforms too.

A simple way to do this would be to add ObjectBuilder::apple_deployment_target(&mut self, major: u16, minor: u8, patch: u8) -> &mut Self, but I'd argue that it might be worthwhile to add such a setter to the Triple itself directly? At least, if Cranelift wants to at some point do an optimization based on the deployment target (which LLVM does all the time, though mostly for Objective-C code), then it'd be nice to have available earlier in the compilation pipeline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants