Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Objc transition can swap architecture, leading to link failure #19204

Closed
cpsauer opened this issue Aug 9, 2023 · 9 comments
Closed

Objc transition can swap architecture, leading to link failure #19204

cpsauer opened this issue Aug 9, 2023 · 9 comments
Labels
team-Rules-ObjC Issues for Objective-C maintainers type: bug untriaged

Comments

@cpsauer
Copy link
Contributor

cpsauer commented Aug 9, 2023

Description of the bug:

Hi wonderful Bazelers,

I stumbled across an edge case where Bazel gets confused about the target architecture on macOS, causing linking to fail.

More explicitly (and for searchability), one ends up with a link line that mixes arm64 and x86_64 archives, leading to errors like the following
ld: warning: ignoring file bazel-out/applebin_macos-darwin_**x86_64**-fastbuild-ST-79db1517e854/bin/lib.a, building for macOS-arm64 but attempting to link with file built for macOS-x86_64
Which turns into an undefined symbols error, breaking the build:
Undefined symbols for architecture arm64: for whatever was in the mistakenly x86_64 archive.

Which category does this issue belong to?

Objc (but I didn't see that in the dropdown--maybe C++ if the ObjC<>C++ intersection is considered a subset.)

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

The easiest way to reproduce the problem is to point a cc_binary -> cc_library -> objc_library, having set up compilation for Apple platforms per rules_apple, and passing in the right ordering of architectures to --macos_cpus

Here's a minimal reproduction for Apple Silicon Macs:
linkingArchIssue.zip

Just grab it and bazel build :main to reproduce the linking issue.

If you're on Intel, I suspect you can reproduce by passing --macos_cpus=arm64,x86_64 instead of --macos_cpus=x86_64,arm64.

Note that this isn't a particularly contrived case, just a minimized one. It's easy to run into it this way creating test binaries.

Which operating system are you running Bazel on?

macOS 13.4.1

What is the output of bazel info release?

release 7.0.0-pre.20230724.1 (latest rolling)

Regression, Cross References

Since it's dependent on passing --macos_cpus in the right order and has to do with the the cc -> objc transition, I'm guessing this issue has to do with selecting the first macOS CPU in //src/main/starlark/builtins_bzl/common/objc/transitions.bzl

See https://github.com/bazelbuild/bazel/pull/14816/files, where that line return macos_cpus[0] was introduced, fixing the related #14803 -- and I suspect narrowing it down to this one.

@keith, sorry to tag you, but since you've been working in there, does that seem like the right source of the issue? I think you're better positioned than me: Do you know how to correctly get the CPU corresponding to the top level platform from there?
(And do you think the other, similar cases *_cpus[0] cases for other platforms might case similar issues in tests?)

Thanks so much, all.
Chris

@keith
Copy link
Member

keith commented Aug 10, 2023

I'm not sure what the root cause is here but probably the easiest solution is to use https://github.com/bazelbuild/apple_support/blob/master/doc/rules.md#universal_binary

@cpsauer
Copy link
Contributor Author

cpsauer commented Aug 12, 2023

Definitely a good workaround--but we probably want to fix the underlying problem if we can, right?

I tested quickly, and this problem can indeed be fixed by flipping the order of the fallback conditions in the lines referred to in your PR, since that keeps us consistent with the machine we're building for. That is, by making the Mac case in objc/transitions.bzl the following:

    if platform_type == MACOS:
        cpu_value = settings["//command_line_option:cpu"]
        if cpu_value.startswith(DARWIN_CPU_PREFIX):
            return cpu_value[len(DARWIN_CPU_PREFIX):]
        macos_cpus = settings["//command_line_option:macos_cpus"]
        if macos_cpus:
            return macos_cpus[0]
        return DEFAULT_MACOS_CPU

I don't have as much context as you all, but it does seem like we might be able to greatly simplify the process of choosing an architecture when one hasn't already been specified by an apple_split_cpu transition: Couldn't we just always keep the CPU specified, i.e., //command_line_option:cpu? All normal rules_apple top-level targets will apply the transition, right, so we're only talking about building for mac via cc_binary or directly compiling a library, in which case it's an error to specify the cpu you didn't intend (especially since //command_line_option:apple_platform_type seems to be undocumented).

What do you think, @keith? Thanks in advance for your patience if I've missed something that's obvious to you here.

@cpsauer
Copy link
Contributor Author

cpsauer commented Aug 12, 2023

^ To make those concrete, I tossed up a couple of PRs to show the two options:
Smallest change, just flipping the order of the conditions, as in the code above: #19235
Slightly larger--and much redder--change, doing the simplification proposed above: #19236

@keith
Copy link
Member

keith commented Aug 12, 2023

the goal is actually to toss that transition entirely #16870, but that requires some internal changes at google AFAIUI

@cpsauer
Copy link
Contributor Author

cpsauer commented Aug 12, 2023

Makes sense. Think we could merge the first in the meantime, or maybe (ideally) the second, moving things more simply towards no transition?

@keith
Copy link
Member

keith commented Aug 14, 2023

maybe @googlewalt can chime in on the feasibility of removing the transition before 7.x

@keith
Copy link
Member

keith commented Aug 15, 2023

#19256

@googlewalt
Copy link
Contributor

Thanks for implementing the incompatible flag. That is a good first step and I'll review it. As for the internal cleanup required to flip that flag, I don't have time for that this year, but there is some progress made -- @comius is this something on your team's radar?

@cpsauer
Copy link
Contributor Author

cpsauer commented Aug 31, 2023

Thanks so much for doing, Keith!
(I'll close down the other PRs.)

For anyone else running into this, pass build --incompatible_disable_objc_library_transition until #16870 is resolved.

keith added a commit to keith/bazel that referenced this issue Sep 1, 2023
This is a migration for bazelbuild#16870

Users who rely on the current behavior should instead wrap their library in a target from [rules_apple](https://github.com/bazelbuild/rules_apple).

Fixes bazelbuild#19204

Closes bazelbuild#19256.

PiperOrigin-RevId: 561253613
Change-Id: I1b1b24a08caa33e86881bfe376a525060c9887a9
(cherry picked from commit 0f34e76)
keith added a commit to keith/bazel that referenced this issue Sep 6, 2023
This is a migration for bazelbuild#16870

Users who rely on the current behavior should instead wrap their library in a target from [rules_apple](https://github.com/bazelbuild/rules_apple).

Fixes bazelbuild#19204

Closes bazelbuild#19256.

PiperOrigin-RevId: 561253613
Change-Id: I1b1b24a08caa33e86881bfe376a525060c9887a9
(cherry picked from commit 0f34e76)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Rules-ObjC Issues for Objective-C maintainers type: bug untriaged
Projects
None yet
6 participants
@keith @cpsauer @googlewalt @Pavank1992 @iancha1992 and others