Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] new toolchain file needs workaround for CMake AR identification bug #1698

Closed
DoDoENT opened this issue Apr 13, 2022 · 21 comments
Closed
Labels

Comments

@DoDoENT
Copy link

DoDoENT commented Apr 13, 2022

Description

After updating to Xcode 13.5, my android builds started failing with the following errors:

error: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: Unsupported triple for mach-o cpu type: aarch64-none-linux-android21

Mysteriously, the same build worked before the Xcode upgrade.

The reason for this is that the new toolchain file does not set the CMAKE_AR variable. By inspecting it, I see that it's set to /usr/bin/ar. Apparently, before the Xcode upgrade, the macOS's builtin ar "somehow worked" with android binaries.

If I use the legacy toolchain file by setting -DANDROID_USE_LEGACY_TOOLCHAIN_FILE=ON, everything works again and upon inspection of the CMAKE_AR variable, I see that it's set to the correct path within the NDK installation dir.

Affected versions

r23

Canary version

No response

Host OS

Mac

Host OS version

12.3.1

Affected ABIs

armeabi-v7a, arm64-v8a, x86, x86_64

Build system

CMake

Other build system

No response

minSdkVersion

16

Device API level

No response

@DoDoENT DoDoENT added the bug label Apr 13, 2022
@DanAlbert
Copy link
Member

FWIW we've been encountering other issues with the new toolchain file (#1693) that can't be fixed on our end so the next releases will be reverting back to the legacy toolchain by default.

This one appears fixable from our end though, so we'll try to get that in as well.

@DanAlbert
Copy link
Member

I started by writing the regression test, and to my surprise it's already passing. That's in our master branch, however, so I still need to check that the same is true of r23.

It's also possible that this was a CMake bug that is no longer present in 3.22.1 (or perhaps something older, that's just the one I'm testing with). That would definitely match what I found while looking for the place to fix this, because the new toolchain file doesn't set any of those variables explicitly (the goal of the new toolchain file is to delegate as much as possible directly to CMake to reduce behavior differences between Android and other targets).

@DanAlbert
Copy link
Member

The same test is working fine with r23b. If you can try again with CMake 3.22.1 (in the canary channel of the SDK manager, or get it from CMake's download page) and still see the issue, lmk.

osspop pushed a commit to osspop/android-ndk that referenced this issue Jan 17, 2023
This was reported broken but that does not appear to be the case. Adding
the regression test anyway as proof. It might be the case that this is
broken with older CMake versions or perhaps in older NDK releases.

Bug: android/ndk#1698
Test: This is the test
Change-Id: Ia2c69b1263187639d6cf69dcbdb3345f6ee3be45
@DoDoENT
Copy link
Author

DoDoENT commented May 26, 2023

I've now finally given a second try for new toolchain, and, unfortunately, this still happens with r25c.

I'm using CMake 3.26.3 (downloaded from the official page, not the one bundled with the Android Sudio).

@DanAlbert
Copy link
Member

I'm not sure what to tell you. We have a test running for both the new and old CMake toolchain files and it's working fine. I altered that test to print the values as well to rule out CMake using xcode's instead, and that's showing me that the NDK is used.

cmake_minimum_required(VERSION 3.22)
project(CMakeExportsTest C CXX)

if(NOT DEFINED CMAKE_C_COMPILER)
    message(FATAL_ERROR "CMAKE_C_COMPILER not set")
else()
    message(WARNING "CMAKE_C_COMPILER is ${CMAKE_C_COMPILER}")
endif()

if(NOT DEFINED CMAKE_CXX_COMPILER)
    message(FATAL_ERROR "CMAKE_CXX_COMPILER not set")
else()
    message(WARNING "CMAKE_CXX_COMPILER is ${CMAKE_CXX_COMPILER}")
endif()

if(NOT DEFINED CMAKE_AR)
    message(FATAL_ERROR "CMAKE_AR not set")
else()
    message(WARNING "CMAKE_AR is ${CMAKE_AR}")
endif()

if(NOT DEFINED CMAKE_STRIP)
    message(FATAL_ERROR "CMAKE_STRIP not set")
else()
    message(WARNING "CMAKE_STRIP is ${CMAKE_STRIP}")
endif()

if(NOT DEFINED CMAKE_RANLIB)
    message(FATAL_ERROR "CMAKE_RANLIB not set")
else()
    message(WARNING "CMAKE_RANLIB is ${CMAKE_RANLIB}")
endif()
$ cmake --version
cmake version 3.26.4

CMake suite maintained and supported by Kitware (kitware.com/cmake).
$ $(xcode-select -p)/usr/bin/xcodebuild -version                         
Xcode 14.1
Build version 14B47b
$ cmake -DCMAKE_TOOLCHAIN_FILE=~/Library/Android/sdk/ndk/25.2.9519653/build/cmake/android.toolchain.cmake path/to/test -DANDROID_USE_LEGACY_TOOLCHAIN_FILE=OFF
# Passes, shows things like
#   CMAKE_RANLIB is
#   /Users/danalbert/Library/Android/sdk/ndk/25.2.9519653/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-ranlib
$ cmake -DCMAKE_TOOLCHAIN_FILE=~/Library/Android/sdk/ndk/25.2.9519653/build/cmake/android.toolchain.cmake path/to/test -DANDROID_USE_LEGACY_TOOLCHAIN_FILE=ON
# Also passes, shows things like
#   CMAKE_RANLIB is
#   /Users/danalbert/Library/Android/sdk/ndk/25.2.9519653/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-ranlib

Does any of that look wrong to you? What's different about your workflow/environment?

@DoDoENT
Copy link
Author

DoDoENT commented May 31, 2023

I've tried adding your test to my suite, and I get the following output:

-- Using Conan toolchain: /Users/dodo/Work/Microblink/conan-toolchains/android/test_package/build/android-armv8-clang-14.0.7/Release/generators/conan_toolchain.cmake
-- Android: Targeting API '21' with architecture 'arm64', ABI 'arm64-v8a', and processor 'aarch64'
-- Android: Selected unified Clang toolchain
-- The CXX compiler identification is Clang 14.0.7
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Users/dodo/.conan2/p/b/andro5e1a4d14e3264/p/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_C_COMPILER set to /Users/dodo/.conan2/p/b/andro5e1a4d14e3264/p/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android21-clang
-- CMAKE_CXX_COMPILER set to /Users/dodo/.conan2/p/b/andro5e1a4d14e3264/p/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang++
-- CMAKE_AR set to /usr/bin/ar
-- CMAKE_STRIP set to /Users/dodo/.conan2/p/b/andro5e1a4d14e3264/p/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-strip
-- CMAKE_RANLIB set to /Users/dodo/.conan2/p/b/andro5e1a4d14e3264/p/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-ranlib
-- Configuring done (0.2s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/dodo/Work/Microblink/conan-toolchains/android/test_package/build/android-armv8-clang-14.0.7/Release

Note CMAKE_AR being set to /usr/bin/ar.

However, this happens when using conan toolchain that uses NDK's toolchain internally. If I use NDK's toolchain directly, I get the correct result.

Maybe this is now a conan bug somehow. I'll investigate further and get back...

@DoDoENT
Copy link
Author

DoDoENT commented May 31, 2023

It appears to be an NDK bug after all.

The conan generates the following toolchain file:

# Conan automatically generated toolchain file
# DO NOT EDIT MANUALLY, it will be overwritten

# Avoid including toolchain file several times (bad if appending to variables like
#   CMAKE_CXX_FLAGS. See https://github.com/android/ndk/issues/323
include_guard()

message(STATUS "Using Conan toolchain: ${CMAKE_CURRENT_LIST_FILE}")

if(${CMAKE_VERSION} VERSION_LESS "3.15")
    message(FATAL_ERROR "The 'CMakeToolchain' generator only works with CMake >= 3.15")
endif()




set(CMAKE_SYSROOT /Users/dodo/.conan2/p/b/andro5e1a4d14e3264/p/toolchains/llvm/prebuilt/darwin-x86_64/sysroot)




set(CMAKE_C_COMPILER "/Users/dodo/.conan2/p/b/andro5e1a4d14e3264/p/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android21-clang")
set(CMAKE_CXX_COMPILER "/Users/dodo/.conan2/p/b/andro5e1a4d14e3264/p/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android21-clang++")


# New toolchain things
set(ANDROID_PLATFORM android-21)
set(ANDROID_STL c++_static)
set(ANDROID_ABI arm64-v8a)
set(ANDROID_USE_LEGACY_TOOLCHAIN_FILE OFF)
include(/Users/dodo/.conan2/p/b/andro5e1a4d14e3264/p/build/cmake/android.toolchain.cmake)

# ... the rest of it

This is enough to reproduce my issue - CMAKE_AR being set to /usr/bin/ar.

However, by removing the line set( ANDROID_USE_LEGACY_TOOLCHAIN_FILE OFF) or by commenting it out, NDK's ar is selected, as expected.

I've even experimented with more miniature toolchain wrappers. It appears, that this toolchain file is enough to reproduce the issue:

set(CMAKE_C_COMPILER "/Users/dodo/.conan2/p/b/andro5e1a4d14e3264/p/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android21-clang")
set(CMAKE_CXX_COMPILER "/Users/dodo/.conan2/p/b/andro5e1a4d14e3264/p/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android21-clang++")

set(ANDROID_USE_LEGACY_TOOLCHAIN_FILE OFF)
include(/Users/dodo/.conan2/p/b/andro5e1a4d14e3264/p/build/cmake/android.toolchain.cmake)

Note that if I don't explicitly define CMAKE_C_COMPILER and CMAKE_CXX_COMPILER in this case, I get the CMAKE_C_COMPILER not set error. This can also be reproduced with the vanilla NDK toolchain if you invoke CMake like this:

cmake -G "Ninja" -DCMAKE_TOOLCHAIN_FILE=/path/to/ndk/build/cmake/android.toolchain.cmake -DANDROID_USE_LEGACY_TOOLCHAIN_FILE=OFF -DCMAKE_BUILD_TYPE=Release /path/to/test/

I think that the problematic part of the NDK's toolchain file is this snippet:

if(DEFINED ANDROID_USE_LEGACY_TOOLCHAIN_FILE)
  set(_USE_LEGACY_TOOLCHAIN_FILE ${ANDROID_USE_LEGACY_TOOLCHAIN_FILE})
else()
  # Default to the legacy toolchain file to avoid changing the behavior of
  # CMAKE_CXX_FLAGS. See https://github.com/android/ndk/issues/1693.
  set(_USE_LEGACY_TOOLCHAIN_FILE true)
endif()
if(_USE_LEGACY_TOOLCHAIN_FILE)
  include("${CMAKE_CURRENT_LIST_DIR}/android-legacy.toolchain.cmake")
  return()
endif()

Notably, if ANDROID_USE_LEGACY_TOOLCHAIN_FILE is not defined, the legacy toolchain is used by default (the _USE_LEGACY_TOOLCHAIN_FILE is set to true.

Only after defining ANDROID_USE_LEGACY_TOOLCHAIN_FILE as OFF, the new non-legacy toolchain is actually used, and it appears that it still contains this bug that I reported a year ago.

Your test passes probably because you don't explicitly define ANDROID_USE_LEGACY_TOOLCHAIN_FILE to OFF and then it accidentally uses the legacy toolchain because this is the default.

@DanAlbert
Copy link
Member

Notably, if ANDROID_USE_LEGACY_TOOLCHAIN_FILE is not defined, the legacy toolchain is used by default (the _USE_LEGACY_TOOLCHAIN_FILE is set to true.

Yes, intentionally. The legacy toolchain is the default because the new one has pretty significant issues that we're not willing to enable by default (it is shockingly easy to accidentally override necessary default flags with a seemingly innocent build.gradle change, including some things that were in the default build.gradle template for a very long time; there's no NDK or AGP bug causing that, it's CMake's intended behavior).

Your test passes probably because you don't explicitly define ANDROID_USE_LEGACY_TOOLCHAIN_FILE to OFF and then it accidentally uses the legacy toolchain because this is the default.

We do: https://cs.android.com/android/platform/superproject/+/master-ndk:ndk/ndk/test/buildtest/case.py;l=493;drc=e5273558b1c999ce743d469b97201b24f4a318b3. You can see my snippet in that post that was explicitly removed from the test runner to avoid that as well. There's something else going on, but I'm still not sure what.

I've even experimented with more miniature toolchain wrappers. It appears, that this toolchain file is enough to reproduce the issue...

That repros for me, thanks 👍 Oddly enough, today so does the test case I uploaded last week... I have no clue what I changed. I applied an OS update last week but that'd be a pretty baffling cause (and for that to be the culprit I'd have to have been >1 year behind you on updates in the first place).

@DanAlbert DanAlbert reopened this Jun 5, 2023
@DanAlbert DanAlbert changed the title [BUG] CMAKE_AR not set with the new toolchain file [BUG] CMAKE_AR not set with the old toolchain file Jun 5, 2023
@DanAlbert DanAlbert changed the title [BUG] CMAKE_AR not set with the old toolchain file [BUG] CMAKE_AR not set with the new toolchain file Jun 5, 2023
@DanAlbert
Copy link
Member

Reopening while we prove it one way or the other, but given that this works with the old toolchain file and not the new one, there's a chance this is actually a CMake bug rather than an NDK bug.

@DanAlbert
Copy link
Member

DanAlbert commented Jun 5, 2023

Here's the same problem after removing our toolchain file from the equation entirely:

$ cmake -DCMAKE_ANDROID_NDK=/Users/danalbert/Library/Android/sdk/ndk/25.2.9519653 -DCMAKE_SYSTEM_NAME=Android -DCMAKE_SYSTEM_VERSION=21 ../tests/build/cmake_exports
...
Make Warning at CMakeLists.txt:7 (message):
  CMAKE_C_COMPILER is
  /Users/danalbert/Library/Android/sdk/ndk/25.2.9519653/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang


CMake Warning at CMakeLists.txt:13 (message):
  CMAKE_CXX_COMPILER is
  /Users/danalbert/Library/Android/sdk/ndk/25.2.9519653/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang++


CMake Warning at CMakeLists.txt:19 (message):
  CMAKE_AR is /usr/bin/ar


CMake Warning at CMakeLists.txt:25 (message):
  CMAKE_STRIP is
  /Users/danalbert/Library/Android/sdk/ndk/25.2.9519653/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-strip


CMake Warning at CMakeLists.txt:31 (message):
  CMAKE_RANLIB is
  /Users/danalbert/Library/Android/sdk/ndk/25.2.9519653/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-ranlib
...

I would guess this has something to do with ar vs llvm-ar, but it's damned odd that llvm-strip and llvm-ranlib are detected correctly if that's the case...

Same results if I move the $NDK/build/cmake/hooks directory out of the way to really rule us out. This looks like a CMake bug to me. It might be one that we're able to work around? That's ostensibly what the hooks directory is for but what we can and can't do in those has never been clear to me.

@DanAlbert
Copy link
Member

DanAlbert commented Jun 5, 2023

By the way, I expect the real answer you're looking for is "don't use the new toolchain file". There's a reason it isn't the default. It's not ready yet, and tbh I'm losing hope that it ever will be.

@DanAlbert
Copy link
Member

An improved test:

cmake_minimum_required(VERSION 3.22)
project(CMakeExportsTest C CXX)

foreach(TEST_VAR CMAKE_C_COMPILER CMAKE_CXX_COMPILER CMAKE_AR CMAKE_STRIP CMAKE_RANLIB)
    if(NOT DEFINED "${TEST_VAR}")
        message(FATAL_ERROR "${TEST_VAR} not set")
    elseif(NOT ${TEST_VAR} MATCHES "${CMAKE_ANDROID_NDK}")
        message(FATAL_ERROR "${TEST_VAR} (${${TEST_VAR}}) is outside the NDK (${CMAKE_ANDROID_NDK})")
    else()
        message(WARNING "${TEST_VAR} is ${${TEST_VAR}}")
    endif()
endforeach()

But interestingly that seems to pass in our test runner, but I see the error I expect when running it myself...

@DanAlbert
Copy link
Member

This broke in CMake some time between 3.22 and 3.26. The test above passes because the latest in the SDK is 3.22 (a bug in its own right, I know, but we're shorthanded), so that's what we test with. When I was testing it myself I was using a fresh 3.26 since that's what you're using.

I still need to check if this is something we can workaround in our hooks, but I think we can pretty safely say that the bug is in CMake. Up to you if you want to file that with them. I doubt I'll have the time, since it only affects a workflow that I cannot recommend anyone use.

@DanAlbert
Copy link
Member

https://android-review.googlesource.com/c/platform/ndk/+/2616091 improves the test, but for the reasons above (and in the commit message) it would not have caught this. It will eventually when we're able to update CMake, but it's not likely ever going to be feasible for us to test against every version of CMake, so there will always be some uncovered gap.

@DanAlbert
Copy link
Member

Okay, fine, I've been nerd sniped :) https://gitlab.kitware.com/cmake/cmake/-/issues/24975

@DanAlbert DanAlbert changed the title [BUG] CMAKE_AR not set with the new toolchain file [FR] add workaround for CMake AR identification bug Jun 5, 2023
@DanAlbert DanAlbert changed the title [FR] add workaround for CMake AR identification bug [FR] new toolchain file needs workaround for CMake AR identification bug Jun 5, 2023
@DanAlbert DanAlbert changed the title [FR] new toolchain file needs workaround for CMake AR identification bug [BUG] new toolchain file needs workaround for CMake AR identification bug Jun 5, 2023
@DanAlbert DanAlbert added this to NDK r26 Jun 5, 2023
@github-project-automation github-project-automation bot moved this to Triaged in NDK r26 Jun 5, 2023
@DanAlbert
Copy link
Member

(triaging the workaround to r26 for now, but that's going to depend on how much time we have)

@DoDoENT
Copy link
Author

DoDoENT commented Jun 6, 2023

By the way, I expect the real answer you're looking for is "don't use the new toolchain file". There's a reason it isn't the default. It's not ready yet, and tbh I'm losing hope that it ever will be.

No problem - I've been using the legacy toolchain the whole time. Every now and then I retest with the "non-legacy" one to see if it became usable.

@DanAlbert
Copy link
Member

Has actually been a known CMake bug for a while (shame on me for not finding the duplicate): https://gitlab.kitware.com/cmake/cmake/-/issues/23320.

Every now and then I retest with the "non-legacy" one to see if it became usable.

tbh I wouldn't bother until the default changes. It's not something we're currently investing in because there's no end in sight. It's proven to be a huge cost for almost no benefit.

@DanAlbert
Copy link
Member

No idea if my timing was just convenient or if Android making noise made it happen, but CMake just uploaded a patch that likely fixes the issue: https://gitlab.kitware.com/cmake/cmake/-/merge_requests/8544

@DanAlbert
Copy link
Member

I didn't find the time to attempt a workaround for the new toolchain. I'm pretty sure I won't either since this only affects the toolchain we recommend people don't use, a much better workaround is to not use -DANDROID_USE_LEGACY_TOOLCHAIN_FILE=OFF, and by the time this does make it to the top of my todo list "update cmake" is probably an even better option.

@DanAlbert
Copy link
Member

The gitlab MR linked above was tagged for CMake 3.27, which is out now. Getting the SDK's CMake updated (which I don't have time for quite yet but is on my todo list for Q1/Q2) sounds like a better use of my time than workaround around the old one. Closing this as won't fix unless someone's got an argument against doing so.

@DanAlbert DanAlbert closed this as not planned Won't fix, can't repro, duplicate, stale Jan 24, 2024
@DanAlbert DanAlbert removed this from NDK r27 Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants