Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mono musl support #76500

Merged
merged 4 commits into from
Jan 27, 2023
Merged

Mono musl support #76500

merged 4 commits into from
Jan 27, 2023

Conversation

ayakael
Copy link
Contributor

@ayakael ayakael commented Oct 2, 2022

Descriptions

Attempt at musl support for mono as to allow s390x on Alpine Linux.

There seems to be issues with mono on musl. Outside perspective would be much appreciated.

Related issues:

System.InsufficientExecutionStackException when building many packages (dotnet/roslyn#64423)

Fixes #76805

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Oct 2, 2022
@ghost
Copy link

ghost commented Oct 2, 2022

Tagging subscribers to this area: @directhex
See info in area-owners.md if you want to be subscribed.

Issue Details

Descriptions

Attempt at musl support for mono as to allow s390x on Alpine Linux.

There seems to be issues with mono on musl. Outside perspective would be much appreciated.

Related issues:

System.InsufficientExecutionStackException when building many packages (dotnet/roslyn#64423)

Author: ayakael
Assignees: -
Labels:

area-Infrastructure-mono, community-contribution

Milestone: -

@ayakael ayakael force-pushed the mono-musl-support branch 5 times, most recently from 4c044f3 to b94845d Compare October 2, 2022 03:06
@ayakael ayakael force-pushed the mono-musl-support branch 4 times, most recently from 29d3104 to 6a37c1c Compare October 2, 2022 04:06
src/mono/mono.proj Outdated Show resolved Hide resolved
@ayakael
Copy link
Contributor Author

ayakael commented Oct 3, 2022

The following bugs exist for mono on musl:

A fix for one of them likely involves setting stack size properly due to musl's really low default of 128K rather than the usual 8M.

@ayakael ayakael force-pushed the mono-musl-support branch from 6a37c1c to 993447a Compare October 3, 2022 16:52
@ayakael
Copy link
Contributor Author

ayakael commented Oct 22, 2022

With progress done on #76805, I've got a full build of s390x working on dotnet6. They are still some issues on dotnet7, which indeed only occurs on s390x:

The process cannot access the file '/builds/ayakael/aports/testing/dotnet7-stage0/src/dotnet-d41bfecf5090e9163aa2da251246abed9e756e53/src/runtime/artifacts/obj/System.IO.Ports' because it is being used by another process

@ayakael ayakael requested a review from thaystg as a code owner October 26, 2022 20:06
@ayakael
Copy link
Contributor Author

ayakael commented Dec 9, 2022

On s390x, libcoreclr.so is not explicitely linking to libucontext.so due to --as-needed, despite build of runtime being successful: Error relocating artifacts/bin/mono/Linux.s390x.Release/libcoreclr.so: setcontext: symbol not found. What am I forgetting?

@uweigand
Copy link
Contributor

uweigand commented Dec 9, 2022

On s390x, libcoreclr.so is not explicitely linking to libucontext.so due to --as-needed, despite build of runtime being successful: Error relocating artifacts/bin/mono/Linux.s390x.Release/libcoreclr.so: setcontext: symbol not found. What am I forgetting?

Can you share the actual full linker command line in effect when building libcoreclr.so. With --as-needed, the order of objects and libraries on the command line becomes important.

@uweigand
Copy link
Contributor

uweigand commented Dec 9, 2022

On s390x, libcoreclr.so is not explicitely linking to libucontext.so due to --as-needed, despite build of runtime being successful: Error relocating artifacts/bin/mono/Linux.s390x.Release/libcoreclr.so: setcontext: symbol not found. What am I forgetting?

Can you share the actual full linker command line in effect when building libcoreclr.so? With --as-needed, the order of objects and libraries on the command line becomes important.

@am11
Copy link
Member

am11 commented Dec 9, 2022

full linker command line

grep -R as-needed artifacts/obj is your friend to figure it out.

@ayakael
Copy link
Contributor Author

ayakael commented Dec 10, 2022

Thanks everyone, here it is:

artifacts/obj/mono/Linux.s390x.Release/CMakeFiles/CMakeOutput.log: "/usr/bin/ld" -pie -z now -z relro --hash-style=gnu --build-id --eh-frame-hdr -m elf64_s390 -dynamic-linker /lib/ld-musl-s3
90x.so.1 -o cmTC_55628 /usr/lib/Scrt1.o /usr/lib/crti.o /usr/bin/../lib/gcc/s390x-alpine-linux-musl/12.2.1/crtbeginS.o -L/usr/bin/../lib/gcc/s390x-alpine-linux-musl/12.2.1 -L/usr/bin/../lib/
gcc/s390x-alpine-linux-musl/12.2.1/../../../../s390x-alpine-linux-musl/lib -L/lib -L/usr/lib --as-needed --build-id=sha1 -lucontext --no-as-needed CMakeFiles/cmTC_55628.dir/CMakeCCompilerABI
.c.o -lssp_nonshared -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/bin/../lib/gcc/s390x-alpine-linux-musl/12.2.1/crtendS.o /usr/lib/crtn.o
artifacts/obj/mono/Linux.s390x.Release/CMakeFiles/CMakeOutput.log:  ignore line: [clang-15: warning: -Wl --no-as-needed: 'linker' input unused [-Wunused-command-line-argument]]

Indeed, --no-as-needed is after --as-needed

@ayakael
Copy link
Contributor Author

ayakael commented Dec 10, 2022

Fixed the issue via setting -Wl,--no-as-needed before -Wl,-lucontext, with the added target_link_libraries(mono-sgen PRIVATE ucontext). Thanks for the pointers :)

@ayakael
Copy link
Contributor Author

ayakael commented Dec 10, 2022

Found a proper fix where -lucontext is not needed anymore. In line with previous suggestions, this makes linking less hacky, and more specific. @am11 Could you do a final review? It'd be nice to merge this.

@steveisok
Copy link
Member

@akoeplinger Outside of resolving conflicts, how does this PR look?

Copy link
Member

@akoeplinger akoeplinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thank you!

@akoeplinger akoeplinger merged commit f478cfe into dotnet:main Jan 27, 2023
@@ -1325,7 +1325,7 @@ int32_t SystemNative_CopyFile(intptr_t sourceFd, intptr_t destinationFd, int64_t
// Try copying data using a copy-on-write clone. This shares storage between the files.
if (sourceLength != 0)
{
while ((ret = ioctl(outFd, FICLONE, inFd)) < 0 && errno == EINTR);
while ((ret = ioctl(outFd, (int)FICLONE, inFd)) < 0 && errno == EINTR);
Copy link
Contributor

@Sapana-Khemkar Sapana-Khemkar Feb 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ayakael
Why FICLONE casted to int? Second argument of ioctl function is unsigned long and not int.
This caused ppc64le build error. Can you please revert this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a proposal by @am11, as build failed without setting Wno-sign-conversion. See: #76500 (comment).

Here is the exact error:

/var/build/dotnet7/testing/dotnet7-stage0/src/dotnet-e6dd91c290b808f971a1ac69c2fb29395bbf1051/src/runtime/src/native/libs/System.Native/pal_io.c(1270,36): error GBBD8A67C: implicit conversion changes signedness: 'unsigned long' to 'int' [-Wsign-conversion] [/var/build/dotnet7/testing/dotnet7-stage0/src/dotnet-e6dd91c290b808f971a1ac69c2fb29395bbf1051/src/runtime/src/native/libs/build-native.proj]
/var/build/dotnet7/testing/dotnet7-stage0/src/dotnet-e6dd91c290b808f971a1ac69c2fb29395bbf1051/src/runtime/src/native/libs/System.Native/pal_io.c(1270,36): error GBBD8A67C: implicit conversion changes signedness: 'unsigned long' to 'int' [-Wsign-conversion] [/var/build/dotnet7/testing/dotnet7-stage0/src/dotnet-e6dd91c290b808f971a1ac69c2fb29395bbf1051/src/runtime/src/native/libs/build-native.proj]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you change last argument to unsigned long here:

#define FICLONE _IOW(0x94, 9, int)
without this (int)FICLONE cast, does it fix both linux-ppc64le and linux-musl-ppc64le builds?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be incorrect - the _IOW macro uses sizeof its last argument as a component of the resulting value, so changing that size from 4 to 8 will result in an incorrect value of FICLONE.

This issue seems to be caused by some difference in the definition of ioctl with musl. In glibc, we have this:

extern int ioctl (int __fd, unsigned long int __request, ...);

Is this declared differently in musl?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a ppc64le specific issue, where musl and glibc have different definitions. Rest of the platform matrix (including linux-musl-x64/arm64/arm etc..) is fine either way.

That would be incorrect - the _IOW macro uses sizeof

I see, so the current definition is also incorrect; should be size_t instead of int, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a ppc64le specific issue, where musl and glibc have different definitions. Rest of the platform matrix (including linux-musl-x64/arm64/arm etc..) is fine either way.

The only difference between ppc64le and other platforms is the numerical value of FICLONE here. On ppc64le, this value is 0x80049409 while on other platforms it is 0x40049409 due to a different definition of the _IOW macro. This means on other platforms the value is the same interpreted as int and unsigned long, so there's no compiler warning. On ppc64le, the value is negative interpreted as int and positive interpreted as unsigned long, that's why we get the warning. The type mismatch itself is currently present on all platforms.

That would be incorrect - the _IOW macro uses sizeof

I see, so the current definition is also incorrect; should be size_t instead of int, right?

No. We must use sizeof (int), i.e. 4, here; this flows into the construction of that value above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue seems to be caused by some difference in the definition of ioctl with musl. In glibc, we have this:

extern int ioctl (int __fd, unsigned long int __request, ...);

Is this declared differently in musl?

Indeed this seems to be problem. In the musl headers I see instead:

int ioctl (int, int, ...);

I guess as long as the two headers differ, there's probably no way to fix the source to work with both libcs, except for using some #ifdef ...

@ayakael
Copy link
Contributor Author

ayakael commented Feb 16, 2023

Thank you for the fix!

Would it be worthwhile to backport this and #82173 to release/7.0 and release/6.0 ? Since we target s390x and ppc64le on Alpine Linux, I'd prefer to upstream what I can. I know @am11 objected to this on the grounds that supporting a new platform goes beyond bug fixes, but I'd like to appeal given that these changes, for the most part, only express themselves when building on Alpine Linux.

@ayakael ayakael deleted the mono-musl-support branch February 16, 2023 23:38
@am11
Copy link
Member

am11 commented Feb 18, 2023

@ayakael, could you tell a bit why the s390x support in 7.0 and 6.0 is important? Normally folks tend to use "next version" for new (previously unsupported) platforms / architectures, e.g. libunwind 1.6 added riscv64 support, and Alpine package enabled it for that version: alpinelinux/aports@15c3641. They did not backport to the previous version. This way, it is easy to remember which version added support for which architecture/platform; it is also something we can announce and include in the release notes etc.

Same in the runtime repo that historically, we don't backport new platform support. It makes sense because it gives us some time to test and fix bugs / improve implementation. e.g. so far, there is no support for linux-musl-s390x in our cross infra (eng/common/cross/build-rootfs.sh, toolchain.sh, dotnet/dotnet-buildtools-prereqs-docker etc.), which is something we have kept in sync so far and considered as part of the port. Most people working on runtime do not use the source-build infra for development.

@ayakael
Copy link
Contributor Author

ayakael commented Feb 18, 2023

@ayakael, could you tell a bit why the s390x support in 7.0 and 6.0 is important? Normally folks tend to use "next version" for new (previously unsupported) platforms / architectures, e.g. libunwind 1.6 added riscv64 support, and Alpine package enabled it for that version: alpinelinux/aports@15c3641. They did not backport to the previous version. This way, it is easy to remember which version added support for which architecture/platform; it is also something we can announce and include in the release notes etc.

Same in the runtime repo that historically, we don't backport new platform support. It makes sense because it gives us some time to test and fix bugs / improve implementation. e.g. so far, there is no support for linux-musl-s390x in our cross infra (eng/common/cross/build-rootfs.sh, toolchain.sh, dotnet/dotnet-buildtools-prereqs-docker etc.), which is something we have kept in sync so far and considered as part of the port. Most people working on runtime do not use the source-build infra for development.

When it comes to which platforms we support for dotnet, we try to keep parity with Fedora. Thus, once s390x was supported on fedora-s390x, representing almost all of engineering work, porting to alpine-s390x was a matter of ironing out the musl quirks(mainly the stack issue). My reading of this PR is very just through the lense of a bug fix, as the changes aren't as extensive as a new platform support. Indeed, if the testing infrastructure was there to support it, linux-musl-s390x support should've been developped along-side linux-s390x. That said, I get your point. We already include these patches in the Alpine package for dotnet6 and dotnet7, so it doesn't change anything on Alpine's side, other than sharing the work with other musl-based platforms.

Thanks for mentioning the cross infra needs, it reminds me that I need to complete the work on dotnet/arcade#11608

@ghost ghost locked as resolved and limited conversation to collaborators Mar 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Infrastructure-mono community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[mono] Assertion at sgen-stw.c:77, condition not met
8 participants