Identify OSArchitecture at run-time #60910

Merged
merged 10 commits into from
Apr 25, 2022

am11
Member

@am11 am11 commented Oct 27, 2021

A 32-bit .NET application running on a 64-bit Unix OS reports the same value for RuntimeInformation.OSArchitecture and ProcessArchitecture, rather than the actual architecture of the OS it is running on.

This PR fixes the issue by querying architecture info at run time.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Oct 27, 2021
@ghost

ghost commented Oct 27, 2021

Tagging subscribers to this area: @Anipik, @safern, @ViktorHofer
See info in area-owners.md if you want to be subscribed.

Issue Details

A 32-bit .NET application running on a 64-bit Unix OS reports the same value for RuntimeInformation.OSArchitecture and ProcessArchitecture, rather than the actual architecture of the OS it is running on.

This PR fixes the issue by querying architecture info at run time.

Author: am11
Assignees: -
Labels:

area-Infrastructure-libraries

Milestone: -

@jkotas
Member

jkotas commented Oct 27, 2021

This was discussed in #58463 and #26612. We opted not to fix this. If we are changing course, we should fix it for all OSes.

cc @ericstj

@jkotas
Member

jkotas commented Oct 27, 2021

What are the cases where this works vs. still does not work?

For example, is this fix going to kick in when running arm64 under qemu on a physical x64 machine?

@am11
Member Author

am11 commented Oct 27, 2021

I extracted this function into a standalone program and ran it in a few VM, qemu/binfmt, and libvirt environments. It seems to report the true architecture of the kernel (considering that containers use the host kernel unless multiarch is configured). For example, with qemu/binfmt (or, on an Alpine host, qemu-openrc) configured for arm and aarch64 on an x64 Linux host:

$ ID=mcr.microsoft.com/dotnet-buildtools/prereqs:alpine-3.14-helix-arm32v7-20210910135806-8a6f4f3
$ docker run --platform linux/arm $ID \
   sh -c 'curl -s http://sprunge.us/63eiY6 | \
       clang -xc - -o /tmp/x && /tmp/x'

This outputs OS architecture: arm (2). With ID=mcr.microsoft.com/dotnet-buildtools/prereqs:alpine-3.14-helix-arm64v8-20210910135810-8a6f4f3 and --platform linux/aarch64, it prints OS architecture: arm64 (3).

@jkotas
Member

jkotas commented Oct 27, 2021

Qemu is trying to overwrite utsname.machine with what is being emulated here: https://github.com/qemu/qemu/blob/2a54fc454cf0dbf173d5dc95205febe381cfb7cc/linux-user/syscall.c#L10118-L10138

Why is this emulation not kicking in?

@am11
Member Author

am11 commented Oct 27, 2021

Why is this emulation not kicking in?

Qemu seems to be overwriting utsname.machine correctly in the example above. On an x64 host, running the arm container with qemu-arm reported OS architecture: arm (2), and the aarch64 one reported OS architecture: arm64 (3).

@jkotas
Member

jkotas commented Oct 27, 2021

Ah OK, I misread your comment. I was expecting that this would return the underlying machine architecture when running on qemu.

Is it correct that this change is only going to make a difference in practice for Linux x86 binaries running on Linux x64, because qemu is not involved in that case?

@am11
Member Author

am11 commented Oct 27, 2021

Yes, that is the only difference. It will report x64 from a dotnet x86 process running on an x64 system. That is basically direct emulation support in the kernel (controlled by the build-time config CONFIG_IA32_EMULATION=y/n), without qemu-like overrides.

@jkotas
Member

jkotas commented Oct 27, 2021

Should we keep this under a Linux x86 ifdef then? It does not have to be as generic as it is.

@am11
Member Author

am11 commented Oct 28, 2021

Is there a reason why we shouldn't rely on run-time lookup?

  • Not all emulations behave like qemu (overriding the uname function).
  • It matches how it is implemented on Windows.

@jkotas
Member

jkotas commented Oct 28, 2021

It matches how it is implemented on Windows.

The Windows implementation is specific to x86 on x64 emulation. It does not work for the new true emulation targets, like arm on x64. There is a different API for those.

@am11
Member Author

am11 commented Oct 28, 2021

Correct. My understanding is this:

  1. Different bitness: the 32-bit on 64-bit case should report the 64-bit OS architecture. It could be x86 on x64, or arm on aarch64 (when the Linux kernel is compiled with Kernel support for 32-bit EL0 = y). It behaves the same way on a variety of Unix systems, not only Linux.

  2. Full emulation: this is not a new concept. FreeBSD jails and Solaris zones have been around for decades. Ski predates qemu, but it was only for IA64 on x64 emulation. Those act like Rosetta 2 and the Windows arm emulators, which paper over the underlying operating system architecture.

I think it is OK not to make any change for 2, so that we do not go against the wishes of the underlying technology stack. Other platforms (python, go etc.) use the same approach AFAICT.

@jkotas
Member

jkotas commented Oct 28, 2021

I still keep coming back to whether this change is an improvement. This API is fundamentally broken since it only handles a subset of emulated OSes. Is there anything useful people can do with it outside Windows x86/x64?

Member

@ericstj ericstj left a comment

I do think this API has value. I share Jan's concern around this thing being hard to get right predictively.

The way we deal with that is that we document the behavior and expectations for how the API changes over time.

If we encounter some new architecture we don't know about, we report unknown. Callers can decide to fallback to the process architecture if that's their preference. When we have a chance to update the API we can add a mapping for the new architecture.

If we encounter a new layer of emulation that lies to us, then we can say that we'll update in a later version to understand that emulation and report the correct result.

@am11
Member Author

am11 commented Oct 28, 2021

If we encounter some new architecture we don't know about, we report unknown. Callers can decide to fallback to the process architecture if that's their preference. When we have a chance to update the API we can add a mapping for the new architecture.

This is not possible in the current design. The native function always reports one of the architectures that the runtime supports.

If we encounter a new layer of emulation that lies to us, then we can say that we'll update in a later version to understand that emulation and report the correct result.

OK, if we want to report the host OS architecture in case of emulation rather than "whatever the underlying layer is reporting", i.e. the output in #60910 (comment) is not what we want, then we need to find another way to handle the qemu scenario. I don't think any other platform (python, golang, java etc.) goes this extra mile, so I am not clear about the motivation.


s_osArchPlusOne = osArch + 1;

Debug.Assert(osArch >= 0);
Contributor

Is there any need to move this line? It will be optimised out in Release/retail builds anyway during compilation.

am11 and others added 3 commits April 25, 2022 00:52
Member

@jkotas jkotas left a comment

LGTM. Thank you!

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@jkotas
Member

jkotas commented Apr 25, 2022

Test failure is #68477

@jkotas jkotas merged commit cb5f0af into dotnet:main Apr 25, 2022
@ghost ghost locked as resolved and limited conversation to collaborators May 25, 2022
@danmoseley danmoseley added breaking-change Issue or PR that represents a breaking API or functional change over a prerelease. needs-breaking-change-doc-created Breaking changes need an issue opened with https://github.com/dotnet/docs/issues/new?template=dotnet labels Aug 17, 2022
@danmoseley
Member

Per discussion offline, this needs a brief breaking change note. Essentially, documenting what @richlander observed here #73974

@jkotas
Member

jkotas commented Aug 27, 2022

Per discussion offline, this needs a brief breaking change note.

Submitted dotnet/docs#30894

@jkotas jkotas removed the needs-breaking-change-doc-created Breaking changes need an issue opened with https://github.com/dotnet/docs/issues/new?template=dotnet label Aug 27, 2022
Labels
area-System.Runtime breaking-change Issue or PR that represents a breaking API or functional change over a prerelease. community-contribution Indicates that the PR has been added by a community member
10 participants