Proposal for vector instruction default #173

Merged: 2 commits, Aug 19, 2023

richlander
Member

  • .NET defaults to SSE2 instructions.
  • We should default to a newer instruction-set, for higher performance.


The minimum [SIMD instruction set](https://en.wikipedia.org/wiki/SIMD) supported and required by .NET is [SSE2](https://en.wikipedia.org/wiki/SSE2). SSE2 predates the x64 processor by a couple years. SSE2 is old! However, since it is the minimum supported SIMD instruction set, the native code present and executed in ready-to-run images is (you guessed it) SSE2-based. That means your machine uses slower SSE2 instructions even if it supports the newer [AVX2](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Advanced_Vector_Extensions_2) instructions (and it almost certainly does).

We rely on the JIT to take advantage of newer (and wider) SIMD instructions than SSE2, such as AVX2. The way the JIT does this is good but not aggressive enough to matter for startup or for apps with shorter life spans (measured in minutes).

SSE4 was introduced with the [Intel Core 2](https://en.wikipedia.org/wiki/Intel_Core_(microarchitecture)) launch in 2006, along with x64. That means that all x64 chips are SSE4 capable, and that SSE2 is really just a holdover from our 32-bit computer heritage. That would make SSE2 a poor baseline for x64 software, but a good one for 32-bit.

Member

@tannergooding tannergooding Jan 11, 2021

I think this is potentially a bit misleading. AMD64 was the first x86-64 implementation and was released in ~2003 alongside the Opteron lineup. It had SSE and SSE2 as part of its baseline.

Intel 64 was first released in ~2004 alongside their Xeon (server) and Prescott (desktop) lineups and was referred to as EM64T at the time. I think Core 2 was just the first of the lineup to use the Intel 64 name.

SSE4.1 was announced in 2006 but released alongside Penryn in late 2007, which was a 45nm die-shrink off the Core 2. AMD released SSE4a at a similar time on their Barcelona micro-architecture.

I don't think it makes too much difference, just want to ensure that people don't accidentally think that x64 means SSE3, SSSE3, and SSE4.1 are definitely available.

Member Author

Good call. I'll clarify that in a later update.

@omajid
Member

omajid commented Jan 11, 2021

I wanted to share some feedback from the Fedora (the Linux distro) community. A little over a year ago, we looked at updating the Fedora baseline from SSE2 to AVX2. After a lot of feedback from users and developers, it was rejected.

Here's the decision from the Fedora Engineering Steering Committee with the rejection.

Here's the entire email thread of the discussion with users and developers.

Granted, it's been over a year. Maybe the hardware landscape has changed significantly, and AVX2 as a baseline now makes sense.

Would it be possible for folks rebuilding .NET (eg, source-build) to opt-out of this and still stay with a SSE2 baseline?


You might be wondering about [AVX-512](https://en.wikipedia.org/wiki/AVX-512). Hardware intrinsics have not yet been defined (for .NET) for AVX-512. AVX-512 is also known to have more [nuanced performance](https://lemire.me/blog/2018/08/25/avx-512-throttling-heavy-instructions-are-maybe-not-so-dangerous/). For now, our vectorized code tops out at AVX2 for the x86-x64 [ISA](https://en.wikipedia.org/wiki/Instruction_set_architecture). Also, too few machines support AVX-512 to consider making it a default choice.

You might be wondering about Arm processors. Assuming we adopted this plan for x86-x64, we'd do something similar for Arm64. The Arm ISA defines [NEON](https://en.wikipedia.org/wiki/ARM_architecture#Advanced_SIMD_(Neon)) as part of [Armv7](https://en.wikipedia.org/wiki/ARM_architecture#32-bit_architecture), NEON (enhanced) in [Armv8A](https://en.wikipedia.org/wiki/AArch64#AArch64_features) and [SVE](https://en.wikipedia.org/wiki/AArch64#Scalable_Vector_Extension_(SVE)) as part of [Armv8.2A](https://en.wikipedia.org/wiki/AArch64#ARMv8.2-A). [SVE2](https://en.wikipedia.org/wiki/AArch64#Scalable_Vector_Extension_2_(SVE2)) will appear sometime in the future. The .NET code base isn't vectorized much or at all for Arm32, such that this proposal doesn't apply for 32-bit Arm. On Arm64, we'd need to ensure that we make choices that take Raspberry Pi, laptops and any other form factors into consideration.

Member

@tannergooding tannergooding Jan 11, 2021

For ARM64, we currently consider System.Runtime.Intrinsics.Arm.AdvSimd a "baseline" instruction set; it is required of any ARM64 chip running RyuJIT.

Member Author

Is Advanced SIMD the same thing as "advanced NEON", as opposed to SVE/SVE2?

Member

Yes, it should be. AFAIK, NEON is the branded name for when it was introduced on ARM32 chips, while AdvSIMD is the name used for ARM64 machines and in the architecture manual.

There are some minor differences between them, but they are largely compatible and we designed the System.Runtime.Intrinsics.Arm namespace with that in mind (with input from the folks at ARM).

Member Author

I don't know. The Wikipedia article refers to Advanced SIMD as NEON in two places:

I have no horse in this race. Just sharing what I read.

@richlander
Member Author

@omajid -- Great context / feedback.

> I wanted to share some Feedback from the Fedora (the Linux distro) community

I think that this is good context for us to consider in the .NET community. My personal feeling is that an application platform can make a more aggressive choice than an OS. Order-wise, you can imagine app platforms making this decision first, and OS platforms going last.

Fedora is -- to my knowledge -- more of a desktop OS. Desktop OS users are going to value compat higher. A cloud-first OS would -- I assume -- value maximum optimization higher.

Fair?

> Would it be possible for folks rebuilding .NET (eg, source-build) to opt-out of this and still stay with a SSE2 baseline?

Yes. That would be the intention and would be a good value-prop to call out. Will add that in an update.


There are possible downsides to this proposal. If we target AVX2 by default, people with non-AVX2 machines will have a significantly worse experience. The goal of this document is to propose a solution that delivers most of the expected benefit of AVX2 without much performance regression pain. It is easy to fall into the trap that we only need to worry about developers and think that developers only use the very latest machines. That's not true. There are also developers building client apps for end-users. We also continue to support Windows Server 2012 R2. Some of those machines are likely very old. There are also likely significant differences by country, possibly correlating to GDP. Needless to say, people using .NET as a developer or an end-user use a great variety of machines.

At the same time, we need to make choices that move .NET metrics and capabilities forward. Let's explore this topic.

Member

Most of the document is referring to SSE vs AVX2, but what about the additional ISAs such as FMA, BMI1, BMI2, and LZCNT which are separate ISAs but which were introduced alongside AVX2 in Haswell?

Member Author

Great question. I think this proposal provides a certain flavor and frame on what we should do. I'd look to someone like you to suggest an aligned proposal for those things. Assume that this proposal is accepted, what other analogous choices should we make? In fact @jkotas already suggested some related ideas (different from yours) that we should consider at the same time. Feel free to add your ideas as an appendix to this doc or write a separate one (or give me draft content to add to this). Whatever you'd like.

Let's take a look at each operating system, assuming the Intel ISA.

* On Windows, we need to be conservative. There are a lot of .NET users on Windows, and the majority of .NET desktop apps target Windows. Windows 10 requires AVX, but Win7 requires only SSE2. It is reasonable to expect that we'll continue supporting Windows 7 with .NET 6.0 (and probably not after that).
* On [macOS](https://en.wikipedia.org/wiki/MacOS_version_history#Version_10.13:_%22High_Sierra%22), it appears that [Mac machines have had AVX2 since late 2013](https://en.wikipedia.org/wiki/List_of_Macintosh_models_grouped_by_CPU_type#Haswell), when they adopted [Haswell chips](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX2). In terms of macOS, [macOS Big Sur (11.0)](https://support.apple.com/kb/sp833?locale=en_US) appears to be the first version to require Haswell. .NET 5.0, for example, supports [macOS High Sierra (10.13)](https://support.apple.com/kb/SP765?locale=en_US) and later. macOS 10.13 supports hardware significantly before Haswell. For .NET 6.0, we'll likely continue to choose a "10." macOS version as the minimum, possibly [macOS Catalina (10.15)](https://support.apple.com/kb/SP803?locale=en_US), aligning with our past practice of slowly (not aggressively) moving forward OS requirements. Zooming out, macOS x64 is now a legacy platform. We shouldn't rock the boat with a change that could cause a performance regression at this late stage of the macOS x64 lifecycle.

Member

Good call on macOS. Macs are expensive and people hold onto them until they die. We have mac machines in our own dotnet CI build pools that don't have AVX2.

I learned the hard way:

https://github.com/MichalStrehovsky/runtimelab/blob/9752e8627cf08c60efbd6f21faecd7d098ad5581/src/tests/nativeaot/SmokeTests/HardwareIntrinsics/x64Vex.csproj#L24-L32

Member

> people hold onto them until they die.

Or in my case, I'm still holding onto a Macbook Pro from ~2011/2012 because it has AVX but no AVX2 support. This proved immensely useful at catching/repro'ing bugs when first introducing the x86 hardware intrinsics.


The primary issue is that ready to run code for (what is likely to be) performance sensitive methods would no longer be usable on SSEx machines, but would require JITing IL methods instead. The resultant native code would be the same (actually, it would be better; we'll leave that topic for another day), but would require time to produce via JIT compilation. As a result, startup time for apps on SSEx machines would be noticeably degraded. We guess the startup regression would be unacceptable, but need to measure it.

Making AVX2 the default would be the same as dropping support for SSEx. If we do that, we'd need to announce it, and reason about what an SSEx offering looks like, if any.

Member

Is this the same as dropping support or just that it may cause more initial jitting for hardware without AVX2 support?

That is, if an image is R2R generated for AVX2, but the hardware doesn't support AVX2, would the app not be runnable or would it just have additional startup overhead because it would have to treat the R2R code as invalid?

Member Author

> would the app not be runnable or would it just have additional startup overhead because it would have to treat the R2R code as invalid?

Just slower. We don't plan a failure case in this scenario.

> Is this the same as dropping support or just that it may cause more initial jitting for hardware without AVX2 support?

We haven't done performance measurements yet. Once we have those, it will be easier to answer your question. The size of the cost is really the answer.

I had the wording "in essence" earlier. I should add it back. That better describes what I was trying to say. If the performance regression is high, then the experience will be unusable, which is "in essence" the same as dropping support. We'll see.

@jkotas
Member

jkotas commented Jan 11, 2021

> My personal feeling is that an application platform can make a more aggressive choice than an OS.

The consequences of using AVX2 as the baseline for Fedora are different from consequences of using AVX2 for R2R images.

If AVX2 is used as baseline for Fedora, new Fedora simply won't run on older hardware.
If AVX2 is used as the default for R2R image, .NET runtime will still run on older hardware, but it will be slower.

Note that we are unconsciously making .NET run slower on older hardware since we are not doing performance testing on older hardware. It is likely that there are undetected performance regressions sneaking through that affect older hardware only.

@dasMulli

dasMulli commented Jan 11, 2021

I'd suggest this default could be:

  • Overridden by the developer by modifying some publish properties
  • Based on the App Model

E.g. WinForms/WPF/MAUI-on-Windows apps could continue to default to SSE2 for a bit to be "compatible" (not slowing down older enterprisey laptops that only run Citrix and Word anyway) while ASP.NET Core / Worker (as building in Docker?) could default to AVX2.

* Linux Arm32 -- N/A (there is very little vectorized code)
* Windows Arm32 -- N/A (unsupported platform)

Note on [NEON](https://en.wikipedia.org/wiki/ARM_architecture#Advanced_SIMD_(Neon)): The [Raspberry Pi 3 (and later) supports Armv8 NEON](https://en.wikipedia.org/wiki/Raspberry_Pi#Specifications). Apple M1 chips apparently have [great NEON performance](https://lemire.me/blog/2020/12/13/arm-macbook-vs-intel-macbook-a-simd-benchmark/).

Member

Not called out anywhere in here, but Windows ARM64 and "Apple Hardware" both have x86/x86-64 emulation layers.
For Windows, this includes emulation of all ISAs up to and including SSE4.1 (I imagine it is similar for Apple).

While users would ideally target ARM on ARM hardware, it is likely useful to know that SSE4.1 is effectively a "baseline" for such scenarios and SSE4.2+ is not available.

@richlander
Member Author

@dasMulli -- Maybe not clear, but that's more or less what is proposed.

Developers may also have concerns about their end-users. They will be able to generate self-contained apps that are re-compiled (via crossgen2) to target an earlier SIMD instruction set (such as SSE2).

You could do this via project properties (that don't yet exist).

The Windows 32-bit offering should satisfy developers on very old Windows machines.

The proposal doesn't say this (and should be updated), but you can also target 32-bit Windows if you want maximum reach for apps.

I would propose not targeting 32-bit or self-contained for any app type, just like today.
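
As a minimal sketch of the 32-bit option mentioned above, a project can pin the 32-bit Windows runtime with the SDK's standard `RuntimeIdentifier` property. The `win-x86` value is the existing runtime identifier for 32-bit Windows. That the resulting app keeps an SSE2 baseline is this proposal's intent, not something the fragment itself guarantees.

```xml
<PropertyGroup>
  <!-- Target the 32-bit Windows runtime, which the proposal leaves at SSE2. -->
  <RuntimeIdentifier>win-x86</RuntimeIdentifier>
</PropertyGroup>
```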


Let's take a look at each operating system, assuming the Intel ISA.

* On Windows, we need to be conservative. There are a lot of .NET users on Windows, and the majority of .NET desktop apps target Windows. Windows 10 requires AVX, but Win7 requires only SSE2. It is reasonable to expect that we'll continue supporting Windows 7 with .NET 6.0 (and probably not after that).

Member

Do we know when Win10 started requiring AVX (and if it is AVX or AVX2)?

The current docs still indicate it just requires SSE2, CMPXCHG16B, and a few other features: https://docs.microsoft.com/en-us/windows-hardware/design/minimum/minimum-hardware-requirements-overview#section-30---minimum-hardware-requirements-for-windows10-for-desktop-editions

AFAIK, Intel still produces some low power chips (such as the Atom lineup) which don't have AVX+ and which might appear in tablets or other "low" power devices.
(Certainly not the developer/server market, but it might be a factor for things like Paint.NET or other user-oriented .NET apps that might make their way to .NET 5+).

Member Author

I am likely wrong on the Windows AVX topic. I don't know where I read that.

The low-powered targets should be satisfied by targeting/using the 32-bit product or publishing (and recompiling) as self-contained.

@SingleAccretion

How easy would it be for a developer without an AVX2-enabled machine (e.g. me) to obtain a shared framework build for a lower baseline, to avoid the impact of JIT'ing CoreLib on the inner loop? Note that I do not think this option should be present on the .NET downloads page or anything like that. At the same time, would e.g. dotnet/installer have builds like these? I assume not, since that's a support burden, and maybe an unreasonable one. For me personally, building dotnet/runtime from source would be an acceptable solution (meaning that basic validation of the build is performed in CI at least on a weekly basis).

This in essence is the same question as the one @omajid had, only scoped down from a community to a single person.

FWIW, I think that for a platform as a whole, transitioning to an AVX2 baseline would be a net win.

@richlander
Member Author

@SingleAccretion -- Great question. It would not be easy. That's the rub of this proposal.

What OS do you use?

@SingleAccretion

@richlander Windows 10, the processor is Ivy Bridge.

@richlander
Member Author

@SingleAccretion -- Could you use the 32-bit product instead? The proposal is to leave it as SSE2.

@SingleAccretion

@richlander Hmm, I am afraid that would not be an option for me personally, since my field of interest (at least right now, but I would not expect that to change in the next few years) could be described as "writing high-performance code in .NET" (and, in the future, contributing to the platform), which more or less excludes 32 bit as an option (since the majority of such code is written for 64 bit). I realize that my situation is a unique one, and would again reemphasize that I personally would be fine with a solution that's just workable, it doesn't need to be nice.

@richlander
Member Author

@SingleAccretion -- Thanks for the context. Your machine is likely already on the edge of being able to satisfy your goal if we define AVX2 as being a key part of writing high-performance code. That said, there are many aspects of writing high-performance code that are not dependent on either SIMD or pointer size. It will become a question for you on what it is you are focused on for high-performance code. As an aside, 32-bit code is known to be faster and more efficient in various scenarios. 32-bit code is not slow.

We will be spending more time on making it easier to build the product from source in 6.0. That will help anyone who wants a runtime with different code and/or characteristics.

Supporting ecosystem data:

- [Steam hardware survey](https://store.steampowered.com/hwsurvey) (see "Other Settings")
- [Azure VMs](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/series/) -- these all support AVX2; we expect same for other clouds

AWS Lambda launched support for AVX2 in November 2020. https://aws.amazon.com/blogs/compute/creating-faster-aws-lambda-functions-with-avx2/

Fargate, the AWS managed compute environment for containers, also supports AVX2.

Member Author

Nice. Those changes are very much in line with this proposal.

* On Linux, we can assume that there is at least as much diversity of hardware as Windows, however, we can also assume that .NET usage on Linux is more narrow, targeted at developers and (mostly) production deployments.

The following is a draft plan on how to approach different systems for this proposal:

If the developer knows better than the .NET defaults what their publishing platform supports, will there be a way for the developer to override what the JIT and Ready to Run will use?

Member Author

Yes.

The JIT already does the right thing, so no issue there. For R2R, you can create a self-contained app, and re-compile the app and .NET libraries for a different SIMD instruction set. We don't have a gesture for specifying a SIMD instruction set yet, but it is the basis for this whole proposal.
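
A sketch of the self-contained, ready-to-run half of that answer, using publish properties that exist in the .NET SDK (`RuntimeIdentifier`, `SelfContained`, `PublishReadyToRun`). The `linux-x64` target is an arbitrary choice for illustration. The fragment deliberately leaves out any instruction-set setting, since no such gesture exists at the time of this reply.

```xml
<PropertyGroup>
  <!-- Illustrative target; any supported runtime identifier could be used here. -->
  <RuntimeIdentifier>linux-x64</RuntimeIdentifier>
  <!-- Ship the runtime and libraries with the app instead of relying on a shared install. -->
  <SelfContained>true</SelfContained>
  <!-- Precompile IL to ready-to-run native code at publish time. -->
  <PublishReadyToRun>true</PublishReadyToRun>
</PropertyGroup>
```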

* Windows x86 -- Target SSE2 (status-quo; match Windows 7)
* Windows x64 -- Target AVX2
* Linux x64 -- Target AVX2
* macOS x64 -- Target SSE2 (status-quo; alternatively, target SSE4 if it is straightforward and has significant value)

Contributor

> target SSE4 if it is straightforward and has significant value

I'd drop this alternative part and just leave it with SSE2.

Member Author

I don't disagree, but could you explain why? It is unlikely that anyone would notice. SSE4 has been in place for about 15 years.

Contributor

I think it's not worth the effort at this point and there is non-zero risk/cost associated with this (e.g. riskier cross-release backports).

Member Author

I don't see the cross-release backport risk (this is just a crossgen default), but I agree we should stick with SSE2 and that's the recommendation. I think that's what we'll do. I'll delete the SSE4 in a later update.

Member

I'm not sure I agree.

SSE4.1 is a 13-15 year old baseline and is almost a requirement to get decent perf out of many of the current vectorized code paths in the BCL. Otherwise, you are missing fairly "basic" operations like blend, dot product, and align right, which are used in things ranging from the System.Numerics.Vector types to the text encoding/decoding algorithms backing string.

  • It's so old that the x86 emulation layers in Windows and Apple Silicon are able to emulate it and make it "available"
  • We don't even provide just Sse2 fallbacks for some of our code paths that use Ssse3/Sse41 because it would make the code more complicated for likely little practical benefit. Instead, users just get the "software" (non-vectorized) fallback. For example: https://source.dot.net/#http2cat/StringUtilities.cs,703

For macOS, in particular, 10.13 (High Sierra) is the oldest still maintained version. Looking at the hardware requirements, it is a Late 2009 MacBook at the worst: https://support.apple.com/kb/SP765?viewlocale=en_US&locale=en_US

Looking at the technical specs: https://support.apple.com/kb/SP548?locale=en_US, this is running a Core 2 Duo which has SSE4.1

While there may be some users on even older hardware or versions (potentially back to Snow Leopard), they certainly aren't supported by Apple, nor will they likely be supported by .NET 6 based on what versions of macOS we currently support.

I would imagine similar exists for practically all OEM computers running Windows 7 or later. Windows 7 was released in 2009 and while there are likely some users that have upgraded to it from XP or Vista based machines, I would speculate those are likely few and far between in practice.

Member Author

@richlander richlander Jan 12, 2021

Let's measure and see what the difference is. I agree that the compat risk here is very low.

That said, I don't think changing macOS x64 is worth it.

@richlander
Member Author

Update ... we intend to get some variant of this proposal implemented in .NET 6.0 Preview 3. There is some chance that it could happen in Preview 2, but that's unlikely. I'll share more when I know more.

@richlander
Member Author

We're almost done with .NET 6. All of the underlying capability for this feature got built and it has been tested. We'll likely turn it on early in .NET 7. We ran out of runway to make it real in .NET 6.

@FirehawkV21

Now, I have this random thought, and I'll probably make it a proposal: I wonder if CrossGen2 could generate R2R code for each instruction set. This could allow the dev to pick which instruction sets to optimize for and still be compatible.

@richlander
Member Author

Good idea. It does that already as of .NET 6. I don't believe that it is documented and it may still be inconvenient to use. This is definitely something to improve.

We are still working on the plan that I documented. It is a bit hard. We discovered and resolved some obscure bugs and there are non-obvious perf implications we are still considering (for old machines, new cheap machines, and emulators). You would think this would be easy but it isn't. For folks creating self-contained apps, this should be easier. We should document all the issues at some point.
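
For readers wondering what the undocumented route might look like for a single target: one possibility, offered as an assumption rather than as the team's recommended gesture, is to pass crossgen2's `--instruction-set` option through the SDK's `PublishReadyToRunCrossgen2ExtraArgs` property. The `avx2` value and the exact argument spelling below are assumptions to verify against the crossgen2 version in use.

```xml
<PropertyGroup>
  <PublishReadyToRun>true</PublishReadyToRun>
  <!-- Assumption: these extra arguments are forwarded as-is to crossgen2. -->
  <PublishReadyToRunCrossgen2ExtraArgs>--instruction-set:avx2</PublishReadyToRunCrossgen2ExtraArgs>
</PropertyGroup>
```

As discussed earlier in the thread, methods precompiled for an instruction set the machine lacks are expected to fall back to the JIT, so the cost on older hardware should be startup time rather than a failure to run.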

@trylek

@richlander
Member Author

There isn't a near term path to making this change, as intended.

The basic problem is that it offers asymmetric winners and losers. The potential wins are likely to be nice, but the losses would be dramatic. There is more detail as to why this is the case, but it doesn't really matter.

The plan was oriented on the idea that all machines are uniform and that we can cut off support for old machines at some date. This isn't true. It turns out that low-end machines (surprise!) ship with lower-end chips (that may not have AVX/2) and (even more relevant) emulators sometimes support SSE4 and no higher. x64 (on Arm64) emulators were not part of our thinking when this proposal was written.

The composite images we are shipping in .NET 8 use AVX/2. They are intended for cloud production scenarios where AVX/2 hardware is plentiful.

@richlander richlander closed this Aug 17, 2023
@rickbrew

@richlander So did .NET 7 not ship with crossgen/R2R binaries utilizing AVX2? I could've sworn I saw that change go in, maybe I'm mistaken

@richlander
Member Author

I do not believe so. We still run into bugs because of the mismatch between R2R and JIT on their SIMD preference.

If you want tips on how to use AVX2, do reach out to me.

@richlander richlander reopened this Aug 19, 2023
@richlander
Member Author

I realized that this file was in the proposed directory. Even though we didn't follow through with the plan, it makes sense to merge the spec into that directory.

@richlander richlander merged commit 6f92b6c into main Aug 19, 2023
@richlander richlander deleted the vectorinstructions branch August 19, 2023 18:10
@richlander
Member Author

richlander commented Aug 19, 2023

Search for "composite" to see a variant of the product that DOES implement this spec.

https://hub.docker.com/_/microsoft-dotnet-aspnet/

Also, native AOT apps can do the same.

https://github.com/dotnet/runtime/blob/main/src/coreclr/nativeaot/docs/optimizing.md

```xml
<PropertyGroup>
   <IlcInstructionSet>x86-x64-v3</IlcInstructionSet>
</PropertyGroup>
```
