Adding AVX512 path to Base64 encoding/Decoding #92241

DeepakRajendrakumaran · 2023-09-18T19:16:43Z

Overview

This PR implements an AVX512 code path for Base64 encoding/decoding, based on the work by Wojciech Muła and Daniel Lemire. There is a fast AVX512VBMI path, and the fallback uses AVX512F/BW. I'll refer to these as VBMI_AVAILABLE and VBMI_UNAVAILABLE from here on. For performance comparison, the baseline is an AVX2 implementation, referred to as BASE_VERSION.
Reference for the algorithm:

https://arxiv.org/pdf/1910.05109.pdf

This version uses intrinsics directly rather than the generic vector libraries, because the JIT and vector libraries currently lack the support needed to produce optimal code. The additional support required to implement this with the generic vector library would be:

Add ShuffleUnsafe for Vector512
Extend Vector512.Shuffle() to lower to intrinsics instead of going to fallback for more cases.
Expand Vector512 surface area to incorporate more high level functions

The current implementation could be optimized further by adding a multishift() implementation.

Generated code

(Will be focusing on the actual encoding/decoding code within the loop only)

Encoding

VBMI_AVAILABLE

VBMI_UNAVAILABLE

BASE_VERSION

Decoding

VBMI_AVAILABLE

VBMI_UNAVAILABLE

BASE_VERSION

Performance

On ICX:

BASE_VERSION vs VBMI_UNAVAILABLE

BASE_VERSION vs VBMI_AVAILABLE

DeepakRajendrakumaran · 2023-09-18T19:17:56Z

@BruceForstall @tannergooding @dotnet/avx512-contrib

danmoseley · 2023-09-19T01:05:32Z

Do we need a new entry in the third party notices file?

EgorBo · 2023-09-19T09:53:07Z

> Do we need a new entry in the third party notices file?

We should already have them if I am not mistaken (unless avx512 uses a different article)

DeepakRajendrakumaran · 2023-09-19T16:01:12Z

> > Do we need a new entry in the third party notices file?
>
> We should already have them if I am not mistaken (unless avx512 uses a different article)

I'm not sure about this.

The original SSE and AVX implementations use a similar algorithm, and this is the reference provided: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Encoder.cs#L13-L14.

This implementation (the VBMI version) uses a modified version based on this (https://github.com/dotnet/runtime/pull/92241/files#diff-db463201901c2d83d2b563871ae11fafee9d5afe94e4d014b77212996b25f770R635): https://arxiv.org/pdf/1910.05109.pdf

There is an AVX512 based implementation in there - https://github.com/aklomp/base64/tree/master/lib/arch/avx512. But it only has the encoder, and it requires 'multishift', which we do not currently support.

danmoseley · 2023-09-19T20:06:46Z

We generally acknowledge significant reuse in the TPN file even if there's a link from the sources. I see this entry (runtime/THIRD-PARTY-NOTICES.TXT, line 345 in 17b60e3):

License notice for vectorized base64 encoding / decoding

but I'm not sure it points to that PDF (e.g., it doesn't include Lemire in the list).

But whatever @EgorBo recommends.

EgorBo · 2023-09-19T20:24:52Z

I am not an expert in THIRD-PARTY-NOTICES.TXT 🙂 but seems like the link to that article is worth adding here https://github.com/dotnet/runtime/blob/main/THIRD-PARTY-NOTICES.TXT#L345

> There is an AVX512 based implementation in there - https://github.com/aklomp/base64/tree/master/lib/arch/avx512.

Do we use any of the code from that repo? I see it has BSD-2 license

DeepakRajendrakumaran · 2023-09-19T20:39:44Z

> I am not an expert in THIRD-PARTY-NOTICES.TXT 🙂 but seems like the link to that article is worth adding here https://github.com/dotnet/runtime/blob/main/THIRD-PARTY-NOTICES.TXT#L345
>
> > There is an AVX512 based implementation in there - https://github.com/aklomp/base64/tree/master/lib/arch/avx512.
>
> Do we use any of the code from that repo? I see it has BSD-2 license

Not directly, but the logic (including the shuffle constants) is similar. That implementation uses '_mm512_multishift_epi64_epi8', though, which is not what I'm using. This one (https://github.com/WojciechMula/base64simd/blob/master/encode/encode.avx512vbmi.cpp) in particular is probably worth mentioning in hindsight, since I used the non-multishift version in my VBMI implementation.

I used the resources below to understand the available implementations and options (they are all related), which raises the question of exactly which of these I should be referencing.

Especially this one, for understanding: http://0x80.pl/notesen/2016-04-03-avx512-base64.html#id29

https://github.com/WojciechMula/base64simd/tree/master
https://arxiv.org/pdf/1910.05109.pdf
https://github.com/lemire/fastbase64/blob/master/src/fastavx512bwbase64.c
https://github.com/aklomp/base64/tree/master/lib/arch/avx512

ghost · 2023-09-20T10:59:34Z

Tagging subscribers to this area: @dotnet/area-system-buffers
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	DeepakRajendrakumaran
Assignees:	-
Labels:	`area-System.Buffers`, `community-contribution`, `needs-area-label`
Milestone:	-

DeepakRajendrakumaran · 2023-09-20T23:20:03Z

@EgorBo I modified the reference to point to https://github.com/WojciechMula/base64simd/tree/master, which has the closest versions to the implementation I went with. Will this require me to add anything to the notice file?

DeepakRajendrakumaran · 2023-09-21T18:06:24Z

I have updated THIRD-PARTY-NOTICES.TXT based on a conversation with @tannergooding. Removing the fallback AVX512BW path meant I only had to add 2 references.

EgorBo

Thanks! Looks way simpler now

DeepakRajendrakumaran · 2023-10-09T18:26:06Z

@tannergooding @BruceForstall Any comments on this? It'd be great if we could move this forward this week.

BruceForstall · 2023-10-10T02:30:48Z

I'm not the right person to review this. If @tannergooding can't review, maybe @stephentoub can review (or pick an appropriate reviewer).

tannergooding · 2023-10-24T15:32:23Z

src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Decoder.cs

+
+            // This algorithm requires AVX512VBMI support.
+            // Vbmi was first introduced in CannonLake and is avaialable from IceLake on.
+            // This makes it okay to use Vbmi instructions since Vector512.IsHardwareAccelerated returns True only from IceLake on.


This comment isn't quite accurate.

Vector512.IsHardwareAccelerated can be made to return true on Skylake-X and later pre-IceLake hardware via an environment variable. This is why the caller checks Vector512.IsHardwareAccelerated && Avx512Vbmi.IsSupported and why this function has [CompExactlyDependsOn(typeof(Avx512Vbmi))].

We're fine with it not being usable pre-IceLake, since those processors often incur heavier downclocking and supporting them is unnecessary complexity for a non-default scenario.
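A minimal sketch of the kind of caller-side guard described above, assuming .NET 8 or later. The class name, method name, and returned strings are hypothetical and only illustrate the check; the real encoder/decoder dispatches to its own vectorized loops and has more paths than shown here.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper, not part of the PR: reports which path a caller would pick.
static class Base64PathSelector
{
    public static string SelectDecodePath()
    {
        // Both conditions are required. Vector512.IsHardwareAccelerated on its own
        // does not guarantee that AVX512VBMI instructions are available.
        if (Vector512.IsHardwareAccelerated && Avx512Vbmi.IsSupported)
        {
            return "Avx512Vbmi";
        }

        // Simplified fallbacks for illustration only.
        if (Avx2.IsSupported)
        {
            return "Avx2";
        }

        return "Scalar";
    }
}
```

Calling `Base64PathSelector.SelectDecodePath()` returns "Avx512Vbmi" only when both checks pass, so hardware that reports 512-bit acceleration without VBMI falls through to the next path.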

Removed line 667

tannergooding · 2023-10-24T15:35:52Z

src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Decoder.cs

+                str = Avx512Vbmi.PermuteVar64x8(multiAdd2.AsByte(), vbmiPackedLanesControl).AsSByte();
+
+                AssertWrite<Vector512<sbyte>>(dest, destStart, destLength);
+                Vector512.Store(str.AsByte(), dest);


nit: Most other places in the JIT we do str.Store(dest) since it's an extension method and can be accessed using instance syntax.

I'm not sure I fully understand how this works

It's likely conflicting because dest is a byte* while str is a Vector512<sbyte> and so it can't resolve

str.Store((sbyte*)dest) should fix it, or str.AsByte().Store(dest). The former is less IL, most notably.

Ah, I messed up and was using str.AsSByte().Store(dest). It's fixed now. Thank you.
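The two spellings suggested in this thread can be sketched as follows, assuming .NET 8 or later and a project compiled with unsafe code enabled. The class and method names, the fill value, and the buffer layout are hypothetical; only the two `Store` calls mirror the suggestion.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical example, not code from the PR.
static class StoreExample
{
    public static unsafe byte[] StoreBothWays()
    {
        // A Vector512<sbyte> with every element set to 0x41.
        Vector512<sbyte> str = Vector512.Create((sbyte)0x41);

        // Room for two 64-byte stores.
        byte[] buffer = new byte[2 * Vector512<byte>.Count];

        fixed (byte* dest = buffer)
        {
            // Option 1: cast the pointer so it matches the vector's element type.
            str.Store((sbyte*)dest);

            // Option 2: reinterpret the vector so it matches the pointer's element type.
            str.AsByte().Store(dest + Vector512<byte>.Count);
        }

        return buffer;
    }
}
```

Both calls write the same 64 bytes. A plain `str.Store(dest)` does not resolve here because the vector's element type (sbyte) and the pointer's element type (byte) disagree.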

tannergooding

LGTM. Just a request to cleanup a couple minor things.

DeepakRajendrakumaran · 2023-10-25T16:32:19Z

> LGTM. Just a request to cleanup a couple minor things.

I've committed the clean-up changes. Please let me know if you want me to make any other changes

* Adding AVX512 path to Base64 encoding/Decoding
* Addressing review Comments.
  Signed-off-by: Deepak Rajendrakumaran <deepak.rajendrakumaran@intel.com>
* Removing fallback path.
* Updating Third Party Notice.
* Addressing review comments

---------

Signed-off-by: Deepak Rajendrakumaran <deepak.rajendrakumaran@intel.com>

dotnet-issue-labeler bot added the needs-area-label label (an area label is needed to ensure this gets routed to the appropriate area owners) Sep 18, 2023

ghost added the community-contribution label (indicates that the PR has been added by a community member) Sep 18, 2023

EgorBo reviewed Sep 18, 2023: three comments on src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Decoder.cs and one on src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Encoder.cs (all outdated)

DeepakRajendrakumaran force-pushed the encoding branch from 8b593f6 to d29d4d0 September 19, 2023 00:14

build-analysis bot mentioned this pull request Sep 19, 2023: Tracking issue for CI build timeouts #76454 (Closed)

marek-safar added the area-System.Buffers label Sep 20, 2023

build-analysis bot mentioned this pull request Sep 20, 2023: NuGet failing with Response status code does not indicate success: 503 (Service Unavailable) dotnet/arcade#11723 (Open)

EgorBo approved these changes Sep 21, 2023

This was referenced Sep 21, 2023: MSBuild crashing in the build #92290 (Open); ReadsWritesClosedFinish_StreamDisposed test failures #92350 (Closed)

BruceForstall added the avx512 label (related to the AVX-512 architecture) and removed the needs-area-label label Sep 25, 2023

BruceForstall added this to the 9.0.0 milestone Sep 25, 2023

BruceForstall requested a review from tannergooding September 27, 2023 23:02

BruceForstall mentioned this pull request Oct 12, 2023: Intel architecture improvements for .NET 9 #93196 (Closed)

tannergooding reviewed Oct 24, 2023: src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Decoder.cs

tannergooding approved these changes Oct 24, 2023

DeepakRajendrakumaran added 4 commits October 24, 2023 09:57:

911b517 Adding AVX512 path to Base64 encoding/Decoding
7b351ad Addressing review Comments. (Signed-off-by: Deepak Rajendrakumaran <deepak.rajendrakumaran@intel.com>)
7fb00e3 Removing fallback path.
2969ddb Updating Third Party Notice.

MihaZupan reviewed Oct 24, 2023

DeepakRajendrakumaran force-pushed the encoding branch 2 times, most recently from 4b2f7d1 to fc05beb October 24, 2023 20:28

fc05beb Addressing review comments

tannergooding merged commit 9ad24ae into dotnet:main Oct 25, 2023
175 checks passed

ghost locked as resolved and limited conversation to collaborators Nov 24, 2023
