Vectorize Convert.ToBase64String using SSSE3 #21833

EgorBo · 2019-01-06T16:53:11Z

This PR improves Convert.ToBase64String performance using SSSE3 instructions.
It's based on the article "Base64 encoding with SIMD instructions" by Wojciech Muła.

Benchmark:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Collections.Generic;

namespace ConsoleApp143
{
    public class ToBase64StringBenchmarks
    {
        public static IEnumerable<object[]> TestDataForGraph()
        {
            var rand = new Random(314666); // fixed "seed"
            for (int i = 0; i < 100; i++)
            {
                var data = new byte[i];
                for (int j = 0; j < i; j++)
                    data[j] = (byte)rand.Next(0, byte.MaxValue + 1); // upper bound is exclusive
                yield return new object[] { data, i };
            }
        }

        [Benchmark]
        [ArgumentsSource(nameof(TestDataForGraph))]
        public string ToBase64(byte[] testData, int inputSize /* argument for report */) =>
            Convert.ToBase64String(testData, Base64FormattingOptions.InsertLineBreaks);

        static void Main(string[] args) =>
            BenchmarkSwitcher.FromAssembly(typeof(ToBase64StringBenchmarks).Assembly).Run(args);
    }
}

Windows 10.0.17134.523, Core i7-8700K 3.7GHz (Coffee Lake): [benchmark graph]

macOS 10.13.6, Core i7-4980HQ 2.8GHz (Haswell): [benchmark graph]

The SSSE3-based implementation is only used when input.Length > 36, in order to avoid regressions for smaller inputs (36 was the best cutoff on my Skylake, Coffee Lake and Haswell based machines).

stephentoub · 2019-01-07T02:19:04Z

> For smaller input arrays, according to my benchmark, the performance gain shows up once input.Length >= 50

The graph doesn't show below 24... is there a regression for small values? (It's pretty common to use base-64 encoding with small values, such as in various HTTP headers.)

src/System.Private.CoreLib/shared/System/Convert.cs

tannergooding · 2019-01-07T03:29:44Z

> BTW, when I did the port I had to manually reverse all values in _mm256_setr. Maybe it makes sense to add Vector.CreateReversed in order to simplify such cases?

The current Create methods for Vector64, Vector128, and Vector256 take the values in the same order as the native setr methods (which is e0, e1, ...)
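As a minimal sketch of that ordering (the class and method names here are hypothetical and not part of the PR), the first argument passed to Vector128.Create ends up in element 0:

```csharp
using System.Runtime.Intrinsics;

// Hypothetical helper for illustration only; not taken from the PR.
public static class CreateOrderSketch
{
    // Builds a vector from the values 0..15 and returns the element stored at `index`.
    public static byte ElementAt(int index)
    {
        // Arguments are listed as e0, e1, ..., e15, the same order as _mm_setr_epi8.
        Vector128<byte> v = Vector128.Create((byte)0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
        return v.GetElement(index);
    }
}
```

So a constant ported from a native `set` call (which lists elements from highest to lowest) has to be written in reverse, while one ported from `setr` can be copied as-is.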

src/System.Private.CoreLib/shared/System/Convert.Base64.Avx2.cs

gfoidl · 2019-01-07T09:40:23Z

FYI: https://github.com/dotnet/corefx/issues/32365 (will do when I get some time for this) (and https://github.com/gfoidl/Base64)

src/System.Private.CoreLib/shared/System/Convert.Base64.Avx2.cs

EgorBo · 2019-01-07T14:47:04Z

@gfoidl oh, didn't see your work. I did this just to practice and test Intrinsics API 🙂

src/System.Private.CoreLib/shared/System/Convert.Base64.Avx2.cs

EgorBo · 2019-02-10T10:02:18Z

@tannergooding @stephentoub @fiigii @gfoidl I updated the PR and its description (added graphs). Could you please take a look?
I tried to keep it simple and small, and to avoid any regressions for small values.

src/System.Private.CoreLib/shared/System/Convert.cs

gfoidl · 2019-02-12T10:57:05Z

src/System.Private.CoreLib/shared/System/Convert.cs

+                Vector128<byte> result = Sse2.SubtractSaturate(indices, tt5);
+                Vector128<sbyte> compareResult = Sse2.CompareGreaterThan(tt7, indices.AsSByte());
+                result = Sse2.Or(result, Sse2.And(compareResult.AsByte(), tt8));
+                result = Ssse3.Shuffle(s_base64ShiftLut, result);


Is s_base64ShiftLut kept in a register or read from memory every time?
Hoisting this outside the loop may be better.

EgorBo · 2019-02-12

s_base64ShiftLut is a static readonly field (effectively a constant), so I guess it should be kept in a register. I will check the asm once again for loads in this place.
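The hoisting suggested above could look roughly like the following sketch. It is not the PR's code: the class, the field name s_lut, and the table values are placeholders, and whether the JIT actually keeps the local in a register still has to be confirmed in the generated asm.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical example; field name and table values are placeholders, not the PR's constants.
public static class HoistSketch
{
    private static readonly Vector128<byte> s_lut =
        Vector128.Create((byte)10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25);

    // Replaces every byte b (expected range 0..15) in each block with s_lut[b].
    public static void LookupAll(Span<Vector128<byte>> blocks)
    {
        if (!Ssse3.IsSupported)
            throw new PlatformNotSupportedException();

        // Read the static field once, before the loop, so the loop body only
        // touches a local that the JIT is free to keep in a register.
        Vector128<byte> lut = s_lut;

        for (int i = 0; i < blocks.Length; i++)
            blocks[i] = Ssse3.Shuffle(lut, blocks[i]);
    }
}
```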

gfoidl · 2019-02-12T10:57:56Z

src/System.Private.CoreLib/shared/System/Convert.cs

+
+                // Do it for the second part of the vector (rotate it first in order to re-use asciiToStringMaskLo)
+                result = Sse2.Shuffle(result.AsUInt32(), 0x4E /*_MM_SHUFFLE(1,0,3,2)*/).AsByte();
+                result = Ssse3.Shuffle(result, s_base64TwoBytesStringMaskLo);


Same for s_base64TwoBytesStringMaskLo.

gfoidl · 2019-02-12T10:59:25Z

src/System.Private.CoreLib/shared/System/Convert.cs

+                result = Sse2.Shuffle(result.AsUInt32(), 0x4E /*_MM_SHUFFLE(1,0,3,2)*/).AsByte();
+                result = Ssse3.Shuffle(result, s_base64TwoBytesStringMaskLo);
+
+                if (insertLineBreaks && (charcount += 16) >= base64LineBreakPosition)


I would move the case with insertLineBreaks into a separate method, so that the codegen for either case can be optimized.

This may also prevent some spills in the simd-registers (if there are any).

EgorBo · 2019-02-12

After adding this block, I didn't see any noticeable performance regression for any input size when insertLineBreaks is false.

src/System.Private.CoreLib/shared/System/Convert.cs

stephentoub · 2019-04-23T14:39:41Z

@EgorBo, are you still working on this?

EgorBo · 2019-04-24T13:26:25Z

@stephentoub updated the comments.
I guess this PR intersects with dotnet/corefx#34529 by @gfoidl, who started to work on this earlier (my PR focuses only on encoding).

gfoidl · 2019-04-24T13:32:43Z

@EgorBo I wouldn't call it "intersects", as the other PR is for span-based byte -> byte encoding / decoding, whilst this one is for byte -> string (with line-breaks). So similar, but different targets.

If there were no need for line breaks, the base64 encoding in Convert could be based on System.Buffers.Text.Base64.
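For reference, a minimal sketch of the span-based byte -> byte API mentioned above, wrapped to return a string. The helper class is hypothetical and not taken from either PR. Base64.EncodeToUtf8 has no counterpart to Base64FormattingOptions.InsertLineBreaks, which is the gap being discussed.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

// Hypothetical helper; not taken from either PR.
public static class Utf8Base64Sketch
{
    // Encodes `data` with the span-based API and converts the UTF-8 result to a string.
    public static string Encode(ReadOnlySpan<byte> data)
    {
        byte[] utf8 = new byte[Base64.GetMaxEncodedToUtf8Length(data.Length)];
        OperationStatus status = Base64.EncodeToUtf8(data, utf8, out _, out int written);
        if (status != OperationStatus.Done)
            throw new InvalidOperationException($"Unexpected status: {status}");

        // Base64 output is pure ASCII, so each encoded byte maps to exactly one char.
        return Encoding.ASCII.GetString(utf8, 0, written);
    }
}
```

Without line breaks, the result should match Convert.ToBase64String for the same input.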

danmoseley · 2019-05-28T18:44:46Z

Resolved merge conflict so we can get test results.

danmoseley · 2019-05-28T18:46:40Z

@tannergooding if tests pass is this ready to merge?

tannergooding · 2019-05-28T18:54:08Z

I'll give this one more pass after lunch.

src/System.Private.CoreLib/shared/System/Convert.cs

sandreenko · 2019-11-02T03:58:04Z

@EgorBo do you think that PR can be finished before the consolidation (in next 2 weeks)?

tannergooding · 2019-11-04T18:36:47Z

src/System.Private.CoreLib/shared/System/Convert.cs

@@ -2492,19 +2494,146 @@ public static unsafe bool TryToBase64Chars(ReadOnlySpan<byte> bytes, Span<char>
            }
        }

+        internal static readonly Vector128<byte> s_base64ShuffleMask = Vector128.Create((byte)


A short comment describing each constant would be useful.

It's also not clear why these are static readonly, but several of the others (such as tt0-tt8) are not.

Given https://github.com/dotnet/coreclr/issues/17225 and https://github.com/dotnet/coreclr/issues/26976, it would be more efficient, in both processing and space, to use the ROS<byte> (ReadOnlySpan<byte>) read-only property trick on these, especially since they're only used by code behind a Ssse3.IsSupported check.
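A rough sketch of the ReadOnlySpan<byte> property pattern referred to above, under the assumption that the constant is expressed as 16 literal bytes. The names and the byte values are placeholders, not the PR's actual shuffle mask. When a property returns a new byte[] of constants as ReadOnlySpan<byte>, the C# compiler can point the span at data embedded in the assembly, so no array is allocated and no static constructor is needed.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical example; names and values are placeholders.
public static class RosConstantSketch
{
    // Placeholder values 0..15, NOT the PR's actual shuffle mask.
    private static ReadOnlySpan<byte> MaskBytes => new byte[]
    {
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
    };

    // Reads the 16 constant bytes as a single 128-bit vector.
    public static Vector128<byte> LoadMask() =>
        MemoryMarshal.Read<Vector128<byte>>(MaskBytes);
}
```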

tannergooding · 2019-11-04T18:38:09Z

src/System.Private.CoreLib/shared/System/Convert.cs

+                Vector128<byte> indices = Sse2.Or(t1, t3);
+
+                // lookup function "Single pshufb method" (lookup_pshufb_improved)
+                Vector128<byte> result = Sse2.SubtractSaturate(indices, tt5);


Any reason this isn't a static local function (since it was a separate function in the original algorithm)? Inlining?

tannergooding · 2019-11-04T18:41:30Z

src/System.Private.CoreLib/shared/System/Convert.cs

+                result = Sse2.Shuffle(result.AsUInt32(), 0x4E /*_MM_SHUFFLE(1,0,3,2)*/).AsByte();
+                result = Ssse3.Shuffle(result, localTwoBytesStringMaskLo);
+
+                if (insertLineBreaks && (charcount += 16) >= base64LineBreakPosition)


Having the side effect only hit if insertLineBreaks is true, but required for both the true and false scenarios is non-obvious.

It would be nice to move the charCount += 16 out into a separate statement.
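To make the short-circuit behavior concrete, here is a small scalar sketch (hypothetical names, with the line-break handling reduced to resetting a counter). It only illustrates how the fused and separated forms differ, and is not the PR's loop.

```csharp
// Hypothetical illustration; the reset stands in for emitting a line break.
public static class LineBreakCounterSketch
{
    private const int LineBreakPosition = 76; // Convert breaks base64 lines at 76 characters

    // Fused form: because && short-circuits, charCount only advances when insertLineBreaks is true.
    public static int Fused(int blocks, bool insertLineBreaks)
    {
        int charCount = 0;
        for (int i = 0; i < blocks; i++)
        {
            if (insertLineBreaks && (charCount += 16) >= LineBreakPosition)
                charCount = 0;
        }
        return charCount;
    }

    // Separated form: the counter always advances, and the condition has no side effect.
    public static int Separated(int blocks, bool insertLineBreaks)
    {
        int charCount = 0;
        for (int i = 0; i < blocks; i++)
        {
            charCount += 16;
            if (insertLineBreaks && charCount >= LineBreakPosition)
                charCount = 0;
        }
        return charCount;
    }
}
```

With insertLineBreaks set to false, the fused form leaves the counter at 0 while the separated form keeps counting, which is the non-obvious difference the comment points at.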

saucecontrol · 2019-11-04T20:36:28Z

> The SSSE3-based implementation is only used when input.Length > 36, in order to avoid regressions for smaller inputs (36 was the best cutoff on my Skylake, Coffee Lake and Haswell based machines).

Is the 36-byte cutover point appropriate for 32-bit as well? There are more than 8 active XMM registers used in the inner loop, so there will likely be some stack shuffling offsetting the SSE gains.

maryamariyan · 2019-11-06T21:05:30Z

Thank you for your contribution. As announced in dotnet/coreclr#27549 this repository will be moving to dotnet/runtime on November 13. If you would like to continue working on this PR after this date, the easiest way to move the change to dotnet/runtime is:

1. In your coreclr repository clone, create a patch by running git format-patch origin
2. In your runtime repository clone, apply the patch by running git apply --directory src/coreclr <path to the patch created in step 1>

maryamariyan · 2019-12-02T19:38:45Z

Thank you for your contribution. As announced in #27549 the dotnet/runtime repository will be used going forward for changes to this code base. Closing this PR as no more changes will be accepted into master for this repository. If you’d like to continue working on this change please move it to dotnet/runtime.

EgorBo added 5 commits January 6, 2019 04:44

Vectorize Convert.ToBase64String

d218652

Fallback to ConvertToBase64Array for corner cases

8217470

Only Base64FormattingOptions.None is supported so far

d0d89ca

fix typo

1bf78f5

Clean up

1c187ea

stephentoub reviewed Jan 7, 2019

src/System.Private.CoreLib/shared/System/Convert.cs

stephentoub reviewed Jan 7, 2019

src/System.Private.CoreLib/shared/System/Convert.cs

tannergooding reviewed Jan 7, 2019

src/System.Private.CoreLib/shared/System/Convert.Base64.Avx2.cs

tannergooding reviewed Jan 7, 2019

src/System.Private.CoreLib/shared/System/Convert.Base64.Avx2.cs

gfoidl reviewed Jan 7, 2019

src/System.Private.CoreLib/shared/System/Convert.Base64.Avx2.cs

src/System.Private.CoreLib/shared/System/Convert.Base64.Avx2.cs

Add initial SSSE3-based impl

3fcdabf

tannergooding reviewed Jan 7, 2019

src/System.Private.CoreLib/shared/System/Convert.Base64.Avx2.cs

tannergooding reviewed Jan 7, 2019

src/System.Private.CoreLib/shared/System/Convert.Base64.Avx2.cs

EgorBo commented Jan 7, 2019

src/System.Private.CoreLib/shared/System/Convert.Base64.Avx2.cs

gfoidl mentioned this pull request Jan 10, 2019

Base64 encoding with simd-support dotnet/corefx#34529

Merged

EgorBo added 5 commits February 3, 2019 22:39

Merge remote-tracking branch 'dotnet/master' into base64-vectorize

6a5e3af

Merge SSSE3-based impl with ConvertToBase64Array

ed74c5b

remove avx

02547ba

Add copy-right and use SSSE3 when inputLength >= 36

9b8c9d1

move to a separate method, also move constant vectors

2ab1c5e

EgorBo changed the title ~~Vectorize Convert.ToBase64String using AVX2~~ Vectorize Convert.ToBase64String using SSSE3 Feb 10, 2019

rename static readonly fields (add s_ prefix)

8acc598

fiigii reviewed Feb 10, 2019

src/System.Private.CoreLib/shared/System/Convert.cs

remove Ssse3.IsSupported from static readonly vectors

df53ee9

gfoidl reviewed Feb 12, 2019

EgorBo added 2 commits April 24, 2019 15:32

Merge remote-tracking branch 'dotnet/master' into base64-vectorize

f29eab7

Add more comments

cd42c3c

update THIRD-PARTY-NOTICES.TXT

72ea550

EgorBo and others added 2 commits April 24, 2019 16:40

update comments

aef8747

Merge branch 'master' into base64-vectorize

a281f2a

tannergooding reviewed May 28, 2019

src/System.Private.CoreLib/shared/System/Convert.cs

gfoidl mentioned this pull request Jun 10, 2019

Spanified Webencoders.Base64UrlEncode dotnet/aspnetcore#11047

Merged

jkotas added the area-System.Runtime label Nov 2, 2019

EgorBo added 5 commits November 4, 2019 17:52

Merge branch 'master' of github.com:EgorBo/coreclr into base64-vectorize

1f80164

# Conflicts: # THIRD-PARTY-NOTICES.TXT # src/System.Private.CoreLib/shared/System/Convert.cs

Fix build error (StoreScalar)

3239269

Merge branch 'master' of github.com:dotnet/coreclr into base64-vectorize

d772632

# Conflicts: # THIRD-PARTY-NOTICES.TXT

Update THIRD-PARTY-NOTICES.TXT

77207a2

formatting

55c7dac

sandreenko requested a review from tannergooding November 4, 2019 17:54

tannergooding reviewed Nov 4, 2019

maryamariyan closed this Dec 2, 2019

benaadams mentioned this pull request Jan 31, 2020

Ubuntu arm Cross testing is paused in all PRs (nodes are offline) dotnet/runtime#11763

Closed

EgorBo mentioned this pull request May 30, 2022

Speed up Convert.ToBase64String() dotnet/runtime#69884

Closed
