Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify int to hexadecimal char conversions #1273

Merged
merged 25 commits into from
Feb 12, 2020

Conversation

marek-safar
Copy link
Contributor

This replaces about 7 different implementations with shortest version

@marek-safar marek-safar force-pushed the size-work branch 3 times, most recently from aeabc62 to d97412a Compare January 3, 2020 12:50
{
value &= 0xf;
return (char)(value > 9 ? value - 10 + 'A' : value + '0');
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect

static ReadOnlySpan<byte> s_hexEncodeMap => new byte [] { 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 65, 66, 67, 68, 69, 70 };

public static char ToCharUpper(int value) => (char)(s_hexEncodeMap[value & 0xF]); // or even Unsafe.AddByteOffset

should be a bit faster if it's used in hot paths

@marek-safar marek-safar force-pushed the size-work branch 3 times, most recently from 2415368 to 030e604 Compare January 3, 2020 15:51
return (char)(value > 9 ? value - 10 + 'a' : value + '0');
}

#if NETCOREAPP || NETSTANDARD2_1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I've found is that as we move on to newer/more complex builds we keep needing to add to these. Prefer expressing it negatively (i.e. #if !NETSTANDARD2_0 && !NET48 && ...), one for each flavor that fails.

This is because we expect few new build configurations to not want this, but a newer netstandard build would want it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to go with NETCOREAPP only. I am not sure how many NET48 version I'd need to list as for example NET48 is not used anywhere.

@GrabYourPitchforks
Copy link
Member

FYI we have an internal method WriteHexByte used by some of the formatters that supports mixed casing and is completely branch-free / lookup table-free. A while back we clocked it as the fastest implementation in the formatters it's used in. Feel free to give it a try here if you wish.

#if NETCOREAPP || NETSTANDARD2_1
public static unsafe string ToStringLower(ReadOnlySpan<byte> bytes)
{
fixed (byte* bytesPtr = bytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fixed (byte* bytesPtr = bytes)
Debug.Assert(!bytes.IsEmpty);
fixed (byte* bytesPtr = &MemoryMarshal.GetReference(bytes))

Should be a little bit faster. But I don't know if it's worth in contrast to the delegate invocation in string.Create.

foreach (byte b in new ReadOnlySpan<byte>((byte*)args.Ptr, args.Length))
{
chars[p++] = ToCharLower(b >> 4);
chars[p++] = ToCharLower(b);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces quite a lot of bound checks. It may be better to iterate over chars (ideally by for-loop) to eliminate the bound checks for chars. For b you can operate on the pointer bytesPtr itself, no need to create a ROS for this.

With the other suggestions (especially) from @GrabYourPitchforks the creation could become completely branch-free.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that this could be vectorized (presumable not in this PR, but in a follow up (maybe I'll try something then)).

{
public static char ToCharUpper(int value)
{
value &= 0xf;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think upper case is common here.

Suggested change
value &= 0xf;
value &= 0xF;

Same below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@marek-safar
Copy link
Contributor Author

FYI we have an internal method WriteHexByte

@GrabYourPitchforks I didn't notice that implementation. I moved it to the common file and used it as the prefered fastest implementation.

{
value &= 0xF;
return (char)(value > 9 ? value - 10 + 'a' : value + '0');
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you measure this and find it to be faster than:

private static ReadOnlySpan<byte> HexEncodeUpper => new byte [] { (byte)'0', (byte)'1', (byte)'2', (byte)'3', (byte)'4', (byte)'5', (byte)'6', (byte)'7', (byte)'8', (byte)'9', (byte)'A', (byte)'B', (byte)'C', (byte)'D', (byte)'E', (byte)'F' };
private static ReadOnlySpan<byte> HexEncodeLower => new byte [] { (byte)'0', (byte)'1', (byte)'2', (byte)'3', (byte)'4', (byte)'5', (byte)'6', (byte)'7', (byte)'8', (byte)'9', (byte)'a', (byte)'b', (byte)'c', (byte)'d', (byte)'e', (byte)'f' };

public static char ToCharUpper(int value) => (char)HexEncodeUpper[value & 0xF];
public static char ToCharLower(int value) => (char)HexEncodeLower[value & 0xF];

? (Especially once the JIT properly removes the bounds check on this, if it doesn't already.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Or using the same tricks/match employed by ToCharsBuffer above.)

Copy link
Member

@EgorBo EgorBo Jan 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub

Especially once the JIT properly removes the bounds check on this, if it doesn't already.

It doesn't today but I have a fix for JIT: EgorBo@bcc095e
so it will remove them for things like array[i & c] and array[i % c] if c is less than array.Length.

Copy link
Contributor Author

@marek-safar marek-safar Jan 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you measure this and find it to be faster than:

I didn't as these are convenience methods rarely called more than a few times. The Span versions are intended for perf sensitive code.

Also, introducing global memory blob in each assembly which uses this code seems to me contra-productive size-wise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you measure this and find it to be faster than:

Here are the micro-benchmarks for 3 different cases. The sources for them at https://gist.github.com/marek-safar/b26e2bfc9a60a2bd348b3f36d3cb8288

BenchmarkDotNet=v0.12.0, OS=macOS 10.15.2 (19C57) [Darwin 19.2.0]
Intel Core i7-7920HQ CPU 3.10GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-alpha.1.20061.5
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.6301, CoreFX 4.700.19.56404), X64 RyuJIT
  DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.6007, CoreFX 5.0.20.6007), X64 RyuJIT

|           Method |      Mean |     Error |    StdDev |
|----------------- |----------:|----------:|----------:|
|   Test_Condition | 0.0257 ns | 0.0310 ns | 0.0290 ns |
|      Test_String | 0.1618 ns | 0.0320 ns | 0.0299 ns |
|         Test_ROS | 0.5877 ns | 0.0473 ns | 0.0506 ns |

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd interested in numbers from arm64 if you can run it there

Certainly. I'm doing a build for this now and should be able to test sometime after lunch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my Samsung Galaxy Book 2:

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Snapdragon 850 2.96 GHz, 1 CPU, 8 logical and 8 physical cores
.NET Core SDK=5.0.100-alpha1-015515
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.19.52501, CoreFX 5.0.19.52503), Arm64 RyuJIT
  DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.19.52501, CoreFX 5.0.19.52503), Arm64 RyuJIT


|           Method |                Input |     Mean |   Error |  StdDev |
|----------------- |--------------------- |---------:|--------:|--------:|
|    StringVersion | 18446744073709551615 | 147.3 ns | 0.78 ns | 0.73 ns |
| ConditionVersion | 18446744073709551615 | 125.9 ns | 0.32 ns | 0.28 ns |
|       ROSVersion | 18446744073709551615 | 119.1 ns | 0.40 ns | 0.35 ns |

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the ConditionVersion in this case the one with the 2 branches or the one with your suggestion (and only 1 branch)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its thr code from @marek-safar's gist, with no changes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the single branch condition perform on Samsung Galaxy Book 2 in comparison to the rest? If it is on-par/better than ROSVersion, then that helps answer which approach we should take (granted, with @EgorBo's changes, the ROSVersion could get better).

{
destination[computedOutputLength] = (byte)HexConverter.ToCharUpper((int)value);
value >>= 4;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several folks like @ahsonkhan spent a while tuning all of the code in this area. It'd be good to validate these changes against our perf tests to ensure nothing regresses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above says otherwise. Anyway, this seems to be for some reason pointed out more than once here so I changed the code to do at most 8 iterations instead of 16 which the code does today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above says otherwise.

That's not how I read it. To me, the comment highlights that the author went to great lengths thinking about perf and the implications of how it was written, mentioning additional work that could be tried in the future.

Please run the perf tests to ensure it hasn't regressed. If your changes make it faster, all the better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub I misunderstood the comment and you are right the code was tuned. I tweaked the code and benchmarked the method TryFormatUInt64X with the following results on my machine (about 10% speedup for the longest casein this PR).

BenchmarkDotNet=v0.12.0, OS=macOS 10.15.2 (19C57) [Darwin 19.2.0]
Intel Core i7-7920HQ CPU 3.10GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-alpha.1.20061.5
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.6301, CoreFX 4.700.19.56404), X64 RyuJIT
  DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.6007, CoreFX 5.0.20.6007), X64 RyuJIT


|                    Method |                Input |      Mean |     Error |    StdDev | Ratio | RatioSD |
|-------------------------- |--------------------- |----------:|----------:|----------:|------:|--------:|
| Existing_TryFormatUInt64X |                    1 |  9.682 ns | 0.2332 ns | 0.2685 ns |  1.00 |    0.00 |
|       PR_TryFormatUInt64X |                    1 |  9.219 ns | 0.2087 ns | 0.2143 ns |  0.95 |    0.04 |
|                           |                      |           |           |           |       |         |
| Existing_TryFormatUInt64X |                 1000 | 13.786 ns | 0.4389 ns | 1.2802 ns |  1.00 |    0.00 |
|       PR_TryFormatUInt64X |                 1000 | 11.812 ns | 0.2820 ns | 0.4306 ns |  0.93 |    0.07 |
|                           |                      |           |           |           |       |         |
| Existing_TryFormatUInt64X |               123456 | 14.584 ns | 0.8399 ns | 0.8625 ns |  1.00 |    0.00 |
|       PR_TryFormatUInt64X |               123456 | 14.661 ns | 0.3340 ns | 0.8379 ns |  1.05 |    0.11 |
|                           |                      |           |           |           |       |         |
| Existing_TryFormatUInt64X | 18446744073709551615 | 26.105 ns | 0.5646 ns | 1.5265 ns |  1.00 |    0.00 |
|       PR_TryFormatUInt64X | 18446744073709551615 | 24.166 ns | 0.5207 ns | 1.0518 ns |  0.91 |    0.06 |

@EgorBo I don't think ROS version will be ever faster. It has extra memory read which I guess will be hard to beat with the branch prediction in modern CPUs

Copy link
Member

@EgorBo EgorBo Jan 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marek-safar if #1644 gets merged it will remove bound checks for the ROS version so it will be just and and a few movs:

public static char ToCharUpper(int value) => 
	(char)(s_hexEncodeMap[value & 0xF]);

Codegen for ROS-version (in my branch):

; Method Program:ToCharUpper(int):ushort
       and      ecx, 15
       movsxd   rax, ecx
       mov      rdx, 0xD1FFAB1E
       movzx    rax, byte  ptr [rax+rdx]
       ret

vs codegen for this PR's impl:

       and      ecx, 15
       cmp      ecx, 9
       jg       SHORT G_M2907_IG04
       lea      eax, [rcx+48]
       jmp      SHORT G_M2907_IG05
G_M2907_IG04:
       lea      eax, [rcx+87]
G_M2907_IG05:
       movzx    rax, ax
       ret   

Not sure if it's faster though, I guess branch predictor + speculative execution indeed help here especially in benchmarks where input is usually constant.

Copy link
Member

@ahsonkhan ahsonkhan Jan 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I measured the changes in this PR (currently) and it looks like there is some regression (~20-40% for large numbers):

public class HexFormatTest
{
    [Params(-1, 0, 1, long.MaxValue, int.MaxValue, int.MinValue, long.MinValue)]
    public long Value;

    private byte[] _output;

    [Params('x', 'X')]
    public StandardFormat Format;

    [GlobalSetup]
    public void Setup()
    {
        _output = new byte[1_000];
    }

    [Benchmark(Baseline = true)]
    public bool Before()
    {
        return Utf8Formatter.TryFormat(Value, _output, out _, Format);
    }

    [Benchmark]
    public bool After()
    {
        return TryFormat(Value, _output, out _, Format);
    }
}

The only change/diff is:

- string hexTable = (useLower) ? FormattingHelpers.HexTableLower : FormattingHelpers.HexTableUpper;
- while ((uint)(--computedOutputLength) < (uint)destination.Length)
- {
-    destination[computedOutputLength] = (byte)hexTable[(int)value & 0xf];
-    value >>= 4;
- }

+ if (useLower)
+ {
+     while ((uint)(--computedOutputLength) < (uint)destination.Length)
+     {
+         destination[computedOutputLength] = (byte)ToCharLower((int)value);
+         value >>= 4;
+     }
+ }
+ else
+ {
+     while ((uint)(--computedOutputLength) < (uint)destination.Length)
+     {
+         destination[computedOutputLength] = (byte)ToCharUpper((int)value);
+         value >>= 4;
+     }
}
BenchmarkDotNet=v0.12.0, OS=Windows 10.0.19041
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-alpha1-015914
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT
  Job-AZDGGY : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  
Method Value Format Mean Error StdDev Median Min Max Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
Before -2147483648 X 15.206 ns 0.3289 ns 0.3077 ns 15.181 ns 14.644 ns 15.717 ns 1.00 0.00 - - - -
After -2147483648 X 20.875 ns 0.4065 ns 0.3802 ns 20.901 ns 20.090 ns 21.486 ns 1.37 0.04 - - - -
Before -2147483648 x 15.270 ns 0.2164 ns 0.1918 ns 15.310 ns 14.829 ns 15.493 ns 1.00 0.00 - - - -
After -2147483648 x 22.338 ns 0.4745 ns 0.6495 ns 22.388 ns 21.488 ns 24.055 ns 1.46 0.05 - - - -
Before -1 X 15.203 ns 0.2947 ns 0.2756 ns 15.182 ns 14.801 ns 15.802 ns 1.00 0.00 - - - -
After -1 X 17.541 ns 0.2877 ns 0.2691 ns 17.637 ns 16.929 ns 17.795 ns 1.15 0.03 - - - -
Before -1 x 15.204 ns 0.2914 ns 0.2726 ns 15.336 ns 14.726 ns 15.493 ns 1.00 0.00 - - - -
After -1 x 18.770 ns 0.3571 ns 0.3340 ns 18.887 ns 18.174 ns 19.304 ns 1.23 0.03 - - - -
Before -9223372036854775808 X 15.364 ns 0.2791 ns 0.2474 ns 15.403 ns 14.830 ns 15.732 ns 1.00 0.00 - - - -
After -9223372036854775808 X 17.920 ns 0.3819 ns 0.3573 ns 18.005 ns 17.098 ns 18.494 ns 1.17 0.03 - - - -
Before -9223372036854775808 x 16.198 ns 0.3513 ns 0.6060 ns 16.146 ns 15.195 ns 17.338 ns 1.00 0.00 - - - -
After -9223372036854775808 x 16.324 ns 0.3005 ns 0.2952 ns 16.374 ns 15.836 ns 16.821 ns 1.03 0.05 - - - -
Before 0 X 5.907 ns 0.1366 ns 0.1277 ns 5.933 ns 5.685 ns 6.183 ns 1.00 0.00 - - - -
After 0 X 5.321 ns 0.1182 ns 0.1106 ns 5.365 ns 5.133 ns 5.529 ns 0.90 0.03 - - - -
Before 0 x 5.802 ns 0.1224 ns 0.1085 ns 5.834 ns 5.632 ns 6.004 ns 1.00 0.00 - - - -
After 0 x 6.393 ns 0.1138 ns 0.1009 ns 6.403 ns 6.151 ns 6.573 ns 1.10 0.03 - - - -
Before 1 X 5.870 ns 0.1346 ns 0.1259 ns 5.901 ns 5.620 ns 6.078 ns 1.00 0.00 - - - -
After 1 X 5.371 ns 0.1138 ns 0.1064 ns 5.386 ns 5.118 ns 5.518 ns 0.92 0.03 - - - -
Before 1 x 5.780 ns 0.0850 ns 0.0754 ns 5.763 ns 5.666 ns 5.886 ns 1.00 0.00 - - - -
After 1 x 6.223 ns 0.1333 ns 0.1246 ns 6.276 ns 6.048 ns 6.368 ns 1.08 0.03 - - - -
Before 2147483647 X 10.822 ns 0.2222 ns 0.2078 ns 10.790 ns 10.510 ns 11.329 ns 1.00 0.00 - - - -
After 2147483647 X 13.118 ns 0.1054 ns 0.0880 ns 13.127 ns 12.926 ns 13.276 ns 1.21 0.02 - - - -
Before 2147483647 x 11.118 ns 0.2521 ns 0.4610 ns 11.295 ns 10.237 ns 12.117 ns 1.00 0.00 - - - -
After 2147483647 x 12.773 ns 0.2848 ns 0.3048 ns 12.855 ns 12.345 ns 13.197 ns 1.18 0.06 - - - -
Before 9223372036854775807 X 15.325 ns 0.2527 ns 0.2364 ns 15.346 ns 14.856 ns 15.694 ns 1.00 0.00 - - - -
After 9223372036854775807 X 21.039 ns 0.3554 ns 0.3324 ns 21.013 ns 20.432 ns 21.605 ns 1.37 0.03 - - - -
Before 9223372036854775807 x 15.123 ns 0.3338 ns 0.3428 ns 15.241 ns 14.651 ns 15.612 ns 1.00 0.00 - - - -
After 9223372036854775807 x 21.597 ns 0.4643 ns 0.4968 ns 21.527 ns 20.895 ns 22.589 ns 1.43 0.05 - - - -

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the disassembly for before/after in case it is of interest:
https://www.diffchecker.com/Eimp7IRP
sharplab.io link

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there now minimal perf difference for this API based on this PR or are we still seeing noticeable regressions?

This replaces about 7 different implementations with single version
@ahsonkhan
Copy link
Member

ahsonkhan commented Jan 13, 2020

@marek-safar - if possible, can you please avoid rebasing/force pushing commits and just add commits to the branch, after PR review has started? Otherwise, it is difficult to see and review just the changes that were made in recent commits and forces the reviewer to re-review all the code again. We squash commits before merging anyway, and you can consider squashing/re-writing the commits once the PR has been approved, before merging, if you like.

As a reviewer, it is easier, which is why I tend to avoid squashing commits till the end when I submit PRs.

@@ -78,7 +78,7 @@ public static void ToCharsBuffer(byte value, Span<char> buffer, int startingInde

internal static unsafe string ToString(ReadOnlySpan<byte> bytes, Casing casing = Casing.Upper)
{
#if NET45 || NET461 || NET462 || NET472 || NETSTANDARD1_0 || NETSTANDARD1_3 || NETSTANDARD2_0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this not just be #if (NETFRAMEWORK && !NET48) || ...?

We've said NET48 is the final .NET Framework version, so it shouldn't need to expand in the future. We likewise might be able to do NETSTANDARD && !NETSTANDARD2_1, unless we think there will be another netstandard version in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave expressed this way.

  • It shows what all modes are hitting this
  • It calls things to attention if we add a NET452
  • It hedges against there one day being a NET481.
    • Yeah, we've said there won't be; but it's not like (NETFRAMEWORK && !NET48) is much smaller.
  • It avoids mixing && and ||.

public static unsafe string ToString(ReadOnlySpan<byte> bytes, Casing casing = Casing.Upper)
{
#if NET45 || NET46 || NET461 || NET462 || NET47 || NET471 || NET472 || NETSTANDARD1_0 || NETSTANDARD1_3 || NETSTANDARD2_0
Span<char> result = stackalloc char[bytes.Length * 2];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems dangerous. Are all callers limited in how large bytes is?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see an assert at https://github.com/dotnet/runtime/pull/1273/files#diff-63d8e37549a2478236a87c352cdd1024L1132 that's been removed here. If we're sticking with the stackalloc, we should put that assert back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the assert (not sure how useful that's for legacy builds)

Copy link
Member

@ahsonkhan ahsonkhan Feb 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bytes.Length <= 16 assert is not guaranteed to be true and this stackalloc could be unsafe. For example, PhysicalAddress takes an unbounded byte array from the user and we pass it here in it's ToString(). That will result in stackoverflow. For lengths > 16 (or some reasonable small constant value, say 128), we should consider doing arraypool.rent or maybe just allocate.

byte[] a = new byte[2048];
var address = new PhysicalAddress(a);
Console.WriteLine(address.ToString());

Copy link
Contributor Author

@marek-safar marek-safar Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not matter for this case as the assembly is not built in affected configurations but I changed it anyway as it could show up in System.Security.Cryptography.Pkcs

public static char ToCharUpper(int value)
{
value &= 0xF;
return (char)(value > 9 ? value - 10 + 'A' : value + '0');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did we land on this implementation (are we not going for the ROSpan version)? What was the reason for not going with what @tannergooding suggested here (single branch): #1273 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tanner version is not producing expected output and ROSpan version needs codegen support first (the code can be updated once that lands)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look like its because the subtraction was off; it needs to subtract '9' + 1. The following passes the unit tests (with the logic all being constant folded):

static void Main(string[] args)
{
    for (int i = 0; i < 0x10; i++)
    {
        Console.WriteLine(ToUpper(i));
    }
}

public static char ToUpper(int value)
{
    value &= 0xF;
    value += '0';

    if (value > '9')
    {
        value += ('A' - ('9' + 1));
    }

    return (char)value;
}

Copy link
Member

@ahsonkhan ahsonkhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than some final perf validation + fixing test nit & merge conflict, looks good to me.

@marek-safar
Copy link
Contributor Author

The latest conditional version is a bit faster on my machine and it should not hurt on other archs.

|                Method |                Input |      Mean |    Error |   StdDev |
|---------------------- |--------------------- |----------:|---------:|---------:|
|         StringVersion | 18446744073709551615 |  99.78 ns | 1.219 ns | 1.080 ns |
|      ConditionVersion | 18446744073709551615 |  94.37 ns | 1.927 ns | 2.219 ns |
| ConditionVersionFinal | 18446744073709551615 |  89.30 ns | 0.967 ns | 0.755 ns |
|            ROSVersion | 18446744073709551615 | 105.72 ns | 2.486 ns | 3.053 ns |

@marek-safar
Copy link
Contributor Author

Failures are unrelated due to #32126

@jaredpar
Copy link
Member

Failures are unrelated due to #32126

This PR also had failures in System.Net.NetworkInformation.Functional.Tests. Based on the time line it appears this PR possibly introduced them. The test has been failing everywhere since this PR was merged. Reactivated #18090.

@jaredpar
Copy link
Member

Stand up today identified this test as failing in 3.1 as well so this may be hardware related.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.