Add methods to convert between hexadecimal strings and bytes #17837

GSPP · 2016-07-12T15:50:34Z

It is quite common to need to convert bytes to hex strings and back. The .NET Framework does not have an API to do that. Look at what lengths people have gone to to solve this problem for them: How do you convert Byte Array to Hexadecimal String, and vice versa? Many of those solutions are somehow bad (e.g. slow or broken or not validating).

BitConverter.ToString(byte[]) generates strings of a practically useless format: BitConverter.ToString(new byte[] { 0x00, 0x01, 0xAA, 0xFF }) => 00-01-AA-FF. I don't understand how this decision was made. This seems to be a special solution for some specific purpose (debug output?). Using this API requires Replace(str, "-", "") which is ugly and slow. Also, this API is not discoverable because it does not have "hex" in it's name.

I propose adding an API that does the following:

Supports char[] and string
Supports specifying the output buffer to support zero-allocation scenarios
Supports upper- and lowercase
Fast (probably, this means working in-place using unsafe code)
Validates input
No support for any kind of hex prefix (such as "0x"), whitespace or digit separator (such as "-" or "\n"). This is out of scope.

The use case for char[] in- and outputs is low-allocation scenarios.

The API could look like this:

static class HexConvert {
 byte[] ToBytes(string input, int start, int count, HexFormat hexFormat);
 void ToBytes(string input, int start, int count, ArraySegment<byte> outputBuffer, HexFormat hexFormat);
 //Repeat the same for ArraySegment<char> as input

 string ToString(ArraySegment<byte> bytes);
 char[] ToChars(ArraySegment<byte> bytes);
 void ToChars(ArraySegment<byte> bytes, ArraySegment<char> outputBuffer);

 //Plus obvious convenience overloads

 int HexDigitToInteger(char val); //0-9A-F => 0-15.
 char IntegerToHexDigit(int value, HexFormat hexFormat); //0-15 => 0-9A-F
}

[Flags]
enum HexFormat { UseDefaults = 0, Lowercase = 1 } //extensible

A possible extension would be to support Stream and IEnumerable<byte> as sources and destinations. I would consider this out of scope.

I feel that converting hexadecimal strings is an important and frequent enough thing that this should be supported well. Of course, a "light" version of this proposal is also feasible (just support for string and byte[] with no user-provided output buffer). Not sure if anyone wants/needs lowercase hex, I don't but others might.

This seems cheap enough to implement. Testing should be easily automatable. I don't see any particular usability problems. This is suitable for community contribution.

The text was updated successfully, but these errors were encountered:

joshfree · 2016-07-12T16:59:36Z

Thanks for creating this issue @GSPP.

cc: @terrajobst

https://aka.ms/apireview

juliusfriedman · 2016-07-12T20:38:08Z

Replacement is not the ONLY way, you can just skip the char if you need to.

///Could easily allow upper or lower case here...   
[System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.AggressiveInlining)]
public static byte HexCharToByte(char c, bool upperCase = false) { c = char.ToUpperInvariant(c); return (byte)(upperCase ? char.ToUpperInvariant((char)(c > '9' ? c - 'A' + 10 : c - '0')) : (c > '9' ? c - 'A' + 10 : c - '0')); }

        /// <summary>
        /// Converts a String in the form 0011AABB to a Byte[] using the chars in the string as bytes to caulcate the decimal value.
        /// </summary>
        /// <notes>
        /// Reduced string allocations from managed version substring
        /// About 10 milliseconds faster then Managed when doing it 100,000 times. otherwise no change
        /// </notes>
        /// <param name="str"></param>
        /// <returns></returns>
        public static byte[] HexStringToBytes(string str, int start = 0, int length = -1)
        {
            if (length.Equals(Common.Binary.Zero)) return null;

            if (length <= -1) length = str.Length;

            if (start > length - start) throw new System.ArgumentOutOfRangeException("start");

            if (length > length - start) throw new System.ArgumentOutOfRangeException("length");

            System.Collections.Generic.List<byte> result = new System.Collections.Generic.List<byte>(length >> 1); // / 2

            //Dont check the results for overflow
            unchecked
            {
                //Iterate the pointer using the managed length ....
                //Todo, optomize with reverse or i - 1
                for (int i = start, e = length; i < e; i += 2)
                {
                    //to reduce string manipulations pre call
                    if(str[i] == '-') ++i;

                    //Todo, Native and Unsafe
                    //Convert 2 Chars to a byte
                    result.Add((byte)(HexCharToByte(str[i]) << 4 | HexCharToByte(str[i + 1])));
                }
            }

            //Dont use a List..

            //Return the bytes
            return result.ToArray();
        }

I agree that the format is useless in MOST cases...

However if you think about it a little more and try to be as outside the box as possible you will clearly see that there is an extra index for every octet, finally then you can achieve some interesting things by realigning the data or using the extra indexes....

AlexGhiondea · 2016-11-23T23:13:33Z

@GSPP how about augmenting the BitConverter type with these new methods?

I feel like that would be a better place for these since that is already a type people use for converting between various formats.

AlexGhiondea · 2016-11-24T00:10:57Z

There are at least a couple of other issues tracking something like this:
https://github.com/dotnet/corefx/issues/1647
https://github.com/dotnet/corefx/issues/4441

Let's track it with a single issue (this one).

It does look like this is something for which there is interest in implementing.

@GSPP, @GrabYourPitchforks @KrzysztofCwalina is anyone of you interested in writing up a formal API proposal for this?

CodesInChaos · 2016-11-24T17:17:50Z

If the slices API lands before this, I'd go with something like:

// all instance methods are thread-safe
public abstract class BinaryEncoding
{
	protected abstract void ToBytesOverride(ReadOnlySpan<char> sourceChars, Span<byte> destinationBytes);
	protected abstract int GetByteLengthOverride(int stringLength);
	protected abstract void ToCharsOverride(ReadOnlySpan<byte> sourceBytes, Span<char> destinationChars);
	protected abstract int GetStringLengthOverride(int byteLength);

	public byte[] ToBytes(ReadOnlySpan<char> str);
	public void ToBytes(ReadOnlySpan<char> sourceChars, Span<byte> destinationBytes);
	public int GetByteLength(int stringLength);
	
	public string ToString(ReadOnlySpan<byte> bytes);
	public char[] ToChars(ReadOnlySpan<byte> bytes);
	public void ToChars(ReadOnlySpan<byte> sourceBytes, Span<char> destinationChars);
	public int GetStringLength(int byteLength);
}

public class HexEncoding
{
	public static HexEncoding UpperCase { get; }
	public static HexEncoding LowerCase { get; }
}

perhaps add some additional overloads with byte[], char[], string inputs.

karelz · 2016-11-24T18:22:46Z

We don't take any new API additions if they would be obsolete when Span lands.
Marking as blocked on Span dotnet/corefx#13892 - @AlexGhiondea feel free to change it if you think I am wrong.

GSPP · 2018-08-12T09:42:22Z

Span has landed so this can be unblocked.

GrabYourPitchforks · 2019-02-13T23:32:13Z

I propose a slightly different API, instead putting this on the Convert class near the existing base64 APIs. That would help discoverability, and it gives us an existing template for the recommended API shape.

namespace System
{
    // NEW methods on EXISTING type
    public static class Convert
    {
        public static byte[] FromHexString(ReadOnlySpan<char> chars); // doesn't match an existing overload of FromBase64String, but could be useful
        public static byte[] FromHexString(string s);

        public static string ToHexString(byte[] inArray, HexFormattingOptions options = default);
        public static string ToHexString(byte[] inArray, int offset, int length, HexFormattingOptions options = default);
        public static string ToHexString(ReadOnlySpan<byte> bytes, HexFormattingOptions options = default);
    }

    [Flags]
    public enum HexFormattingOptions
    {
        None = 0,
        InsertLineBreaks = 1,
        Lowercase = 2,
    }
}

It also makes sense IMO to have OperationStatus-based APIs. Those APIs could be done in the corefxlab repo to live alongside the existing Base64Encoder and Base64Decoder types in that repo. Then all of the OperationStatus-based base64 and hex APIs can be reviewed at once if we decide to move them into corefx.

vcsjones · 2019-02-14T01:48:50Z

To give some parity with the *CharArray from base64, I would consider a few other methods:

public static class Convert
{
    public bool TryFromHexString(ReadOnlySpan<char> chars, Span<byte> buffer, out int bytesWritten);
    public bool TryFromHexString(string s, Span<byte> buffer, out int bytesWritten);
    public bool TryToHexString(ReadOnlySpan<byte> bytes, Span<char> buffer, out int charsWritten, HexFormattingOptions options = default);
}

The rationale for this is in my typical usage I expect a hex string to be short (think a SHA digest) which tops out at 64 bytes right now.

I'm not strongly tied to these, but worth considering during review.

Other considerations, does there need to be an options for the FromHexString that provide similar functionality like specifying if the input is expected to be upper or lower, or would the input be treated case insensitive?

What about continuing from line breaks?

GrabYourPitchforks · 2019-02-14T05:13:38Z

@vcsjones I envision HexFormattingOptions.InsertLineBreaks as inserting every 72 chars, just like base64 does today. And parsing would allow all of uppercase / lowercase / whitespace, hence no options parameter.

GSPP · 2019-02-14T15:53:44Z

I'd not put it on Convert. My argument for that is this: Imagine there were 10 different such formats (Hex, base64, base85, ...). Then, we clearly would like each of them to be their own static class. Anything else would make for a very crowded API. Now we don't have 10 formats but from this thought experiment we can see that creating a separate class is the better option. Especially, since more formats might be added in the future. For example, Bitcoin uses a base85 format which is more efficient. Who knows if additional formats might be desirable in the future.

Should this API accept any other input besides spans? It could accept arrays, strings, ArraySegment and even pointers. In my opinion it should only take spans. That would make for the cleanest and least crowded API. Anything else is easily converted to span. In particular I don't want to see overloads with index and count parameters. Thank god we no longer need those!

We certainly need methods that do not throw in the error case. My original proposal missed those.

InsertLineBreaks as inserting every 72 chars

Is this a widely known standard format? Is it only for base64 or also for hex?

New API proposal:

namespace System
{
    public static class HexConvert
    {
        //convert to bytes, throwing
        public static byte[] GetBytes(ReadOnlySpan<char> chars, HexFormattingOptions formattingOptions = default);
        public static void GetBytes(ReadOnlySpan<char> chars, Span<byte> output, out int bytesWritten, HexFormattingOptions formattingOptions = default);

        //convert to bytes, try
        public static byte[] TryGetBytes(ReadOnlySpan<char> chars, HexFormattingOptions formattingOptions = default); //can return null
        public static bool TryGetBytes(ReadOnlySpan<char> chars, Span<byte> output, out int bytesWritten, HexFormattingOptions formattingOptions = default);

         //convert to chars
         string GetString(ReadOnlySpan<byte> bytes);
         char[] GetChars(ReadOnlySpan<byte> bytes);

         //plus non-throwing overloads
         //plus overloads with output buffer
         //plus overloads without HexFormattingOptions (I was lazy so I made them default arguments)
    }

    [Flags]
    public enum HexFormattingOptions
    {
        None = 0,
        Lowercase = 1,
    }
}

I chose the name prefix to be "Get" so that "TryGet" is fitting. This succinct naming would not be possible when placing these methods in Convert.

AlexeiScherbakov · 2019-02-17T16:49:15Z

There are many variants of API. It can be used in web,cryptography and many other places. Some hex conversion is used in json for writing non-ASCII chars - \u0000 - \uFFFF

I think that .NET Core must provide optimized variants for each purpose

Writing API

It can be uppercase/lowercase
It can use separators or not use
Separators can be single char or multi (char[] or string)
It can create new instance of string and char[] or use Span, StringBuilder, TextWriter or Stream as output
Source can be Span or types like,(s)byte,(u)short,char,(u)int,(u)long
If source is Span function can take entire span or it part (for direct byte[] calls)

All above give us more than 100 function declarations. If it will have byte[] inputs - it take much more

For example just writing API for ReadOnlySpan as source, with all kinds of separators, and only lowercase give us 6 optimized variants

public static string ToHexStringLowerCase(this ReadOnlySpan<byte> array);
public static string ToHexStringLowerCase(this ReadOnlySpan<byte> array, char separator);
public static string ToHexStringLowerCase(this ReadOnlySpan<byte> array, ReadOnlySpan<char> separator);
public static string ToHexStringLowerCase(this ReadOnlySpan<byte> array, int offset, int count);
public static string ToHexStringLowerCase(this ReadOnlySpan<byte> array, int offset, int count, char separator);
public static string ToHexStringLowerCase(this ReadOnlySpan<byte> array, int offset, int count, ReadOnlySpan<char> separator);

This is bad idea to place it in existing places, I think, that better to place this API in external netcoreapp2.1+netstandard2.1 package. Something like System.Text.Hex

AlexeiScherbakov · 2019-02-17T17:01:01Z

Also, as @vcsjones mentioned for short strings, we need some different allocation mechanism, to avoid dirty hacks for reducing allocations count: (https://docs.microsoft.com/en-us/dotnet/csharp/how-to/modify-string-contents#unsafe-modifications-to-string)

public static unsafe string ToHexStringLowerCase(this byte[] array)
{
	string ret = new string('\0', array.Length * 2);
	
	fixed(char* strPtr=ret)
	{
		char* ptr = strPtr;
		for (int i = 0; i < array.Length; i++)
		{
			byte bytic = array[i];
			*ptr = HexChars.HexDigitsLowerCase[(bytic >> 4) & 0xF];
			ptr++;
			*ptr = HexChars.HexDigitsLowerCase[bytic & 0xF];
			ptr++;
		}
	}
	return ret;
}

For example in this case we can allocate string memory (x2 of byte array length) and write to it, without copy contructor.

tkp1n · 2019-07-13T10:49:26Z

I consolidated the different proposals and comments made in this discussion and came up with the following API. I also provide a sample implementation (internally utilizing AVX2, SSE4.1, SSSE3, SSE2 or Scalar/Lookup with the following performance characteristics) of the API to experiment with via a seperate GitHub repository and nuget package.

Sample implementation to play arround with

https://github.com/tkp1n/HexMate
https://www.nuget.org/packages/HexMate

Adressed proposals and comments:

Parity with base64 APIs on System.Convert (@GrabYourPitchforks, https://github.com/dotnet/corefx/issues/10013#issuecomment-463419321 and @vcsjones, https://github.com/dotnet/corefx/issues/10013#issuecomment-463452088)
OperationStatus-based APIs as currently in System.Buffers.Text.Base64 (@GrabYourPitchforks, https://github.com/dotnet/corefx/issues/10013#issuecomment-463419321)
Allocation-free or -friendly implementation is possible (see repo above, @AlexeiScherbakov, https://github.com/dotnet/corefx/issues/10013#issuecomment-464481397)

Open questions:

Would it make sense to add APIs where one could choose a separator at the cost of additional overloads and deviating from base64 API parity? (@AlexeiScherbakov, https://github.com/dotnet/corefx/issues/10013#issuecomment-464480043)
Why should InsertLineBreaks add a new-line every 72 characters for hex but every 76 for base64? (@GrabYourPitchforks, https://github.com/dotnet/corefx/issues/10013#issuecomment-463492630)
Should the parsing routines accept --separated input to allow processing strings returned by BitConverter.ToString(byte[])?
Would GetMaxDecodedFromUtf8Length and GetMaxEncodedToUtf8Length make sense for Hex as the calculation is much simpler as with base64 where those methods are provided?

API proposal

namespace System
{
    // NEW type
    [Flags]
    public enum HexFormattingOptions
    {
        None = 0,
        InsertLineBreaks = 1,
        Lowercase = 2
    }

    // NEW methods on EXISTING type
    public static class Convert
    {
        // Decode from chars
        public static byte[] FromHexCharArray(char[] inArray, int offset, int length) => throw null;
        public static bool TryFromHexChars(ReadOnlySpan<char> chars, Span<byte> bytes, out int bytesWritten) => throw null;

        // Decode from strings
        public static byte[] FromHexString(string s) => throw null;
        public static byte[] FromHexString(ReadOnlySpan<char> chars) => throw null;
        public static bool TryFromHexString(string s, Span<byte> bytes, out int bytesWritten) => throw null;

        // Encode to chars
        public static int ToHexCharArray(byte[] inArray, int offsetIn, int length, char[] outArray, int offsetOut, HexFormattingOptions options = default) => throw null;
        public static bool TryToHexChars(ReadOnlySpan<byte> bytes, Span<char> chars, out int charsWritten, HexFormattingOptions options = default) => throw null;

        // Encode to strings
        public static string ToHexString(byte[] inArray, HexFormattingOptions options = default) => throw null;
        public static string ToHexString(byte[] inArray, int offset, int length, HexFormattingOptions options = default) => throw null;
        public static string ToHexString(ReadOnlySpan<byte> bytes, HexFormattingOptions options = default) => throw null;
    }
}

namespace System.Buffers.Text
{
    // NEW type
    public static class Hex
    {
        // Decode
        public static OperationStatus DecodeFromUtf8(ReadOnlySpan<byte> utf8, Span<byte> bytes, out int bytesConsumed, out int bytesWritten, bool isFinalBlock = true) => throw null;
        public static OperationStatus DecodeFromUtf8InPlace(Span<byte> buffer, out int bytesWritten) => throw null;
        
        // Encode
        public static OperationStatus EncodeToUtf8(ReadOnlySpan<byte> bytes, Span<byte> utf8, out int bytesConsumed, out int bytesWritten, bool isFinalBlock = true) => throw null;
        public static OperationStatus EncodeToUtf8InPlace(Span<byte> buffer, int dataLength, out int bytesWritten) => throw null;
    }
}

GSPP · 2019-07-13T14:57:11Z

I think we should demonstrate a need for inserting line breaks first. I am not aware of any common scenario where this would be used. If this feature is added then the line length must be variable. I cannot imagine any fixed line length to be appropriate.

What are the intended use cases?

Should the parsing routines accept --separated input

By default, the parsing routines should be maximally strict. This means no support for whitespace or any separator. When we parse data we generally want to be as strict as possible in order to find bugs in the originating systems and in order to not accidentally accept invalid data. The motto be strict in what you produce and liberal in what you accept is false. It has lead to the mess that is web development. The correct approach is to be as strict as possible and relax only if required.

Parsing should accept any case of the letters A-F, though. I see no good alternative to that.

I see no need for supporting a separator. This is a rare need. An easy workaround exists at a performance cost: str.Replace("-", "").

terrajobst · 2019-08-20T18:14:35Z

@GrabYourPitchforks, it seems we should survey what internal implementations we have and what all the places are where we expose something similar. It seems new proposals were added after you marked it as read for review.

bugproof · 2019-12-05T15:59:30Z

Really looking forward to seeing this. I benchmarked @tkp1n implementation and it's faster than everything else in .NET Core. Compared to these methods: https://stackoverflow.com/a/624379/2828480

GrabYourPitchforks · 2020-04-19T18:38:48Z

I have some follow-up thoughts on this.

My instinct is that @tkp1n's comment at #17837 (comment) mostly nails it, but I'd recommend some further changes. First, let's not bother with anything in System.Buffers.Text just yet, reserving this specific issue for new APIs in System.Convert. Second, upon further reflection I don't think we need a HexFormattingOption enum. It could instead be an optional parameter bool lowercase = false at the end of the method signature. There's no need to introduce a new enum type into the System namespace just yet.

I spent some time building up a list of places in the framework where we convert bytes to hex strings. See the end of this comment for a partial list. Every single one of these matches can be replaced with a call to one of the Convert APIs as proposed in this issue.

Now, for a rude Q and A.

What's the primary scenario for this?

The 99% use case for converting to hex is to produce a case-insensitive identifier from a binary buffer (say, a GUID or similar). It's also frequently used in cryptography to represent symmetric key material or digests.

In particular, the majority of hex use cases are for small-ish inputs and outputs (say, 256 bits max in the common case). That's also about the bounds of what we would expect where identifiers might appear.

This means that we don't need to worry about hyper-optimizing any such implementation. Until we get evidence that this particular API is bottlenecking real-world scenarios, a simple for loop should suffice as an implementation.

What about customizing joiners or line breaks?

For the primary scenario of identifiers and other small-ish payloads, callers typically don't want any type of joiner.

Where there are exceptions they tend to be things we can't easily account for in a generalized API. For example, some callers want the format "00-11-22-..." (using joiners). Some callers want the format "0x00, 0x11, ..." (using a prefix and joiners). Some callers want "%00%11..." (using just a prefix) as the format. Hex editors might want simple spaces as a joiner, except where after every 60 characters a newline might occur, and that should be followed by a horizontal tab instead of a space.

I anticipate that such callers will continue to use their own custom logic. Since they represent the minority use case I'm not too worried about them for this scenario.

What about Convert.FromHexString?

The behavior would be that it allows only [0-9A-Fa-f]. The incoming string length must be evenly divisible by 2. (Empty strings are allowed.)

Framework matches

libraries\Common\tests\System\Net\Http\LoopbackServer.AuthenticationHelpers.cs:
  299:                     sb.Append(b.ToString("x2"));

libraries\Common\tests\System\Security\Cryptography\ByteUtils.cs:
  59:                 builder.Append(bytes[i].ToString("X2"));

libraries\System.CodeDom\src\Microsoft\CSharp\CSharpCodeGenerator.cs:
  2512:                     Output.Write(b.ToString("X2", CultureInfo.InvariantCulture));

libraries\System.CodeDom\src\Microsoft\VisualBasic\VBCodeGenerator.cs:
  2273:                     Output.Write(b.ToString("X2", CultureInfo.InvariantCulture));

libraries\System.DirectoryServices.AccountManagement\src\System\DirectoryServices\AccountManagement\Utils.cs:
  62:                 stringizedArray.Append(b.ToString("x2", CultureInfo.InvariantCulture));
  82:                 stringizedBinarySid.Append(b.ToString("x2", CultureInfo.InvariantCulture));

libraries\System.DirectoryServices.AccountManagement\src\System\DirectoryServices\AccountManagement\AD\ADStoreCtx_Query.cs:
  496:                         stringguid.Append(b.ToString("x2", CultureInfo.InvariantCulture));
  630:                 stringizedBinarySid.Append(b.ToString("x2", CultureInfo.InvariantCulture));

(below implementation is incorrect)
libraries\System.Drawing.Common\tests\mono\System.Drawing\BitmapTests.cs:
  397:                 sOutput.Append(arrInput[i].ToString("X2"));

libraries\System.Net.Http\src\System\Net\Http\SocketsHttpHandler\AuthenticationHelper.Digest.cs:
  245:                     bool formatted = result[i].TryFormat(byteX2, out int charsWritten, "x2");

libraries\System.Private.CoreLib\src\System\Diagnostics\Tracing\TraceLogging\XplatEventLogger.cs:
  156:                 byteArray[i].TryFormat(hex, out int charsWritten, hexFormat);

libraries\System.Private.CoreLib\src\System\Reflection\AssemblyNameFormatter.cs:
  75:                         sb.Append(b.ToString("x2", CultureInfo.InvariantCulture));

libraries\System.Reflection.MetadataLoadContext\tests\src\TestUtils\TestUtils.cs:
  127:                 builder.Append(bytes[i].ToString("X2"));

libraries\System.Security.Cryptography.Xml\src\System\Security\Cryptography\Xml\SignedXmlDebugLog.cs:
  203:                 builder.Append(b.ToString("x2", CultureInfo.InvariantCulture));

libraries\System.Security.Cryptography.Xml\tests\XmlDsigXsltTransformTest.cs:
  117:                 sb.Append(b.ToString("X2"));

libraries\System.Text.Encodings.Web\tests\UrlEncoderTests.cs:
  296:             return string.Concat(_utf8EncodingThrowOnInvalidBytes.GetBytes(char.ConvertFromUtf32(codePoint)).Select(b => string.Format(CultureInfo.InvariantCulture, "%{0:X2}", b)));

libraries\System.Security.Cryptography.X509Certificates\src\Internal\Cryptography\Helpers.cs:
  17:         public static char[] ToHexArrayUpper(this byte[] bytes)
  24:         private static void ToHexArrayUpper(ReadOnlySpan<byte> bytes, Span<char> chars)
  36:         public static string ToHexStringUpper(this byte[] bytes) =>

tkp1n · 2020-04-20T11:54:37Z

@GrabYourPitchforks thank you for picking this up. I generally agree with the points you've brought up.

What's the primary scenario for this?

Let me add to your list two additional use cases I could think of: The tracing of binary data in motion (e.g., low-level network protocol) and hex dumps of binary files. Both typically encode a quantity of data well above the otherwise typical "max 256 bits"-cases. Ingesting those traces/dumps would consequently also use the decode methods with larger data sizes.

Investigating whether a vectorized implementation was worth it for those cases showed that AVX2 could be up to 10x faster in decoding 2KB of hex-encoded data (see table below / full comparison here).

There are already vectorized implementations in the broader ecosystem (e.g., this one in ASP.NET Core) that could be migrated to the API proposed here if we allow for some "hyper-optimization" to ensure the migration causes no perf regressions.

I would suggest a for-loop approach for small input data sizes and using an SSSE3 or AVX2 approach above a given threshold. All of this is an implementation detail, however, and we should first settle on the API, I guess. More than happy to provide a PR by the way, once the API is approved.

After integrating your feedback to my comment above, the API proposal would look like this:

namespace System
{
    // NEW methods on EXISTING type
    public static class Convert
    {
        // Decode from chars
        public static byte[] FromHexCharArray(char[] inArray, int offset, int length) => throw null;
        public static bool TryFromHexChars(ReadOnlySpan<char> chars, Span<byte> bytes, out int bytesWritten) => throw null;

        // Decode from strings
        public static byte[] FromHexString(string s) => throw null;
        public static byte[] FromHexString(ReadOnlySpan<char> chars) => throw null;
        public static bool TryFromHexString(string s, Span<byte> bytes, out int bytesWritten) => throw null;

        // Encode to chars
        public static int ToHexCharArray(byte[] inArray, int offsetIn, int length, char[] outArray, int offsetOut, bool lowercase = false) => throw null;
        public static bool TryToHexChars(ReadOnlySpan<byte> bytes, Span<char> chars, out int charsWritten, bool lowercase = false) => throw null;

        // Encode to strings
        public static string ToHexString(byte[] inArray, bool lowercase = false) => throw null;
        public static string ToHexString(byte[] inArray, int offset, int length, bool lowercase = false) => throw null;
        public static string ToHexString(ReadOnlySpan<byte> bytes, bool lowercase = false) => throw null;
    }
}

GSPP · 2020-04-25T17:18:24Z

Maybe get in a simple implementation soon and rely on a community contribution to provide a super fast vectorized implementation. Sounds like a fun project for someone.

GrabYourPitchforks · 2020-04-25T20:24:30Z

@tkp1n For your scenarios, would those tracers want to do a straight hex dump "00112233..." or would they want things to be formatted "00-11-22-..."? I've never seen any amount of data larger than a few dozen bytes dumped to hex as-is without some kind of formatting to make it more human-readable.

In the case of the aspnet core code you linked to above, it can be written in terms of the 100% fully type-safe HexConverter internal class already in CoreLib, and its performance is within 1 nanosecond of the unsafe SSE3 version.

Method	value	Mean	Error	StdDev
UseSse3_Unsafe	3735928559	1.324 ns	0.0298 ns	0.0249 ns
UseBitTwiddle_Safe_B	3735928559	2.237 ns	0.0259 ns	0.0216 ns

tkp1n · 2020-04-27T07:25:39Z

@GSPP provide a super fast vectorized implementation

I did that already: https://github.com/tkp1n/HexMate/

@GrabYourPitchforks a straight hex dump or would they want things to be formatted

I'd say that depends on whether the primary reason for tracing is the consumption of the trace by a human or by a machine (with maybe a human looking at it from time to time). I've seen both, but over a certain amount of data there is almost always some formatting applied.

Another not yet mentioned source of "large" hex strings is the xs:hexBinary primitive datatype in XML schemas. Such values are also encoded by the internal HexConverter class in CoreLib (via BinHexEncoder) that is undoubtedly fast for small inputs but falls behind vectorized implementations quickly for more substantial buffers. Decoding of hexBinary is done in BinHexDecoder using (yet) another implementation tolerating a mix of upper- and lowercase characters as well as whitespace.

terrajobst · 2020-06-04T18:11:09Z

Video

We decided to go with a simpler route.

namespace System
{
    public static class Convert
    {
        public static byte[] FromHexString(string s);
        public static byte[] FromHexString(ReadOnlySpan<char> chars);

        public static string ToHexString(byte[] inArray);
        public static string ToHexString(byte[] inArray, int offset, int length);
        public static string ToHexString(ReadOnlySpan<byte> bytes);
    }
}

tkp1n · 2020-06-04T21:27:47Z

@terrajobst or @GrabYourPitchforks feel free to assign this to me. I‘d be happy to provide a PR for this.

GrabYourPitchforks · 2020-06-05T00:43:39Z

@tkp1n done! Thanks :)

joshfree assigned Priya91 Jul 12, 2016

Priya91 assigned AlexGhiondea and unassigned Priya91 Aug 22, 2016

AlexGhiondea removed their assignment Nov 24, 2016

terrajobst assigned GrabYourPitchforks Aug 20, 2019

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 5.0 milestone Jan 31, 2020

maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020

GrabYourPitchforks removed the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Apr 25, 2020

GrabYourPitchforks added the api-ready-for-review label Apr 25, 2020

terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review untriaged New issue has not been triaged by the area owner labels Jun 4, 2020

terrajobst unassigned GrabYourPitchforks Jun 4, 2020

terrajobst added the help wanted [up-for-grabs] Good issue for external contributors label Jun 4, 2020

GrabYourPitchforks assigned tkp1n Jun 5, 2020

GrabYourPitchforks removed the help wanted [up-for-grabs] Good issue for external contributors label Jun 5, 2020

tkp1n mentioned this issue Jun 6, 2020

Add methods to convert between hexadecimal strings and bytes #37546

Merged

danmoseley mentioned this issue Jun 10, 2020

Allow BitConverter.ToString to format without dashes #14014

Closed

bartonjs closed this as completed in #37546 Jul 18, 2020

tkp1n mentioned this issue Jul 21, 2020

Add vectorized implementation of hex encoding/decoding #39702

Closed

qmfrederik mentioned this issue Nov 21, 2020

Add a managed Git implementation dotnet/Nerdbank.GitVersioning#521

Merged

12 tasks

ghost locked as resolved and limited conversation to collaborators Dec 30, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add methods to convert between hexadecimal strings and bytes #17837

Add methods to convert between hexadecimal strings and bytes #17837

GSPP commented Jul 12, 2016

joshfree commented Jul 12, 2016

juliusfriedman commented Jul 12, 2016 •

edited

Loading

AlexGhiondea commented Nov 23, 2016

AlexGhiondea commented Nov 24, 2016

CodesInChaos commented Nov 24, 2016 •

edited by karelz

Loading

karelz commented Nov 24, 2016 •

edited

Loading

GSPP commented Aug 12, 2018

GrabYourPitchforks commented Feb 13, 2019

vcsjones commented Feb 14, 2019 •

edited

Loading

GrabYourPitchforks commented Feb 14, 2019

GSPP commented Feb 14, 2019

AlexeiScherbakov commented Feb 17, 2019

AlexeiScherbakov commented Feb 17, 2019

tkp1n commented Jul 13, 2019

GSPP commented Jul 13, 2019 •

edited

Loading

terrajobst commented Aug 20, 2019

bugproof commented Dec 5, 2019 •

edited

Loading

GrabYourPitchforks commented Apr 19, 2020

tkp1n commented Apr 20, 2020

GSPP commented Apr 25, 2020

GrabYourPitchforks commented Apr 25, 2020 •

edited

Loading

tkp1n commented Apr 27, 2020

terrajobst commented Jun 4, 2020 •

edited

Loading

tkp1n commented Jun 4, 2020

GrabYourPitchforks commented Jun 5, 2020

Add methods to convert between hexadecimal strings and bytes #17837

Add methods to convert between hexadecimal strings and bytes #17837

Comments

GSPP commented Jul 12, 2016

joshfree commented Jul 12, 2016

juliusfriedman commented Jul 12, 2016 • edited Loading

AlexGhiondea commented Nov 23, 2016

AlexGhiondea commented Nov 24, 2016

CodesInChaos commented Nov 24, 2016 • edited by karelz Loading

karelz commented Nov 24, 2016 • edited Loading

GSPP commented Aug 12, 2018

GrabYourPitchforks commented Feb 13, 2019

vcsjones commented Feb 14, 2019 • edited Loading

GrabYourPitchforks commented Feb 14, 2019

GSPP commented Feb 14, 2019

AlexeiScherbakov commented Feb 17, 2019

AlexeiScherbakov commented Feb 17, 2019

tkp1n commented Jul 13, 2019

Sample implementation to play arround with

Adressed proposals and comments:

Open questions:

API proposal

GSPP commented Jul 13, 2019 • edited Loading

terrajobst commented Aug 20, 2019

bugproof commented Dec 5, 2019 • edited Loading

GrabYourPitchforks commented Apr 19, 2020

tkp1n commented Apr 20, 2020

GSPP commented Apr 25, 2020

GrabYourPitchforks commented Apr 25, 2020 • edited Loading

tkp1n commented Apr 27, 2020

terrajobst commented Jun 4, 2020 • edited Loading

tkp1n commented Jun 4, 2020

GrabYourPitchforks commented Jun 5, 2020

juliusfriedman commented Jul 12, 2016 •

edited

Loading

CodesInChaos commented Nov 24, 2016 •

edited by karelz

Loading

karelz commented Nov 24, 2016 •

edited

Loading

vcsjones commented Feb 14, 2019 •

edited

Loading

GSPP commented Jul 13, 2019 •

edited

Loading

bugproof commented Dec 5, 2019 •

edited

Loading

GrabYourPitchforks commented Apr 25, 2020 •

edited

Loading

terrajobst commented Jun 4, 2020 •

edited

Loading