Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove string allocation from XmlTextEncoder.WriteCharEntity #61774

Closed

Conversation

kronic
Copy link
Contributor

@kronic kronic commented Nov 18, 2021

No description provided.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Nov 18, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Xml and removed community-contribution Indicates that the PR has been added by a community member labels Nov 18, 2021
@ghost
Copy link

ghost commented Nov 18, 2021

Tagging subscribers to this area: @dotnet/area-system-xml
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author: kronic
Assignees: -
Labels:

area-System.Xml

Milestone: -

@kronic kronic changed the title remove string allocation from XmlTextEncoderWriteCharEntityImpl remove string allocation from XmlTextEncoder.WriteCharEntityImpl Nov 18, 2021
@kronic kronic changed the title remove string allocation from XmlTextEncoder.WriteCharEntityImpl Remove string allocation from XmlTextEncoder.WriteCharEntityImpl Nov 18, 2021
stephentoub
stephentoub previously approved these changes Nov 18, 2021
@Wraith2
Copy link
Contributor

Wraith2 commented Nov 18, 2021

LGTM

@jeffhandley jeffhandley added community-contribution Indicates that the PR has been added by a community member needs more info labels Nov 19, 2021
@jeffhandley
Copy link
Member

I've marked this PR as "needs more info" for now. We'd like to better understand the risk/reward for the series of XML changes. See #61773 (comment). Thanks!

@drieseng
Copy link
Contributor

@kronic How's the performance of TextWriter.Write(ReadOnlySpan<char>) compared to TextWriter.Write(String)? In the ROS overload, we now have a try/finally to return the rented array. Do we have a concrete TextWriter that does not override TextWriter.Write(ReadOnlySpan<char>)?

@stephentoub Do we really need to return the rented array in case of an exception? Does it outweight the cost of the try/finally?

@kronic
Copy link
Contributor Author

kronic commented Nov 19, 2021

@drieseng I don't know what performance TextWriter.Write (ReadOnlySpan ) has. These are problems in the TextWriter code.

@kronic
Copy link
Contributor Author

kronic commented Nov 19, 2021

@drieseng Why use try / finally if, when an exception occurs, the rented array will return a finalizer?

@drieseng
Copy link
Contributor

@kronic I understand that, from your point of view, you're not touching TextWriter so its performance characterics are not your concern. But you are switching from using TextWriter.Write(String) to TextWriter.Write(ReadOnlySpan<char>), so - personally - I'd verify that performance of XmlTextEncoder does not regress due to this change.

About the array renting: I'm not sure what the general policy is for rented arrays in case of an exception flow (that did not have a try/catch or try/finally when the renting was introduced).

@kronic
Copy link
Contributor Author

kronic commented Nov 19, 2021

@drieseng as far as I can tell from the try / finally code is not used

@kronic
Copy link
Contributor Author

kronic commented Nov 19, 2021

@drieseng

Method Ch Mean Error StdDev Gen 0 Allocated
Span ) 41.15 ns 0.178 ns 0.158 ns 0.0191 160 B
String ) 39.75 ns 0.390 ns 0.365 ns 0.0229 192 B
Span 3 41.34 ns 0.249 ns 0.233 ns 0.0191 160 B
String 3 41.25 ns 0.244 ns 0.204 ns 0.0229 192 B
Span [ 41.16 ns 0.167 ns 0.148 ns 0.0191 160 B
String [ 39.83 ns 0.393 ns 0.367 ns 0.0229 192 B
Benchmark

[MemoryDiagnoser]
public class StringWriterBenchmark
{
	[Params('3', ')', '[')]
	public char Ch { get; set; }

	[Benchmark]
	public void Span()
	{
		using StringWriter stringWriter = new();
		Span<char> span = stackalloc char[8];
		((int) Ch).TryFormat(span, out var charsWritten, "X", NumberFormatInfo.InvariantInfo);
		stringWriter.Write(span.Slice(0, charsWritten));
	}

	[Benchmark]
	public void String()
	{
		using StringWriter stringWriter = new();
		stringWriter.Write(((int) Ch).ToString("X", NumberFormatInfo.InvariantInfo));
	}
}

@stephentoub
Copy link
Member

@drieseng, what try/finally and rented array are you taking about?

@drieseng
Copy link
Contributor

@stephentoub I'm talking about the base implementation in TextWriter here.

@kronic You're not testing the TextWriter.Write(ReadOnlySpan<char>) implementation. StringWriter is overriding that method.

@Wraith2
Copy link
Contributor

Wraith2 commented Nov 19, 2021

There's a rental in TextWriter.Write(ROS<char>) so that it can pass a char[] to the Write(Char[]) overload, I'm not sure why that couldn't just be a direct call to the ROS overload but I know enough to suspect that there's a good reason.

@kronic
Copy link
Contributor Author

kronic commented Nov 19, 2021

@drieseng TextWritter abstract class. How to test it?

@stephentoub
Copy link
Member

Do we really need to return the rented array in case of an exception? Does it outweigh the cost of the try/finally?

We're not strict about returning in the case of exception, but I'd be surprised if it made a meaningful difference in this case to avoid the try/finally. The primary cost of a try/finally is preventing inlining, which this virtual is very unlikely to be when used via the XML APIs. On top of that, any TextWriter-derived type that cares about eeking out every last ounce of perf needs to override this method, which both StringWriter and StreamWriter do.

@stephentoub
Copy link
Member

There's a rental in TextWriter.Write(ROS) so that it can pass a char[] to the Write(Char[]) overload, I'm not sure why that couldn't just be a direct call to the ROS overload but I know enough to suspect that there's a good reason.

A direct call to the ROS overload on what? It's the base implementation of Write(ROS)`; that can't call itself.

@Wraith2
Copy link
Contributor

Wraith2 commented Nov 19, 2021

Yeah, being abstract it isn't going to call that, it's a fallback. As you said any implementation that cares about perf will override the ROS. I don't see a problem.

@kronic kronic changed the title Remove string allocation from XmlTextEncoder.WriteCharEntityImpl Remove string allocation from XmlTextEncoder.WriteCharEntity Nov 19, 2021
@kronic
Copy link
Contributor Author

kronic commented Dec 8, 2021

@krwq Until you merge, I want to conduct a performance test

@kronic
Copy link
Contributor Author

kronic commented Dec 9, 2021

@krwq @stephentoub
Benchmark

using System.IO;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text;
using System.Xml;

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[module: SkipLocalsInit]

[MemoryDiagnoser]
public class Program
{
	private readonly char[] _chars = Enumerable.Range(char.MinValue, char.MaxValue).Select(x => (char) x)
		.Where(x => char.IsLetterOrDigit(x) || char.IsPunctuation(x) || char.IsSeparator(x))
		.ToArray();

	private readonly XmlWriter _xw = new XmlTextWriter(Stream.Null, Encoding.UTF8);
	public static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

	[Benchmark]
	public void WriteEntity()
	{
		var writer = _xw;

		for (var i = 0; i < 100; i++)
		{
			for (var j = 0; j < _chars.Length; j++)
			{
				writer.WriteCharEntity(_chars[j]);
			}
		}
	}
}

runtime

Method Mean Error StdDev Gen 0 Allocated
WriteEntity 162.8 ms 1.38 ms 1.15 ms 19000.0000 152 MB

current pr

Method Mean Error StdDev Allocated
WriteEntity 181.8 ms 1.65 ms 1.54 ms 384 B

my

Method Mean Error StdDev Allocated
WriteEntity 134.7 ms 1.43 ms 1.27 ms 392 B
private static int WriteCharToSpan(Span<char> destination, int ch)
{
     Debug.Assert(destination.Length >= 12);
     destination[0] = '&';
     destination[1] = '#';
     destination[2] = 'x';
     bool result = ((uint)ch).TryFormat(destination.Slice(3), out int charsWritten, "X");
     Debug.Assert(result);
     destination[charsWritten + 3] = ';';
     return charsWritten;
}

@krwq
Copy link
Member

krwq commented Dec 9, 2021

@kronic, can you clarify what "my" means (i.e. point to a specific commit or something) and also compare that alongside with @stephentoub's version?

@krwq krwq added the needs-author-action An issue or pull request that requires more info or actions from the author. label Dec 9, 2021
@stephentoub
Copy link
Member

And current pr is not what I have in my commit.

@kronic
Copy link
Contributor Author

kronic commented Dec 9, 2021

I will make two commits with different versions and run a performance test

@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Dec 9, 2021
@stephentoub
Copy link
Member

I will make two commits with different versions and run a performance test

I have zero doubt that manually expanding what TryWrite is doing will be slightly faster; TryWrite needs to validate on each write that there's space remaining in the destination span.

But it's also simpler, more maintainable, more understandable. Unless there's a compelling top-level scenario where those nanoseconds are proven to matter, I would like the version that doesn't allocate, is just as fast as the one that does allocate, and that's a one-line replacement.

@kronic
Copy link
Contributor Author

kronic commented Dec 9, 2021

@stephentoub
I found several places where I could remove the allocation and reuse the method XmlTextEncoder.WriteCharToSpan
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

@kronic
Copy link
Contributor Author

kronic commented Dec 9, 2021

@stephentoub Usage scenario. every few hours a snapshot of a database of 10-50 GB is taken and there is a need to save it in xml.

@stephentoub
Copy link
Member

stephentoub commented Dec 9, 2021

Usage scenario. every few hours a snapshot of a database of 10-50 GB is taken and there is a need to save it in xml.

And what percentage of those chars being written hit the code path where the character is written as an entity? What does this change do to the execution time of that? Do you have an approximation for what that operation is doing that can be benchmarkwd? I'm skeptical that the microbenchmark on WriteCharEntity here is representative of that whole operation.

@kronic
Copy link
Contributor Author

kronic commented Dec 9, 2021

@stephentoub It is difficult to estimate, but columns in the database with the symbol type are quite common.
There is also a scenario to save this to json and it is several times faster

@stephentoub
Copy link
Member

It is difficult to estimate, but columns in the database with the symbol type are quite common.

When you profile saving that 10GB, what percentage of the time is spent inside WriteCharEntity?

@kronic
Copy link
Contributor Author

kronic commented Dec 9, 2021

Benchmark

using System.IO;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text;
using System.Xml;

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[module: SkipLocalsInit]

[MemoryDiagnoser]
public class Program
{
	private readonly char[] _chars = Enumerable.Range(char.MinValue, char.MaxValue).Select(x => (char) x)
		.Where(x => char.IsLetterOrDigit(x) || char.IsPunctuation(x) || char.IsSeparator(x))
		.ToArray();

	private readonly XmlWriter _xw = new XmlTextWriter(Stream.Null, Encoding.UTF8);
	public static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

	[Benchmark]
	public void WriteEntity()
	{
		var writer = _xw;

		for (var i = 0; i < 100; i++)
		{
			for (var j = 0; j < _chars.Length; j++)
			{
				writer.WriteCharEntity(_chars[j]);
			}
		}
	}
}

runtime

Method Mean Error StdDev Gen 0 Allocated
WriteEntity 162.8 ms 1.38 ms 1.15 ms 19000.0000 152 MB

stephentoub version

Method Mean Error StdDev Allocated
WriteEntity 180.1 ms 3.38 ms 3.47 ms 384 B
my version

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.IO;
using System.Text;
using System.Diagnostics;

namespace System.Xml
{
    // XmlTextEncoder
    //
    // This class does special handling of text content for XML.  For example
    // it will replace special characters with entities whenever necessary.
    internal sealed class XmlTextEncoder
    {
        //
        // Fields
        //
        // output text writer
        private readonly TextWriter _textWriter;

        // true when writing out the content of attribute value
        private bool _inAttribute;

        // quote char of the attribute (when inAttribute)
        private char _quoteChar;

        // caching of attribute value
        private StringBuilder? _attrValue;
        private bool _cacheAttrValue;

        //
        // Constructor
        //
        internal XmlTextEncoder(TextWriter textWriter)
        {
            _textWriter = textWriter;
            _quoteChar = '"';
        }

        //
        // Internal methods and properties
        //
        internal char QuoteChar
        {
            set
            {
                _quoteChar = value;
            }
        }

        internal void StartAttribute(bool cacheAttrValue)
        {
            _inAttribute = true;
            _cacheAttrValue = cacheAttrValue;
            if (cacheAttrValue)
            {
                if (_attrValue == null)
                {
                    _attrValue = new StringBuilder();
                }
                else
                {
                    _attrValue.Length = 0;
                }
            }
        }

        internal void EndAttribute()
        {
            if (_cacheAttrValue)
            {
                Debug.Assert(_attrValue != null);
                _attrValue.Length = 0;
            }

            _inAttribute = false;
            _cacheAttrValue = false;
        }

        internal string AttributeValue
        {
            get
            {
                if (_cacheAttrValue)
                {
                    Debug.Assert(_attrValue != null);
                    return _attrValue.ToString();
                }
                else
                {
                    return string.Empty;
                }
            }
        }

        internal void WriteSurrogateChar(char lowChar, char highChar)
        {
            if (!XmlCharType.IsLowSurrogate(lowChar) ||
                 !XmlCharType.IsHighSurrogate(highChar))
            {
                throw XmlConvert.CreateInvalidSurrogatePairException(lowChar, highChar);
            }

            _textWriter.Write(highChar);
            _textWriter.Write(lowChar);
        }

        internal void Write(char[] array, int offset, int count)
        {
            if (null == array)
            {
                throw new ArgumentNullException(nameof(array));
            }

            if (0 > offset)
            {
                throw new ArgumentOutOfRangeException(nameof(offset));
            }

            if (0 > count)
            {
                throw new ArgumentOutOfRangeException(nameof(count));
            }

            if (count > array.Length - offset)
            {
                throw new ArgumentOutOfRangeException(nameof(count));
            }

            if (_cacheAttrValue)
            {
                Debug.Assert(_attrValue != null);
                _attrValue.Append(array, offset, count);
            }

            int endPos = offset + count;
            int i = offset;
            char ch = (char)0;
            while (true)
            {
                int startPos = i;
                while (i < endPos && XmlCharType.IsAttributeValueChar(ch = array[i]))
                {
                    i++;
                }

                if (startPos < i)
                {
                    _textWriter.Write(array, startPos, i - startPos);
                }
                if (i == endPos)
                {
                    break;
                }

                switch (ch)
                {
                    case (char)0x9:
                        _textWriter.Write(ch);
                        break;
                    case (char)0xA:
                    case (char)0xD:
                        if (_inAttribute)
                        {
                            WriteCharEntityImpl(ch);
                        }
                        else
                        {
                            _textWriter.Write(ch);
                        }
                        break;

                    case '<':
                        WriteEntityRefImpl("lt");
                        break;
                    case '>':
                        WriteEntityRefImpl("gt");
                        break;
                    case '&':
                        WriteEntityRefImpl("amp");
                        break;
                    case '\'':
                        if (_inAttribute && _quoteChar == ch)
                        {
                            WriteEntityRefImpl("apos");
                        }
                        else
                        {
                            _textWriter.Write('\'');
                        }
                        break;
                    case '"':
                        if (_inAttribute && _quoteChar == ch)
                        {
                            WriteEntityRefImpl("quot");
                        }
                        else
                        {
                            _textWriter.Write('"');
                        }
                        break;
                    default:
                        if (XmlCharType.IsHighSurrogate(ch))
                        {
                            if (i + 1 < endPos)
                            {
                                WriteSurrogateChar(array[++i], ch);
                            }
                            else
                            {
                                throw new ArgumentException(SR.Xml_SurrogatePairSplit);
                            }
                        }
                        else if (XmlCharType.IsLowSurrogate(ch))
                        {
                            throw XmlConvert.CreateInvalidHighSurrogateCharException(ch);
                        }
                        else
                        {
                            Debug.Assert((ch < 0x20 && !XmlCharType.IsWhiteSpace(ch)) || (ch > 0xFFFD));
                            WriteCharEntityImpl(ch);
                        }
                        break;
                }
                i++;
            }
        }

        internal void WriteSurrogateCharEntity(char lowChar, char highChar)
        {
            if (!XmlCharType.IsLowSurrogate(lowChar) || !XmlCharType.IsHighSurrogate(highChar))
            {
                throw XmlConvert.CreateInvalidSurrogatePairException(lowChar, highChar);
            }

            int surrogateChar = XmlCharType.CombineSurrogateChar(lowChar, highChar);

            if (_cacheAttrValue)
            {
                Debug.Assert(_attrValue != null);
                _attrValue.Append(highChar);
                _attrValue.Append(lowChar);
            }

            Span<char> span = stackalloc char[12];
            int charsWritten = WriteCharToSpan(span, surrogateChar);
            _textWriter.Write(span.Slice(0, charsWritten));
        }

        internal void Write(ReadOnlySpan<char> text)
        {
            if (text.IsEmpty)
            {
                return;
            }

            if (_cacheAttrValue)
            {
                Debug.Assert(_attrValue != null);
                _attrValue.Append(text);
            }

            // scan through the string to see if there are any characters to be escaped
            int len = text.Length;
            int i = 0;
            int startPos = 0;
            char ch = (char)0;
            while (true)
            {
                while (i < len && XmlCharType.IsAttributeValueChar(ch = text[i]))
                {
                    i++;
                }

                if (i == len)
                {
                    // reached the end of the string -> write it whole out
                    _textWriter.Write(text);
                    return;
                }
                if (_inAttribute)
                {
                    if (ch == 0x9)
                    {
                        i++;
                        continue;
                    }
                }
                else
                {
                    if (ch == 0x9 || ch == 0xA || ch == 0xD || ch == '"' || ch == '\'')
                    {
                        i++;
                        continue;
                    }
                }
                // some character that needs to be escaped is found:
                break;
            }

            while (true)
            {
                if (startPos < i)
                {
                    _textWriter.Write(text.Slice(startPos, i - startPos));
                }

                if (i == len)
                {
                    break;
                }

                switch (ch)
                {
                    case (char)0x9:
                        _textWriter.Write(ch);
                        break;
                    case (char)0xA:
                    case (char)0xD:
                        if (_inAttribute)
                        {
                            WriteCharEntityImpl(ch);
                        }
                        else
                        {
                            _textWriter.Write(ch);
                        }
                        break;
                    case '<':
                        WriteEntityRefImpl("lt");
                        break;
                    case '>':
                        WriteEntityRefImpl("gt");
                        break;
                    case '&':
                        WriteEntityRefImpl("amp");
                        break;
                    case '\'':
                        if (_inAttribute && _quoteChar == ch)
                        {
                            WriteEntityRefImpl("apos");
                        }
                        else
                        {
                            _textWriter.Write('\'');
                        }
                        break;
                    case '"':
                        if (_inAttribute && _quoteChar == ch)
                        {
                            WriteEntityRefImpl("quot");
                        }
                        else
                        {
                            _textWriter.Write('"');
                        }
                        break;
                    default:
                        if (XmlCharType.IsHighSurrogate(ch))
                        {
                            if (i + 1 < len)
                            {
                                WriteSurrogateChar(text[++i], ch);
                            }
                            else
                            {
                                throw XmlConvert.CreateInvalidSurrogatePairException(text[i], ch);
                            }
                        }
                        else if (XmlCharType.IsLowSurrogate(ch))
                        {
                            throw XmlConvert.CreateInvalidHighSurrogateCharException(ch);
                        }
                        else
                        {
                            Debug.Assert((ch < 0x20 && !XmlCharType.IsWhiteSpace(ch)) || (ch > 0xFFFD));
                            WriteCharEntityImpl(ch);
                        }
                        break;
                }
                i++;
                startPos = i;
                while (i < len && XmlCharType.IsAttributeValueChar(ch = text[i]))
                {
                    i++;
                }
            }
        }

        internal void WriteRawWithSurrogateChecking(string text)
        {
            if (text == null)
            {
                return;
            }

            if (_cacheAttrValue)
            {
                Debug.Assert(_attrValue != null);
                _attrValue.Append(text);
            }

            int len = text.Length;
            int i = 0;
            char ch = (char)0;

            while (true)
            {
                while (i < len && (XmlCharType.IsCharData((ch = text[i])) || ch < 0x20))
                {
                    i++;
                }
                if (i == len)
                {
                    break;
                }
                if (XmlCharType.IsHighSurrogate(ch))
                {
                    if (i + 1 < len)
                    {
                        char lowChar = text[i + 1];
                        if (XmlCharType.IsLowSurrogate(lowChar))
                        {
                            i += 2;
                            continue;
                        }
                        else
                        {
                            throw XmlConvert.CreateInvalidSurrogatePairException(lowChar, ch);
                        }
                    }
                    throw new ArgumentException(SR.Xml_InvalidSurrogateMissingLowChar);
                }
                else if (XmlCharType.IsLowSurrogate(ch))
                {
                    throw XmlConvert.CreateInvalidHighSurrogateCharException(ch);
                }
                else
                {
                    i++;
                }
            }

            _textWriter.Write(text);
            return;
        }

        internal void WriteRaw(char[] array, int offset, int count)
        {
            if (null == array)
            {
                throw new ArgumentNullException(nameof(array));
            }

            if (0 > count)
            {
                throw new ArgumentOutOfRangeException(nameof(count));
            }

            if (0 > offset)
            {
                throw new ArgumentOutOfRangeException(nameof(offset));
            }

            if (count > array.Length - offset)
            {
                throw new ArgumentOutOfRangeException(nameof(count));
            }

            if (_cacheAttrValue)
            {
                Debug.Assert(_attrValue != null);
                _attrValue.Append(array, offset, count);
            }

            _textWriter.Write(array, offset, count);
        }

        internal void WriteCharEntity(char ch)
        {
            if (XmlCharType.IsSurrogate(ch))
            {
                throw new ArgumentException(SR.Xml_InvalidSurrogateMissingLowChar);
            }

            Span<char> span = stackalloc char[12];
            int charsWritten = WriteCharToSpan(span, ch);
            ReadOnlySpan<char> ros = span.Slice(0, charsWritten);

            if (_cacheAttrValue)
            {
                Debug.Assert(_attrValue != null);
                _attrValue.Append(ros);
            }

            _textWriter.Write(ros);
        }

        internal void WriteEntityRef(string name)
        {
            if (_cacheAttrValue)
            {
                Debug.Assert(_attrValue != null);
                _attrValue.Append('&');
                _attrValue.Append(name);
                _attrValue.Append(';');
            }

            WriteEntityRefImpl(name);
        }

        //
        // Private implementation methods
        //

        private void WriteCharEntityImpl(char ch)
        {
            Span<char> span = stackalloc char[12];
            int charsWritten = WriteCharToSpan(span, ch);
            _textWriter.Write(span.Slice(0, charsWritten));
        }

        private static int WriteCharToSpan(Span<char> destination, int ch)
        {
            Debug.Assert(destination.Length >= 12);
            destination[0] = '&';
            destination[1] = '#';
            destination[2] = 'x';
            ((uint)ch).TryFormat(destination.Slice(3), out int charsWritten, "X");
            Debug.Assert(charsWritten != 0);
            destination[charsWritten + 3] = ';';
            return charsWritten;
        }

        private void WriteEntityRefImpl(string name)
        {
            _textWriter.Write('&');
            _textWriter.Write(name);
            _textWriter.Write(';');
        }
    }
}

Method Mean Error StdDev Allocated
WriteEntity 128.7 ms 1.62 ms 1.36 ms 722 B

@kronic
Copy link
Contributor Author

kronic commented Dec 9, 2021

@stephentoub I understand that the percentage is small, but it is. I think my version is not much harder to understand, but it will remove the allocation and make the code faster and makes it possible to reuse it, I don't see anything wrong with that.

@kronic
Copy link
Contributor Author

kronic commented Dec 9, 2021

@stephentoub @krwq
I would like to finish already. it remains to choose an option. You decide which one?

@stephentoub
Copy link
Member

stephentoub commented Dec 9, 2021

You and I get very different numbers. How exactly are you building System.Private.Xml.dll with these changes and then running the benchmarks? Regardless, I deterministically see the TryWrite version as being at least 10% faster than what's currently in main, not slower as your numbers suggest. This is exactly the kind of scenario TryWriter is geared towards, enabling devs to write the simple thing and let the system generate the optimal set of calls under the covers. So I want anything that should be using to use it rather than trying to work around any perceived current perf limitations by open-coding a replacement; then when TryWrite improves, so too does all the code using it... we improve the primitive implementations, and everyone benefits. The code is also simpler, very clearly expressing the intent of what should be written.

If this were a super hot path, then yeah, it might be worth an exception to eek out a few more cycles of savings. But we're talking about an API that's already a corner-case, used fairly rarely, a very small percentage of time, and then within that we're talking about a small percentage of that small percentage of possible improvement in open-coding the solution vs just doing the simple thing with TryWrite. I would be very surprised if you were able to measure the impact of that difference on your 10-50GB workload; I'd welcome you trying it out and sharing the profiles that prove me wrong.

In the meantime, please use TryWrite, at the call site rather than in a helper.

Thanks.

@kronic
Copy link
Contributor Author

kronic commented Dec 10, 2021

@stephentoub

fist build

build clr+libs -c release -arch x64

rebuild

build -c release -arch x64 -projects src/libraries/*/System.Private.Xml.sln

run test

dotnet run -c release -f net6.0 -arch x64 --filter *Program* --corerun D:\GitHub\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\CoreRun.exe
Benchmark

using System.IO;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text;
using System.Xml;

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[module: SkipLocalsInit]

[MemoryDiagnoser]
public class Program
{
	private readonly char[] _chars = Enumerable.Range(char.MinValue, char.MaxValue).Select(x => (char) x)
		.Where(x => char.IsLetterOrDigit(x) || char.IsPunctuation(x) || char.IsSeparator(x))
		.ToArray();

	private readonly XmlWriter _xw = new XmlTextWriter(Stream.Null, Encoding.UTF8);
	public static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

	[Benchmark]
	public void WriteEntity()
	{
		var writer = _xw;

		for (var i = 0; i < 100; i++)
		{
			for (var j = 0; j < _chars.Length; j++)
			{
				writer.WriteCharEntity(_chars[j]);
			}
		}
	}
}

runtime

Method Mean Error StdDev Gen 0 Allocated
WriteEntity 159.6 ms 0.94 ms 0.83 ms 19000.0000 152 MB
Benchmark log

// Validating benchmarks:
// ***** BenchmarkRunner: Start   *****
// ***** Found 1 benchmark(s) in total *****
// ***** Building 1 exe(s) in Parallel: Start   *****
// start dotnet restore  /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true in D:\GitRepository\ConsoleApp2\ConsoleApp2\bin\release\net6.0\34c86f8a-8098-462d-a26a-7860fa1dff1a
// command took 0,92s and exited with 0
// start dotnet build -c Release  --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true in D:\GitRepository\ConsoleApp2\ConsoleApp2\bin\release\net6.0\34c86f8a-8098-462d-a26a-7860fa1dff1a
// command took 11,56s and exited with 1
// start dotnet build -c Release  --no-restore --no-dependencies /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true in D:\GitRepository\ConsoleApp2\ConsoleApp2\bin\release\net6.0\34c86f8a-8098-462d-a26a-7860fa1dff1a
// command took 1,53s and exited with 0
// ***** Done, took 00:00:14 (14.05 sec)   *****
// Found 1 benchmarks:
//   Program.WriteEntity: DefaultJob

// **************************
// Benchmark: Program.WriteEntity: DefaultJob
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet "34c86f8a-8098-462d-a26a-7860fa1dff1a.dll" --benchmarkName "Program.WriteEntity" --job "Default" --benchmarkId 0 in D:\GitRepository\ConsoleApp2\ConsoleApp2\bin\release\net6.0\34c86f8a-8098-462d-a26a-7860fa1dff1a\bin\Release\net6.0
// BeforeAnythingElse

// Benchmark Process Environment Information:
// Runtime=.NET 6.0.0 (6.0.21.52210), X64 RyuJIT
// GC=Concurrent Workstation
// Job: DefaultJob

OverheadJitting  1: 1 op, 143700.00 ns, 143.7000 us/op
WorkloadJitting  1: 1 op, 185894500.00 ns, 185.8945 ms/op

WorkloadPilot    1: 2 op, 325941800.00 ns, 162.9709 ms/op
WorkloadPilot    2: 3 op, 475318600.00 ns, 158.4395 ms/op
WorkloadPilot    3: 4 op, 648750000.00 ns, 162.1875 ms/op

WorkloadWarmup   1: 4 op, 636597400.00 ns, 159.1493 ms/op
WorkloadWarmup   2: 4 op, 639207600.00 ns, 159.8019 ms/op
WorkloadWarmup   3: 4 op, 638555600.00 ns, 159.6389 ms/op
WorkloadWarmup   4: 4 op, 635842000.00 ns, 158.9605 ms/op
WorkloadWarmup   5: 4 op, 638733100.00 ns, 159.6833 ms/op
WorkloadWarmup   6: 4 op, 642670300.00 ns, 160.6676 ms/op
WorkloadWarmup   7: 4 op, 647835000.00 ns, 161.9588 ms/op
WorkloadWarmup   8: 4 op, 638414300.00 ns, 159.6036 ms/op

// BeforeActualRun
WorkloadActual   1: 4 op, 639148700.00 ns, 159.7872 ms/op
WorkloadActual   2: 4 op, 635894200.00 ns, 158.9735 ms/op
WorkloadActual   3: 4 op, 637180000.00 ns, 159.2950 ms/op
WorkloadActual   4: 4 op, 635098100.00 ns, 158.7745 ms/op
WorkloadActual   5: 4 op, 638554500.00 ns, 159.6386 ms/op
WorkloadActual   6: 4 op, 638619300.00 ns, 159.6548 ms/op
WorkloadActual   7: 4 op, 640533700.00 ns, 160.1334 ms/op
WorkloadActual   8: 4 op, 636448700.00 ns, 159.1122 ms/op
WorkloadActual   9: 4 op, 641501000.00 ns, 160.3752 ms/op
WorkloadActual  10: 4 op, 637913600.00 ns, 159.4784 ms/op
WorkloadActual  11: 4 op, 646427900.00 ns, 161.6070 ms/op
WorkloadActual  12: 4 op, 632694700.00 ns, 158.1737 ms/op
WorkloadActual  13: 4 op, 641402900.00 ns, 160.3507 ms/op
WorkloadActual  14: 4 op, 638826900.00 ns, 159.7067 ms/op
WorkloadActual  15: 4 op, 671799600.00 ns, 167.9499 ms/op

// AfterActualRun
WorkloadResult   1: 4 op, 639148700.00 ns, 159.7872 ms/op
WorkloadResult   2: 4 op, 635894200.00 ns, 158.9735 ms/op
WorkloadResult   3: 4 op, 637180000.00 ns, 159.2950 ms/op
WorkloadResult   4: 4 op, 635098100.00 ns, 158.7745 ms/op
WorkloadResult   5: 4 op, 638554500.00 ns, 159.6386 ms/op
WorkloadResult   6: 4 op, 638619300.00 ns, 159.6548 ms/op
WorkloadResult   7: 4 op, 640533700.00 ns, 160.1334 ms/op
WorkloadResult   8: 4 op, 636448700.00 ns, 159.1122 ms/op
WorkloadResult   9: 4 op, 641501000.00 ns, 160.3752 ms/op
WorkloadResult  10: 4 op, 637913600.00 ns, 159.4784 ms/op
WorkloadResult  11: 4 op, 646427900.00 ns, 161.6070 ms/op
WorkloadResult  12: 4 op, 632694700.00 ns, 158.1737 ms/op
WorkloadResult  13: 4 op, 641402900.00 ns, 160.3507 ms/op
WorkloadResult  14: 4 op, 638826900.00 ns, 159.7067 ms/op
GC:  76 0 0 638835680 4
Threading:  0 0 4

// AfterAll
// Benchmark Process 20416 has exited with code 0.

Mean = 159.647 ms, StdErr = 0.222 ms (0.14%), N = 14, StdDev = 0.830 ms
Min = 158.174 ms, Q1 = 159.158 ms, Median = 159.647 ms, Q3 = 160.047 ms, Max = 161.607 ms
IQR = 0.889 ms, LowerFence = 157.824 ms, UpperFence = 161.380 ms
ConfidenceInterval = [158.711 ms; 160.583 ms] (CI 99.9%), Margin = 0.936 ms (0.59% of Mean)
Skewness = 0.5, Kurtosis = 3.15, MValue = 2

// ***** BenchmarkRunner: Finish  *****

// * Export *
  rch\results\Program-report.csv
  rch\results\Program-report-github.md
  rch\results\Program-report.html

// * Detailed results *
Program.WriteEntity: DefaultJob
Runtime = .NET 6.0.0 (6.0.21.52210), X64 RyuJIT; GC = Concurrent Workstation
Mean = 159.647 ms, StdErr = 0.222 ms (0.14%), N = 14, StdDev = 0.830 ms
Min = 158.174 ms, Q1 = 159.158 ms, Median = 159.647 ms, Q3 = 160.047 ms, Max = 161.607 ms
IQR = 0.889 ms, LowerFence = 157.824 ms, UpperFence = 161.380 ms
ConfidenceInterval = [158.711 ms; 160.583 ms] (CI 99.9%), Margin = 0.936 ms (0.59% of Mean)
Skewness = 0.5, Kurtosis = 3.15, MValue = 2
-------------------- Histogram --------------------
[157.722 ms ; 162.059 ms) | @@@@@@@@@@@@@@
---------------------------------------------------

// * Summary *

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
Intel Core i7-10700K CPU 3.80GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.100
  [Host]     : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
  DefaultJob : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT


|      Method |     Mean |   Error |  StdDev |      Gen 0 | Allocated |
|------------ |---------:|--------:|--------:|-----------:|----------:|
| WriteEntity | 159.6 ms | 0.94 ms | 0.83 ms | 19000.0000 |    152 MB |

// * Hints *
Outliers
  Program.WriteEntity: Default -> 1 outlier  was  removed (167.95 ms)

// * Legends *
  Mean      : Arithmetic mean of all measurements
  Error     : Half of 99.9% confidence interval
  StdDev    : Standard deviation of all measurements
  Gen 0     : GC Generation 0 collects per 1000 operations
  Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
  1 ms      : 1 Millisecond (0.001 sec)

// * Diagnostic Output - MemoryDiagnoser *


// ***** BenchmarkRunner: End *****
// ** Remained 0 benchmark(s) to run **
Run time: 00:00:17 (17.47 sec), executed benchmarks: 1

Global total time: 00:00:31 (31.53 sec), executed benchmarks: 1
// * Artifacts cleanup *

stephentoub version

Method Mean Error StdDev Allocated
WriteEntity 176.9 ms 1.85 ms 1.64 ms 931 B
Benchmark log

// Validating benchmarks:
// ***** BenchmarkRunner: Start   *****
// ***** Found 1 benchmark(s) in total *****
// ***** Building 1 exe(s) in Parallel: Start   *****
// start dotnet restore  /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true in D:\GitRepository\ConsoleApp2\ConsoleApp2\bin\release\net6.0\feb61c5f-5fad-4d84-ba50-4e0ea65e779c
// command took 0,92s and exited with 0
// start dotnet build -c Release  --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true in D:\GitRepository\ConsoleApp2\ConsoleApp2\bin\release\net6.0\feb61c5f-5fad-4d84-ba50-4e0ea65e779c
// command took 11,6s and exited with 1
// start dotnet build -c Release  --no-restore --no-dependencies /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true in D:\GitRepository\ConsoleApp2\ConsoleApp2\bin\release\net6.0\feb61c5f-5fad-4d84-ba50-4e0ea65e779c
// command took 1,57s and exited with 0
// start dotnet publish -c Release  --no-build --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true in D:\GitRepository\ConsoleApp2\ConsoleApp2\bin\release\net6.0\feb61c5f-5fad-4d84-ba50-4e0ea65e779c
// command took 0,8s and exited with 0
// ***** Done, took 00:00:15 (15.38 sec)   *****
// Found 1 benchmarks:
//   Program.WriteEntity: Job-AENGIS(Toolchain=CoreRun)

// **************************
// Benchmark: Program.WriteEntity: Job-AENGIS(Toolchain=CoreRun)
// *** Execute ***
// Launch: 1 / 1
// Execute: D:\GitHub\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\2d86707d-2f33-4fac-9ddc-c50b6022937b\CoreRun.exe "feb61c5f-5fad-4d84-ba50-4e0ea65e779c.dll" --benchmarkName "Program.WriteEntity" --job "Toolchain=CoreRun" --benchmarkId 0 in D:\GitRepository\ConsoleApp2\ConsoleApp2\bin\release\net6.0\feb61c5f-5fad-4d84-ba50-4e0ea65e779c\bin\Release\net6.0\publish
// BeforeAnythingElse

// Benchmark Process Environment Information:
// Runtime=.NET 7.0.0 (42.42.42.42424), X64 RyuJIT
// GC=Concurrent Workstation
// Job: DefaultJob

OverheadJitting  1: 1 op, 167700.00 ns, 167.7000 us/op
WorkloadJitting  1: 1 op, 295519200.00 ns, 295.5192 ms/op

WorkloadPilot    1: 2 op, 357981900.00 ns, 178.9909 ms/op
WorkloadPilot    2: 3 op, 533923300.00 ns, 177.9744 ms/op

WorkloadWarmup   1: 3 op, 543323600.00 ns, 181.1079 ms/op
WorkloadWarmup   2: 3 op, 540817100.00 ns, 180.2724 ms/op
WorkloadWarmup   3: 3 op, 543364400.00 ns, 181.1215 ms/op
WorkloadWarmup   4: 3 op, 538547700.00 ns, 179.5159 ms/op
WorkloadWarmup   5: 3 op, 564461700.00 ns, 188.1539 ms/op
WorkloadWarmup   6: 3 op, 548004200.00 ns, 182.6681 ms/op

// BeforeActualRun
WorkloadActual   1: 3 op, 533261100.00 ns, 177.7537 ms/op
WorkloadActual   2: 3 op, 529441000.00 ns, 176.4803 ms/op
WorkloadActual   3: 3 op, 525278600.00 ns, 175.0929 ms/op
WorkloadActual   4: 3 op, 531574200.00 ns, 177.1914 ms/op
WorkloadActual   5: 3 op, 526571000.00 ns, 175.5237 ms/op
WorkloadActual   6: 3 op, 533311400.00 ns, 177.7705 ms/op
WorkloadActual   7: 3 op, 527909300.00 ns, 175.9698 ms/op
WorkloadActual   8: 3 op, 536006200.00 ns, 178.6687 ms/op
WorkloadActual   9: 3 op, 536299500.00 ns, 178.7665 ms/op
WorkloadActual  10: 3 op, 531637600.00 ns, 177.2125 ms/op
WorkloadActual  11: 3 op, 521450900.00 ns, 173.8170 ms/op
WorkloadActual  12: 3 op, 527291400.00 ns, 175.7638 ms/op
WorkloadActual  13: 3 op, 539889700.00 ns, 179.9632 ms/op
WorkloadActual  14: 3 op, 552511900.00 ns, 184.1706 ms/op
WorkloadActual  15: 3 op, 529025200.00 ns, 176.3417 ms/op

// AfterActualRun
WorkloadResult   1: 3 op, 533261100.00 ns, 177.7537 ms/op
WorkloadResult   2: 3 op, 529441000.00 ns, 176.4803 ms/op
WorkloadResult   3: 3 op, 525278600.00 ns, 175.0929 ms/op
WorkloadResult   4: 3 op, 531574200.00 ns, 177.1914 ms/op
WorkloadResult   5: 3 op, 526571000.00 ns, 175.5237 ms/op
WorkloadResult   6: 3 op, 533311400.00 ns, 177.7705 ms/op
WorkloadResult   7: 3 op, 527909300.00 ns, 175.9698 ms/op
WorkloadResult   8: 3 op, 536006200.00 ns, 178.6687 ms/op
WorkloadResult   9: 3 op, 536299500.00 ns, 178.7665 ms/op
WorkloadResult  10: 3 op, 531637600.00 ns, 177.2125 ms/op
WorkloadResult  11: 3 op, 521450900.00 ns, 173.8170 ms/op
WorkloadResult  12: 3 op, 527291400.00 ns, 175.7638 ms/op
WorkloadResult  13: 3 op, 539889700.00 ns, 179.9632 ms/op
WorkloadResult  14: 3 op, 529025200.00 ns, 176.3417 ms/op
GC:  0 0 0 2792 3
Threading:  0 0 3

// AfterAll
// Benchmark Process 7928 has exited with code 0.

Mean = 176.880 ms, StdErr = 0.438 ms (0.25%), N = 14, StdDev = 1.640 ms
Min = 173.817 ms, Q1 = 175.815 ms, Median = 176.836 ms, Q3 = 177.766 ms, Max = 179.963 ms
IQR = 1.951 ms, LowerFence = 172.889 ms, UpperFence = 180.693 ms
ConfidenceInterval = [175.030 ms; 178.730 ms] (CI 99.9%), Margin = 1.850 ms (1.05% of Mean)
Skewness = 0.06, Kurtosis = 2.16, MValue = 2

// ***** BenchmarkRunner: Finish  *****

// * Export *
  rch\results\Program-report.csv
  rch\results\Program-report-github.md
  rch\results\Program-report.html

// * Detailed results *
Program.WriteEntity: Job-AENGIS(Toolchain=CoreRun)
Runtime = .NET 7.0.0 (42.42.42.42424), X64 RyuJIT; GC = Concurrent Workstation
Mean = 176.880 ms, StdErr = 0.438 ms (0.25%), N = 14, StdDev = 1.640 ms
Min = 173.817 ms, Q1 = 175.815 ms, Median = 176.836 ms, Q3 = 177.766 ms, Max = 179.963 ms
IQR = 1.951 ms, LowerFence = 172.889 ms, UpperFence = 180.693 ms
ConfidenceInterval = [175.030 ms; 178.730 ms] (CI 99.9%), Margin = 1.850 ms (1.05% of Mean)
Skewness = 0.06, Kurtosis = 2.16, MValue = 2
-------------------- Histogram --------------------
[172.924 ms ; 180.856 ms) | @@@@@@@@@@@@@@
---------------------------------------------------

// * Summary *

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
Intel Core i7-10700K CPU 3.80GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.100
  [Host]     : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
  Job-AENGIS : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT

Toolchain=CoreRun

|      Method |     Mean |   Error |  StdDev | Allocated |
|------------ |---------:|--------:|--------:|----------:|
| WriteEntity | 176.9 ms | 1.85 ms | 1.64 ms |     931 B |

// * Hints *
Outliers
  Program.WriteEntity: Toolchain=CoreRun -> 1 outlier  was  removed (184.17 ms)

// * Legends *
  Mean      : Arithmetic mean of all measurements
  Error     : Half of 99.9% confidence interval
  StdDev    : Standard deviation of all measurements
  Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
  1 ms      : 1 Millisecond (0.001 sec)

// * Diagnostic Output - MemoryDiagnoser *


// ***** BenchmarkRunner: End *****
// ** Remained 0 benchmark(s) to run **
Run time: 00:00:13 (13.61 sec), executed benchmarks: 1

Global total time: 00:00:28 (29 sec), executed benchmarks: 1
// * Artifacts cleanup *

my version

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.IO;
using System.Text;
using System.Diagnostics;

namespace System.Xml
{
    // XmlTextEncoder
    //
    // This class does special handling of text content for XML.  For example
    // it will replace special characters with entities whenever necessary.
    internal sealed class XmlTextEncoder
    {
        //
        // Fields
        //
        // output text writer
        private readonly TextWriter _textWriter;

        // true when writing out the content of attribute value
        private bool _inAttribute;

        // quote char of the attribute (when inAttribute)
        private char _quoteChar;

        // caching of attribute value
        private StringBuilder? _attrValue;
        private bool _cacheAttrValue;

        //
        // Constructor
        //
        internal XmlTextEncoder(TextWriter textWriter)
        {
            _textWriter = textWriter;
            _quoteChar = '"';
        }

        //
        // Internal methods and properties
        //
        internal char QuoteChar
        {
            set
            {
                _quoteChar = value;
            }
        }

        internal void StartAttribute(bool cacheAttrValue)
        {
            _inAttribute = true;
            _cacheAttrValue = cacheAttrValue;
            if (cacheAttrValue)
            {
                if (_attrValue == null)
                {
                    _attrValue = new StringBuilder();
                }
                else
                {
                    _attrValue.Length = 0;
                }
            }
        }

        internal void EndAttribute()
        {
            if (_cacheAttrValue)
            {
                Debug.Assert(_attrValue != null);
                _attrValue.Length = 0;
            }

            _inAttribute = false;
            _cacheAttrValue = false;
        }

        internal string AttributeValue
        {
            get
            {
                if (_cacheAttrValue)
                {
                    Debug.Assert(_attrValue != null);
                    return _attrValue.ToString();
                }
                else
                {
                    return string.Empty;
                }
            }
        }

        internal void WriteSurrogateChar(char lowChar, char highChar)
        {
            if (!XmlCharType.IsLowSurrogate(lowChar) ||
                 !XmlCharType.IsHighSurrogate(highChar))
            {
                throw XmlConvert.CreateInvalidSurrogatePairException(lowChar, highChar);
            }

            _textWriter.Write(highChar);
            _textWriter.Write(lowChar);
        }

        internal void Write(char[] array, int offset, int count)
        {
            if (null == array)
            {
                throw new ArgumentNullException(nameof(array));
            }

            if (0 > offset)
            {
                throw new ArgumentOutOfRangeException(nameof(offset));
            }

            if (0 > count)
            {
                throw new ArgumentOutOfRangeException(nameof(count));
            }

            if (count > array.Length - offset)
            {
                throw new ArgumentOutOfRangeException(nameof(count));
            }

            if (_cacheAttrValue)
            {
                Debug.Assert(_attrValue != null);
                _attrValue.Append(array, offset, count);
            }

            int endPos = offset + count;
            int i = offset;
            char ch = (char)0;
            while (true)
            {
                int startPos = i;
                while (i < endPos && XmlCharType.IsAttributeValueChar(ch = array[i]))
                {
                    i++;
                }

                if (startPos < i)
                {
                    _textWriter.Write(array, startPos, i - startPos);
                }
                if (i == endPos)
                {
                    break;
                }

                switch (ch)
                {
                    case (char)0x9:
                        _textWriter.Write(ch);
                        break;
                    case (char)0xA:
                    case (char)0xD:
                        if (_inAttribute)
                        {
                            WriteCharEntityImpl(ch);
                        }
                        else
                        {
                            _textWriter.Write(ch);
                        }
                        break;

                    case '<':
                        WriteEntityRefImpl("lt");
                        break;
                    case '>':
                        WriteEntityRefImpl("gt");
                        break;
                    case '&':
                        WriteEntityRefImpl("amp");
                        break;
                    case '\'':
                        if (_inAttribute && _quoteChar == ch)
                        {
                            WriteEntityRefImpl("apos");
                        }
                        else
                        {
                            _textWriter.Write('\'');
                        }
                        break;
                    case '"':
                        if (_inAttribute && _quoteChar == ch)
                        {
                            WriteEntityRefImpl("quot");
                        }
                        else
                        {
                            _textWriter.Write('"');
                        }
                        break;
                    default:
                        if (XmlCharType.IsHighSurrogate(ch))
                        {
                            if (i + 1 < endPos)
                            {
                                WriteSurrogateChar(array[++i], ch);
                            }
                            else
                            {
                                throw new ArgumentException(SR.Xml_SurrogatePairSplit);
                            }
                        }
                        else if (XmlCharType.IsLowSurrogate(ch))
                        {
                            throw XmlConvert.CreateInvalidHighSurrogateCharException(ch);
                        }
                        else
                        {
                            Debug.Assert((ch < 0x20 && !XmlCharType.IsWhiteSpace(ch)) || (ch > 0xFFFD));
                            WriteCharEntityImpl(ch);
                        }
                        break;
                }
                i++;
            }
        }

        internal void WriteSurrogateCharEntity(char lowChar, char highChar)
        {
            if (!XmlCharType.IsLowSurrogate(lowChar) || !XmlCharType.IsHighSurrogate(highChar))
            {
                throw XmlConvert.CreateInvalidSurrogatePairException(lowChar, highChar);
            }

            int surrogateChar = XmlCharType.CombineSurrogateChar(lowChar, highChar);

            if (_cacheAttrValue)
            {
                Debug.Assert(_attrValue != null);
                _attrValue.Append(highChar);
                _attrValue.Append(lowChar);
            }

            Span<char> span = stackalloc char[12];
            int charsWritten = WriteCharToSpan(span, surrogateChar);
            _textWriter.Write(span.Slice(0, charsWritten));
        }

        internal void Write(ReadOnlySpan<char> text)
        {
            if (text.IsEmpty)
            {
                return;
            }

            if (_cacheAttrValue)
            {
                Debug.Assert(_attrValue != null);
                _attrValue.Append(text);
            }

            // scan through the string to see if there are any characters to be escaped
            int len = text.Length;
            int i = 0;
            int startPos = 0;
            char ch = (char)0;
            while (true)
            {
                while (i < len && XmlCharType.IsAttributeValueChar(ch = text[i]))
                {
                    i++;
                }

                if (i == len)
                {
                    // reached the end of the string -> write it whole out
                    _textWriter.Write(text);
                    return;
                }
                if (_inAttribute)
                {
                    if (ch == 0x9)
                    {
                        i++;
                        continue;
                    }
                }
                else
                {
                    if (ch == 0x9 || ch == 0xA || ch == 0xD || ch == '"' || ch == '\'')
                    {
                        i++;
                        continue;
                    }
                }
                // some character that needs to be escaped is found:
                break;
            }

            while (true)
            {
                if (startPos < i)
                {
                    _textWriter.Write(text.Slice(startPos, i - startPos));
                }

                if (i == len)
                {
                    break;
                }

                switch (ch)
                {
                    case (char)0x9:
                        _textWriter.Write(ch);
                        break;
                    case (char)0xA:
                    case (char)0xD:
                        if (_inAttribute)
                        {
                            WriteCharEntityImpl(ch);
                        }
                        else
                        {
                            _textWriter.Write(ch);
                        }
                        break;
                    case '<':
                        WriteEntityRefImpl("lt");
                        break;
                    case '>':
                        WriteEntityRefImpl("gt");
                        break;
                    case '&':
                        WriteEntityRefImpl("amp");
                        break;
                    case '\'':
                        if (_inAttribute && _quoteChar == ch)
                        {
                            WriteEntityRefImpl("apos");
                        }
                        else
                        {
                            _textWriter.Write('\'');
                        }
                        break;
                    case '"':
                        if (_inAttribute && _quoteChar == ch)
                        {
                            WriteEntityRefImpl("quot");
                        }
                        else
                        {
                            _textWriter.Write('"');
                        }
                        break;
                    default:
                        if (XmlCharType.IsHighSurrogate(ch))
                        {
                            if (i + 1 < len)
                            {
                                WriteSurrogateChar(text[++i], ch);
                            }
                            else
                            {
                                throw XmlConvert.CreateInvalidSurrogatePairException(text[i], ch);
                            }
                        }
                        else if (XmlCharType.IsLowSurrogate(ch))
                        {
                            throw XmlConvert.CreateInvalidHighSurrogateCharException(ch);
                        }
                        else
                        {
                            Debug.Assert((ch < 0x20 && !XmlCharType.IsWhiteSpace(ch)) || (ch > 0xFFFD));
                            WriteCharEntityImpl(ch);
                        }
                        break;
                }
                i++;
                startPos = i;
                while (i < len && XmlCharType.IsAttributeValueChar(ch = text[i]))
                {
                    i++;
                }
            }
        }

        internal void WriteRawWithSurrogateChecking(string text)
        {
            if (text == null)
            {
                return;
            }

            if (_cacheAttrValue)
            {
                Debug.Assert(_attrValue != null);
                _attrValue.Append(text);
            }

            int len = text.Length;
            int i = 0;
            char ch = (char)0;

            while (true)
            {
                while (i < len && (XmlCharType.IsCharData((ch = text[i])) || ch < 0x20))
                {
                    i++;
                }
                if (i == len)
                {
                    break;
                }
                if (XmlCharType.IsHighSurrogate(ch))
                {
                    if (i + 1 < len)
                    {
                        char lowChar = text[i + 1];
                        if (XmlCharType.IsLowSurrogate(lowChar))
                        {
                            i += 2;
                            continue;
                        }
                        else
                        {
                            throw XmlConvert.CreateInvalidSurrogatePairException(lowChar, ch);
                        }
                    }
                    throw new ArgumentException(SR.Xml_InvalidSurrogateMissingLowChar);
                }
                else if (XmlCharType.IsLowSurrogate(ch))
                {
                    throw XmlConvert.CreateInvalidHighSurrogateCharException(ch);
                }
                else
                {
                    i++;
                }
            }

            _textWriter.Write(text);
            return;
        }

        internal void WriteRaw(char[] array, int offset, int count)
        {
            if (null == array)
            {
                throw new ArgumentNullException(nameof(array));
            }

            if (0 > count)
            {
                throw new ArgumentOutOfRangeException(nameof(count));
            }

            if (0 > offset)
            {
                throw new ArgumentOutOfRangeException(nameof(offset));
            }

            if (count > array.Length - offset)
            {
                throw new ArgumentOutOfRangeException(nameof(count));
            }

            if (_cacheAttrValue)
            {
                Debug.Assert(_attrValue != null);
                _attrValue.Append(array, offset, count);
            }

            _textWriter.Write(array, offset, count);
        }

        internal void WriteCharEntity(char ch)
        {
            if (XmlCharType.IsSurrogate(ch))
            {
                throw new ArgumentException(SR.Xml_InvalidSurrogateMissingLowChar);
            }

            Span<char> span = stackalloc char[12];
            int charsWritten = WriteCharToSpan(span, ch);
            ReadOnlySpan<char> ros = span.Slice(0, charsWritten);

            if (_cacheAttrValue)
            {
                Debug.Assert(_attrValue != null);
                _attrValue.Append(ros);
            }

            _textWriter.Write(ros);
        }

        internal void WriteEntityRef(string name)
        {
            if (_cacheAttrValue)
            {
                Debug.Assert(_attrValue != null);
                _attrValue.Append('&');
                _attrValue.Append(name);
                _attrValue.Append(';');
            }

            WriteEntityRefImpl(name);
        }

        //
        // Private implementation methods
        //

        private void WriteCharEntityImpl(char ch)
        {
            Span<char> span = stackalloc char[12];
            int charsWritten = WriteCharToSpan(span, ch);
            _textWriter.Write(span.Slice(0, charsWritten));
        }

        private static int WriteCharToSpan(Span<char> destination, int ch)
        {
            Debug.Assert(destination.Length >= 12);
            destination[0] = '&';
            destination[1] = '#';
            destination[2] = 'x';
            ((uint)ch).TryFormat(destination.Slice(3), out int charsWritten, "X");
            Debug.Assert(charsWritten != 0);
            destination[charsWritten + 3] = ';';
            return charsWritten;
        }

        private void WriteEntityRefImpl(string name)
        {
            _textWriter.Write('&');
            _textWriter.Write(name);
            _textWriter.Write(';');
        }
    }
}

Method Mean Error StdDev Allocated
WriteEntity 128.6 ms 1.73 ms 1.61 ms 722 B
Benchmark log

// Validating benchmarks:
// ***** BenchmarkRunner: Start   *****
// ***** Found 1 benchmark(s) in total *****
// ***** Building 1 exe(s) in Parallel: Start   *****
// start dotnet restore  /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true in D:\GitRepository\ConsoleApp2\ConsoleApp2\bin\release\net6.0\b57f3715-7688-41f3-b00c-133f2c903acd
// command took 0,93s and exited with 0
// start dotnet build -c Release  --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true in D:\GitRepository\ConsoleApp2\ConsoleApp2\bin\release\net6.0\b57f3715-7688-41f3-b00c-133f2c903acd
// command took 11,59s and exited with 1
// start dotnet build -c Release  --no-restore --no-dependencies /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true in D:\GitRepository\ConsoleApp2\ConsoleApp2\bin\release\net6.0\b57f3715-7688-41f3-b00c-133f2c903acd
// command took 1,92s and exited with 0
// start dotnet publish -c Release  --no-build --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true in D:\GitRepository\ConsoleApp2\ConsoleApp2\bin\release\net6.0\b57f3715-7688-41f3-b00c-133f2c903acd
// command took 0,68s and exited with 0
// ***** Done, took 00:00:15 (15.49 sec)   *****
// Found 1 benchmarks:
//   Program.WriteEntity: Job-HORVZT(Toolchain=CoreRun)

// **************************
// Benchmark: Program.WriteEntity: Job-HORVZT(Toolchain=CoreRun)
// *** Execute ***
// Launch: 1 / 1
// Execute: D:\GitHub\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\f5a3dc8c-278a-4e9a-8926-e12e05f3837d\CoreRun.exe "b57f3715-7688-41f3-b00c-133f2c903acd.dll" --benchmarkName "Program.WriteEntity" --job "Toolchain=CoreRun" --benchmarkId 0 in D:\GitRepository\ConsoleApp2\ConsoleApp2\bin\release\net6.0\b57f3715-7688-41f3-b00c-133f2c903acd\bin\Release\net6.0\publish
// BeforeAnythingElse

// Benchmark Process Environment Information:
// Runtime=.NET 7.0.0 (42.42.42.42424), X64 RyuJIT
// GC=Concurrent Workstation
// Job: DefaultJob

OverheadJitting  1: 1 op, 166400.00 ns, 166.4000 us/op
WorkloadJitting  1: 1 op, 212323400.00 ns, 212.3234 ms/op

WorkloadPilot    1: 2 op, 270237400.00 ns, 135.1187 ms/op
WorkloadPilot    2: 3 op, 408142000.00 ns, 136.0473 ms/op
WorkloadPilot    3: 4 op, 541811100.00 ns, 135.4528 ms/op

WorkloadWarmup   1: 4 op, 508142700.00 ns, 127.0357 ms/op
WorkloadWarmup   2: 4 op, 504768500.00 ns, 126.1921 ms/op
WorkloadWarmup   3: 4 op, 514149400.00 ns, 128.5374 ms/op
WorkloadWarmup   4: 4 op, 511812200.00 ns, 127.9531 ms/op
WorkloadWarmup   5: 4 op, 512746900.00 ns, 128.1867 ms/op
WorkloadWarmup   6: 4 op, 516877200.00 ns, 129.2193 ms/op
WorkloadWarmup   7: 4 op, 510244200.00 ns, 127.5611 ms/op

// BeforeActualRun
WorkloadActual   1: 4 op, 514199900.00 ns, 128.5500 ms/op
WorkloadActual   2: 4 op, 515452300.00 ns, 128.8631 ms/op
WorkloadActual   3: 4 op, 509163600.00 ns, 127.2909 ms/op
WorkloadActual   4: 4 op, 511126100.00 ns, 127.7815 ms/op
WorkloadActual   5: 4 op, 504025200.00 ns, 126.0063 ms/op
WorkloadActual   6: 4 op, 509890100.00 ns, 127.4725 ms/op
WorkloadActual   7: 4 op, 504829900.00 ns, 126.2075 ms/op
WorkloadActual   8: 4 op, 508241100.00 ns, 127.0603 ms/op
WorkloadActual   9: 4 op, 515878100.00 ns, 128.9695 ms/op
WorkloadActual  10: 4 op, 523312100.00 ns, 130.8280 ms/op
WorkloadActual  11: 4 op, 515320000.00 ns, 128.8300 ms/op
WorkloadActual  12: 4 op, 524708000.00 ns, 131.1770 ms/op
WorkloadActual  13: 4 op, 515095500.00 ns, 128.7739 ms/op
WorkloadActual  14: 4 op, 523419900.00 ns, 130.8550 ms/op
WorkloadActual  15: 4 op, 518535800.00 ns, 129.6339 ms/op

// AfterActualRun
WorkloadResult   1: 4 op, 514199900.00 ns, 128.5500 ms/op
WorkloadResult   2: 4 op, 515452300.00 ns, 128.8631 ms/op
WorkloadResult   3: 4 op, 509163600.00 ns, 127.2909 ms/op
WorkloadResult   4: 4 op, 511126100.00 ns, 127.7815 ms/op
WorkloadResult   5: 4 op, 504025200.00 ns, 126.0063 ms/op
WorkloadResult   6: 4 op, 509890100.00 ns, 127.4725 ms/op
WorkloadResult   7: 4 op, 504829900.00 ns, 126.2075 ms/op
WorkloadResult   8: 4 op, 508241100.00 ns, 127.0603 ms/op
WorkloadResult   9: 4 op, 515878100.00 ns, 128.9695 ms/op
WorkloadResult  10: 4 op, 523312100.00 ns, 130.8280 ms/op
WorkloadResult  11: 4 op, 515320000.00 ns, 128.8300 ms/op
WorkloadResult  12: 4 op, 524708000.00 ns, 131.1770 ms/op
WorkloadResult  13: 4 op, 515095500.00 ns, 128.7739 ms/op
WorkloadResult  14: 4 op, 523419900.00 ns, 130.8550 ms/op
WorkloadResult  15: 4 op, 518535800.00 ns, 129.6339 ms/op
GC:  0 0 0 2888 4
Threading:  0 0 4

// AfterAll
// Benchmark Process 23612 has exited with code 0.

Mean = 128.553 ms, StdErr = 0.417 ms (0.32%), N = 15, StdDev = 1.615 ms
Min = 126.006 ms, Q1 = 127.382 ms, Median = 128.774 ms, Q3 = 129.302 ms, Max = 131.177 ms
IQR = 1.920 ms, LowerFence = 124.502 ms, UpperFence = 132.182 ms
ConfidenceInterval = [126.827 ms; 130.279 ms] (CI 99.9%), Margin = 1.726 ms (1.34% of Mean)
Skewness = 0.11, Kurtosis = 1.82, MValue = 2

// ***** BenchmarkRunner: Finish  *****

// * Export *
  rch\results\Program-report.csv
  rch\results\Program-report-github.md
  rch\results\Program-report.html

// * Detailed results *
Program.WriteEntity: Job-HORVZT(Toolchain=CoreRun)
Runtime = .NET 7.0.0 (42.42.42.42424), X64 RyuJIT; GC = Concurrent Workstation
Mean = 128.553 ms, StdErr = 0.417 ms (0.32%), N = 15, StdDev = 1.615 ms
Min = 126.006 ms, Q1 = 127.382 ms, Median = 128.774 ms, Q3 = 129.302 ms, Max = 131.177 ms
IQR = 1.920 ms, LowerFence = 124.502 ms, UpperFence = 132.182 ms
ConfidenceInterval = [126.827 ms; 130.279 ms] (CI 99.9%), Margin = 1.726 ms (1.34% of Mean)
Skewness = 0.11, Kurtosis = 1.82, MValue = 2
-------------------- Histogram --------------------
[125.147 ms ; 131.710 ms) | @@@@@@@@@@@@@@@
---------------------------------------------------

// * Summary *

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
Intel Core i7-10700K CPU 3.80GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.100
  [Host]     : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
  Job-HORVZT : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT

Toolchain=CoreRun

|      Method |     Mean |   Error |  StdDev | Allocated |
|------------ |---------:|--------:|--------:|----------:|
| WriteEntity | 128.6 ms | 1.73 ms | 1.61 ms |     722 B |

// * Legends *
  Mean      : Arithmetic mean of all measurements
  Error     : Half of 99.9% confidence interval
  StdDev    : Standard deviation of all measurements
  Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
  1 ms      : 1 Millisecond (0.001 sec)

// * Diagnostic Output - MemoryDiagnoser *


// ***** BenchmarkRunner: End *****
// ** Remained 0 benchmark(s) to run **
Run time: 00:00:13 (13.76 sec), executed benchmarks: 1

Global total time: 00:00:29 (29.26 sec), executed benchmarks: 1
// * Artifacts cleanup *

@kronic
Copy link
Contributor Author

kronic commented Dec 10, 2021

@stephentoub
This method is also used
XmlTextWriter.WriteChars
XmlTextWriter.WriteSurrogateCharEntity
XmlTextWriter.WriteWhitespace
XmlTextWriter.WriteString
XmlTextWriter.WriteEndStartTag
and indirectly in many places

@kronic
Copy link
Contributor Author

kronic commented Jan 13, 2022

@stephentoub @krwq ping

@krwq
Copy link
Member

krwq commented Feb 11, 2022

@stephentoub can you please verify @kronic's method of running benchmark and see if this is what you expected?

@stephentoub
Copy link
Member

Thanks, but my opinion hasn't changed from the previous times I've shared it.

First, the code that I've repeatedly asked be used (which still isn't in the PR) using TryWrite is simpler and easier to prove is correct... case in point, there's currently a bug in the WriteCharToSpan in this PR:

private static int WriteCharToSpan(Span<char> destination, int ch)
{
Debug.Assert(destination.Length >= 12);
destination[0] = '&';
destination[1] = '#';
destination[2] = 'x';
((uint)ch).TryFormat(destination.Slice(3), out int charsWritten, "X");
Debug.Assert(charsWritten != 0);
destination[charsWritten + 3] = ';';
return charsWritten;
}

which also means the previously shared benchmark numbers are off.

Second, the fact that there is such a bug and CI passed on this PR highlights that there isn't sufficient testing of this code in place to warrant trying to optimize at the nanosecond level.

Third, this code isn't on hot enough paths to warrant such micro-optimization.

And even if it were, fourth, on my machine the difference between a corrected version of the code in this PR and the TryWrite version is currently only 9 nanoseconds on .NET 7 on my machine.

Finally, using the shared routine in TryWrite rather than open-coding it means any improvements we subsequently make to TryWrite (e.g. adding a special-case for 3-char literals and not just 2-char literals) will accrue here as well and shrink the gap even further.

If we want to make a change here (which I'm fine with), please follow my earlier requests to use the form span.TryWrite($"&#x{(uint)ch:X};", out int charsWritten) and also ensure these paths are covered by tests that would fail with the code currently in this PR. Otherwise, please close the PR.

Thank you.

@kronic kronic closed this Feb 11, 2022
@kronic
Copy link
Contributor Author

kronic commented Feb 11, 2022

@stephentoub I don't understand what is wrong here?

@stephentoub
Copy link
Member

I don't understand what is wrong here?

It's cutting off the end by returning charsWritten rather than charsWritten + 4.

@kronic
Copy link
Contributor Author

kronic commented Feb 11, 2022

@stephentoub thanks.

@kronic
Copy link
Contributor Author

kronic commented Feb 11, 2022

@stephentoub @krwq i'll work on that a bit later

@ghost ghost locked as resolved and limited conversation to collaborators Mar 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Xml community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants