Remove string allocation from XmlTextEncoder.WriteCharEntity #61774
Tagging subscribers to this area: @dotnet/area-system-xml
|
LGTM |
I've marked this PR as "needs more info" for now. We'd like to better understand the risk/reward for the series of XML changes. See #61773 (comment). Thanks! |
@kronic How's the performance of @stephentoub Do we really need to return the rented array in case of an exception? Does it outweigh the cost of the try/finally? |
@drieseng I don't know how TextWriter.Write(ReadOnlySpan<char>) performs. Any such problems are in the TextWriter code. |
@drieseng Why use try/finally if, when an exception occurs, the rented array will be reclaimed by a finalizer anyway? |
@kronic I understand that, from your point of view, you're not touching TextWriter so its performance characteristics are not your concern. But you are switching from using About the array renting: I'm not sure what the general policy is for rented arrays in case of an exception flow (that did not have a try/catch or try/finally when the renting was introduced). |
@drieseng As far as I can tell from the code, try/finally is not used. |
Benchmark
using System;
using System.Globalization;
using System.IO;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class StringWriterBenchmark
{
    [Params('3', ')', '[')]
    public char Ch { get; set; }

    // Formats the code point as hex into a stack buffer and writes the span (no string allocation).
    [Benchmark]
    public void Span()
    {
        using StringWriter stringWriter = new();
        Span<char> span = stackalloc char[8];
        ((int) Ch).TryFormat(span, out var charsWritten, "X", NumberFormatInfo.InvariantInfo);
        stringWriter.Write(span.Slice(0, charsWritten));
    }

    // Baseline: allocates an intermediate string via ToString.
    [Benchmark]
    public void String()
    {
        using StringWriter stringWriter = new();
        stringWriter.Write(((int) Ch).ToString("X", NumberFormatInfo.InvariantInfo));
    }
} |
@drieseng, what try/finally and rented array are you talking about? |
@stephentoub I'm talking about the base implementation in TextWriter here. @kronic You're not testing the |
There's a rental in |
@drieseng TextWriter is an abstract class. How do I test it? |
We're not strict about returning in the case of exception, but I'd be surprised if it made a meaningful difference in this case to avoid the try/finally. The primary cost of a try/finally is preventing inlining, which this virtual is very unlikely to be when used via the XML APIs. On top of that, any TextWriter-derived type that cares about eking out every last ounce of perf needs to override this method, which both StringWriter and StreamWriter do. |
A direct call to the ROS overload on what? It's the base implementation of Write(ROS); that can't call itself. |
Yeah, being abstract it isn't going to call that, it's a fallback. As you said any implementation that cares about perf will override the ROS. I don't see a problem. |
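The point about overriding the span overload can be illustrated with a minimal sketch. `SpanAwareWriter` is a hypothetical type invented for this example and is not part of the PR; it assumes, as the thread describes, that the base `TextWriter.Write(ReadOnlySpan<char>)` is only a fallback that copies through a rented array, so a derived writer that cares about performance overrides it.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical writer for illustration only; not part of the PR.
public sealed class SpanAwareWriter : TextWriter
{
    private readonly StringBuilder _sb = new();

    // Encoding is the only abstract member of TextWriter.
    public override Encoding Encoding => Encoding.Unicode;

    public override void Write(char value) => _sb.Append(value);

    public override void Write(char[] buffer, int index, int count) =>
        _sb.Append(buffer, index, count);

    // Overriding the span overload bypasses the base-class fallback,
    // so the characters are appended directly from the caller's span.
    public override void Write(ReadOnlySpan<char> buffer) => _sb.Append(buffer);

    public override string ToString() => _sb.ToString();
}

public static class SpanAwareWriterDemo
{
    public static void Main()
    {
        using var writer = new SpanAwareWriter();
        Span<char> tmp = stackalloc char[8];
        0x41.TryFormat(tmp, out int n, "X");
        writer.Write(tmp.Slice(0, n));
        Console.WriteLine(writer.ToString()); // prints "41"
    }
}
```

With an override like this in place, the try/finally in the base fallback never runs for this writer, which is why its cost is unlikely to matter for StringWriter or StreamWriter.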
@krwq Before you merge, I want to run a performance test. |
@krwq @stephentoub
using System.IO;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text;
using System.Xml;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[module: SkipLocalsInit]

[MemoryDiagnoser]
public class Program
{
    private readonly char[] _chars = Enumerable.Range(char.MinValue, char.MaxValue).Select(x => (char) x)
        .Where(x => char.IsLetterOrDigit(x) || char.IsPunctuation(x) || char.IsSeparator(x))
        .ToArray();

    private readonly XmlWriter _xw = new XmlTextWriter(Stream.Null, Encoding.UTF8);

    public static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    [Benchmark]
    public void WriteEntity()
    {
        var writer = _xw;
        for (var i = 0; i < 100; i++)
        {
            for (var j = 0; j < _chars.Length; j++)
            {
                writer.WriteCharEntity(_chars[j]);
            }
        }
    }
}
runtime
current pr
my
private static int WriteCharToSpan(Span<char> destination, int ch)
{
    Debug.Assert(destination.Length >= 12);
    destination[0] = '&';
    destination[1] = '#';
    destination[2] = 'x';
    bool result = ((uint)ch).TryFormat(destination.Slice(3), out int charsWritten, "X");
    Debug.Assert(result);
    destination[charsWritten + 3] = ';';
    return charsWritten;
} |
@kronic, can you clarify what "my" means (i.e. point to a specific commit or something) and also compare it alongside @stephentoub's version? |
And current pr is not what I have in my commit. |
I will make two commits with different versions and run a performance test |
I have zero doubt that manually expanding what TryWrite is doing will be slightly faster; TryWrite needs to validate on each write that there's space remaining in the destination span. But it's also simpler, more maintainable, more understandable. Unless there's a compelling top-level scenario where those nanoseconds are proven to matter, I would like the version that doesn't allocate, is just as fast as the one that does allocate, and that's a one-line replacement. |
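For readers unfamiliar with the API being recommended, here is a minimal sketch of formatting a character entity with `MemoryExtensions.TryWrite` (available from .NET 6 with C# 10). The wrapper `TryFormatCharEntity` and the class name are hypothetical and exist only to make the example runnable; the exact form requested for the PR is not shown in the thread, so treat this as an assumption about its general shape.

```csharp
using System;

public static class CharEntityDemo
{
    // Hypothetical wrapper: writes an entity such as "&#x1F;" into destination.
    // TryWrite checks the remaining space on each append and reports the
    // total number of characters written, so no manual index math is needed.
    public static bool TryFormatCharEntity(Span<char> destination, int ch, out int charsWritten) =>
        destination.TryWrite($"&#x{(uint)ch:X};", out charsWritten);

    public static void Main()
    {
        Span<char> buffer = stackalloc char[12];
        if (TryFormatCharEntity(buffer, 0x1F, out int written))
        {
            Console.WriteLine(buffer.Slice(0, written).ToString()); // prints "&#x1F;"
        }
    }
}
```

The interpolated string is never materialized as a `string`; the compiler lowers it to appends against the destination span, which is what removes the allocation.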
…er.WriteCharEntityImpl
@stephentoub Usage scenario: every few hours a snapshot of a 10-50 GB database is taken and needs to be saved as XML. |
And what percentage of those chars being written hit the code path where the character is written as an entity? What does this change do to the execution time of that? Do you have an approximation for what that operation is doing that can be benchmarked? I'm skeptical that the microbenchmark on WriteCharEntity here is representative of that whole operation. |
@stephentoub It is difficult to estimate, but database columns of character type are quite common. |
When you profile saving that 10GB, what percentage of the time is spent inside WriteCharEntity? |
Benchmark
using System.IO;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text;
using System.Xml;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[module: SkipLocalsInit]

[MemoryDiagnoser]
public class Program
{
    private readonly char[] _chars = Enumerable.Range(char.MinValue, char.MaxValue).Select(x => (char) x)
        .Where(x => char.IsLetterOrDigit(x) || char.IsPunctuation(x) || char.IsSeparator(x))
        .ToArray();

    private readonly XmlWriter _xw = new XmlTextWriter(Stream.Null, Encoding.UTF8);

    public static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    [Benchmark]
    public void WriteEntity()
    {
        var writer = _xw;
        for (var i = 0; i < 100; i++)
        {
            for (var j = 0; j < _chars.Length; j++)
            {
                writer.WriteCharEntity(_chars[j]);
            }
        }
    }
}
runtime
my version
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
using System.IO;
using System.Text;
using System.Diagnostics;
namespace System.Xml
{
// XmlTextEncoder
//
// This class does special handling of text content for XML. For example
// it will replace special characters with entities whenever necessary.
internal sealed class XmlTextEncoder
{
//
// Fields
//
// output text writer
private readonly TextWriter _textWriter;
// true when writing out the content of attribute value
private bool _inAttribute;
// quote char of the attribute (when inAttribute)
private char _quoteChar;
// caching of attribute value
private StringBuilder? _attrValue;
private bool _cacheAttrValue;
//
// Constructor
//
internal XmlTextEncoder(TextWriter textWriter)
{
_textWriter = textWriter;
_quoteChar = '"';
}
//
// Internal methods and properties
//
internal char QuoteChar
{
set
{
_quoteChar = value;
}
}
internal void StartAttribute(bool cacheAttrValue)
{
_inAttribute = true;
_cacheAttrValue = cacheAttrValue;
if (cacheAttrValue)
{
if (_attrValue == null)
{
_attrValue = new StringBuilder();
}
else
{
_attrValue.Length = 0;
}
}
}
internal void EndAttribute()
{
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
_attrValue.Length = 0;
}
_inAttribute = false;
_cacheAttrValue = false;
}
internal string AttributeValue
{
get
{
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
return _attrValue.ToString();
}
else
{
return string.Empty;
}
}
}
internal void WriteSurrogateChar(char lowChar, char highChar)
{
if (!XmlCharType.IsLowSurrogate(lowChar) ||
!XmlCharType.IsHighSurrogate(highChar))
{
throw XmlConvert.CreateInvalidSurrogatePairException(lowChar, highChar);
}
_textWriter.Write(highChar);
_textWriter.Write(lowChar);
}
internal void Write(char[] array, int offset, int count)
{
if (null == array)
{
throw new ArgumentNullException(nameof(array));
}
if (0 > offset)
{
throw new ArgumentOutOfRangeException(nameof(offset));
}
if (0 > count)
{
throw new ArgumentOutOfRangeException(nameof(count));
}
if (count > array.Length - offset)
{
throw new ArgumentOutOfRangeException(nameof(count));
}
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
_attrValue.Append(array, offset, count);
}
int endPos = offset + count;
int i = offset;
char ch = (char)0;
while (true)
{
int startPos = i;
while (i < endPos && XmlCharType.IsAttributeValueChar(ch = array[i]))
{
i++;
}
if (startPos < i)
{
_textWriter.Write(array, startPos, i - startPos);
}
if (i == endPos)
{
break;
}
switch (ch)
{
case (char)0x9:
_textWriter.Write(ch);
break;
case (char)0xA:
case (char)0xD:
if (_inAttribute)
{
WriteCharEntityImpl(ch);
}
else
{
_textWriter.Write(ch);
}
break;
case '<':
WriteEntityRefImpl("lt");
break;
case '>':
WriteEntityRefImpl("gt");
break;
case '&':
WriteEntityRefImpl("amp");
break;
case '\'':
if (_inAttribute && _quoteChar == ch)
{
WriteEntityRefImpl("apos");
}
else
{
_textWriter.Write('\'');
}
break;
case '"':
if (_inAttribute && _quoteChar == ch)
{
WriteEntityRefImpl("quot");
}
else
{
_textWriter.Write('"');
}
break;
default:
if (XmlCharType.IsHighSurrogate(ch))
{
if (i + 1 < endPos)
{
WriteSurrogateChar(array[++i], ch);
}
else
{
throw new ArgumentException(SR.Xml_SurrogatePairSplit);
}
}
else if (XmlCharType.IsLowSurrogate(ch))
{
throw XmlConvert.CreateInvalidHighSurrogateCharException(ch);
}
else
{
Debug.Assert((ch < 0x20 && !XmlCharType.IsWhiteSpace(ch)) || (ch > 0xFFFD));
WriteCharEntityImpl(ch);
}
break;
}
i++;
}
}
internal void WriteSurrogateCharEntity(char lowChar, char highChar)
{
if (!XmlCharType.IsLowSurrogate(lowChar) || !XmlCharType.IsHighSurrogate(highChar))
{
throw XmlConvert.CreateInvalidSurrogatePairException(lowChar, highChar);
}
int surrogateChar = XmlCharType.CombineSurrogateChar(lowChar, highChar);
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
_attrValue.Append(highChar);
_attrValue.Append(lowChar);
}
Span<char> span = stackalloc char[12];
int charsWritten = WriteCharToSpan(span, surrogateChar);
_textWriter.Write(span.Slice(0, charsWritten));
}
internal void Write(ReadOnlySpan<char> text)
{
if (text.IsEmpty)
{
return;
}
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
_attrValue.Append(text);
}
// scan through the string to see if there are any characters to be escaped
int len = text.Length;
int i = 0;
int startPos = 0;
char ch = (char)0;
while (true)
{
while (i < len && XmlCharType.IsAttributeValueChar(ch = text[i]))
{
i++;
}
if (i == len)
{
// reached the end of the string -> write it whole out
_textWriter.Write(text);
return;
}
if (_inAttribute)
{
if (ch == 0x9)
{
i++;
continue;
}
}
else
{
if (ch == 0x9 || ch == 0xA || ch == 0xD || ch == '"' || ch == '\'')
{
i++;
continue;
}
}
// some character that needs to be escaped is found:
break;
}
while (true)
{
if (startPos < i)
{
_textWriter.Write(text.Slice(startPos, i - startPos));
}
if (i == len)
{
break;
}
switch (ch)
{
case (char)0x9:
_textWriter.Write(ch);
break;
case (char)0xA:
case (char)0xD:
if (_inAttribute)
{
WriteCharEntityImpl(ch);
}
else
{
_textWriter.Write(ch);
}
break;
case '<':
WriteEntityRefImpl("lt");
break;
case '>':
WriteEntityRefImpl("gt");
break;
case '&':
WriteEntityRefImpl("amp");
break;
case '\'':
if (_inAttribute && _quoteChar == ch)
{
WriteEntityRefImpl("apos");
}
else
{
_textWriter.Write('\'');
}
break;
case '"':
if (_inAttribute && _quoteChar == ch)
{
WriteEntityRefImpl("quot");
}
else
{
_textWriter.Write('"');
}
break;
default:
if (XmlCharType.IsHighSurrogate(ch))
{
if (i + 1 < len)
{
WriteSurrogateChar(text[++i], ch);
}
else
{
throw XmlConvert.CreateInvalidSurrogatePairException(text[i], ch);
}
}
else if (XmlCharType.IsLowSurrogate(ch))
{
throw XmlConvert.CreateInvalidHighSurrogateCharException(ch);
}
else
{
Debug.Assert((ch < 0x20 && !XmlCharType.IsWhiteSpace(ch)) || (ch > 0xFFFD));
WriteCharEntityImpl(ch);
}
break;
}
i++;
startPos = i;
while (i < len && XmlCharType.IsAttributeValueChar(ch = text[i]))
{
i++;
}
}
}
internal void WriteRawWithSurrogateChecking(string text)
{
if (text == null)
{
return;
}
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
_attrValue.Append(text);
}
int len = text.Length;
int i = 0;
char ch = (char)0;
while (true)
{
while (i < len && (XmlCharType.IsCharData((ch = text[i])) || ch < 0x20))
{
i++;
}
if (i == len)
{
break;
}
if (XmlCharType.IsHighSurrogate(ch))
{
if (i + 1 < len)
{
char lowChar = text[i + 1];
if (XmlCharType.IsLowSurrogate(lowChar))
{
i += 2;
continue;
}
else
{
throw XmlConvert.CreateInvalidSurrogatePairException(lowChar, ch);
}
}
throw new ArgumentException(SR.Xml_InvalidSurrogateMissingLowChar);
}
else if (XmlCharType.IsLowSurrogate(ch))
{
throw XmlConvert.CreateInvalidHighSurrogateCharException(ch);
}
else
{
i++;
}
}
_textWriter.Write(text);
return;
}
internal void WriteRaw(char[] array, int offset, int count)
{
if (null == array)
{
throw new ArgumentNullException(nameof(array));
}
if (0 > count)
{
throw new ArgumentOutOfRangeException(nameof(count));
}
if (0 > offset)
{
throw new ArgumentOutOfRangeException(nameof(offset));
}
if (count > array.Length - offset)
{
throw new ArgumentOutOfRangeException(nameof(count));
}
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
_attrValue.Append(array, offset, count);
}
_textWriter.Write(array, offset, count);
}
internal void WriteCharEntity(char ch)
{
if (XmlCharType.IsSurrogate(ch))
{
throw new ArgumentException(SR.Xml_InvalidSurrogateMissingLowChar);
}
Span<char> span = stackalloc char[12];
int charsWritten = WriteCharToSpan(span, ch);
ReadOnlySpan<char> ros = span.Slice(0, charsWritten);
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
_attrValue.Append(ros);
}
_textWriter.Write(ros);
}
internal void WriteEntityRef(string name)
{
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
_attrValue.Append('&');
_attrValue.Append(name);
_attrValue.Append(';');
}
WriteEntityRefImpl(name);
}
//
// Private implementation methods
//
private void WriteCharEntityImpl(char ch)
{
Span<char> span = stackalloc char[12];
int charsWritten = WriteCharToSpan(span, ch);
_textWriter.Write(span.Slice(0, charsWritten));
}
private static int WriteCharToSpan(Span<char> destination, int ch)
{
Debug.Assert(destination.Length >= 12);
destination[0] = '&';
destination[1] = '#';
destination[2] = 'x';
((uint)ch).TryFormat(destination.Slice(3), out int charsWritten, "X");
Debug.Assert(charsWritten != 0);
destination[charsWritten + 3] = ';';
return charsWritten;
}
private void WriteEntityRefImpl(string name)
{
_textWriter.Write('&');
_textWriter.Write(name);
_textWriter.Write(';');
}
}
}
|
@stephentoub I understand that the percentage is small, but it is not zero. I think my version is not much harder to understand, and it removes the allocation, makes the code faster, and makes it reusable. I don't see anything wrong with that. |
@stephentoub @krwq |
You and I get very different numbers. How exactly are you building System.Private.Xml.dll with these changes and then running the benchmarks? Regardless, I deterministically see the TryWrite version as being at least 10% faster than what's currently in main, not slower as your numbers suggest. This is exactly the kind of scenario TryWrite is geared towards, enabling devs to write the simple thing and let the system generate the optimal set of calls under the covers. So I want anything that should be using it to use it rather than trying to work around any perceived current perf limitations by open-coding a replacement; then when TryWrite improves, so too does all the code using it... we improve the primitive implementations, and everyone benefits. The code is also simpler, very clearly expressing the intent of what should be written. If this were a super hot path, then yeah, it might be worth an exception to eke out a few more cycles of savings. But we're talking about an API that's already a corner-case, used fairly rarely, a very small percentage of time, and then within that we're talking about a small percentage of that small percentage of possible improvement in open-coding the solution vs just doing the simple thing with TryWrite. I would be very surprised if you were able to measure the impact of that difference on your 10-50GB workload; I'd welcome you trying it out and sharing the profiles that prove me wrong. In the meantime, please use TryWrite, at the call site rather than in a helper. Thanks. |
first build
rebuild
run test
Benchmark
using System.IO;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text;
using System.Xml;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[module: SkipLocalsInit]

[MemoryDiagnoser]
public class Program
{
    private readonly char[] _chars = Enumerable.Range(char.MinValue, char.MaxValue).Select(x => (char) x)
        .Where(x => char.IsLetterOrDigit(x) || char.IsPunctuation(x) || char.IsSeparator(x))
        .ToArray();

    private readonly XmlWriter _xw = new XmlTextWriter(Stream.Null, Encoding.UTF8);

    public static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    [Benchmark]
    public void WriteEntity()
    {
        var writer = _xw;
        for (var i = 0; i < 100; i++)
        {
            for (var j = 0; j < _chars.Length; j++)
            {
                writer.WriteCharEntity(_chars[j]);
            }
        }
    }
}
runtime
Benchmark log
Benchmark log
my version
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
using System.IO;
using System.Text;
using System.Diagnostics;
namespace System.Xml
{
// XmlTextEncoder
//
// This class does special handling of text content for XML. For example
// it will replace special characters with entities whenever necessary.
internal sealed class XmlTextEncoder
{
//
// Fields
//
// output text writer
private readonly TextWriter _textWriter;
// true when writing out the content of attribute value
private bool _inAttribute;
// quote char of the attribute (when inAttribute)
private char _quoteChar;
// caching of attribute value
private StringBuilder? _attrValue;
private bool _cacheAttrValue;
//
// Constructor
//
internal XmlTextEncoder(TextWriter textWriter)
{
_textWriter = textWriter;
_quoteChar = '"';
}
//
// Internal methods and properties
//
internal char QuoteChar
{
set
{
_quoteChar = value;
}
}
internal void StartAttribute(bool cacheAttrValue)
{
_inAttribute = true;
_cacheAttrValue = cacheAttrValue;
if (cacheAttrValue)
{
if (_attrValue == null)
{
_attrValue = new StringBuilder();
}
else
{
_attrValue.Length = 0;
}
}
}
internal void EndAttribute()
{
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
_attrValue.Length = 0;
}
_inAttribute = false;
_cacheAttrValue = false;
}
internal string AttributeValue
{
get
{
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
return _attrValue.ToString();
}
else
{
return string.Empty;
}
}
}
internal void WriteSurrogateChar(char lowChar, char highChar)
{
if (!XmlCharType.IsLowSurrogate(lowChar) ||
!XmlCharType.IsHighSurrogate(highChar))
{
throw XmlConvert.CreateInvalidSurrogatePairException(lowChar, highChar);
}
_textWriter.Write(highChar);
_textWriter.Write(lowChar);
}
internal void Write(char[] array, int offset, int count)
{
if (null == array)
{
throw new ArgumentNullException(nameof(array));
}
if (0 > offset)
{
throw new ArgumentOutOfRangeException(nameof(offset));
}
if (0 > count)
{
throw new ArgumentOutOfRangeException(nameof(count));
}
if (count > array.Length - offset)
{
throw new ArgumentOutOfRangeException(nameof(count));
}
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
_attrValue.Append(array, offset, count);
}
int endPos = offset + count;
int i = offset;
char ch = (char)0;
while (true)
{
int startPos = i;
while (i < endPos && XmlCharType.IsAttributeValueChar(ch = array[i]))
{
i++;
}
if (startPos < i)
{
_textWriter.Write(array, startPos, i - startPos);
}
if (i == endPos)
{
break;
}
switch (ch)
{
case (char)0x9:
_textWriter.Write(ch);
break;
case (char)0xA:
case (char)0xD:
if (_inAttribute)
{
WriteCharEntityImpl(ch);
}
else
{
_textWriter.Write(ch);
}
break;
case '<':
WriteEntityRefImpl("lt");
break;
case '>':
WriteEntityRefImpl("gt");
break;
case '&':
WriteEntityRefImpl("amp");
break;
case '\'':
if (_inAttribute && _quoteChar == ch)
{
WriteEntityRefImpl("apos");
}
else
{
_textWriter.Write('\'');
}
break;
case '"':
if (_inAttribute && _quoteChar == ch)
{
WriteEntityRefImpl("quot");
}
else
{
_textWriter.Write('"');
}
break;
default:
if (XmlCharType.IsHighSurrogate(ch))
{
if (i + 1 < endPos)
{
WriteSurrogateChar(array[++i], ch);
}
else
{
throw new ArgumentException(SR.Xml_SurrogatePairSplit);
}
}
else if (XmlCharType.IsLowSurrogate(ch))
{
throw XmlConvert.CreateInvalidHighSurrogateCharException(ch);
}
else
{
Debug.Assert((ch < 0x20 && !XmlCharType.IsWhiteSpace(ch)) || (ch > 0xFFFD));
WriteCharEntityImpl(ch);
}
break;
}
i++;
}
}
internal void WriteSurrogateCharEntity(char lowChar, char highChar)
{
if (!XmlCharType.IsLowSurrogate(lowChar) || !XmlCharType.IsHighSurrogate(highChar))
{
throw XmlConvert.CreateInvalidSurrogatePairException(lowChar, highChar);
}
int surrogateChar = XmlCharType.CombineSurrogateChar(lowChar, highChar);
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
_attrValue.Append(highChar);
_attrValue.Append(lowChar);
}
Span<char> span = stackalloc char[12];
int charsWritten = WriteCharToSpan(span, surrogateChar);
_textWriter.Write(span.Slice(0, charsWritten));
}
internal void Write(ReadOnlySpan<char> text)
{
if (text.IsEmpty)
{
return;
}
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
_attrValue.Append(text);
}
// scan through the string to see if there are any characters to be escaped
int len = text.Length;
int i = 0;
int startPos = 0;
char ch = (char)0;
while (true)
{
while (i < len && XmlCharType.IsAttributeValueChar(ch = text[i]))
{
i++;
}
if (i == len)
{
// reached the end of the string -> write it whole out
_textWriter.Write(text);
return;
}
if (_inAttribute)
{
if (ch == 0x9)
{
i++;
continue;
}
}
else
{
if (ch == 0x9 || ch == 0xA || ch == 0xD || ch == '"' || ch == '\'')
{
i++;
continue;
}
}
// some character that needs to be escaped is found:
break;
}
while (true)
{
if (startPos < i)
{
_textWriter.Write(text.Slice(startPos, i - startPos));
}
if (i == len)
{
break;
}
switch (ch)
{
case (char)0x9:
_textWriter.Write(ch);
break;
case (char)0xA:
case (char)0xD:
if (_inAttribute)
{
WriteCharEntityImpl(ch);
}
else
{
_textWriter.Write(ch);
}
break;
case '<':
WriteEntityRefImpl("lt");
break;
case '>':
WriteEntityRefImpl("gt");
break;
case '&':
WriteEntityRefImpl("amp");
break;
case '\'':
if (_inAttribute && _quoteChar == ch)
{
WriteEntityRefImpl("apos");
}
else
{
_textWriter.Write('\'');
}
break;
case '"':
if (_inAttribute && _quoteChar == ch)
{
WriteEntityRefImpl("quot");
}
else
{
_textWriter.Write('"');
}
break;
default:
if (XmlCharType.IsHighSurrogate(ch))
{
if (i + 1 < len)
{
WriteSurrogateChar(text[++i], ch);
}
else
{
throw XmlConvert.CreateInvalidSurrogatePairException(text[i], ch);
}
}
else if (XmlCharType.IsLowSurrogate(ch))
{
throw XmlConvert.CreateInvalidHighSurrogateCharException(ch);
}
else
{
Debug.Assert((ch < 0x20 && !XmlCharType.IsWhiteSpace(ch)) || (ch > 0xFFFD));
WriteCharEntityImpl(ch);
}
break;
}
i++;
startPos = i;
while (i < len && XmlCharType.IsAttributeValueChar(ch = text[i]))
{
i++;
}
}
}
internal void WriteRawWithSurrogateChecking(string text)
{
if (text == null)
{
return;
}
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
_attrValue.Append(text);
}
int len = text.Length;
int i = 0;
char ch = (char)0;
while (true)
{
while (i < len && (XmlCharType.IsCharData((ch = text[i])) || ch < 0x20))
{
i++;
}
if (i == len)
{
break;
}
if (XmlCharType.IsHighSurrogate(ch))
{
if (i + 1 < len)
{
char lowChar = text[i + 1];
if (XmlCharType.IsLowSurrogate(lowChar))
{
i += 2;
continue;
}
else
{
throw XmlConvert.CreateInvalidSurrogatePairException(lowChar, ch);
}
}
throw new ArgumentException(SR.Xml_InvalidSurrogateMissingLowChar);
}
else if (XmlCharType.IsLowSurrogate(ch))
{
throw XmlConvert.CreateInvalidHighSurrogateCharException(ch);
}
else
{
i++;
}
}
_textWriter.Write(text);
return;
}
internal void WriteRaw(char[] array, int offset, int count)
{
if (null == array)
{
throw new ArgumentNullException(nameof(array));
}
if (0 > count)
{
throw new ArgumentOutOfRangeException(nameof(count));
}
if (0 > offset)
{
throw new ArgumentOutOfRangeException(nameof(offset));
}
if (count > array.Length - offset)
{
throw new ArgumentOutOfRangeException(nameof(count));
}
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
_attrValue.Append(array, offset, count);
}
_textWriter.Write(array, offset, count);
}
internal void WriteCharEntity(char ch)
{
if (XmlCharType.IsSurrogate(ch))
{
throw new ArgumentException(SR.Xml_InvalidSurrogateMissingLowChar);
}
Span<char> span = stackalloc char[12];
int charsWritten = WriteCharToSpan(span, ch);
ReadOnlySpan<char> ros = span.Slice(0, charsWritten);
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
_attrValue.Append(ros);
}
_textWriter.Write(ros);
}
internal void WriteEntityRef(string name)
{
if (_cacheAttrValue)
{
Debug.Assert(_attrValue != null);
_attrValue.Append('&');
_attrValue.Append(name);
_attrValue.Append(';');
}
WriteEntityRefImpl(name);
}
//
// Private implementation methods
//
private void WriteCharEntityImpl(char ch)
{
Span<char> span = stackalloc char[12];
int charsWritten = WriteCharToSpan(span, ch);
_textWriter.Write(span.Slice(0, charsWritten));
}
private static int WriteCharToSpan(Span<char> destination, int ch)
{
Debug.Assert(destination.Length >= 12);
destination[0] = '&';
destination[1] = '#';
destination[2] = 'x';
((uint)ch).TryFormat(destination.Slice(3), out int charsWritten, "X");
Debug.Assert(charsWritten != 0);
destination[charsWritten + 3] = ';';
return charsWritten;
}
private void WriteEntityRefImpl(string name)
{
_textWriter.Write('&');
_textWriter.Write(name);
_textWriter.Write(';');
}
}
}
Benchmark log
|
@stephentoub |
@stephentoub @krwq ping |
@stephentoub can you please verify @kronic's method of running benchmark and see if this is what you expected? |
Thanks, but my opinion hasn't changed from the previous times I've shared it. First, the code that I've repeatedly asked be used (which still isn't in the PR) using TryWrite is simpler and easier to prove is correct... case in point, there's currently a bug in the WriteCharToSpan in this PR: runtime/src/libraries/System.Private.Xml/src/System/Xml/Core/XmlTextEncoder.cs Lines 524 to 534 in 56610d2
which also means the previously shared benchmark numbers are off. Second, the fact that there is such a bug and CI passed on this PR highlights that there isn't sufficient testing of this code in place to warrant trying to optimize at the nanosecond level. Third, this code isn't on hot enough paths to warrant such micro-optimization. And even if it were, fourth, the difference between a corrected version of the code in this PR and the TryWrite version is currently only 9 nanoseconds on .NET 7 on my machine. Finally, using the shared routine in TryWrite rather than open-coding it means any improvements we subsequently make to TryWrite (e.g. adding a special-case for 3-char literals and not just 2-char literals) will accrue here as well and shrink the gap even further. If we want to make a change here (which I'm fine with), please follow my earlier requests to use the form Thank you. |
@stephentoub I don't understand what is wrong here? |
It's cutting off the end by returning |
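To make the truncation concrete, here is a small sketch based on the `WriteCharToSpan` helper as posted earlier in the thread. Reading that helper, its return value counts only the hex digits, while the written entity also includes the three-character `&#x` prefix and the trailing `;`. The `Corrected` variant below is an assumption about the intended fix, not code taken from the PR, and the class and method names are hypothetical.

```csharp
using System;
using System.Diagnostics;

public static class WriteCharToSpanDemo
{
    // Same logic as the helper posted in the thread: returns only the hex-digit count.
    public static int AsPosted(Span<char> destination, int ch)
    {
        destination[0] = '&';
        destination[1] = '#';
        destination[2] = 'x';
        bool ok = ((uint)ch).TryFormat(destination.Slice(3), out int charsWritten, "X");
        Debug.Assert(ok);
        destination[charsWritten + 3] = ';';
        return charsWritten;
    }

    // Assumed fix: also count the "&#x" prefix (3 chars) and the ';' (1 char).
    public static int Corrected(Span<char> destination, int ch)
    {
        destination[0] = '&';
        destination[1] = '#';
        destination[2] = 'x';
        bool ok = ((uint)ch).TryFormat(destination.Slice(3), out int charsWritten, "X");
        Debug.Assert(ok);
        destination[charsWritten + 3] = ';';
        return charsWritten + 4;
    }

    public static void Main()
    {
        Span<char> buffer = stackalloc char[12];

        int n = AsPosted(buffer, 0x1F);
        Console.WriteLine(buffer.Slice(0, n).ToString()); // prints "&#"

        n = Corrected(buffer, 0x1F);
        Console.WriteLine(buffer.Slice(0, n).ToString()); // prints "&#x1F;"
    }
}
```

Because callers slice the buffer with the returned length, the posted version emits a truncated entity, which also explains why benchmark numbers taken with it would not be comparable.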
@stephentoub thanks. |
@stephentoub @krwq i'll work on that a bit later |