Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Stream.ReadAtLeast perf regression in DataContractSerializer #69879

Merged
merged 2 commits into from
May 28, 2022

Conversation

eerhardt
Copy link
Member

#69272 changed DCS to no longer call Stream.Read inside a loop, but instead call the new ReadAtLeast method. ReadAtLeast only takes a Span<byte>, and not a byte[]. This caused a regression because the internal encoding stream wrapper classes don't override Read(Span<byte>), so the base implementation is used. The base implementation is slower because it needs to rent a byte[] from the pool, and do 2 copies.

Overriding Read(Span<byte>) on the internal encoding stream wrapper classes allows ReadAtLeast to be just as fast.

Fix #69730

dotnet#69272 changed DCS to no longer call Stream.Read inside a loop, but instead call the new ReadAtLeast method. ReadAtLeast only takes a Span<byte>, and not a byte[]. This caused a regression because the internal encoding stream wrapper classes don't override Read(Span<byte>), so the base implementation is used. The base implementation is slower because it needs to rent a byte[] from the pool, and do 2 copies.

Overriding Read(Span<byte>) on the internal encoding stream wrapper classes allows ReadAtLeast to be just as fast.

Fix dotnet#69730
{
try
{
if (_byteCount == 0)
{
if (_encodingCode == SupportedEncoding.UTF8)
{
return _stream.Read(buffer, offset, count);
return _stream.Read(buffer);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that _stream is always a BufferedStream, which also overrides Read(Span<byte>), so we shouldn't have the same problem here.

{
try
{
if (_byteCount == 0)
{
if (_encodingCode == SupportedEncoding.UTF8)
return _stream.Read(buffer, offset, count);
return _stream.Read(buffer);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, EncodingStreamWrapper isn't guaranteed to be a BufferedStream - I'm not sure why these two encoding wrappers differ. The _stream here is the Stream that was passed into the public API. So if the caller passes a Stream that doesn't override Read(Span<byte>), this could run into the same problem.

public override int Read(byte[] buffer, int offset, int count) =>
Read(new Span<byte>(buffer, offset, count));

public override int Read(Span<byte> buffer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can/should we override Write(span) on each of these streams as well?

Are there other streams in dotnet/runtime for which we should provide these overrides?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can/should we override Write(span) on each of these streams as well?

Maybe? It could prevent future regressions, like the one I made here.

Looking at where these streams are used for writing, they only get instantiated when the Encoding is not UTF8 (which is different than read - where they always are used). And looking through all the "Write" methods to these streams, they all pass an array, offset, and count. I don't think I should add it since they will never get used. Thoughts?

Are there other streams in dotnet/runtime for which we should provide these overrides?

I looked through all the places that override Read(byte[] ...) and looked to see if they override Read(Span) as well.

Here's are ones I think we should add the additional override to:

  • Passed to BinaryReader / XmlReader / other place I changed to call ReadAtLeast / ReadExactly:


public override int Read(byte[] buffer, int offset, int count)


  • These get returned through public methods, so they probably should:

public override int Read(byte[] buffer, int offset, int count)


public override int Read(byte[] buffer, int offset, int size)

Here are the ones I'm not sure we need to:

  • Only appears to be used by FtpControlStream, I can't tell if anyone ever calls Read with a Span. Looks easy to do though.

public override int Read(byte[] buffer, int offset, int size)

  • Has a lot of derived Stream classes, but I don't see them ever getting Read(Span) call on them. This also looks easy to do.

public override int Read(byte[] buffer, int offset, int count)

Here's the ones I don't think we need to:

  • Only gets BeginRead and EndRead called on it

public override int Read(byte[] buffer, int offset, int count)

  • Only used for writing

public override int Read(byte[] buffer, int offset, int count)

  • Only do "async" reading, so we can't do anything with a Span

public override int Read(byte[] buffer, int offset, int count)

public override int Read(byte[] buffer, int offset, int count)

public override int Read(byte[] buffer, int offset, int count)

public override int Read(byte[] array, int offset, int count)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the overrides for the first 4 locations above, and 2 of the next 3. The SqlBytes one is a little more involved, so I decided to open a separate issue for it - #69921.

@@ -424,6 +424,22 @@ public override int Read(byte[] buffer, int offset, int count)
return iBytesRead;
}

// Duplicate the Read(byte[]) logic here instead of refactoring both to use Spans
// in case the backing _stream doesn't override Read(Span).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most of our library code we've just assumed that any stream that cares about perf will end up overriding the relevant methods, and so for example wrapper streams like DeflateStream just use, say, ReadAsync(Memory) rather than having two code paths. But if we think it's worthwhile here to special-case it, maybe because we expect streams wrapped here to be more legacy, that's ok. Eventually it'd be nice to consolidate, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that was my thinking - this stream can come from a public constructor parameter, and this type is rather old - so older code could be calling it.

I took a quick look, and the first place I found actually has this problem:

https://github.com/dotnet/SqlClient/blob/9a7d53ea1651b94783c07f37eaa39cd8f728c02e/src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlCachedBuffer.cs#L130-L134

        internal SqlXml ToSqlXml()
        {
            SqlXml sx = new(ToStream());
            return sx;
        }

https://github.com/dotnet/SqlClient/blob/9a7d53ea1651b94783c07f37eaa39cd8f728c02e/src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlCachedBuffer.cs#L99-L102

        internal Stream ToStream()
        {
            return new SqlCachedStream(this);
        }

and SqlCachedStream doesn't override the Span-based API.

I've opened dotnet/SqlClient#1628 for the Sql team to consider providing Span-based overrides.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regressions in MicroBenchmarks.Serializers
2 participants