Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base64.Decode: fixed latent bug for invalid input that is less than a block-size #79952

Merged
merged 2 commits into from
Jan 4, 2023

Conversation

gfoidl
Copy link
Member

@gfoidl gfoidl commented Dec 24, 2022

Repro:

ReadOnlySpan<byte> base64 = stackalloc byte[] { (byte)'A', (byte)'B', (byte)'C', (byte)'D' };
Span<byte> data = stackalloc byte[128];

base64 = base64[..3];

OperationStatus status = Base64.DecodeFromUtf8(base64, data, out int consumed, out int written);
Console.WriteLine($"status: {status}, consumed: {consumed}, written: {written}");

We fill base64 with four valid base64-bytes, then we slice it to only contain 3 bytes.
Thus decoding should result in InvalidData (which is does) and consumed, written should be both 0, but they are 4, 3 which is wrong, as it's read beyond the valid range.

See #79334 (comment) for an investigation of this 🐛.
As in the loop condition

while (src < srcMax)
{
int result = Decode(src, ref decodingMap);
if (result < 0)
goto InvalidDataExit;
WriteThreeLowOrderBytes(dest, result);
src += 4;
dest += 3;
}
srcMax is more than int.MaxValue away from src and iff [src, src + 4) contains valid base64 encoded bytes, then it may consume a lot of data outside of valid ranges.

There was a test-hole, as the test BasicDecodingInvalidInputLength has a too big start for the range. Thus a new specific test for this case (input length < BlockSize) got added.

This 🐛 got introduced with dotnet/corefx#34529 (🙈), so it's there since .NET Core 3.1.
And as by accident I know the author of that PR quite well the uint-cast is placed there to avoid a movsxd.


The repro above is artificial and constructed to investigate #79334 (comment). In real-world usage it may or may not happen, that depends on the value read base64[3]. If this is by accident valid base64 byte, then the 🐛 manifests.

Even if InvalidData is reported correctly, the real problem is that it's read beyond the given / allowed range.
Since this bug exists for quite some time now and we don't have any bug-reports for this, I don't assume it's critical enough to backport that change.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Dec 24, 2022
@ghost
Copy link

ghost commented Dec 24, 2022

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

Repro:

ReadOnlySpan<byte> base64 = stackalloc byte[] { (byte)'A', (byte)'B', (byte)'C', (byte)'D' };
Span<byte> data = stackalloc byte[128];

base64 = base64[..3];

OperationStatus status = Base64.DecodeFromUtf8(base64, data, out int consumed, out int written);
Console.WriteLine($"status: {status}, consumed: {consumed}, written: {written}");

We fill base64 with four valid base64-bytes, then we slice it to only contain 3 bytes.
Thus decoding should result in InvalidData (which is does) and consumed, written should be both 0, but they are 4, 3 which is wrong, as it's read beyond the valid range.

See #79334 (comment) for an investigation of this 🐛.
As in the loop condition

while (src < srcMax)
{
int result = Decode(src, ref decodingMap);
if (result < 0)
goto InvalidDataExit;
WriteThreeLowOrderBytes(dest, result);
src += 4;
dest += 3;
}
srcMax is more than int.MaxValue away from src and iff [src, src + 4) contains valid base64 encoded bytes, then it may consume a lot of data outside of valid ranges.

There was a test-hole, as the test BasicDecodingInvalidInputLength has a too big start for the range. Thus a new specific test for this case (input length < BlockSize) got added.

This 🐛 got introduced with dotnet/corefx#34529 (🙈), so it's there since .NET Core 3.1.
And as by accident I know the author of that PR quite well the uint-cast is placed there to avoid a movsxd.


The repro above is artificial and constructed to investigate #79334 (comment). In real-world usage it may or may not happen, that depends on the value read base64[3]. If this is by accident valid base64 byte, then the 🐛 manifests.

Even if InvalidData is reported correctly, the real problem is that it's read beyond the given / allowed range.
Since this bug exists for quite some time now and we don't have any bug-reports for this, I don't assume it's critical enough to backport that change.

Author: gfoidl
Assignees: -
Labels:

area-System.Memory, community-contribution

Milestone: -

@@ -101,7 +101,7 @@ public static unsafe OperationStatus DecodeFromUtf8(ReadOnlySpan<byte> utf8, Spa
}

ref sbyte decodingMap = ref MemoryMarshal.GetReference(DecodingMap);
srcMax = srcBytes + (uint)maxSrcLength;
srcMax = srcBytes + maxSrcLength;
Copy link
Member Author

@gfoidl gfoidl Dec 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The uint-cast was there to avoid the movsxd (on x86), so removing the cast will introduce the movsxd and I expect that be cheaper than having a if to guard the loop.

PS: this is a reason why I dislike walking with pointers around (ptr++), and prefer index-based addressing (ptr[i] or ptr + offset) as it's clearer where to start and where to end.

@stephentoub stephentoub closed this Jan 4, 2023
@stephentoub stephentoub reopened this Jan 4, 2023
@stephentoub stephentoub merged commit 8da03fa into dotnet:main Jan 4, 2023
@stephentoub
Copy link
Member

Thanks.

@gfoidl gfoidl deleted the base64_bug_fix branch January 5, 2023 08:59
@ghost ghost locked as resolved and limited conversation to collaborators Feb 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Memory community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants