Base64.Decode: fixed latent bug for invalid input that is less than a block-size #79952

gfoidl · 2022-12-24T12:37:58Z

Repro:

ReadOnlySpan<byte> base64 = stackalloc byte[] { (byte)'A', (byte)'B', (byte)'C', (byte)'D' };
Span<byte> data = stackalloc byte[128];

base64 = base64[..3];

OperationStatus status = Base64.DecodeFromUtf8(base64, data, out int consumed, out int written);
Console.WriteLine($"status: {status}, consumed: {consumed}, written: {written}");

We fill base64 with four valid base64 bytes, then slice it so that it contains only 3 bytes.
Thus decoding should result in InvalidData (which it does), and consumed and written should both be 0. Instead they are 4 and 3, which is wrong, because the decoder has read beyond the valid range.

See #79334 (comment) for an investigation of this 🐛.
In the loop condition of

runtime/src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Decoder.cs

Lines 200 to 210 in dca7ee6

    
    while (src < srcMax)
    {
        int result = Decode(src, ref decodingMap);
        if (result < 0)
            goto InvalidDataExit;
        WriteThreeLowOrderBytes(dest, result);
        src += 4;
        dest += 3;
    }

srcMax is more than int.MaxValue away from src, so if [src, src + 4) contains valid base64-encoded bytes, the loop may consume a lot of data outside of the valid range.
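To make the wrap-around concrete, here is a minimal sketch. It assumes that maxSrcLength ends up negative for inputs shorter than one block (that computation is not shown in the snippet above), and the names srcLength and blockSize are hypothetical. Offsets are expressed relative to src, so src itself is 0.

```csharp
using System;

// Hypothetical values: a 3-byte input and a 4-byte block that is held back.
// The real computation of maxSrcLength is not shown in the PR text.
int srcLength = 3;
int blockSize = 4;
int maxSrcLength = srcLength - blockSize;             // -1

// Zero-extension, which is what the (uint) cast causes: a huge positive offset.
uint zeroExtended = unchecked((uint)maxSrcLength);     // 4294967295

// Sign-extension, which is what pointer + int does on 64-bit: still -1.
long signExtended = maxSrcLength;

// Would "while (src < srcMax)" be entered, with src at offset 0?
bool enteredWithCast = 0 < (long)zeroExtended;
bool enteredWithoutCast = 0 < signExtended;

Console.WriteLine($"with (uint) cast:    offset {zeroExtended}, loop entered: {enteredWithCast}");
Console.WriteLine($"without (uint) cast: offset {signExtended}, loop entered: {enteredWithoutCast}");
```

With the cast the loop bound lands almost 4 GiB past src, so only an invalid byte stops the loop. Without it the bound lies before src and the loop body never runs.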

There was a test hole: the test BasicDecodingInvalidInputLength starts its range of input lengths too high, so lengths below the block size were never exercised. Thus a new, specific test for this case (input length < BlockSize) was added.
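A check for this case could look roughly like the sketch below. It is not the test added in this PR; the variable names are made up for illustration. It decodes every prefix shorter than one block and expects InvalidData with nothing consumed and nothing written, which is the behavior described above for a runtime that includes this fix.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;

// Four valid base64 bytes; only prefixes shorter than one block are decoded,
// so the bytes after the prefix are valid data the decoder must not touch.
byte[] backing = { (byte)'A', (byte)'B', (byte)'C', (byte)'D' };
byte[] output = new byte[128];
bool allAsExpected = true;

for (int length = 1; length < 4; length++)
{
    OperationStatus status = Base64.DecodeFromUtf8(
        backing.AsSpan(0, length), output, out int consumed, out int written);

    // Expected on a fixed runtime: InvalidData, and no progress reported.
    bool ok = status == OperationStatus.InvalidData && consumed == 0 && written == 0;
    allAsExpected &= ok;

    Console.WriteLine($"length {length}: status={status}, consumed={consumed}, written={written}");
}

Console.WriteLine(allAsExpected
    ? "no over-read observed"
    : "decoder reported progress beyond the given input");
```

On a runtime that still has the bug, the last line would report the over-read instead.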

This 🐛 got introduced with dotnet/corefx#34529 (🙈), so it's there since .NET Core 3.1.
And as I happen to know the author of that PR quite well: the uint-cast was placed there to avoid a movsxd.

The repro above is artificial and was constructed to investigate #79334 (comment). In real-world usage it may or may not happen; that depends on the value read at base64[3]. If that happens to be a valid base64 byte, then the 🐛 manifests.

Even if InvalidData is reported correctly, the real problem is that the decoder reads beyond the given / allowed range.
Since this bug has existed for quite some time now and we don't have any bug reports for it, I don't think it's critical enough to backport this change.

ghost · 2022-12-24T12:38:09Z

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	gfoidl
Assignees:	-
Labels:	`area-System.Memory`, `community-contribution`
Milestone:	-

gfoidl · 2022-12-24T12:44:15Z

src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Decoder.cs

@@ -101,7 +101,7 @@ public static unsafe OperationStatus DecodeFromUtf8(ReadOnlySpan<byte> utf8, Spa
                }

                ref sbyte decodingMap = ref MemoryMarshal.GetReference(DecodingMap);
-                srcMax = srcBytes + (uint)maxSrcLength;
+                srcMax = srcBytes + maxSrcLength;


The uint-cast was there to avoid the movsxd (on x86), so removing the cast will introduce the movsxd, and I expect that to be cheaper than having an if to guard the loop.

PS: this is a reason why I dislike walking around with pointers (ptr++) and prefer index-based addressing (ptr[i] or ptr + offset), as it's clearer where to start and where to end.
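As a rough illustration of the index-based style mentioned in the PS, here is a sketch with a hypothetical helper, CountFullBlocks. It is not code from Base64Decoder.cs. The loop bound is a signed length comparison, so an input shorter than one block never enters the loop.

```csharp
using System;

// Hypothetical helper: walks the input in 4-byte blocks with a signed index
// instead of a moving pointer, and counts the full blocks it visited.
static int CountFullBlocks(ReadOnlySpan<byte> source)
{
    const int BlockSize = 4;
    int blocks = 0;

    // For inputs shorter than one block the bound is negative, so the
    // signed comparison fails at once and nothing is read.
    for (int i = 0; i <= source.Length - BlockSize; i += BlockSize)
    {
        // Slice is bounds-checked, so start and end are explicit here.
        ReadOnlySpan<byte> block = source.Slice(i, BlockSize);
        blocks += block.Length / BlockSize;
    }

    return blocks;
}

Console.WriteLine($"3 bytes: {CountFullBlocks(new byte[3])} full block(s)");
Console.WriteLine($"9 bytes: {CountFullBlocks(new byte[9])} full block(s)");
```

The trade-off is the one discussed in the comment above: explicit bounds are easier to reason about, while the pointer-walking form was chosen to save instructions in the hot loop.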

stephentoub · 2023-01-04T21:02:06Z

Thanks.

gfoidl added 2 commits December 24, 2022 13:08

Tests

9743543

Fix

4da5369

dotnet-issue-labeler bot added the area-System.Memory label Dec 24, 2022

ghost added the community-contribution Indicates that the PR has been added by a community member label Dec 24, 2022

gfoidl mentioned this pull request Dec 24, 2022

Allow Base64Decoder to ignore space chars, add IsValid methods and tests #79334

Closed

This was referenced Dec 24, 2022

Precondition failure: File has not had execution verified #79439

Closed

emcc received SIGKILL #79874

Closed

Test failure Loader\\classloader\\DictionaryExpansion\\DictionaryExpansion\\DictionaryExpansion.cmd #75244

Closed

stephentoub approved these changes Jan 4, 2023

stephentoub closed this Jan 4, 2023

stephentoub reopened this Jan 4, 2023

stephentoub merged commit 8da03fa into dotnet:main Jan 4, 2023

gfoidl deleted the base64_bug_fix branch January 5, 2023 08:59

ghost locked as resolved and limited conversation to collaborators Feb 4, 2023
