[API Proposal]: Base64.IsValid #76020

stephentoub · 2022-09-22T15:29:59Z

Background and motivation

The Base64 class provides efficient methods for encoding and decoding base64 data, but it doesn't provide any means for validating Base64-encoded data is properly encoded, at least not without having output memory into which to decode the resulting data. For scenarios that, for example, want to validate configuration data promptly and that might need the results of that configuration ever or until later, it's desirable to support an efficient means for validating Base64-encoded data without requiring the output memory, which then also enables the decoding/validation to be performed faster.

API Proposal

namespace System.Buffers.Text;

public static class Base64
{
+    public static bool IsValid(ReadOnlySpan<char> base64Text);
+    public static bool IsValid(ReadOnlySpan<byte> base64Text);
}

API Usage

string base64Text = ...;
if (!Base64.IsValid(base64Text))
    throw new InvalidConfigurationException(...);

Alternative Designs

We could expose such methods on the new Ascii class instead, since base64-encoded data is all ASCII. However, we already have two different places for Base64 functionality (Convert and Base64), and ideally we'd use the Base64 class as the central location moving forward for Base64-related functionality (we might even want to duplicate the Convert APIs for working with chars onto the Base64 type for discoverability, or potentially create streaming char-based versions to parallel the streaming byte-based versions already there).
The existing Encode/Decode methods call the source text parameter "utf8"; we could do the same here, but only for the byte-based overload, which is then inconsistent between the overloads.

Risks

No response

The text was updated successfully, but these errors were encountered:

ghost · 2022-09-22T15:30:10Z

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

The Base64 class provides efficient methods for encoding and decoding base64 data, but it doesn't provide any means for validating Base64-encoded data is properly encoded, at least not without having output memory into which to decode the resulting data. For scenarios that, for example, want to validate configuration data promptly and that might need the results of that configuration ever or until later, it's desirable to support an efficient means for validating Base64-encoded data without requiring the output memory, which then also enables the decoding/validation to be performed faster.

API Proposal

namespace System.Buffers.Text;

public static class Base64
{
+    public static bool IsValid(ReadOnlySpan<char> base64Text);
+    public static bool IsValid(ReadOnlySpan<byte> base64Text);
}

API Usage

string base64Text = ...;
if (!Base64.IsValid(base64Text))
    throw new InvalidConfigurationException(...);

Alternative Designs

We could expose such methods on the new Ascii class instead, since base64-encoded data is all ASCII. However, we already have two different places for Base64 functionality (Convert and Base64), and ideally we'd use the Base64 class as the central location moving forward for Base64-related functionality (we might even want to duplicate the Convert APIs for working with chars onto the Base64 type for discoverability, or potentially create streaming char-based versions to parallel the streaming byte-based versions already there).

Risks

No response

Author:	stephentoub
Assignees:	-
Labels:	`api-suggestion`, `area-System.Memory`
Milestone:	8.0.0

terrajobst · 2022-09-27T19:09:45Z

Video

Looks good as proposed
- We should consider answering the question whether Decode will work, rather than just checking whether it's syntactically valid Base64, which would be more useful.

namespace System.Buffers.Text;

public static class Base64
{
    public static bool IsValid(ReadOnlySpan<char> base64Text);
    public static bool IsValid(ReadOnlySpan<char> base64Text, out int decodedLength);
    public static bool IsValid(ReadOnlySpan<byte> base64Text);
    public static bool IsValid(ReadOnlySpan<byte> base64Text, out int decodedLength);
}

GrabYourPitchforks · 2022-09-27T22:25:06Z

Here are some observations from quick experimentation, as discussed during review.

Both Convert.FromBase64String and Base64.Decode handle non-canonical forms (RFC 4648, Sec. 3.5) equivalently. Padding bits are discarded without being checked to see if they're all-zero.

Convert.FromBase64String allows whitespace, while Base64.Decode forbids whitespace. (See RFC 4648, Sec. 3.1.)

stephentoub · 2022-09-28T21:55:22Z

Convert.FromBase64String allows whitespace, while Base64.Decode forbids whitespace.

That's an unfortunate discrepancy. The cleanest option from my perspective would be to update Base64.Decode* to allow whitespace, and then Base64.IsValid would similarly ignore them.

@GrabYourPitchforks, what do you think we should do here? Base64.IsValid aside, it's strange to me that we have these two different base64-decoding routines that treat whitespace (in particular newlines) differently.

gfoidl · 2022-09-29T10:59:27Z

These APIs could be implemented based on #68328 (comment)

namespace System.Buffers.Text;

public static class Base64
{
    public static bool IsValid(ReadOnlySpan<char> base64Text) => base64Text.IndexOfAnyExcept(/* base64 alphabet + allowed whitespace */) < 0;
    public static bool IsValid(ReadOnlySpan<byte> base64Text) => => base64Text.IndexOfAnyExcept(/* base64 alphabet + allowed whitespace */) < 0;

    public static bool IsValid(ReadOnlySpan<char> base64Text, out int decodedLength)
    {
        int indexOfFirstNonBase64 = base64Text.IndexOfAnyExcept(/* base64 alphabet + allowed whitespace */);

        if (indexOfFirstNonBase64 < 0)
        {
            decodedLength = base64Text.Length;    // probably wrong, see comment below
            return true;
        }

        decodedLength = indexOfFirstNonBase64 & -4;
        return false;
    }

    public static bool IsValid(ReadOnlySpan<byte> base64Text, out int decodedLength)
    {
        // the same
    }
}

Here I assume that decodedLength in the case of being not valid returns length that could actually be decoded (i.e. tha "valid part").

stephentoub · 2022-09-29T11:07:00Z

These APIs could be implemented based

That's the intent :-) There are a bunch of places the new api will be useful.

stephentoub · 2022-09-29T11:35:28Z

Here I assume that decodedLength in the case of being not valid returns length that could actually be decoded

We didn't talk about what it would be in the invalid case, but your suggestion is reasonable. I think the intent, however, was for it to be the required output buffer size for the decoded data rather than the input length; that's particularly relevant if whitespace is ignored.

gfoidl · 2022-09-29T11:44:16Z

was for it to be the required output buffer size for the decoded data

So for documentation that would mean:
IsValid = true -> required buffer size that can hold the decoded data
IsValid = false -> up until this length data can be decoded, then there's non base64 (useful for failure message, etc.)

stephentoub · 2022-09-29T11:51:06Z

I actually think that's confusing. I'd be inclined to say the out is undefined/0 on false. @bartonjs, what did you intend for this?

bartonjs · 2022-09-29T15:25:49Z

My intention is the required space for decoding. If IsValid is false, that makes no sense, so 0.

stephentoub · 2022-10-11T16:31:08Z

Convert.FromBase64String allows whitespace, while Base64.Decode forbids whitespace. (See RFC 4648, Sec. 3.1.)

So it seems like we have a few options here:

Base64.IsValid allows whitespace, matching Convert.FromBase64String but differing from Base64.Decode. That seems like a non-starter, as BAse64.IsValid would end up returning true for something Base64.Decode would fail on.
Base64.IsValid disallows whitespace, matching Base64.Decode but not matching Convert.FromBase64String. This also seems problematic, as a) it would fail to state as valid inputs Convert.FromBase64String would successfully handle, but more importantly b) Convert.ToBase64* accepts a Base64FormattingOptions that allows for the inserting of whitespace into encoded output; it'd be really strange for IsValid to return false for that.
Make Base64.IsValid differ in whitespace-accepting behavior between its char and byte-based overloads. This would be extremely confusing and inconsistent.
Update Base64.Decode to treat whitespace as valid, just as does Convert.FromBase64String, and then IsValid would allow whitespace. I don't know if this could manifest as an impactful enough breaking change so as to discard this option, e.g. if someone is relying on the existing whitespace-is-an-error behavior as part of some kind of input validation.
Add an additional parameter to Base64.IsValid that dictates whether whitespace should be accepted, e.g. enum Base64ParsingOptions { None = 0, IgnoreWhiteSpace = 1 } and public static bool IsValid(ReadOnlySpan<...> base64Text, Base64ParsingOptions options).
(5) + add overloads to Base64.Decode and Convert.FromBase64* that accept a Base64ParsingOptions.

1/2/3 all seem like bad choices.

I'd be ok with 4/5/6.

@GrabYourPitchforks? @bartonjs? @terrajobst?

stephentoub · 2022-10-28T01:07:33Z

We approved this, but we need to figure out which of these options we want to pursue:
#76020 (comment)
so I'm marking it as ready for review again.

gfoidl · 2022-10-28T08:19:55Z

When this goes to review again, so consider dropping the overload with decodedLength:

namespace System.Buffers.Text;

public static class Base64
{
    public static bool IsValid(ReadOnlySpan<char> base64Text);
-   public static bool IsValid(ReadOnlySpan<char> base64Text, out int decodedLength);
    public static bool IsValid(ReadOnlySpan<byte> base64Text);
-   public static bool IsValid(ReadOnlySpan<byte> base64Text, out int decodedLength);
}

the API is for checking if it's valid base64 according the name, so don't overload if with an orthogonal concern (getting the decoded length)
getting the decoded length is a simple operation -- for max length it's just $\frac{length}{4} \cdot 3$ (API already provided), and if exact length is needed check if elements { x[n-2], x[n-1] } are padding chars then subtract that count
so I don't assume someone will call this O(n) API to get the exact decoded length

If the exact decoced length is needed, then it would be better to offer a separate method for this:

namespace System.Buffers.Text;

public static class Base64
{
+    public static int GetDecodedFromUtf8Length(int length);
}

But this I'd leave to a potential separate proposal.

stephentoub · 2022-10-28T11:57:46Z

getting the decoded length is a simple operation

For exact length, that's not the case if there's whitespace to be ignored.

bartonjs · 2022-10-28T17:10:53Z

I don't assume someone will call this O(n) API to get the exact decoded length

The overload emitting the decoded length is required to allow the version of this in PemEncoding.TryFind to unify with the public API; or, at least the functionality is required.

bartonjs · 2022-10-28T17:17:22Z

So it seems like we have a few options here:

I'm sure the argument for the non-whitespace implementation is perf: if no data ever needs to be skipped over then it's a very fast transform. While I, personally, think (4) (make Base64.Decode just support whitespace) is probably the right way to go, if there's a big kerfuffle about losing an optimized path then I can support (6).

stephentoub · 2022-10-28T17:30:09Z

FWIW, @bartonjs, 4 is my preference as well.

GrabYourPitchforks · 2022-11-08T01:40:05Z

I'm fine with (4) as long as you include utf8 somewhere in the name. I'd prefer the method name because it's more visible, but the parameter name is also acceptable.

The reason for this is that Convert.FromBase64String currently understands just a handful of whitespace characters:

runtime/src/libraries/System.Private.CoreLib/src/System/Convert.cs

Line 2774 in 264d739

    
           private static bool IsSpace(this char c) => c == ' ' || c == '\t' || c == '\r' || c == '\n';

If this list of allowed whitespace chars is ever expanded to encompass the full set of characters allowed by char.IsWhiteSpace, then we're starting to discuss chars outside of the US-ASCII range, which means the caller has to know how the provided ROS<byte> is going to be interpreted. (Latin-1? UTF-8? Something else?) Putting the UTF-8 moniker somewhere in the API surface makes this obvious and avoids confusion.

(Yes, expanding the list would be a breaking change for the Convert class. But we've been entertaining the idea here and elsewhere of breaking changes to .NET, so it seems prudent to future-proof ourselves.)

stephentoub · 2022-11-08T01:42:41Z

I'm fine with (4) as long as you include utf8 somewhere in the name.

I don't mind using a different name, but I'm confused by the "utf8" aspect of it. Where in public static bool IsValid(ReadOnlySpan<char> base64Text); would you include "utf8"?

GrabYourPitchforks · 2022-11-08T01:45:49Z

Only in the ROS<byte> overloads. :)

stephentoub · 2022-11-08T01:49:39Z

So e.g.

+public static bool IsValid(ReadOnlySpan<byte> utf8); // or base64Utf8Text
+public static bool IsValid(ReadOnlySpan<char> utf16); // or base64Text

// implementation updated to ignore same whitespace as Convert.FromBase64String
public static unsafe OperationStatus DecodeFromUtf8(ReadOnlySpan<byte> utf8, Span<byte> bytes, out int bytesConsumed, out int bytesWritten, bool isFinalBlock = true)

?

GrabYourPitchforks · 2022-11-08T01:53:11Z

No need to add the utf16 moniker to ROS<char> inputs. The char data type implies UTF-16. (Well, UCS-2 or UTF-16, context-dependent, but academic distinction for this issue.)

public static bool IsValid(ReadOnlySpan<byte> utf8); // or base64Utf8Text

Sure, that seems fine.

stephentoub · 2022-11-08T01:54:10Z

No need to add the utf16 moniker to ROS inputs. The char data type implies UTF-16. (Well, UCS-2 or UTF-16, context-dependent, but academic distinction for this issue.)

Just aiming for some measure of consistency.

gfoidl · 2022-11-08T10:32:58Z

(6) is the one that aligns best with RFC4648 (especially section 3.1 and 3.3).
Allowing whitspace could be the default (parameter).

terrajobst · 2022-11-08T19:37:55Z

Video

We'd like to pursue option (4) from @stephentoub's suggestion:
- Change Base64 to allow the same whitespace characters as Convert.ToBase64
- We should output the sizes
- If this proves problematic, we can produce overloads/alternative methods that disallow whitespace
- Which happens to match the previously approved API shape

namespace System.Buffers.Text;

public static class Base64
{
    public static bool IsValid(ReadOnlySpan<char> base64Text);
    public static bool IsValid(ReadOnlySpan<char> base64Text, out int decodedLength);
    public static bool IsValid(ReadOnlySpan<byte> base64TextUtf8);
    public static bool IsValid(ReadOnlySpan<byte> base64TextUtf8, out int decodedLength);
}

stephentoub added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Memory labels Sep 22, 2022

stephentoub added this to the 8.0.0 milestone Sep 22, 2022

stephentoub added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Sep 27, 2022

terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Sep 27, 2022

gfoidl mentioned this issue Oct 12, 2022

IndexOutOfRangeException on decoding base64 bytes with LF characters included (>= .NET 5 only) #76918

Closed

stephentoub added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-approved API was approved in API review, it can be implemented labels Oct 28, 2022

terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Nov 8, 2022

tarekgh mentioned this issue Nov 22, 2022

Libraries work to support Cloud Native effort for .NET 8 #78518

Closed

61 tasks

MihaZupan mentioned this issue Nov 26, 2022

[API Proposal]: MemoryExtensions.Count / number of occurrences of a value in a span #59466

Closed

heathbm mentioned this issue Nov 29, 2022

Allow Base64Decoder to ignore space chars, add IsValid methods and tests #78951

Closed

heathbm mentioned this issue Dec 7, 2022

Allow Base64Decoder to ignore space chars, add IsValid methods and tests #79334

Closed

joperezr assigned stephentoub and unassigned stephentoub Jan 30, 2023

stephentoub added the in-pr There is an active PR which will close this issue when it is merged label Mar 3, 2023

danmoseley added the partner-impact This issue impacts a partner who needs to be kept updated label Mar 21, 2023

tarekgh removed the in-pr There is an active PR which will close this issue when it is merged label May 4, 2023

tarekgh assigned stephentoub May 4, 2023

stephentoub mentioned this issue May 8, 2023

Add Base64.IsValid and allow Base64.DecodeXx methods to skip whitespace #85938

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label May 8, 2023

stephentoub closed this as completed in #85938 May 9, 2023

ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 9, 2023

ghost locked as resolved and limited conversation to collaborators Jun 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[API Proposal]: Base64.IsValid #76020

[API Proposal]: Base64.IsValid #76020

stephentoub commented Sep 22, 2022 •

edited

Loading

ghost commented Sep 22, 2022

Background and motivation

API Proposal

API Usage

Alternative Designs

Risks

terrajobst commented Sep 27, 2022 •

edited by dotnet-api-review bot

Loading

GrabYourPitchforks commented Sep 27, 2022

stephentoub commented Sep 28, 2022

gfoidl commented Sep 29, 2022 •

edited

Loading

stephentoub commented Sep 29, 2022

stephentoub commented Sep 29, 2022

gfoidl commented Sep 29, 2022

stephentoub commented Sep 29, 2022

bartonjs commented Sep 29, 2022

stephentoub commented Oct 11, 2022 •

edited

Loading

stephentoub commented Oct 28, 2022

gfoidl commented Oct 28, 2022

stephentoub commented Oct 28, 2022

bartonjs commented Oct 28, 2022

bartonjs commented Oct 28, 2022

stephentoub commented Oct 28, 2022

GrabYourPitchforks commented Nov 8, 2022

stephentoub commented Nov 8, 2022

GrabYourPitchforks commented Nov 8, 2022

stephentoub commented Nov 8, 2022 •

edited

Loading

GrabYourPitchforks commented Nov 8, 2022 •

edited

Loading

stephentoub commented Nov 8, 2022

gfoidl commented Nov 8, 2022

terrajobst commented Nov 8, 2022 •

edited by dotnet-api-review bot

Loading

[API Proposal]: Base64.IsValid #76020

[API Proposal]: Base64.IsValid #76020

Comments

stephentoub commented Sep 22, 2022 • edited Loading

Background and motivation

API Proposal

API Usage

Alternative Designs

Risks

ghost commented Sep 22, 2022

Background and motivation

API Proposal

API Usage

Alternative Designs

Risks

terrajobst commented Sep 27, 2022 • edited by dotnet-api-review bot Loading

GrabYourPitchforks commented Sep 27, 2022

stephentoub commented Sep 28, 2022

gfoidl commented Sep 29, 2022 • edited Loading

stephentoub commented Sep 29, 2022

stephentoub commented Sep 29, 2022

gfoidl commented Sep 29, 2022

stephentoub commented Sep 29, 2022

bartonjs commented Sep 29, 2022

stephentoub commented Oct 11, 2022 • edited Loading

stephentoub commented Oct 28, 2022

gfoidl commented Oct 28, 2022

stephentoub commented Oct 28, 2022

bartonjs commented Oct 28, 2022

bartonjs commented Oct 28, 2022

stephentoub commented Oct 28, 2022

GrabYourPitchforks commented Nov 8, 2022

stephentoub commented Nov 8, 2022

GrabYourPitchforks commented Nov 8, 2022

stephentoub commented Nov 8, 2022 • edited Loading

GrabYourPitchforks commented Nov 8, 2022 • edited Loading

stephentoub commented Nov 8, 2022

gfoidl commented Nov 8, 2022

terrajobst commented Nov 8, 2022 • edited by dotnet-api-review bot Loading

stephentoub commented Sep 22, 2022 •

edited

Loading

terrajobst commented Sep 27, 2022 •

edited by dotnet-api-review bot

Loading

gfoidl commented Sep 29, 2022 •

edited

Loading

stephentoub commented Oct 11, 2022 •

edited

Loading

stephentoub commented Nov 8, 2022 •

edited

Loading

GrabYourPitchforks commented Nov 8, 2022 •

edited

Loading

terrajobst commented Nov 8, 2022 •

edited by dotnet-api-review bot

Loading