[API Proposal]: Base64.IsValid #76020
Comments
Tagging subscribers to this area: @dotnet/area-system-memory
Issue Details
Background and motivation
API Proposal
namespace System.Buffers.Text;
public static class Base64
{
+   public static bool IsValid(ReadOnlySpan<char> base64Text);
+   public static bool IsValid(ReadOnlySpan<byte> base64Text);
}
API Usage
string base64Text = ...;
if (!Base64.IsValid(base64Text))
    throw new InvalidConfigurationException(...);
Alternative Designs
Risks
No response
|
namespace System.Buffers.Text;
public static class Base64
{
public static bool IsValid(ReadOnlySpan<char> base64Text);
public static bool IsValid(ReadOnlySpan<char> base64Text, out int decodedLength);
public static bool IsValid(ReadOnlySpan<byte> base64Text);
public static bool IsValid(ReadOnlySpan<byte> base64Text, out int decodedLength);
} |
Here are some observations from quick experimentation, as discussed during review. Both
|
That's an unfortunate discrepancy. The cleanest option from my perspective would be to update Base64.Decode* to allow whitespace, and Base64.IsValid would then ignore whitespace as well. @GrabYourPitchforks, what do you think we should do here? Base64.IsValid aside, it's strange to me that we have two different base64-decoding routines that treat whitespace (in particular, newlines) differently. |
These APIs could be implemented based on #68328 (comment):
namespace System.Buffers.Text;
public static class Base64
{
    public static bool IsValid(ReadOnlySpan<char> base64Text) => base64Text.IndexOfAnyExcept(/* base64 alphabet + allowed whitespace */) < 0;
    public static bool IsValid(ReadOnlySpan<byte> base64Text) => base64Text.IndexOfAnyExcept(/* base64 alphabet + allowed whitespace */) < 0;
    public static bool IsValid(ReadOnlySpan<char> base64Text, out int decodedLength)
    {
        int indexOfFirstNonBase64 = base64Text.IndexOfAnyExcept(/* base64 alphabet + allowed whitespace */);
        if (indexOfFirstNonBase64 < 0)
        {
            decodedLength = base64Text.Length; // probably wrong, see comment below
            return true;
        }
        decodedLength = indexOfFirstNonBase64 & -4;
        return false;
    }
    public static bool IsValid(ReadOnlySpan<byte> base64Text, out int decodedLength)
    {
        // same logic as the ReadOnlySpan<char> overload
    }
}
Here I assume that
|
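The sketch above leaves the allowed set as a placeholder. One hypothetical way to fill it in on .NET 8 or later is SearchValues, which the span IndexOfAnyExcept overloads accept. The class name Base64CharsetCheck is made up for illustration, and the set below also includes '=' for padding, which the placeholder comment does not mention. This is only a character-set check: inputs such as "=A" or "Q" pass it but are not valid base64, which is part of why length and padding matter for the decodedLength overloads.

```csharp
using System;
using System.Buffers;

// Hypothetical fill-in for the "/* base64 alphabet + allowed whitespace */" placeholder.
// Requires .NET 8+ (SearchValues) and C# 11+ (u8 string literals).
public static class Base64CharsetCheck
{
    // Alphabet, '=' padding (an addition; the placeholder omits it) and the four
    // whitespace characters that Convert.FromBase64String skips.
    private const string Allowed =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/= \t\r\n";

    private static readonly SearchValues<char> s_allowedChars = SearchValues.Create(Allowed);

    private static readonly SearchValues<byte> s_allowedBytes = SearchValues.Create(
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/= \t\r\n"u8);

    // True when every char is in the allowed set. Does NOT check length or padding placement.
    public static bool ContainsOnlyBase64Chars(ReadOnlySpan<char> text) =>
        text.IndexOfAnyExcept(s_allowedChars) < 0;

    // Same check for UTF-8 input.
    public static bool ContainsOnlyBase64Chars(ReadOnlySpan<byte> utf8Text) =>
        utf8Text.IndexOfAnyExcept(s_allowedBytes) < 0;
}
```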
That's the intent :-) There are a bunch of places where the new API will be useful. |
We didn't talk about what it would be in the invalid case, but your suggestion is reasonable. I think the intent, however, was for it to be the required output buffer size for the decoded data rather than the input length; that's particularly relevant if whitespace is ignored. |
So for documentation that would mean: |
I actually think that's confusing. I'd be inclined to say the out is undefined/0 on false. @bartonjs, what did you intend for this? |
My intention is the required space for decoding. If IsValid is false, that makes no sense, so 0. |
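To make these semantics concrete (decodedLength is the required output buffer size, 0 when the input is invalid, whitespace ignored), here is a scalar sketch. It assumes the ignored whitespace is the same four characters Convert.FromBase64String skips (space, tab, CR, LF). Base64Validator is a hypothetical name and this is not the actual System.Buffers.Text implementation; it also does not check that the unused low bits of the last data character are zero.

```csharp
using System;

// Hypothetical sketch of the discussed IsValid(..., out int decodedLength) semantics.
public static class Base64Validator
{
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Assumption: the same whitespace set that Convert.FromBase64String ignores.
    private static bool IsIgnoredWhitespace(char c) =>
        c == ' ' || c == '\t' || c == '\r' || c == '\n';

    // On success, decodedLength is the number of bytes decoding produces.
    // On failure, decodedLength is 0.
    public static bool IsValid(ReadOnlySpan<char> text, out int decodedLength)
    {
        decodedLength = 0;
        int significant = 0; // non-whitespace characters, including '='
        int padding = 0;     // '=' characters seen so far

        foreach (char c in text)
        {
            if (IsIgnoredWhitespace(c))
                continue;

            if (c == '=')
            {
                if (++padding > 2)
                    return false; // at most two padding characters
            }
            else if (padding > 0 || Alphabet.IndexOf(c) < 0)
            {
                return false; // data after padding, or a character outside the alphabet
            }

            significant++;
        }

        if (significant % 4 != 0)
            return false; // input must form complete 4-character groups

        // Each group yields 3 bytes; each '=' removes one byte from the last group.
        decodedLength = significant / 4 * 3 - padding;
        return true;
    }
}
```

Computing the length from the non-whitespace count, rather than from text.Length, is what makes the value usable as an output buffer size when whitespace is present.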
So it seems like we have a few options here:
1/2/3 all seem like bad choices. I'd be ok with 4/5/6. |
We approved this, but we need to figure out which of these options we want to pursue: |
When this goes to review again, consider dropping the overloads that take out int decodedLength:
namespace System.Buffers.Text;
public static class Base64
{
    public static bool IsValid(ReadOnlySpan<char> base64Text);
-   public static bool IsValid(ReadOnlySpan<char> base64Text, out int decodedLength);
    public static bool IsValid(ReadOnlySpan<byte> base64Text);
-   public static bool IsValid(ReadOnlySpan<byte> base64Text, out int decodedLength);
}
If the exact decoded length is needed, it would be better to offer a separate method for this:
namespace System.Buffers.Text;
public static class Base64
{
+   public static int GetDecodedFromUtf8Length(int length);
}
But I'd leave that to a potential separate proposal. |
That wouldn't give the exact length if there's whitespace to be ignored: the input length alone doesn't say how many characters will be skipped. |
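The whitespace point can be seen by comparing a length-only bound with the real decoded size. This sketch uses the existing Base64.GetMaxDecodedFromUtf8Length, which computes (length / 4) * 3, and compares it with the length of Convert.FromBase64String's result. DecodedLengthComparison is a hypothetical helper for illustration.

```csharp
using System;
using System.Buffers.Text;

// Hypothetical helper: length-only upper bound vs. the exact decoded size.
public static class DecodedLengthComparison
{
    public static (int UpperBound, int Exact) Compare(string base64Text)
    {
        // Length-only bound, (length / 4) * 3. The method is named for UTF-8, but base64
        // characters are one code unit in both UTF-8 and UTF-16, so the arithmetic matches.
        int upperBound = Base64.GetMaxDecodedFromUtf8Length(base64Text.Length);

        // Exact size requires looking at the data; Convert skips ' ', '\t', '\r', '\n'.
        int exact = Convert.FromBase64String(base64Text).Length;

        return (upperBound, exact);
    }
}
```

For example, "QU\r\nJD\r\nQUJD" (12 characters) gives an upper bound of 9 but decodes to 6 bytes, because the line breaks count toward the length; "QQ==" gives 3 versus 1 because of padding.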
The overload emitting the decoded length is required to allow the version of this in PemEncoding.TryFind to unify with the public API; or, at least the functionality is required. |
I'm sure the argument for the non-whitespace implementation is perf: if no data ever needs to be skipped over then it's a very fast transform. While I, personally, think (4) (make Base64.Decode just support whitespace) is probably the right way to go, if there's a big kerfuffle about losing an optimized path then I can support (6). |
FWIW, @bartonjs, 4 is my preference as well. |
I'm fine with (4) as long as you include utf8 somewhere in the name. I'd prefer the method name because it's more visible, but the parameter name is also acceptable. The reason for this is that
If this list of allowed whitespace chars is ever expanded to encompass the full set of characters allowed by (Yes, expanding the list would be a breaking change for the |
I don't mind using a different name, but I'm confused by the "utf8" aspect of it. Where in |
Only in the |
So, e.g.:
+public static bool IsValid(ReadOnlySpan<byte> utf8); // or base64Utf8Text
+public static bool IsValid(ReadOnlySpan<char> utf16); // or base64Text

// implementation updated to ignore the same whitespace as Convert.FromBase64String
public static unsafe OperationStatus DecodeFromUtf8(ReadOnlySpan<byte> utf8, Span<byte> bytes, out int bytesConsumed, out int bytesWritten, bool isFinalBlock = true)
?
|
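For reference, Convert.FromBase64String skips four whitespace characters (space, tab, CR, LF) in the input and rejects every other character outside the alphabet, including other whitespace such as U+00A0 or vertical tab. The hypothetical helper WhitespaceDemo.TryDecode below shows the behavior that option (4) would extend to DecodeFromUtf8.

```csharp
using System;
using System.Text;

// Hypothetical helper showing which whitespace Convert.FromBase64String tolerates.
public static class WhitespaceDemo
{
    // Returns the decoded ASCII text, or null if Convert rejects the input.
    public static string? TryDecode(string base64Text)
    {
        try
        {
            return Encoding.ASCII.GetString(Convert.FromBase64String(base64Text));
        }
        catch (FormatException)
        {
            return null; // character outside the alphabet and the four skipped whitespace chars
        }
    }
}
```

With this helper, "QU JD\r\n\tQQ==" decodes to "ABCA", while "QUJD\u00A0" and "QUJD\v" return null.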
No need to add the utf16 moniker to
public static bool IsValid(ReadOnlySpan<byte> utf8); // or base64Utf8Text
Sure, that seems fine. |
Just aiming for some measure of consistency. |
(6) is the one that aligns best with RFC 4648 (especially sections 3.1 and 3.3). |
namespace System.Buffers.Text;
public static class Base64
{
public static bool IsValid(ReadOnlySpan<char> base64Text);
public static bool IsValid(ReadOnlySpan<char> base64Text, out int decodedLength);
public static bool IsValid(ReadOnlySpan<byte> base64TextUtf8);
public static bool IsValid(ReadOnlySpan<byte> base64TextUtf8, out int decodedLength);
} |
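A minimal usage sketch of the decodedLength overload as approved above, assuming .NET 8 or later, where these Base64.IsValid overloads exist. It validates first, allocates exactly decodedLength bytes, and then decodes with Convert.TryFromBase64String, which skips the same whitespace. ConfigCheck and DecodeIfValid are hypothetical names.

```csharp
using System;
using System.Buffers.Text;

// Hypothetical usage of Base64.IsValid(ReadOnlySpan<char>, out int) (requires .NET 8+).
public static class ConfigCheck
{
    // Returns the decoded bytes, or null if the text is not valid base64.
    public static byte[]? DecodeIfValid(string base64Text)
    {
        if (!Base64.IsValid(base64Text, out int decodedLength))
            return null; // decodedLength is 0 here

        // decodedLength is the exact output size, even when the text contains whitespace.
        byte[] buffer = new byte[decodedLength];
        bool ok = Convert.TryFromBase64String(base64Text, buffer, out int written);
        return ok && written == decodedLength ? buffer : null;
    }
}
```

Returning null rather than throwing keeps the sketch free of a custom exception type; a configuration loader could instead throw, as in the original API usage example.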
Background and motivation
The Base64 class provides efficient methods for encoding and decoding base64 data, but it doesn't provide any means of validating that Base64-encoded data is properly encoded, at least not without having output memory into which to decode the resulting data. For scenarios that, for example, want to validate configuration data promptly but might not need the decoded results until later, if ever, it's desirable to have an efficient means of validating Base64-encoded data without requiring output memory, which also enables the validation to be performed faster.
API Proposal
API Usage
Alternative Designs
Risks
No response