[NRBF] Don't use Unsafe.As when decoding DateTime(s) #105749
@@ -77,10 +77,10 @@ internal SerializationRecord TryToMapToUserFriendly()
 }
 else if (MemberValues.Count == 2
 && HasMember("ticks") && HasMember("dateData")
- && MemberValues[0] is long value && MemberValues[1] is ulong
+ && MemberValues[0] is long && MemberValues[1] is ulong dateData
Are MemberValues[0] and MemberValues[1] the same bits just typed differently?
No. "ticks" is "dateData" with the ticks mask applied:
public long Ticks => (long)(_dateData & TicksMask);
runtime/src/libraries/System.Private.CoreLib/src/System/DateTime.cs
Lines 1299 to 1301 in 0f05719
// Serialize both the old and the new format
info.AddValue(TicksField, Ticks);
info.AddValue(DateDataField, _dateData);
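To make the relationship between the two serialized fields concrete, here is a minimal sketch. It assumes the bit layout used by current .NET runtimes (low 62 bits for ticks, top 2 bits for the kind flags); `DateDataLayout` and its members are hypothetical names for illustration, not runtime APIs.

```csharp
using System;

// Illustrative only: mirrors the masking shown in the quoted Ticks property.
public static class DateDataLayout
{
    // Assumption: the low 62 bits hold ticks, the top 2 bits hold kind flags.
    public const ulong TicksMask = 0x3FFFFFFFFFFFFFFF;
    public const int KindShift = 62;

    // Same computation as `(long)(_dateData & TicksMask)`.
    public static long TicksOf(ulong dateData) => (long)(dateData & TicksMask);

    // The two bits that the "ticks" field alone cannot carry.
    public static ulong KindBitsOf(ulong dateData) => dateData >> KindShift;
}
```

Under this assumption, "ticks" can always be derived from "dateData", but not the other way around, since the kind bits are lost.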
#if NET
[UnsafeAccessor(UnsafeAccessorKind.Constructor)]
extern static DateTime CallPrivateSerializationConstructor(SerializationInfo si, StreamingContext ct);
#endif
Is using a private constructor like this from a separate package considered safe / supported? We have a bunch of types now that implement ISerializable but that either throw from their deserialization ctor or don't have one at all... Will DateTime never be on the same plan?
I agree. System.Formats.Nrbf should not depend on any of the built-in legacy infrastructure for binary serialization.
This ctor is part of the ISerializable protocol and it's supported by other serializers as well (example: DataContractSerializer). AFAIK we have no plans to remove these ctors.
tagging @GrabYourPitchforks who suggested this solution in #102826 (comment)
> AFAIK we have no plans to remove these ctors.
We've already made some of them throw PlatformNotSupportedException, e.g.
runtime/src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.cs
Lines 175 to 176 in 0f05719
protected Regex(SerializationInfo info, StreamingContext context) =>
throw new PlatformNotSupportedException();
and entirely removed them from others, e.g.
public sealed class OperatingSystem : ISerializable, ICloneable
Should we have a public API that allows us to create the specific DateTime value?
> Should we have a public API that allows us to create the specific DateTime value?
Yes, but it would not solve the problem, as this package needs to support older target framework monikers, including netstandard2.0.
I think both Unsafe.As and reflection are OK for existing targets. The existing targets are set in stone and we can make assumptions about them.
Unsafe.As and reflection are less than ideal for the future. They limit the changes we can make later.
Hi all! There seems to be some confusion here, so I'll try to clarify. For libraries which could execute on the .NET Framework runtime, we are not allowed to call APIs or access members which are undocumented. Using [...]
First, this requirement only applies to libraries compiled against netfx directly or against netstandard (if they can run on netfx). If you're compiled against any other TFM, go wild. :)
Second, the serialization ctor is in fact "publicly" documented by virtue of the fact that the type implements the [...]
Third, if you do continue to use a struct overlay for platforms other than netfx, ensure that on these platforms it is legal and safe for [...]
@@ -77,10 +77,10 @@ internal SerializationRecord TryToMapToUserFriendly()
 }
 else if (MemberValues.Count == 2
 && HasMember("ticks") && HasMember("dateData")
- && MemberValues[0] is long value && MemberValues[1] is ulong
+ && MemberValues[0] is long && MemberValues[1] is ulong dateData
Is it safe to depend on the order of the fields in the payload? In other words, is the exact order of fields part of the BF contract for given type?
SerializationInfo doesn't officially document the order, but in practice it enumerates elements in the same order in which they're added, and some types are sensitive to this ordering. It's akin to how if a dictionary / hashtable changes the order of enumeration or if a sort routine changes the relative order of "equal" elements, things break.
I meant it in connection with your other comment: is the order considered a documented .NET Framework detail that it is OK to depend on, or is the order an undocumented .NET Framework detail that we should not depend on?
My hunch is that it should be the latter. The deserializing constructor is explicitly coded to accept any order, or to accept one of the fields missing completely.
It was my mistake to rely on the order of fields, I am going to push a fix in a minute.
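As a rough illustration of the order-independent approach, the sketch below looks members up by name instead of by index. The dictionary is a hypothetical stand-in for the decoded name/value pairs of a record, and `MemberLookup.TryGet` is an invented helper, not part of System.Formats.Nrbf.

```csharp
using System;
using System.Collections.Generic;

public static class MemberLookup
{
    // Hypothetical helper: `members` models the decoded name/value pairs of one record.
    // The lookup is keyed by member name, so the position in the payload is irrelevant.
    public static bool TryGet<T>(IReadOnlyDictionary<string, object> members, string name, out T value)
        where T : struct
    {
        if (members.TryGetValue(name, out var raw) && raw is T typed)
        {
            value = typed;
            return true;
        }

        // Missing member or unexpected type: report failure instead of guessing.
        value = default;
        return false;
    }
}
```

With this shape, a payload that lists "dateData" before "ticks" is handled the same way as one that lists them in the opposite order.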
{
#pragma warning disable SYSLIB0050 // Type or member is obsolete
SerializationInfo si = new(typeof(DateTime), new FormatterConverter());
si.AddValue("ticks", 0L); // legacy value (serialized as long) - specify both just to be safe
How does specifying "ticks" make us safe?
I think it can only hide problems and produce invalid values instead of throwing an exception. I cannot think of a case where it actually helps.
I cannot speak on behalf of @GrabYourPitchforks (who authored the code), but my understanding is that initially the SerializationInfo for DateTime contained only the ticks field. Later dateData was introduced, but the runtime kept emitting the old field so that the payload could still be deserialized by an older runtime.
I've double checked the code:
runtime/src/libraries/System.Private.CoreLib/src/System/DateTime.cs
Lines 837 to 839 in eb655cf
case TicksField:
_dateData = (ulong)Convert.ToInt64(enumerator.Value, CultureInfo.InvariantCulture);
foundTicks = true;
And you are right: if this code were executed on a very old runtime, we would produce an invalid result. I've removed it and added a comment.
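For readers following the thread, here is a minimal sketch of the idea being discussed: populate a SerializationInfo with only "dateData" and invoke the serialization constructor. It uses reflection rather than the UnsafeAccessor shown in the diff, and it assumes the target runtime still exposes a non-public (SerializationInfo, StreamingContext) constructor on DateTime. `DateTimeFromDateData` is a hypothetical name.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

public static class DateTimeFromDateData
{
#pragma warning disable SYSLIB0050 // formatter-based serialization types are obsolete
    public static DateTime Create(ulong dateData)
    {
        var si = new SerializationInfo(typeof(DateTime), new FormatterConverter());
        si.AddValue("dateData", dateData); // only the current field, no legacy "ticks"

        // Assumption: the runtime has a non-public serialization ctor on DateTime.
        var ctor = typeof(DateTime).GetConstructor(
            BindingFlags.Instance | BindingFlags.NonPublic,
            null,
            new[] { typeof(SerializationInfo), typeof(StreamingContext) },
            null);

        if (ctor is null)
        {
            throw new PlatformNotSupportedException("DateTime serialization constructor not found.");
        }

        // Invoking a value type's constructor via reflection returns a boxed instance.
        return (DateTime)ctor.Invoke(new object[] { si, default(StreamingContext) });
    }
#pragma warning restore SYSLIB0050
}
```

Because no "ticks" entry is supplied, a runtime that only understands the legacy field would fail rather than silently produce a wrong value, which is the behavior the review comment asks for.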
- don't rely on the order of fields, as it's an implementation detail that may change
- don't specify "ticks" in the SerializationInfo
@@ -75,29 +75,30 @@ ulong value when TypeNameMatches(typeof(UIntPtr)) => Create(new UIntPtr(value)),
 _ => this
 };
 }
-else if (HasMember("_ticks") && MemberValues[0] is long ticks && TypeNameMatches(typeof(TimeSpan)))
+else if (HasMember("_ticks") && GetRawValue("_ticks") is long ticks && TypeNameMatches(typeof(TimeSpan)))
 {
 return Create(new TimeSpan(ticks));
 }
 }
 else if (MemberValues.Count == 2
- This assumes that the BF format is set in stone and that new key/value pairs won't be added in the future. Is that a safe assumption to make?
- If somebody constructs a malicious payload with extra TimeSpan, DateTime or Guid fields, or with fields of an unexpected type, this pattern match won't kick in, no exception will be thrown, and we return the raw data. Is that the desired behavior for the Nrbf reader? (As far as I can tell, the reader tends to throw on anything unexpected or invalid instead of accepting it silently.)
These are great questions.
BinaryFormatter can represent the same primitive value using different record types, depending on the context.
In this case, when DateTime is the root object, it is expressed as SystemClassWithMembersAndTypesRecord, which is just a type name plus a key/value dictionary. In other cases, it can be represented as MemberPrimitiveTypedRecord<T> (or a raw 8 bytes).
My goal was to hide this from the end users and always map it to PrimitiveTypeRecord<T> so users don't need to become experts in this area.
> This assumes that the BF format is set in stone and that new key/value pairs won't be added in the future.
If we ever extend the binary representation of the given types, we may need to handle the versioning here.
> If somebody constructs a malicious payload with extra TimeSpan, DateTime or Guid fields, or with fields of an unexpected type, this pattern match won't kick in, no exception will be thrown, and we return the raw data. Is that the desired behavior for the Nrbf reader?
It's allowed to create a type that is called System.DateTime and has a different layout. In such cases we are going to return a ClassRecord and the users will need to handle it.
SerializationRecord rootObject = NrbfDecoder.Decode(payload);
if (rootObject is PrimitiveTypeRecord<DateTime> primitiveRecord)
{
// DateTime
}
else if (rootObject is ClassRecord classRecord)
{
// something else
}
@jkotas this is just the way I see it, please let me know if something is not clear or some other changes are needed
> My goal was to hide this from the end users and always map it to PrimitiveTypeRecord<T>
You are not always mapping it to PrimitiveTypeRecord<T>.
You are only mapping it to PrimitiveTypeRecord<T> if the input has a specific shape. You are not mapping it for all possible valid input shapes. For example, if the payload was produced by .NET Framework 1.x (I am sure there are a bunch of such payloads still alive in the wild), it will be missing the dateData field and it is not going to be mapped. However, the classic BF deserializer is going to handle it just fine. If somebody runs into this case, they will have to do double the work: they will need to handle both the mapped and the non-mapped cases.
In general, I would expect the behavior to be either:
- 100% compatible with classic BF deserializer
- Exception to be thrown
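One way to read the "compatible or throw" expectation is sketched below. This is an assumption about the desired policy, not the code that was merged: prefer "dateData", fall back to a legacy "ticks"-only payload, and throw on anything else. `DateTimeMapping.ResolveDateData` and the dictionary input are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

public static class DateTimeMapping
{
    // Hypothetical helper: `members` models the name/value pairs of a System.DateTime record.
    public static ulong ResolveDateData(IReadOnlyDictionary<string, object> members)
    {
        if (members.TryGetValue("dateData", out var raw))
        {
            if (raw is ulong dateData)
            {
                return dateData;
            }

            throw new SerializationException("dateData has an unexpected type.");
        }

        // Legacy shape: only "ticks" is present, so the kind bits stay zero (Unspecified).
        if (members.TryGetValue("ticks", out raw))
        {
            if (raw is long ticks && ticks >= 0)
            {
                return (ulong)ticks;
            }

            throw new SerializationException("ticks has an unexpected type or value.");
        }

        // Neither field is present: throw rather than hand back raw data silently.
        throw new SerializationException("Missing DateTime data.");
    }
}
```

Under this policy a caller sees either a resolved value or a SerializationException, never a third "unmapped" case to handle separately.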
* [NRBF] Don't use Unsafe.As when decoding DateTime(s) (#105749)
* Add NrbfDecoder Fuzzer (#107385)
* [NRBF] Fix bugs discovered by the fuzzer (#107368)
  * bug #1: don't allow for values out of the SerializationRecordType enum range
  * bug #2: throw SerializationException rather than KeyNotFoundException when the referenced record is missing or it points to a record of different type
  * bug #3: throw SerializationException rather than FormatException when it's being thrown by BinaryReader (or sth else that we use)
  * bug #4: document the fact that IOException can be thrown
  * bug #5: throw SerializationException rather than OverflowException when parsing the decimal fails
  * bug #6: 0 and 17 are illegal values for PrimitiveType enum
  * bug #7: throw SerializationException when a surrogate character is read (so far an ArgumentException was thrown)
  # Conflicts:
  # src/libraries/System.Formats.Nrbf/src/System/Formats/Nrbf/NrbfDecoder.cs
* [NRBF] throw SerializationException when a surrogate character is read (#107532) (so far an ArgumentException was thrown)
* [NRBF] Fuzzing non-seekable stream input (#107605)
* [NRBF] More bug fixes (#107682)
  - Don't use `Debug.Fail` not followed by an exception (it may cause problems for apps deployed in Debug)
  - avoid Int32 overflow
  - throw for unexpected enum values just in case parsing has not rejected them
  - validate the number of chars read by BinaryReader.ReadChars
  - pass serialization record id to ex message
  - return false rather than throw EndOfStreamException when provided Stream has not enough data
  - don't restore the position in finally
  - limit max SZ and MD array length to Array.MaxLength, stop using LinkedList<T> as List<T> will be able to hold all elements now
  - remove internal enum values that were always illegal, but needed to be handled everywhere
  - Fix DebuggerDisplay
* [NRBF] Comments and bug fixes from internal code review (#107735)
  * copy comments and asserts from Levis internal code review
  * apply Levis suggestion: don't store Array.MaxLength as a const, as it may change in the future
  * add missing and fix some of the existing comments
  * first bug fix: SerializationRecord.TypeNameMatches should throw ArgumentNullException for null Type argument
  * second bug fix: SerializationRecord.TypeNameMatches should know the difference between SZArray and single-dimension, non-zero offset arrays (example: int[] and int[*])
  * third bug fix: don't cast bytes to booleans
  * fourth bug fix: don't cast bytes to DateTimes
  * add one test case that I've forgot in previous PR
  # Conflicts:
  # src/libraries/System.Formats.Nrbf/src/System/Formats/Nrbf/SerializationRecord.cs
* [NRBF] Address issues discovered by Threat Model (#106629)
  * introduce ArrayRecord.FlattenedLength
  * do not include invalid Type or Assembly names in the exception messages, as it's most likely corrupted/tampered/malicious data and could be used as a vector of attack.
  * It is possible to have binary array records have an element type of array without being marked as jagged
---------
Co-authored-by: Buyaa Namnan <bunamnan@microsoft.com>
fixes #102826