[NRBF] Don't use Unsafe.As when decoding DateTime(s) #105749

Merged (4 commits) on Aug 21, 2024

adamsitnik
Member

fixes #102826

@@ -77,10 +77,10 @@ internal SerializationRecord TryToMapToUserFriendly()
 }
 else if (MemberValues.Count == 2
     && HasMember("ticks") && HasMember("dateData")
-    && MemberValues[0] is long value && MemberValues[1] is ulong
+    && MemberValues[0] is long && MemberValues[1] is ulong dateData
Member

Are MemberValues[0] and MemberValues[1] the same bits just typed differently?

Member Author

No. "ticks" is "dateData" with the ticks mask applied:

public long Ticks => (long)(_dateData & TicksMask);

// Serialize both the old and the new format
info.AddValue(TicksField, Ticks);
info.AddValue(DateDataField, _dateData);
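
For readers unfamiliar with the layout, the relationship between the two serialized fields can be sketched roughly as below. This is only an illustration, not the runtime's code: the class name `DateDataSketch`, the helper `Split`, and the constant values are assumptions, based on the DateTime source where the low 62 bits of `_dateData` hold the ticks and the top 2 bits hold the kind flags.

```csharp
using System;

static class DateDataSketch
{
    // Assumed layout: low 62 bits = ticks, top 2 bits = kind flags.
    private const ulong TicksMask = 0x3FFFFFFFFFFFFFFF;
    private const int KindShift = 62;

    // Hypothetical helper: splits a raw "dateData" value into its two parts.
    public static (long Ticks, int KindBits) Split(ulong dateData)
        => ((long)(dateData & TicksMask), (int)(dateData >> KindShift));
}
```

So "ticks" alone loses the kind bits, which is why the two fields are not interchangeable even though they overlap.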

#if NET
[UnsafeAccessor(UnsafeAccessorKind.Constructor)]
extern static DateTime CallPrivateSerializationConstructor(SerializationInfo si, StreamingContext ct);
#endif
Member

@stephentoub, Jul 31, 2024

Is using a private constructor like this from a separate package considered safe / supported? We have a bunch of types now that implement ISerializable but that either throw from their deserialization ctor or don't have one at all... Will DateTime never be on the same plan?

Member

I agree. System.Formats.Nrbf should not depend on any of the built-in legacy infrastructure for binary serialization.

Member Author

This ctor is part of the ISerializable protocol and it's supported by other serializers as well (example: DataContractSerializer). AFAIK we have no plans to remove these ctors.

tagging @GrabYourPitchforks who suggested this solution in #102826 (comment)

Member

> AFAIK we have no plans to remove these ctors.

We've already made some of them throw PlatformNotSupportedException, e.g.

protected Regex(SerializationInfo info, StreamingContext context) =>
throw new PlatformNotSupportedException();

and entirely removed them from others, e.g.
public sealed class OperatingSystem : ISerializable, ICloneable

Member

Should we have a public API that allows us to create the specific DateTime value?

Member Author

> Should we have a public API that allows us to create the specific DateTime value?

Yes, but it would not solve the problem, as this package needs to support older target framework monikers, including netstandard2.0.

Member

I think both Unsafe.As and reflection are ok for existing targets. The existing targets are set in stone and we can make assumptions about them.

Unsafe.As and reflection are less than ideal for the future. They limit the changes we can make later.

@GrabYourPitchforks
Member

Hi all! There seems to be some confusion here, so I'll try to clarify.

For libraries which could execute on the .NET Framework runtime, we are not allowed to call APIs or access members which are undocumented. Using Unsafe.As<...>(ref datetime) as a struct overlay violates this policy. We were previously alerted to a similar problem in the SqlClient package and resolved it. See dotnet/SqlClient#1647 (comment) for additional context. Much of what I'm going to say here is basically a copy of what I posted there.

First, this requirement only applies to libraries compiled against netfx directly or against netstandard (if they can run on netfx). If you're compiled against any other TFM, go wild. :)

Second, the serialization ctor is in fact "publicly" documented by virtue of the fact that the type implements the ISerializable interface. The contract of that interface is that the type is required to implement the necessary serialization ctor - often in a nonpublic manner - and callers can absolutely rely on it being present. When running on the netfx runtime, the code sample I provided is fully supported, fully reliable, and meets all compliance requirements we're bound by.

Third, if you do continue to use a struct overlay for platforms other than netfx, ensure that on these platforms it is legal and safe for DateTime to be backed by any arbitrary bit pattern. We have been bitten in the past where overlaying atop types like decimal has led to security incidents because arbitrary untrusted bit patterns can violate an instance's internal invariants.
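
As a rough sketch of the second point, the serialization constructor can be reached through reflection on targets without UnsafeAccessor. The class name `DateTimeFromDateData` and the helper `Create` are hypothetical, and this is not necessarily the code that was merged; it assumes DateTime exposes a non-public `(SerializationInfo, StreamingContext)` constructor, as the ISerializable contract implies.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

static class DateTimeFromDateData
{
    // Hypothetical helper: builds a DateTime from the raw "dateData" value by
    // invoking the non-public serialization constructor via reflection.
    public static DateTime Create(ulong dateData)
    {
#pragma warning disable SYSLIB0050 // formatter-based serialization types are obsolete on newer .NET
        SerializationInfo si = new SerializationInfo(typeof(DateTime), new FormatterConverter());
        si.AddValue("dateData", dateData);
        StreamingContext context = new StreamingContext(StreamingContextStates.All);
#pragma warning restore SYSLIB0050

        ConstructorInfo ctor = typeof(DateTime).GetConstructor(
            BindingFlags.Instance | BindingFlags.NonPublic,
            binder: null,
            types: new[] { typeof(SerializationInfo), typeof(StreamingContext) },
            modifiers: null)
            ?? throw new SerializationException("DateTime serialization constructor not found.");

        try
        {
            // Invoking a value-type constructor returns the boxed struct.
            return (DateTime)ctor.Invoke(new object[] { si, context });
        }
        catch (TargetInvocationException ex) when (ex.InnerException is SerializationException inner)
        {
            // Reflection wraps exceptions thrown by the constructor; unwrap the serialization failure.
            throw new SerializationException(inner.Message, inner);
        }
    }
}
```

Unlike a struct overlay, this path lets the constructor's own validation run, so an out-of-range bit pattern should surface as an exception rather than as a DateTime with broken invariants.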

@@ -77,10 +77,10 @@ internal SerializationRecord TryToMapToUserFriendly()
 }
 else if (MemberValues.Count == 2
     && HasMember("ticks") && HasMember("dateData")
-    && MemberValues[0] is long value && MemberValues[1] is ulong
+    && MemberValues[0] is long && MemberValues[1] is ulong dateData
Member

Is it safe to depend on the order of the fields in the payload? In other words, is the exact order of fields part of the BF contract for a given type?

Member

SerializationInfo doesn't officially document the order, but in practice it enumerates elements in the same order in which they're added, and some types are sensitive to this ordering. It's akin to how if a dictionary / hashtable changes the order of enumeration or if a sort routine changes the relative order of "equal" elements, things break.

Member

@jkotas, Aug 14, 2024

I meant it in connection with your other comment: Is the order considered a documented .NET Framework detail that it is ok to depend on, or is the order an undocumented .NET Framework detail that we should not depend on?

My hunch is that it should be the latter. The deserializing constructor is explicitly coded to accept any order, or to accept one of the fields missing completely.
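
To make the order-independence concrete, here is a minimal sketch of a name-based read in the same spirit as the deserializing constructor. The class name `OrderIndependentRead` and the helper `ReadDateData` are hypothetical, and the exception message is an assumption; only the field names "ticks" and "dateData" come from the discussion above.

```csharp
using System;
using System.Globalization;
using System.Runtime.Serialization;

static class OrderIndependentRead
{
    // Hypothetical helper: fields are matched by name, so their order does not
    // matter and "dateData" may be absent (ticks-only legacy payloads).
    public static ulong ReadDateData(SerializationInfo info)
    {
        bool foundTicks = false, foundDateData = false;
        long ticks = 0;
        ulong dateData = 0;

        SerializationInfoEnumerator e = info.GetEnumerator();
        while (e.MoveNext())
        {
            switch (e.Name)
            {
                case "ticks":
                    ticks = Convert.ToInt64(e.Value, CultureInfo.InvariantCulture);
                    foundTicks = true;
                    break;
                case "dateData":
                    dateData = Convert.ToUInt64(e.Value, CultureInfo.InvariantCulture);
                    foundDateData = true;
                    break;
                // Any other name falls through and is ignored in this sketch.
            }
        }

        if (foundDateData) return dateData;   // newer field wins when present
        if (foundTicks) return (ulong)ticks;  // legacy payload: ticks only
        throw new SerializationException("Neither 'ticks' nor 'dateData' was present.");
    }
}
```

A reader that indexes MemberValues[0] and MemberValues[1] makes a stronger assumption than this loop does, which is the concern raised here.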

Member Author

It was my mistake to rely on the order of fields; I am going to push a fix in a minute.

{
#pragma warning disable SYSLIB0050 // Type or member is obsolete
SerializationInfo si = new(typeof(DateTime), new FormatterConverter());
si.AddValue("ticks", 0L); // legacy value (serialized as long) - specify both just to be safe
Member

How does specifying ticks make us safe?

I think it can only hide problems and produce invalid values instead of throwing an exception. I cannot think of a case where it actually helps.

Member Author

I cannot speak on behalf of @GrabYourPitchforks (who authored the code), but my understanding is that initially the SerializationInfo for DateTime contained only the ticks field. Later dateData was introduced, but the runtime kept emitting the old field so that the payload could still be deserialized by an older runtime.

I've double checked the code:

https://github.com/microsoft/referencesource/blob/51cf7850defa8a17d815b4700b67116e3fa283c2/mscorlib/system/datetime.cs#L388-L389

case TicksField:
_dateData = (ulong)Convert.ToInt64(enumerator.Value, CultureInfo.InvariantCulture);
foundTicks = true;

And you are right: if this code were executed on a very old runtime, we would produce an invalid result. I've removed it and added a comment.

- don't rely on the order of fields, as it's an implementation detail that may change
- don't specify "ticks" in the SerializationInfo
@adamsitnik adamsitnik requested a review from jkotas August 21, 2024 14:57
@@ -75,29 +75,30 @@ ulong value when TypeNameMatches(typeof(UIntPtr)) => Create(new UIntPtr(value)),
 _ => this
 };
 }
-else if (HasMember("_ticks") && MemberValues[0] is long ticks && TypeNameMatches(typeof(TimeSpan)))
+else if (HasMember("_ticks") && GetRawValue("_ticks") is long ticks && TypeNameMatches(typeof(TimeSpan)))
 {
 return Create(new TimeSpan(ticks));
 }
 }
 else if (MemberValues.Count == 2
Member

  • This assumes that the BF format is set in stone and that new key/value pairs won't be added in the future. Is it a safe assumption to make?

  • If somebody constructs a malicious payload with extra TimeSpan, DateTime or Guid fields, or with fields of an unexpected type, this pattern match won't kick in, no exception will be thrown, and we return the raw data. Is that the desired behavior for the Nrbf reader? (As far as I can tell, the reader tends to throw on anything unexpected or invalid instead of accepting it silently.)

Member Author

These are great questions.

BinaryFormatter can represent the same primitive value using different record types, depending on the context.
In this case, when DateTime is the root object it is expressed as SystemClassWithMembersAndTypesRecord which is just a type name + key/value dictionary. In other cases, it can be represented as MemberPrimitiveTypedRecord<T> (or a raw 8 bytes).

My goal was to hide this from the end users and always map it to PrimitiveTypeRecord<T> so users don't need to become experts in this area.

> This assumes that the BF format is set in stone and that new key/value pairs won't be added in the future.

If we ever extend the binary representation of these types, we may need to handle the versioning here.

> If somebody constructs a malicious payload with extra TimeSpan, DateTime or Guid fields, or with fields of an unexpected type, this pattern match won't kick in, no exception will be thrown, and we return the raw data. Is that the desired behavior for the Nrbf reader?

It's allowed to create a type that is called System.DateTime and has a different layout; in such cases we are going to return a ClassRecord and the users will need to handle it.

SerializationRecord rootObject = NrbfDecoder.Decode(payload);
if (rootObject is PrimitiveTypeRecord<DateTime> primitiveRecord)
{
    // DateTime
}
else if (rootObject is ClassRecord classRecord)
{
    // something else
}

@jkotas this is just the way I see it, please let me know if something is not clear or some other changes are needed

Member

@jkotas, Aug 21, 2024

> My goal was to hide this from the end users and always map it to PrimitiveTypeRecord<T>

You are not always mapping it to PrimitiveTypeRecord<T>.

You are only mapping it to PrimitiveTypeRecord<T> if the input has a specific shape. You are not mapping it for all possible valid input shapes. For example, if the payload was produced by .NET Framework 1.x (I am sure there are a bunch of such payloads still alive in the wild), it will be missing the dateData field and it is not going to be mapped. However, the classic BF deserializer is going to handle it just fine. If somebody runs into this case, they will have to do double the work: they will need to handle both the mapped and the non-mapped cases.

In general, I would expect the behavior to be either:

  • 100% compatible with classic BF deserializer
  • Exception to be thrown

@adamsitnik adamsitnik merged commit 477de34 into dotnet:main Aug 21, 2024
81 of 84 checks passed
adamsitnik added a commit to adamsitnik/runtime that referenced this pull request Sep 13, 2024
carlossanlop pushed a commit that referenced this pull request Sep 17, 2024
* [NRBF] Don't use Unsafe.As when decoding DateTime(s) (#105749)

* Add NrbfDecoder Fuzzer (#107385)

* [NRBF] Fix bugs discovered by the fuzzer (#107368)

* bug #1: don't allow for values out of the SerializationRecordType enum range

* bug #2: throw SerializationException rather than KeyNotFoundException when the referenced record is missing or it points to a record of different type

* bug #3: throw SerializationException rather than FormatException when it's being thrown by BinaryReader (or something else that we use)

* bug #4: document the fact that IOException can be thrown

* bug #5: throw SerializationException rather than OverflowException when parsing the decimal fails

* bug #6: 0 and 17 are illegal values for PrimitiveType enum

* bug #7: throw SerializationException when a surrogate character is read (so far an ArgumentException was thrown)
# Conflicts:
#	src/libraries/System.Formats.Nrbf/src/System/Formats/Nrbf/NrbfDecoder.cs

* [NRBF] throw SerializationException when a surrogate character is read (#107532)

 (so far an ArgumentException was thrown)

* [NRBF] Fuzzing non-seekable stream input (#107605)

* [NRBF] More bug fixes (#107682)

- Don't use `Debug.Fail` not followed by an exception (it may cause problems for apps deployed in Debug)
- avoid Int32 overflow
- throw for unexpected enum values just in case parsing has not rejected them
- validate the number of chars read by BinaryReader.ReadChars
- pass serialization record id to ex message
- return false rather than throw EndOfStreamException when provided Stream has not enough data
- don't restore the position in finally 
- limit max SZ and MD array length to Array.MaxLength, stop using LinkedList<T> as List<T> will be able to hold all elements now
- remove internal enum values that were always illegal, but needed to be handled everywhere
- Fix DebuggerDisplay

* [NRBF] Comments and bug fixes from internal code review (#107735)

* copy comments and asserts from Levi's internal code review

* apply Levi's suggestion: don't store Array.MaxLength as a const, as it may change in the future

* add missing and fix some of the existing comments

* first bug fix: SerializationRecord.TypeNameMatches should throw ArgumentNullException for null Type argument

* second bug fix: SerializationRecord.TypeNameMatches should know the difference between SZArray and single-dimension, non-zero offset arrays (example: int[] and int[*])

* third bug fix: don't cast bytes to booleans

* fourth bug fix: don't cast bytes to DateTimes

* add one test case that I forgot in the previous PR
# Conflicts:
#	src/libraries/System.Formats.Nrbf/src/System/Formats/Nrbf/SerializationRecord.cs

* [NRBF] Address issues discovered by Threat Model  (#106629)

* introduce ArrayRecord.FlattenedLength

* do not include invalid Type or Assembly names in the exception messages, as it's most likely corrupted/tampered/malicious data and could be used as a vector of attack.

* It is possible to have binary array records have an element type of array without being marked as jagged

---------

Co-authored-by: Buyaa Namnan <bunamnan@microsoft.com>
@github-actions github-actions bot locked and limited conversation to collaborators Sep 21, 2024
Successfully merging this pull request may close these issues.

How NRBF Payload Reader should read DateTimes?
4 participants