GetHashCode() randomization will cause XML serialization to be inconsistent #34189

Closed
AmbachtIT opened this issue Mar 27, 2020 · 5 comments
Labels
area-Serialization untriaged New issue has not been triaged by the area owner

Comments

@AmbachtIT

We commonly use the following pattern for regression tests of the reporting part of our application:

  • We create a report. The result is a complex object
  • We serialize the report, using XML
  • We compare the result against a known version.
  • If anything changed, the test will fail

It's a simple approach that works very well to detect regressions. Comparing the files will usually give a clue as to what's going on.

Recently, we migrated our test assemblies to .NET Core, and now we are running into an issue: namespace declarations are serialized in a different order between test runs:

Run 1
<ImportSpecification xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

Run 2
<ImportSpecification xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

The namespaces are stored in the class XmlSerializerNamespaces, which internally uses a Dictionary<string, string> to store them. The order in which the namespaces are enumerated depends on the result of GetHashCode(). In .NET Framework, the result of GetHashCode() was consistent between runs. This behaviour has changed in .NET Core, as detailed in this issue.

In .NET Framework, the behaviour of GetHashCode() can be changed using the App.config setting <UseRandomizedStringHashAlgorithm />. In .NET Core, this is not possible. Is there a way to make the serialization of namespaces stable between runs? This could be done by simply storing the order in which namespaces are added in the class XmlSerializerNamespaces. I'd be happy to provide a pull request.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-Serialization untriaged New issue has not been triaged by the area owner labels Mar 27, 2020
@Wraith2
Contributor

Wraith2 commented Mar 27, 2020

I suggest changing the method of differencing to canonicalize the XML and then directly compare those outputs. The canonicalization transforms are in the System.Security.Cryptography.Xml namespace because they're used in the digital signing process.
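The canonicalization suggested here can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the helper name XmlCanonicalizer.Canonicalize is hypothetical, and it assumes the XmlDsigC14NTransform class from System.Security.Cryptography.Xml, which on .NET Core is shipped in the NuGet package of the same name. Canonical XML sorts namespace declarations by prefix, so the two runs shown in the original report should produce the same text after this step.

```csharp
using System.IO;
using System.Security.Cryptography.Xml;
using System.Text;
using System.Xml;

// Hypothetical helper (not part of the original thread): canonicalizes an
// XML string so that two serializer outputs can be compared as plain text.
public static class XmlCanonicalizer
{
    public static string Canonicalize(string xml)
    {
        // Keep whitespace nodes so the canonical form reflects the input exactly.
        var doc = new XmlDocument { PreserveWhitespace = true };
        doc.LoadXml(xml);

        // Canonical XML 1.0 without comments; namespace declarations and
        // attributes are emitted in a defined, sorted order.
        var transform = new XmlDsigC14NTransform();
        transform.LoadInput(doc);

        using (var stream = (Stream)transform.GetOutput(typeof(Stream)))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In a regression test, both the freshly serialized report and the stored known-good file would be passed through Canonicalize before comparing the resulting strings, so a difference in declaration order alone no longer fails the test.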

@Clockwork-Muse
Contributor

Note that your current approach is fragile anyways, because the ordering of elements is not guaranteed to be based on anything. The iteration order is a side effect of multiple internal implementation details for the various manipulation methods (Add, Remove, etc), and should not have been relied upon in this fashion.

@AmbachtIT
Author

I agree the approach is fragile, because there are no guarantees of serialization order in the XmlSerializer. That being said, there are scenarios (like the one I described) where a stable serialization order is a desirable feature.

@Wraith2
Contributor

Wraith2 commented Mar 28, 2020

Sure, but the nature of XML itself is that order isn't required for documents to be considered functionally equivalent. That's why canonicalization exists for it: to define a single layout for comparison.

What you're asking for is to revert changes made in core so that private implementation details are the same as in netfx. That isn't sensible, nor is it something you should expect to happen. No promises were made about the order of serialization, and while it may have been the same on previous versions of netfx, it is not on core. Now you get to choose how you work around the invalid assumption in your test.

@danmoseley
Member

I think this is resolved.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020