
Allow setting ZipArchiveEntry general-purpose flag bits #98278

Merged
merged 7 commits into from
May 9, 2024

Conversation

edwardneal
Contributor

Fixes #88812.
Also contributes to dotnet/Open-XML-SDK#1443.

The .NET Framework previously allowed consumers to set ZipPackage's CompressionLevel to update bits 1 & 2 in the ZIP archive header's general-purpose bitfield. This was lost in .NET Core, and this PR restores it.

Specifying the CompressionLevel of ZipArchive will now change these bits, and the ZipArchive's CompressionLevel will be read from the bitfield (defaulting to null/Optimal, as before). It's a fairly small change, but I've added a test to verify that setting the compression level sets the bits, that they can be re-read correctly, and that the file data can still be decompressed correctly.

Also changed mapping of ZipPackage's compression options, such that CompressionOption.Maximum now sets the compression level to SmallestSize, and SuperFast now sets the compression level to NoCompression.
Both of these changes restore compatibility with the .NET Framework.
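To make the bit semantics concrete, here is an illustrative Python sketch (not the .NET implementation) of how bits 1 and 2 of a ZIP entry's general-purpose bit flag encode the deflate option, per the PKWARE APPNOTE; the function and option names are chosen for clarity, not taken from the runtime sources:

```python
# Bits 1-2 of the general-purpose bit flag encode the deflate option
# when the compression method is Deflate. Bit 0 is the least significant bit.

DEFLATE_OPTION_BITS = {
    "Normal":    0b000,  # bit2=0, bit1=0
    "Maximum":   0b010,  # bit2=0, bit1=1
    "Fast":      0b100,  # bit2=1, bit1=0
    "SuperFast": 0b110,  # bit2=1, bit1=1
}

def set_deflate_option(flags: int, option: str) -> int:
    """Clear bits 1-2 of the general-purpose flag, then set the option bits."""
    return (flags & ~0b110) | DEFLATE_OPTION_BITS[option]

def read_deflate_option(flags: int) -> str:
    """Recover the deflate option name from bits 1-2 of the flag."""
    return {v: k for k, v in DEFLATE_OPTION_BITS.items()}[flags & 0b110]
```

Setting then re-reading an option round-trips at the bit level, which is essentially what the new test verifies on the .NET side.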
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Feb 11, 2024
@ghost

ghost commented Feb 11, 2024

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.


Author: edwardneal
Assignees: -
Labels:

area-System.IO.Compression

Milestone: -

This test was intended to ensure that bit 11 isn't set. It was actually performing a blind comparison of the entire bit field. Other tests in System.IO.Packaging function properly.
@ericstj
Member

ericstj commented Mar 18, 2024

@carlossanlop - can you please review this?

Member

@carlossanlop carlossanlop left a comment


Thank you for your patience, @edwardneal. I left some comments for you to consider. Also:

  • @ericstj initially suggested a different mapping of the Compression values. @edwardneal do you mind elaborating on your decision to change the values slightly? One of them looks like a breaking change (I pointed it out in the comments).
  • As mentioned in the issue description, these two bits are only relevant when using DEFLATE, and are undefined for any other compression methods. Please consider changing MapDeflateCompressionOption to only work when we're using that algorithm.

Thank you so much for offering this fix!

@@ -86,7 +86,7 @@ internal ZipArchiveEntry(ZipArchive archive, ZipCentralDirectoryFileHeader cd)

_fileComment = cd.FileComment;

_compressionLevel = null;
_compressionLevel = MapCompressionLevel(_generalPurposeBitFlag);
Member


The only place I found where reading the value of _compressionLevel could potentially change behavior due to this change is inside the GetDataCompressor method (in this same file), in the default switch case: if the value of _compressionLevel is null, we default to CompressionLevel.Optimal.

Can you please double check if we have a test that can reach the code inside the switch's default?

Contributor Author


I think the better option here might be to eliminate the null check. The only time that _compressionLevel is going to be null is when creating a new ZipArchiveEntry. This sets _compressionLevel to null, then sets the bit flags to zero in MapDeflateCompressionOption. GetDataCompressor will then interpret the null as CompressionLevel.Optimal implicitly.

If the same ZIP file is loaded again though, MapCompressionLevel will map the zero to CompressionLevel.Optimal explicitly. I think it'd be clearer to make _compressionLevel non-nullable, set it to CompressionLevel.Optimal explicitly in the constructor and remove the coalesce inside GetDataCompressor.

Doing this means that there's no extra check in GetDataCompressor's default case for a test to reach. There's provably no behavioural change here to test because the current behaviour is effectively:

_compressionLevel = null;
// ...
compressorStream = new DeflateStream(/**/, _compressionLevel ?? CompressionLevel.Optimal, /**/);

while the new behaviour would be:

_compressionLevel = CompressionLevel.Optimal;
// ...
compressorStream = new DeflateStream(/**/, _compressionLevel, /**/);

@@ -393,7 +397,7 @@ internal static string GetOpcNameFromZipItemName(string zipItemName)
break;
case CompressionOption.SuperFast:
{
compressionLevel = CompressionLevel.Fastest;
compressionLevel = CompressionLevel.NoCompression;
Member


This would be a breaking change, but if it's more appropriate, we can take it.

@edwardneal @maedula is this the preferred/more correct CompressionLevel value when converting from XPS?

@edwardneal
Contributor Author

Thanks @carlossanlop - I'll work through your comments over the next few days.

I changed the originally-suggested CompressionOption/CompressionLevel mapping to restore backwards compatibility with the .NET Framework. In the reference source, a CompressionOption is mapped to a DeflateOptionEnum, which directly represents the bits in the file (written out here). Currently, specifying a CompressionOption of Maximum via .NET Core will result in a CompressionLevel of Optimal and will not set any bits. In .NET Framework, it'll set bit 1. Similarly, a CompressionOption of SuperFast will set bit 2 in .NET Core and bits 1 & 2 in .NET Framework. I've put a table in the runtime issue, but unfortunately there's a behavioural difference involved no matter what happens - either between .NET Core and Framework, or between the previous version of Core and this one.

I'll make sure the value of bits 1 & 2 is zeroed as per the specification.

* Updated the conditional compilation directives for the .NET Framework/Core package CompressionLevel mappings.
* Specifying a CompressionMethod other than Deflate or Deflate64 will now set the compression level to NoCompression, and will write zeros to the relevant general purpose bits.
* The CompressionLevel always defaulted to Optimal for new archives, but this is now explicitly set (rather than relying upon setting it to null and null-coalescing it to Optimal.) This removes a condition to test for.
* Updated the test data for the CreateArchiveEntriesWithBitFlags test. If the compression level is set to NoCompression, we should expect the CompressionMethod to become Stored (which unsets the general purpose bits, returning an expected result of zero.)
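The expected-value rule in that last bullet (NoCompression forces the Stored method, which zeroes the deflate option bits) can be sketched as follows; this is an illustrative Python sketch with assumed names, not the actual test code, and the Deflate mapping shown is the one this PR proposes:

```python
# Assumed mapping from CompressionLevel to bits 1-2 of the general-purpose
# flag for Deflate entries (bit 0 is the least significant bit).
LEVEL_TO_BITS = {
    "Optimal":      0b000,
    "SmallestSize": 0b010,
    "Fastest":      0b110,
}

def expected_flag_bits(compression_level: str) -> int:
    """Expected value of bits 1-2 for a newly written entry."""
    if compression_level == "NoCompression":
        # The method becomes Stored, for which the deflate option bits
        # are undefined and must be written as zero.
        return 0
    return LEVEL_TO_BITS[compression_level]
```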
@edwardneal edwardneal requested a review from carlossanlop March 22, 2024 23:57
Member

@carlossanlop carlossanlop left a comment


Thanks for the quick response!

unfortunately there's a behavioural difference involved no matter what happens - either between .NET Core and Framework, or between the previous version of Core and this one.

When you say behavioral difference, do you mean the compressed output is different? We don't really measure that, and we don't guarantee a specific output. What truly matters is that:

  • The output is valid for DEFLATE compression.
  • The output can be roundtripped (compressed, then decompressed to the original value).
  • Other tools can read the zip.
  • Specifically for this issue, that the OpenXML standard is respected as expected.

If instead, by behavioral difference you mean the mapping is now different, then we could document this as a breaking change, which I think would be acceptable. In the end, what truly matters is what I listed above.

The changes look good to me, so I'm approving. Before this gets merged, two things that I'd like to double check:

  • @ericstj since you offered an initial suggestion for the compression level mappings, do you agree with the alternative mapping offered by @edwardneal here?
  • @edwardneal did you get a chance to confirm that OpenXML is now working as expected with the provided fix?

@edwardneal
Contributor Author

Thanks @carlossanlop. When I use the DocumentFormat.OpenXml library to generate a spreadsheet, I can see the changes being reflected in the bits being set in the file, so I think it's now just a matter of the CompressionOption/CompressionLevel mapping. The movement of CompressionOption.NoCompression has changed my mind, and I'm happy to carry on with ericstj's original mapping.

The OPC spec mandates the mappings from CompressionOption to the general-purpose flags. The ZipArchive's CompressionLevel, which sits in between, is purely a matter of compatibility between .NET Core's CompressionLevel and the Framework's DeflateOptionEnum. This is a little difficult because the .NET Framework's DeflateOptionEnum has four options, while CompressionLevel has three (plus CompressionLevel.NoCompression).

| Bit 2 | Bit 1 | CompressionOption | .NET Framework DeflateOptionEnum | .NET Core CompressionLevel (proposed) |
|-------|-------|-------------------|----------------------------------|---------------------------------------|
| 1     | 1     | SuperFast         | SuperFast                        | Fastest                               |
| 1     | 0     | Fast              | Fast                             | Fastest                               |
| 0     | 1     | Maximum           | Maximum                          | SmallestSize                          |
| 0     | 0     | Normal            | Normal                           | Optimal                               |

This matches ericstj's original mapping, so I think we now both agree on it. The only quirk here is that under some circumstances, a package won't round-trip perfectly. Somebody could add a ZipPackagePart with a CompressionOption of Fast and this would be persisted as a CompressionLevel of Fastest. When re-reading the package, the ZipArchiveEntry's CompressionLevel would instead be mapped to a CompressionOption of SuperFast.

A complete way to fix this would be to add a new CompressionLevel ("Faster"?) and map it to a different zlib compression level (perhaps 4, since Fastest is 1 and Optimal is 6). I don't personally think it's needed for OpenXML's purposes though - although the compression metadata is interpreted differently, the uncompressed data should still round-trip correctly.
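The round-trip quirk described above can be sketched in a few lines of Python (illustrative only - string names stand in for the .NET enum members): because Fast and SuperFast both map to CompressionLevel.Fastest, re-reading the package cannot distinguish them and must pick one.

```python
# Writing: CompressionOption -> CompressionLevel (the proposed mapping).
# Fast and SuperFast collide on Fastest.
OPTION_TO_LEVEL = {
    "SuperFast": "Fastest",
    "Fast":      "Fastest",
    "Maximum":   "SmallestSize",
    "Normal":    "Optimal",
}

# Re-reading: CompressionLevel -> CompressionOption. Fastest resolves to
# SuperFast, so a part written as Fast is reported as SuperFast.
LEVEL_TO_OPTION = {
    "Fastest":      "SuperFast",
    "SmallestSize": "Maximum",
    "Optimal":      "Normal",
}

def round_trip(option: str) -> str:
    """Persist a part's CompressionOption, then read it back."""
    return LEVEL_TO_OPTION[OPTION_TO_LEVEL[option]]
```

Every option except Fast survives the round trip unchanged; the uncompressed data itself is unaffected either way.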

@ericstj ericstj added the breaking-change Issue or PR that represents a breaking API or functional change over a prerelease. label Mar 25, 2024
@dotnet-policy-service dotnet-policy-service bot added the needs-breaking-change-doc-created Breaking changes need an issue opened with https://github.com/dotnet/docs/issues/new?template=dotnet label Mar 25, 2024
@ericstj
Copy link
Member

ericstj commented Mar 25, 2024

The original issue was all about making a compliance test pass - even though the implementation did nothing with the value other than persist it. Whatever is chosen here should ensure it satisfies that test. If the mapping needs to change to satisfy the test then that's ultimately what this should do. I don't have a very strong opinion here other than that the mapping "makes sense".

I did mark this as a "breaking-change" because we are going to start setting some bits in the zip header we didn't set before and that could have some effects downstream so it makes sense to document it. If we do any other breaks to the mapping from Packaging -> Compression as part of this that should also be documented.

The only quirk here is that under some circumstances, a package won't round-trip perfectly. Somebody could add a ZipPackagePart with a CompressionOption of Fast and this would be persisted as a CompressionLevel of Fastest. When re-reading the package, the ZipArchiveEntry's CompressionLevel would instead be mapped to a CompressionOption of SuperFast.

When new entries are created the compression level is specified per-entry so we'd honor whatever someone specifies when creating the entry. If the entry already exists it looks to me like we just persist the value that's already there. Can we double check (with a test) that we persist these bits correctly in an update case?

* Updated mapping between general purpose bit fields and CompressionLevel.
* Updated mapping from CompressionOption to CompressionLevel.
* Added test to verify round-trip of CompressionOption and its setting of the general purpose bit fields.
@edwardneal
Contributor Author

Thanks for the quick response. I understand your point around the breaking change, and I'll file the issue shortly.

When new entries are created the compression level is specified per-entry so we'd honor whatever someone specifies when creating the entry. If the entry already exists it looks to me like we just persist the value that's already there. Can we double check (with a test) that we persist these bits correctly in an update case?

You're right - updating the contents of a ZipArchiveEntry with bit 2 set and bit 1 unset, then saving the ZipArchive, shouldn't change the contents of the bits. They'll be persisted as they were read; the compression level can't be changed post hoc. The round-trip I'm talking about is actually in Packaging:

var ms = new MemoryStream();
var pkg = Package.Open(ms, FileMode.OpenOrCreate);
var partUri = PackUriHelper.CreatePartUri(new Uri("/root.xml", UriKind.Relative));
var packagePart = pkg.CreatePart(partUri, "application/xml", CompressionOption.Fast);

pkg.Flush();
pkg.Dispose();

ms.Seek(0, SeekOrigin.Begin);
pkg = Package.Open(ms);
packagePart = pkg.GetPart(partUri);

Debug.Assert(packagePart.CompressionOption == CompressionOption.Fast, "This assert will fail.");

My original point was that CompressionOption.Fast will be mapped to CompressionLevel.Fastest at the point of PackagePart creation, because there's no unique CompressionLevel which it can be mapped to. At the point that the Package is re-read, the ZipArchiveEntry's CompressionLevel of Fastest would be mapped to a CompressionOption of SuperFast - so the CompressionOption of the created ZipPackagePart wouldn't round-trip precisely.

I've checked while writing the extra test though, and realised that this isn't actually a new problem. At present, the .NET Core version of ZipPackage doesn't map the ZipArchiveEntry's compression level to a CompressionOption at all: ZipArchiveEntry's compression level isn't publicly exposed, so the Packaging API can't see it and just defaults the package part's CompressionOption to Normal. This isn't consistent with .NET Framework, but it'd require an API change to modify. I think it'd be better to keep this PR clean, then raise a separate issue which groups it with the additional CompressionLevel enum member.
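The general-purpose bit flag under discussion is visible from any ZIP reader, not just .NET. For illustration (the thread's changes are to System.IO.Compression, not to Python), Python's standard zipfile module exposes it as ZipInfo.flag_bits, so the deflate option bits can be inspected like this:

```python
# Write a small deflated entry, then read back bits 1-2 of its
# general-purpose bit flag. CPython's zipfile never sets the deflate
# option bits itself, so they come back as 0 ("Normal"); a writer that
# honors a compression level could set 0b01, 0b10 or 0b11 here.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("root.xml", b"<root/>")

with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("root.xml")
    deflate_option_bits = (info.flag_bits >> 1) & 0b11
    print(deflate_option_bits)  # -> 0
```

A cross-tool check like this is a cheap way to confirm that the bits a writer sets are what other ZIP consumers actually see.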

@edwardneal
Contributor Author

I've created the breaking change issue as dotnet/docs#40299.

@edwardneal
Contributor Author

@dotnet/area-system-io-compression is this likely to be reviewed ready for merge prior to Preview 5?

@carlossanlop
Member

My original point was that CompressionOption.Fast will be mapped to CompressionLevel.Fastest at the point of PackagePart creation, because there's no unique CompressionLevel which it can be mapped to. At the point that the Package is re-read, the ZipArchiveEntry's CompressionLevel of Fastest would be mapped to a CompressionOption of SuperFast - so the CompressionOption of the created ZipPackagePart wouldn't round-trip precisely.

I've checked while writing the extra test though, and realised that this isn't actually a new problem. At present, the .NET Core version of ZipPackage doesn't map the ZipArchiveEntry's compression level to a CompressionOption at all: ZipArchiveEntry's compression level isn't publicly exposed, so the Packaging API can't see it and just defaults the package part's CompressionOption to Normal. This isn't consistent with .NET Framework, but it'd require an API change to modify.

Thanks for checking this. I agree that since this is not a new problem, we can tackle it in a separate issue so we can decide what can be done. Would you be interested in opening the issue?

updating the contents of a ZipArchiveEntry with bit 2 set and bit 1 unset, then saving the ZipArchive shouldn't change the contents of the bits. They'll be persisted as they were read, the compression level can't be changed post hoc.

Thank you so much for opening the breaking change issue. As long as what you described is well documented there, we are good to go.

This is ready to merge.

@carlossanlop
Member

carlossanlop commented May 9, 2024

Ah, I need to update the branch so the CI issues can get linked to known build issues; otherwise I'm blocked from merging. The results were too old, so I was unable to get the failure information.

@edwardneal
Contributor Author

That's great, thanks! I'll raise the issue about the PackagePart's CompressionOption in the next few days.

@carlossanlop carlossanlop merged commit 6cc6c66 into dotnet:main May 9, 2024
83 checks passed
@carlossanlop
Member

carlossanlop commented May 9, 2024

Much appreciated, thanks.

@edwardneal remind me: to which released .NET version do you need this backported?

Edit: I found the info in the openxml issue. You need it in 6.0. We won't backport it to 7.0 as it's already EOL, but we can also backport it to 8.0.

@carlossanlop
Member

/backport to release/6.0-staging

@carlossanlop
Member

/backport to release/8.0-staging

Contributor

github-actions bot commented May 9, 2024

Started backporting to release/6.0-staging: https://github.com/dotnet/runtime/actions/runs/9024932428

Contributor

github-actions bot commented May 9, 2024

Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/9024933570

Contributor

github-actions bot commented May 9, 2024

@carlossanlop backporting to release/6.0-staging failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Use CompressionLevel to set general-purpose bit flags
Using index info to reconstruct a base tree...
M	src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
M	src/libraries/System.IO.Compression/tests/ZipArchive/zip_CreateTests.cs
M	src/libraries/System.IO.Packaging/src/System/IO/Packaging/ZipPackage.cs
Falling back to patching base and 3-way merge...
Auto-merging src/libraries/System.IO.Packaging/src/System/IO/Packaging/ZipPackage.cs
Auto-merging src/libraries/System.IO.Compression/tests/ZipArchive/zip_CreateTests.cs
Auto-merging src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
Applying: Made function verbs consistent
Using index info to reconstruct a base tree...
M	src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
Falling back to patching base and 3-way merge...
Auto-merging src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
Applying: Added test to verify read file contents
Applying: Corrected failing Packaging test
Using index info to reconstruct a base tree...
A	src/libraries/System.IO.Packaging/tests/ReflectionTests.cs
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): src/libraries/System.IO.Packaging/tests/ReflectionTests.cs deleted in HEAD and modified in Corrected failing Packaging test. Version Corrected failing Packaging test of src/libraries/System.IO.Packaging/tests/ReflectionTests.cs left in tree.
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0004 Corrected failing Packaging test
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

Contributor

github-actions bot commented May 9, 2024

@carlossanlop an error occurred while backporting to release/6.0-staging, please check the run log for details!

Error: git am failed, most likely due to a merge conflict.

Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this pull request May 30, 2024
* Use CompressionLevel to set general-purpose bit flags

Also changed mapping of ZipPackage's compression options, such that CompressionOption.Maximum now sets the compression level to SmallestSize, and SuperFast now sets the compression level to NoCompression.
Both of these changes restore compatibility with the .NET Framework.

* Made function verbs consistent

* Added test to verify read file contents

* Corrected failing Packaging test

This test was intended to ensure that bit 11 isn't set. It was actually performing a blind comparison of the entire bit field. Other tests in System.IO.Packaging function properly.

* Changes following code review

* Updated the conditional compilation directives for the .NET Framework/Core package CompressionLevel mappings.
* Specifying a CompressionMethod other than Deflate or Deflate64 will now set the compression level to NoCompression, and will write zeros to the relevant general purpose bits.
* The CompressionLevel always defaulted to Optimal for new archives, but this is now explicitly set (rather than relying upon setting it to null and null-coalescing it to Optimal.) This removes a condition to test for.
* Updated the test data for the CreateArchiveEntriesWithBitFlags test. If the compression level is set to NoCompression, we should expect the CompressionMethod to become Stored (which unsets the general purpose bits, returning an expected result of zero.)

* Code review changes

* Updated mapping between general purpose bit fields and CompressionLevel.
* Updated mapping from CompressionOption to CompressionLevel.
* Added test to verify round-trip of CompressionOption and its setting of the general purpose bit fields.

---------

Co-authored-by: Carlos Sánchez López <1175054+carlossanlop@users.noreply.github.com>
@github-actions github-actions bot locked and limited conversation to collaborators Jun 9, 2024
@edwardneal edwardneal deleted the issue-88812 branch August 31, 2024 16:26
Successfully merging this pull request may close these issues.

System.IO.Packaging.ZipArchive: No means to set the zip header bits 1 and 2