Allow setting ZipArchiveEntry general-purpose flag bits #98278
Conversation
Also changed mapping of ZipPackage's compression options, such that CompressionOption.Maximum now sets the compression level to SmallestSize, and SuperFast now sets the compression level to NoCompression. Both of these changes restore compatibility with the .NET Framework.
Tagging subscribers to this area: @dotnet/area-system-io-compression

Issue Details

Fixes #88812. The .NET Framework previously allowed consumers to set ZipPackage's CompressionLevel to update bits 1 & 2 in the ZIP archive header's general purpose bitfield. This was lost in .NET Core, and this PR restores it. Specifying the CompressionLevel of ZipArchive will now change these bits, and the ZipArchive's CompressionLevel will be read from the bitfield (defaulting to null/Optimal, as before). It's a fairly small change, but I've added a test to verify that setting the compression level sets the bits, that they can be re-read correctly, and that the file data can still be decompressed correctly.
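For readers unfamiliar with the bitfield being discussed, bits 1 and 2 of the ZIP general-purpose bit flag describe the DEFLATE compression strength. A language-neutral sketch (in Python, for illustration only — the option names come from section 4.4.4 of the PKWARE APPNOTE, not from the runtime's code):

```python
# Sketch: decoding bits 1 & 2 of the ZIP general-purpose bit flag for
# DEFLATE entries, per APPNOTE 4.4.4. Illustrative only.
BIT_1 = 0b010
BIT_2 = 0b100

DEFLATE_OPTIONS = {
    0:             "Normal",
    BIT_1:         "Maximum",
    BIT_2:         "Fast",
    BIT_1 | BIT_2: "SuperFast",
}

def read_deflate_option(general_purpose_bit_flag: int) -> str:
    """Decode bits 1 & 2, ignoring every other flag bit."""
    return DEFLATE_OPTIONS[general_purpose_bit_flag & (BIT_1 | BIT_2)]
```

Note that only these two bits carry the compression-strength hint; other bits (such as bit 11, the UTF-8 flag mentioned later in this thread) are unrelated and must be masked off.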
This test was intended to ensure that bit 11 isn't set. It was actually performing a blind comparison of the entire bit field. Other tests in System.IO.Packaging function properly.
@carlossanlop - can you please review this?
Thank you for your patience, @edwardneal. I left some comments for you to consider. Also:
- @ericstj initially suggested a different mapping of the Compression values. @edwardneal do you mind elaborating on your decision to change the values slightly? One of them looks like a breaking change (I pointed it out in the comments).
- As mentioned in the issue description, these two bits are only relevant when using DEFLATE, and are undefined for any other compression methods. Please consider changing MapDeflateCompressionOption to only work when we're using that algorithm.
Thank you so much for offering this fix!
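The second request above amounts to a guard clause. A hypothetical sketch (illustrative Python; the function name mirrors MapDeflateCompressionOption from the conversation, but the body is an assumption, not the runtime's implementation):

```python
# Hypothetical sketch of the requested guard: only emit bits 1 & 2 when the
# entry actually uses DEFLATE or Deflate64; for any other method the bits
# are undefined, so write zeros.
DEFLATE_METHODS = {8, 9}  # ZIP compression method IDs: Deflate, Deflate64

def map_deflate_compression_option(method_id: int, level_bits: int) -> int:
    if method_id not in DEFLATE_METHODS:
        return 0  # undefined for this method: leave the bits clear
    return level_bits & 0b110  # keep only bits 1 and 2
```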
@@ -86,7 +86,7 @@ internal ZipArchiveEntry(ZipArchive archive, ZipCentralDirectoryFileHeader cd)
             _fileComment = cd.FileComment;
-            _compressionLevel = null;
+            _compressionLevel = MapCompressionLevel(_generalPurposeBitFlag);
The only place I found reading the value of _compressionLevel that could potentially change behavior due to this change is inside the GetDataCompressor method (in this same file), in the default switch case, where if the value of _compressionLevel is null, we default to CompressionLevel.Optimal.
Can you please double check if we have a test that can reach the code inside the switch's default?
I think the better option here might be to eliminate the null check. The only time that _compressionLevel is going to be null is when creating a new ZipArchiveEntry. This sets _compressionLevel to null, then sets the bit flags to zero in MapDeflateCompressionOption. GetDataCompressor will then interpret the null as CompressionLevel.Optimal implicitly.
If the same ZIP file is loaded again though, MapCompressionLevel will map the zero to CompressionLevel.Optimal explicitly. I think it'd be clearer to make _compressionLevel non-nullable, set it to CompressionLevel.Optimal explicitly in the constructor, and remove the coalesce inside GetDataCompressor.
Doing this means that there's no extra check in GetDataCompressor's default case for a test to reach. There's provably no behavioural change here to test, because the current behaviour is effectively:
_compressionLevel = null;
// ...
compressorStream = new DeflateStream(/**/, _compressionLevel ?? CompressionLevel.Optimal, /**/);
while the new behaviour would be:
_compressionLevel = CompressionLevel.Optimal;
// ...
compressorStream = new DeflateStream(/**/, _compressionLevel, /**/);
@@ -393,7 +397,7 @@ internal static string GetOpcNameFromZipItemName(string zipItemName)
                     break;
                 case CompressionOption.SuperFast:
                     {
-                        compressionLevel = CompressionLevel.Fastest;
+                        compressionLevel = CompressionLevel.NoCompression;
This would be a breaking change, but if it's more appropriate, we can take it.
@edwardneal @maedula is this the preferred/more correct CompressionLevel value when converting from XPS?
Thanks @carlossanlop - I'll work through your comments over the next few days.

I changed the originally-suggested CompressionOption/CompressionLevel mapping to restore backwards compatibility with the .NET Framework. In the reference source, a CompressionOption is mapped to a DeflateOptionEnum, which directly represents the bits in the file (written out here). Currently, specifying a CompressionOption of Maximum via .NET Core will result in a CompressionLevel of Optimal and will not set any bits; in .NET Framework, it'll set bit 1. Similarly, a CompressionOption of SuperFast will set bit 2 in .NET Core and bits 1 & 2 in .NET Framework.

I've put a table in the runtime issue, but unfortunately there's a behavioural difference involved no matter what happens - either between .NET Core and Framework, or between the previous version of Core and this one. I'll make sure the value of bits 1 & 2 is zeroed as per the specification.
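The divergence described in that comment can be restated as a small sketch (bit values only, taken from the comment above rather than from either implementation):

```python
# Hedged restatement of the Framework-vs-Core difference described above.
# Bit assignments are from the comment, not from either codebase.
BIT_1, BIT_2 = 0b010, 0b100

net_framework_bits = {"Maximum": BIT_1, "SuperFast": BIT_1 | BIT_2}
pre_fix_core_bits  = {"Maximum": 0,     "SuperFast": BIT_2}

# The options on which the two runtimes disagreed before this PR:
differs = {option for option in net_framework_bits
           if net_framework_bits[option] != pre_fix_core_bits[option]}
```

Both options disagree, which is why some behavioural difference is unavoidable whichever mapping is chosen.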
* Updated the conditional compilation directives for the .NET Framework/Core package CompressionLevel mappings.
* Specifying a CompressionMethod other than Deflate or Deflate64 will now set the compression level to NoCompression, and will write zeros to the relevant general purpose bits.
* The CompressionLevel always defaulted to Optimal for new archives, but this is now explicitly set (rather than relying upon setting it to null and null-coalescing it to Optimal). This removes a condition to test for.
* Updated the test data for the CreateArchiveEntriesWithBitFlags test. If the compression level is set to NoCompression, we should expect the CompressionMethod to become Stored (which unsets the general purpose bits, returning an expected result of zero).
Thanks for the quick response!
unfortunately there's a behavioural difference involved no matter what happens - either between .NET Core and Framework, or between the previous version of Core and this one.
When you say behavioral difference, do you mean the compressed output is different? We don't really measure that, and we don't guarantee a specific output. What truly matters is that:
- The output is valid for DEFLATE compression.
- The output can be roundtripped (compressed, then decompressed to the original value).
- Other tools can read the zip.
- Specifically for this issue, that the OpenXML standard is respected as expected.
If instead, by behavioral difference you mean the mapping is now different, then we could document this as a breaking change, which I think would be acceptable. In the end, what truly matters is what I listed above.
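One cheap cross-tool check for the third point above: Python's zipfile module exposes the general-purpose flag via ZipInfo.flag_bits, so archives can be inspected from outside .NET. A sketch (file name and content are arbitrary):

```python
import io
import zipfile

# Write a tiny DEFLATE-compressed archive in memory, then read back the
# general-purpose bit flag of its single entry and mask off bits 1 and 2.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("root.xml", b"<root/>")

with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("root.xml")
    deflate_option_bits = info.flag_bits & 0b110  # bits 1 and 2
```

Pointing the same inspection at a file produced by the patched ZipArchive would show whichever bits the chosen CompressionLevel maps to.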
The changes look good to me, so I'm approving. Before this gets merged, two things that I'd like to double check:
- @ericstj since you offered an initial suggestion for the compression level mappings, do you agree with the alternative mapping offered by @edwardneal here?
- @edwardneal did you get a chance to confirm that OpenXML is now working as expected with the provided fix?
Thanks @carlossanlop. When I use the DocumentFormat.OpenXml library to generate a spreadsheet, I can see the changes being reflected in the bits being set in the file, so I think it's now just a matter of the CompressionOption/CompressionLevel mapping.

The movement of CompressionOption.NoCompression has changed my mind, and I'm happy to carry on with ericstj's original mapping. The OPC spec mandates the mappings from CompressionOption to the general-purpose flags; the ZipArchive's CompressionLevel which sits in between is purely a matter of compatibility between .NET Core's CompressionLevel and the Framework's DeflateOptionEnum. This is a little difficult because the .NET Framework's DeflateOptionEnum has four options, while CompressionLevel has three (plus CompressionLevel.NoCompression).
This matches ericstj's original mapping, so I think we now both agree on it. The only quirk here is that under some circumstances, a package won't round-trip perfectly. Somebody could add a ZipPackagePart with a CompressionOption of Fast, and this would be persisted as a CompressionLevel of Fastest. When re-reading the package, the ZipArchiveEntry's CompressionLevel would instead be mapped to a CompressionOption of SuperFast.

A complete way to fix this would be to add a new CompressionLevel ("Faster"?) and map it to a different ZLib compression level (perhaps 4, since Fastest is 1 and Optimal is 6). I don't personally think it's needed for OpenXML's purposes though - although the compression metadata is interpreted differently, the uncompressed data should still be round-tripped correctly.
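The lossy round-trip described above can be sketched as follows. The mapping tables are assumptions inferred from this conversation, not the actual runtime tables; the point is only that Fast and SuperFast collapse onto the same CompressionLevel, so the reverse mapping cannot tell them apart:

```python
# Hypothetical sketch of the CompressionOption -> CompressionLevel ->
# CompressionOption round trip discussed above. Fast and SuperFast share
# CompressionLevel.Fastest, so the round trip is not the identity.
option_to_level = {
    "NotCompressed": "NoCompression",
    "Normal":        "Optimal",
    "Maximum":       "SmallestSize",
    "Fast":          "Fastest",
    "SuperFast":     "Fastest",   # same target as Fast: information is lost
}
level_to_option = {
    "NoCompression": "NotCompressed",
    "Optimal":       "Normal",
    "SmallestSize":  "Maximum",
    "Fastest":       "SuperFast",  # the reverse mapping must pick one
}

def round_trip(option: str) -> str:
    return level_to_option[option_to_level[option]]
```

A hypothetical intermediate level (the "Faster" suggestion) would give each CompressionOption a distinct target and make round_trip the identity.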
The original issue was all about making a compliance test pass - even though the implementation did nothing with the value other than persist it. Whatever is chosen here should ensure it satisfies that test. If the mapping needs to change to satisfy the test then that's ultimately what this should do. I don't have a very strong opinion here other than that the mapping "makes sense". I did mark this as a "breaking-change" because we are going to start setting some bits in the zip header we didn't set before and that could have some effects downstream so it makes sense to document it. If we do any other breaks to the mapping from Packaging -> Compression as part of this that should also be documented.
When new entries are created, the compression level is specified per-entry, so we'd honor whatever someone specifies when creating the entry. If the entry already exists, it looks to me like we just persist the value that's already there. Can we double check (with a test) that we persist these bits correctly in an update case?
* Updated mapping between general purpose bit fields and CompressionLevel.
* Updated mapping from CompressionOption to CompressionLevel.
* Added test to verify round-trip of CompressionOption and its setting of the general purpose bit fields.
Thanks for the quick response. I understand your point around the breaking change, and I'll file the issue shortly.
You're right - updating the contents of a ZipArchiveEntry with bit 2 set and bit 1 unset, then saving the ZipArchive, shouldn't change the contents of the bits. They'll be persisted as they were read; the compression level can't be changed post hoc. The round-trip I'm talking about is actually in Packaging:

var ms = new MemoryStream();
var pkg = Package.Open(ms, FileMode.OpenOrCreate);
var partUri = PackUriHelper.CreatePartUri(new Uri("/root.xml", UriKind.Relative));
var packagePart = pkg.CreatePart(partUri, "application/xml", CompressionOption.Fast);

pkg.Flush();
pkg.Dispose();

ms.Seek(0, SeekOrigin.Begin);
pkg = Package.Open(ms);
packagePart = pkg.GetPart(partUri);
Debug.Assert(packagePart.CompressionOption == CompressionOption.Fast, "This assert will fail.");

That was my original point - but I've checked while writing the extra test, and realised that this isn't actually a new problem. At present, the .NET Core version of ZipPackage doesn't map the ZipArchiveEntry's compression level to a CompressionOption at all: ZipArchiveEntry's compression level isn't publicly exposed, so the Packaging API can't see it and just defaults the package part's CompressionOption to Normal. This isn't consistent with .NET Framework, but it'd require an API change to modify. I think it'd be better to keep this PR clean, then raise a separate issue which groups it with the additional CompressionLevel enum member.
I've created the breaking change issue as dotnet/docs#40299.
@dotnet/area-system-io-compression is this likely to be reviewed and ready to merge prior to Preview 5?
Thanks for checking this. I agree that since this is not a new problem, we can tackle it in a separate issue so we can decide what can be done. Would you be interested in opening the issue?
Thank you so much for opening the breaking change issue. As long as what you described is well documented there, we are good to go. This is ready to merge.
Ah, I need to update the branch so the CI issues can get linked to known build issues, otherwise I'm blocked from merging. The results were too old so I was unable to get the failure information.
That's great, thanks! I'll raise the issue about the PackagePart's CompressionOption in the next few days.
Much appreciated, thanks. @edwardneal remind me: to which released .NET version do you need this backported?
Edit: I found the info in the openxml issue. You need it in 6.0. We won't backport it to 7.0 as it's already EOL, but we can also backport it to 8.0.
/backport to release/6.0-staging
/backport to release/8.0-staging
Started backporting to release/6.0-staging: https://github.com/dotnet/runtime/actions/runs/9024932428
Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/9024933570
@carlossanlop backporting to release/6.0-staging failed, the patch most likely resulted in conflicts:
$ git am --3way --ignore-whitespace --keep-non-patch changes.patch
Applying: Use CompressionLevel to set general-purpose bit flags
Using index info to reconstruct a base tree...
M src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
M src/libraries/System.IO.Compression/tests/ZipArchive/zip_CreateTests.cs
M src/libraries/System.IO.Packaging/src/System/IO/Packaging/ZipPackage.cs
Falling back to patching base and 3-way merge...
Auto-merging src/libraries/System.IO.Packaging/src/System/IO/Packaging/ZipPackage.cs
Auto-merging src/libraries/System.IO.Compression/tests/ZipArchive/zip_CreateTests.cs
Auto-merging src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
Applying: Made function verbs consistent
Using index info to reconstruct a base tree...
M src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
Falling back to patching base and 3-way merge...
Auto-merging src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
Applying: Added test to verify read file contents
Applying: Corrected failing Packaging test
Using index info to reconstruct a base tree...
A src/libraries/System.IO.Packaging/tests/ReflectionTests.cs
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): src/libraries/System.IO.Packaging/tests/ReflectionTests.cs deleted in HEAD and modified in Corrected failing Packaging test. Version Corrected failing Packaging test of src/libraries/System.IO.Packaging/tests/ReflectionTests.cs left in tree.
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0004 Corrected failing Packaging test
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128
Please backport manually!
@carlossanlop an error occurred while backporting to release/6.0-staging, please check the run log for details! Error: git am failed, most likely due to a merge conflict.
* Use CompressionLevel to set general-purpose bit flags
  Also changed mapping of ZipPackage's compression options, such that CompressionOption.Maximum now sets the compression level to SmallestSize, and SuperFast now sets the compression level to NoCompression. Both of these changes restore compatibility with the .NET Framework.
* Made function verbs consistent
* Added test to verify read file contents
* Corrected failing Packaging test
  This test was intended to ensure that bit 11 isn't set. It was actually performing a blind comparison of the entire bit field. Other tests in System.IO.Packaging function properly.
* Changes following code review
  * Updated the conditional compilation directives for the .NET Framework/Core package CompressionLevel mappings.
  * Specifying a CompressionMethod other than Deflate or Deflate64 will now set the compression level to NoCompression, and will write zeros to the relevant general purpose bits.
  * The CompressionLevel always defaulted to Optimal for new archives, but this is now explicitly set (rather than relying upon setting it to null and null-coalescing it to Optimal). This removes a condition to test for.
  * Updated the test data for the CreateArchiveEntriesWithBitFlags test. If the compression level is set to NoCompression, we should expect the CompressionMethod to become Stored (which unsets the general purpose bits, returning an expected result of zero).
* Code review changes
  * Updated mapping between general purpose bit fields and CompressionLevel.
  * Updated mapping from CompressionOption to CompressionLevel.
  * Added test to verify round-trip of CompressionOption and its setting of the general purpose bit fields.

---------

Co-authored-by: Carlos Sánchez López <1175054+carlossanlop@users.noreply.github.com>
Fixes #88812.
Also contributes to dotnet/Open-XML-SDK#1443.
The .NET Framework previously allowed consumers to set ZipPackage's CompressionLevel to update bits 1 & 2 in the ZIP archive header's general purpose bitfield. This was lost in .NET Core, and this PR restores it.
Specifying the CompressionLevel of ZipArchive will now change these bits, and the ZipArchive's CompressionLevel will be read from the bitfield (defaulting to null/Optimal, as before). It's a fairly small change, but I've added a test to verify that setting the compression level sets the bits, that they can be re-read correctly, and that the file data can still be decompressed correctly.