Open XML SDK ZIP local file header is not compliant to ISO/IEC 29500-2. Incorrect "general purpose bit flag" settings #1443
Comments
Hi @maedula, I'm looking into this. I've confirmed your findings and am going to consult with the .Net team to see if core can do the same. |
@maedula, I've discussed with the dotnet team, this is still under investigation but I've filed dotnet/runtime#87658 to track it with them. It looks like |
@tomjebo Appreciate your swift investigation on this issue. Thanks a lot! |
@maedula thanks for raising the issue, it looks like dotnet/runtime#87658 has been fixed and merged. If you're ok, I'll close this issue. |
@tomjebo thanks a lot to you and the team for identifying the root cause and fixing this so quickly. Stupid question though: the runtime fix has its milestone set to 8.0.0, which confuses me. Will it still go into the upcoming version 6.0.19? Once this is clarified, we can go ahead and close this ticket. Once a new runtime version is available, I will verify again on my end. |
@maedula Yes, it's going to be available in .NET 8 Preview 6 initially (that version will be shipped on Jul 11th). If you get a chance to test the fix in that version and confirm it works well, that would be awesome and extremely helpful. We want to wait for the fix to bake in preview first, then if we confirm there are no unexpected issues, we can talk about considering it as a candidate for backporting it into the next servicing releases of 7.0 and 6.0. cc @ericstj |
You can pull nightly builds from the NuGet feed |
@ericstj and @tomjebo Thanks for the NuGet feed including the fix. I have tested it now. The good news is that bit 11 is no longer toggled, so the header is, strictly speaking, compliant with ISO/IEC 29500-2 again. The bad news is that bits 1 and 2 are zero regardless of the chosen SpreadsheetDocument.CompressionOption. This differs from .NET Framework, which correctly sets bits 1 and 2 according to the CompressionOption. My main argument for an investigation was compliance with 29500-2, so the fact that the CompressionOption is not reflected in the "general purpose bit flag" field might be considered a cosmetic issue? The default "general purpose bit flag" field used by Excel is 06 00 (which indicates CompressionOption.SuperFast), while the fix now writes 00 00 (which indicates CompressionOption.Normal). File sizes differ for different CompressionOption settings, so a different compression is being applied according to the option, but the "general purpose bit flag" field will always indicate the same 00 00. If it is up to me, I still consider this not to be really correct yet. |
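For anyone who wants to repeat this kind of check, here is a minimal Python sketch that reads the general purpose bit flag from every local file header of a package. The helper name `local_header_flags` is hypothetical and not part of any library. It assumes the whole file fits in memory and relies on `zipfile.ZipInfo.header_offset` to locate each local header.

```python
import io
import struct
import zipfile


def local_header_flags(data: bytes) -> dict:
    """Map each entry name to the flag word stored in its local file header."""
    flags = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            start = info.header_offset
            # Local file header layout: signature (4 bytes), version needed (2), flags (2), ...
            signature, _version, flag = struct.unpack("<4sHH", data[start:start + 8])
            if signature != b"PK\x03\x04":
                raise ValueError(f"no local file header at offset {start}")
            flags[info.filename] = flag
    return flags


def report(data: bytes) -> None:
    """Print bit 11 and bits 2..1 for every entry."""
    for name, flag in local_header_flags(data).items():
        bit11 = "set" if flag & 0x0800 else "clear"
        hint = (flag >> 1) & 0b11
        print(f"{name}: flags=0x{flag:04x} bit11={bit11} bits2..1={hint:02b}")
```

To inspect a generated workbook, read it with `open(path, "rb").read()` and pass the bytes to `report`. The flags come from the local headers, not the central directory, because the local headers are what this issue is about.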
Sounds like a different issue. Here is the meaning of those bits for deflate (as is used here): https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE_6.2.0.txt
So they just reflect whatever sort of compression settings were used when producing the ZIP. Those settings might make sense for some zip implementations, but not for others. In the NETFx WindowsBase compression implementation this was set here:
These settings don't directly map to the CompressionLevel values exposed in System.IO.Compression's zip implementation:
So we can't really make the ZipArchiveEntry write those bits since its zip implementation doesn't have the same options. In fact, the underlying deflate algorithm has even more options for compression than we expose or these bits represent. We could try to reverse some mapping based on CompressionLevel, but as you can see from the current mapping in ZipPackage we wouldn't be able to represent all options.
The bit 11 problem was a correctness issue - making it impossible to represent a non-UTF8 archive - bit 1/2 seem to be informational. Can you point to a place where this is causing a problem with a zip client, or to a place in the spec that requires these bits to be set? Seems to me they should be optional. @carlossanlop what's your take?
EDIT: Further data to suggest that these bits aren't meaningful. On .NET Framework the CompressionOption was completely ignored during compression. See |
@ericstj I greatly appreciate your input on this topic and your insight into the root causes. If it were up to me, I would be more than happy to at least get the limited mapping that .NET Framework offered, rather than having no way to influence bits 1 and 2 at all, especially since having bits 1 and 2 set is the Microsoft Office Excel default. The background to all of this is that a customer project forces me to reuse the very same ZIP header settings as Excel. They use a very strict IT security application across their PC install base (thousands of PCs), which drops XLSX files as potentially malicious if the ZIP header deviates from what Excel writes. So it is only a few bits, but they have a big impact on my application. The app is supposed to create mandatory XLSX reports, which are rendered unusable for this customer project, threatening the overall success of the project. Downgrading from .NET Core to .NET Framework does not sound like a reasonable approach to me so far. Pushing back on the meaningfulness of the IT security app's behavior has not been successful so far either. I appreciate your consideration of this matter and look forward to your decision on how to proceed. Thanks a lot! |
Your findings here somehow deviate from my code tests with .NET Framework and the sample code in my original posting. Changing the CompressionOption IS changing the general purpose bit field on .NET Framework. Maybe the actual compression algorithm is not changed, but the bit field is, which is at least vital to me here. I did not check how file sizes differ, though... |
I didn't mean to disagree. I was pointing out that the only thing that changes when you specify CompressionOption was the bits in the header (as you noticed) -- it didn't affect the behavior of the compression algorithm at all. I was pointing this out because it showed that these bits were pretty useless, as they only recorded a parameter passed to the API and not any information needed to consume the archive.
I recommend that you open an issue in dotnet/runtime that describes what you'd like to see. My comments so far haven't been so much a "decision" as an analysis of the state of the implementation, my opinion about the importance of these bits, and a suggestion for how to proceed. Since it seems like you don't need all possible values of bits 2,1 to be set, but only 11 (SuperFast) to match what Excel did, maybe you could ask for ZipArchiveEntry to set these as follows: |
@ericstj sorry for the delay and thanks for your comments here, even though they are a little confusing to me. I agree that the report here is about a spec noncompliance and that this has been resolved, so from this perspective this ticket can be closed. What's not clear to me is how I can go forward with my request for bits 1 and 2 to be mapped from the CompressionOption, to get the same behavior as with .NET Framework. I'm using the OpenXML SDK as my API, not ZipPackage. I have not checked all your code references, and I'm not sure whether ZipPackage would be able to take care of a suitable mapping of bits 1 and 2 out of the box, without changes to the OpenXML SDK. Can you confirm that dotnet/runtime would only have to implement a change for the bit mappings? I further understand that you suggest I open a new issue in dotnet/runtime and explain this situation based on what we have summarized here? Thanks a lot in advance for your clarification! |
Yes - open an issue in dotnet/runtime suggesting that the ZipArchive implementation either set bits 1 & 2 as some mapping from CompressionLevel, or expose an API so that callers can specify the value for these bits - connecting the ask to the OpenXML document which states that these bits should be set. |
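To make the second option concrete, here is a rough Python sketch of what "callers specify the value for these bits" could look like. The names `DeflateHint` and `with_deflate_hint` are hypothetical illustrations, not an existing or proposed .NET API. The bit values follow the deflate table in APPNOTE 6.2.0 (bits 2,1: 00 normal, 01 maximum, 10 fast, 11 super fast).

```python
from enum import IntEnum


class DeflateHint(IntEnum):
    """Values of bits 1 and 2 of the general purpose bit flag for deflate entries."""
    NORMAL = 0x0000      # bit 2 = 0, bit 1 = 0
    MAXIMUM = 0x0002     # bit 2 = 0, bit 1 = 1
    FAST = 0x0004        # bit 2 = 1, bit 1 = 0
    SUPER_FAST = 0x0006  # bit 2 = 1, bit 1 = 1


HINT_MASK = 0x0006


def with_deflate_hint(flags: int, hint: DeflateHint) -> int:
    """Return a 16-bit flag word with bits 1 and 2 replaced and all other bits kept."""
    return ((flags & ~HINT_MASK) & 0xFFFF) | int(hint)
```

For example, `with_deflate_hint(0x0000, DeflateHint.SUPER_FAST)` yields the `06 00` value that Excel writes. Because only bits 1 and 2 are touched, bit 11 and the other flag bits stay exactly as the writer set them.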
@ericstj issue is created in dotnet/runtime. Not sure how to proceed here for this issue. If it is up to me, we can get this closed as resolved |
With my new enhancement request raised for dotnet/runtime, which realistically will not get any priority anytime soon, let's close this, as bit 11 is no longer toggled and the output is thus compliant again |
The upstream bug is now fixed in commit 6cc6c66, and should hopefully be available from .NET 9 preview 5, so setting a package part's CompressionOption to Fast or SuperFast will result in general-purpose bits 1 & 2 being set. It's also being backported to .NET 6 and 8. |
Hi all - We merged this fix into main in runtime: dotnet/runtime#98278. The fix would introduce a breaking change, as there would not be a clean roundtrip between .NET Framework and .NET 9+. We are ok introducing this in .NET 9, but it would not meet the bar for backporting it to .NET 8 and/or .NET 6, which are LTS releases. @maedula (or anyone reading this), can you please help describe the severity of the impact of this bug? Is it blocking some customer scenarios? Please feel free to contact me via email directly if you prefer. |
Does this mean that something saved on .NET 9+ couldn't be opened in previous versions? Or does it mean that if it is resaved in < .NET 9 it will not be bitwise the same? My main goal is to understand whether we, as the OpenXML SDK, should consider it a breaking change for consumers of our package. |
I'm going to reopen so we can make sure to understand the impact of taking the change |
@carlossanlop we need to distinguish between two different issues. The first is the one described here: bit 11 (UTF-8 usage) being set, which makes the file no longer compliant with ISO/IEC 29500-2. I think the initial description is pretty detailed here. If you have any specific questions, I'm happy to get back to you asap. I consider this a rather severe issue. I'm not sure how the breaking change can be of more concern than not being spec compliant. The other issue, that unlike with .NET Framework the CompressionOption has no impact on bits 1 and 2, is rather cosmetic. |
Describe the bug
The XLSX files currently created are not compliant with the OpenXML specification ISO/IEC 29500-2. The reason is that the "Requirements on package implementers" listed in chapter B.4, which refer to the ZIP Appnote (ZIP File Format Specification Version 6.2.0), are not met. What I would like to challenge in particular is the "general purpose bit flag" set in the ZIP local file header. The bit flag currently written uses bit 11, which is unsupported.
Following references are hex representations of the XLSX file...
OpenXML SDK used incorrect local file header: 50 4B 03 04 14 00 00 08 08 00
Microsoft Excel used correct file header: 50 4B 03 04 14 00 06 00 08 00
So Microsoft Excel seems to use the "super fast compression" method by default, with bit 1 and bit 2 set.
When using .net framework as a target, the headers are written correctly and thus compliant to ISO/IEC 29500-2.
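As an illustration, the two header prefixes quoted above can be decoded with a few lines of Python. This is only a sketch: `describe_local_header` is a hypothetical helper, and it reads just the first ten bytes (signature, version needed, general purpose bit flag, compression method), all little-endian as laid out in the ZIP Appnote.

```python
import struct

# Meaning of bits 2,1 for deflate entries (APPNOTE 6.2.0, methods 8 and 9).
DEFLATE_HINTS = {0b00: "normal", 0b01: "maximum", 0b10: "fast", 0b11: "super fast"}


def describe_local_header(prefix: bytes) -> dict:
    """Decode the first 10 bytes of a ZIP local file header."""
    signature, version, flags, method = struct.unpack("<4sHHH", prefix[:10])
    if signature != b"PK\x03\x04":
        raise ValueError("not a ZIP local file header")
    return {
        "version_needed": version,
        "flags": flags,
        "method": method,                       # 8 means deflate
        "bit11_utf8": bool(flags & 0x0800),     # language encoding flag
        "deflate_hint": DEFLATE_HINTS[(flags >> 1) & 0b11],
    }


sdk = describe_local_header(bytes.fromhex("50 4B 03 04 14 00 00 08 08 00"))
excel = describe_local_header(bytes.fromhex("50 4B 03 04 14 00 06 00 08 00"))
```

Here `sdk["flags"]` is `0x0800` (only bit 11 set), while `excel["flags"]` is `0x0006` (bits 1 and 2 set, bit 11 clear), which matches the description above.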
For reference, see:
https://standards.iso.org/ittf/PubliclyAvailableStandards/c077818_ISO_IEC_29500-2_2021(E).zip
http://www.pkware.com/documents/APPNOTE/APPNOTE_6.2.0.txt
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Give the same correct result for .NET Core as for .NET Framework, so that the OpenXML SDK is compliant with the definitions of ISO/IEC 29500-2. The CompressionOption chosen in the .NET code should be correctly reflected in the "general purpose bit flag".
Desktop (please complete the following information):