Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open XML SDK ZIP local file header is not compliant to ISO/IEC 29500-2. Incorrect "general purpose bit flag" settings #1443

Open
maedula opened this issue Jun 6, 2023 · 21 comments
Assignees
Labels

Comments

@maedula
Copy link

maedula commented Jun 6, 2023

Describe the bug
The currently created XLSX files are not compliant to OpenXML specification ISO/IEC 29500-2. The reason is that in reference to the ZIP Appnote (ZIP File Format Specification Version 6.2.0) and the listed "Requirements on package implementers" in chapter B.4 are not met. What I would like to challenge in particular is the set "General purpose bit flag" of the ZIP local file header. The currently used bit flag is utilizing bit 11, which is unsupported.

Following references are hex representations of the XLSX file...
OpenXML SDK used incorrect local file header: 50 4B 03 04 14 00 00 08 08 00
Microsoft Excel used correct file header: 50 4B 03 04 14 00 06 00 08 00

So Microsoft Excel by defaults seems to use "super fast compression" method by default with bit1 and bit2 toggled.

When using .net framework as a target, the headers are written correctly and thus compliant to ISO/IEC 29500-2.

Check out for your reference
https://standards.iso.org/ittf/PubliclyAvailableStandards/c077818_ISO_IEC_29500-2_2021(E).zip
http://www.pkware.com/documents/APPNOTE/APPNOTE_6.2.0.txt

To Reproduce

using System.IO.Packaging;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

CreateSpreadsheetWorkbook("C:\\Temp\\test.xlsx");

void CreateSpreadsheetWorkbook(string filepath)
{
	// Create a spreadsheet document by supplying the filepath.
	// By default, AutoSave = true, Editable = true, and Type = xlsx.
	SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.
		Create(filepath, SpreadsheetDocumentType.Workbook);
	spreadsheetDocument.CompressionOption = CompressionOption.SuperFast;

	// Add a WorkbookPart to the document.
	WorkbookPart workbookpart = spreadsheetDocument.AddWorkbookPart();
	workbookpart.Workbook = new Workbook();

	// Add a WorksheetPart to the WorkbookPart.
	WorksheetPart worksheetPart = workbookpart.AddNewPart<WorksheetPart>();
	worksheetPart.Worksheet = new Worksheet(new SheetData());

	// Add Sheets to the Workbook.
	Sheets sheets = spreadsheetDocument.WorkbookPart!.Workbook.
		AppendChild<Sheets>(new Sheets());

	// Append a new worksheet and associate it with the workbook.
	Sheet sheet = new Sheet() { Id = spreadsheetDocument.WorkbookPart.
		GetIdOfPart(worksheetPart), SheetId = 1, Name = "mySheet" };
	sheets.Append(sheet);

	workbookpart.Workbook.Save();

	// Close the document.
	spreadsheetDocument.Dispose();
}

Steps to reproduce the behavior:

  1. Create an empty excel spreadsheet and store it in XLSX
  2. Check with your favorite HEX file viewer the corresponding local file header
  3. See that the general purpose flag is set to a unsupported "general purpose bit flag", where bit 11 is toggled

Expected behavior
Give the same correct result for .net Core like for .net framework and thus OpenXML SDK will be compliant to the definitions of ISO_IEC_29500-2. The chosen CompressionOption in the .net code should be correctly considered for the "general purpose bit flag".

Desktop (please complete the following information):

  • OS: Windows
  • Office version 16.0.1398.20008
  • .NET Target: .NET Core 6
  • DocumentFormat.OpenXml Version: 2.20.0)
@maedula maedula changed the title Bug with Open XML SDK Open XML SDK ZIP local file header is not compliant to ISO/IEC 29500-2. Incorrect "general purpose bit flag" settings Jun 6, 2023
@tomjebo tomjebo self-assigned this Jun 6, 2023
@tomjebo
Copy link
Collaborator

tomjebo commented Jun 7, 2023

Hi @maedula,

I'm looking into this. I've confirmed your findings and am going to consult with the .Net team to see if core can do the same.

@tomjebo
Copy link
Collaborator

tomjebo commented Jun 15, 2023

@maedula, I've discussed with the dotnet team, this is still under investigation but I've filed dotnet/runtime#87658 to track it with them. It looks like System.IO.Packaging.ZipArchive code is doing this unilaterally and I don't see a way to influence it from the SDK code. I'll follow up here when we have more information.

@maedula
Copy link
Author

maedula commented Jun 16, 2023

@tomjebo Appreciate your swift investigation on this issue. Thanks a lot!

@tomjebo
Copy link
Collaborator

tomjebo commented Jun 21, 2023

@maedula thanks for raising the issue, it looks like dotnet/runtime#87658 has been fixed and merged. If you're ok, I'll close this issue.

@maedula
Copy link
Author

maedula commented Jun 22, 2023

@tomjebo thanks a lot to you and the team for the identifying the root cause and having this fixed is quickly. Stupid question though. The runtime fix is having set a milestone for 8.0.0. That´s confusing me. Will it still go into next upcoming version 6.0.19? Once this is clarified we can go ahead and close this ticket. Once a new runtime version is available I reverify again on my end.

@carlossanlop
Copy link
Member

@maedula Yes, it's going to be available in .NET 8 Preview 6 initially (that version will be shipped on Jul 11th).

If you get a chance to test the fix in that version and confirm it works well, that would be awesome and extremely helpful. We want to wait for the fix to bake in preview first, then if we confirm there are no unexpected issues, we can talk about considering it as a candidate for backporting it into the next servicing releases of 7.0 and 6.0. cc @ericstj

@ericstj
Copy link
Member

ericstj commented Jun 23, 2023

You can pull nightly builds from the NuGet feed
https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet8/nuget/v3/index.json.
EG:
<PackageReference Include="System.IO.Packaging" Version="8.0.0-preview.6.23322.4" />

@maedula
Copy link
Author

maedula commented Jun 23, 2023

@ericstj and @tomjebo Thanks for the NuGet feed including the fix. I have tested it now. The good news is that bit 11 is no longer toggled and thus being strictly speaking again compliant to ISO/IEC 29500-2. The bad news is that bit 1 and 2 are zero regardless of the chosen SpreadsheetDocument.CompressOption. This is different to the usage of .net framework, which correctly sets bit 1 and 2 according to the CompressionOption. My biggest argument here for an investigation was the compliance to 29500-2 but with that the CompressionOption is not considered in the "general purpose bit" field might be considered a cosmetic issue?

Default "general purpose bit" field used by Excel is 06 00 (which indicates CompressionOption.SuperFast) while the fix is now indicating 00 00 (which indicates CompressionOption.Normal). File sizes differ for different CompressionOption settings. So a different compression is being applied according to the option. But the "general purpose bit" field will always indicate the same 00 00.

If it is up to me, I still consider this to not be really correct yet after all.

@ericstj
Copy link
Member

ericstj commented Jun 23, 2023

Sounds like a different issue.

Here is the meaning of those bits for deflate (as is used here): https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE_6.2.0.txt

          (For Methods 8 and 9 - Deflating)
          Bit 2  Bit 1
            0      0    Normal (-en) compression option was used.
            0      1    Maximum (-exx/-ex) compression option was used.
            1      0    Fast (-ef) compression option was used.
            1      1    Super Fast (-es) compression option was used.

          Note:  Bits 1 and 2 are undefined if the compression
                 method is any other.

So they just reflect whatever sort of compression settings were used when producing the ZIP. Those settings might make sense for some zip implementations, but not for others.

In the NETFx WindowsBase compression implementation this was set here:
https://referencesource.microsoft.com/#WindowsBase/Base/MS/Internal/IO/Zip/ZipIOLocalFileHeader.cs,308
Based on the following mapping:
https://referencesource.microsoft.com/#WindowsBase/Base/System/IO/Packaging/ZipPackage.cs,490

These settings don't directly map to CompressionLevel's exposed in System.IO.Compression's zip implementation:
https://github.com/dotnet/runtime/blob/7d2a4d298a3d9bed50089eb79f67e5e8f3b0b190/src/libraries/System.IO.Packaging/src/System/IO/Packaging/ZipPackage.cs#L368

So we can't really make the ZipArchiveEntry write those bits since it's zip implementation doesn't have the same options. In fact, the underlying deflate algorithm has even more options for compression than we expose or these bits represent. We could try to reverse some mapping based on CompressionLevel, but as you can see from the current mapping in ZipPackage we wouldn't be able to represent all options.

The bit 11 problem was a correctness issue - making it impossible to represent a non-UTF8 archive - bit 1/2 seem to be informational. Can you point to a place where this is causing a problem with a zip client, or to a place in the spec that requires these bits to be set? Seems to me they should be optional. @carlossanlop what's your take?

EDIT: Further data to suggest that these bits aren't meaningful. On .NETFramework the CompressionOption was completely ignored during compression. See
https://referencesource.microsoft.com/#WindowsBase/Base/MS/Internal/IO/Packaging/CompressStream.cs,710
https://referencesource.microsoft.com/#WindowsBase/Base/MS/Internal/IO/Packaging/CompressStream.cs,723
https://referencesource.microsoft.com/#WindowsBase/Base/MS/Internal/IO/Packaging/DeflateEmulationStream.cs,66
All use DeflateStream constructor without CompressionLevel is used, resulting in the default CompressionLevel.Optimal.

@maedula
Copy link
Author

maedula commented Jun 26, 2023

@ericstj I greatly appreciate your input to this topic and insight on the root-causes. If it is up to me I would be more than happy to at least gain the limited mapping set, which .net framework offered compared to no option of influencing bit 1 and 2 at all. Especially for bit 1 and 2 being enabled considering this is the Microsoft Office Excel default usage.

The background to all of this is that I´m forced by a customer project to reuse the very same zip header settings as Excel does. They are using a very strict IT security application affecting their PC install base (utilized on thousands of PCs), dropping XLSX files as potentially malicious in case there is a deviation in the ZIP header compared to Excel. So it is a few bits but a big impact on my application. The app is supposed to create mandatory XLSX reports, which are being rendered unusable for this customer project and thus threatening overall project success.

Downgrading from .net core to .net framework just does not sound like a reasonable approach to me up to now. Pushing back on the meaningfulness of IT security app behavior was not successful either up to now.

I appreciate your considerations on this matter and looking forward to your decision on how to proceed. Thanks a lot!

@maedula
Copy link
Author

maedula commented Jun 26, 2023

EDIT: Further data to suggest that these bits aren't meaningful. On .NETFramework the CompressionOption was completely > ignored during compression. See
https://referencesource.microsoft.com/#WindowsBase/Base/MS/Internal/IO/Packaging/CompressStream.cs,710
https://referencesource.microsoft.com/#WindowsBase/Base/MS/Internal/IO/Packaging/CompressStream.cs,723
https://referencesource.microsoft.com/#WindowsBase/Base/MS/Internal/IO/Packaging/DeflateEmulationStream.cs,66
All use DeflateStream constructor without CompressionLevel is used, resulting in the default CompressionLevel.Optimal.

Your findings here somehow deviate from my code tests with .net framework and the sample code in my original posting. Changing the CompressionOption IS changing the general purpose bit field for .net framework. Maybe the actual compression algo is not changed but it is changing the bit field, which is at least vital to me here. I did not change how files sizes differ though...

@ericstj
Copy link
Member

ericstj commented Jun 26, 2023

Your findings here somehow deviate from my code tests with .net framework and the sample code in my original posting... Maybe the actual compression algo is not changed but it is changing the bit fiel

I didn't mean to disagree. I was pointing out that only thing that changes when you specify CompressionOption was the bits in the header (as you noticed) -- it didn't affect the behavior of the compression algorithm at all. I was pointing this out because it showed that these bits were pretty useless as they only recorded a parameter passed to the API and not any information needed to consume the archive.

I appreciate your considerations on this matter and looking forward to your decision on how to proceed

I recommend that you open an issue in dotnet/runtime that describes what you'd like to see. My comments so far haven't been so much of a "decision" but an analysis of the state of the implementation, and my opinion about the importance of these bits and a suggestion for how to proceed.

Since it seems like you don't need all possible values of bits 2,1 to be set but only 11 -> SuperFast to be set, to match what Excel did, maybe you could ask for ZipArchiveEntry to set these as follows:
CompressionLevel.Fastest -> 11 for "SuperFast" -- would cover CompressionOption.Fast and CompressionOption.SuperFast per https://github.com/dotnet/runtime/blob/bfa9b97712a3c62fba189c651a25f6d91ff3499f/src/libraries/System.IO.Packaging/src/System/IO/Packaging/ZipPackage.cs#L372
CompressionLevel.Optimal -> 00 for "Normal" -- would cover CompressionOption.Normal
CompressionLevel.SmallestSize -> 01 for "Maximum" -- would cover CompressionOption.Maximum if we made the change to our switch to use the new SmallestSize Enum value on frameworks where it is supported.

@maedula
Copy link
Author

maedula commented Jul 12, 2023

@ericstj sorry for the delay and thanks for your comments here. Even though they are little confusing to me. I agree the report here is about a spec incompliance and this has been resolved. So from this perspective this ticket can be closed.

But what´s not clear to me is how I can go forward with my request of bits 1 and 2 to be mappable to the CompressionOption to have the same behavior as back with .net framework. I´m utilizing as an API OpenXML SDK and not ZipPackage. I have not checked all your code references and I´m not sure if out of the box without changes to OpenXML SDK the ZipPackage would be able to take care of a suitable mapping of bits 1 and 2. Can you confirm that dotnet/runtime would have to implement a change only for the bit mappings only?

I further understand you suggest me to open a new issue with dotnet/runtime and explain this situation based on what we have summarized here?

Thanks a lot for your clarification in advance!

@ericstj
Copy link
Member

ericstj commented Jul 12, 2023

Yes - open an issue in dotnet/runtime suggesting that the ZipArchive implementation either set bits 1 & 2 as some mapping from CompressionLevel, or expose and API so that callers can specify the value for these bits - connecting the ask to the OpenXML document which states that these bits should be set.

@maedula
Copy link
Author

maedula commented Jul 13, 2023

@ericstj issue is created in dotnet/runtime. Not sure how to proceed here for this issue. If it is up to me, we can get this closed as resolved

@maedula
Copy link
Author

maedula commented Sep 7, 2023

With my new enhancement request raised for dinner/runtime and realistically not get any priority for this anytime soon, let's close this as bit11 no longer toggled and thus compliant again

@edwardneal
Copy link

The upstream bug is now fixed in commit 6cc6c66, and should hopefully be available from .NET 9 preview 5, so setting a package part's CompressionOption to Fast or SuperFast will result in general-purpose bits 1 & 2 being set. It's also being backported to .NET 6 and 8.

@carlossanlop
Copy link
Member

Hi all - We merged this fix in runtime for main dotnet/runtime#98278 . The fix would introduce a breaking change as there would not be a clean roundtrip between .NET Framework and .NET 9+. We are ok introducing this in .NET 9 but it would not meet the bar for backporting them to .NET 8 and/or .NET 6, which are LTS releases.

@maedula (or anyone reading this), can you please help describe the severity of the impact of this bug? Is it blocking some customer scenarios? Please feel free to contact me via email directly if you prefer.

@twsouthwick
Copy link
Member

The fix would introduce a breaking change as there would not be a clean roundtrip between .NET Framework and .NET 9+.

Does this mean that something saved on .NET 9+ couldn't be opened in previous versions? Or does it mean that if it is resaved in < .NET 9 it will not be bitwise the same? May main goal is to understand if as the OpenXML SDK we should consider it a breaking change for consumers of our package.

@twsouthwick
Copy link
Member

I'm going to reopen so we can make sure to understand the impact of taking the change

@twsouthwick twsouthwick reopened this Jun 4, 2024
@maedula
Copy link
Author

maedula commented Jun 4, 2024

Hi all - We merged this fix in runtime for main dotnet/runtime#98278 . The fix would introduce a breaking change as there would not be a clean roundtrip between .NET Framework and .NET 9+. We are ok introducing this in .NET 9 but it would not meet the bar for backporting them to .NET 8 and/or .NET 6, which are LTS releases.

@maedula (or anyone reading this), can you please help describe the severity of the impact of this bug? Is it blocking some customer scenarios? Please feel free to contact me via email directly if you prefer.

@carlossanlop we need to distinguish between two different issues. The one issue described here with bit 11 (UTF-8 usage) being toggled but this way not being compliant to ISO/IEC 29500-2 anymore. I guess the initial description is pretty detailed here. If you have any specific questions, I´m happy to get back to you asap. I consider this as a rather severe issue. Not sure how the breaking change can be of more concern than not to be spec compliant.

The other issue is that compared to .net framework bit 1 and 2 any CompressionOption is not having any impact on these bits is rather cosmetic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants