Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Html Writer Comments - Breaking Change #3957

Merged
merged 4 commits into from
Apr 14, 2024

Conversation

oleibman
Copy link
Collaborator

Fix #3954. In response to a Security Incident, a package was added to the project to sanitize the Html for comments attached to a cell. We have tried at least 2 different packages for this purpose, and users have raised legitimate concerns about both.

I believe that adding a sanitizer package, although it addresses the problem, was overkill. Cell comments are a RichText object, and Html Writer already handled RichText cell values. Values did not figure in the Incident, and they surely would have done so if they were a problem because it is a lot easier to set up test cases for values than for comments. RichText values were not a problem because they were sanitized with htmlspecialchars; if comments were to use the same code that RichText was already using, it would likewise be safely sanitized. As an added bonus, the existing code for comments only uses the plaintext value, but the values code would allow the comments to be styled, just as they are for Xlsx.

This is a breaking change - I don't think it will affect a lot of users, but there may be some. It is worth noting that the comment block before function writeComment has a link explaining what is being done. That link mentions only styling elements, not other possibilities like hyperlinks. At any rate, if people are putting styling tags (e.g. <b>) in their comments for this purpose, those will no longer work; the styling needs to be applied to the RichText elements. That will keep the user code the same regardless of the format of the intended output, which is certainly a good thing, but it is a break. If people are trying to insert non-styling tage (e.g. <a>), those will no longer work, just as there is no way to do it for Xlsx (although Excel itself may convert the raw text to a hyperlink, but that's out of scope, at least for now). I also note that, even with the current code, I haven't yet found a way to keep the Html comment in place long enough to actually click on any hyperlinks.

The main package being dropped brings several other packages along with it, so the project as a whole will have a slightly lighter footprint than before.

The existing XssVulnerability test cases are all preserved. Although the final result of the sanitizing changes, it can easily be seen that the results are all harmless.

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests

Checklist:

  • Changes are covered by unit tests
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
  • Documentation is updated as necessary

Why this change is needed?

Provide an explanation of why this change is needed, with links to any Issues (if appropriate).
If this is a bugfix or a new feature, and there are no existing Issues, then please also create an issue that will make it easier to track progress with this PR.

Fix PHPOffice#3954. In response to a Security Incident, a package was added to the project to sanitize the Html for comments attached to a cell. We have tried at least 2 different packages for this purpose, and users have raised legitimate concerns about both.

I believe that adding a sanitizer package, although it addresses the problem, was overkill. Cell comments are a RichText object, and Html Writer already handled RichText cell *values*. Values did not figure in the Incident, and they surely would have done so if they were a problem because it is a lot easier to set up test cases for values than for comments. RichText values were not a problem because they were sanitized with `htmlspecialchars`; if comments were to use the same code that RichText was already using, it would likewise be safely sanitized. As an added bonus, the existing code for comments only uses the plaintext value, but the values code would allow the comments to be styled, just as they are for Xlsx.

This is a breaking change - I don't think it will affect a lot of users, but there may be some. It is worth noting that the comment block before function `writeComment` has a link explaining what is being done. That link mentions only styling elements, not other possibilities like hyperlinks. At any rate, if people are putting styling tags (e.g. `<b>`) in their comments for this purpose, those will no longer work; the styling needs to be applied to the RichText elements. That will keep the user code the same regardless of the format of the intended output, which is certainly a good thing, but it is a break. If people are trying to insert non-styling tage (e.g. `<a>`), those will no longer work, just as there is no way to do it for Xlsx (although Excel itself may convert the raw text to a hyperlink, but that's out of scope, at least for now). I also note that, even with the current code, I haven't yet found a way to keep the Html comment in place long enough to actually click on any hyperlinks.

The main package being dropped brings several other packages along with it, so the project as a whole will have a slightly lighter footprint than before.

The existing `XssVulnerability` test cases are all preserved. Although the final result of the sanitizing changes, it can easily be seen that the results are all harmless.
@oleibman oleibman marked this pull request as draft March 20, 2024 17:59
@oleibman
Copy link
Collaborator Author

@MarkBaker @PowerKiKi Due to the security-related nature of the original problem (and because it's a breaking change), I do not want to move ahead with this without giving you the opportunity to review it. So I will keep it in draft status for a while - please let me know what you think. Also, any idea why the Doc Build failed? This is the first problem I've seen with that since it was introduced, and there aren't even any doc changes in the PR. I assume it's transient, but, just in case you see somthing ...

Provide a means for potentially affected users to test if new logic is in effect so that they can update code beforehand.
@oleibman
Copy link
Collaborator Author

Doc Build succeeded this time - as suspected, whatever the error was was transient.

@oleibman oleibman marked this pull request as ready for review March 25, 2024 18:55
@oleibman
Copy link
Collaborator Author

I have taken the request out of draft status. I will be away for a while. I will not merge it before April 18.

@oleibman oleibman added this pull request to the merge queue Apr 14, 2024
Merged via the queue into PHPOffice:master with commit f7cf378 Apr 14, 2024
14 checks passed
@oleibman oleibman deleted the htmlcomments branch April 15, 2024 01:49
@PowerKiKi
Copy link
Member

Thanks for that, and sorry I didn't gave a feedback earlier, even so I did had a look and it seems like a reasonable things to do.

@cmanley
Copy link

cmanley commented May 9, 2024

@oleibman thanks for your commit. I'm glad you removed those voku dependencies. It's been wreaking havoc for me. I hope your commit ends up in a release soon. For now I must decide between falling back to the latest 1.* version or using your commit hash in composer.json.
B.t.w. phpspreadsheet also calls ini_set in 1 place (Csv.php) without first storing the original value and putting it back, but in that case it's a minor thing that can be improved later.

@PowerKiKi
Copy link
Member

Don't downgrade, I'll release a new version shortly.

@PowerKiKi
Copy link
Member

https://github.com/PHPOffice/PhpSpreadsheet/releases/tag/2.1.0

@oleibman
Copy link
Collaborator Author

@cmanley I think Csv Reader does clean up after itself. It can change a setting in setAutoDetect. But that method is called only at the beginning of loadStringOrFile, and the old value is restored at the end of that method, and there are no premature returns in between. If you are still concerned about it, you can call setTestAutoDetect(false) before attempting to load a Csv; this will eliminate the ini_set altogether.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Make voku packages optional
3 participants