Detect truncated utf-8 characters at the end of content as still representing utf-8 #19773
…esenting utf-8
Our character detection algorithm can incorrectly detect utf-8 as iso-8859-x if there is a truncated character at the end of a partially read file. This PR changes the detection algorithm to treat truncated utf8 characters at the end of the buffer as still representing utf-8.
Fix go-gitea#19743
Signed-off-by: Andrew Thornton <art27@cantab.net>
Strangely, I'm having some difficulty creating a test that replicates this issue from within the charset module, and I'm not certain why I can't replicate it. I've since been able to add a testcase.
See my comment in #19743; there is a test case there.
Signed-off-by: Andrew Thornton <art27@cantab.net>
Codecov Report
@@ Coverage Diff @@
## main #19773 +/- ##
=======================================
Coverage ? 47.29%
=======================================
Files ? 957
Lines ? 133317
Branches ? 0
=======================================
Hits ? 63058
Misses ? 62599
Partials ? 7660
It's better to include this test case.
I've already included a specific test case but I can add that if you really want it.
Signed-off-by: Andrew Thornton <art27@cantab.net>
…ection' into fix-19743-improve-encoding-detection
…esenting utf-8 (#19773) (#19774)
Backport #19773
Our character detection algorithm can incorrectly detect utf-8 as iso-8859-x if there is a truncated character at the end of a partially read file. This PR changes the detection algorithm to treat truncated utf8 characters at the end of the buffer as still representing utf-8.
Fix #19743
Signed-off-by: Andrew Thornton <art27@cantab.net>
* giteaofficial/main:
  Prevent NPE when cache service is disabled (go-gitea#19703)
  Detect truncated utf-8 characters at the end of content as still representing utf-8 (go-gitea#19773)
  Add silentcodeg to MAINTAINERS (go-gitea#19771)
  Allows repo search to match against "owner/repo" pattern strings (go-gitea#19754)
  Update JS dependencies (go-gitea#19767)
  Nuke the incorrect permission report on /api/v1/notifications (go-gitea#19761)
## [1.16.9](https://github.com/go-gitea/gitea/releases/tag/1.16.9) - 2022-06-20

* BUGFIXES
  * Fix permission check for delete tag (go-gitea#19985) (go-gitea#20001)
  * Only log non ErrNotExist errors in git.GetNote (go-gitea#19884) (go-gitea#19905)
  * Use exact search instead of fuzzy search for branch filter dropdown (go-gitea#19885) (go-gitea#19893)
  * Set Setpgid on child git processes (go-gitea#19865) (go-gitea#19881)
  * Import git from alpine 3.16 repository as 2.30.4 is needed for `safe.directory = '*'` to work but alpine 3.13 has 2.30.3 (go-gitea#19876)
  * Ensure responses are context.ResponseWriters (go-gitea#19843) (go-gitea#19859)
  * Fix count bug (go-gitea#19850)
  * Fix raw endpoint PDF file headers (go-gitea#19825) (go-gitea#19826)
  * Make WIP prefixes case insensitive, e.g. allow `Draft` as a WIP prefix (go-gitea#19780) (go-gitea#19811)
  * Fix NotificationUnreadCount (go-gitea#19802)
  * Prevent NPE when cache service is disabled (go-gitea#19703) (go-gitea#19783)
  * Detect truncated utf-8 characters at the end of content as still representing utf-8 (go-gitea#19773) (go-gitea#19774)
  * Fix doctor pq: syntax error at or near "." quote user table name (go-gitea#19765) (go-gitea#19770)
  * Fix bug (go-gitea#19757)

Signed-off-by: Andrew Thornton <art27@cantab.net>
Our character detection algorithm can incorrectly detect utf-8 as iso-8859-x
if there is a truncated character at the end of a partially read file.
This PR changes the detection algorithm to treat truncated utf8 characters at the end of the
buffer as still representing utf-8.
Fix #19743
Signed-off-by: Andrew Thornton <art27@cantab.net>
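
For readers unfamiliar with the failure mode, here is a minimal Go sketch of the idea, not the code from this PR. It assumes the detector only sees a prefix of the file, so a multi-byte character can be cut in half at the end of the buffer. The helper names `trimTruncatedTail` and `looksLikeUTF8` are hypothetical; only the standard `unicode/utf8` package is used.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// trimTruncatedTail is a hypothetical helper (not Gitea's API). It drops an
// incomplete multi-byte sequence from the end of b, if there is one.
// A UTF-8 sequence is at most 4 bytes long, so an incomplete one at the end
// of the buffer is at most 3 bytes long.
func trimTruncatedTail(b []byte) []byte {
	for i := 1; i <= 3 && i <= len(b); i++ {
		start := len(b) - i
		if utf8.RuneStart(b[start]) {
			// b[start] begins the last character in the buffer. FullRune
			// reports false only when that character is cut short.
			if !utf8.FullRune(b[start:]) {
				return b[:start]
			}
			return b
		}
	}
	return b
}

// looksLikeUTF8 is a hypothetical check: valid UTF-8, ignoring a character
// that was truncated at the very end of the buffer.
func looksLikeUTF8(b []byte) bool {
	return utf8.Valid(trimTruncatedTail(b))
}

func main() {
	content := []byte("10 €")             // "€" is the 3 bytes e2 82 ac
	truncated := content[:len(content)-1] // simulate a partial read: ends in e2 82

	fmt.Println(utf8.Valid(truncated))    // false: a plain validity check rejects it
	fmt.Println(looksLikeUTF8(truncated)) // true: the cut-off tail is ignored

	// ISO-8859-1 bytes for "café au": 0xe9 in the middle is not valid UTF-8.
	latin1 := []byte{'c', 'a', 'f', 0xe9, ' ', 'a', 'u'}
	fmt.Println(looksLikeUTF8(latin1)) // false
}
```

One limitation of this sketch: a single-byte-encoded buffer whose last byte happens to look like a UTF-8 lead byte (for example ISO-8859-1 text ending in `é`) has that byte trimmed as well, so the remainder decides the outcome.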