change unzip reading method for byte to byte instead of chars #11
This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.
Thank you for this pull request! Please check this document for how the Jenkins project handles pull requests.
🐝
try (InputStream is = zip.getInputStream(entry); ByteArrayOutputStream output = new ByteArrayOutputStream()) {
    IOUtils.copyLarge(is, output);
    strMap.put(entry.getName(), new String(output.toByteArray(), Charset.defaultCharset()));
I think Charset.defaultCharset() can cause problems in some setups; ZipFile uses UTF-8 internally.
Do you think it would be better to hard-code UTF-8? I have never worked with ZipFile before, so I used the same approach as the previous code. What I read in the Javadoc is that the UTF-8 charset is used to decode the entry names and comments,
so it seems the content itself would be in its original encoding, right?
This is the content of a file. It can be in any charset under the sun, or even binary. Of note, the manifest will be in UTF-8.
Perhaps this step needs to take an optional String charset parameter.
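To make the charset suggestion concrete, here is a minimal sketch of how an optional charset name could be resolved before decoding an entry's bytes. The class `CharsetOption` and its methods are hypothetical names used only for illustration; they are not part of the plugin, and falling back to the platform default is an assumption about the desired behaviour.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not the plugin's code: resolves an optional charset name.
class CharsetOption {
    // Returns the named charset, or the platform default when no name is given.
    // Charset.forName throws an unchecked exception for unknown or illegal names.
    static Charset resolve(String charsetName) {
        if (charsetName == null || charsetName.trim().isEmpty()) {
            return Charset.defaultCharset();
        }
        return Charset.forName(charsetName.trim());
    }

    // Decodes raw entry bytes with the resolved charset.
    static String decode(byte[] content, String charsetName) {
        return new String(content, resolve(charsetName));
    }
}
```

The same bytes decode differently depending on the charset: the two UTF-8 bytes of "é" decoded as ISO-8859-1 come out as two characters, which is the kind of mismatch an explicit parameter would let the user avoid.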
🐝
@reviewbybees done
This pull request has completed our internal processes and we now respectfully request the maintainers of this repository to consider our proposal contained within this pull request for merging.
🐝 👍 Just waiting for Jenkins to finish the check before merging.
change unzip reading method for byte to byte instead of chars
In the previous implementation, the content of a file was read with the BufferedReader.readLine method. This method strips line terminators, so we appended them again at the end of each line. The problem is that if a file ends with a line containing only a line terminator, that last line was skipped. For example:
And happened with:
In this last case, the carriage return was trimmed, so the content read was not exactly the same as the original file.
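The loss of information can be reproduced with a small sketch. This is a hypothetical reconstruction of a readLine-based reader, not the plugin's previous code; the class name `ReadLineRoundTrip` and the choice of re-appending "\n" after every line are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical reconstruction of a line-based reader, for illustration only.
class ReadLineRoundTrip {
    // Reads the content line by line and re-appends "\n" after each line.
    // readLine() drops "\n", "\r" and "\r\n" alike, so the original
    // terminators cannot be recovered from the returned lines.
    static String readViaLines(String content) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

With this approach an input with "\r\n" terminators comes back with "\n" terminators, and an input without a trailing terminator gains one, so the result is not guaranteed to match the original content.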
To avoid this, the content is now read byte by byte. Since the whole file content ends up in memory anyway (the modified method returns a Map of <filename, content> entries), we can read all of it into a ByteArrayOutputStream. This could be implemented in other ways, but since the String with all the content will be present at the end, I decided to implement it in the most compact way possible.
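The byte-level approach can be sketched with the JDK alone. The change itself uses IOUtils.copyLarge from Commons IO; the manual buffer loop below is a stand-in for that call, and the class name `ByteCopyRead` and the explicit charset argument are assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper, for illustration only.
class ByteCopyRead {
    // Copies every byte of the stream into memory, then decodes once.
    // No line splitting happens, so terminators are preserved as written.
    static String readAll(InputStream in, Charset charset) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the bytes are decoded in a single step, text that is valid in the chosen charset round-trips unchanged, including "\r\n" terminators and a missing or present final newline.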
@reviewbybees