change unzip reading method for byte to byte instead of chars #11

Merged
merged 2 commits into from
Feb 18, 2016


lordofthejars
Contributor

In the previous implementation, the content of a file was read with the BufferedReader.readLine method. This method strips line terminators, so we appended them back at the end of each line. The problem is that if a file ends with a line containing only a line terminator, that last line was skipped.

For example:

Hi\n
\n

It also happened with:

Hi\n

In this last case the trailing line terminator was trimmed, so the content read was not exactly the same as the original file.
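The information loss described above can be illustrated with a minimal sketch. It does not reproduce the previous implementation of this plugin; the class name ReadLineDemo and the helper lines are hypothetical and exist only for illustration. It shows that BufferedReader.readLine returns the same lines for "Hi" and "Hi\n", so a reader built on it cannot tell whether the original file ended with a line terminator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the plugin.
final class ReadLineDemo {

    // Collects every line that readLine() returns for the given content.
    static List<String> lines(String content) {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line); // the line terminator is already stripped here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Both inputs produce the single line "Hi", so the trailing "\n" is lost.
        System.out.println(lines("Hi").equals(lines("Hi\n")));
    }
}
```

Any strategy for re-adding terminators after readLine has to guess whether the last line had one, which is why reading raw bytes avoids the problem.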

To avoid this, "byte to byte" reading is used. Note that since the whole file content ends up in memory anyway (the modified method returns a Map of <filename, content> tuples), we can read all the contents into a ByteArrayOutputStream.

This could be implemented in other ways, but since the String with all the content will be present in the end, I decided to implement it in the most compact way possible.
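As a rough, self-contained sketch of the byte-based approach (not the exact code in this pull request): the class ZipByteReadSketch, the helper roundTrip, and the entry name hello.txt are hypothetical. The sketch uses a plain JDK read loop in place of IOUtils.copyLarge so it has no external dependencies, and it assumes UTF-8 content purely for the demonstration.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch class, not part of the plugin.
final class ZipByteReadSketch {

    // Reads every file entry fully into memory by copying raw bytes,
    // so line terminators are preserved exactly as stored.
    static Map<String, String> readEntries(File zipPath) throws IOException {
        Map<String, String> contents = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                try (InputStream in = zip.getInputStream(entry);
                     ByteArrayOutputStream out = new ByteArrayOutputStream()) {
                    byte[] buffer = new byte[8192];
                    int read;
                    while ((read = in.read(buffer)) != -1) {
                        out.write(buffer, 0, read);
                    }
                    // UTF-8 is an assumption made for this sketch only.
                    contents.put(entry.getName(),
                            new String(out.toByteArray(), StandardCharsets.UTF_8));
                }
            }
        }
        return contents;
    }

    // Writes a one-entry zip to a temp file, reads it back, and returns the text.
    static String roundTrip(String text) {
        try {
            File zipPath = File.createTempFile("sketch", ".zip");
            try {
                try (ZipOutputStream zos =
                             new ZipOutputStream(Files.newOutputStream(zipPath.toPath()))) {
                    zos.putNextEntry(new ZipEntry("hello.txt"));
                    zos.write(text.getBytes(StandardCharsets.UTF_8));
                    zos.closeEntry();
                }
                return readEntries(zipPath).get("hello.txt");
            } finally {
                zipPath.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The trailing line terminator survives the round trip.
        System.out.println("Hi\n".equals(roundTrip("Hi\n")));
    }
}
```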

@reviewbybees

@ghost

ghost commented Feb 17, 2016

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

@jenkinsadmin
Member

Thank you for this pull request! Please check this document for how the Jenkins project handles pull requests.

@oleg-nenashev
Member

🐝


try (InputStream is = zip.getInputStream(entry); ByteArrayOutputStream output = new ByteArrayOutputStream()) {
IOUtils.copyLarge(is, output);
strMap.put(entry.getName(), new String(output.toByteArray(), Charset.defaultCharset()));


I think Charset.defaultCharset() can cause problems in some setups; ZipFile uses UTF-8 internally.

Contributor Author


Do you think it would be better to fix it to UTF-8? I have never worked with ZipFile, so I used the same approach that was used previously. What I read in the Javadoc is that "the UTF-8 charset is used to decode the entry names and comments", so it seems that the content itself would be in its original encoding, right?

Member


These are the contents of a file: they can be in any charset under the sun, or even binary. Of note, the manifest will be in UTF-8.
Perhaps this step needs to take an optional String charset parameter.
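One possible shape for such an optional parameter is sketched below. This is only an illustration of the suggestion, not code from the plugin: the class CharsetOptionSketch, the method decode, and the rule that a null or empty name falls back to the platform default are all assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch class, not part of the plugin.
final class CharsetOptionSketch {

    // Decodes raw entry bytes with the requested charset.
    // A null or empty name falls back to the platform default (an assumption).
    static String decode(byte[] bytes, String charsetName) {
        Charset charset = (charsetName == null || charsetName.isEmpty())
                ? Charset.defaultCharset()
                : Charset.forName(charsetName);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // U+00E9 encoded as ISO-8859-1 is the single byte 0xE9.
        byte[] bytes = "\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the matching charset recovers the original character.
        System.out.println(decode(bytes, "ISO-8859-1").equals("\u00e9"));

        // Decoding the same bytes as UTF-8 does not, because 0xE9 alone is malformed UTF-8.
        System.out.println(decode(bytes, "UTF-8").equals("\u00e9"));
    }
}
```

Charset.forName throws an unchecked exception for unknown or illegal names, so a real step would likely want to validate the parameter and report a clearer error.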

@jtnord
Member

jtnord commented Feb 18, 2016

🐝

@lordofthejars
Contributor Author

@reviewbybees done

@ghost

ghost commented Feb 18, 2016

This pull request has completed our internal processes and we now respectfully request the maintainers of this repository to consider our proposal contained within this pull request for merging.

@rsandell
Member

🐝 👍 Just waiting for Jenkins to finish the check before merging

rsandell added a commit that referenced this pull request Feb 18, 2016
change unzip reading method for byte to byte instead of chars
@rsandell rsandell merged commit 2010d7b into jenkinsci:master Feb 18, 2016