change unzip reading method for byte to byte instead of chars #11

lordofthejars · 2016-02-17T18:42:18Z

The previous implementation read the content of each file with the BufferedReader.readLine method. This method strips line terminators, so what we did was append them back at the end of each line. The problem is that if a file ends with a line that contains only a line terminator, that last line was skipped.

For example:

Hi\n
\n

The same happened with:

Hi\n

In this last case the trailing line terminator was trimmed, so the content read was not exactly the same as the original file.
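To make the limitation concrete, here is a minimal sketch of a readLine-based reader. It is not the plugin's previous code: the class and method names are hypothetical, and it assumes a "\n" is re-appended after every line. Even in that simple form, readLine cannot tell "Hi" from "Hi\n", or "\r\n" from "\n", so the original bytes cannot always be reconstructed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the plugin.
class ReadLineRoundTrip {

    // Reads the text line by line and re-appends "\n" after each line.
    static String viaReadLine(String original) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(original))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() has already dropped the terminator, whatever it was.
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print whether each input survives the round trip unchanged.
        for (String s : new String[] {"Hi", "Hi\n", "Hi\r\n"}) {
            System.out.println(viaReadLine(s).equals(s));
        }
    }
}
```

With this sketch only "Hi\n" survives unchanged: "Hi" gains a terminator and "Hi\r\n" is rewritten to "Hi\n".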

To avoid this, a "byte to byte" read is used. Note that since the whole file content ends up in memory anyway (the modified method returns a Map of <filename, content> pairs), we can read all the contents into a ByteArrayOutputStream.

This could be implemented in other ways, but since the String with all the content will be in memory in the end anyway, I decided to implement it in the most compact way possible.
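The approach described above can be sketched with only JDK classes. This is an illustration under assumptions, not the code in the pull request: the class and method names are hypothetical, the zip is built and read in memory with ZipOutputStream and ZipInputStream instead of ZipFile, and a plain read loop stands in for IOUtils.copyLarge.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the plugin.
class ZipContentReader {

    // Copies the stream into memory byte for byte, so line terminators are untouched.
    static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Returns a map of entry name to decoded content for every file entry in the zip.
    static Map<String, String> readEntries(byte[] zipBytes, Charset charset) {
        Map<String, String> contents = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // read() on ZipInputStream returns -1 at the end of the current entry.
                    contents.put(entry.getName(), new String(readAllBytes(zin), charset));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return contents;
    }

    // Test helper: builds a one-entry zip in memory.
    static byte[] zipOf(String name, String content, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(new ZipEntry(name));
            zout.write(content.getBytes(charset));
            zout.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String original = "Hi\n\n";
        byte[] zip = zipOf("hello.txt", original, StandardCharsets.UTF_8);
        String read = readEntries(zip, StandardCharsets.UTF_8).get("hello.txt");
        System.out.println(original.equals(read));
    }
}
```

Because the bytes are decoded only once, after the whole entry has been copied, trailing terminators such as the "\n\n" in the example are preserved.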

@reviewbybees

ghost · 2016-02-17T18:44:32Z

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

jenkinsadmin · 2016-02-17T19:03:00Z

Thank you for this pull request! Please check this document for how the Jenkins project handles pull requests.

oleg-nenashev · 2016-02-17T21:09:44Z

🐝

mslusarczyk · 2016-02-17T23:16:24Z

src/main/java/org/jenkinsci/plugins/pipeline/utility/steps/zip/UnZipStepExecution.java

+
+                            try (InputStream is = zip.getInputStream(entry); ByteArrayOutputStream output = new ByteArrayOutputStream()) {
+                                IOUtils.copyLarge(is, output);
+                                strMap.put(entry.getName(), new String(output.toByteArray(), Charset.defaultCharset()));


I think Charset.defaultCharset() can cause problems in some setups. ZipFile uses UTF-8 internally.

lordofthejars · 2016-02-18

Do you think it would be better to fix it to UTF-8? I have never worked with ZipFile, so I kept the same approach that was used previously. What I have read in the JavaDoc is that "the UTF-8 charset is used to decode the entry names and comments", so it seems that the content itself would be in its original encoding, right?

jtnord · 2016-02-18

These are the contents of a file. They can be in any charset under the sun, or even binary. Of note is that the Manifest will be in UTF-8.
Perhaps this step needs to take an optional String charset parameter.
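A minimal sketch of what such an optional charset parameter could look like follows. It is an assumption for illustration only: the class name, the method name and the fallback to the platform default are hypothetical and are not the step's actual API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the plugin.
class EntryDecoder {

    // Decodes entry bytes with the named charset, or the platform default when none is given.
    static String decode(byte[] content, String charsetName) {
        Charset charset = (charsetName == null || charsetName.isEmpty())
                ? Charset.defaultCharset()
                : Charset.forName(charsetName);
        return new String(content, charset);
    }

    public static void main(String[] args) {
        // "\u00e9" (e with acute accent) is the single byte 0xE9 in ISO-8859-1.
        byte[] latin1 = "\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(latin1, "ISO-8859-1").equals("\u00e9"));
        System.out.println(decode(latin1, "UTF-8").equals("\u00e9"));
    }
}
```

The same bytes decode correctly only when the caller names the charset the file was written in, which is the reason a fixed default (platform or UTF-8) cannot be right for every archive.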

jtnord · 2016-02-18T12:48:55Z

🐝

lordofthejars · 2016-02-18T13:56:36Z

@reviewbybees done

ghost · 2016-02-18T14:03:06Z

This pull request has completed our internal processes and we now respectfully request the maintainers of this repository to consider our proposal contained within this pull request for merging.

rsandell · 2016-02-18T14:57:54Z

🐝 👍 Just waiting for Jenkins to finish the check before merging

change unzip reading method for byte to byte instead of chars

662c54c

mslusarczyk reviewed Feb 17, 2016

add test for manifests created by Gradle.

76842c3

rsandell added a commit that referenced this pull request Feb 18, 2016

Merge pull request #11 from lordofthejars/master

2010d7b

rsandell merged commit 2010d7b into jenkinsci:master Feb 18, 2016
