
Add support for trailing text after the closing quote, and EOF without a final closing quote, for Excel compatibility. Fix a unit test and add a RAT exclude for the sample CSV file. #303

Closed
wants to merge 4 commits


DamjanJovanovic

Continued from PR #295.

The test was failing because the test itself was wrong, not because my patches were.

The test should match Excel's interpretation of the CSV file. Excel fuses lines 3 and 4 together because the last field on line 3 has no closing quote, so it continues onto the next line. There, Excel stops at the first quote it finds, unquotes the portion read so far, and then also appends everything after that quote up to the next comma; all of this becomes field 2 of line 3. The remaining fields on line 4 are read as further fields of line 3. Finally, because the last field has no terminating quote and the file ends with a newline, that last field also ends with a newline.

Once these corrections are made, the test passes.
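To make the record-fusing behaviour described above concrete, here is a minimal Java sketch of a quote-aware splitter that tolerates trailing data after a closing quote and accepts EOF inside an open quote. The class `LenientExcelSketch` and its `parse` method are hypothetical, written only for this illustration; they are not the Commons CSV lexer and do not use the `setTrailingData`/`setLenientEof` options. The sketch assumes `,` as the delimiter, `\n` as the record separator, and `""` as an escaped quote inside quoted fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration for this discussion; NOT the Commons CSV lexer.
// Assumes ',' delimiter, '\n' record separator, and "" as an escaped quote.
final class LenientExcelSketch {

    private LenientExcelSketch() {
    }

    /** Splits input into records of fields, tolerating trailing data and EOF inside quotes. */
    static List<List<String>> parse(String input) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean fieldStart = true;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (fieldStart && c == '"') {
                // Quoted portion: newlines inside it do not end the record.
                i = readQuoted(input, i + 1, field);
                fieldStart = false;
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
                fieldStart = true;
                i++;
            } else if (c == '\n') {
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
                fieldStart = true;
                i++;
            } else {
                // Unquoted text, or "trailing data" after a closing quote: append it to the field.
                field.append(c);
                fieldStart = false;
                i++;
            }
        }
        // Flush a last record that is not terminated by a newline (including an unclosed quote).
        if (!fieldStart || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    /**
     * Appends the quoted portion starting at index i (just after the opening quote) to field.
     * Returns the index just after the closing quote, or input.length() if EOF comes first
     * ("lenient EOF": keep what was read instead of throwing).
     */
    private static int readQuoted(String input, int i, StringBuilder field) {
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '"') {
                if (i + 1 < input.length() && input.charAt(i + 1) == '"') {
                    field.append('"'); // "" is an escaped quote
                    i += 2;
                } else {
                    return i + 1; // closing quote
                }
            } else {
                field.append(c);
                i++;
            }
        }
        return i;
    }
}
```

Fed the input `"x,y\n1,\"abc\ndef\"ghi,jkl,\"mno\n"`, the sketch returns two records, the second being `1`, `abc\ndefghi`, `jkl` and `mno\n`: the two physical lines after the header are fused, the text after the closing quote is appended to the field, and the unterminated last field keeps its trailing newline.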

Also add a RAT exclude for the sample file; it was missed in commit 1269c13, and its absence breaks the build.

@codecov-commenter

Codecov Report

Merging #303 (930f561) into master (b1bdb99) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master     #303   +/-   ##
=========================================
  Coverage     97.91%   97.91%           
  Complexity      553      553           
=========================================
  Files            11       11           
  Lines          1200     1200           
  Branches        206      206           
=========================================
  Hits           1175     1175           
  Misses           13       13           
  Partials         12       12           


@garydgregory
Member

"The test was failing because the test was wrong"

This is why I kept asking (twice) for file-based tests. Please add file-based tests that cover use cases for these two new options beyond the one you changed, so that we have actual files covering what Excel allows.

@garydgregory
Member

@DamjanJovanovic

  • This functionality is now in git master; you are credited in changes.xml.
  • The names of the new properties are different, though: setTrailingData(boolean) and setLenientEof(boolean).
  • Closing this PR.

TY!

asfgit pushed a commit that referenced this pull request Mar 12, 2024
Add support for trailing text after the closing quote, and EOF without a
final closing quote, for Excel compatibility. Fix a unit test and add a
RAT exclude for the sample CSV file.
williamhyun pushed a commit to apache/orc that referenced this pull request May 8, 2024
Bumps [org.apache.commons:commons-csv](https://github.com/apache/commons-csv) from 1.10.0 to 1.11.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/apache/commons-csv/blob/master/RELEASE-NOTES.txt">org.apache.commons:commons-csv's changelog</a>.</em></p>
<blockquote>
<p>Apache Commons CSV Version 1.11.0
Release Notes</p>
<p>This document contains the release notes for the 1.11.0 version of Apache Commons CSV.
Commons CSV reads and writes files in variations of the Comma Separated Value (CSV) format.</p>
<p>Commons CSV requires at least Java 8.</p>
<p>The Apache Commons CSV library provides a simple interface for reading and writing CSV files of various types.</p>
<p>Feature and bug fix release (Java 8 or above)</p>
<p>Changes in this version include:</p>
<h2>New Features</h2>
<ul>
<li>CSV-308:  [Javadoc] Add example to CSVFormat#setHeaderComments() <a href="https://redirect.github.com/apache/commons-csv/issues/344">#344</a>. Thanks to Buddhi De Silva, Gary Gregory.</li>
<li>Add and use CSVFormat#setTrailingData(boolean) in CSVFormat.EXCEL for Excel compatibility <a href="https://redirect.github.com/apache/commons-csv/issues/303">#303</a>. Thanks to DamjanJovanovic, Gary Gregory.</li>
<li>Add and use CSVFormat#setLenientEof(boolean) in CSVFormat.EXCEL for Excel compatibility <a href="https://redirect.github.com/apache/commons-csv/issues/303">#303</a>. Thanks to DamjanJovanovic, Gary Gregory.</li>
</ul>
<h2>Fixed Bugs</h2>
<ul>
<li>CSV-306:  Replace deprecated method in user guide, update external link <a href="https://redirect.github.com/apache/commons-csv/issues/324">#324</a>, <a href="https://redirect.github.com/apache/commons-csv/issues/325">#325</a>. Thanks to Sam Ng, Bruno P. Kinoshita.</li>
<li>Document duplicate header behavior <a href="https://redirect.github.com/apache/commons-csv/issues/309">#309</a>. Thanks to Seth Falco, Bruno P. Kinoshita.</li>
<li>Add missing docs <a href="https://redirect.github.com/apache/commons-csv/issues/328">#328</a>. Thanks to jkbkupczyk.</li>
<li>[StepSecurity] CI: Harden GitHub Actions <a href="https://redirect.github.com/apache/commons-csv/issues/329">#329</a>, <a href="https://redirect.github.com/apache/commons-csv/issues/330">#330</a>. Thanks to step-security-bot.</li>
<li>CSV-147:  Better error message during faulty CSV record read <a href="https://redirect.github.com/apache/commons-csv/issues/347">#347</a>. Thanks to Steven Peterson, Benedikt Ritter, Gary Gregory, Joerg Schaible, Buddhi De Silva, Elliotte Rusty Harold.</li>
<li>CSV-310:  Misleading error message when QuoteMode set to None <a href="https://redirect.github.com/apache/commons-csv/issues/352">#352</a>. Thanks to Buddhi De Silva.</li>
<li>CSV-311:  OutOfMemory for very long rows despite using column value of type Reader. Thanks to Christian Feuersaenger, Gary Gregory.</li>
<li>Use try-with-resources to manage JDBC Clob in CSVPrinter.printRecords(ResultSet). Thanks to Gary Gregory.</li>
<li>JDBC Blob columns are now output as Base64 instead of Object#toString(), which usually is InputStream#toString(). Thanks to Gary Gregory.</li>
<li>Support unusual Excel use cases: Add support for trailing data after the closing quote, and EOF without a final closing quote <a href="https://redirect.github.com/apache/commons-csv/issues/303">#303</a>. Thanks to DamjanJovanovic, Gary Gregory.</li>
<li>MongoDB CSV empty first column parsing fix <a href="https://redirect.github.com/apache/commons-csv/issues/412">#412</a>. Thanks to Igor Kamyshnikov, Gary Gregory.</li>
</ul>
<h2>Changes</h2>
<ul>
<li>Bump commons-io:commons-io from 2.11.0 to 2.16.1 <a href="https://redirect.github.com/apache/commons-csv/issues/408">#408</a>, <a href="https://redirect.github.com/apache/commons-csv/issues/413">#413</a>. Thanks to Gary Gregory.</li>
<li>Bump commons-parent from 57 to 69 <a href="https://redirect.github.com/apache/commons-csv/issues/410">#410</a>. Thanks to Gary Gregory, Dependabot.</li>
<li>Bump h2 from 2.1.214 to 2.2.224 <a href="https://redirect.github.com/apache/commons-csv/issues/333">#333</a>, <a href="https://redirect.github.com/apache/commons-csv/issues/349">#349</a>, <a href="https://redirect.github.com/apache/commons-csv/issues/359">#359</a>. Thanks to Dependabot.</li>
<li>Bump commons-lang3 from 3.12.0 to 3.14.0. Thanks to Gary Gregory.</li>
<li>Update exception message in CSVRecord#getNextRecord() <a href="https://redirect.github.com/apache/commons-csv/issues/348">#348</a>. Thanks to Buddhi De Silva, Michael Osipov, Gary Gregory.</li>
<li>Bump tests using com.opencsv:opencsv from 5.8 to 5.9 <a href="https://redirect.github.com/apache/commons-csv/issues/373">#373</a>. Thanks to Dependabot.</li>
</ul>
<p>Historical list of changes: <a href="https://commons.apache.org/proper/commons-csv/changes-report.html">https://commons.apache.org/proper/commons-csv/changes-report.html</a></p>
<p>For complete information on Apache Commons CSV, including instructions on how to submit bug reports,</p>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/apache/commons-csv/commit/74e12741b24e724bb2e60109daa0c834fd75a68a"><code>74e1274</code></a> Prepare for the next release candidate</li>
<li><a href="https://github.com/apache/commons-csv/commit/89cbc7bb3f7f840045ee1fa17863830110e8aebe"><code>89cbc7b</code></a> Prepare for the next release candidate</li>
<li><a href="https://github.com/apache/commons-csv/commit/447682ec4a4bba7ea3c4edf89a87c63ff5bf718e"><code>447682e</code></a> Match version to POM</li>
<li><a href="https://github.com/apache/commons-csv/commit/4c186f27f7b340aa7d78dc68d380200bcb49bb46"><code>4c186f2</code></a> Merge pull request <a href="https://redirect.github.com/apache/commons-csv/issues/420">#420</a> from apache/dependabot/github_actions/actions/checkou...</li>
<li><a href="https://github.com/apache/commons-csv/commit/8af37f7992e3e7fb37e0f2a5a9b02f27b9cb5e84"><code>8af37f7</code></a> Merge pull request <a href="https://redirect.github.com/apache/commons-csv/issues/418">#418</a> from apache/dependabot/github_actions/github/codeql-a...</li>
<li><a href="https://github.com/apache/commons-csv/commit/2238314ef83214142a4b6304c3cc36a20749b953"><code>2238314</code></a> Merge pull request <a href="https://redirect.github.com/apache/commons-csv/issues/419">#419</a> from apache/dependabot/github_actions/actions/upload-...</li>
<li><a href="https://github.com/apache/commons-csv/commit/2ccf6686364c9183a03ab52c944f63695abc2843"><code>2ccf668</code></a> Bump actions/checkout from 4.1.2 to 4.1.4</li>
<li><a href="https://github.com/apache/commons-csv/commit/26cf90ecbffaf0243dd01cdf941d0c13fb875a88"><code>26cf90e</code></a> Bump actions/upload-artifact from 4.3.2 to 4.3.3</li>
<li><a href="https://github.com/apache/commons-csv/commit/586310afbc7f93c356ede7602706b3a2a5a6b916"><code>586310a</code></a> Bump github/codeql-action from 3.25.1 to 3.25.3</li>
<li><a href="https://github.com/apache/commons-csv/commit/bea505a55b6ab3c4eca27f395b9d6fa6787d496a"><code>bea505a</code></a> Merge pull request <a href="https://redirect.github.com/apache/commons-csv/issues/416">#416</a> from apache/dependabot/github_actions/actions/upload-...</li>
<li>Additional commits viewable in <a href="https://github.com/apache/commons-csv/compare/rel/commons-csv-1.10.0...rel/commons-csv-1.11.0">compare view</a></li>
</ul>
</details>
<br />


Closes #1923 from dependabot[bot]/dependabot/maven/java/org.apache.commons-commons-csv-1.11.0.

Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: William Hyun <william@apache.org>
dongjoon-hyun pushed a commit to apache/orc that referenced this pull request May 8, 2024