ORC-751: [C++] Implement Predicate Pushdown for C++ Reader #476

wgtmac · 2020-02-04T15:55:16Z

1. Use RowReaderOptions to pass SearchArgument to enable PPD.
2. Modify RowReaderImpl::startNextStripe to seek to the next matched row group based on PPD.
3. RowReaderImpl::next seeks to the next matched row group based on the PPD result.
4. RowReaderImpl::seekToRow also jumps to the first matched row group after the target row.
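The row-group skipping described above can be illustrated with a small, self-contained sketch. It is not the code from this patch: `RowGroupStats`, `selectRowGroups` and `nextMatchedRow` are hypothetical names, and the predicate is reduced to a closed integer range checked against per-row-group min/max statistics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical min/max statistics of one integer column for one row group.
struct RowGroupStats {
  int64_t min;
  int64_t max;
};

// One flag per row group: true if the row group may contain a value in
// [lo, hi] and therefore has to be read, false if it can be skipped.
std::vector<bool> selectRowGroups(const std::vector<RowGroupStats>& stats,
                                  int64_t lo, int64_t hi) {
  std::vector<bool> selected;
  selected.reserve(stats.size());
  for (const RowGroupStats& s : stats) {
    selected.push_back(s.max >= lo && s.min <= hi);
  }
  return selected;
}

// Returns the row to continue reading from when the caller asks for
// targetRow: targetRow itself if its row group matched, otherwise the first
// row of the next matched row group, or rowsInStripe if none is left.
uint64_t nextMatchedRow(const std::vector<bool>& selected, uint64_t targetRow,
                        uint64_t rowIndexStride, uint64_t rowsInStripe) {
  std::size_t group = static_cast<std::size_t>(targetRow / rowIndexStride);
  if (group < selected.size() && selected[group]) {
    return targetRow;
  }
  for (++group; group < selected.size(); ++group) {
    if (selected[group]) {
      return static_cast<uint64_t>(group) * rowIndexStride;
    }
  }
  return rowsInStripe;
}
```

With four row groups of 1000 rows covering the value ranges 0-9, 10-19, 20-29 and 30-39, the range [22, 25] selects only the third row group, so a read that starts at row 0 would continue at row 2000.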

wgtmac · 2020-02-04T16:10:22Z

@stiga-huang Can you help review and verify this patch in Impala? Thanks!

luksan47 · 2020-03-04T14:45:11Z

Hi, @wgtmac, I've started to implement predicate pushdown in Impala, will test the PR that way.

wgtmac · 2020-03-05T06:33:11Z

@luksan47 cool! Let me know if you have any feedback.

csringhofer · 2020-03-16T21:31:36Z

I reviewed Norbert's implementation for Impala (https://gerrit.cloudera.org/#/c/15403/) and found some problems that are very hard to implement on the client side:
https://issues.apache.org/jira/browse/ORC-611
https://issues.apache.org/jira/browse/ORC-612

The issues are related to types that are generally problematic (TIMESTAMP, CHAR, VARCHAR). It is possible to disable predicate pushdown for these types, but in the long term it would be good to have a proper solution.

Can you look at the issues? I think that it would be much better to implement them in the ORC lib, because it is easy to create buggy implementations in clients.

wgtmac · 2020-03-17T08:27:12Z

@csringhofer Thanks for letting me know! I have also noticed that I need to use lowerbound & upperbound in TimestampColumnStatistics to handle timezone difference. Will take a deeper look into these issues.

csringhofer · 2020-03-18T18:24:41Z

c++/src/Reader.cc

+        // it is guaranteed to be at start of a row group
+        currentRowInStripe = nextRowToRead;
+        if (currentRowInStripe < rowsInCurrentStripe) {
+          seekToRowGroup(static_cast<uint32_t>(currentRowInStripe / footer->rowindexstride()));


I think that there is a hidden performance regression here:
at the end of the call chain, e.g. ColumnReader::seekToRowGroup(), RleDecoderV2::seek(), ZlibDecompressionStream::seek(), the inputBuffer is reset to nullptr:

orc/c++/src/Compression.cc

Line 573 in 028261a

inputBuffer = nullptr;

This leads to calling ZlibDecompressionStream::readBuffer() and reading from the stream again, even if the whole stream was buffered previously to inputBuffer (which is usually the case in Impala, as we set the block size to 8 MB, but can also cause problems with the default 256KB).
In the worst case this can lead to reading the whole column O(num_of_row_groups) times.

The solution could be to make ZlibDecompressionStream/BlockDecompressionStream::seek() smarter and keep the inputBuffer if the new position (which is the header byte of the new compression block) is still in it.
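A minimal sketch of the buffer-preserving seek idea, under simplifying assumptions: `CountingSource` and `BufferedReader` are hypothetical classes, not ORC types, and the stream is treated as uncompressed bytes, so decompressor state and compression block headers are ignored. It only shows that a seek landing inside the bytes already buffered does not need another read from the source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the underlying file stream; counts block reads.
class CountingSource {
 public:
  CountingSource(std::vector<char> data, std::size_t blockSize)
      : data_(std::move(data)), blockSize_(blockSize) {}

  // Copies up to blockSize bytes starting at offset into out.
  std::size_t read(std::size_t offset, std::vector<char>& out) {
    ++reads;
    out.clear();
    if (offset >= data_.size()) {
      return 0;
    }
    const std::size_t n = std::min(blockSize_, data_.size() - offset);
    out.insert(out.end(), data_.data() + offset, data_.data() + offset + n);
    return n;
  }

  int reads = 0;

 private:
  std::vector<char> data_;
  std::size_t blockSize_;
};

// Hypothetical reader whose seek() keeps the buffer when it can.
class BufferedReader {
 public:
  explicit BufferedReader(CountingSource& src) : src_(src) {}

  void seek(std::size_t offset) {
    if (offset >= bufferStart_ && offset < bufferStart_ + buffer_.size()) {
      cursor_ = offset - bufferStart_;  // target is buffered: move cursor only
      return;
    }
    buffer_.clear();  // target is elsewhere: drop the buffer, refill lazily
    bufferStart_ = offset;
    cursor_ = 0;
  }

  // Returns the next byte as an int, or -1 at the end of the source.
  int next() {
    if (cursor_ >= buffer_.size()) {
      bufferStart_ += buffer_.size();
      cursor_ = 0;
      if (src_.read(bufferStart_, buffer_) == 0) {
        return -1;
      }
    }
    return static_cast<unsigned char>(buffer_[cursor_++]);
  }

 private:
  CountingSource& src_;
  std::vector<char> buffer_;
  std::size_t bufferStart_ = 0;
  std::size_t cursor_ = 0;
};
```

In this sketch, seeking forward or backward within the first buffered block leaves `reads` at 1, while a seek that resets the buffer unconditionally would trigger a new read for every row group.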

A further improvement could be to make ZlibDecompressionStream/BlockDecompressionStream::Skip() smarter to avoid decompressing all the skipped blocks in this case:

orc/c++/src/Compression.cc

Line 545 in 028261a

// this is a stupid implementation for now.

This is not a regression, though, so it can go into another patch.

wgtmac · 2020-03-19

You are right. Seek performance is another headache if you use large compression blocks. In our internal ORC setup, we have optimized this by finishing compression blocks at the end of every row group. This way, seeking never reads compression blocks that belong to other row groups.

csringhofer · 2020-03-25

I have created a Jira (ORC-614) for the optimization of seek(), and Norbert and I put together an implementation; he will upload it soon as a pull request. It seems to solve the regressions we experienced.

About finishing compression blocks at row group end: I actually created a ticket (IMPALA-8449) to do something similar in Parquet.
In Parquet we came from the other direction: pages are the unit of both indexing and compression, but they are not aligned, so unlike ORC, index entries can point to different row numbers for every column. This decreases the efficiency of combining filters for different columns and makes the reader implementation very complex.

c++/include/orc/Reader.hh

The current implementation of ZlibDecompressionStream::seek and BlockDecompressionStream::seek resets the state of the decompressor and the underlying file reader and throws away their buffers. This commit introduces two optimizations that reuse the buffers that still contain useful data, reducing the time spent reading and decompressing them again.

The first case is when the target position has already been read from the input stream but has not been decompressed yet, i.e. it is not in the output stream. The second case is when the target position has already been read and decompressed into the output stream.

Moved the common data of the decompression streams into a common class.

Tests:
- Ran the ORC tests and the Impala tests working on ORC tables.
- The regression that apache#476 would cause is not present anymore.
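The two cases in this commit message can be read as a three-way decision made at seek time. The sketch below is only an illustration of that decision, not the code of the commit: `StreamState`, `SeekPath` and `classifySeek` are hypothetical names, and a seek target is assumed to be a pair of the file offset of a compression block header and an offset inside the decompressed block.

```cpp
#include <cassert>
#include <cstdint>

// Which buffers a seek can reuse.
enum class SeekPath {
  ReuseOutput,  // target is already decompressed: move the output cursor
  ReuseInput,   // target block is buffered but not decompressed yet
  Reset         // target is outside both buffers: reset and read again
};

// Hypothetical snapshot of what a decompression stream currently holds.
struct StreamState {
  uint64_t inputStart;        // file offset of the first buffered input byte
  uint64_t inputSize;         // number of buffered (compressed) input bytes
  bool hasOutput;             // true if a block has been decompressed
  uint64_t outputBlockStart;  // file offset of that block's header
  uint64_t outputSize;        // decompressed bytes available from that block
};

SeekPath classifySeek(const StreamState& s, uint64_t blockStart,
                      uint64_t offsetInBlock) {
  if (s.hasOutput && blockStart == s.outputBlockStart &&
      offsetInBlock < s.outputSize) {
    return SeekPath::ReuseOutput;
  }
  if (blockStart >= s.inputStart && blockStart < s.inputStart + s.inputSize) {
    return SeekPath::ReuseInput;
  }
  return SeekPath::Reset;
}
```

Only the `Reset` path has to go back to the file reader; the other two paths are the ones that avoid the repeated reads discussed in the review comments.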

The current implementation of ZlibDecompressionStream::seek and BlockDecompressionStream::seek resets the state of the decompressor and the underlying file reader and throws away their buffers. This commit introduces two optimizations that reuse the buffers that still contain useful data, reducing the time spent reading and decompressing them again. The first case is when the target position has already been read and decompressed into the output stream. The second case is when the target position has already been read from the input stream but has not been decompressed yet, i.e. it is not in the output stream. Tests: - Ran the ORC tests and the Impala tests working on ORC tables. - The regression that apache#476 would cause is not present anymore.

1. add stats to reflect ppd effectiveness. 2. disable ppd for char and varchar types.

wgtmac · 2020-09-08T14:03:25Z

@csringhofer @luksan47 @stiga-huang I have changed this PR to reflect the feedback. The fix for timestamp is addressed by a separate PR: #543. Please review it again. Sorry for the long delay.

### What changes were proposed in this pull request? Consistent TypeDescription handling for quoted field names ### Why are the changes needed? SARGs failing due to incorrect handling of quoted fieldNames ### How was this patch tested? TestVectorOrcFile.testQuotedPredicatePushdown

### What changes were proposed in this pull request? This PR updates the scan tool to print information about where the file is corrupted. It * reads data by batches until there is a problem * tries re-reading that batch column by column to find which column is corrupted * figures out the next location that the reader can seek to ### Why are the changes needed? It helps diagnose where (row & column) an ORC file is corrupted. ### How was this patch tested? It was tested on ORC files that were corrupted by bad machines.

Signed-off-by: Owen O'Malley <omalley@apache.org>

…messages optional. (apache#583) ### What changes were proposed in this pull request? - Change the stripe id to be 0 based, which seems more consistent. - Change the end of file stripe id to be the number of stripes rather than -1. - Limit the recovery row to the start of the next stripe. - Add --verbose that prints exceptions. - Fix the --help to give more information ### How was this patch tested? Tested by hand on the example files. Signed-off-by: Owen O'Malley <omalley@apache.org>

### What changes were proposed in this pull request? Use 64-bit versions of fstat to support files bigger than 2GB in Windows. ### Why are the changes needed? To support big files. ### How was this patch tested? Manual

### What changes were proposed in this pull request? This PR aims to only publish snapshots at apache orc repo by adding an if statement. ### Why are the changes needed? To save the resources at the downstream. ### How was this patch tested? N/A

Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

### What changes were proposed in this pull request? This PR updates links for Trino. ### Why are the changes needed? PrestoSQL has been renamed to Trino. See: https://trino.io/blog/2020/12/27/announcing-trino.html ### How was this patch tested? N/A

apache#588) ### What changes were proposed in this pull request? RecordReaderImp should pass down the writer calendar info (writerUsedProlepticGregorian) when evaluating predicates to make sure column stats are properly deserialized (affects TimestampStatistics) ### Why are the changes needed? Correct evaluation of predicates with Timestamps ### How was this patch tested? TestRecordReaderImpl.testPredEvalTimestampStatsDiffWriter

### What changes were proposed in this pull request? Special ConvertTreeReader for Boolean using StringGroupFromAnyIntegerTreeReader for String/Char/Varchar types ### Why are the changes needed? Properly handle Boolean to String/Char/Varchar conversions ### How was this patch tested? TestSchemaEvolution.testBooleanToStringEvolution

Signed-off-by: Owen O'Malley <omalley@apache.org>

Fixes apache#640 Signed-off-by: Owen O'Malley <omalley@apache.org>

wgtmac · 2021-02-16T15:09:57Z

@dongjoon-hyun @pgaref This PR has been sleeping for a decade. I would really appreciate it if you could help review it once you get the chance. Thanks!

pgaref

Hey @wgtmac thanks for the reminder!
Just went through the code and left some comments -- I believe we should add some more tests here to increase confidence. A few examples include testPredEvalWithBooleanStats, testPredEvalWithIntStats, etc.

orc/java/core/src/test/org/apache/orc/impl/TestRecordReaderImpl.java

Line 581 in 949c744

public void testPredEvalWithBooleanStats() throws Exception {

On another note, ORC-40 seems to be about building SearchArgument so we should move this to a new ticket -- open ORC-751 for this

c++/include/orc/Reader.hh

c++/src/Reader.cc

c++/src/sargs/SargsApplier.cc

c++/test/TestPredicatePushdown.cc

c++/src/Reader.cc

wgtmac · 2021-02-17T13:49:43Z

Thanks @pgaref for the comments! I will fix them and then let you know.

dongjoon-hyun · 2021-02-17T19:41:24Z

Thank you for pinging me, @wgtmac .

c++/src/Reader.hh

c++/src/ColumnReader.cc

dongjoon-hyun

I haven't compared the logic with the Java reader yet, but I'm wondering whether we keep the logic structure consistent in both readers.

Fixes apache#642 Signed-off-by: Owen O'Malley <omalley@apache.org>

This enables the analyze profile. As part of the change, it also fixes the real findbugs warnings and blocks some false positives. Fixes apache#643 Signed-off-by: Owen O'Malley <omalley@apache.org>

### What changes were proposed in this pull request? Broken link ### Why are the changes needed? Fix master build status badge ### How was this patch tested? Grip locally

Fixes apache#637 Signed-off-by: Owen O'Malley <oomalley@linkedin.com>

* Replaced Charset.forName with StandardCharsets * Final on some variables * Removed final on a private method * Simplified some conditions * Removed redundant casts * Removed duplicate condition branches * Removed the maven compiler config setting in bench

dongjoon-hyun · 2021-02-28T19:06:24Z

Since #645 is merged, could you rebase this to the master once more, @wgtmac ?

wgtmac · 2021-03-01T15:53:01Z

> Since #645 is merged, could you rebase this to the master once more, @wgtmac ?

@dongjoon-hyun I'm working on it. Will let you know once ready.

wgtmac · 2021-03-02T13:57:42Z

> Hey @wgtmac thanks for the reminder!
> Just went through the code and left some comments -- I believe we should add some more Tests here to increase confidence. Few example include: testPredEvalWithBooleanStats, testPredEvalWithIntStats etc.
>
> orc/java/core/src/test/org/apache/orc/impl/TestRecordReaderImpl.java
>
> Line 581 in 949c744
>
> public void testPredEvalWithBooleanStats() throws Exception {
>
> On another note, ORC-40 seems to be about building SearchArgument so we should move this to a new ticket -- open ORC-751 for this

I have covered these kinds of cases in TestPredicateLeaf.cc in an earlier commit. I can add more progressively to gain more confidence.

dongjoon-hyun · 2021-03-02T20:07:42Z

Thank you for updating, @wgtmac . But, it looks a little weird to me~
Now this PR has 53 commits and 775 file changes.

dongjoon-hyun · 2021-03-02T20:08:22Z

.asf.yaml

+  enabled_merge_buttons:
+    merge: false
+    squash: true
+    rebase: true


This should not be here.

) ### What changes were proposed in this pull request? The current implementation of ZlibDecompressionStream::seek and BlockDecompressionStream::seek resets the state of the decompressor and the underlying file reader and throws away their buffers. ### Why are the changes needed? This commit introduces two optimizations that reuse the buffers that still contain useful data, reducing the time spent reading and decompressing them again. The first case is when the target position has already been read and decompressed into the output stream. The second case is when the target position has already been read from the input stream but has not been decompressed yet, i.e. it is not in the output stream. ### How was this patch tested? Tests: - Ran the ORC tests and the Impala tests working on ORC tables. - The regression that #476 would cause is not present anymore.

Bumps [org.apache.commons:commons-csv](https://github.com/apache/commons-csv) from 1.11.0 to 1.12.0. Closes #2043 from dependabot[bot]/dependabot/maven/java/org.apache.commons-commons-csv-1.12.0. Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

Bumps [org.apache.commons:commons-csv](https://github.com/apache/commons-csv) from 1.11.0 to 1.12.0. Closes #2043 from dependabot[bot]/dependabot/maven/java/org.apache.commons-commons-csv-1.12.0. Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 0cf506b) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

wgtmac force-pushed the ORC-40 branch from 804c7ea to 7c8cb98 Compare February 4, 2020 16:07

wgtmac requested review from majetideepak and xndai February 4, 2020 16:08

csringhofer reviewed Mar 18, 2020

luksan47 reviewed Mar 19, 2020

luksan47 mentioned this pull request Mar 25, 2020

ORC-614: Implement efficient seek() in decompression streams #499

Closed

Modify PPD based on feedbacks.

10b57ce

1. add stats to reflect ppd effectiveness. 2. disable ppd for char and varchar types.

dongjoon-hyun and others added 12 commits December 10, 2020 15:30

Update site for 1.6.6

661e06e

Follow up to ORC-697 to suppress findbugs check for exception.

2399426

Signed-off-by: Owen O'Malley <omalley@apache.org>

ORC-702: [C++] Support big ORC files in Windows (apache#584)

17fd9c3

### What changes were proposed in this pull request? Use 64-bit versions of fstat to support files bigger than 2GB in Windows. ### Why are the changes needed? To support big files. ### How was this patch tested? Manual

ORC-706: Put back DataReaderProperties default maxDiskRangeChunkLimit

710bace

Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

ORC-707: FIX tzdata to recover Win32 build in AppVeyor (apache#590)

c5efb95

autumnust and others added 2 commits February 8, 2021 12:55

Fix a typo in the OrcConf javadoc.

8e40078

Signed-off-by: Owen O'Malley <omalley@apache.org>

ORC-748: Add constants for Trino writer

e39fd44

Fixes apache#640 Signed-off-by: Owen O'Malley <omalley@apache.org>

pgaref reviewed Feb 17, 2021

dongjoon-hyun reviewed Feb 17, 2021

omalley and others added 3 commits February 17, 2021 12:40

ORC-749 Add checkstyle:check to analyze profile.

0b5bebc

Fixes apache#642 Signed-off-by: Owen O'Malley <omalley@apache.org>

ORC-750: Fix bench to use orc pom as parent.

1513c59

This enables the analyze profile. As part of the change, it also fixes the real findbugs warnings and blocks some false positives. Fixes apache#643 Signed-off-by: Owen O'Malley <omalley@apache.org>

ORC-752: FIX Master branch build badge (apache#644)

b05350a

### What changes were proposed in this pull request? Broken link ### Why are the changes needed? Fix master build status badge ### How was this patch tested? Grip locally

noirello mentioned this pull request Feb 20, 2021

Reader can filter noirello/pyorc#35

Closed

autumnust and others added 3 commits February 23, 2021 10:47

ORC-747: Abstract Dictionary interface to enable a hash dictionary.

8c5814b

Fixes apache#637 Signed-off-by: Owen O'Malley <oomalley@linkedin.com>

[C++] Fix stream state of ColumnReader after seek (apache#645)

3937521

ORC-751: [C++] Implement Predicate Push Down for C++ Reader

d1ea232

wgtmac requested a review from dongjoon-hyun March 2, 2021 13:58

dongjoon-hyun reviewed Mar 2, 2021


wgtmac changed the title ~~ORC-40: [C++] Implement Predicate Pushdown for C++ Reader~~ ORC-751: [C++] Implement Predicate Pushdown for C++ Reader Mar 3, 2021

wgtmac closed this Mar 3, 2021

wgtmac mentioned this pull request May 5, 2021

ORC-614: [C++] Implement efficient seek() in decompression streams #695

Merged

wgtmac deleted the ORC-40 branch February 23, 2022 06:26
