ORC-751: [C++] Implement Predicate Pushdown for C++ Reader #476

Closed
53 commits
7c8cb98
ORC-40: [C++] Implement Predicate Pushdown for C++ Reader
wgtmac Feb 4, 2020
10b57ce
Modify PPD based on feedbacks.
wgtmac Sep 8, 2020
661e06e
Update site for 1.6.6
dongjoon-hyun Dec 10, 2020
4cad055
ORC-696: Consistent TypeDescription handling for quoted field names
pgaref Dec 14, 2020
3f3f62f
ORC-697: Improve scan tool to report the location of corruption
omalley Dec 15, 2020
2399426
Follow up to ORC-697 to suppress findbugs check for exception.
omalley Dec 15, 2020
8d02646
ORC-699. Minor improvements to the scan tool that make the exception …
omalley Dec 19, 2020
17fd9c3
ORC-702: [C++] Support big ORC files in Windows (#584)
spektom Dec 21, 2020
7ba6481
ORC-704: Publish snapshots at only apache repo (#587)
williamhyun Dec 23, 2020
710bace
ORC-706: Put back DataReaderProperties default maxDiskRangeChunkLimit
pgaref Dec 27, 2020
c5efb95
ORC-707: FIX tzdata to recover Win32 build in AppVeyor (#590)
pgaref Dec 28, 2020
98bf06f
MINOR: Rename Presto SQL to Trino (#596)
martint Dec 30, 2020
60b03ef
ORC-705: Predicate evaluation should take into account writer calenda…
pgaref Dec 30, 2020
40495ba
ORC-709: FIX Boolean to StringGroup schema evolution (#594)
pgaref Dec 30, 2020
f6b6b2e
ORC-708: Use maven command directly (#598)
dongjoon-hyun Jan 3, 2021
864c11f
ORC-710: Update maven plugins
dongjoon-hyun Jan 4, 2021
68e9d62
ORC-711: Support CryptoExtension in create/decryptLocalKey (#608)
dongjoon-hyun Jan 8, 2021
f60218c
ORC-712: Add `USING IN SPARK` to website (#609)
dongjoon-hyun Jan 8, 2021
db62ea5
ORC-714: Remove MRUnit and its usage (#612)
dongjoon-hyun Jan 9, 2021
9b9bac2
ORC-718: Enable Checkstyle plugin and FileTabCharacter rule
williamhyun Jan 9, 2021
1d9a289
ORC-719: Enable UnusedImports rule
williamhyun Jan 9, 2021
d741a36
ORC-720: Run mvn checkstyle:check in GitHub action
williamhyun Jan 9, 2021
6b7e584
ORC-721: Use 'org.junit.Assert` instead of deprecated `junit.framewor…
williamhyun Jan 9, 2021
3ec9c98
ORC-723: Upgrade Mockito to 3.7.0
williamhyun Jan 9, 2021
0156ad7
ORC-724: PPD: Date IN single value comparison throws ClassCastExcepti…
pgaref Jan 10, 2021
6a12968
MINOR: Rename GitHub Action job name to `Build and test` (#621)
dongjoon-hyun Jan 11, 2021
898e448
ORC-725: Disable merge commits from Github `Merge Button` (#622)
dongjoon-hyun Jan 11, 2021
3a5cf4b
ORC-603. Upgrade `tools` to use Hadoop 2.10.1 (#607)
dongjoon-hyun Jan 12, 2021
57bc8fd
ORC-732: Update ORC API docs to 1.6.6 consistently
dongjoon-hyun Jan 13, 2021
d22edd4
ORC-730: Add website link and description to GitHub page (#623)
dongjoon-hyun Jan 13, 2021
50764d0
ORC-734: Use org.apache.commons.lang3 (#626)
dongjoon-hyun Jan 14, 2021
596e989
ORC-733: Upgrade Zookeeper to 3.6.2 (#625)
dongjoon-hyun Jan 14, 2021
7198f1e
MINOR: Update Java version to 8 in README.md. (#627)
williamhyun Jan 16, 2021
34a79ce
ORC-735: ConvertTool should not fail at a single ORC file. (#628)
williamhyun Jan 16, 2021
9233478
ORC-713: Add Java 15 to GitHub action (#611)
williamhyun Jan 16, 2021
51a88c7
ORC-736: Upgrade Hive to 3.1.2 (#629)
dongjoon-hyun Jan 18, 2021
e81e572
ORC-738: Add date type conversion support in `Java Tools` (#631)
darkamgine Jan 20, 2021
d58fbaa
Update site for 1.6.7
dongjoon-hyun Jan 22, 2021
f0c5f00
ORC-739: Use Maven Wrapper in java/CMakeLists.txt (#632)
dongjoon-hyun Jan 24, 2021
992a121
ORC-740: Add curl in debian and ubuntu Docker files
williamhyun Jan 24, 2021
1c41def
ORC-741: Schema Evolution missing column not handled in filters.
pavibhai Jan 22, 2021
2981629
ORC-745: Update README.md with new travis-ci.com links (#638)
dongjoon-hyun Jan 27, 2021
d7ad250
ORC-748: Add separate writer implementation for Trino (#639)
findepi Feb 8, 2021
3b40375
ORC-737: Upgrade Spark to 3.1.0
dongjoon-hyun Jan 18, 2021
8e40078
Fix a typo in the OrcConf javadoc.
autumnust Jan 27, 2021
e39fd44
ORC-748: Add constants for Trino writer
findepi Feb 8, 2021
0b5bebc
ORC-749 Add checkstyle:check to analyze profile.
omalley Feb 17, 2021
1513c59
ORC-750: Fix bench to use orc pom as parent.
omalley Feb 17, 2021
b05350a
ORC-752: FIX Master branch build badge (#644)
pgaref Feb 18, 2021
8c5814b
ORC-747: Abstract Dictionary interface to enable a hash dictionary.
autumnust Jan 27, 2021
caf2dcf
ORC-754: Code cleanup (#646)
pavibhai Feb 23, 2021
3937521
[C++] Fix stream state of ColumnReader after seek (#645)
wgtmac Feb 27, 2021
d1ea232
ORC-751: [C++] Implement Predicate Push Down for C++ Reader
wgtmac Mar 2, 2021
24 changes: 24 additions & 0 deletions .asf.yaml
@@ -0,0 +1,24 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features
---
github:
description: "Apache ORC - the smallest, fastest columnar storage for Hadoop workloads"
homepage: https://orc.apache.org/
enabled_merge_buttons:
merge: false
squash: true
rebase: true
Review comment (Member):
This should not be here.

9 changes: 4 additions & 5 deletions .github/workflows/build_and_test.yml
@@ -1,4 +1,4 @@
name: master
name: Build and test

on:
push:
@@ -11,13 +11,14 @@ on:
jobs:
build:
name: "Build with Java ${{ matrix.java }}"
runs-on: ubuntu-latest
runs-on: ubuntu-20.04
strategy:
fail-fast: false
matrix:
java:
- 1.8
- 11
- 15
env:
MAVEN_OPTS: -Xmx2g
MAVEN_SKIP_RC: true
@@ -40,7 +41,5 @@ jobs:
mkdir -p ~/.m2
mkdir build
cd build
cmake ..
cmake -DANALYZE_JAVA=ON ..
make package test-out
cd ../java
mvn apache-rat:check
16 changes: 9 additions & 7 deletions .github/workflows/publish_snapshot.yml
@@ -7,6 +7,7 @@ on:

jobs:
publish-snapshot:
if: github.repository == 'apache/orc'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@master
@@ -15,10 +16,11 @@ jobs:
with:
java-version: 8

- name: Release Maven package
uses: samuelmeuli/action-maven-publish@v1
with:
directory: java
server_id: apache.snapshots.https
nexus_username: ${{ secrets.NEXUS_USER }}
nexus_password: ${{ secrets.NEXUS_PW }}
- name: Publish snapshot
env:
ASF_USERNAME: ${{ secrets.NEXUS_USER }}
ASF_PASSWORD: ${{ secrets.NEXUS_PW }}
run: |
cd java
echo "<settings><servers><server><id>apache.snapshots.https</id><username>$ASF_USERNAME</username><password>$ASF_PASSWORD</password></server></servers></settings>" > settings.xml
mvn --settings settings.xml -DskipTests deploy
8 changes: 4 additions & 4 deletions README.md
@@ -23,9 +23,9 @@ Releases:
* Downloads: <a href="http://orc.apache.org/downloads">Apache ORC downloads</a>

The current build status:
* Master branch <a href="https://travis-ci.org/apache/orc/branches">
![master build status](https://travis-ci.org/apache/orc.svg?branch=master)</a>
* <a href="https://travis-ci.org/apache/orc/pull_requests">Pull Requests</a>
* Master branch <a href="https://travis-ci.com/apache/orc/branches">
![master build status](https://travis-ci.com/apache/orc.svg?branch=master)</a>
* <a href="https://travis-ci.com/github/apache/orc/pull_requests">Pull Requests</a>


Bug tracking: <a href="http://orc.apache.org/bugs">Apache Jira</a>
@@ -44,7 +44,7 @@ The subdirectories are:

### Building

* Install java 1.7 or higher
* Install java 1.8 or higher
* Install maven 3 or higher
* Install cmake

2 changes: 2 additions & 0 deletions c++/include/orc/Common.hh
@@ -69,6 +69,8 @@ namespace orc {
ORC_JAVA_WRITER = 0,
ORC_CPP_WRITER = 1,
PRESTO_WRITER = 2,
SCRITCHLEY_GO = 3,
TRINO_WRITER = 4,
UNKNOWN_WRITER = INT32_MAX
};
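
As a rough illustration of why `UNKNOWN_WRITER` sits at `INT32_MAX`, the sketch below maps a raw writer id to the enum and falls back to the unknown value for ids a reader does not recognize. The enum is a local copy of the values in this hunk; `ToyWriterId` and `toWriterId` are hypothetical names for illustration and are not part of the ORC API.

```cpp
#include <cassert>
#include <cstdint>

// Local copy of the enum values shown in the hunk above, for illustration only.
enum ToyWriterId {
  ORC_JAVA_WRITER = 0,
  ORC_CPP_WRITER = 1,
  PRESTO_WRITER = 2,
  SCRITCHLEY_GO = 3,
  TRINO_WRITER = 4,
  UNKNOWN_WRITER = INT32_MAX
};

// Hypothetical helper: convert a raw id read from a file into the enum.
// Ids newer than this build knows about collapse to UNKNOWN_WRITER.
ToyWriterId toWriterId(uint32_t raw) {
  if (raw <= static_cast<uint32_t>(TRINO_WRITER)) {
    return static_cast<ToyWriterId>(raw);
  }
  return UNKNOWN_WRITER;
}
```

With this shape, adding a new writer only requires a new enumerator and a higher upper bound in the check.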

17 changes: 17 additions & 0 deletions c++/include/orc/Reader.hh
@@ -23,6 +23,7 @@
#include "orc/Common.hh"
#include "orc/orc-config.hh"
#include "orc/Statistics.hh"
#include "orc/sargs/SearchArgument.hh"
#include "orc/Type.hh"
#include "orc/Vector.hh"

@@ -191,6 +192,11 @@ namespace orc {
*/
RowReaderOptions& setEnableLazyDecoding(bool enable);

/**
* Set search argument for predicate push down
*/
RowReaderOptions& searchArgument(std::unique_ptr<SearchArgument> sargs);

/**
* Should enable encoding block mode
*/
@@ -245,6 +251,11 @@ namespace orc {
* What scale should all Hive 0.11 decimals be normalized to?
*/
int32_t getForcedScaleOnHive11Decimal() const;

/**
* Get search argument for predicate push down
*/
std::shared_ptr<SearchArgument> getSearchArgument() const;
};


@@ -538,6 +549,12 @@ namespace orc {
*/
virtual void seekToRow(uint64_t rowNumber) = 0;

/**
* If PPD is enabled, returns true and stores the number of selected row groups
* (RGs) and the number of evaluated RGs in the stats pair; otherwise returns false.
*/
virtual bool getPPDStats(std::pair<uint64_t, uint64_t>& stats) const = 0;

};
}
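
The `getPPDStats` contract documented above can be illustrated with a small self-contained sketch. It does not use the ORC library: `RowGroupStats` and `ToyRowReader` are hypothetical stand-ins, the only predicate is `column < bound`, and the pair order (selected first, evaluated second) is an assumption taken from the wording of the comment. The idea is that a row group is read only if its min/max statistics leave room for a match.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for per-row-group column statistics (not an ORC type).
struct RowGroupStats {
  int64_t min;
  int64_t max;
};

// Hypothetical reader that checks "column < bound" against row group
// statistics, which is the kind of decision a search argument enables.
class ToyRowReader {
 public:
  explicit ToyRowReader(std::vector<RowGroupStats> groups)
      : groups_(std::move(groups)), ppdEnabled_(false), selected_(0), evaluated_(0) {}

  // Push down "column < bound" and count which row groups must be read.
  void pushDownLessThan(int64_t bound) {
    ppdEnabled_ = true;
    selected_ = 0;
    evaluated_ = 0;
    for (const RowGroupStats& g : groups_) {
      ++evaluated_;
      // A row group can contain a match only if its minimum is below the bound.
      if (g.min < bound) {
        ++selected_;
      }
    }
  }

  // Mirrors the documented contract: false when PPD is off; otherwise true,
  // with stats = (selected row groups, evaluated row groups).
  bool getPPDStats(std::pair<uint64_t, uint64_t>& stats) const {
    if (!ppdEnabled_) {
      return false;
    }
    stats = std::make_pair(selected_, evaluated_);
    return true;
  }

 private:
  std::vector<RowGroupStats> groups_;
  bool ppdEnabled_;
  uint64_t selected_;
  uint64_t evaluated_;
};
```

For three row groups covering 0 to 9, 10 to 19, and 20 to 29, pushing down `column < 15` selects the first two and skips the third, so the stats pair would hold 2 selected out of 3 evaluated.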

9 changes: 9 additions & 0 deletions c++/src/ColumnReader.cc
@@ -534,6 +534,9 @@ namespace orc {
std::unordered_map<uint64_t, PositionProvider>& positions) {
ColumnReader::seekToRowGroup(positions);
inputStream->seek(positions.at(columnId));
// clear buffer state after seek
bufferEnd = nullptr;
bufferPointer = nullptr;
}

class StringDictionaryColumnReader: public ColumnReader {
@@ -838,6 +841,9 @@ namespace orc {
ColumnReader::seekToRowGroup(positions);
blobStream->seek(positions.at(columnId));
lengthRle->seek(positions.at(columnId));
// clear buffer state after seek
lastBuffer = nullptr;
lastBufferLength = 0;
}

class StructColumnReader: public ColumnReader {
@@ -1553,6 +1559,9 @@ namespace orc {
ColumnReader::seekToRowGroup(positions);
valueStream->seek(positions.at(columnId));
scaleDecoder->seek(positions.at(columnId));
// clear buffer state after seek
buffer = nullptr;
bufferEnd = nullptr;
}

class Decimal128ColumnReader: public Decimal64ColumnReader {
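
The three "clear buffer state after seek" changes above share one pattern: a reader that caches a chunk handed out by its stream must drop that cache when the stream is repositioned. The sketch below shows the pattern with hypothetical classes (`ToyStream`, `ToyByteReader`); it is a simplified model under those assumptions, not the ORC column reader code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical chunked input stream (not ORC's stream interface).
class ToyStream {
 public:
  explicit ToyStream(std::string data) : data_(std::move(data)), pos_(0) {}

  // Hands out the next chunk of up to 4 bytes; returns false at end of data.
  bool next(const char** chunk, std::size_t* length) {
    if (pos_ >= data_.size()) {
      return false;
    }
    *chunk = data_.data() + pos_;
    *length = std::min<std::size_t>(4, data_.size() - pos_);
    pos_ += *length;
    return true;
  }

  void seek(std::size_t pos) { pos_ = pos; }

 private:
  std::string data_;
  std::size_t pos_;
};

// Hypothetical reader that caches the current chunk between reads.
class ToyByteReader {
 public:
  explicit ToyByteReader(ToyStream& stream)
      : stream_(stream), bufferPointer_(nullptr), bufferEnd_(nullptr) {}

  // Returns the next byte, refilling when the cached chunk is exhausted.
  // Returns '\0' once the stream has no more data.
  char readByte() {
    if (bufferPointer_ == bufferEnd_) {
      std::size_t length = 0;
      if (!stream_.next(&bufferPointer_, &length)) {
        return '\0';
      }
      bufferEnd_ = bufferPointer_ + length;
    }
    return *bufferPointer_++;
  }

  void seek(std::size_t pos) {
    stream_.seek(pos);
    // Drop the cached chunk so the next read refills from the new position.
    bufferPointer_ = nullptr;
    bufferEnd_ = nullptr;
  }

 private:
  ToyStream& stream_;
  const char* bufferPointer_;
  const char* bufferEnd_;
};
```

In this model, reading one byte of `"abcdefgh"` and then seeking to offset 6 yields `'g'` on the next read. Without the two resets in `seek`, the reader would keep serving the stale chunk and return `'b'` instead.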
10 changes: 10 additions & 0 deletions c++/src/Options.hh
@@ -128,6 +128,7 @@ namespace orc {
bool throwOnHive11DecimalOverflow;
int32_t forcedScaleOnHive11Decimal;
bool enableLazyDecoding;
std::shared_ptr<SearchArgument> sargs;

RowReaderOptionsPrivate() {
selection = ColumnSelection_NONE;
@@ -249,6 +250,15 @@ namespace orc {
privateBits->enableLazyDecoding = enable;
return *this;
}

RowReaderOptions& RowReaderOptions::searchArgument(std::unique_ptr<SearchArgument> sargs) {
privateBits->sargs = std::move(sargs);
return *this;
}

std::shared_ptr<SearchArgument> RowReaderOptions::getSearchArgument() const {
return privateBits->sargs;
}
}

#endif
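
The setter above accepts a `std::unique_ptr` while the stored member and the getter use `std::shared_ptr`. A minimal sketch of that ownership hand-off follows, using hypothetical stand-in types (`ToySearchArgument`, `ToyOptions`) rather than the ORC classes: the caller gives up sole ownership, and the options object can then hand out shared handles to the same object.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical placeholder for a search argument (not orc::SearchArgument).
struct ToySearchArgument {
  std::string expression;
};

// Hypothetical options class mirroring the setter/getter shape in the diff.
class ToyOptions {
 public:
  // Takes sole ownership from the caller, then stores it as shared.
  ToyOptions& searchArgument(std::unique_ptr<ToySearchArgument> sargs) {
    sargs_ = std::move(sargs);  // a moved unique_ptr converts to shared_ptr
    return *this;
  }

  // Returns a shared handle; null when no search argument was set.
  std::shared_ptr<ToySearchArgument> getSearchArgument() const {
    return sargs_;
  }

 private:
  std::shared_ptr<ToySearchArgument> sargs_;
};
```

One consequence of this design is that the caller's `unique_ptr` is empty after the call, so the search argument cannot be modified behind the reader's back, while copies of the options can still share one instance.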
2 changes: 2 additions & 0 deletions c++/src/OrcFile.cc
@@ -30,6 +30,8 @@
#include <io.h>
#define S_IRUSR _S_IREAD
#define S_IWUSR _S_IWRITE
#define stat _stat64
#define fstat _fstat64
#else
#include <unistd.h>
#define O_BINARY 0
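
The `_stat64` and `_fstat64` defines above belong to the commit titled "Support big ORC files in Windows". Assuming the default Windows `stat` structure reports file size in a signed 32-bit field, lengths above 2 GiB minus one byte cannot be represented. The hypothetical helper below (`fitsInSigned32`, not part of ORC) only makes that limit concrete.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true when a file length can be represented in a
// signed 32-bit size field.
bool fitsInSigned32(uint64_t fileLength) {
  return fileLength <=
         static_cast<uint64_t>(std::numeric_limits<int32_t>::max());
}
```

A 3 GiB file fails this check, which is the kind of file the 64-bit variants are meant to handle.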