PARQUET-2430: Add parquet joiner v2 #1335

Merged: 67 commits, Sep 19, 2024
f5144b2
add initial ParquetJoiner implementation
Jan 28, 2024
01a08dd
add initial ParquetJoiner implementation
Feb 1, 2024
28c987c
Merge remote-tracking branch 'origin/master' into add-parquet-joiner
Feb 12, 2024
7ae3505
refactor ParquetJoiner implementation
Feb 17, 2024
05eb22a
extend the main test for multiple files on the right
Feb 20, 2024
6bb950d
extend the main test for multiple files on the right
Feb 22, 2024
87b923c
Merge branch 'master' into add-parquet-joiner
Feb 22, 2024
f9536c3
converge join logic, create a draft of options and rewriter
Feb 23, 2024
d7f11d9
move ParquetJoinTest logic to ParquetRewriterTest
Feb 27, 2024
e8e7ffe
improve Parquet stitching test
Mar 1, 2024
3ee946c
remove custom ParquetRewriter constructor
Mar 6, 2024
fd409c4
remove custom ParquetRewriter constructor
Mar 6, 2024
5a98219
refactor ParquetRewriter
Mar 12, 2024
7b2fd1a
apply spotless and address PR comments
Mar 14, 2024
8da8291
move extra column writing into processBlocksFromReader
Mar 15, 2024
68e41ba
add getInputFiles back
Mar 16, 2024
98b9b23
Merge remote-tracking branch 'fork/master' into add-parquet-joiner
Mar 16, 2024
6d2c222
fix extra ParquetRewriter constructor so tests can pass
Mar 16, 2024
883e935
remove not needed TODOs
Mar 20, 2024
8ef36b5
address PR comments
Mar 24, 2024
79cc2b8
Merge remote-tracking branch 'origin/master' into add-parquet-joiner
Apr 11, 2024
0bbf72f
rename inputFilesR to inputFilesToJoin
Apr 11, 2024
ca53bff
rename inputFilesR to inputFilesToJoinColumns
Apr 11, 2024
1e7998a
add getParquetInputFiles listing to the rewrite start logging
Apr 11, 2024
2ee9b40
redesign file joiner in ParquetRewriter
Apr 28, 2024
fc32dfd
Merge remote-tracking branch 'origin/master' into add-parquet-joiner-v2
Apr 28, 2024
db52c85
redesign file joiner in ParquetRewriter
Apr 28, 2024
9057e91
redesign file joiner in ParquetRewriter
Apr 28, 2024
5b055c0
redesign file joiner in ParquetRewriter
Apr 28, 2024
b70f88f
uncomment some code
Apr 28, 2024
270126b
fix ParquetRewriter joiner test
May 4, 2024
008cb40
Merge remote-tracking branch 'refs/remotes/origin/master' into add-pa…
Jul 25, 2024
0dc1793
add initial ParquetJoiner implementation
Jul 25, 2024
a53d108
add initial ParquetJoiner implementation
Jul 31, 2024
4da0b85
typo
Aug 6, 2024
c5c7b38
typo
Aug 6, 2024
92c95db
typo
Aug 6, 2024
86f7a4c
typo
Aug 6, 2024
73a4af4
docs
Aug 6, 2024
18feef4
typo
Aug 7, 2024
b24bffa
add getExtraMetadata()
Aug 7, 2024
21a5926
extract ensureRowCount()
Aug 7, 2024
c521a95
typo
Aug 7, 2024
1ea6755
typo
Aug 7, 2024
f2e01a2
add logging into getSchema()
Aug 7, 2024
d393125
typo
Aug 7, 2024
64d3bb2
add closing of input files readers
Aug 7, 2024
f50666a
fix RewriteOptions builder for inputFilesToJoin
Aug 7, 2024
d306336
Merge remote-tracking branch 'refs/remotes/origin/master' into add-pa…
Aug 7, 2024
ae9589d
fix ParquetRewriter constructor
Aug 7, 2024
9157960
extend tests for ParquetRewriter
Aug 14, 2024
3b722e4
spotless
Aug 14, 2024
bdba14c
refactor ParquetRewriterTest
Aug 15, 2024
a89eba6
add tests into ParquetRewriterTest
Aug 16, 2024
57432ee
Merge remote-tracking branch 'origin/master' into add-parquet-joiner-v2
Aug 26, 2024
f674bcf
extend tests in ParquetRewriterTest for joiner part
Aug 26, 2024
8514f39
add testMergeFilesToJoinWithDifferentRowCount test into ParquetRewrit…
Aug 27, 2024
0aaf963
Merge remote-tracking branch 'origin/master' into add-parquet-joiner-v2
Aug 29, 2024
4340c42
add testOneInputFileManyInputFilesToJoin with and without JoinColumns…
Aug 29, 2024
e475648
Merge remote-tracking branch 'origin/master' into add-parquet-joiner-v2
Aug 31, 2024
bb42979
add encrypt validation into ParquetRewriterTest's testOneInputFileMan…
Aug 31, 2024
5b97a4c
refactor ParquetRewriter slightly to address PR comments
Sep 8, 2024
27ba73b
add javadoc to ParquetRewriter
Sep 9, 2024
07f1e74
add javadoc to ParquetRewriter
Sep 10, 2024
e96c022
fix javadoc in ParquetRewriter to comply with Maven javadoc plugin
Sep 13, 2024
d1c1d76
fix javadoc in ParquetRewriter to comply with Maven javadoc plugin
Sep 13, 2024
9de20d7
fix javadoc in ParquetRewriter to comply with Maven javadoc plugin
Sep 13, 2024