[VL] Minor - Fix parquet writer passing wrong param #3790

Merged on Nov 21, 2023 (1 commit)

@marin-ma marin-ma commented Nov 21, 2023

When row group options were specified for a Parquet write, "parquet.block.size" was not respected because the wrong parameter was passed to the Parquet writer. For example:

df.write.format("parquet")
  .mode("overwrite")
  .option("parquet.block.rows", 1048576)     // row group limit, in rows
  .option("parquet.block.size", 1000000000)  // row group limit, in bytes (was not respected)
  .save(outputPath)

Verified locally.
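
For anyone who wants to repeat a local check, here is a minimal sketch. It is not necessarily how this fix was verified. It reads the footer of one written Parquet file and lists the row count and byte size of each row group. `rowGroupSizes` and `partFile` are hypothetical names; the Parquet and Hadoop classes are the standard parquet-hadoop ones and are assumed to be on the classpath (for example, inside spark-shell).

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile
import scala.collection.JavaConverters._

// Hypothetical helper: returns (rowCount, totalByteSize) for each row group
// recorded in the footer of a single Parquet file.
def rowGroupSizes(file: String,
                  conf: Configuration = new Configuration()): Seq[(Long, Long)] = {
  val reader = ParquetFileReader.open(HadoopInputFile.fromPath(new Path(file), conf))
  try {
    reader.getFooter.getBlocks.asScala
      .map(b => (b.getRowCount, b.getTotalByteSize))
      .toList
  } finally {
    reader.close()
  }
}

// Usage (partFile is a placeholder for one part file under outputPath):
// rowGroupSizes(partFile).foreach { case (rows, bytes) =>
//   println(s"rows=$rows bytes=$bytes")
// }
```

The byte sizes come from the file footer, so they only approximate whatever in-memory threshold the writer applies; the point is to see whether changing "parquet.block.size" changes the row group layout at all.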

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename the commit message and pull request title to the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@marin-ma marin-ma requested a review from rui-mo November 21, 2023 03:31

@zhouyuan zhouyuan left a comment

👍

@zhouyuan

CC: @PengleiShi


@rui-mo rui-mo left a comment

What's the issue before this fix? Maybe we can add more details to the PR description.

@zhouyuan

The two failures are unrelated to this change.
#3796

@marin-ma marin-ma merged commit 0d4e43c into apache:main Nov 21, 2023
17 checks passed
@GlutenPerfBot

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

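| query | log/native_3790_time.csv | log/native_master_11_20_2023_60fc2a02e_time.csv | difference | percentage |
|-------|--------------------------|--------------------------------------------------|------------|------------|
| q1 | 34.38 | 34.37 | -0.013 | 99.96% |
| q2 | 24.78 | 24.70 | -0.080 | 99.68% |
| q3 | 37.60 | 35.40 | -2.204 | 94.14% |
| q4 | 35.25 | 36.40 | 1.146 | 103.25% |
| q5 | 68.74 | 68.86 | 0.117 | 100.17% |
| q6 | 6.98 | 7.01 | 0.037 | 100.52% |
| q7 | 82.64 | 85.39 | 2.747 | 103.32% |
| q8 | 87.38 | 87.44 | 0.058 | 100.07% |
| q9 | 122.47 | 124.93 | 2.458 | 102.01% |
| q10 | 46.77 | 46.83 | 0.061 | 100.13% |
| q11 | 19.29 | 19.99 | 0.709 | 103.68% |
| q12 | 26.15 | 24.57 | -1.580 | 93.96% |
| q13 | 46.17 | 44.85 | -1.316 | 97.15% |
| q14 | 14.82 | 18.77 | 3.948 | 126.63% |
| q15 | 27.41 | 27.13 | -0.275 | 99.00% |
| q16 | 15.50 | 15.41 | -0.091 | 99.41% |
| q17 | 101.65 | 102.21 | 0.557 | 100.55% |
| q18 | 149.88 | 149.83 | -0.052 | 99.97% |
| q19 | 13.60 | 13.48 | -0.125 | 99.08% |
| q20 | 27.40 | 28.10 | 0.700 | 102.55% |
| q21 | 221.79 | 220.09 | -1.698 | 99.23% |
| q22 | 13.09 | 12.89 | -0.202 | 98.46% |
| total | 1223.75 | 1228.65 | 4.904 | 100.40% |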
