[GLUTEN-3855][VL] Fix ORC related failed UT #3805

Merged on Nov 27, 2023 (3 commits)

@chenxu14 (Contributor) commented Nov 22, 2023

What changes were proposed in this pull request?

Fix the failing ORC-related unit tests (UTs). The two tests below are not fixed yet and remain excluded; they will be fixed in follow-up commits.

    .exclude("SPARK-37965: Spark support read/write orc file with invalid char in field name")
    .exclude("SPARK-38173: Quoted column cannot be recognized correctly when quotedRegexColumnNames is true")
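
The chained `.exclude(...)` calls above can be illustrated with a minimal, self-contained sketch. `SuiteSettings`, `shouldRun`, and `HypotheticalOrcSuite` are hypothetical names introduced only for illustration; this is not Gluten's actual test-settings API, just one way such a chainable exclusion list might behave.

```scala
// Hypothetical stand-in for a per-suite settings object; NOT Gluten's real API.
final case class SuiteSettings(suite: String, excluded: Vector[String] = Vector.empty) {
  // Returns a new settings value so that calls can be chained.
  def exclude(testName: String): SuiteSettings = copy(excluded = excluded :+ testName)

  // A test runs unless its exact name was excluded.
  def shouldRun(testName: String): Boolean = !excluded.contains(testName)
}

object ExcludeSketch {
  // The two test names are copied from the PR description; the suite name is made up.
  val orcSettings: SuiteSettings =
    SuiteSettings("HypotheticalOrcSuite")
      .exclude("SPARK-37965: Spark support read/write orc file with invalid char in field name")
      .exclude("SPARK-38173: Quoted column cannot be recognized correctly when quotedRegexColumnNames is true")

  def main(args: Array[String]): Unit = {
    // An excluded test is skipped, any other test still runs.
    println(orcSettings.shouldRun("SPARK-37965: Spark support read/write orc file with invalid char in field name"))
    println(orcSettings.shouldRun("some other ORC test"))
  }
}
```

In this sketch exclusion is by exact test name, so a renamed upstream test would silently start running again.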

How was this patch tested?

Passes GitHub Actions (GHA) CI.

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename the commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}
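
As a rough illustration of the title format requested above, the following sketch checks a title with a regular expression. The pattern is an assumption: it takes the issue ID to be numeric and the component tag to be alphanumeric, and it does not enforce the `feat/fix:` prefix. `TitleCheck` and `parse` are hypothetical names, not part of any Gluten tooling.

```scala
object TitleCheck {
  // Assumed shape: [GLUTEN-<digits>][<component>] followed by a non-empty message.
  private val TitlePattern = """^\[GLUTEN-(\d+)\]\[([A-Za-z0-9]+)\]\s*(\S.*)$""".r

  // Returns (issueId, component, message) when the whole title matches, else None.
  def parse(title: String): Option[(String, String, String)] = title match {
    case TitlePattern(id, component, message) => Some((id, component, message))
    case _                                    => None
  }

  def main(args: Array[String]): Unit = {
    println(parse("[GLUTEN-3855][VL] Fix ORC related failed UT"))
    println(parse("[VL] Fix ORC related failed UT"))
  }
}
```

Under these assumptions a title without the `[GLUTEN-...]` issue tag is rejected, while one that carries both tags is accepted.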

See also:

Run Gluten Clickhouse CI

@zhouyuan (Contributor)

@JkSelf All ORC unit tests are passing, so it should be safe to include oap-project/velox#417 now.

@rui-mo (Contributor) commented Nov 23, 2023

@chenxu14 The tests listed under the comment "// ReaderFactory is not registered for format orc." are all failing ORC tests. Can we include all of them in this PR? For example:

[image: tempsnip]

@chenxu14 (Contributor, Author)

I see

@chenxu14 (Contributor, Author)

@rui-mo Can you help me determine whether the failed UT is related to ORC?

@rui-mo (Contributor) left a comment

@chenxu14 Thanks for trying to enable more tests. It seems there is only one failed test on Spark 3.3, but several failures on Spark 3.4. There may be some new issues on Spark 3.4 because its framework was introduced only weeks ago. Could you first disable the failing tests, especially those on Spark 3.4?

cc @JkSelf

    .exclude("SPARK-15474 Write and read back non-empty schema with empty dataframe - orc")
    .exclude("SPARK-23271 empty RDD when saved should write a metadata only file - orc")
    .exclude("SPARK-22146 read files containing special characters using orc")
    .exclude("SPARK-30362: test input metrics for DSV2")

@rui-mo (Contributor) commented Nov 24, 2023

Maybe exclude "SPARK-30362: test input metrics for DSV2" with the message below.

// Unknown. Need to investigate.

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
@zhouyuan changed the title from "[VL] Fix ORC related failed UT" to "[GLUTEN-3855][VL] Fix ORC related failed UT" on Nov 27, 2023

#3855

@JkSelf merged commit b707ba6 into apache:main on Nov 27, 2023
17 checks passed

@zhouyuan (Contributor)

@chenxu14 thank you for fixing the ORC tests!

@GlutenPerfBot (Contributor)

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_3805_time.csv | log/native_master_11_27_2023_08cdbee7e_time.csv | difference | percentage |
| --- | --- | --- | --- | --- |
| q1 | 36.08 | 33.96 | -2.118 | 94.13% |
| q2 | 25.09 | 24.50 | -0.590 | 97.65% |
| q3 | 38.55 | 36.70 | -1.847 | 95.21% |
| q4 | 37.72 | 36.44 | -1.275 | 96.62% |
| q5 | 71.74 | 70.50 | -1.240 | 98.27% |
| q6 | 7.26 | 7.29 | 0.034 | 100.47% |
| q7 | 84.49 | 83.48 | -1.011 | 98.80% |
| q8 | 88.24 | 88.10 | -0.138 | 99.84% |
| q9 | 127.04 | 122.04 | -5.002 | 96.06% |
| q10 | 45.68 | 46.95 | 1.266 | 102.77% |
| q11 | 21.40 | 19.82 | -1.584 | 92.60% |
| q12 | 26.49 | 26.08 | -0.414 | 98.44% |
| q13 | 46.68 | 48.07 | 1.384 | 102.96% |
| q14 | 16.27 | 15.26 | -1.006 | 93.82% |
| q15 | 29.38 | 28.91 | -0.469 | 98.40% |
| q16 | 15.74 | 15.89 | 0.155 | 100.98% |
| q17 | 103.11 | 103.54 | 0.428 | 100.42% |
| q18 | 150.30 | 151.52 | 1.214 | 100.81% |
| q19 | 12.93 | 12.96 | 0.027 | 100.21% |
| q20 | 29.39 | 28.70 | -0.687 | 97.66% |
| q21 | 227.16 | 226.06 | -1.095 | 99.52% |
| q22 | 13.36 | 13.19 | -0.166 | 98.75% |
| total | 1254.10 | 1239.96 | -14.133 | 98.87% |
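
The last two columns of the report appear consistent with difference = master time minus PR time, and percentage = master time divided by PR time. This relationship is inferred from the reported numbers, not from the bot's source, so the sketch below is an assumption; `PerfColumns` and its method names are hypothetical. Because the report seems to compute from unrounded timings, recomputing from the rounded figures can differ in the last digit.

```scala
object PerfColumns {
  // Assumed derivation: baseline (master) time minus PR time.
  def difference(prTime: Double, baselineTime: Double): Double =
    baselineTime - prTime

  // Assumed derivation: baseline (master) time as a percentage of PR time.
  def percentage(prTime: Double, baselineTime: Double): Double =
    baselineTime / prTime * 100.0

  def main(args: Array[String]): Unit = {
    // q1 row of the report: PR run first, master run second.
    val (pr, baseline) = (36.08, 33.96)
    println(s"difference = ${difference(pr, baseline)}")
    println(s"percentage = ${percentage(pr, baseline)}")
  }
}
```

Read this way, a percentage below 100% means the master run finished faster than the run for this PR on that query.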
