-
Notifications
You must be signed in to change notification settings - Fork 442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VL] Pass partition id to velox functions #4344
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/oap-project/gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI |
3 similar comments
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
we should use rdd partition id instead of task partition id for this function.
|
883255c
to
276bd90
Compare
Run Gluten Clickhouse CI |
276bd90
to
8852eec
Compare
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
How was this issue produced? This spark_partition_id() function should fall back in this test. Could you clarify a bit? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
This issue is found after i offload spark_partition_id to velox. native spark_partition_id is simple which returns a config as you mentioned, the config here is the partition id we set for each Velox task. Before the partition id we passed to native is task partition id, not rdd partition id. They are different when there are coalesce or union operatiors. I mentioned this here just for a reminder for my self. |
8852eec
to
97b7384
Compare
Run Gluten Clickhouse CI |
97b7384
to
7870d07
Compare
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
2 similar comments
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
86c3d89
to
531ce75
Compare
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just two comments, could you help check and clarify? BTW, could you let us know how get_partition_id()
function is used on your side? Directly used in sql? Thanks! cc @zhouyuan
cpp/velox/jni/VeloxJniWrapper.cc
Outdated
@@ -111,6 +111,8 @@ Java_io_glutenproject_vectorized_PlanEvaluatorJniWrapper_nativeValidateWithFailu | |||
|
|||
// A query context used for function validation. | |||
velox::core::QueryCtx queryCtx; | |||
std::unordered_map<std::string, std::string> configs{{velox::core::QueryConfig::kSparkPartitionId, "0"}}; | |||
queryCtx.testingOverrideConfigUnsafe(std::move(configs)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems testingOverrideConfigUnsafe
should be used in test code only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks.
@@ -111,6 +111,8 @@ Java_io_glutenproject_vectorized_PlanEvaluatorJniWrapper_nativeValidateWithFailu | |||
|
|||
// A query context used for function validation. | |||
velox::core::QueryCtx queryCtx; | |||
std::unordered_map<std::string, std::string> configs{{velox::core::QueryConfig::kSparkPartitionId, "0"}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to pass this config for validator? Seems if lacked, it doesn't have any impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, thanks
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
This function is used for data profiling, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work! cc @rui-mo
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
|
[VL] Pass partition id to velox functions.
[VL] Pass partition id to velox functions.
[VL] Pass partition id to velox functions.
What changes were proposed in this pull request?
Pass partition id to velox functions.
How was this patch tested?
UT.