-
Notifications
You must be signed in to change notification settings - Fork 442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VL] Make bloom_filter_agg fall back when might_contain is not transformable #3917
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/oap-project/gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI |
@JkSelf , @PHILO-HE , @jinchengchenghh , coudl you help to review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work! Just some suggestions.
@@ -714,6 +714,13 @@ case class AddTransformHintRule() extends Rule[SparkPlan] { | |||
s"${e.getMessage}, original sparkplan is " + | |||
s"${plan.getClass}(${plan.children.toList.map(_.getClass)})") | |||
} | |||
|
|||
if (TransformHints.isAlreadyTagged(plan) && TransformHints.isNotTransformable(plan)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, we can move this part into a dedicated rule after AddTransformHintRule
(see link).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, moved to a separate rule.
@@ -130,6 +130,33 @@ class Spark34Shims extends SparkShims { | |||
@transient locations: Array[String] = Array.empty): PartitionedFile = | |||
PartitionedFile(partitionValues, SparkPath.fromPathString(filePath), start, length, locations) | |||
|
|||
override def handleBloomFilterFallback(plan: SparkPlan)(fun: SparkPlan => Unit): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method looks repeated. If so, maybe, we can just move this method into a common place. But just move the check for BloomFilterAggregate
into shims, as I note it doesn't exist in spark3.2.
Spark 3.3/3.4:
def hasBloomFilterAggregate(agg): Boolean = {
agg.aggregateExpressions.exists( expr.aggregateFunction.isInstanceOf[BloomFilterAggregate])
}
Spark 3.2:
def hasBloomFilterAggregate(agg): Boolean = {
false
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks,
Also note this doc: https://github.com/oap-project/gluten/blob/main/docs/velox-backend-limitations.md#runtime-bloomfilter. With this fix, we need to document the behavior into fallback section. Thanks! |
Run Gluten Clickhouse CI |
Thanks, updated the doc. |
Run Gluten Clickhouse CI |
* need fall back related bloom filter agg. | ||
*/ | ||
case class FallbackBloomFilterAggIfNeeded() extends Rule[SparkPlan] { | ||
override def apply(plan: SparkPlan): SparkPlan = plan.transformDown { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Maybe, we can skip the handling and just return the original plan if spark.gluten.sql.native.bloomFilter=false
. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, thanks.
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @zhli1142015 @PHILO-HE .
…t transformable (apache#3917)" This reverts commit d7d8e28.
…t transformable (apache#3917)" This reverts commit d7d8e28.
What changes were proposed in this pull request?
Fallback bloom_filter_agg when might_contain fall back, either because of might_contain self or other expressions in same operator. This fixes below error:
How was this patch tested?
UT.