[SPARK-49673][CONNECT] Increase CONNECT_GRPC_ARROW_MAX_BATCH_SIZE to 0.7 * CONNECT_GRPC_MAX_MESSAGE_SIZE #48122
Conversation
...connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
Outdated
Any existing tests that use the new limits to ensure that everything still works?
I've added an E2E test that explicitly tests multiple batches and changed the static limit to 10MiB there to not create too much memory pressure.
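The multi-batch behavior that test exercises can be illustrated with a small sketch. This is not the suite's or server's actual code (the object and helper names are hypothetical): rows are packed greedily into batches whose accumulated size stays under `maxBatchSize`, so a result larger than one batch spans several `ArrowBatch` messages.

```scala
// Hypothetical sketch, not Spark's actual implementation: greedy packing of
// row sizes into batches bounded by maxBatchSize. A result whose total size
// exceeds maxBatchSize ends up spanning multiple batches, which is what the
// E2E test (with the limit lowered to 10 MiB) verifies end to end.
object BatchPackingSketch {
  def packIntoBatches(rowSizes: Seq[Long], maxBatchSize: Long): Seq[Seq[Long]] =
    rowSizes.foldLeft(Vector(Vector.empty[Long])) { (batches, size) =>
      val current = batches.last
      if (current.nonEmpty && current.sum + size > maxBatchSize)
        batches :+ Vector(size)            // current batch is full: start a new one
      else
        batches.init :+ (current :+ size)  // still fits: extend the current batch
    }

  def main(args: Array[String]): Unit = {
    // Three 4 MiB rows under a 10 MiB limit split into two batches.
    val mib = 1024L * 1024L
    println(packIntoBatches(Seq(4 * mib, 4 * mib, 4 * mib), 10 * mib).map(_.size))
    // prints Vector(2, 1)
  }
}
```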
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
Merged to master.
…0.7 * CONNECT_GRPC_MAX_MESSAGE_SIZE

### What changes were proposed in this pull request?
Increases the default `maxBatchSize` from 0.7 * 4 MiB to 0.7 * 128 MiB (= CONNECT_GRPC_MAX_MESSAGE_SIZE), making better use of the allowed maximum message size. This limit is used when creating Arrow batches for the `SqlCommandResult` in the `SparkConnectPlanner` and for `ExecutePlanResponse.ArrowBatch` in `processAsArrowBatches`. This, for example, lets us return much larger `LocalRelations` in the `SqlCommandResult` (e.g., for the `SHOW PARTITIONS` command) while still staying within the gRPC message size limit.

### Why are the changes needed?
There are `SqlCommandResults` that exceed 0.7 * 4 MiB.

### Does this PR introduce _any_ user-facing change?
`SqlCommandResults` up to 0.7 * 128 MiB are now supported instead of only up to 0.7 * 4 MiB, and `ExecutePlanResponse`s now make better use of the 128 MiB limit.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48122 from dillitz/increase-sql-command-batch-size.

Authored-by: Robert Dillitz <robert.dillitz@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
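The arithmetic behind the new default can be sketched as follows. This is a minimal illustration, assuming `CONNECT_GRPC_MAX_MESSAGE_SIZE` keeps its 128 MiB default; the object and value names are illustrative, not Spark's actual identifiers.

```scala
// Minimal sketch of the new default batch size, assuming the 128 MiB default
// for CONNECT_GRPC_MAX_MESSAGE_SIZE. The 0.7 factor leaves headroom for the
// rest of the gRPC message (protobuf framing, other ExecutePlanResponse fields).
object MaxBatchSizeSketch {
  val MaxMessageSize: Long = 128L * 1024 * 1024            // 128 MiB
  val MaxBatchSize: Long  = (MaxMessageSize * 0.7).toLong  // ~89.6 MiB

  def main(args: Array[String]): Unit = {
    // Old default was 0.7 * 4 MiB ≈ 2.8 MiB; the new limit is 32x larger.
    println(MaxBatchSize)
  }
}
```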