[SPARK-49673][CONNECT] Increase CONNECT_GRPC_ARROW_MAX_BATCH_SIZE to 0.7 * CONNECT_GRPC_MAX_MESSAGE_SIZE

### What changes were proposed in this pull request?

Increases the default `maxBatchSize` from 4 MiB * 0.7 to 128 MiB (= `CONNECT_GRPC_MAX_MESSAGE_SIZE`) * 0.7. This makes better use of the allowed maximum message size.

This limit is used when creating Arrow batches for the `SqlCommandResult` in the `SparkConnectPlanner` and for `ExecutePlanResponse.ArrowBatch` in `processAsArrowBatches`. For example, it lets us return much larger `LocalRelation`s in the `SqlCommandResult` (e.g., for the `SHOW PARTITIONS` command) while still staying within the gRPC message size limit.

### Why are the changes needed?

Some `SqlCommandResult`s exceed 0.7 * 4 MiB.

### Does this PR introduce _any_ user-facing change?

Spark Connect now supports `SqlCommandResult`s of up to 0.7 * 128 MiB instead of only up to 0.7 * 4 MiB, and `ExecutePlanResponse`s now make better use of the 128 MiB limit.

### How was this patch tested?

Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48122 from dillitz/increase-sql-command-batch-size.

Authored-by: Robert Dillitz <robert.dillitz@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
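To make the size change concrete, here is a small Python sketch, for illustration only. It computes the old and new per-batch byte budgets as 0.7 times the message limit. It then packs rows greedily into batches that stay under a given budget. `max_batch_size`, `pack_rows`, and the 1 MiB row sizes are hypothetical. This is a simplified model, not Spark Connect's actual Arrow batching code.

```python
# Illustrative sketch only: helper names are hypothetical and this is a
# simplified model, not Spark Connect's actual Arrow batching code.
from typing import List

MiB = 1024 * 1024


def max_batch_size(max_message_size: int, fraction: float = 0.7) -> int:
    """Byte budget for one Arrow batch: a fixed fraction of the message limit."""
    return int(max_message_size * fraction)


# Old default: 0.7 * 4 MiB. New default: 0.7 * CONNECT_GRPC_MAX_MESSAGE_SIZE (128 MiB).
OLD_LIMIT = max_batch_size(4 * MiB)    # 2936012 bytes (about 2.8 MiB)
NEW_LIMIT = max_batch_size(128 * MiB)  # 93952409 bytes (about 89.6 MiB)


def pack_rows(row_sizes: List[int], limit: int) -> List[List[int]]:
    """Greedily group consecutive row indices so each batch's total size
    stays <= limit. A single row larger than the limit cannot be split."""
    batches: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for i, size in enumerate(row_sizes):
        if size > limit:
            raise ValueError(f"row {i} ({size} bytes) exceeds the {limit}-byte limit")
        # Close the current batch if adding this row would exceed the budget.
        if current and current_size + size > limit:
            batches.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    rows = [MiB] * 3  # three hypothetical 1 MiB rows, 3 MiB in total
    print(pack_rows(rows, OLD_LIMIT))  # [[0, 1], [2]]
    print(pack_rows(rows, NEW_LIMIT))  # [[0, 1, 2]]
```

In this model, three 1 MiB rows exceed the old 2.8 MiB budget and need two batches. They fit in a single batch under the new 89.6 MiB budget. Real Arrow batches also carry schema and buffer overhead that this sketch ignores.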