I've looked at the NDS-H benchmark in this repository, and it's quite similar to TPC-H. I tested Spark RAPIDS with TPC-H at SF100 on my server with 8 NVIDIA A100 NVLink GPUs and found that running with 8 instances is slower than running on CPUs only. I also tried tuning, such as setting spark.sql.files.maxPartitionBytes=2gb and spark.sql.adaptive.enabled=true.
I am using both the Pandas API on Spark and Spark SQL. Spark SQL is the faster of the two, but some queries are still slower than running on the same server with CPUs only, without the GPUs.
Is this result expected?
Or does Spark RAPIDS only speed up certain data and queries, such as some queries of NDS (TPC-DS)?
Can you share the complete set of Spark configuration settings that you used for your run? We have benchmarked NDS-H internally and all queries run faster on GPU, though we normally benchmark at a larger scale factor such as SF3000.
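If it helps, one way to collect the settings for sharing is a small PySpark helper like the sketch below. It assumes a live SparkSession is passed in; `format_spark_conf` and `dump_session_conf` are hypothetical helper names, not part of Spark or the RAPIDS plugin. Note that `getConf().getAll()` lists only explicitly set values, not defaults, and the secret masking here is a simple keyword guess.

```python
def format_spark_conf(pairs):
    """Render (key, value) pairs as sorted 'key=value' lines.

    Values whose key looks like a credential are masked before sharing.
    """
    lines = []
    for key, value in sorted(pairs):
        if any(word in key.lower() for word in ("password", "secret", "token")):
            value = "<redacted>"
        lines.append(f"{key}={value}")
    return "\n".join(lines)


def dump_session_conf(spark):
    """Collect every explicitly set config from a live SparkSession.

    SparkConf.getAll() returns a list of (key, value) tuples.
    """
    return format_spark_conf(spark.sparkContext.getConf().getAll())


if __name__ == "__main__":
    # Stand-in pairs taken from the settings mentioned in the question;
    # with a real session, call dump_session_conf(spark) instead.
    sample = [
        ("spark.sql.files.maxPartitionBytes", "2gb"),
        ("spark.sql.adaptive.enabled", "true"),
    ]
    print(format_spark_conf(sample))
```

Pasting the output of `dump_session_conf(spark)` from both the GPU run and the CPU-only run makes it easier to compare the two setups side by side.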
Here is a set of configs that we have used in our benchmarks: