
Any Performance Results ? #197

Open
luweizheng opened this issue Sep 18, 2024 · 1 comment
luweizheng commented Sep 18, 2024

I've checked the NDS-H benchmark in this repository, and it's quite similar to TPC-H. I tested Spark RAPIDS with TPC-H at SF100 on my server with 8 NVLink-connected NVIDIA A100 GPUs and found that the run with 8 GPU instances is not as fast as using CPUs. I also applied optimizations such as setting spark.sql.files.maxPartitionBytes=2gb and spark.sql.adaptive.enabled=true.

I am using both the Pandas API on Spark and Spark SQL. Spark SQL is faster, but some queries are still not as fast as running on the same server with CPUs only (no GPU).

Is this result expected?

Or does Spark RAPIDS only speed up certain data and query patterns, such as some of the NDS (TPC-DS) queries?

@gerashegalov gerashegalov added the "? - Needs Triage" (Need team to review and classify) label Nov 26, 2024
@mattahrens mattahrens self-assigned this Nov 26, 2024
@mattahrens mattahrens removed the "? - Needs Triage" (Need team to review and classify) label Nov 26, 2024
mattahrens (Collaborator) commented

Can you share the full set of Spark configuration settings that you used for your run? We have benchmarked NDS-H internally and all queries run faster on GPU, though we normally benchmark at a larger scale factor such as SF3000.

Here is a set of configs that we have used in our benchmarks:

                   "--conf" "spark.sql.adaptive.enabled=true"
                   "--conf" "spark.sql.files.maxPartitionBytes=2gb"
                   "--conf" "spark.driver.maxResultSize=2GB"
                   "--conf" "spark.driver.memory=50G"
                   "--conf" "spark.executor.cores=16"
                   "--conf" "spark.executor.memory=16G"
                   "--conf" "spark.executor.resource.gpu.amount=1"
                   "--conf" "spark.task.resource.gpu.amount=0.0625"
                   "--conf" "spark.rapids.memory.host.spillStorageSize=32G"
                   "--conf" "spark.rapids.memory.pinnedPool.size=8g"
                   "--conf" "spark.rapids.sql.concurrentGpuTasks=4"

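For context, configs like these are normally passed to spark-submit. A minimal sketch of how such a run might be launched is below; the master URL, jar path, and application script name are illustrative assumptions, not details from this thread (the plugin class com.nvidia.spark.SQLPlugin is the standard entry point for the RAPIDS Accelerator):

```shell
# Hypothetical spark-submit invocation using the configs above.
# Placeholder values: master URL, plugin jar path, and the benchmark script.
# Note: spark.task.resource.gpu.amount=0.0625 is 1/16, matching
# spark.executor.cores=16 so every core can schedule a task on the one GPU,
# while spark.rapids.sql.concurrentGpuTasks=4 caps how many of those tasks
# run GPU kernels concurrently.
spark-submit \
  --master yarn \
  --jars /opt/sparkRapidsPlugin/rapids-4-spark.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.sql.adaptive.enabled=true \
  --conf spark.sql.files.maxPartitionBytes=2gb \
  --conf spark.driver.maxResultSize=2GB \
  --conf spark.driver.memory=50G \
  --conf spark.executor.cores=16 \
  --conf spark.executor.memory=16G \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=0.0625 \
  --conf spark.rapids.memory.host.spillStorageSize=32G \
  --conf spark.rapids.memory.pinnedPool.size=8g \
  --conf spark.rapids.sql.concurrentGpuTasks=4 \
  your_benchmark_script.py
```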