-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set default RMM pool to ASYNC for cuda 11.2+ #4606
Conversation
Signed-off-by: Rong Ou <rong.ou@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we really ready for this? It would be good to have benchmark results showing this is not going to be a regression for some use-cases. It seems like we still have unresolved performance issues that appear to be tied to this allocator, e.g.:#4536.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala
Outdated
Show resolved
Hide resolved
Signed-off-by: Rong Ou <rong.ou@gmail.com>
We do have benchmarks on TPCDS at various scale factors. The idea is turn this on early in 22.04 and get it more widely tested and benchmarked. If we do run into some block we can always revert it back before the release is cut. |
Do we have those metrics isolated to just this change? There are quite a few performance improvements going in recently, and I want to make sure we don't end up masking a regression from the async allocator with a performance improvement from elsewhere. I would really like to see what this allocator costs for queries that don't spill, and last I knew we only had metrics at scale for UCX shuffle which of course can spill quite a lot due to its aggressive caching of shuffle outputs. The only metrics I've seen for a query that doesn't spill is from #4536, and I find it odd that we want to check this in without first understanding the impact footprint for a 500X slowdown. |
To be clear most of the tests didn't show that slowness. I would need to do some more analysis to see if there was any real difference that I saw beyond this one corner case. |
This is the result for TPCDS at 1TB. I'll dig more into #4536. |
#4536 turns out to be a red herring. |
build |
Fixes #4515
Signed-off-by: Rong Ou rong.ou@gmail.com