Set default RMM pool to ASYNC for cuda 11.2+ #4606

rongou · 2022-01-21T23:10:38Z

Signed-off-by: Rong Ou <rong.ou@gmail.com>

jlowe

Are we really ready for this? It would be good to have benchmark results showing this is not going to be a regression for some use cases. It seems like we still have unresolved performance issues that appear to be tied to this allocator, e.g., #4536.

sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala

rongou · 2022-01-22T01:32:11Z

We do have benchmarks on TPCDS at various scale factors. The idea is to turn this on early in 22.04 and get it more widely tested and benchmarked. If we do run into a blocker, we can always revert it before the release is cut.
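For illustration only, the version gating named in the PR title might be sketched as below. This is not the code from `RapidsConf.scala`: the object and method names are hypothetical, and falling back to `ARENA` on older CUDA runtimes is an assumption. An explicit user setting (believed to be `spark.rapids.memory.gpu.pool`) is taken to override the default, which is what would make a revert cheap.

```scala
// Hypothetical sketch; names are illustrative and not taken from the plugin.
object PoolDefault {
  // True when the CUDA runtime version is 11.2 or newer, as in the PR title.
  def supportsAsync(major: Int, minor: Int): Boolean =
    major > 11 || (major == 11 && minor >= 2)

  // An explicit setting always wins. With no setting, pick ASYNC when the
  // runtime supports it. Assumption: ARENA is the fallback otherwise.
  def resolvePool(requested: Option[String], major: Int, minor: Int): String =
    requested.map(_.toUpperCase) match {
      case Some(explicit) => explicit
      case None => if (supportsAsync(major, minor)) "ASYNC" else "ARENA"
    }
}
```

Under these assumptions, `PoolDefault.resolvePool(Some("ARENA"), 11, 4)` keeps the arena allocator even on a runtime where ASYNC would otherwise be chosen.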

jlowe · 2022-01-24T14:52:01Z

> We do have benchmarks on TPCDS at various scale factors.

Do we have those metrics isolated to just this change? There are quite a few performance improvements going in recently, and I want to make sure we don't end up masking a regression from the async allocator with a performance improvement from elsewhere.

I would really like to see what this allocator costs for queries that don't spill, and last I knew we only had metrics at scale for UCX shuffle which of course can spill quite a lot due to its aggressive caching of shuffle outputs. The only metrics I've seen for a query that doesn't spill is from #4536, and I find it odd that we want to check this in without first understanding the impact footprint for a 500X slowdown.
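One way to picture the isolation being asked for here: run the same queries twice with only the allocator setting changed, then compare per-query times so that a gain elsewhere cannot hide a slowdown. The helper below is a minimal sketch under that assumption. The names and the 1.05 threshold are hypothetical, and no timing data from this PR is implied.

```scala
// Hypothetical helper; not part of the plugin or its benchmark tooling.
object AllocatorAb {
  // Slowdown factor (candidate / baseline) for each query present in both runs.
  // A value above 1.0 means the candidate run was slower.
  def slowdowns(baselineMs: Map[String, Double],
                candidateMs: Map[String, Double]): Map[String, Double] =
    baselineMs.collect {
      case (query, base) if base > 0 && candidateMs.contains(query) =>
        query -> (candidateMs(query) / base)
    }

  // Queries whose slowdown exceeds the threshold, worst first.
  // The default threshold of 1.05 is an arbitrary illustrative choice.
  def regressions(baselineMs: Map[String, Double],
                  candidateMs: Map[String, Double],
                  threshold: Double = 1.05): Seq[(String, Double)] =
    slowdowns(baselineMs, candidateMs).filter(_._2 > threshold).toSeq.sortBy(-_._2)
}
```

Reporting per-query factors instead of one total is the point of the design: a single outlier such as the one in #4536 stays visible even when the total run time improves.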

revans2 · 2022-01-24T15:21:52Z

> I would really like to see what this allocator costs for queries that don't spill, and last I knew we only had metrics at scale for UCX shuffle which of course can spill quite a lot due to its aggressive caching of shuffle outputs. The only metrics I've seen for a query that doesn't spill is from #4536, and I find it odd that we want to check this in without first understanding the impact footprint for a 500X slowdown.

To be clear, most of the tests didn't show that slowness. I would need to do more analysis to see whether there was any real difference beyond this one corner case.

rongou · 2022-01-24T17:55:20Z

This is the result for TPCDS at 1TB. I'll dig more into #4536.

[Image: TPCDS 1TB benchmark results]

rongou · 2022-01-24T19:16:02Z

#4536 turns out to be a red herring.

rongou · 2022-01-27T17:36:43Z

build

Set default RMM pool to ASYNC for cuda 11.2+

a3e7d1a

Signed-off-by: Rong Ou <rong.ou@gmail.com>

rongou added the documentation, performance, shuffle, and ease of use labels Jan 21, 2022

rongou added this to the Jan 10 - Jan 28 milestone Jan 21, 2022

rongou requested review from jlowe, abellina and revans2 January 21, 2022 23:10

rongou self-assigned this Jan 21, 2022

jlowe reviewed Jan 21, 2022

sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala (outdated)

set ASYNC as default

bfc96fe

Signed-off-by: Rong Ou <rong.ou@gmail.com>

sameerz mentioned this pull request Jan 25, 2022

[FEA] Benchmark cuDF branch that uses global memory for regular expressions #4621

Closed

jlowe approved these changes Jan 26, 2022

abellina approved these changes Jan 27, 2022

rongou merged commit cbb9b14 into NVIDIA:branch-22.04 Jan 27, 2022
