Change RMM_ALLOC_FRACTION to represent percentage of available memory, rather than total memory, for initial allocation #2429
Conversation
CONTRIBUTING.md (outdated suggestion):

> syntax can be used to pass Java system properties to the Maven Surefire plugin.
>
> ```shell script
> mvn verify -DargLine="-Dspark.rapids.memory.gpu.allocFraction=0.7"
> ```
I understand why you want this, but it scares me. Spark does not support this and if someone sets a config through a system property in production I'm not sure how we can debug it. Spark will not see it so it will not be logged/displayed in the typical places. We already have something that does this just for the tests but it is not well documented.
spark-rapids/tests/src/test/scala/com/nvidia/spark/rapids/SparkQueryCompareTestSuite.scala, lines 88 to 94 in 247b758:

```scala
val commandLineVariables = System.getenv("SPARK_CONF")
if (commandLineVariables != null) {
  commandLineVariables.split(",").foreach { s =>
    val a = s.split("=")
    builder.config(a(0), a(1))
  }
}
```
Could you use that instead, and document it a little better?
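For context, based on the parsing in the snippet above (comma-separated `key=value` pairs read from a `SPARK_CONF` environment variable), an invocation could look like the following; the config value shown is illustrative:

```shell
SPARK_CONF="spark.rapids.memory.gpu.allocFraction=0.7" mvn verify
```

Environment variables are inherited by the forked test JVMs, so `System.getenv` in the test suite sees the value.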
The side-channel hacks to adjust the Spark configs are interesting but IMO don't get to the crux of the problem: the defaults do not work for this kind of setup. That's a bad initial experience for someone just trying things out on their desktop with their display GPU. Should we make at least the initial pool allocation fraction be based on free memory rather than total memory? Or is there an alternative where the system defaults still perform well on our expected setup but don't crash outright on this setup?
We initially had it on free memory, but it makes it impossible to share a GPU among tasks in a production setting, and can mask a GPU accidentally being shared. I am fine with us having different defaults for the unit/integration tests that look at the free GPU memory and try to configure things accordingly, so long as we also fail if there is a bad setup.
Not following. All I'm talking about is the initial RMM pool size, not the maximum pool size or anything that has to do with tasks. The RMM pool will try to grow from the initial size as it does today if the application attempts to allocate beyond that initial size.
Yes, that is definitely the case, as we are trying to share in this setup. That's sort of my point -- should we make it easier to share by default? Maybe the answer is no. If that's the case, maybe we should emit a more useful error message when the initial pool size cannot be allocated, to point users to the possibility that the GPU is being used by another process concurrently. Bonus points for emitting details about the GPU that is being used, the amount of free memory on it, the amount trying to be allocated, etc.
If it is just the initial pool size then I would be fine with a change like that. We would need a lot of good documentation explaining it.
Force-pushed from 04b031c to 07ef0d7 with commit:

RMM_ALLOC_FRACTION now specifies percentage of *available* memory rather than percentage of *total* memory to use for initial allocation

Signed-off-by: Andy Grove <andygrove73@gmail.com>
I've updated this and it works well for my desktop. I wasn't sure whether to use the term …
build
```diff
@@ -168,7 +168,7 @@ object GpuDeviceManager extends Logging {
   // Align workaround for https://github.com/rapidsai/rmm/issues/527
   def truncateToAlignment(x: Long): Long = x & ~511L

-  var initialAllocation = truncateToAlignment((conf.rmmAllocFraction * info.total).toLong)
+  var initialAllocation = truncateToAlignment((conf.rmmAllocFraction * info.free).toLong)
   if (initialAllocation > info.free) {
```
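To make the effect of the one-line change concrete, here is a hypothetical worked example; the 16 GiB GPU, the 4 GiB consumed by a display, and the 0.9 fraction are illustrative numbers, not values taken from the PR:

```scala
// Illustrative only: a 16 GiB GPU where ~4 GiB is already used by the display.
val total = 16L * 1024 * 1024 * 1024 // as reported by info.total
val free  = 12L * 1024 * 1024 * 1024 // as reported by info.free
val allocFraction = 0.9

val oldInitial = (allocFraction * total).toLong // ~14.4 GiB > free, so RMM init would fail
val newInitial = (allocFraction * free).toLong  // ~10.8 GiB, fits within free memory
```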
This check is useless now, but we probably want a check for a minimum amount of free memory, with a warning, even before the initial allocation.
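A minimal sketch of what such a check might look like, assuming a hypothetical `rmmMinAllocFraction` conf accessor backed by the `spark.rapids.memory.gpu.minAllocFraction` setting that appears later in this PR; the message wording is illustrative, not the PR's actual text:

```scala
// Hypothetical sketch; the actual check and wording in the PR may differ.
val minAllocation = truncateToAlignment((conf.rmmMinAllocFraction * info.total).toLong)
if (initialAllocation < minAllocation) {
  logWarning(s"Initial RMM allocation of $initialAllocation bytes is below the " +
    s"minimum of $minAllocation bytes. Only ${info.free} of ${info.total} bytes " +
    s"are free; the GPU may be in use by another process.")
}
```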
(A second review thread was opened on the same `initialAllocation` diff shown above.)
So what if free is tiny? Do we want a minimum? Sometimes I can see this having caught bad setups, meaning you tried to start 2 executors on the same GPU when you shouldn't have, so now this will hide that kind of failure. I guess you will probably start seeing failures anyway, but I'm wondering how hard they will be to debug.
Oops, I now see Bobby made a similar comment about a minimum.
I had missed Bobby's previous comment somehow. I am working on addressing this now. I also noticed that some of the existing error messages need a little rework as well.
Is this what we want? Perhaps I misunderstand this PR. Say I set allocFraction to 50% and launch N executors using a 32GB GPU: the first executor would take 16GB, the second 8GB of the remaining 16GB, the third 4GB, and so on, so each executor ends up with a different initial pool size depending on launch order. Would that be the intended behavior?
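A small simulation of the scenario described above, with purely illustrative numbers: each executor takes 50% of whatever is free when it starts.

```scala
// Hypothetical: N executors each allocating 50% of the currently free
// memory on a 32 GiB GPU, in launch order.
val allocFraction = 0.5
var freeGiB = 32.0
(1 to 4).foreach { executor =>
  val allocGiB = freeGiB * allocFraction
  println(f"executor $executor: initial pool $allocGiB%.1f GiB ($freeGiB%.1f GiB were free)")
  freeGiB -= allocGiB
}
// executor 1: 16.0 GiB, executor 2: 8.0 GiB, executor 3: 4.0 GiB, executor 4: 2.0 GiB
```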
That's a good point @abellina ... I will rethink the approach and look at having the initial allocation for the tests be set based on free memory instead.
@abellina @andygrove the reason we are doing this is that what you described does not happen in practice. Yes, it happens for our tests, but that is the only place. We are not recommending that people share a GPU between different executors, and we have never recommended that. In fact Spark, Kubernetes, and YARN do not support doing it. We had to hack things up to make it work for our tests.
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Maybe there could be a config that sets what we base the fraction on: free or total. Use free as the default, especially for production setups, but use total if you know what you are doing (the test hack)? But I agree that basing it off free by default is going to make for a better experience.
Why add a config that no one is going to set? If we hit a situation where we need a config we can look into it. But for now I would just have a hard-coded minimum amount of free memory and output a warning if the initial allocation would be too low. That is not going to save us in all cases (because there will be races) but it should help.
Never mind. I missed that, since this is just the initial setting, the pool will grow as necessary. This doesn't preclude the application I had in mind (testing shuffle locally), as I originally thought. Thanks @revans2 @andygrove.
This is ready for another review. I added the minimum check. I could not figure out a good way to test it, though.
build
The integration tests failed, probably because they intentionally share the GPU to run multiple tests in parallel. The integration tests probably need to be updated to set the new min alloc config.
Signed-off-by: Andy Grove <andygrove@nvidia.com>
build
Signed-off-by: Andy Grove <andygrove@nvidia.com>
build
```diff
@@ -83,6 +83,7 @@ def with_gpu_session(func, conf={}):
     """
     copy = dict(conf)
     copy['spark.rapids.sql.enabled'] = 'true'
+    copy['spark.rapids.memory.gpu.minAllocFraction'] = '0'
```
Nit: could we add a comment here (and in the other place where this is set for the tests) so we remember why this is needed?
Done
Signed-off-by: Andy Grove <andygrove@nvidia.com>
build
Thanks @andygrove
build
Change RMM_ALLOC_FRACTION to represent percentage of available memory, rather than total memory, for initial allocation (NVIDIA#2429)

* RMM_ALLOC_FRACTION now specifies percentage of *available* memory rather than percentage of *total* memory to use for initial allocation (Signed-off-by: Andy Grove <andygrove73@gmail.com>)
* Add check for minimum alloc amount and improve error messages (Signed-off-by: Andy Grove <andygrove@nvidia.com>)
* specify minAllocFraction in integration tests (Signed-off-by: Andy Grove <andygrove@nvidia.com>)
* Specify minAllocFraction=0 when testing in parallel (Signed-off-by: Andy Grove <andygrove@nvidia.com>)
* Add comments in tests where minAllocFraction is set (Signed-off-by: Andy Grove <andygrove@nvidia.com>)
Signed-off-by: Andy Grove <andygrove73@gmail.com>

When running on a desktop where I am using the GPU to drive the display, I need to reduce the value for the `spark.rapids.memory.gpu.allocFraction` setting. This PR changes the meaning of `RMM_ALLOC_FRACTION` to be based on the percentage of available rather than total GPU memory. Note that this is just used for the initial allocation and does not restrict total memory usage.

Rather than modifying the source code and remembering to undo the change before committing, I would prefer to be able to override this setting from the command line when running Maven. This PR provides a generic way to override RapidsConf settings using Java system properties. For example, when running tests with Maven, I can now run `mvn clean verify -DargLine="-Dspark.rapids.memory.gpu.allocFraction=0.7"`. When running tests from an IDE, I can specify `-Dspark.rapids.memory.gpu.allocFraction=0.7`.
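The description mentions a generic way to override RapidsConf settings via Java system properties (an approach the reviewers pushed back on earlier in this conversation). A minimal sketch of how such an override could work; `withSystemPropertyOverrides` is a hypothetical helper, not code from the PR:

```scala
import scala.collection.JavaConverters._

// Hypothetical helper: overlay any "spark.rapids." Java system properties
// (e.g. set via -DargLine="-Dspark.rapids.memory.gpu.allocFraction=0.7")
// on top of an existing config map.
def withSystemPropertyOverrides(conf: Map[String, String]): Map[String, String] = {
  val overrides = System.getProperties.asScala.collect {
    case (key, value) if key.startsWith("spark.rapids.") => (key, value)
  }
  conf ++ overrides
}
```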