
Change RMM_ALLOC_FRACTION to represent percentage of available memory, rather than total memory, for initial allocation #2429

Merged: 5 commits merged into NVIDIA:branch-21.06 on Jun 3, 2021

Conversation

@andygrove (Contributor) commented May 17, 2021

Signed-off-by: Andy Grove <andygrove73@gmail.com>

When running on a desktop where I am using the GPU to drive the display, I need to reduce the value of the spark.rapids.memory.gpu.allocFraction setting.

This PR changes the meaning of RMM_ALLOC_FRACTION to be based on the percentage of available rather than total GPU memory. Note that this only affects the initial allocation and does not restrict total memory usage.

Rather than modifying the source code and remembering to undo the change before committing changes, I would prefer to be able to override this setting from the command line when running Maven.

This PR provides a generic way to override RapidsConf settings using Java system properties.

For example, when running tests with Maven, I can now run:

mvn clean verify -DargLine="-Dspark.rapids.memory.gpu.allocFraction=0.7"

When running tests from an IDE, I can specify -Dspark.rapids.memory.gpu.allocFraction=0.7.
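For a concrete sense of the semantic change, here is a hypothetical example (the numbers below are illustrative, not from the PR): on a GPU where the display stack already holds part of the memory, sizing the pool from total memory over-commits, while sizing it from free memory fits.

```scala
object AllocFractionExample extends App {
  // Hypothetical numbers: a 24 GiB GPU where the display stack already uses 4 GiB.
  val totalGiB = 24.0
  val freeGiB = 20.0
  val allocFraction = 0.9

  // Old behavior: fraction of total memory; 21.6 GiB > 20 GiB free, so the
  // initial pool allocation would fail on this desktop.
  println(f"total-based initial pool: ${allocFraction * totalGiB}%.1f GiB")
  // New behavior: fraction of available (free) memory; 18 GiB fits.
  println(f"free-based initial pool:  ${allocFraction * freeGiB}%.1f GiB")
}
```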

@andygrove self-assigned this May 17, 2021
@andygrove added this to the May 10 - May 21 milestone May 17, 2021
@andygrove added the "ease of use" label (Makes the product simpler to use or configure) May 17, 2021
CONTRIBUTING.md (outdated)

syntax can be used to pass Java system properties to the Maven Surefire plugin.

```shell script
mvn verify -DargLine="-Dspark.rapids.memory.gpu.allocFraction=0.7"
```

Collaborator:

I understand why you want this, but it scares me. Spark does not support this and if someone sets a config through a system property in production I'm not sure how we can debug it. Spark will not see it so it will not be logged/displayed in the typical places. We already have something that does this just for the tests but it is not well documented.

val commandLineVariables = System.getenv("SPARK_CONF")
if (commandLineVariables != null) {
  commandLineVariables.split(",").foreach { s =>
    val a = s.split("=")
    builder.config(a(0), a(1))
  }
}

Could you use that instead, and document it a little better?
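For reference, a minimal sketch of how that SPARK_CONF parsing behaves for a sample value (the sample string and the println are illustrative; the real code passes each pair to a SparkSession builder):

```scala
object SparkConfEnvExample extends App {
  // Hypothetical value, as if the tests were launched with:
  //   SPARK_CONF="spark.rapids.memory.gpu.allocFraction=0.7,spark.rapids.sql.enabled=true"
  val commandLineVariables = "spark.rapids.memory.gpu.allocFraction=0.7,spark.rapids.sql.enabled=true"
  commandLineVariables.split(",").foreach { s =>
    val a = s.split("=")
    // The snippet above calls builder.config(a(0), a(1)) at this point.
    println(s"would set ${a(0)} = ${a(1)}")
  }
}
```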

@jlowe (Member) commented May 18, 2021

The side-channel hacks to adjust the Spark configs are interesting but IMO don't get to the crux of the problem: the defaults do not work for this kind of setup. That's a bad initial experience for someone just trying things out on their desktop with their display GPU.

Should we make at least the initial pool allocation fraction be based on free memory rather than total memory? Or is there an alternative where the system defaults still perform well on our expected setup but don't crash outright on this setup?

@revans2 (Collaborator) commented May 18, 2021

> The side-channel hacks to adjust the Spark configs are interesting but IMO don't get to the crux of the problem: the defaults do not work for this kind of setup. That's a bad initial experience for someone just trying things out on their desktop with their display GPU.
>
> Should we make at least the initial pool allocation fraction be based on free memory rather than total memory? Or is there an alternative where the system defaults still perform well on our expected setup but don't crash outright on this setup?

We initially had it based on free memory, but it makes it impossible to share a GPU among tasks in a production setting, and can mask a GPU accidentally being shared.

I am fine with us having different defaults for the unit/integration tests that look at the free GPU memory and try to configure things accordingly, so long as we also fail if there is a bad setup.

@jlowe (Member) commented May 18, 2021

> it makes it impossible to share a GPU among tasks in a production setting

Not following. All I'm talking about is the initial RMM pool size, not the maximum pool size or anything that has to do with tasks. The RMM pool will try to grow from the initial size as it does today if the application attempts to allocate beyond that initial size.

> can mask a GPU accidentally being shared.

Yes, that is definitely the case, as we are trying to share in this setup. That's sort of my point -- should we make it easier to share by default? Maybe the answer is no. If that's the case, maybe we should emit a more useful error message when the initial pool size cannot be allocated, pointing users to the possibility that the GPU is being used by another process concurrently. Bonus points for emitting details about the GPU being used: the amount of free memory on it, the amount we tried to allocate, etc.

@revans2 (Collaborator) commented May 18, 2021

If it is just the initial pool size then I would be fine with a change like that. We would need a lot of good documentation explaining it.

@pxLi changed the base branch from branch-0.6 to branch-21.06 May 19, 2021 01:08
RMM_ALLOC_FRACTION now specifies percentage of *available* memory rather than percentage of *total* memory to use for initial allocation

Signed-off-by: Andy Grove <andygrove73@gmail.com>
@andygrove force-pushed the rapids-conf-override branch from 04b031c to 07ef0d7 May 20, 2021 21:27
@andygrove changed the title from "Allow RapidsConf settings to be overridden by Java system properties" to "Change RMM_ALLOC_FRACTION to represent percentage of available memory, rather than total memory, for initial allocation" May 20, 2021
@andygrove (Contributor, Author) commented:

I've updated this and it works well on my desktop. I wasn't sure whether to use the term "available" or "free" in the documentation, but I went with "available" since that is the term we use when describing the behavior of spark.rapids.python.memory.gpu.allocFraction, and hopefully I'm using the term consistently with that.

@andygrove (Contributor, Author) commented:

build

@@ -168,7 +168,7 @@ object GpuDeviceManager extends Logging {
     // Align workaround for https://github.com/rapidsai/rmm/issues/527
     def truncateToAlignment(x: Long): Long = x & ~511L

-    var initialAllocation = truncateToAlignment((conf.rmmAllocFraction * info.total).toLong)
+    var initialAllocation = truncateToAlignment((conf.rmmAllocFraction * info.free).toLong)
     if (initialAllocation > info.free) {

Collaborator:

This check is useless now, but we probably want to have some checks for a minimum amount of free memory and warn about that, even before the initial allocation.
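A self-contained sketch of the kind of minimum check being suggested; the names, thresholds, and message wording here are illustrative, not the plugin's actual code:

```scala
object MinAllocCheckSketch extends App {
  // Alignment workaround carried over from the diff above.
  def truncateToAlignment(x: Long): Long = x & ~511L

  // Fail fast if the free-memory-based initial pool would fall below a minimum
  // fraction of total memory, hinting that another process may hold the GPU.
  def initialPoolSize(allocFraction: Double, minAllocFraction: Double,
      free: Long, total: Long): Long = {
    val initialAllocation = truncateToAlignment((allocFraction * free).toLong)
    val minAllocation = truncateToAlignment((minAllocFraction * total).toLong)
    require(initialAllocation >= minAllocation,
      s"Initial pool of $initialAllocation bytes (from $free bytes free) is below " +
        s"the minimum of $minAllocation bytes; is another process using the GPU?")
    initialAllocation
  }

  val GiB = 1024L * 1024 * 1024
  // Example: 32 GiB GPU with only 1 GiB free throws IllegalArgumentException.
  println(initialPoolSize(0.9, 0.25, 1 * GiB, 32 * GiB))
}
```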

@@ -168,7 +168,7 @@ object GpuDeviceManager extends Logging {
     // Align workaround for https://github.com/rapidsai/rmm/issues/527
     def truncateToAlignment(x: Long): Long = x & ~511L

-    var initialAllocation = truncateToAlignment((conf.rmmAllocFraction * info.total).toLong)
+    var initialAllocation = truncateToAlignment((conf.rmmAllocFraction * info.free).toLong)

Collaborator:

So what if free is tiny? Do we want a minimum? Sometimes I can see this having caught bad setups, meaning you tried to start 2 executors on the same GPU when you shouldn't have, and now this will hide that kind of failure. I guess you will probably start seeing failures, but I'm wondering how hard they will be to debug.

Collaborator:

Oops, I now see Bobby made a similar comment about a minimum.

Contributor (Author):

I had missed Bobby's previous comment somehow. I am working on addressing this now. I also noticed that some of the existing error messages need a little rework as well.

@andygrove andygrove linked an issue Jun 1, 2021 that may be closed by this pull request
@abellina (Collaborator) commented Jun 2, 2021

Is this what we want? Perhaps I misunderstand this PR.

Say I set allocFraction to 50% and launch N executors using a 32GB GPU:

  • First executor: 16 GB
  • Second executor: 8 GB
  • Third executor: 4 GB
  • Fourth executor: 2 GB
  • and so on.

Would spark.rapids.memory.gpu.reserve not help this case?
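For illustration, the halving sequence described above can be sketched as follows (hypothetical numbers; each executor sizes its initial pool from whatever is still free):

```scala
object HalvingPoolsExample extends App {
  val allocFraction = 0.5
  // Free memory before each launch: 32, 16, 8, 4, ... GiB; each new executor's
  // initial pool is allocFraction of the remaining free memory.
  Iterator.iterate(32.0)(free => free - allocFraction * free)
    .map(free => allocFraction * free)
    .take(4)
    .foreach(pool => println(f"initial pool: $pool%.1f GiB")) // 16.0, 8.0, 4.0, 2.0
}
```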

@andygrove (Contributor, Author) commented:

That's a good point @abellina ... I will rethink the approach and look at having the initial allocation for the tests be set based on free memory instead.

@revans2 (Collaborator) commented Jun 2, 2021

@abellina @andygrove the reason we are doing this is that what you described does not happen in practice. Yes, it happens for our tests, but that is the only place. We are not recommending that people share a GPU between different executors, and we have never recommended that. In fact, Spark, Kubernetes, and YARN do not support doing it. We had to hack things up to make it work for our tests.

Signed-off-by: Andy Grove <andygrove@nvidia.com>
@abellina (Collaborator) commented Jun 2, 2021

Maybe there could be a config that sets what we are basing the fraction on: free or total. Use free as the default, especially for production setups, but use total if you know what you are doing (the test hack)?

But I agree that basing it off free by default is going to make for a better experience.

@revans2 (Collaborator) commented Jun 2, 2021

Why add a config that no one is going to set? If we hit a situation where we need a config, we can look into it. But for now I would just have a hard-coded minimum amount of free memory where we output a warning if the initial allocation would be too low. That is not going to save us in all cases (because there will be races), but it should help.

@abellina (Collaborator) commented Jun 2, 2021

> Why add a config that no one is going to set?

Never mind. I missed that, since this is just the initial setting, the pool will grow as necessary. This doesn't preclude the application I had in mind (to test shuffle locally), as I originally thought. Thanks @revans2 @andygrove.

@andygrove (Contributor, Author) commented:

This is ready for another review. I added the minimum check. I could not figure out a good way to test it though.

abellina previously approved these changes Jun 2, 2021
jlowe previously approved these changes Jun 2, 2021
@jlowe (Member) commented Jun 2, 2021

build

@jlowe (Member) left a comment:

The integration tests failed, probably because they intentionally share the GPU to run multiple tests in parallel. The integration tests probably need to be updated to set the new min alloc config.

Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove dismissed stale reviews from jlowe and abellina via 160e200 June 2, 2021 23:39
@andygrove (Contributor, Author) commented:

build

Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove (Contributor, Author) commented:

build

abellina previously approved these changes Jun 3, 2021
@@ -83,6 +83,7 @@ def with_gpu_session(func, conf={}):
     """
     copy = dict(conf)
     copy['spark.rapids.sql.enabled'] = 'true'
+    copy['spark.rapids.memory.gpu.minAllocFraction'] = '0'

Collaborator:

Nit: could we add a comment here (and in the other place where this is set for the tests) so we remember why this is needed?

Contributor (Author):

Done

jlowe previously approved these changes Jun 3, 2021
Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove dismissed stale reviews from jlowe and abellina via 8e12208 June 3, 2021 14:27
@andygrove (Contributor, Author) commented:

build

@abellina (Collaborator) left a comment:

Thanks @andygrove

@andygrove (Contributor, Author) commented:

build

@andygrove merged commit 1084682 into NVIDIA:branch-21.06 Jun 3, 2021
@andygrove deleted the rapids-conf-override branch June 3, 2021 22:40
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Change RMM_ALLOC_FRACTION to represent percentage of available memory, rather than total memory, for initial allocation (NVIDIA#2429)

* RMM_ALLOC_FRACTION now specifies percentage of *available* memory rather than percentage of *total* memory to use for initial allocation

Signed-off-by: Andy Grove <andygrove73@gmail.com>

* Add check for minimum alloc amount and improve error messages

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* specify minAllocFraction in integration tests

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Specify minAllocFraction=0 when testing in parallel

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Add comments in tests where minAllocFraction is set

Signed-off-by: Andy Grove <andygrove@nvidia.com>
Labels: ease of use (Makes the product simpler to use or configure)
Projects: None yet
Successfully merging this pull request may close these issues: [FEA] Can RTX 3080 be supported with plugin 0.6 and CUDA 11?
6 participants