Change RMM_ALLOC_FRACTION to represent percentage of available memory, rather than total memory, for initial allocation #2429
Conversation
CONTRIBUTING.md (outdated suggestion):

> syntax can be used to pass Java system properties to the Maven Surefire plugin.
>
> ```shell script
> mvn verify -DargLine="-Dspark.rapids.memory.gpu.allocFraction=0.7"
> ```
I understand why you want this, but it scares me. Spark does not support this and if someone sets a config through a system property in production I'm not sure how we can debug it. Spark will not see it so it will not be logged/displayed in the typical places. We already have something that does this just for the tests but it is not well documented.
spark-rapids/tests/src/test/scala/com/nvidia/spark/rapids/SparkQueryCompareTestSuite.scala, lines 88 to 94 in 247b758:

```scala
val commandLineVariables = System.getenv("SPARK_CONF")
if (commandLineVariables != null) {
  commandLineVariables.split(",").foreach { s =>
    val a = s.split("=")
    builder.config(a(0), a(1))
  }
}
```
Could you use that instead, and document it a little better?
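For context, based on the parsing in the snippet above (comma-separated `key=value` pairs read from a `SPARK_CONF` environment variable), an invocation could look like the following; the config value shown is illustrative:

```shell
SPARK_CONF="spark.rapids.memory.gpu.allocFraction=0.7" mvn verify
```

Environment variables are inherited by the forked test JVMs, so `System.getenv` in the test suite sees the value.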
The side-channel hacks to adjust the Spark configs are interesting but IMO don't get to the crux of the problem: the defaults do not work for this kind of setup. That's a bad initial experience for someone just trying things out on their desktop with their display GPU. Should we make at least the initial pool allocation fraction be based on free memory rather than total memory? Or is there an alternative where the system defaults still perform well on our expected setup but don't crash outright on this setup?
We initially had it on free memory, but it makes it impossible to share a GPU among tasks in a production setting, and can mask a GPU accidentally being shared. I am fine with us having different defaults for the unit/integration tests that look at the free GPU memory and try to configure things accordingly, so long as we also fail if there is a bad setup.
Not following. All I'm talking about is the initial RMM pool size, not the maximum pool size or anything that has to do with tasks. The RMM pool will try to grow from the initial size as it does today if the application attempts to allocate beyond that initial size.
Yes, that is definitely the case, as we are trying to share in this setup. That's sort of my point -- should we make it easier to share by default? Maybe the answer is no. If that's the case, maybe we should emit a more useful error message when the initial pool size cannot be allocated, to point users to the possibility that the GPU is being used by another process concurrently. Bonus points for emitting details about the GPU that is being used, the amount of free memory on it, the amount trying to be allocated, etc.
If it is just the initial pool size then I would be fine with a change like that. We would need a lot of good documentation explaining it.
Force-pushed from 04b031c to 07ef0d7 with commit:

RMM_ALLOC_FRACTION now specifies percentage of *available* memory rather than percentage of *total* memory to use for initial allocation

Signed-off-by: Andy Grove <andygrove73@gmail.com>
I've updated this and it works well for my desktop. I wasn't sure whether to use the term …
build
```diff
@@ -168,7 +168,7 @@ object GpuDeviceManager extends Logging {
   // Align workaround for https://github.com/rapidsai/rmm/issues/527
   def truncateToAlignment(x: Long): Long = x & ~511L

-  var initialAllocation = truncateToAlignment((conf.rmmAllocFraction * info.total).toLong)
+  var initialAllocation = truncateToAlignment((conf.rmmAllocFraction * info.free).toLong)
   if (initialAllocation > info.free) {
```
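To make the effect of the one-line change concrete, here is a hypothetical worked example; the 16 GiB GPU, the 4 GiB consumed by a display, and the 0.9 fraction are illustrative numbers, not values taken from the PR:

```scala
// Illustrative only: a 16 GiB GPU where ~4 GiB is already used by the display.
val total = 16L * 1024 * 1024 * 1024 // as reported by info.total
val free  = 12L * 1024 * 1024 * 1024 // as reported by info.free
val allocFraction = 0.9

val oldInitial = (allocFraction * total).toLong // ~14.4 GiB > free, so RMM init would fail
val newInitial = (allocFraction * free).toLong  // ~10.8 GiB, fits within free memory
```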
This check is useless now, but we probably want a check for a minimum amount of free memory, with a warning, even before the initial allocation.
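A minimal sketch of what such a check might look like, assuming a hypothetical `rmmMinAllocFraction` conf accessor backed by the `spark.rapids.memory.gpu.minAllocFraction` setting that appears later in this PR; the message wording is illustrative, not the PR's actual text:

```scala
// Hypothetical sketch; the actual check and wording in the PR may differ.
val minAllocation = truncateToAlignment((conf.rmmMinAllocFraction * info.total).toLong)
if (initialAllocation < minAllocation) {
  logWarning(s"Initial RMM allocation of $initialAllocation bytes is below the " +
    s"minimum of $minAllocation bytes. Only ${info.free} of ${info.total} bytes " +
    s"are free; the GPU may be in use by another process.")
}
```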
(A second review thread was opened on the same `initialAllocation` diff shown above.)
So what if free is tiny? Do we want a minimum? Sometimes I can see this having caught bad setups, meaning you tried to start 2 executors on the same GPU when you shouldn't have, so now this will hide that kind of failure. I guess you will probably start seeing failures anyway, but I'm wondering how hard they will be to debug.
Oops, I now see Bobby made a similar comment about a minimum.
I had missed Bobby's previous comment somehow. I am working on addressing this now. I also noticed that some of the existing error messages need a little rework as well.
Is this what we want? Perhaps I misunderstand this PR. Say I set allocFraction to 50% and launch N executors using a 32GB GPU: the first executor would take 16GB, the second 8GB of the remaining 16GB, the third 4GB, and so on, so each executor ends up with a different initial pool size depending on launch order. Would that be the intended behavior?
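A small simulation of the scenario described above, with purely illustrative numbers: each executor takes 50% of whatever is free when it starts.

```scala
// Hypothetical: N executors each allocating 50% of the currently free
// memory on a 32 GiB GPU, in launch order.
val allocFraction = 0.5
var freeGiB = 32.0
(1 to 4).foreach { executor =>
  val allocGiB = freeGiB * allocFraction
  println(f"executor $executor: initial pool $allocGiB%.1f GiB ($freeGiB%.1f GiB were free)")
  freeGiB -= allocGiB
}
// executor 1: 16.0 GiB, executor 2: 8.0 GiB, executor 3: 4.0 GiB, executor 4: 2.0 GiB
```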
That's a good point @abellina ... I will rethink the approach and look at having the initial allocation for the tests be set based on free memory instead.
@abellina @andygrove the reason we are doing this is that what you described does not happen in practice. Yes, it happens for our tests, but that is the only place. We are not recommending that people share a GPU between different executors, and we have never recommended that. In fact Spark, Kubernetes, and YARN do not support doing it. We had to hack things up to make it work for our tests.
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Maybe there could be a config that sets what we base the fraction on: free or total. Use free as the default, especially for production setups, but use total if you know what you are doing (the test hack)? But I agree that basing it off free by default is going to make for a better experience.
Why add a config that no one is going to set? If we hit a situation where we need a config we can look into it. But for now I would just have a hard-coded minimum amount of free memory and output a warning if the initial allocation would be too low. That is not going to save us in all cases (because there will be races) but it should help.
Never mind. I missed that, since this is just the initial setting, the pool will grow as necessary. This doesn't preclude the application I had in mind (testing shuffle locally), as I originally thought. Thanks @revans2 @andygrove.
This is ready for another review. I added the minimum check. I could not figure out a good way to test it, though.
build
The integration tests failed, probably because they intentionally share the GPU to run multiple tests in parallel. The integration tests probably need to be updated to set the new min alloc config.
Signed-off-by: Andy Grove <andygrove@nvidia.com>
build
Signed-off-by: Andy Grove <andygrove@nvidia.com>
build
```diff
@@ -83,6 +83,7 @@ def with_gpu_session(func, conf={}):
     """
     copy = dict(conf)
     copy['spark.rapids.sql.enabled'] = 'true'
+    copy['spark.rapids.memory.gpu.minAllocFraction'] = '0'
```
Nit: could we add a comment here (and in the other place where this is set for the tests) so we remember why this is needed?
Done
Signed-off-by: Andy Grove <andygrove@nvidia.com>
build
Thanks @andygrove
build
Change RMM_ALLOC_FRACTION to represent percentage of available memory, rather than total memory, for initial allocation (NVIDIA#2429)

* RMM_ALLOC_FRACTION now specifies percentage of *available* memory rather than percentage of *total* memory to use for initial allocation (Signed-off-by: Andy Grove <andygrove73@gmail.com>)
* Add check for minimum alloc amount and improve error messages (Signed-off-by: Andy Grove <andygrove@nvidia.com>)
* specify minAllocFraction in integration tests (Signed-off-by: Andy Grove <andygrove@nvidia.com>)
* Specify minAllocFraction=0 when testing in parallel (Signed-off-by: Andy Grove <andygrove@nvidia.com>)
* Add comments in tests where minAllocFraction is set (Signed-off-by: Andy Grove <andygrove@nvidia.com>)
Signed-off-by: Andy Grove <andygrove73@gmail.com>

When running on a desktop where I am using the GPU to drive the display, I need to reduce the value for the `spark.rapids.memory.gpu.allocFraction` setting. This PR changes the meaning of `RMM_ALLOC_FRACTION` to be based on the percentage of available rather than total GPU memory. Note that this is just used for the initial allocation and does not restrict total memory usage.

Rather than modifying the source code and remembering to undo the change before committing, I would prefer to be able to override this setting from the command line when running Maven. This PR provides a generic way to override RapidsConf settings using Java system properties. For example, when running tests with Maven, I can now run `mvn clean verify -DargLine="-Dspark.rapids.memory.gpu.allocFraction=0.7"`. When running tests from an IDE, I can specify `-Dspark.rapids.memory.gpu.allocFraction=0.7`.
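The description mentions a generic way to override RapidsConf settings via Java system properties (an approach the reviewers pushed back on earlier in this conversation). A minimal sketch of how such an override could work; `withSystemPropertyOverrides` is a hypothetical helper, not code from the PR:

```scala
import scala.collection.JavaConverters._

// Hypothetical helper: overlay any "spark.rapids." Java system properties
// (e.g. set via -DargLine="-Dspark.rapids.memory.gpu.allocFraction=0.7")
// on top of an existing config map.
def withSystemPropertyOverrides(conf: Map[String, String]): Map[String, String] = {
  val overrides = System.getProperties.asScala.collect {
    case (key, value) if key.startsWith("spark.rapids.") => (key, value)
  }
  conf ++ overrides
}
```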