Change RMM_ALLOC_FRACTION to represent percentage of available memory, rather than total memory, for initial allocation (NVIDIA#2429)

* RMM_ALLOC_FRACTION now specifies percentage of *available* memory rather than percentage of *total* memory to use for initial allocation

Signed-off-by: Andy Grove <andygrove73@gmail.com>

* Add check for minimum alloc amount and improve error messages

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* specify minAllocFraction in integration tests

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Specify minAllocFraction=0 when testing in parallel

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Add comments in tests where minAllocFraction is set

Signed-off-by: Andy Grove <andygrove@nvidia.com>
andygrove authored Jun 3, 2021
1 parent 4ed5578 commit a462cdc
Showing 6 changed files with 38 additions and 11 deletions.
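The heart of the change is a single expression: the initial RMM pool is now sized from the memory CUDA reports as free, rather than from the device total. A minimal sketch of the before/after arithmetic, using an illustrative `MemoryInfo` stand-in for the plugin's real CUDA query and made-up numbers:

```scala
// Illustrative sketch of the semantic change; MemoryInfo is a hypothetical
// stand-in for the CUDA memory query the plugin performs.
case class MemoryInfo(total: Long, free: Long)

// Same 512-byte alignment workaround as in the diff below.
def truncateToAlignment(x: Long): Long = x & ~511L

val info = MemoryInfo(total = 16L << 30, free = 8L << 30) // 16 GiB GPU, half in use
val allocFraction = 0.9

// Before this commit: 90% of *total* memory, about 14.4 GiB, more than is free.
val oldInitial = truncateToAlignment((allocFraction * info.total).toLong)

// After this commit: 90% of *available* memory, about 7.2 GiB, which fits.
val newInitial = truncateToAlignment((allocFraction * info.free).toLong)
```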
3 changes: 2 additions & 1 deletion docs/configs.md
@@ -31,11 +31,12 @@ Name | Description | Default Value
-----|-------------|--------------
<a name="alluxio.pathsToReplace"></a>spark.rapids.alluxio.pathsToReplace|List of paths to be replaced with corresponding alluxio scheme. Eg, when configureis set to "s3:/foo->alluxio://0.1.2.3:19998/foo,gcs:/bar->alluxio://0.1.2.3:19998/bar", which means: s3:/foo/a.csv will be replaced to alluxio://0.1.2.3:19998/foo/a.csv and gcs:/bar/b.csv will be replaced to alluxio://0.1.2.3:19998/bar/b.csv|None
<a name="cloudSchemes"></a>spark.rapids.cloudSchemes|Comma separated list of additional URI schemes that are to be considered cloud based filesystems. Schemes already included: dbfs, s3, s3a, s3n, wasbs, gs. Cloud based stores generally would be total separate from the executors and likely have a higher I/O read cost. Many times the cloud filesystems also get better throughput when you have multiple readers in parallel. This is used with spark.rapids.sql.format.parquet.reader.type|None
<a name="memory.gpu.allocFraction"></a>spark.rapids.memory.gpu.allocFraction|The fraction of total GPU memory that should be initially allocated for pooled memory. Extra memory will be allocated as needed, but it may result in more fragmentation. This must be less than or equal to the maximum limit configured via spark.rapids.memory.gpu.maxAllocFraction.|0.9
<a name="memory.gpu.allocFraction"></a>spark.rapids.memory.gpu.allocFraction|The fraction of available GPU memory that should be initially allocated for pooled memory. Extra memory will be allocated as needed, but it may result in more fragmentation. This must be less than or equal to the maximum limit configured via spark.rapids.memory.gpu.maxAllocFraction.|0.9
<a name="memory.gpu.debug"></a>spark.rapids.memory.gpu.debug|Provides a log of GPU memory allocations and frees. If set to STDOUT or STDERR the logging will go there. Setting it to NONE disables logging. All other values are reserved for possible future expansion and in the mean time will disable logging.|NONE
<a name="memory.gpu.direct.storage.spill.batchWriteBuffer.size"></a>spark.rapids.memory.gpu.direct.storage.spill.batchWriteBuffer.size|The size of the GPU memory buffer used to batch small buffers when spilling to GDS. Note that this buffer is mapped to the PCI Base Address Register (BAR) space, which may be very limited on some GPUs (e.g. the NVIDIA T4 only has 256 MiB), and it is also used by UCX bounce buffers.|8388608
<a name="memory.gpu.direct.storage.spill.enabled"></a>spark.rapids.memory.gpu.direct.storage.spill.enabled|Should GPUDirect Storage (GDS) be used to spill GPU memory buffers directly to disk. GDS must be enabled and the directory `spark.local.dir` must support GDS. This is an experimental feature. For more information on GDS, see https://docs.nvidia.com/gpudirect-storage/.|false
<a name="memory.gpu.maxAllocFraction"></a>spark.rapids.memory.gpu.maxAllocFraction|The fraction of total GPU memory that limits the maximum size of the RMM pool. The value must be greater than or equal to the setting for spark.rapids.memory.gpu.allocFraction. Note that this limit will be reduced by the reserve memory configured in spark.rapids.memory.gpu.reserve.|1.0
<a name="memory.gpu.minAllocFraction"></a>spark.rapids.memory.gpu.minAllocFraction|The fraction of total GPU memory that limits the minimum size of the RMM pool. The value must be less than or equal to the setting for spark.rapids.memory.gpu.allocFraction.|0.25
<a name="memory.gpu.oomDumpDir"></a>spark.rapids.memory.gpu.oomDumpDir|The path to a local directory where a heap dump will be created if the GPU encounters an unrecoverable out-of-memory (OOM) error. The filename will be of the form: "gpu-oom-<pid>.hprof" where <pid> is the process ID.|None
<a name="memory.gpu.pool"></a>spark.rapids.memory.gpu.pool|Select the RMM pooling allocator to use. Valid values are "DEFAULT", "ARENA", and "NONE". With "DEFAULT", `rmm::mr::pool_memory_resource` is used; with "ARENA", `rmm::mr::arena_memory_resource` is used. If set to "NONE", pooling is disabled and RMM just passes through to CUDA memory allocation directly. Note: "ARENA" is the recommended pool allocator if CUDF is built with Per-Thread Default Stream (PTDS), as "DEFAULT" is known to be unstable (https://github.com/NVIDIA/spark-rapids/issues/1141)|ARENA
<a name="memory.gpu.pooling.enabled"></a>spark.rapids.memory.gpu.pooling.enabled|Should RMM act as a pooling allocator for GPU memory, or should it just pass through to CUDA memory allocation directly. DEPRECATED: please use spark.rapids.memory.gpu.pool instead.|true
5 changes: 3 additions & 2 deletions docs/tuning-guide.md
@@ -56,8 +56,9 @@ Default value: `0.9`

Allocating memory on a GPU can be an expensive operation. RAPIDS uses a pooling allocator
called [RMM](https://github.com/rapidsai/rmm) to mitigate this overhead. By default, on startup
-the plugin will allocate `90%` (`0.9`) of the memory on the GPU and keep it as a pool that can
-be allocated from. If the pool is exhausted more memory will be allocated and added to the pool.
+the plugin will allocate `90%` (`0.9`) of the _available_ memory on the GPU and keep it as a pool
+that can be allocated from. If the pool is exhausted more memory will be allocated and added to
+the pool.
Most of the time this is a huge win, but if you need to share the GPU with other
[libraries](additional-functionality/ml-integration.md) that are not aware of RMM this can lead
to memory issues, and you may need to disable pooling.
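For anyone applying the tuning guidance above, a hedged example of setting the related fractions programmatically. The keys are the ones documented in configs.md; the values are purely illustrative:

```scala
import org.apache.spark.SparkConf

// Illustrative values: start the pool at half of the currently free memory
// and lower the floor so startup still succeeds on a busy GPU.
val conf = new SparkConf()
  .set("spark.rapids.memory.gpu.allocFraction", "0.5")
  .set("spark.rapids.memory.gpu.minAllocFraction", "0.1")
  .set("spark.rapids.memory.gpu.maxAllocFraction", "0.8")
```

The same keys can equally be passed to spark-submit with `--conf`.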
3 changes: 3 additions & 0 deletions integration_tests/run_pyspark_from_build.sh
@@ -133,6 +133,9 @@ else
then
export PYSP_TEST_spark_rapids_memory_gpu_allocFraction=$MEMORY_FRACTION
export PYSP_TEST_spark_rapids_memory_gpu_maxAllocFraction=$MEMORY_FRACTION
+# when running tests in parallel, we allocate less than the default minAllocFraction per test
+# so we need to override this setting here
+export PYSP_TEST_spark_rapids_memory_gpu_minAllocFraction=0
python "${RUN_TESTS_COMMAND[@]}" "${TEST_PARALLEL_OPTS[@]}" "${TEST_COMMON_OPTS[@]}"
else
"$SPARK_HOME"/bin/spark-submit --jars "${ALL_JARS// /,}" \
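The `PYSP_TEST_` environment variables above are picked up by the integration-test harness and turned into Spark config keys, with underscores after the prefix read as dots. A sketch of that convention, under the assumption that the harness behaves this way; the helper name is hypothetical:

```scala
// Hypothetical helper mirroring the test harness's naming convention:
// PYSP_TEST_<key with '_' in place of '.'> becomes the Spark config key.
def envToConfKey(env: String): Option[String] =
  if (env.startsWith("PYSP_TEST_"))
    Some(env.stripPrefix("PYSP_TEST_").replace('_', '.'))
  else None

// The override added in this commit maps to the new config key:
assert(envToConfKey("PYSP_TEST_spark_rapids_memory_gpu_minAllocFraction")
  .contains("spark.rapids.memory.gpu.minAllocFraction"))
```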
GpuDeviceManager.scala
@@ -170,16 +170,24 @@ object GpuDeviceManager extends Logging {
// Align workaround for https://github.com/rapidsai/rmm/issues/527
def truncateToAlignment(x: Long): Long = x & ~511L

-var initialAllocation = truncateToAlignment((conf.rmmAllocFraction * info.total).toLong)
-if (initialAllocation > info.free) {
-logWarning(s"Initial RMM allocation (${toMB(initialAllocation)} MB) is " +
-s"larger than free memory (${toMB(info.free)} MB)")
+var initialAllocation = truncateToAlignment((conf.rmmAllocFraction * info.free).toLong)
+val minAllocation = truncateToAlignment((conf.rmmAllocMinFraction * info.total).toLong)
+if (initialAllocation < minAllocation) {
+throw new IllegalArgumentException(s"The initial allocation of " +
+s"${toMB(initialAllocation)} MB (calculated from ${RapidsConf.RMM_ALLOC_FRACTION} " +
+s"(=${conf.rmmAllocFraction}) and ${toMB(info.free)} MB free memory) was less than " +
+s"the minimum allocation of ${toMB(minAllocation)} (calculated from " +
+s"${RapidsConf.RMM_ALLOC_MIN_FRACTION} (=${conf.rmmAllocMinFraction}) " +
+s"and ${toMB(info.total)} MB total memory)")
}
val maxAllocation = truncateToAlignment((conf.rmmAllocMaxFraction * info.total).toLong)
if (maxAllocation < initialAllocation) {
-throw new IllegalArgumentException(s"${RapidsConf.RMM_ALLOC_MAX_FRACTION} " +
-s"configured as ${conf.rmmAllocMaxFraction} which is less than the " +
-s"${RapidsConf.RMM_ALLOC_FRACTION} setting of ${conf.rmmAllocFraction}")
+throw new IllegalArgumentException(s"The initial allocation of " +
+s"${toMB(initialAllocation)} MB (calculated from ${RapidsConf.RMM_ALLOC_FRACTION} " +
+s"(=${conf.rmmAllocFraction}) and ${toMB(info.free)} MB free memory) was more than " +
+s"the maximum allocation of ${toMB(maxAllocation)} (calculated from " +
+s"${RapidsConf.RMM_ALLOC_MAX_FRACTION} (=${conf.rmmAllocMaxFraction}) " +
+s"and ${toMB(info.total)} MB total memory)")
}
val reserveAmount = conf.rmmAllocReserve
if (reserveAmount >= maxAllocation) {
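To see when the new minimum-allocation check fires, consider a GPU that is mostly occupied: the pool is sized from free memory, but the floor is sized from total memory. A standalone sketch of the same arithmetic with illustrative numbers (the real code reads `info` from CUDA):

```scala
def truncateToAlignment(x: Long): Long = x & ~511L
def toMB(x: Long): Long = x >> 20

val total = 16L << 30 // 16 GiB GPU
val free = 2L << 30   // only 2 GiB currently free
val (allocFraction, minAllocFraction) = (0.9, 0.25)

// The pool is sized from free memory, but the floor is sized from total memory.
val initialAllocation = truncateToAlignment((allocFraction * free).toLong) // ~1843 MB
val minAllocation = truncateToAlignment((minAllocFraction * total).toLong) // 4096 MB

// Mirrors the IllegalArgumentException above: 1843 MB < 4096 MB, so startup
// fails fast instead of creating a pool too small to be useful.
assert(initialAllocation < minAllocation)
println(s"initial ${toMB(initialAllocation)} MB < minimum ${toMB(minAllocation)} MB")
```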
RapidsConf.scala
@@ -320,10 +320,11 @@ object RapidsConf {
.createOptional

private val RMM_ALLOC_MAX_FRACTION_KEY = "spark.rapids.memory.gpu.maxAllocFraction"
+private val RMM_ALLOC_MIN_FRACTION_KEY = "spark.rapids.memory.gpu.minAllocFraction"
private val RMM_ALLOC_RESERVE_KEY = "spark.rapids.memory.gpu.reserve"

val RMM_ALLOC_FRACTION = conf("spark.rapids.memory.gpu.allocFraction")
.doc("The fraction of total GPU memory that should be initially allocated " +
.doc("The fraction of available GPU memory that should be initially allocated " +
"for pooled memory. Extra memory will be allocated as needed, but it may " +
"result in more fragmentation. This must be less than or equal to the maximum limit " +
s"configured via $RMM_ALLOC_MAX_FRACTION_KEY.")
@@ -340,6 +341,13 @@
.checkValue(v => v >= 0 && v <= 1, "The fraction value must be in [0, 1].")
.createWithDefault(1)

+val RMM_ALLOC_MIN_FRACTION = conf(RMM_ALLOC_MIN_FRACTION_KEY)
+.doc("The fraction of total GPU memory that limits the minimum size of the RMM pool. " +
+s"The value must be less than or equal to the setting for $RMM_ALLOC_FRACTION.")
+.doubleConf
+.checkValue(v => v >= 0 && v <= 1, "The fraction value must be in [0, 1].")
+.createWithDefault(0.25)
+
val RMM_ALLOC_RESERVE = conf(RMM_ALLOC_RESERVE_KEY)
.doc("The amount of GPU memory that should remain unallocated by RMM and left for " +
"system use such as memory needed for kernels, kernel launches or JIT compilation.")
@@ -1380,6 +1388,8 @@ class RapidsConf(conf: Map[String, String]) extends Logging {

lazy val rmmAllocMaxFraction: Double = get(RMM_ALLOC_MAX_FRACTION)

+lazy val rmmAllocMinFraction: Double = get(RMM_ALLOC_MIN_FRACTION)
+
lazy val rmmAllocReserve: Long = get(RMM_ALLOC_RESERVE)

lazy val hostSpillStorageSize: Long = get(HOST_SPILL_STORAGE_SIZE)
GpuDeviceManagerSuite.scala
@@ -29,10 +29,14 @@ class GpuDeviceManagerSuite extends FunSuite with Arm {
val totalGpuSize = Cuda.memGetInfo().total
val initPoolFraction = 0.1
val maxPoolFraction = 0.2
+// we need to reduce the minAllocFraction for this test since the
+// initial allocation here is less than the default minimum
+val minPoolFraction = 0.01
val conf = new SparkConf()
.set(RapidsConf.POOLED_MEM.key, "true")
.set(RapidsConf.RMM_POOL.key, "ARENA")
.set(RapidsConf.RMM_ALLOC_FRACTION.key, initPoolFraction.toString)
+.set(RapidsConf.RMM_ALLOC_MIN_FRACTION.key, minPoolFraction.toString)
.set(RapidsConf.RMM_ALLOC_MAX_FRACTION.key, maxPoolFraction.toString)
.set(RapidsConf.RMM_ALLOC_RESERVE.key, "0")
try {
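The comment added to the test is worth unpacking: on an otherwise idle GPU, free memory is close to total memory, so an initial fraction of 0.1 of free memory would fall below the default 0.25 floor computed from total memory. In rough numbers:

```scala
// Why the suite lowers the floor: assuming an idle GPU (free is roughly total),
// 0.1 of free memory sits below the default floor of 0.25 of total memory.
val total = 16L << 30
val free = total // idle-GPU assumption
assert(0.1 * free < 0.25 * total) // the default floor would reject this pool
```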
