[BUG] cuML using memory outside of RMM Pool #4485
Labels
? - Needs Triage
Need team to review and classify
bug
Something isn't working
inactive-30d
inactive-90d
Describe the bug
I am observing that we use 426 MiB of memory outside the pool when training/using a cuML model. See the MRE below (trace here), where we throw a CUSOLVER_STATUS_INTERNAL_ERROR when we set the pool to a limit near the device's memory limit (15109 MiB in this case). Please note that this works if we set the pool to a smaller value or don't set one at all.
Steps/Code to reproduce bug
Expected behavior
I would expect all of cuML's device allocations to come from the RMM pool.
Additional Context:
This seems to be the cause of problems in a dask-sql + dask-ml workflow, where the pool grows to the maximum device memory (which is the default behavior), causing problems with ML inference.
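Since things work when the pool is set to a smaller value, one stopgap is to cap the pool so that some headroom stays outside it. Below is a rough sketch of that arithmetic only, not a fix for the underlying issue. The helper name `capped_pool_size` is hypothetical, the 426 MiB headroom is just the figure observed in this report (other workloads may need more or less), and the 256-byte alignment is my assumption about what RMM expects for pool sizes.

```python
MIB = 1024 * 1024
ALIGNMENT = 256  # assumed alignment requirement for RMM pool sizes, in bytes


def capped_pool_size(device_total_mib, headroom_mib, alignment=ALIGNMENT):
    """Return a pool size in bytes that leaves `headroom_mib` outside the pool.

    Hypothetical helper: it only does the arithmetic and does not query the GPU.
    """
    if headroom_mib < 0 or headroom_mib >= device_total_mib:
        raise ValueError("headroom must be >= 0 and smaller than device memory")
    size = (device_total_mib - headroom_mib) * MIB
    # Round down to the assumed alignment so the value is a valid pool size.
    return size - (size % alignment)


# Numbers from this report: 15109 MiB device limit, 426 MiB seen outside the pool.
pool_bytes = capped_pool_size(15109, 426)
print(pool_bytes // MIB)  # 14683

# On a machine with a GPU and RMM installed, the cap could then be applied with
# something like the following (left commented so the sketch runs anywhere):
# import rmm
# rmm.reinitialize(
#     pool_allocator=True,
#     initial_pool_size=pool_bytes,
#     maximum_pool_size=pool_bytes,
# )
```

Setting `maximum_pool_size` as well as `initial_pool_size` is meant to stop the pool from growing into the headroom later, which is the growth behavior described above for the dask workflow.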
CC: @randerzander