This task could live in cuDF, but I am putting it here since it is Spark-related. We can open a separate issue in cuDF if we decide to proceed.
Issue #4669 brought to light a shutdown race between the JVM's shutdown hooks and Spark's orderly `ShutdownHookManager` (itself triggered by the JVM). Because the JVM doesn't guarantee the order in which shutdown hooks run, cuDF's `MemoryCleaner`, which checks for leaks at shutdown, can report a leak that is really a false positive: the `SparkContext` hasn't been stopped yet, so the `ExecutorPlugin` is still alive, and so is the `RapidsBufferCatalog`.
Ideally, we would register the `MemoryCleaner` leak check as the lowest-priority (last) hook that Spark's `ShutdownHookManager` runs, to prevent these false positives.
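To illustrate why the ordering matters, here is a minimal, self-contained Java sketch (not Spark's or cuDF's code). It mimics the rule Spark's `ShutdownHookManager` follows: hooks with a higher priority run first, one after another. `openBuffers`, `leakCheck`, and `simulateShutdown` are hypothetical stand-ins for the buffers held by the `RapidsBufferCatalog` and for the `MemoryCleaner` leak check; the value 50 assumes Spark's `SPARK_CONTEXT_SHUTDOWN_PRIORITY` constant.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ShutdownOrderSketch {

  // Assumed to match Spark's ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY.
  static final int SPARK_CONTEXT_SHUTDOWN_PRIORITY = 50;

  // Mimics the ordering rule of Spark's ShutdownHookManager: higher priority
  // runs first, all hooks run sequentially. Ties run in registration order
  // here (a choice of this sketch only).
  static final class PriorityHooks {
    private static final class Hook {
      final int priority;
      final long seq;
      final Runnable body;

      Hook(int priority, long seq, Runnable body) {
        this.priority = priority;
        this.seq = seq;
        this.body = body;
      }
    }

    private final List<Hook> hooks = new ArrayList<>();
    private long nextSeq = 0;

    synchronized void add(int priority, Runnable body) {
      hooks.add(new Hook(priority, nextSeq++, body));
    }

    // Runs every registered hook once, highest priority first.
    synchronized void runAll() {
      hooks.sort(Comparator.comparingInt((Hook h) -> h.priority).reversed()
          .thenComparingLong((Hook h) -> h.seq));
      for (Hook h : hooks) {
        h.body.run();
      }
      hooks.clear();
    }
  }

  // Hypothetical stand-in for reference-counted device buffers still open.
  static final AtomicInteger openBuffers = new AtomicInteger();

  // Hypothetical leak check: reports how many buffers are still open.
  static int leakCheck() {
    return openBuffers.get();
  }

  // Simulates shutdown: the catalog holds 3 buffers until the SparkContext
  // hook releases them. Returns the count the leak check observed.
  public static int simulateShutdown(int leakCheckPriority) {
    openBuffers.set(3);
    int[] seen = {-1};
    PriorityHooks hooks = new PriorityHooks();
    hooks.add(SPARK_CONTEXT_SHUTDOWN_PRIORITY, () -> openBuffers.set(0));
    hooks.add(leakCheckPriority, () -> seen[0] = leakCheck());
    hooks.runAll();
    return seen[0];
  }

  public static void main(String[] args) {
    System.out.println("leak check at priority 100: " + simulateShutdown(100) + " open");
    System.out.println("leak check at lowest priority: "
        + simulateShutdown(Integer.MIN_VALUE) + " open");
  }
}
```

Running `main` prints 3 open buffers when the leak check is registered at priority 100 (it runs before the `SparkContext` hook, a false positive) and 0 when it is registered at `Integer.MIN_VALUE`. In Spark itself, registration would presumably go through `ShutdownHookManager.addShutdownHook(priority)(hook)`, which is `private[spark]`, and the leak check would also have to stop registering its own JVM shutdown hook; treat both points as assumptions about the wiring.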
This is only visible when reference-count debugging is turned on, so it does not affect production jobs; hence it is low priority.
pxLi changed the title from "[FEA] Research associating MemoryCleaner to Spark's ShutdownHookManager" to "[BUG] Research associating MemoryCleaner to Spark's ShutdownHookManager" on Aug 4, 2022.