The issue reported in #4664 exposes an underlying design problem with the singleton RapidsBufferStore.
In the test, the SparkSession was being stopped to prevent other problems (see #4669), but the underlying question remains: is calling .close() on a session and then creating a new session supported by Spark?
For shuffle specifically, calling .close() on the session also stops the SparkContext, which resets the shuffle ids and possibly the ids of other blocks I haven't looked into (like broadcast blocks). The issue appears to be that stopping the context does not guarantee that unregisterShuffle calls are issued synchronously (hence the bug reported in #4664; otherwise the store's block would have been removed). So some state is not yet clean when the context's .close() returns, and a late unregisterShuffle could still arrive.
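To make the hazard concrete, here is a minimal sketch of how a process-wide store keyed only by shuffle id can misbehave across two contexts. It is a toy model, not the plugin's code: `SingletonStore`, `add`, and `unregister_shuffle` are hypothetical names, and the strings stand in for real buffers.

```python
class SingletonStore:
    """Hypothetical stand-in for a process-wide buffer store keyed by shuffle id."""

    def __init__(self):
        self.buffers = {}

    def add(self, shuffle_id, payload):
        # Refuse to overwrite an existing entry for the same shuffle id.
        if shuffle_id in self.buffers:
            raise KeyError(f"shuffle {shuffle_id} already registered")
        self.buffers[shuffle_id] = payload

    def unregister_shuffle(self, shuffle_id):
        # Drops whatever is stored under this id, regardless of who put it there.
        self.buffers.pop(shuffle_id, None)


STORE = SingletonStore()  # shared by every context in the process

# First context: shuffle ids start at 0.
STORE.add(0, "block from context A")

# Context A is stopped here, but its unregister call has not arrived yet.
# Second context: shuffle ids restart at 0 and collide with the stale entry.
try:
    STORE.add(0, "block from context B")
    collided = False
except KeyError:
    collided = True

# Assume context B's block ends up in the store anyway (assigned directly here).
STORE.buffers[0] = "block from context B"

# The late unregister from context A now removes context B's block.
STORE.unregister_shuffle(0)
lost_b_block = 0 not in STORE.buffers
```

In this model both failure modes come from the same cause: the store cannot tell which context an id belongs to.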
At this stage we think RapidsBufferStore should not be a singleton and should instead be tied to the context somehow, so that an unregisterShuffle (or other cleanup call) from an old context can be ignored by a store that was created for a new context.
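One possible shape for this, sketched in Python under stated assumptions: each context owns its own store, and cleanup calls carry the id of the context that issued them. `FakeContext`, `ContextScopedStore`, and the `context_id` argument are all hypothetical and only illustrate the idea; how the store would actually be tied to a SparkContext is still open.

```python
import itertools


class ContextScopedStore:
    """Hypothetical per-context store: it only honors cleanup from its own context."""

    def __init__(self, context_id):
        self.context_id = context_id
        self.buffers = {}

    def add(self, shuffle_id, payload):
        self.buffers[shuffle_id] = payload

    def unregister_shuffle(self, context_id, shuffle_id):
        # Ignore cleanup calls issued on behalf of a different (older) context.
        if context_id != self.context_id:
            return False
        return self.buffers.pop(shuffle_id, None) is not None


class FakeContext:
    """Hypothetical stand-in for a context; each instance gets a unique id."""

    _ids = itertools.count()

    def __init__(self):
        self.context_id = next(FakeContext._ids)
        self.store = ContextScopedStore(self.context_id)


old = FakeContext()
old.store.add(0, "block from context A")

# The old context is stopped; a new one starts and reuses shuffle id 0.
new = FakeContext()
new.store.add(0, "block from context B")

# A late unregister from the old context reaches the new store and is ignored.
late_result = new.store.unregister_shuffle(old.context_id, 0)

# Cleanup issued by the new context itself still works.
own_result = new.store.unregister_shuffle(new.context_id, 0)
```

Because the stale call is rejected by a context id comparison rather than by timing, it no longer matters when the old context's cleanup arrives.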