
[BUG] Closing and restarting a SparkSession can remove valid blocks in RapidsBufferStore #4680

Open
abellina opened this issue Feb 2, 2022 · 0 comments
Labels: bug, shuffle


abellina commented Feb 2, 2022

The issue reported in #4664 exposes an underlying design problem with the singleton RapidsBufferStore.

In that test, the SparkSession was stopped to work around other problems (see #4669), but the underlying question remains: does Spark support calling .close() on a session and then creating a new one?

For shuffle specifically, calling .close() on the session also stops the SparkContext. This resets the shuffle ids, and possibly the ids for other block types I haven't looked into yet (such as broadcast blocks). The problem appears to be that stopping the context does not issue the unregisterShuffle calls synchronously. If it did, the store's block would already have been removed, and the bug in #4664 would not have occurred. So some state is still not cleaned up when .close() returns, and a late unregisterShuffle can arrive afterward. Because the new context reuses the same shuffle ids, that late call can remove blocks that belong to the new context.
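To make the race concrete, here is a toy Python model. It is not the plugin's actual code. `SingletonBufferStore`, `add_block`, and the block names are hypothetical stand-ins for a process-wide store keyed only by shuffle id. The model assumes that context 1's unregisterShuffle(0) is still pending when context 2 starts and writes a new shuffle 0.

```python
from collections import defaultdict


class SingletonBufferStore:
    """Hypothetical stand-in for a process-wide store keyed only by shuffle id."""

    def __init__(self):
        self.blocks = defaultdict(set)  # shuffle_id -> set of block names

    def add_block(self, shuffle_id, block):
        self.blocks[shuffle_id].add(block)

    def unregister_shuffle(self, shuffle_id):
        # Drops every block for this id, no matter which context wrote it.
        self.blocks.pop(shuffle_id, None)


store = SingletonBufferStore()

# Context 1 writes shuffle 0, then the session is closed.
store.add_block(0, "ctx1-map0")
pending_cleanup = [0]  # unregisterShuffle(0) has not run yet (asynchronous)

# Context 2 starts, shuffle ids restart at 0, and it writes shuffle 0 again.
store.add_block(0, "ctx2-map0")

# The late cleanup from context 1 finally runs...
for shuffle_id in pending_cleanup:
    store.unregister_shuffle(shuffle_id)

# ...and context 2's still-valid block is gone.
print(sorted(store.blocks.get(0, set())))  # []
```

Because the store only knows the shuffle id, it cannot tell the stale cleanup apart from a legitimate one for the new context.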

At this stage, we think RapidsBufferStore should not be a singleton. Instead, it should somehow be tied to the context, so that a late unregisterShuffle (or other cleanup call) can be ignored when it targets a store that belongs to a new context.
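The proposed direction can be sketched with the same toy model. It is a hypothetical design, not the plugin's implementation. Each store is tagged with an id for the context that created it. The strings `"app-1"` and `"app-2"` stand in for something like the SparkContext's application id, which is an assumption. Cleanup calls carry the id of the context that issued them, and the store ignores calls whose id does not match its own.

```python
class ContextScopedBufferStore:
    """Hypothetical store tied to a single SparkContext, identified by app_id."""

    def __init__(self, app_id):
        self.app_id = app_id
        self.blocks = {}  # shuffle_id -> set of block names

    def add_block(self, shuffle_id, block):
        self.blocks.setdefault(shuffle_id, set()).add(block)

    def unregister_shuffle(self, issuing_app_id, shuffle_id):
        """Remove a shuffle's blocks only if the call comes from this store's context."""
        if issuing_app_id != self.app_id:
            return False  # late call from a stopped context: ignore it
        self.blocks.pop(shuffle_id, None)
        return True


# Context 1 writes shuffle 0; its cleanup is still pending when the session closes.
store = ContextScopedBufferStore("app-1")
store.add_block(0, "ctx1-map0")
pending_cleanup = [("app-1", 0)]  # (issuing context, shuffle id)

# A new context gets a fresh store; shuffle ids restart at 0.
store = ContextScopedBufferStore("app-2")
store.add_block(0, "ctx2-map0")

# The late cleanup from context 1 is recognized as stale and ignored.
results = [store.unregister_shuffle(app, sid) for app, sid in pending_cleanup]
print(results)                  # [False]
print(sorted(store.blocks[0]))  # ['ctx2-map0']
```

A real version would still need a way for each cleanup path to learn which context issued it. One possibility is letting each per-context component own its store instead of sharing a JVM-wide singleton.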

abellina added the bug, ? - Needs Triage, and shuffle labels on Feb 2, 2022
sameerz removed the ? - Needs Triage label on Feb 8, 2022