[SPARK-49548][CONNECT] Replace coarse-locking in SparkConnectSessionManager with ConcurrentMap #48036

Closed
changgyoopark-db wants to merge 2 commits

changgyoopark-db
Contributor

What changes were proposed in this pull request?

Replace the coarse-grained locking in SparkConnectSessionManager with a ConcurrentMap to minimise lock contention when there are many sessions.
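
For illustration only, the general idea can be sketched as follows. This is not the code from the patch: `SessionKey`, `SessionHolder`, and `SessionStoreSketch` are simplified, hypothetical stand-ins, and the sketch only shows how a `ConcurrentHashMap` gives per-key atomicity without a manager-wide lock.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical, simplified stand-ins for the real key and holder types.
final case class SessionKey(userId: String, sessionId: String)
final class SessionHolder(val key: SessionKey)

object SessionStoreSketch {
  // The map provides per-key atomicity, so no global lock is held
  // while sessions with different keys are created or looked up.
  private val sessionStore = new ConcurrentHashMap[SessionKey, SessionHolder]()

  // Creates the holder at most once per key, even under concurrent calls.
  def getOrCreate(key: SessionKey): SessionHolder =
    sessionStore.computeIfAbsent(key, (k: SessionKey) => new SessionHolder(k))

  // Looks up a holder without blocking callers that use other keys.
  def get(key: SessionKey): Option[SessionHolder] =
    Option(sessionStore.get(key))

  // Atomically removes and returns the holder, if one was present.
  def remove(key: SessionKey): Option[SessionHolder] =
    Option(sessionStore.remove(key))

  def size: Int = sessionStore.size()
}
```

With a single coarse lock, every `getOrCreate` call would serialise on the same monitor; here, calls only contend when they touch the same part of the map.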

Why are the changes needed?

This is a spin-off from #48034: that PR addresses the case of many executions, whereas this one addresses the case of many sessions.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing test cases.

Was this patch authored or co-authored using generative AI tooling?

No.

@changgyoopark-db changed the title from "[WIP][SPARK_49548][CONNECT] Replace coarse-locking in SparkConnectSessionManager with ConcurrentMap" to "[SPARK-49548][CONNECT] Replace coarse-locking in SparkConnectSessionManager with ConcurrentMap" on Sep 9, 2024
@changgyoopark-db
Contributor Author

@juliuszsompolski Can you please review this PR (as you're the one who wrote the majority of the code in this file)? Thanks!

Contributor

@juliuszsompolski juliuszsompolski left a comment

I think this all works. Thanks for optimizing this!

@changgyoopark-db
Contributor Author

@hvanhovell @HyukjinKwon Hello Herman and Hyukjin, would you mind merging this PR? Thanks!

@changgyoopark-db
Contributor Author

@grundprinzip Hi Martin, can you review and merge this change too? Thanks a lot!

scheduledExecutor.foreach { executor =>
ThreadUtils.shutdown(executor, FiniteDuration(1, TimeUnit.MINUTES))
private[connect] def shutdown(): Unit = {
sessionsLock.synchronized {
Contributor

Can you add Julek's comment here on why sessionsLock is needed?

Contributor Author

Thanks for reviewing this! As far as I understand, his comments address a potential data race between sessionStore and closedSessionsCache. That data race can be resolved without relying on this particular lock, so the comments are not really related to this piece of code.

The locking here protects scheduledExecutor, and I think the annotation at line 55 (@GuardedBy("sessionsLock")) makes that self-explanatory.
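
As a rough sketch of that pattern (not the actual SparkConnectSessionManager code), a narrow lock can guard only the executor field while the session map stays lock-free. `ExecutorGuardSketch`, `startIfNeeded`, and `isRunning` are hypothetical names, plain JDK shutdown calls are swapped in for Spark's internal `ThreadUtils.shutdown`, and resetting the field to `None` is an assumption made for the example.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}

object ExecutorGuardSketch {
  private val sessionsLock = new Object

  // Guarded by sessionsLock: only the executor lifecycle needs this lock.
  private var scheduledExecutor: Option[ScheduledExecutorService] = None

  // Starts the maintenance executor once; repeated calls are no-ops.
  def startIfNeeded(): Unit = sessionsLock.synchronized {
    if (scheduledExecutor.isEmpty) {
      scheduledExecutor = Some(Executors.newSingleThreadScheduledExecutor())
    }
  }

  // Stops the executor under the same lock so a concurrent start cannot race
  // with the shutdown. Plain JDK calls stand in for ThreadUtils.shutdown.
  def shutdown(): Unit = sessionsLock.synchronized {
    scheduledExecutor.foreach { executor =>
      executor.shutdown()
      executor.awaitTermination(1, TimeUnit.MINUTES)
    }
    scheduledExecutor = None // assumption for this sketch
  }

  def isRunning: Boolean = sessionsLock.synchronized(scheduledExecutor.isDefined)
}
```

Because the lock is only taken when the executor is started or stopped, it should rarely be contended, unlike a lock held on every session lookup.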

Contributor

Ah I see the comment above 👍

@HyukjinKwon
Member

Merged to master.

@changgyoopark-db changgyoopark-db deleted the SPARK-49548 branch September 19, 2024 06:13
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…anager with ConcurrentMap

Closes apache#48036 from changgyoopark-db/SPARK-49548.

Authored-by: Changgyoo Park <changgyoo.park@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
…anager with ConcurrentMap