-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-49548][CONNECT] Replace coarse-locking in SparkConnectSessionManager with ConcurrentMap #48036
Conversation
40d0ec4
to
259d5cd
Compare
@juliuszsompolski Can you please review this PR (as you're the one who wrote the majority of the code in this file)? Thanks! |
.../server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala
Outdated
Show resolved
Hide resolved
259d5cd
to
ae4c5c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this all works. Thanks for optimizing this!
@hvanhovell @HyukjinKwon Hello Herman and Hyukjin, would you mind merging this PR? Thanks! |
@grundprinzip Hi Martin, can you review and merge this change too? Thanks a lot! |
scheduledExecutor.foreach { executor => | ||
ThreadUtils.shutdown(executor, FiniteDuration(1, TimeUnit.MINUTES)) | ||
private[connect] def shutdown(): Unit = { | ||
sessionsLock.synchronized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add Julek's comment here on why sessionLock is needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing this! As far as I understand, his comments address a potential data race between sessionStore and closedSessionsCache, and the data race can be resolved without relying on this particular lock: not quite related to this piece of code.
Locking here is to protect scheduledExecutor, and I think that is self-explanatory at line 55 (@GuardedBy("sessionsLock")).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see the comment above 👍
scheduledExecutor.foreach { executor => | ||
ThreadUtils.shutdown(executor, FiniteDuration(1, TimeUnit.MINUTES)) | ||
private[connect] def shutdown(): Unit = { | ||
sessionsLock.synchronized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see the comment above 👍
Merged to master. |
…anager with ConcurrentMap ### What changes were proposed in this pull request? Replace the coarse-locking in SparkConnectSessionManager with ConcurrentMap in order to minimise lock contention when there are many sessions. ### Why are the changes needed? It is a spin-off from apache#48034 where apache#48034 addresses many-execution cases whereas this addresses many-session situations. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing test cases. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#48036 from changgyoopark-db/SPARK-49548. Authored-by: Changgyoo Park <changgyoo.park@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…anager with ConcurrentMap ### What changes were proposed in this pull request? Replace the coarse-locking in SparkConnectSessionManager with ConcurrentMap in order to minimise lock contention when there are many sessions. ### Why are the changes needed? It is a spin-off from apache#48034 where apache#48034 addresses many-execution cases whereas this addresses many-session situations. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing test cases. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#48036 from changgyoopark-db/SPARK-49548. Authored-by: Changgyoo Park <changgyoo.park@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
Replace the coarse-locking in SparkConnectSessionManager with ConcurrentMap in order to minimise lock contention when there are many sessions.
Why are the changes needed?
It is a spin-off from #48034 where #48034 addresses many-execution cases whereas this addresses many-session situations.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing test cases.
Was this patch authored or co-authored using generative AI tooling?
No.