Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Threads.Condition for package locks #41366

Closed
wants to merge 1 commit into from
Closed

Conversation

vchuravy
Copy link
Member

@Sacha0 reported a concurrency violation in this code.

ERROR: On worker 4:
LoadError: ConcurrencyViolationError("lock must be held")
Stacktrace:
  [1] concurrency_violation
    @ ./condition.jl:8
  [2] assert_havelock
    @ ./condition.jl:25 [inlined]
  [3] assert_havelock
    @ ./condition.jl:48 [inlined]
  [4] assert_havelock
    @ ./condition.jl:72 [inlined]
  [5] notify
    @ ./condition.jl:132
  [6] #notify#548
    @ ./condition.jl:130 [inlined]
  [7] _require
    @ ./loading.jl:1159
  [8] require
    @ ./loading.jl:1013
  [9] require
    @ ./loading.jl:997
 [10] top-level scope
    @ /build/source/usr/share/julia/stdlib/v1.7/Distributed/src/macros.jl:200
 [11] include
    @ ./client.jl:451
 [12] top-level scope
    @ none:1
 [13] eval
    @ ./boot.jl:373
 [14] #115
    @ /build/source/usr/share/julia/stdlib/v1.7/Distributed/src/process_messages.jl:274
 [15] run_work_thunk
    @ /build/source/usr/share/julia/stdlib/v1.7/Distributed/src/process_messages.jl:63
 [16] run_work_thunk
    @ /build/source/usr/share/julia/stdlib/v1.7/Distributed/src/process_messages.jl:72
 [17] #108
    @ ./threadingconstructs.jl:178
in expression starting at /tmp/nix-build-delve-build.drv-0/delve-source/test-

This was on #38405, but I suspect with enough creativity one could come up with a pure Threads.@Spawn example.
@Sacha0 can I punt thinking about a test to you?

@vchuravy vchuravy added the multithreading Base.Threads and related functionality label Jun 25, 2021
@vchuravy vchuravy requested a review from vtjnash June 25, 2021 17:48
Comment on lines +1077 to +1078
lock(loading) do
wait(loading)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lock/wait code pattern is always wrong (it is why it forces you to write this out), since it is failing to protect the data. You'll need to rewrite much more of this code to be thread-safe and data-race-free.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand on that? The usage pattern of Threads.Condition is something that is sparsely documented https://docs.julialang.org/en/v1/base/multi-threading/#Synchronization, so if this is "always" wrong it would be great if you could improve the documentation.

Is your point that I might be woken to soon and I need to ensure that the condition I am waiting for has actually been met?

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you take a lock, you need to ensure it contains all of the shared data

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could use a lock for all the global package state, which can be shared by all the per-package Conditions?

@Sacha0 Sacha0 added backport 1.7 bugfix This change fixes an existing bug labels Jun 26, 2021
@KristofferC
Copy link
Sponsor Member

KristofferC commented Jul 1, 2021

Since this is marked as a "bugfix". What is the bug here, how can it be triggered? Anything that has changed here compared to earlier Julia versions?

@vchuravy
Copy link
Member Author

vchuravy commented Jul 1, 2021

Since this is marked as a "bugfix". What is the bug here, how can it be triggered? Anything that has changed here compared to earlier Julia versions?

It's not a regression, but package loading/require is not threadsafe. I can try coming up with a MWE, but the error above stems from a parallel test runner used at RelationalAI.

@vchuravy
Copy link
Member Author

I wont have time to work on this, but I created a reproducer independent of Distributed.jl #41546

@vchuravy vchuravy closed this Jul 11, 2021
@vchuravy vchuravy deleted the vc/pkg_loading branch July 11, 2021 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug multithreading Base.Threads and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants