"Multiple concurrent writes to Dict detected!"
with DTables.reduce
#437
Yep, definitely a Dagger bug (and in the same newly-upgraded submission logic)! I see the bug: I forgot to add locking at Line 128 in 0d525cc (the else branch). Will plan to push a fix tonight, and will also take a look at that copyto! error and see if it's related.

Thanks again for the excellent reporting!
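For context on the class of bug here, a hedged sketch (illustrative only, not Dagger's actual code): Julia's `Dict` is not thread-safe, so any path where multiple tasks may write to the same `Dict` needs a lock, e.g. a `ReentrantLock`. The names `results` and `record!` below are made up for illustration:

```julia
# Illustrative sketch only (not Dagger's internals): guard concurrent
# Dict writes with a ReentrantLock so only one task mutates it at a time.
const results = Dict{Int,Int}()
const results_lock = ReentrantLock()

function record!(k, v)
    lock(results_lock) do
        results[k] = v   # safe: writes are serialized by the lock
    end
end

@sync for i in 1:100
    Threads.@spawn record!(i, i^2)
end
```

Without the lock, concurrent `setindex!` calls can corrupt the `Dict`'s internal tables, which is exactly what Base's "Multiple concurrent writes to Dict detected!" assertion guards against.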
Thanks, I'm just glad you're able to fix these issues pretty quickly!
I'm finding a possibly related error in some code I have that looks similar to the OP example. Code excerpt:

```julia
gdt = groupby(dt, cols)
gkeys = sort!(collect(keys(gdt)))
sums = map(gkeys) do key
    reduce(+, gdt[key]; cols = sum_cols)
end .|> fetch
```

Error (it's printed to a log, so the nice formatting is lost, unfortunately):
The key part of the error, I think, is:
Aside from the most obvious issue (which I have fixed locally), I'm also seeing a variety of concurrency issues and am trying to narrow them down.
Thanks for the update; hopefully the other issues can be resolved soon as well!
@jpsamaroo Any updates on this front?
Sorry, not yet. I'm in the middle of getting ready to move across the US, so I will have to get back to this over the weekend/next week. I did find a variety of nearly identical segfaults across the stack, so there is definitely a common source; I just need to find it.
I've narrowed this down to some issue with the usage of
Great, thanks for the update and for your work on this; I really appreciate it!

I got to do that a year ago; it's definitely a lot of work, so I understand how busy you must be! Hopefully that all goes smoothly for you!
I've found the issue -
Ok, I've got a fix locally for this that gets the following example working:

```julia
using Dagger, DTables
using Distributed
addprocs(1)
@everywhere using DTables

remotecall_fetch(2) do
    N = 2
    dt = DTable((a = 1:N, b = rand(N)))
    @sync for i in 1:20
        Threads.@spawn begin
            println("Iter $i")
            fetch(reduce(+, dt; cols = [:a]))
        end
    end
end
```

I'll push the full fix for it soon! Another fun issue that was found in the process of debugging this:
I occasionally get the above error message with the following example. I'm not sure if this issue should go in DTables.jl, but I'm putting it here because the other issues I posted there got migrated here :)

Contents of `mwe.jl`:

Results:

Notice that not every run results in the error.

I also occasionally see the following error printed, but the result of the `remotecall_fetch` still returns normally with the correct answer:

This is with DTables v0.4.1 and Dagger v0.18.3.