-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crashes when using @threads with intersection functions #91
Comments
In julia there is generally no performance penalty on loops, so I wouldn't worry about getting rid of them. |
Got it, thanks! |
You could add |
I'm not sure if libGEOS is thread safe, so be careful with using threads. |
From https://trac.osgeo.org/geos/wiki/RFC3
EDIT: Be careful with using threads. |
I am not sure if this is worth opening another issue, but I am having some troubles with these functions being thread safe I believe. When I run the following MWE code (written for this question)
I get a very long error that kills my Julia session, specifically:
This error is avoided by removing the Threads.@threads. Additionally, it oddly works when I reduce the number of loops (e.g., changing it to Threads.@threads for j = 1:10). I have 4 cores available. |
I see this crashing as well. Here is a bit simpler example that crashes every time I run it (Julia 1.7.1, GEOS_jll 3.10.0+0, LibGEOS.jl 0.6.8)
If I add some randomness
the crashing still happens, but much more seldomly. |
Even simpler crasher
|
@jaakkor2, about #91 (comment) I haven't tried threading yet, but I thought the idea of the |
Sorry for confusion. It seems that using one global context and threading is asking for trouble. https://lists.osgeo.org/pipermail/geos-devel/2017-November/008157.html discussion here says that the API is thread-safe, but the implementation is not. And likely Julia interface needs modification.. |
Ah good to know. Would still be interesting to try it out with thread local contexts, to see if any remaining thread safety issues cause problems for examples like this. |
This still crashes
|
If
|
@jaakkor2 The following seems to work using
|
Sorry, the above is not correct and operates on a single core. My apologies, I am quite new to Julia! |
If you are testing these please lock access to the shared resource. |
Unfortunately the contexts don't seem to be published. |
Hmm good point about locking access to share resources. And you are right that these evaled functions don't include the optional context argument, it would be good to add them. If you want to test the contexts for threading purposes, you can replace |
I've responded in https://discourse.julialang.org/t/improving-performance-on-nested-for-loops-sparsearrays-libgeos/74038/11: I don’t think it was an intentional design for it to remain internal, just that we didn’t implement it all the way through back then. |
@skygering I haven't been able to reproduce any of the crashes on | | |_| | | | (_| | | Version 1.8.0 (2022-08-17)
_/ |\__'_|_|_|\__'_| | Official https://julialang.org/ release
|__/ |
...
(@v1.8) pkg> st
Status `~/.julia/environments/v1.8/Project.toml`
...
[cf35fbd7] GeoInterface v1.0.1
[a90b1aa1] LibGEOS v0.7.2 Can I doublecheck which version of Julia and LibGEOS are you reproducing the crashes on? For #91 (comment), I get: | | |_| | | | (_| | | Version 1.8.0 (2022-08-17)
_/ |\__'_|_|_|\__'_| | Official https://julialang.org/ release
|__/ |
julia> using LibGEOS
julia> function findintersecting(g1, g2)
n1, n2 = length(g1), length(g2)
polygon_intersection = fill(0.0,n1,n2)
Threads.@threads for i = 1:n1
for j = 1:n2
if intersects(g1[i], g2[j])
polygon_intersection[i,j] = 1
end
end
end
sum(polygon_intersection)
end
findintersecting (generic function with 1 method)
julia> pnts = 1.0*[[-1,-1],[+1,-1],[+1,+1],[-1,+1]] .+ [0.00*randn(2) for i=1:4]
4-element Vector{Vector{Float64}}:
[-1.0, -1.0]
[1.0, -1.0]
[1.0, 1.0]
[-1.0, 1.0]
julia> geom = [[pnts; [pnts[1]]]]
1-element Vector{Vector{Vector{Float64}}}:
[[-1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0]]
julia> g1 = [LibGEOS.Polygon(geom) for i=1:500]
500-element Vector{Polygon}:
Polygon(Ptr{Nothing} @0x00006000028030c0)
Polygon(Ptr{Nothing} @0x0000600002800230)
Polygon(Ptr{Nothing} @0x00006000028004b0)
Polygon(Ptr{Nothing} @0x0000600002802c60)
Polygon(Ptr{Nothing} @0x0000600002800a50)
Polygon(Ptr{Nothing} @0x00006000028000a0)
Polygon(Ptr{Nothing} @0x0000600002803d90)
Polygon(Ptr{Nothing} @0x0000600002800410)
Polygon(Ptr{Nothing} @0x0000600002801720)
Polygon(Ptr{Nothing} @0x0000600002801c70)
⋮
Polygon(Ptr{Nothing} @0x000060000281eb20)
Polygon(Ptr{Nothing} @0x000060000281d5e0)
Polygon(Ptr{Nothing} @0x000060000281eda0)
Polygon(Ptr{Nothing} @0x000060000281ee90)
Polygon(Ptr{Nothing} @0x000060000281e120)
Polygon(Ptr{Nothing} @0x000060000281caf0)
Polygon(Ptr{Nothing} @0x000060000281f250)
Polygon(Ptr{Nothing} @0x000060000281c870)
Polygon(Ptr{Nothing} @0x000060000281f610)
Polygon(Ptr{Nothing} @0x000060000281fe30)
julia> findintersecting(g1,g1)
250000.0 For #91 (comment), I get: | | |_| | | | (_| | | Version 1.8.0 (2022-08-17)
_/ |\__'_|_|_|\__'_| | Official https://julialang.org/ release
|__/ |
julia> import LibGEOS
julia> ctx = [LibGEOS.GEOSContext() for i=1:Threads.nthreads()]
1-element Vector{LibGEOS.GEOSContext}:
LibGEOS.GEOSContext(Ptr{Nothing} @0x000000012f809e00)
julia> function cra(n, ctx)
p = [[[-1.,-1],[+1,-1],[+1,+1],[-1,+1],[-1,-1]]]
g1 = [LibGEOS.Polygon(p) for i=1:n]
r = fill(false, n, n)
Threads.@threads for i=1:n
for j=1:n
r[i,j] = LibGEOS.intersects(g1[i].ptr, g1[j].ptr, ctx[Threads.threadid()])
end
end
return r
end
cra (generic function with 1 method)
julia> cra(500, ctx)
500×500 Matrix{Bool}:
1 1 1 1 1 1 1 1 1 1 1 1 1 … 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 … 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
⋮ ⋮ ⋮ ⋱ ⋮ ⋮
1 1 1 1 1 1 1 1 1 1 1 1 1 … 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 … 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 For #91 (comment), I get: | | |_| | | | (_| | | Version 1.8.0 (2022-08-17)
_/ |\__'_|_|_|\__'_| | Official https://julialang.org/ release
|__/ |
julia> import LibGEOS
julia> function cra(n)
ctx = [LibGEOS.GEOSContext() for i=1:Threads.nthreads()]
p = [[[-1.,-1],[+1,-1],[+1,+1],[-1,+1],[-1,-1]]]
g1 = [LibGEOS.Polygon(p) for i=1:n]
Threads.@threads for i=1:n
for j=1:n
LibGEOS.intersects(g1[i].ptr, g1[j].ptr, ctx[Threads.threadid()])
end
end
end
cra (generic function with 1 method)
julia> cra(1000)
julia> |
Crashes for me with LibGEOS v0.7.2 and GeoInterface v1.0.1 if i start Julia with |
I also have LibGEOS v0.7.2 and GeoInterface v1.0.1. I am also running with 4 threads like @jaakkor2 mentioned. With the last example that I provided, I am crashing every time. It doesn't seem like v0.7.2 includes the new Polygon updates, which had memory issues, but I don't think it is that given that I did not have to change the code for making Polygons from vectors. |
Looking at the source code of LibGEOS, we need to construct geometries unique to a context. So this works: function cra(n)
@info "Using $(Threads.nthreads()) threads"
out = zeros(n, n) # don't use falses, it's not thread safe ?!
ctx = [LibGEOS.GEOSContext() for i=1:Threads.nthreads()]
p = [[[-1.,-1],[+1,-1],[+1,+1],[-1,+1],[-1,-1]]]
g1 = [LibGEOS.Polygon(p) for i=1:n]
# Generate a unique per thread/context geometry (!)
g2 = [LibGEOS.cloneGeom.(getfield.(g1, :ptr), Ref(c)) for c in ctx]
Threads.@threads for i=1:n
for j=1:n
ti = Threads.threadid()
out[i, j] = LibGEOS.intersects(g2[ti][i], g2[ti][j], ctx[ti])
end
end
sum(out)
end
cra(1000)
# [ Info: Using 8 threads
# 1.0e6 |
This also works for me! Thank you so much @evetion! However, it is kind of annoying that we have to make a geometry for each thread though... I guess we shouldn't have a huge number of threads but that doesn't seem the best for performance, especially if we are working with a huge number of polygons. I don't really see a way to create less though right off the bat. This almost seems to get rid of the benefit of threads over @distributed in Julia, since we now have to have a copy of every geometry for each thread. Rather than the threads safely sharing memory, we are just creating a new version of every object in memory to prevent data races. In fact, this seems to work as well, which does not use contexts but rather just makes nthreads copies of every polygon. Thoughts? function cra2(n)
nthreads= Threads.nthreads()
@info "Using $(nthreads) threads"
out = zeros(n, n) # don't use falses, it's not thread safe ?!
p = [[[-1.,-1],[+1,-1],[+1,+1],[-1,+1],[-1,-1]]]
g1 = [LibGEOS.Polygon(p) for i=1:n]
g2 = [LibGEOS.cloneGeom.(getfield.(g1, :ptr)) for i in 1:nthreads]
Threads.@threads for i=1:n
for j=1:n
ti = Threads.threadid()
out[i, j] = LibGEOS.intersects(g2[ti][i], g2[ti][j])
end
end
sum(out)
end I am also wondering if we should expose the context parameter within constructors like Polygon, since in most cases they calls functions createPolygon (which does accept a context), in addition to functions that take in the geometries like intersects, intersection, etc. |
Yeah the current state wasn't by design, so I think that will be very welcome :) |
I have started the PR to expose context within the functions and types :) However, I am still not convinced on the use of contexts as used above. It does stop the errors, but as I showed above this doesn't actually have to do with contexts, and rather with having a separate copy of each geometry used within each thread. This seems really inefficent. This might be the design of the underlying library, but then it seems like it really isn't thread-safe. They have just eliminated all possible race conditions by creating "separate" memory for each thread. Thoughts? |
Yeah, unfortunately it seems that way.. https://www.mail-archive.com/geos-devel@lists.osgeo.org/msg05011.html |
To keep this conversation going, I am also having issues with distributed computing using Julia's @distributed and getting seg faults. For example: using Distributed
addprocs(3)
@everywhere using LibGEOS
coord = [[[0.,0.],[+1.,0.],[+1.,+1.],[0.,+1.],[0.,0.]]]
coord_lst = [coord for i=1:5]
@distributed (+) for c in coord_lst
rect = LibGEOS.Polygon(c)
LibGEOS.area(rect)
end gives 5 as expected. However, coord = [[[0.,0.],[+1.,0.],[+1.,+1.],[0.,+1.],[0.,0.]]]
coord_lst = [coord for i=1:5]
poly_lst = [LibGEOS.Polygon(c) for c in coord_lst]
@distributed (+) for rect in poly_lst
LibGEOS.area(rect)
end gives a seg fault. Here is the beginning of the error message. It is really long. The stack trace at the bottom of the error message says that there is an unsafe read: I am a bit surprised by this error given that area seems like it should be a read only function and each polygon is in its own process. Luckily I think that making the polygons within each process actually works for what I need to do, but I thought I would bring it up. |
In the example, |
I am a bit confused on where the memory is being freed. I tried your way @jaakkor2 and that does work! However, if I get rid of using Distributed
addprocs(3)
@everywhere using LibGEOS
coord = [[[0.,0.],[+1.,0.],[+1.,+1.],[0.,+1.],[0.,0.]]]
poly_lst = [LibGEOS.Polygon(coord) for i in 1:5]
@distributed for p in poly_lst
LibGEOS.area(p)
end also throws a seg fault. I feel like this suggests that the ownership of However, if this is the case than @yeesian and my thought process that making Polygons (or any other Geom type) out of a vector of coordinates does not require any copying/cloning might be wrong. This would require an update of those constructors. However, I also made a little example that "translates" the coordinates. When using the translated coordinates I am still getting a seg fault... using Distributed
addprocs(3)
@everywhere using LibGEOS
function translate(coords, centroid)
return coords .+ repeat([centroid], length(coords))
end
coord = [[[0.,0.],[+1.,0.],[+1.,+1.],[0.,+1.],[0.,0.]]]
coord_lst = [[translate(coord[1], [0.0, 2.0*i])] for i=0:5]
poly_lst = [LibGEOS.Polygon(c) for c in coord_lst]
@distributed for p in poly_lst
LibGEOS.area(p)
end I checked that all of the elements of coord_lst are in distinct places in memory using the following: [pointer(c) for c in coord_lst] I can't for the life of me see how this is different from @jaakkor2's solution above, which does not give me a seg fault. If anyone sees a flaw in my logic please let me know. I am trying hard to get a hang of Julia and Distributed and these memory problems are tricky!! 😄 I appreciate everyone's help and patience! |
I just tried a test on my mac from @skygering 's exemple. @testset "distributed" begin
@show "in distributed"
function f(n)
@show Threads.nthreads()
ctx = [LibGEOS.GEOSContext() for i=1:Threads.nthreads()]
p = [[[-1.,-1],[+1,-1],[+1,+1],[-1,+1],[-1,-1]]]
g1 = [LibGEOS.Polygon(p) for i=1:n]
Threads.@threads for i=1:n
@async println("$(Threads.threadid())")
for j=1:n
LibGEOS.intersects(g1[i].ptr, g1[j].ptr, ctx[Threads.threadid()])
end
end
GC.gc(true)
return nothing
end
f(100)
f(10)
LibGEOS.Polygon([[[-1.,-1],[+1,-1],[+1,+1],[-1,+1],[-1,-1]]])
nothing
end I got a lucky lead once (it never happened again, I mostly had double free or invalid checksums with other runs)
I went down the rabbit hole for a bit. The short of it is that reading is not safe, the envelope of the geometry is lazily computed: That probably means that geometries have to live in a single thread or have a very deliberate handover between threads. |
@skygering , I tried this code: using Distributed
@testset "distributed" begin
addprocs(3)
@everywhere using LibGEOS
function translate(coords, centroid)
return coords .+ repeat([centroid], length(coords))
end
coord = [[[0.0, 0.0], [+1.0, 0.0], [+1.0, +1.0], [0.0, +1.0], [0.0, 0.0]]]
coord_lst = [[translate(coord[1], [0.0, 2.0 * i])] for i = 0:5]
poly_lst = [LibGEOS.Polygon(c) for c in coord_lst]
@show [p.ptr for p in poly_lst]
@distributed for p in poly_lst
@show p.ptr
LibGEOS.area(p)
end
end I got some signal 11 (a bunch of them). After going down the rabbit hole (I had never heard of threading or distribution in julia before today), I found an explanation out of left field. The Polygon struct has been serialized on the TCP socket, and the serialization rule sets the
Here is worker 2's isolated output:
My suspicions were arounsed when I saw that my C stacktrace was stopping at Later I spotted the pointers were all 0x0000000000000000 in the loop:
|
@nraynaud Thank you for looking into this! So what I am getting from this is that LibGEOS Geometry objects are not made to exist between threads or even between processes. Confirming what you said above, I did find that Julia's distributed processing uses TCP/IP sockets be default: https://gensoft.pasteur.fr/docs/julia/1.4.1/stdlib/Distributed.html#Cluster-Manager-Interface-1
So maybe if you want to be able to do this there might be a way to set up a cluster manager that sends data in a way where the pointers aren't being set to Null. Disappointing that it isn't the default behavior, but it doesn't seem to be a problem with LibGEOS, but rather that explicit pointers are needed when working with GEOS C API and those don't work well with default Distributed processing behavior. Would you agree with that assessment? Here is a discourse question of someone else having the same problem: https://discourse.julialang.org/t/passing-pointers-to-workers-process-results-in-null-pointers/73418
I don't understand the work around suggested here, but maybe someone in the community has found a solution. I will ask on the slack. |
After asking on Slack, I realized that it isn't possible to share pointers between processes because they don't share the same memory space. This is why they are set to NULL. Since the LibGEOS objects are just pointers, I don't see a way to fix this problem. Maybe others have ideas though? When using @distributed, it would seem that we need to make the polygons within each process. Since we concluded above that we need a copy of each geometry per each thread, or we need to make the geometries within threads, it seems that there is no way to parallelize LibGEOS operations on a pre-existing set of geometries that is shared or passed between threads/processes. @yeesian Do you know if ArchGDAL.jl has these problems? It seems similarly setup so I would assume it has the same distributed problems due to the pointers, but do you know if it is thread-safe? ArchGDAL and LibGEOS seem to have very different uses though. |
@skygering to confirm what people told you, you can also share slabs of memory between processes, but the objects have to be allocated in the slab (which certainly doesn't happen by a random malloc in the middle of an algorithm, allocators would have to be passed around), and you need some kind of policy as to what the processes will do with the memory. The path to distributed computing with libGEOS seems to be quite narrow. I guess the ideal application with those constraints would have a serialization/deserialization time significantly smaller than its computiing time. Which would probably work as well with arrays of coordinates sent over the socket and a completely thread oblivious GEOS. I'm sorry. You're not alone, tho, I looked into the issue because my work wanted to use multithreading with LibGEOS. We'll have to add some significant checks to ensure that objects don't cross the wrong context (including probably storing the context next to the pointer and checking it in the finalizer). |
Yeah ArchGDAL (and GDAL more generally) is most often used for reading / writing geospatial data files. It does come compiled with GEOS support, but it won't be superseding what is achievable with GEOS. If the question is about the thread safety of I/O (and not the geometry operations), it has caveats based on GDAL limitations. (I guess the more directly useful answer is: "As most C/C++ libraries, an object is not thread-safe unless it is explicitly |
Currently, I have two arrays of type
Array{Polygon,1}
, calledgrid_1
andgrid_2
. I would like to find the intersecting area of every point ofgrid_1
with every point ofgrid_2
(i.e., yield a matrix with dimensionslength(grid_1) by length(grid_2)
. I am wondering if there is anyway that may be faster that running a two for loops as follows?I was thinking of using something of the form
area.(intersection.(grid_1[i],grid_2))
and then only have to loop over one dimension, but that does not appear to be implemented (and I am not sure these array calculations would even be faster than a for loop). Any suggestions are appreciated!The text was updated successfully, but these errors were encountered: