-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
performance and type stability fixes for geometries #316
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The failures are an unrelated issue with GDAL_jll on mac. Otherwise this is good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for bee5cb4 !
@evetion do you have time to review this? |
Just got back from holiday, will take a look soon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lovely PR! I would only ask to improve the test coverage, to include a few tests for the new preparedgeometry types and the empty (unknown) geometries.
It could also be interesting to sprinkle a few @inferred
statements, especially around the getgeom
calls?
Last but not least, could you showcase/tell us about performance improvements you noticed to document this PR? I also see you tried your hand at a simple getpoint
method, but that didn't work out?
@evetion As for performance improvements this is just from getting rid of all the red and yellow bits in ProfileView.jl rasterising GADM.jl output with Rasters.jl. The before and after included all the changes to Rasters.jl and DimensionalData.jl as well as I was optimising the whole stack... so I'm not totally clear on total times attributable to ArchGDAL. But the fraction in ArchGDAL.jl shrunk a lot from this PR combined with low-allocating Mostly I think the improvement for my use cases from this PR is type stability in |
I put together some benchmarks of these changes. There is still another 2x improvment (or more) in fixing Setup code (using GADM for an easy example, which is using ArchGDAL for loading geometries): using Rasters, GADM, BenchmarkTools
using Rasters.LookupArrays
kw = (; order=ForwardOrdered(), span=Regular(0.1), sampling=Intervals(Start()), crs=EPSG(4326))
ds = Y(Projected(15.0:0.1:55.0; kw...)), X(Projected(70.0:0.1:140; kw...))
x = GADM.get("CHN") This branch: julia> @btime rasterize(x.geom; to=ds, missingval=0, fill=1, boundary=:touches);
512.476 ms (5890904 allocations: 116.34 MiB)
julia> @benchmark rasterize(x.geom; to=ds, missingval=0, fill=1, boundary=:touches)
BenchmarkTools.Trial: 8 samples with 1 evaluation.
Range (min … max): 508.071 ms … 681.143 ms ┊ GC (min … max): 0.00% … 9.94%
Time (median): 675.751 ms ┊ GC (median): 9.88%
Time (mean ± σ): 653.399 ms ± 59.138 ms ┊ GC (mean ± σ): 8.62% ± 3.44%
▁ ▁ ▁▁██
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁████ ▁
508 ms Histogram: frequency by time 681 ms <
Memory estimate: 116.34 MiB, allocs estimate: 5890904. On master: julia> @btime rasterize(x.geom; to=ds, missingval=0, fill=1, boundary=:touches);
7.986 s (29741237 allocations: 939.08 MiB)
julia> @benchmark rasterize(x.geom; to=ds, missingval=0, fill=1, boundary=:touches)
BenchmarkTools.Trial: 2 samples with 1 evaluation.
Range (min … max): 4.445 s … 5.336 s ┊ GC (min … max): 4.43% … 4.82%
Time (median): 4.890 s ┊ GC (median): 4.65%
Time (mean ± σ): 4.890 s ± 629.958 ms ┊ GC (mean ± σ): 4.65% ± 0.27%
█ █
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
4.44 s Histogram: frequency by time 5.34 s <
Memory estimate: 939.08 MiB, allocs estimate: 29741237. Edit: with this PR and specialised julia> @btime rasterize($x.geom; to=$ds, missingval=0, fill=1, boundary=:touches);
344.444 ms (2964101 allocations: 71.68 MiB) Edit2 making sure Rasters.jl allways uses the fast path: julia> @btime rasterize(x.geom; to=ds, missingval=0, fill=1, boundary=:touches)
216.206 ms (37298 allocations: 27.02 MiB) At this point |
The abstract types in this PR are getting out of control, because we dont get dimensionality or object type from the enum, we have to use huge It might be nicer to use generic type parameters to indicate the kinds of objects and if they are 2D, 3D or have M fields, and delete all the So, something like: abstract type AbstractGeometry{E,T,D,M} end
struct Geometry{E,T,D,M} <: AbstractGeometry{E,T,D,M}
....
end
struct IGeometry{E,T,D,M} <: AbstractGeometry{E,T,D,M}
...
end
# With default constructors like this:
Geometry{E}(x) where E = Geometry{W,geomtype(W),dimensionality(W),hasm(W)}(x) Then, for example, the types would be: Geometry{wkbLineStringZM,LineString,3,HasM}
Geometry{wkbPoint,Point,2,NoM} And we can just use dispatch to match any of the type paramters instead of using so many huge |
I think that sounds reasonable; it feels nice to have reached a point where this doesn't feel like over-engineering. I didn't fully follow the example (E.g. would We might want to think a little more about whether we want to restrain it such the type parameters are independent of each other and all constructed types are valid (versus e.g. allowing for potentially invalid types to be constructed, but having it be easier for implementing the dispatch logic), and the pros and cons of the options, but I think yeah there'll definitely be something which is better than the status quo haha. |
Probably that type shouldn't exist and we could make sure of that in the inner constructor. We could also not include the enum in the type at all, and generate it when needed. |
I think I like this more. How about something like Geometry{wkbLineStringZM,3}
Geometry{wkbPoint,2} And we can generate (i) |
I actually meant the other way round: Geometry{LineString,3,HasM} As We can get It might be better to merge this as-is when I fix the tests, and think about this separately. I'm not sure. |
Ah yes sorry you're right!
Yeah I think it'll be a big change regardless of the outcome of the proposal, and agree about thinking about it separately. Happy with this PR once the tests are fixed |
I think this is good to go. I found a few existing type-related problems while doing this: |
It's still saying changes requested. @evetion do you want to check this and approve if its ok? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll be good to get the unit test coverage back up -- I've noticed that uncovered lines are historically a rich source of bugs. LGTM otherwise
Ok, made a few changes. Measures handling was really not ready so I wound that back a little. This means I've tested most of the types more extensively for |
I'm going to consider the review by @evetion in #316 (review) as resolved by #316 (comment) and the subsequent commits to increase test coverage. |
Improves type stability where possible by keeping enums as types in
Val{wkbLineString}
and by forcing child object types fromgetgeom
.Also adds a low allocationgetpoint
method.Edit: review when I get tests passing, can't always test on a local machine at present