Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance and type stability fixes for geometries #316

Merged
merged 9 commits into from
Oct 23, 2022

Conversation

rafaqz
Copy link
Collaborator

@rafaqz rafaqz commented Aug 5, 2022

Improves type stability where possible by keeping enums as types in Val{wkbLineString} and by forcing child object types from getgeom.

Also adds a low allocation getpoint method.

Edit: review when I get tests passing, can't always test on a local machine at present

@rafaqz rafaqz requested review from evetion and yeesian August 5, 2022 23:27
yeesian
yeesian previously approved these changes Aug 6, 2022
Copy link
Owner

@yeesian yeesian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's come such a long way from #160 congratulations y'all! ❤️

I haven't spotted anything, but I feel more confident in @evetion than my own judgment at this point!

src/geointerface.jl Outdated Show resolved Hide resolved
@rafaqz
Copy link
Collaborator Author

rafaqz commented Aug 11, 2022

The failures are an unrelated issue with GDAL_jll on mac. Otherwise this is good to go.

@yeesian yeesian self-requested a review August 11, 2022 03:45
yeesian
yeesian previously approved these changes Aug 11, 2022
Copy link
Owner

@yeesian yeesian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for bee5cb4 !

@rafaqz
Copy link
Collaborator Author

rafaqz commented Aug 13, 2022

@evetion do you have time to review this?

@evetion
Copy link
Collaborator

evetion commented Aug 14, 2022

@evetion do you have time to review this?

Just got back from holiday, will take a look soon

Copy link
Collaborator

@evetion evetion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely PR! I would only ask to improve the test coverage, to include a few tests for the new preparedgeometry types and the empty (unknown) geometries.

It could also be interesting to sprinkle a few @inferred statements, especially around the getgeom calls?

Last but not least, could you showcase/tell us about performance improvements you noticed to document this PR? I also see you tried your hand at a simple getpoint method, but that didn't work out?

test/test_geometry.jl Outdated Show resolved Hide resolved
test/test_geometry.jl Outdated Show resolved Hide resolved
src/geointerface.jl Show resolved Hide resolved
src/ogr/geometry.jl Show resolved Hide resolved
src/ogr/geometry.jl Outdated Show resolved Hide resolved
test/test_geometry.jl Show resolved Hide resolved
@rafaqz
Copy link
Collaborator Author

rafaqz commented Aug 17, 2022

@evetion getpoint works great, it's just a bit unrelated and less finished, so I will PR and test separately. Either we can use GDAL to get all the points eagerly (allocating a vector) or get them lazily one at a time only allocating one point all up. I need to test which is best.

As for performance improvements this is just from getting rid of all the red and yellow bits in ProfileView.jl rasterising GADM.jl output with Rasters.jl.

The before and after included all the changes to Rasters.jl and DimensionalData.jl as well as I was optimising the whole stack... so I'm not totally clear on total times attributable to ArchGDAL. But the fraction in ArchGDAL.jl shrunk a lot from this PR combined with low-allocating getpoint.

Mostly I think the improvement for my use cases from this PR is type stability in getgeom on polygons, getting a known IGeometry{wkbLineString} (or at least a small union with wkbUnkown). It lets any functions using getgeom compile clean native code instead of dealing with boxed results.

@yeesian yeesian mentioned this pull request Aug 19, 2022
src/ogr/geometry.jl Outdated Show resolved Hide resolved
@rafaqz
Copy link
Collaborator Author

rafaqz commented Sep 13, 2022

I put together some benchmarks of these changes. There is still another 2x improvment (or more) in fixing getgeom, but just type stability gives order of magnitude performance gains for rasterization:

Setup code (using GADM for an easy example, which is using ArchGDAL for loading geometries):

using Rasters, GADM, BenchmarkTools
using Rasters.LookupArrays
kw = (; order=ForwardOrdered(), span=Regular(0.1), sampling=Intervals(Start()), crs=EPSG(4326))
ds = Y(Projected(15.0:0.1:55.0; kw...)), X(Projected(70.0:0.1:140; kw...))
x = GADM.get("CHN")

This branch:

julia> @btime rasterize(x.geom; to=ds, missingval=0, fill=1, boundary=:touches);
  512.476 ms (5890904 allocations: 116.34 MiB)

julia> @benchmark rasterize(x.geom; to=ds, missingval=0, fill=1, boundary=:touches)
BenchmarkTools.Trial: 8 samples with 1 evaluation.
 Range (min  max):  508.071 ms  681.143 ms  ┊ GC (min  max): 0.00%  9.94%
 Time  (median):     675.751 ms               ┊ GC (median):    9.88%
 Time  (mean ± σ):   653.399 ms ±  59.138 ms  ┊ GC (mean ± σ):  8.62% ± 3.44%

  ▁                                                   ▁    ▁▁██
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁████ ▁
  508 ms           Histogram: frequency by time          681 ms <


 Memory estimate: 116.34 MiB, allocs estimate: 5890904.

On master:

julia> @btime rasterize(x.geom; to=ds, missingval=0, fill=1, boundary=:touches);
  7.986 s (29741237 allocations: 939.08 MiB)

julia> @benchmark rasterize(x.geom; to=ds, missingval=0, fill=1, boundary=:touches)
BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min  max):  4.445 s     5.336 s  ┊ GC (min  max): 4.43%  4.82%
 Time  (median):     4.890 s               ┊ GC (median):    4.65%
 Time  (mean ± σ):   4.890 s ± 629.958 ms  ┊ GC (mean ± σ):  4.65% ± 0.27%

  █                                                        █
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  4.44 s         Histogram: frequency by time         5.34 s <

 Memory estimate: 939.08 MiB, allocs estimate: 29741237.

Edit: with this PR and specialised getpoint for wkbLineString

julia> @btime rasterize($x.geom; to=$ds, missingval=0, fill=1, boundary=:touches);
344.444 ms (2964101 allocations: 71.68 MiB)

Edit2 making sure Rasters.jl allways uses the fast path:

julia> @btime rasterize(x.geom; to=ds, missingval=0, fill=1, boundary=:touches)
   216.206 ms (37298 allocations: 27.02 MiB)

At this point rasterize is 95% n PolygonInbounds.jl code, so ArchGDAL seems 2 orders of magnitude faster.

@rafaqz
Copy link
Collaborator Author

rafaqz commented Sep 15, 2022

The abstract types in this PR are getting out of control, because we dont get dimensionality or object type from the enum, we have to use huge Union types to define everything in groups we can dispatch on.

It might be nicer to use generic type parameters to indicate the kinds of objects and if they are 2D, 3D or have M fields, and delete all the Union groups for e.g. all 3d object types, all M object types etc.

So, something like:

abstract type AbstractGeometry{E,T,D,M} end
struct Geometry{E,T,D,M} <: AbstractGeometry{E,T,D,M}
    ....
end
struct IGeometry{E,T,D,M} <: AbstractGeometry{E,T,D,M} 
    ...
end
# With default constructors like this:
Geometry{E}(x) where E = Geometry{W,geomtype(W),dimensionality(W),hasm(W)}(x)

Then, for example, the types would be:

Geometry{wkbLineStringZM,LineString,3,HasM}
Geometry{wkbPoint,Point,2,NoM}

And we can just use dispatch to match any of the type paramters instead of using so many huge Union types.

Any comments on this @yeesian @evetion ?

@yeesian
Copy link
Owner

yeesian commented Sep 16, 2022

I think that sounds reasonable; it feels nice to have reached a point where this doesn't feel like over-engineering. I didn't fully follow the example (E.g. would Geometry{wkbLineStringZM,Polygon,3,NoM} correspond to a valid type?) but it makes sense to have dimensionality, geomtype, etc as type parameters to me.

We might want to think a little more about whether we want to restrain it such the type parameters are independent of each other and all constructed types are valid (versus e.g. allowing for potentially invalid types to be constructed, but having it be easier for implementing the dispatch logic), and the pros and cons of the options, but I think yeah there'll definitely be something which is better than the status quo haha.

@rafaqz
Copy link
Collaborator Author

rafaqz commented Sep 16, 2022

Probably that type shouldn't exist and we could make sure of that in the inner constructor.

We could also not include the enum in the type at all, and generate it when needed.

@yeesian
Copy link
Owner

yeesian commented Oct 4, 2022

We could also not include the enum in the type at all, and generate it when needed.

I think I like this more. How about something like

Geometry{wkbLineStringZM,3}
Geometry{wkbPoint,2}

And we can generate (i) LineString and HasM from (wkbLineStringZM, 3) and (ii) Point and NoM from (wkbPoint,2).

@rafaqz
Copy link
Collaborator Author

rafaqz commented Oct 4, 2022

I actually meant the other way round:

Geometry{LineString,3,HasM}

As wkbLineString has no use in dispatch, so we would still need all the big unions I have defined.

We can get wkbLineString From the other three.

It might be better to merge this as-is when I fix the tests, and think about this separately. I'm not sure.

@yeesian
Copy link
Owner

yeesian commented Oct 5, 2022

I actually meant the other way round

Ah yes sorry you're right!

It might be better to merge this as-is when I fix the tests, and think about this separately. I'm not sure.

Yeah I think it'll be a big change regardless of the outcome of the proposal, and agree about thinking about it separately. Happy with this PR once the tests are fixed

@rafaqz
Copy link
Collaborator Author

rafaqz commented Oct 8, 2022

I think this is good to go.

I found a few existing type-related problems while doing this: flattento2d! and setcoorddim! etc don't actually work because we don't update the type. Even though its and in place function we will have to use the return value in future. I'll make some issues and fix elsewhere.

@rafaqz
Copy link
Collaborator Author

rafaqz commented Oct 10, 2022

It's still saying changes requested. @evetion do you want to check this and approve if its ok?

@yeesian yeesian requested a review from evetion October 13, 2022 02:37
Copy link
Owner

@yeesian yeesian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be good to get the unit test coverage back up -- I've noticed that uncovered lines are historically a rich source of bugs. LGTM otherwise

src/ogr/geometry.jl Outdated Show resolved Hide resolved
src/ogr/geometry.jl Outdated Show resolved Hide resolved
src/ogr/geometry.jl Outdated Show resolved Hide resolved
src/geointerface.jl Outdated Show resolved Hide resolved
src/geointerface.jl Show resolved Hide resolved
src/types.jl Outdated Show resolved Hide resolved
@rafaqz
Copy link
Collaborator Author

rafaqz commented Oct 22, 2022

Ok, made a few changes. Measures handling was really not ready so I wound that back a little. This means GI.m and ismeasured are not tested for the true case as there isn't an obvious way to make measured objects.

I've tested most of the types more extensively for is3d, for type parameters and child type parameters. The missing lines now (besides GI.m and ismeasured) seem to be errors in coverage, like constants counted as lines of code that don't run.

@yeesian
Copy link
Owner

yeesian commented Oct 23, 2022

I'm going to consider the review by @evetion in #316 (review) as resolved by #316 (comment) and the subsequent commits to increase test coverage.

@yeesian yeesian removed the request for review from evetion October 23, 2022 20:14
@yeesian yeesian merged commit f4f1adf into yeesian:master Oct 23, 2022
@yeesian yeesian mentioned this pull request Oct 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants