Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return alpha option & use pygeos for alphashaping if available #278

Merged
merged 13 commits into from
Apr 9, 2020

Conversation

ljwolf
Copy link
Member

@ljwolf ljwolf commented Apr 6, 2020

This provides options to return the alpha radius from alpha_shape_auto, as well as the detected bounding disks.

import libpysal, geopandas, matplotlib.pyplot as plt, numpy
numpy.random.seed(2478879)
testpoints = numpy.random.normal(size=(100,2))
testalpha, testradius, testx = libpysal.cg.alpha_shapes.alpha_shape_auto(testpoints, return_circles=True)
ax = geopandas.GeoDataFrame(geometry=[testalpha]).plot()
ax.scatter(*numpy.row_stack(testalpha.boundary.coords).T, marker='.', color='k')
[ax.add_patch(plt.Circle(c, radius=testradius, edgecolor='r', facecolor='none')) for c in testx]
plt.scatter(*numpy.row_stack(testx).T, color='r', marker='x')

ashape_bounding

It still needs

  • confirmation that it "works" for n < 4 (edit: now has decent behavior in these cases. When n=3, returns the alpha shape (a triangle) that triangle's radius, and the circles defined by each pair of boundary points. When n=2, it returns the empty hull, a radius of half the distance between the points, and the circle with those points as diameters. When n=1, it returns the point, zero, and the point.)
  • tests

I keep with the rest of the package using radius to refer to the radius of the circles, alpha to refer to 1/radius, and circle (not disk) to refer to the shapes defining the alpha_shape. Strictly speaking, Edselbrunner et al use alpha to mean radius itself, since when alpha = infty, the alpha shape is the convex hull (the space bounded by circles with infinite radius). Two points are "alpha-neighbors" when they admit a circle of radius alpha with no other points. etc. But, since we're consistent, I think this is fine.

In theory, I suppose this should be exposed for the individual alpha_shape method, but since alpha is specified in that case, it's not very hard to do:

ahull = alpha_shape(points, alpha)
circles = construct_bounding_circles(ahull, 1/alpha)

@codecov
Copy link

codecov bot commented Apr 6, 2020

Codecov Report

Merging #278 into master will increase coverage by 0.07%.
The diff coverage is 84.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #278      +/-   ##
==========================================
+ Coverage   80.99%   81.06%   +0.07%     
==========================================
  Files         115      115              
  Lines       11580    11671      +91     
==========================================
+ Hits         9379     9461      +82     
- Misses       2201     2210       +9     
Impacted Files Coverage Δ
libpysal/cg/alpha_shapes.py 85.00% <80.37%> (-1.67%) ⬇️
libpysal/cg/tests/test_ashapes.py 96.49% <100.00%> (+1.03%) ⬆️
libpysal/weights/tests/test_weights.py 99.65% <0.00%> (+<0.01%) ⬆️
libpysal/weights/weights.py 78.92% <0.00%> (+0.90%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 14c7509...978a5e6. Read the comment docs.

Copy link
Member

@jGaboardi jGaboardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very promising!

libpysal/cg/alpha_shapes.py Outdated Show resolved Hide resolved
libpysal/cg/alpha_shapes.py Show resolved Hide resolved
libpysal/cg/alpha_shapes.py Outdated Show resolved Hide resolved
libpysal/cg/alpha_shapes.py Show resolved Hide resolved
@jGaboardi jGaboardi self-requested a review April 6, 2020 22:35
@darribas
Copy link
Member

darribas commented Apr 7, 2020

This looks good to me too. My only question would be: have you done any check on whether performance when return_disks=False deteriorates? I'm OK with minor things but if it's a notable one, it might be worth rethinking how we implement this. Could you provide a couple of timing experiments?

@ljwolf
Copy link
Member Author

ljwolf commented Apr 8, 2020

have you done any check on whether performance with return_disks=False deteriorates?

Sure, I can run timing experiments now. But, for what it's worth I think it's fine for two reasons.

  1. For any hull, the number of points on the boundary is usually quite small compared to the number of points. The alpha shape is no different. This code only computes the limiting circles when the hull is completed, so we're only using the boundary points to build the limiting circles.
  2. On @jGaboardi's suggestion, the code to build the limiting circles now uses jit(nopython=True) and has no for loops nor branching internally, so it should be quite fast.

rethinking how we implement this

There isn't really another way to implement this from an algorithmic perspective.

Solely for the API/UX, we could only support building the circles in a second call using the constructor I outlined in the first post, construct_bounding_circles(alpha_shape, radius). But, if we did, we would still need to have return_radius in alpha_shape_auto. I don't think forcing two calls here is a good design decision from a UX perspective... numpy has many return_xxx options for functions to do similar things as this. Further, circle construction is not on by default, so there is no performance penalty there, either.

@ljwolf ljwolf closed this Apr 8, 2020
@ljwolf ljwolf reopened this Apr 8, 2020
@ljwolf
Copy link
Member Author

ljwolf commented Apr 8, 2020

whoops did not mean to close!

@ljwolf
Copy link
Member Author

ljwolf commented Apr 9, 2020

performance

size

For point 1, the average number of points on the alpha-hull of a pointcloud with 100k points (normally-distrbuted, though) is around 70.
alpha_shape_boundary_size

Weirder shapes will see this flex a little, but it should remain approximately logarithmic. So, we're talking very few points on the boundary compared to the number of triangles constructed elsewhere. Run time will always be dominated by constructing the delaunay triangulation/alpha shape

time

n points time with circles time without circles difference significant?
100 33.3 ms +/- 2.37 μs 33.2 ms +/- 31 μs +.01 ms (+.3%) ☑️
1000 437 ms +/- 30.7 ms 440ms +/- 30.7 ms - 3 ms (-.06%)
10000 8.22s +/- 555 ms 7.81 s +/- 104 ms +.41 s (+5%)
100000 105s +/- 2.29 s 101s +/- 1.21 s +4 s (+3%)

line profile at n=100000

nearly all of the time spent is in:

  1. verifying whether the hull is "valid" (87.4%)
  2. constructing the alpha shape itself (11.2%)
  3. building shapely points (.8%)
  4. delaunay triangulation (.3%)

it would be much more productive to see if we can avoid shapely & use numba for point-in-polygon checks (nearly 90% of runtime) than to worry about the extra 200 microseconds for circle construction (less than .1% of runtime)

Timer unit: 1e-06 s

Total time: 159.808 s
File: /home/lw17329/Dropbox/dev/libpysal/libpysal/cg/alpha_shapes.py
Function: alpha_shape_auto at line 526

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   526                                           @requires('geopandas', 'shapely')
   527                                           def alpha_shape_auto(xys, step=1, verbose=False,
   528                                                                return_radius=False, return_circles=False):
   529                                               '''
   530                                               [DOCSTRING ELIDED]
   580                                               '''
   581         1          8.0      8.0      0.0      if not HAS_JIT:
   582                                                   warn(NUMBA_WARN)
   583         1         45.0     45.0      0.0      from shapely import geometry as geom
   584         1          6.0      6.0      0.0      if return_circles:
   585         1          5.0      5.0      0.0          return_radius = True
   586         1         10.0     10.0      0.0      if xys.shape[0] < 4:
   587                                                   from shapely import ops
   588                                                   if xys.shape[0] == 3:
   589                                                       multipoint = ops.cascaded_union([geom.Point(xy)
   590                                                                                        for xy in xys])
   591                                                       alpha_shape = multipoint.convex_hull.buffer(0)
   592                                                   else:
   593                                                       alpha_shape = geom.Polygon([])
   594                                                   if xys.shape[0] == 1:
   595                                                       if return_radius:
   596                                                           if return_circles:
   597                                                               out = [alpha_shape, 0, alpha_shape]
   598                                                           return alpha_shape, 0
   599                                                       return alpha_shape
   600                                                   elif xys.shape[0] == 2:
   601                                                       if return_radius:
   602                                                           r = spat.distance.euclidean(xys[0], xys[1])/2
   603                                                           if return_circles:
   604                                                               circle = _construct_centers(xys[0], xys[1], r)
   605                                                               return [alpha_shape, r, circle]
   606                                                           return [alpha_shape, r]
   607                                                       return alpha_shape
   608                                                   elif return_radius:  # this handles xys.shape[0] == 3
   609                                                       radius = r_circumcircle_triangle_single(xys[0], xys[1], xys[2])
   610                                                       if return_circles:
   611                                                           circles = construct_bounding_circles(alpha_shape, radius)
   612                                                           return [alpha_shape, radius, circles]
   613                                                       return [alpha_shape, radius]
   614                                                   return alpha_shape
   615         1     552954.0 552954.0      0.3      triangulation = spat.Delaunay(xys)
   616         1       7084.0   7084.0      0.0      triangles = xys[triangulation.simplices]
   617         1          5.0      5.0      0.0      a_pts = triangles[:, 0, :]
   618         1          2.0      2.0      0.0      b_pts = triangles[:, 1, :]
   619         1          1.0      1.0      0.0      c_pts = triangles[:, 2, :]
   620         1      21852.0  21852.0      0.0      radii = r_circumcircle_triangle(a_pts, b_pts, c_pts)
   621         1        106.0    106.0      0.0      radii[np.isnan(radii)] = 0  # "Line" triangles to be kept for sure
   622         1          2.0      2.0      0.0      del triangles, a_pts, b_pts, c_pts
   623         1      12904.0  12904.0      0.0      radii_sorted_i = radii.argsort()
   624         1       3130.0   3130.0      0.0      triangles = triangulation.simplices[radii_sorted_i][::-1]
   625         1        491.0    491.0      0.0      radii = radii[radii_sorted_i][::-1]
   626         1     384223.0 384223.0      0.2      geoms_prev = alpha_geoms((1/radii.max())-EPS, triangles, radii, xys)
   627         1    1244812.0 1244812.0      0.8      points = [geom.Point(pnt) for pnt in xys]
   628         1          2.0      2.0      0.0      if verbose:
   629                                                   print('Step set to %i' % step)
   630        47         51.0      1.1      0.0      for i in range(0, len(radii), step):
   631        47         88.0      1.9      0.0          radi = radii[i]
   632        47        220.0      4.7      0.0          alpha = (1 / radi) - EPS
   633        47         42.0      0.9      0.0          if verbose:
   634                                                       print('%.2f%% | Trying a = %f'
   635                                                             % ((i+1)/radii.shape[0], alpha))
   636        47   17872101.0 380257.5     11.2          geoms = alpha_geoms(alpha, triangles, radii, xys)
   637        47  139706650.0 2972481.9     87.4          if _valid_hull(geoms, points):
   638        46        747.0     16.2      0.0              geoms_prev = geoms
   639        46         83.0      1.8      0.0              radi_prev = radi
   640                                                   else:
   641         1          2.0      2.0      0.0              break
   642         1          1.0      1.0      0.0      if verbose:
   643                                                   print(geoms_prev.shape)
   644         1          1.0      1.0      0.0      if return_radius:
   645         1         20.0     20.0      0.0          out = [geoms_prev[0], radi_prev]
   646         1          1.0      1.0      0.0          if return_circles:
   647         1        167.0    167.0      0.0              out.append(construct_bounding_circles(out[0], radi_prev))
   648         1          1.0      1.0      0.0          return out
   649                                               # Return a shapely polygon
   650                                               return geoms_prev[0]

@ljwolf
Copy link
Member Author

ljwolf commented Apr 9, 2020

For comparison, the timing table with pygeos:

n points time with circles time without circles diff. w/ circles diff. with pygeos
100 26.6 ms +/- 1.68 ms 25.7 ms +/- 1.14 ms +.9 ms (+3%) -8 ms (-22%)
1000 278 ms +/- 9.62 ms 270 ms +/- 12.5 ms +8 ms (+9%) -159 ms (-36%)
10000 3.67 s +/- 163 ms 3.53s +/- 32.7 ms +.14 s (+3%) -4.55 s (-55%)
100000 29.7 s +/- 731 ms 30.2 s +/- 994 ms -.5 s (-1%) -75.3 s (-71%)

the new lprun suggests that w/ pygeos, the breakdown is:

  1. verifying whether the hull is "valid" (51.5%)
  2. constructing the alpha shape itself (45.0% + 1.3% for first construction)
  3. delaunay triangulation (1.9%)

circle construction is still <.1% of total computation time.

@ljwolf ljwolf changed the title Return alpha option Return alpha option & use pygeos for alphashaping if available Apr 9, 2020
@darribas
Copy link
Member

darribas commented Apr 9, 2020

This is amazing!!! Super good report too! I think since it does not seem to affect much, and it's a nice feature, we can pull it in. The pygeos speed up is awesome!

Very minor thing for consideration: maybe better to add a gpkg file for the tests, instead of a shapefile?

@ljwolf
Copy link
Member Author

ljwolf commented Apr 9, 2020

I've migrated the data in the alpha shape tests to use geopackages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants