Add explicit MIT license to repo #1

Merged
merged 1 commit into from
Jun 14, 2020


Huite
Contributor

@Huite Huite commented Jun 14, 2020

Very minor PR, mostly an excuse to get in touch!

I was looking at some KDTree implementations, especially one that works well with numba. I found your comment in this issue: numba/numba-scipy#36
... which obviously brought me here!

Some background:
I'm coming from a geospatial use case: I want to do some (2D for now) unstructured mesh regridding, which involves computing the areal overlap between overlapping cells. Most (vector-based) GIS stuff seems awfully slow, and some existing solutions come with massive binary dependencies, so I figured I'd have a look at it myself. A KDTree with a radius query seems like a decent starting point for building a shortlist of possibly overlapping cells. I greatly prefer numba since it's basically seamless with Python and much nicer to distribute than e.g. Cython; the JIT'ing also allows arbitrary area weighting functions at runtime, so it's great all around.
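To make the "shortlist" idea concrete, here is a minimal sketch in plain Python. It is not taken from numba-neighbors or from any GIS library: `bounding_circle` and `candidate_pairs` are hypothetical helpers, cells are assumed to be lists of (x, y) vertices, and the inner loop is a brute-force stand-in for what a KDTree radius query would do.

```python
import math


def bounding_circle(cell):
    """Return (cx, cy, r): the vertex centroid and the distance to the farthest vertex."""
    cx = sum(x for x, _ in cell) / len(cell)
    cy = sum(y for _, y in cell) / len(cell)
    r = max(math.hypot(x - cx, y - cy) for x, y in cell)
    return cx, cy, r


def candidate_pairs(source_cells, target_cells):
    """Shortlist (source, target) index pairs whose bounding circles intersect.

    Two cells can only overlap if their bounding circles do, so every pair
    outside this list can be skipped by the exact area computation.
    Assumption: a KDTree radius query over target centroids would replace
    the brute-force inner loop used here.
    """
    targets = [bounding_circle(c) for c in target_cells]
    pairs = []
    for i, cell in enumerate(source_cells):
        cx, cy, r = bounding_circle(cell)
        for j, (tx, ty, tr) in enumerate(targets):
            if math.hypot(cx - tx, cy - ty) <= r + tr:
                pairs.append((i, j))
    return pairs


source = [[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]]
target = [
    [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)],  # overlaps the source cell
    [(5.0, 5.0), (6.0, 5.0), (6.0, 6.0), (5.0, 6.0)],  # far away
]
shortlist = candidate_pairs(source, target)
```

The shortlist is conservative: it may contain pairs that turn out not to overlap, but it should never miss a pair that does.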

Anyway, numba-neighbors is looking pretty spiffy! I've been looking at these:

And for my ad hoc benchmark (but fairly realistic for the use case), numba-neighbors beats them by a fairly wide margin (> 30% over cKDTree, more so over the others). Might be interesting to redo this analysis at some point: https://jakevdp.github.io/blog/2013/04/29/benchmarking-nearest-neighbor-searches-in-python/

Anyway, my main issue with the other methods is how they return the indices. I don't really want to be stuck with a fixed number of neighbors, so sklearn's query_radius is what I want... except it returns an array of arrays, which probably isn't going to be great for further numba functions to work on. I could copy and edit some Cython stuff, but as mentioned, numba is just much nicer to distribute. So with numba-neighbors it should be pretty easy to write a custom query function for my goals!
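One layout that avoids the array-of-arrays problem is a flat index list plus an offsets list, as used in compressed sparse row (CSR) storage. The sketch below is only an illustration in plain Python; `ragged_to_csr` is a hypothetical helper, not part of sklearn or numba-neighbors, and the same two sequences could be stored as flat integer arrays.

```python
def ragged_to_csr(neighbors):
    """Flatten a ragged list of neighbor lists into (flat, offsets).

    The neighbors of point i are flat[offsets[i]:offsets[i + 1]].
    """
    flat = []
    offsets = [0]
    for row in neighbors:
        flat.extend(row)
        offsets.append(len(flat))
    return flat, offsets


# Three query points with 2, 0 and 3 neighbors respectively.
ragged = [[1, 4], [], [0, 2, 3]]
flat, offsets = ragged_to_csr(ragged)
neighbors_of_2 = flat[offsets[2]:offsets[3]]
```

Both outputs are flat and homogeneous, which is the kind of data that compiled loops tend to handle well.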

Wrapping up: numba-neighbors seems useful, and I'd like to use it. I saw you did put license='MIT' in the setup.py -- but I had to look for it (if only briefly).

@jackd
Owner

jackd commented Jun 14, 2020

Thanks for getting in touch / fixing this up - glad to hear you're finding it useful. This was very much a "I have code I developed for another project, might as well open source it" rather than "let's build a general-purpose KDTree library", and I opened it up in a bit of a rush. There'll no doubt be other things like this, and happy to hear thoughts on the interface. I'm a little tied up with other things at the moment to give this a thorough cleaning, but if you notice anything else dodgy/confusing I'll keep an eye on PRs :).

> So with numba-neighbors it should be pretty easy to write a custom query function for my goals!

Note the BinaryTree class has a query_radius method. It's fairly similar to those other interfaces, except
(a) you need to specify a max_count for pre-allocation purposes (and if you under-estimate it, the returned indices are not necessarily the closest); and
(b) it doesn't sort the returned indices by distance.
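The two caveats can be illustrated with a brute-force sketch. This is not the numba-neighbors implementation or its signature: `query_radius_padded` is a hypothetical function, and the `-1` padding and the separate `counts` output are assumptions made for the example.

```python
import math


def query_radius_padded(points, queries, r, max_count):
    """Brute-force radius query writing into pre-allocated, fixed-width rows.

    Returns (indices, counts): indices[q][:counts[q]] holds the neighbors
    found for query q in visiting order (not sorted by distance); unused
    slots stay at -1. Once a row is full, later points are ignored even
    if they are closer.
    """
    indices = [[-1] * max_count for _ in queries]
    counts = [0] * len(queries)
    for qi, (qx, qy) in enumerate(queries):
        for pi, (px, py) in enumerate(points):
            if counts[qi] == max_count:
                break  # row is full: remaining points are never examined
            if math.hypot(px - qx, py - qy) <= r:
                indices[qi][counts[qi]] = pi
                counts[qi] += 1
    return indices, counts


points = [(3.0, 0.0), (0.1, 0.0), (2.0, 0.0), (1.0, 0.0)]
queries = [(0.0, 0.0)]

# Generous max_count: all three points within r = 2.5 are returned, unsorted.
idx_full, cnt_full = query_radius_padded(points, queries, 2.5, 4)

# Under-estimated max_count: point 3 (distance 1.0) is dropped while
# point 2 (distance 2.0) is kept.
idx_cut, cnt_cut = query_radius_padded(points, queries, 2.5, 2)
```

A count equal to max_count is a hint that the row may have been truncated, so a caller could retry with a larger max_count in that case.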

I originally wrote wrappers that solved each of these issues, but they slowed things down and weren't necessary for the use case I developed it for.

@jackd jackd merged commit c8eca4f into jackd:master Jun 14, 2020