kd-trees are a compact data structure for answering orthogonal range and nearest neighbor queries on higher dimensional point data in linear time. While they are not as efficient at answering orthogonal range queries as range trees - especially in low dimensions - kdtrees consume exponentially less space, support k-nearest neighbor queries and are relatively cheap to construct. This makes them useful in small to medium dimensions for achieving a modest speed up over a linear scan.
Note that kd-trees are not the best data structure in all circumstances. If you want to do range searching, here is a chart to help you select one which is appropriate for a given dimension:
Dimension | Preferred Data Structure | Complexity | Size |
---|---|---|---|
1 | Binary search tree | O(log(n)) | O(n) |
2-3 | Range tree | O(log^(d-1)(n)) | O(n log^(d-1) (n)) |
Medium | kd-tree | O(d*n^(1-1/d)) | O(n) |
Big | Array | O(n) | O(n) |
And for nearest neighbor searching, here is a survey of some different options:
Dimension | Preferred Data Structure | Complexity | Size |
---|---|---|---|
1 | Binary search tree | O(log(n)) | O(n) |
2 | Voronoi diagram | O(log(n)) | O(n) |
Medium | kd-tree | O(n) (but maybe better) | O(n) |
Big | Array | O(n) | O(n) |
It is also worth mentioning that for approximate nearest neighbor queries or queries with a fixed size radius, grids and locality sensitive hashing are strictly better options. In these charts the transition between "Medium" and "Big" depends on how many points there are in the data structure. As the number of points grows larger, the dimension at which kdtrees become practical goes up.
This module works both in node.js and with browserify.
//Import library
var createKDTree = require("static-kdtree")
//Create a bunch of points
var points = [
[0, 1, 100],
[-5, 0.11, Math.PI],
[0, 10, -13],
// ...
[4, 3, 1]
]
//Create the tree
var tree = createKDTree(points)
//Iterate over all points in the bounding box
tree.range([-1, -1, -1], [10, 1, 2], function(idx) {
console.log("visit:", idx) //idx = index of point in points array
})
//Can also search in spheres
tree.rnn([0,0,0], 10, function(idx) {
console.log("point " + idx + " is in sphere at origin with radius=10")
})
//Nearest neighbor queries
console.log("index of closest point to [0,1,2] is ", tree.nn([0,1,2]))
//And k-nearest neighbor queries
console.log("index of 10 closest points to [0,1,2] are ", tree.knn([0,1,2], 10))
//For performance, be sure to delete tree when you are done with it
tree.dispose()
npm install static-kdtree
var createKDTree = require("static-kdtree")
By convention, let n
denote the number of points and d
denote the dimension of the kdtree.
Creates a kdtree from the given collection of points.
points
is either an array of arrays of lengthd
, or else anndarray
with shape[n,d]
Returns A kdtree data structure
Time Complexity This operation takes O(n log(n))
Restores a serialized kdtree.
data
is a JavaScript object as produced by callingkdt.serialize
Returns A kdtree data structure equivalent to the one which was serialized.
Time Complexity O(n)
The dimension of the tree
The number of points in the tree
Executes an orthogonal range query on the kdtree
lo
is a lower bound on the rangehi
is an upper boundvisit(idx)
is a visitor function which is called once for every point contained in the range[lo,hi]
. Ifvisit(idx)
returns any value!== undefined
, then termination is halted.
Returns The last returned value of visit
Time Complexity O(d*n^(1-1/d) + k)
, where k
is the number of points in the range.
Visit all points contained in the sphere of radius r
centered at point
point
is the center point for the query, represented by a lengthd
arrayradius
is the radius of the query spherevisit(idx)
is a function which is called once for every point contained in the ball. As in the case ofkdt.range
, ifvisit(idx)
returns a not undefined value, then iteration is terminated.
Returns The last returned value of visit
Time Complexity O(n + k)
, where k
is the number of points in the sphere, though perhaps much less than n
depending on the distribution of the points.
Returns the index of the closest point to point
point
is a query pointmaxDistance
is an upper bound on the distance to search for nearest points. DefaultInfinity
Returns The index of the closest point in the tree to point
, or -1
if the tree is empty.
Time Complexity O(n log(n))
in the worst case, but in practice much faster if the points are uniformly distributed.
Returns a list of the k closest points to point in the tree.
point
is the point which is being queriedk
is the number of points to querymaxDistance
bounds the distance of the returned points. Default isInfinity
Returns A list of indices for the k
closest points to point
in the tree
which are within distance < maxDistance
. The indices are ordered by distance to point
.
Time Complexity O((n + k) log(n + k))
, but may be faster if the points in the tree are uniformly distributed
Returns a serializable JSON object encoding the state of the kdtree. This can be passed to deserialize()
to restore the kdtree.
Release all resources associated with the kdtree. This is not necessary, but can reduce garbage collector pressure over time.
Time Complexity O(1)
To test the performance of this module, experiments were performed against two other kdtree libraries (Ubilabs kdtree and node-kdtree), as well as a naive brute force algorithm. Ubilabs kdtree is pure JavaScript, and supports only kNN queries and does not correctly implement rNN queries. node-kdtree is a wrapper over the native C++ library, libkdtree, and only supports rNN and NN queries. Neither library implements range queries. These libraries were tested in node.js 0.10.26 and Chrome 34 on a MacBook Pro, Core i7 2.3GHz with 8GB of RAM. The results from these experiments can be found here:
And the code for these experiments can be found in the bench/ subdirectory of this repository.
Up to 1000 points or so brute force searching is the fastest method for answering any query, so for small data sets it is probably better to not use a kdtree or any data structure in the first place.
The latest version of v8 in Chrome is strictly faster than node.js for all test cases and modules. Because of native C++ dependencies, node-kdtree cannot run in a browser, but even so the Chrome version of static-kdtree is 2-3x faster. static-kdtree is also up to an order of magnitude faster than Ubilabs kdtree at all operations, making it by far the best choice in the browser.
In node.js, the situation is slightly more ambiguous. node-kdtree has the fastest construction time, and also answers 1-nearest neighbor queries faster. Both Ubilabs kdtree and static-kdtree take about the same amount of time on nearest neighbors queries. On all other queries static-kdtree is again strictly faster. It is unclear why the performance of nearest neighbor queries is slightly slower in node.js, but perhaps it may be related to node.js' v8 engine being several versions behind Chrome. In future updates this situation may start to look more like Chrome, making static-kdtree likely to be the better option for long term.
(c) 2014 Mikola Lysenko. MIT License