notebooks: refactor basic - add n-dimensional (#46)

carsonfarmer · Jun 17, 2024 · c451f25
1 parent ee2c8cb
commit c451f25
Showing 3 changed files with 1,178 additions and 463 deletions.
diff --git a/README.md b/README.md
@@ -15,13 +15,27 @@ In the limit of arbitrarily many subsets, each new addition or point moved by a
 all nearest neighbors, to keep the in-degree of each point low, and
 2. When we insert a point, we don't bother updating other points' neighbors.
 
-**Total space:** $20n$ bytes (could be reduced to $4n$ at some cost in update time).
-
-**Time per insertion or single distance update:** $O(n)$.
-
-**Time per deletion or point update:** $O(n)$ expected, $O(n^2)$ worst case.
-
-**Time per closest pair:** $O(n)$.
+<table>
+  <tr>
+    <td align="center" colspan="2"><b>Complexity</b></td>
+  </tr>
+  <tr>
+    <td><i>Total space</i></td>
+    <td>$20n$ bytes (could be reduced to $4n$ at some cost in update time)</td>
+  </tr>
+  <tr>
+    <td><i>Time per insertion or single distance update</i></td>
+    <td>$O(n)$</td>
+  </tr>
+  <tr>
+    <td><i>Time per deletion or point update</i></td>
+    <td>$O(n)$ expected, $O(n^2)$ worst case</td>
+  </tr>
+  <tr>
+    <td><i>Time per closest pair</i></td>
+    <td>$O(n)$</td>
+  </tr>
+</table>
 
 This `Python` version of the algorithm combines ideas and code from the [closest-pair data structure testbed (C++)](https://www.ics.uci.edu/~eppstein/projects/pairs/Source/testbed/) developed around a [series of papers](https://www.ics.uci.edu/~eppstein/projects/pairs/Papers/) by Eppstein *et al.*
 
@@ -62,88 +76,16 @@ pytest -v fastpair --cov fastpair
 
 Currently `fastpair` is tested against Python 3.{10,11,12}.
 
-## Features
-
-In the following examples we use the `random` module to generate data.
-
-```python
-from fastpair import FastPair, interact
-import random
-
-def rand_tuple(dim=2):
-    return tuple([random.random() for _ in range(dim)])
-```
-
-### Basics
-
-The simplest way to use a `FastPair` data-structure is to initialize one and then update it with data points (via the `+=` operator).
-In this first example, we create a sequence of $50 \times 10$ uniform random points and add them to a `FastPair` object:
-
-```python
-points = [rand_tuple(10) for _ in range(50)]
-# Create empty data-structure with `min_points=10` and
-# using a Euclidean distance metric
-fp = FastPair()
-fp.build(points)  # Add points all at once and build conga line to start
-```
-
-You can then add additional points, and start to query the data-structure for
-the closest pair of points. As points are added, the data-structure responds
-and updates accordingly
-(see [this paper](http://dl.acm.org/citation.cfm?id=351829) for details):
-
-```python
-fp += rand_tuple(10)
-fp += rand_tuple(10)
-
-# This is the 'FastPair' algorithm, should be fast for large n
-fp.closest_pair()
-# There is also a brute-force version, can be fast for smaller n
-fp.closest_pair_brute_force()
-```
-
-`FastPair` has several useful properties and methods, including checking the size of the data-structure (i.e., how many points are currently stored), testing for containment of a given point, various methods for computing the closest pair, finding the neighbor of a given point, computing multiple distances at once, and even merging points (clusters):
+## Utilizing `FastPair`
 
-```python
-len(fp)
-rando = rand_tuple(10)
-points[0] in fp  # True
-rando in fp  # False
-fp()  # Compute closest pair
-neigh = fp.find_neighbor(rando)  # Neighbor of 'outside' point
-fp.sdist(rando)  # Compute distances from rando to all points in fp
-```
+The notebooks linked below are designed as interactive, minimal tutorials on working with `fastpair`. They require additional dependencies, which can be installed with:
 
-To illustrate the `merge`ing methods, here is a simple example of hierarchical clustering, treating `points` as the 'centroids' of various clusters:
-
-```python
-for i in range(len(fp)-1):
-    # First method... do it manually:
-    dist, (a, b) = fp.closest_pair()
-    c = interact(a, b)  # Compute mean centroid
-    fp -= b
-    fp -= a
-    fp += c
-    # Alternatively... do it all in one step:
-    # fp.merge_closest()
-len(fp)  # 1
+```bash
+pip install -e .[tests,notebooks]
 ```
 
-Finally, plotting should be pretty obvious to those familiar with `matplotlib` (or other `Python` plotting facilities).
-
-```python
-import matplotlib.pyplot as plt
-
-points = [rand_tuple(2) for _ in range(50)]  # 2D points
-fp = FastPair().build(points)
-dist, (a, b) = fp.closest_pair()
-
-plt.figure()
-plt.scatter(*zip(*fp.points))
-plt.scatter(*zip(a, b), color="red")
-plt.title("Closest pair is {:.2} units apart.".format(dist))
-plt.show()
-```
+* [`basics_usage.iypnb`](https://github.com/carsonfarmer/fastpair/notebooks/basics_usage.iypnb): Understanding the `fastpair` functionality and data structure
+* [`n-dimensional_pointsets`](https://github.com/carsonfarmer/fastpair/notebooks/n-dimensional_pointsets.iypnb): Querying point clouds
 
 ## License