Spatial bench #30

clbarnes · 2024-01-20T15:40:05Z

Crate for benchmarking various spatial lookup crates for our purposes.

Kiddo comes out on top, but will be easier to use once sdd/kiddo#135 is merged. Note that there are some questions about using kiddo in wasm: sdd/kiddo#130

@schlegelp may find this useful! Just run cargo bench from the spatial_bench directory. For each of bosque (fastcore-rs default), kiddo, nabo, and rstar (nblast-rs default), it benchmarks building 1000 trees by augmenting the example data, and running just the spatial query part of 1 000 000 neuron pair lookups. These all run in serial; I have no reason to suspect they'll parallelise differently.
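For readers unfamiliar with what "the spatial query bit" of a neuron pair lookup involves, here is a minimal sketch using only the standard library. It is not the code in spatial_bench: the brute-force `nearest` function is a stand-in for whichever tree backend is being measured, and all names here (`Point`, `dist2`, `nearest`, `pair_query`, `time_pair_queries`) are hypothetical.

```rust
use std::time::Instant;

/// A 3D point; a neuron is treated here as a plain point cloud.
type Point = [f64; 3];

/// Squared Euclidean distance between two points.
fn dist2(a: &Point, b: &Point) -> f64 {
    a.iter().zip(b.iter()).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Brute-force stand-in for a spatial tree's nearest-neighbour lookup.
/// Returns the index of the closest target point and its squared distance,
/// or None if the target is empty.
fn nearest(target: &[Point], query: &Point) -> Option<(usize, f64)> {
    target
        .iter()
        .enumerate()
        .map(|(i, p)| (i, dist2(p, query)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
}

/// The spatial part of one neuron pair: one lookup per query point.
fn pair_query(query: &[Point], target: &[Point]) -> Vec<(usize, f64)> {
    query.iter().filter_map(|q| nearest(target, q)).collect()
}

/// Time `reps` serial repetitions of a pair query; returns elapsed seconds.
fn time_pair_queries(query: &[Point], target: &[Point], reps: usize) -> f64 {
    let start = Instant::now();
    for _ in 0..reps {
        // black_box keeps the optimiser from discarding the unused result.
        std::hint::black_box(pair_query(query, target));
    }
    start.elapsed().as_secs_f64()
}
```

A real benchmark would swap `nearest` for the backend's own query method and build the tree once per neuron, outside the timed loop, so that construction and query costs are reported separately as in the results below.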

clbarnes · 2024-01-20T19:45:55Z

This now changes the default backend to kiddo, which ~halves the total query time. Kiddo has an "approximate" option too, which makes it about 2.5x faster still, but it is so approximate that it throws the results a long way off.

There are also some ergonomic changes, including some renames and a usage example up front.

schlegelp · 2024-01-23T15:53:43Z

Very cool, thanks. When I run it on my machine, kiddo and bosque are very close together:

Running benches/spatial.rs (/Users/philipps/Google Drive/Cloudbox/Github/nblast-rs/target/release/deps/spatial-c86081616a09345a)
Gnuplot not found, using plotters backend
construction/bosque     time:   [26.221 ms 26.567 ms 26.976 ms]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
construction/kiddo      time:   [30.892 ms 31.108 ms 31.333 ms]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
construction/nabo       time:   [38.613 ms 39.127 ms 39.723 ms]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
Benchmarking construction/rstar: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 19.9s, or reduce sample count to 20.
construction/rstar      time:   [194.42 ms 195.60 ms 197.01 ms]
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe

Benchmarking pairwise query/bosque: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 25.3s, or reduce sample count to 10.
pairwise query/bosque   time:   [253.83 ms 254.54 ms 255.40 ms]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe
Benchmarking pairwise query/kiddo: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 23.2s, or reduce sample count to 20.
pairwise query/kiddo    time:   [234.47 ms 237.50 ms 241.81 ms]
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe
Benchmarking pairwise query/nabo: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 34.0s, or reduce sample count to 10.
pairwise query/nabo     time:   [339.43 ms 342.67 ms 347.12 ms]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe
Benchmarking pairwise query/rstar: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 60.1s, or reduce sample count to 10.
pairwise query/rstar    time:   [602.08 ms 609.06 ms 618.91 ms]
Found 16 outliers among 100 measurements (16.00%)
  7 (7.00%) high mild
  9 (9.00%) high severe

Does this look similar to your results?

clbarnes · 2024-01-23T16:04:43Z

From my work laptop:

     Running benches/spatial.rs (/home/barnesc/work/code/nblast-rs/target/release/deps/spatial-6ca8776d186ae63e)
construction/bosque     time:   [19.157 ms 19.439 ms 19.753 ms]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
construction/kiddo      time:   [27.148 ms 27.457 ms 27.798 ms]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
construction/nabo       time:   [27.608 ms 27.672 ms 27.744 ms]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
Benchmarking construction/rstar: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 12.4s, or reduce sample count to 40.
construction/rstar      time:   [121.95 ms 122.27 ms 122.63 ms]
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

Benchmarking pairwise query/bosque: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 15.7s, or reduce sample count to 30.
pairwise query/bosque   time:   [153.20 ms 153.80 ms 154.50 ms]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
Benchmarking pairwise query/kiddo: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 10.8s, or reduce sample count to 40.
pairwise query/kiddo    time:   [104.89 ms 105.13 ms 105.42 ms]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
Benchmarking pairwise query/nabo: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 24.9s, or reduce sample count to 20.
pairwise query/nabo     time:   [245.36 ms 246.06 ms 246.88 ms]
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe
Benchmarking pairwise query/rstar: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 63.5s, or reduce sample count to 10.
pairwise query/rstar    time:   [626.23 ms 628.63 ms 631.30 ms]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

Kiddo takes about 2/3 the time that bosque does for the queries. I saw something similar on my desktop at home. Are you on Apple silicon? Bosque optimises hard for cache locality AFAICT, so maybe different CPUs' cache sizes/strategies are a major determinant here.
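The "about 2/3" figure can be checked against the two logs above. The sketch below takes the middle value of each criterion interval for the pairwise query benchmarks and divides kiddo by bosque; the function names are hypothetical and the timings are copied from the two runs in this thread.

```rust
/// Ratio of two benchmark timings given in the same units.
fn time_ratio(candidate_ms: f64, baseline_ms: f64) -> f64 {
    candidate_ms / baseline_ms
}

/// kiddo / bosque for the pairwise query benchmark, using the middle
/// value of each reported interval from the two runs in this thread.
fn kiddo_over_bosque() -> [(&'static str, f64); 2] {
    [
        ("clbarnes, work laptop", time_ratio(105.13, 153.80)),
        ("schlegelp's machine", time_ratio(237.50, 254.54)),
    ]
}
```

On these numbers the ratio is roughly 0.68 on the work laptop and roughly 0.93 on schlegelp's machine, which is the gap being discussed.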

If they're about the same I'd lean heavily towards using kiddo; bosque would require a major refactor and force more of the logic into the arena type.

schlegelp · 2024-01-23T16:26:31Z

Interesting. I benched on x86 (2.2 GHz 6-Core Intel Core i7).

Not sure if that's even an issue, but my understanding is that bosque modifies the point clouds in place, i.e. you can set things up such that no re-indexed copies or metadata for the built tree need to be kept in memory. Does kiddo come with a memory overhead?

clbarnes · 2024-01-23T17:02:36Z

There is some memory overhead with kiddo but it's pretty minimal. Kiddo copies the coordinates on creation, but at present you can't iterate over the points inside it, so you need to keep the original copy around for doing point matches. I have a PR for this at sdd/kiddo#135 (the equivalent PR for nabo, enlightware/nabo-rs#3, was merged earlier today).

As memory scales with N and slowness of all-to-all scales with N^2, I'm more inclined to seek speed boosts. For the memory issue, kiddo's zero-copy serialisation form is also useful (it wouldn't be hard to apply the same with bosque, of course).
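To make the scaling argument concrete, here is a small sketch of both quantities. It assumes 3D points stored as f64 and counts ordered pairs including self-pairs; neither assumption comes from the nblast-rs code, and the function names are hypothetical.

```rust
/// Bytes needed for the raw coordinates of `n_neurons` point clouds,
/// each with `points_per_neuron` 3D points stored as f64 (an assumption).
fn coord_bytes(n_neurons: u64, points_per_neuron: u64) -> u64 {
    n_neurons * points_per_neuron * 3 * std::mem::size_of::<f64>() as u64
}

/// Number of ordered neuron pairs in an all-to-all comparison,
/// counting self-pairs.
fn all_to_all_pairs(n_neurons: u64) -> u64 {
    n_neurons * n_neurons
}
```

Doubling the number of neurons doubles `coord_bytes` but quadruples `all_to_all_pairs`, so for large N the query time dominates long before the coordinate storage does.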

clbarnes · 2024-01-23T18:36:30Z

It turns out implementing a bosque neuron wasn't as refactor-y as I thought it would be, so I've got a branch with that in and you'll be able to select that backend if it's preferred! That will be the default for the wasm package as kiddo still has some issues there.

sdd · 2024-02-17T16:24:10Z

Hey Chris. I've merged your iteration PR in and have just released it as part of Kiddo v4.1.0 - thanks again!

It really puts a smile on my face to see Kiddo performing so well in your benchmarks here - things like this make all the effort worthwhile 😊

I'm gonna see if I can sort the WASM issues out for you now as well.

clbarnes · 2024-02-17T16:33:57Z

Thank you for taking an interest! We're not sure what the special sauce is, but our data is different to a lot of point clouds - samples in a branching tree structure with long near-linear regions, which I imagine don't fit very well into an R-tree's partitioning. Whatever's going on under the hood in kiddo, it seems to work for us :)

clbarnes · 2024-02-17T17:25:20Z

Using kiddo 4.1 to iterate through the tree's points directly means we no longer have to store the points twice for tree-to-tree lookups, which I'd estimate cuts the RAM usage by around 30%. We lose a little performance because the point iteration gets a little more complicated: there are some data copies and lookups into the tangent_alpha vec. In the raw spatial bench this is up to 10%, but once you roll in the rest of the NBLAST algorithm it's only 1-5%, which I think is acceptable.

test bench_all_to_all_serial_bosque         ... bench:  67,620,596 ns/iter (+/- 3,305,363)
test bench_all_to_all_serial_kiddo          ... bench:  55,202,232 ns/iter (+/- 9,325,367)
test bench_all_to_all_serial_nabo           ... bench: 110,531,114 ns/iter (+/- 7,601,588)
test bench_all_to_all_serial_rstar          ... bench: 108,958,262 ns/iter (+/- 5,820,787)

test bench_construction_bosque              ... bench:     532,666 ns/iter (+/- 24,725)
test bench_construction_kiddo               ... bench:     565,843 ns/iter (+/- 41,343)
test bench_construction_nabo                ... bench:     458,719 ns/iter (+/- 23,585)
test bench_construction_rstar               ... bench:     816,910 ns/iter (+/- 33,603)

test bench_query_bosque                     ... bench:     196,491 ns/iter (+/- 38,031)
test bench_query_kiddo                      ... bench:     161,415 ns/iter (+/- 19,431)
test bench_query_nabo                       ... bench:     284,113 ns/iter (+/- 18,260)
test bench_query_rstar                      ... bench:     331,206 ns/iter (+/- 21,634)
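The storage saving described in this comment comes from letting the index itself be the only holder of the coordinates. The sketch below illustrates that pattern with a hypothetical `PointIndex` type; it is not kiddo's API, and the lookup is brute force purely to keep the example self-contained.

```rust
type Point = [f64; 3];

/// Squared Euclidean distance between two points.
fn dist2(a: &Point, b: &Point) -> f64 {
    a.iter().zip(b.iter()).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical stand-in for a spatial tree that owns its points.
/// A real backend would answer `nearest` with a k-d tree, not a scan.
struct PointIndex {
    points: Vec<Point>,
}

impl PointIndex {
    /// Take ownership of the points; no second copy is kept by the caller.
    fn new(points: Vec<Point>) -> Self {
        Self { points }
    }

    /// Iterate over the stored points with their original indices, which
    /// can be used to look up per-point data held in a separate vector.
    fn iter(&self) -> impl Iterator<Item = (usize, &Point)> + '_ {
        self.points.iter().enumerate()
    }

    /// Index of and squared distance to the stored point closest to `query`.
    fn nearest(&self, query: &Point) -> Option<(usize, f64)> {
        self.iter()
            .map(|(i, p)| (i, dist2(p, query)))
            .min_by(|a, b| a.1.total_cmp(&b.1))
    }
}

/// Tree-to-tree lookup: for each point stored in `query`, find its nearest
/// neighbour in `target`. Returns (query index, target index, squared distance).
fn tree_to_tree(query: &PointIndex, target: &PointIndex) -> Vec<(usize, usize, f64)> {
    query
        .iter()
        .filter_map(|(qi, p)| target.nearest(p).map(|(ti, d2)| (qi, ti, d2)))
        .collect()
}
```

Because `tree_to_tree` reads the query points straight out of the query index, each neuron needs only one copy of its coordinates. The indices in the result are what would be used for side lookups such as the tangent_alpha vec mentioned above.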

sdd · 2024-02-17T17:48:09Z

That's great! Also, the WASM fix was trivial in the end and is now released as 4.1.1 :-)

Just having a read through the NBLAST paper, out of curiosity. Looks fascinating stuff.

clbarnes added 2 commits January 20, 2024 16:57

spatial benchmarks wip

37b5870

Add benchmarks for spatial trees

9f15d70

clbarnes force-pushed the spatial_bench branch from f395a7a to 9f15d70 January 20, 2024 16:57

clbarnes added 2 commits January 20, 2024 18:34

Approx variant for kiddo

f0d2377

Replace rstar with kiddo

c357420

clbarnes added 6 commits January 20, 2024 19:46

Remove unused error paths

9613db3

clippy

71cd93d

refer to scorematrixbuilder in code example

97f9da3

Switch back to rstar for nblast-js

74dd242

remove unnecessary expects

19261d7

Re-add error handling for neuron creation

30b094f

clbarnes merged commit a1b05d2 into master Jan 20, 2024
20 checks passed
