Spatial bench #30

Merged
merged 10 commits into from
Jan 20, 2024

clbarnes
Owner

@clbarnes clbarnes commented Jan 20, 2024

Crate for benchmarking various spatial lookup crates for our purposes.

Kiddo comes out on top, but will be easier to use once sdd/kiddo#135 is merged. Note that there are some questions about using kiddo in wasm: sdd/kiddo#130

@schlegelp may find this useful! Just run cargo bench from the spatial_bench directory. For each of bosque (fastcore-rs default), kiddo, nabo, and rstar (nblast-rs default), it benchmarks building 1000 trees by augmenting the example data, then running just the spatial query part of 1 000 000 neuron pair lookups. These all run in serial; I have no reason to suspect they'll parallelise differently.
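For readers unfamiliar with what "the spatial query bit" of a neuron pair lookup involves, here is a minimal sketch, assuming it amounts to finding, for each point of one neuron, the nearest point of another. This is a brute-force baseline using only the standard library, not the code in spatial_bench; the names `Point`, `squared_distance` and `nearest_brute_force` are hypothetical. The benchmarked crates answer the same question with a spatial index instead of a linear scan.

```rust
/// A 3D point, e.g. one sample along a neuron (hypothetical alias).
type Point = [f64; 3];

/// Squared Euclidean distance between two points.
fn squared_distance(a: &Point, b: &Point) -> f64 {
    a.iter().zip(b.iter()).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// For each point in `query`, find the index of and squared distance to its
/// nearest point in `target` by scanning every target point.
/// Yields `None` for every query point if `target` is empty.
fn nearest_brute_force(query: &[Point], target: &[Point]) -> Vec<Option<(usize, f64)>> {
    query
        .iter()
        .map(|q| {
            target
                .iter()
                .enumerate()
                .map(|(i, t)| (i, squared_distance(q, t)))
                .min_by(|a, b| a.1.total_cmp(&b.1))
        })
        .collect()
}

fn main() {
    // Made-up coordinates, purely for illustration.
    let target = vec![[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 5.0, 5.0]];
    let query = vec![[0.9, 0.1, 0.0], [4.0, 4.0, 4.0]];
    for (q, hit) in query.iter().zip(nearest_brute_force(&query, &target)) {
        println!("{:?} -> {:?}", q, hit);
    }
}
```

The linear scan costs one distance computation per target point for every query point, which is the work a k-d tree or R-tree is meant to avoid.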

@clbarnes
Owner Author

clbarnes commented Jan 20, 2024

This now changes the default backend to kiddo, which ~halves the total query time. Kiddo has an "approximate" option too, which makes it about 2.5x faster still, but it is so approximate that it throws the results a long way off.

There are also some ergonomic changes, including some renames and a usage example up front.

@clbarnes clbarnes merged commit a1b05d2 into master Jan 20, 2024
20 checks passed
@schlegelp

Very cool, thanks. When I run it on my machine kiddo and bosque are very close together:

Running benches/spatial.rs (/Users/philipps/Google Drive/Cloudbox/Github/nblast-rs/target/release/deps/spatial-c86081616a09345a)
Gnuplot not found, using plotters backend
construction/bosque     time:   [26.221 ms 26.567 ms 26.976 ms]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
construction/kiddo      time:   [30.892 ms 31.108 ms 31.333 ms]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
construction/nabo       time:   [38.613 ms 39.127 ms 39.723 ms]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
Benchmarking construction/rstar: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 19.9s, or reduce sample count to 20.
construction/rstar      time:   [194.42 ms 195.60 ms 197.01 ms]
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe

Benchmarking pairwise query/bosque: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 25.3s, or reduce sample count to 10.
pairwise query/bosque   time:   [253.83 ms 254.54 ms 255.40 ms]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe
Benchmarking pairwise query/kiddo: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 23.2s, or reduce sample count to 20.
pairwise query/kiddo    time:   [234.47 ms 237.50 ms 241.81 ms]
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe
Benchmarking pairwise query/nabo: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 34.0s, or reduce sample count to 10.
pairwise query/nabo     time:   [339.43 ms 342.67 ms 347.12 ms]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe
Benchmarking pairwise query/rstar: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 60.1s, or reduce sample count to 10.
pairwise query/rstar    time:   [602.08 ms 609.06 ms 618.91 ms]
Found 16 outliers among 100 measurements (16.00%)
  7 (7.00%) high mild
  9 (9.00%) high severe

Does this look similar to your results?

@clbarnes
Owner Author

clbarnes commented Jan 23, 2024

From my work laptop:

     Running benches/spatial.rs (/home/barnesc/work/code/nblast-rs/target/release/deps/spatial-6ca8776d186ae63e)
construction/bosque     time:   [19.157 ms 19.439 ms 19.753 ms]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
construction/kiddo      time:   [27.148 ms 27.457 ms 27.798 ms]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
construction/nabo       time:   [27.608 ms 27.672 ms 27.744 ms]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
Benchmarking construction/rstar: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 12.4s, or reduce sample count to 40.
construction/rstar      time:   [121.95 ms 122.27 ms 122.63 ms]
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

Benchmarking pairwise query/bosque: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 15.7s, or reduce sample count to 30.
pairwise query/bosque   time:   [153.20 ms 153.80 ms 154.50 ms]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
Benchmarking pairwise query/kiddo: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 10.8s, or reduce sample count to 40.
pairwise query/kiddo    time:   [104.89 ms 105.13 ms 105.42 ms]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
Benchmarking pairwise query/nabo: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 24.9s, or reduce sample count to 20.
pairwise query/nabo     time:   [245.36 ms 246.06 ms 246.88 ms]
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe
Benchmarking pairwise query/rstar: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 63.5s, or reduce sample count to 10.
pairwise query/rstar    time:   [626.23 ms 628.63 ms 631.30 ms]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

Kiddo takes about 2/3 the time that bosque does for the queries. I saw something similar on my desktop at home. Are you on Apple silicon? Bosque optimises hard for cache locality AFAICT, so maybe different CPUs' cache sizes/strategies are a major determinant here.
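To make the comparison between the two machines concrete, the following sketch divides kiddo's pairwise query time by bosque's for each run. It uses the middle (point estimate) values copied from the two outputs above; the helper name `ratio` is hypothetical.

```rust
/// Ratio of kiddo's query time to bosque's; below 1.0 means kiddo is faster.
fn ratio(kiddo_ms: f64, bosque_ms: f64) -> f64 {
    kiddo_ms / bosque_ms
}

fn main() {
    // Point estimates (ms) for "pairwise query" from the two runs above.
    let schlegelp_run = ratio(237.50, 254.54);
    let clbarnes_laptop = ratio(105.13, 153.80);
    println!("schlegelp's run:  kiddo/bosque = {:.2}", schlegelp_run);
    println!("clbarnes' laptop: kiddo/bosque = {:.2}", clbarnes_laptop);
}
```

This gives roughly 0.93 for the first run and 0.68 for the second, matching "very close together" and "about 2/3" respectively.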

If they're about the same I'd lean heavily towards using kiddo; bosque would require a major refactor and force more of the logic into the arena type.

@schlegelp

Interesting. I benched on x86 (2.2 GHz 6-Core Intel Core i7).

Not sure if that's even an issue, but my understanding is that bosque modifies the point clouds in place, i.e. you can set things up such that there are no re-indexed copies or metadata for the built tree that need to be kept in memory. Does kiddo come with a memory overhead?

@clbarnes
Owner Author

There is some memory overhead with kiddo but it's pretty minimal. Kiddo copies the coordinates on creation, but at present you can't iterate over the points inside it, so you need to keep the original copy around for doing point matches. I have a PR for this at sdd/kiddo#135 (the equivalent PR for nabo, enlightware/nabo-rs#3, was merged earlier today).

As memory scales with N and the run time of all-to-all scales with N^2, I'm more inclined to seek speed boosts. For the memory issue, kiddo's zero-copy serialisation form is also useful (it wouldn't be hard to apply the same with bosque, of course).
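The scaling argument can be put in numbers with a small sketch. It assumes one stored tree per neuron and counts ordered pairs including self-comparisons, which is consistent with the 1000 trees and 1 000 000 pair lookups in the benchmark description; the function name `all_to_all_pairs` is hypothetical.

```rust
/// Number of ordered neuron pairs in an all-to-all comparison of `n` neurons,
/// counting self-comparisons.
fn all_to_all_pairs(n: u64) -> u64 {
    n * n
}

fn main() {
    // Stored trees grow linearly with n; pair lookups grow quadratically.
    for n in [1_000u64, 10_000, 100_000] {
        println!(
            "{:>7} neurons stored -> {:>14} pair lookups",
            n,
            all_to_all_pairs(n)
        );
    }
}
```

A tenfold increase in neurons means ten times the stored data but a hundred times the lookups, which is why query speed dominates at scale.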

@clbarnes
Owner Author

clbarnes commented Jan 23, 2024

It turns out implementing a bosque neuron wasn't as refactor-y as I thought it would be, so I've got a branch with that in and you'll be able to select that backend if it's preferred! That will be the default for the wasm package as kiddo still has some issues there.

@sdd

sdd commented Feb 17, 2024

Hey Chris. I've merged your iteration PR in and have just released it as part of Kiddo v4.1.0 - thanks again!

It really puts a smile on my face to see Kiddo performing so well in your benchmarks here - things like this make all the effort worthwhile 😊

I'm gonna see if I can sort the WASM issues out for you now as well.

@clbarnes
Owner Author

Thank you for taking an interest! We're not sure what the special sauce is, but our data is different to a lot of point clouds: samples in a branching tree structure with long near-linear regions, which I imagine don't fit very well into an R-tree's partitioning. Whatever's going on under the hood in kiddo, it seems to work for us :)

@clbarnes
Owner Author

clbarnes commented Feb 17, 2024

Using kiddo 4.1 to iterate through the tree's points directly means we save on having to store the points twice for tree-to-tree lookups, which I'd estimate cuts the RAM usage by around 30%. We lose a little performance because the point iteration gets a little more complicated, with some data copies and lookups into the tangent_alpha vec. In the raw spatial bench this costs up to 10%, but once you roll in the rest of the NBLAST algorithm it's only 1-5%, which I think is acceptable.

test bench_all_to_all_serial_bosque         ... bench:  67,620,596 ns/iter (+/- 3,305,363)
test bench_all_to_all_serial_kiddo          ... bench:  55,202,232 ns/iter (+/- 9,325,367)
test bench_all_to_all_serial_nabo           ... bench: 110,531,114 ns/iter (+/- 7,601,588)
test bench_all_to_all_serial_rstar          ... bench: 108,958,262 ns/iter (+/- 5,820,787)

test bench_construction_bosque              ... bench:     532,666 ns/iter (+/- 24,725)
test bench_construction_kiddo               ... bench:     565,843 ns/iter (+/- 41,343)
test bench_construction_nabo                ... bench:     458,719 ns/iter (+/- 23,585)
test bench_construction_rstar               ... bench:     816,910 ns/iter (+/- 33,603)

test bench_query_bosque                     ... bench:     196,491 ns/iter (+/- 38,031)
test bench_query_kiddo                      ... bench:     161,415 ns/iter (+/- 19,431)
test bench_query_nabo                       ... bench:     284,113 ns/iter (+/- 18,260)
test bench_query_rstar                      ... bench:     331,206 ns/iter (+/- 21,634)

@sdd

sdd commented Feb 17, 2024

That's great! Also, the WASM fix was trivial in the end and is now released as 4.1.1 :-)

Just having a read through the NBLAST paper, out of curiosity. Looks fascinating stuff.

3 participants