Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Version 5: rewrite of ImmutableKdTree, rationalise feature names, fix long-standing nearest_n_within max_qty zero issue #186

Merged
merged 27 commits into from
Dec 4, 2024

Conversation

sdd
Copy link
Owner

@sdd sdd commented Nov 17, 2024

[5.0.0] - 2024-12-03

Version 5 bundles a complete re-write of ImmutableKdTree alongside some rationalization of feature names and a change of type of the max_qty parameter present in some query methods from usize to NonZero<usize>.

ImmutableKdTree rewrite

BREAKING CHANGE: For anyone that has been serializing ImmutableKdTree (using either serde or rkyv), version 5 constitutes a breaking change as serialized trees from prior versions will not be deserializable with v5 and vice-versa.

Quite a few people (#172, #158, #78) have previously unsuccessfully tried to use ImmutableKdTree with data containing many points that have the same value on one or more of their axes, for example point cloud data containing many points on a flat axis-aligned plane.
The v5 rewrite of ImmutableKdTree experiences none of these kinds of problems and can be safely used no matter what your data looks like.
Query performance is in many cases faster than the prior version, but sometimes slightly slower - your mileage may vary but differences in query performance is pretty small.
Construction performance is considerably improved, with up to a 2x speedup, with the improvement becoming more pronounced as the tree size increases.
Memory efficiency is slightly better also.

Behind-the-scenes, the structure of the ImmutableKdTree has changed from using a Vec of fixed-size array-based buckets to using a single array-of-vecs to store all the points, with per-bucket offsets being stored for each leaf. To avoid dynamic allocation at query time, a fixed slice that chunks the bucket is used, permitting autovectorisation to work well and giving the opportunity for manual SIMD to be used on the fixed-length slice. Trailing values beyond the last full slice are processed individually.

Modified van Emde Boas Stem Ordering

The experimental modified_van_emde_boas feature allows an alternative stem node ordering mode to be enabled.
When enabled, the ordering of stem nodes changes from using Eytzinger ordering to a modified van Emde Boas (can I call this a Donnelly ordering? :-p) order.
This is a novel implementation unique to Kiddo v5 that ensures that a cache line only needs to be retrieved at most once every three levels (on most CPUs when using f64), or every four levels (on most CPUs when using f32). This increases by an extra one level on CPU architectures with a 128-byte cache line width (this is quite rare at the moment but can be found on some Apple M3 and newer CPUs).
Previous literature has indicated that a standard van Emde Boas layout provided no advantage, but thanks to an efficient branchless implementation of the stem ordering logic, and a refinement to leave the last slot on each cache line empty, rather than straddling levels across cache lines, cache efficiency is improved to the point where gains can sometimes be seen over the previously-best Eytzinger layout.
Typically, performance varies from between 1% faster an 5% slower than Eytzinger, from what I've seen during testing, with the differences often being statistically insignificant.

ImmutableKdTree + rkyv

The v5 ImmutableKdTree uses an Aligned Vec internally for storing stem nodes. It is not possible to zero-copy deserialize
into an Aligned Vec with rkyv as there is no guarantee that the stem vec in the underlying buffer respects the alignment.
As such, unfortunately this means that ImmutableKdTree itself can't be fully zero-copy serialized / deserialized, but there
are some related types that are provided that allow zero-copy deserialization to be performed for all other parts of the tree
except for the stems, which themselves get copied into an aligned array from the buffer.
In practice this is still very fast as the stems are only a very small part of the overall tree.

See immutable-rkyv-serialize and immutable-rkyv-deserialize in the examples for how to do this.

Feature name changes

BREAKING CHANGE: It was pointed out in #159 that it was necessary to enable both rkyv and serialize_rkyv features to use Rkyv serialization. I took the opportunity of the major version bump to rationalize the feature names to make them easier to use.
serialize_rkyv has been removed and now only rkyv feature is needed to enable Rkyv serialization.
serialize has been renamed to serde in line with ecosystem conventions.
half has been renamed to f16 for clarity.

max_qty Changed to NonZero<usize>

BREAKING CHANGE: It was noted by @ezrasingh that specifying max_qty as 0 in version 4.2.1 alongside sorted = false resulted in a panic. Since requesting a max_qty of zero makes no sense, and to avoid adding a run-time check for users who have no possibility of specifying a max_qty of 0, the type of max_qty has been changed to NonZero<usize> to make this a compile-time check instead.

@sdd sdd force-pushed the feat/mod-van-emde-boas branch 3 times, most recently from 8744278 to 644b5e3 Compare November 22, 2024 23:52
Copy link

codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 94.94128% with 56 lines in your changes missing coverage. Please review.

Project coverage is 95.47%. Comparing base (460442f) to head (d1bd916).
Report is 26 commits behind head on master.

Files with missing lines Patch % Lines
src/custom_serde.rs 0.00% 33 Missing ⚠️
src/float_leaf_slice/leaf_slice.rs 96.08% 14 Missing and 2 partials ⚠️
src/fixed/kdtree.rs 33.33% 1 Missing and 1 partial ⚠️
src/float/kdtree.rs 33.33% 1 Missing and 1 partial ⚠️
src/common/generate_nearest_n_within_unsorted.rs 80.00% 1 Missing ⚠️
src/float_leaf_slice/fallback.rs 99.32% 0 Missing and 1 partial ⚠️
src/immutable/float/query/nearest_one.rs 92.85% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##           master     #186     +/-   ##
=========================================
  Coverage   95.47%   95.47%             
=========================================
  Files          52       52             
  Lines        7528     6459   -1069     
  Branches     7528     6459   -1069     
=========================================
- Hits         7187     6167   -1020     
+ Misses        320      271     -49     
  Partials       21       21             

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sdd sdd force-pushed the feat/mod-van-emde-boas branch from 644b5e3 to 3f0065e Compare November 22, 2024 23:57
@sdd sdd force-pushed the feat/mod-van-emde-boas branch from 5249d03 to b8f3e9c Compare November 23, 2024 00:25
@sdd sdd force-pushed the feat/mod-van-emde-boas branch 4 times, most recently from b8a84e7 to 7ad0e92 Compare November 30, 2024 19:55
@sdd sdd force-pushed the feat/mod-van-emde-boas branch from 7ad0e92 to 6112843 Compare November 30, 2024 19:59
@sdd sdd force-pushed the feat/mod-van-emde-boas branch from 6112843 to 270e00b Compare November 30, 2024 20:12
@sdd sdd force-pushed the feat/mod-van-emde-boas branch from cfa21bf to 95ece58 Compare December 2, 2024 19:06
@sdd sdd force-pushed the feat/mod-van-emde-boas branch 2 times, most recently from 2654062 to bc4d850 Compare December 3, 2024 20:21
…ests for branchless and non-branchless mveb. Fix some new lints
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants