-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Version 5: rewrite of ImmutableKdTree
, rationalise feature names, fix long-standing nearest_n_within
max_qty
zero issue
#186
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
It makes no sense to ask for zero items here. Changing max_qty to NonZero means the check for zero can be compile-time rather than run-time in many use cases. BREAKING CHANGE: type of `nearest_n_within`'s `max_qty` parameter has changed from `usize` to `NonZero<usize>`
sdd
force-pushed
the
feat/mod-van-emde-boas
branch
3 times, most recently
from
November 22, 2024 23:52
8744278
to
644b5e3
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #186 +/- ##
=========================================
Coverage 95.47% 95.47%
=========================================
Files 52 52
Lines 7528 6459 -1069
Branches 7528 6459 -1069
=========================================
- Hits 7187 6167 -1020
+ Misses 320 271 -49
Partials 21 21 ☔ View full report in Codecov by Sentry. |
sdd
force-pushed
the
feat/mod-van-emde-boas
branch
from
November 22, 2024 23:57
644b5e3
to
3f0065e
Compare
sdd
force-pushed
the
feat/mod-van-emde-boas
branch
from
November 23, 2024 00:25
5249d03
to
b8f3e9c
Compare
…mde_boas feature and default to Eytzinger. Update readme and changelog
sdd
force-pushed
the
feat/mod-van-emde-boas
branch
4 times, most recently
from
November 30, 2024 19:55
b8a84e7
to
7ad0e92
Compare
sdd
force-pushed
the
feat/mod-van-emde-boas
branch
from
November 30, 2024 19:59
7ad0e92
to
6112843
Compare
sdd
force-pushed
the
feat/mod-van-emde-boas
branch
from
November 30, 2024 20:12
6112843
to
270e00b
Compare
…ecifiable parameter issue. Fix some lint issues and tests, and update docs
…ed traits modules to make things a bit neater, especially in the docs
sdd
force-pushed
the
feat/mod-van-emde-boas
branch
from
December 2, 2024 19:06
cfa21bf
to
95ece58
Compare
sdd
force-pushed
the
feat/mod-van-emde-boas
branch
2 times, most recently
from
December 3, 2024 20:21
2654062
to
bc4d850
Compare
…ests for branchless and non-branchless mveb. Fix some new lints
sdd
force-pushed
the
feat/mod-van-emde-boas
branch
from
December 3, 2024 20:22
bc4d850
to
d1bd916
Compare
This was referenced Dec 4, 2024
Closed
Closed
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
[5.0.0] - 2024-12-03
Version 5 bundles a complete re-write of
ImmutableKdTree
alongside some rationalization of feature names and a change of type of themax_qty
parameter present in some query methods fromusize
toNonZero<usize>
.ImmutableKdTree
rewriteBREAKING CHANGE: For anyone that has been serializing
ImmutableKdTree
(using eitherserde
orrkyv
), version 5 constitutes a breaking change as serialized trees from prior versions will not be deserializable with v5 and vice-versa.Quite a few people (#172, #158, #78) have previously unsuccessfully tried to use
ImmutableKdTree
with data containing many points that have the same value on one or more of their axes, for example point cloud data containing many points on a flat axis-aligned plane.The v5 rewrite of
ImmutableKdTree
experiences none of these kinds of problems and can be safely used no matter what your data looks like.Query performance is in many cases faster than the prior version, but sometimes slightly slower - your mileage may vary but differences in query performance is pretty small.
Construction performance is considerably improved, with up to a 2x speedup, with the improvement becoming more pronounced as the tree size increases.
Memory efficiency is slightly better also.
Behind-the-scenes, the structure of the
ImmutableKdTree
has changed from using a Vec of fixed-size array-based buckets to using a single array-of-vecs to store all the points, with per-bucket offsets being stored for each leaf. To avoid dynamic allocation at query time, a fixed slice that chunks the bucket is used, permitting autovectorisation to work well and giving the opportunity for manual SIMD to be used on the fixed-length slice. Trailing values beyond the last full slice are processed individually.Modified van Emde Boas Stem Ordering
The experimental
modified_van_emde_boas
feature allows an alternative stem node ordering mode to be enabled.When enabled, the ordering of stem nodes changes from using Eytzinger ordering to a modified van Emde Boas (can I call this a Donnelly ordering? :-p) order.
This is a novel implementation unique to Kiddo v5 that ensures that a cache line only needs to be retrieved at most once every three levels (on most CPUs when using f64), or every four levels (on most CPUs when using f32). This increases by an extra one level on CPU architectures with a 128-byte cache line width (this is quite rare at the moment but can be found on some Apple M3 and newer CPUs).
Previous literature has indicated that a standard van Emde Boas layout provided no advantage, but thanks to an efficient branchless implementation of the stem ordering logic, and a refinement to leave the last slot on each cache line empty, rather than straddling levels across cache lines, cache efficiency is improved to the point where gains can sometimes be seen over the previously-best Eytzinger layout.
Typically, performance varies from between 1% faster an 5% slower than Eytzinger, from what I've seen during testing, with the differences often being statistically insignificant.
ImmutableKdTree
+rkyv
The v5
ImmutableKdTree
uses an Aligned Vec internally for storing stem nodes. It is not possible to zero-copy deserializeinto an Aligned Vec with
rkyv
as there is no guarantee that the stem vec in the underlying buffer respects the alignment.As such, unfortunately this means that
ImmutableKdTree
itself can't be fully zero-copy serialized / deserialized, but thereare some related types that are provided that allow zero-copy deserialization to be performed for all other parts of the tree
except for the stems, which themselves get copied into an aligned array from the buffer.
In practice this is still very fast as the stems are only a very small part of the overall tree.
See
immutable-rkyv-serialize
andimmutable-rkyv-deserialize
in the examples for how to do this.Feature name changes
BREAKING CHANGE: It was pointed out in #159 that it was necessary to enable both
rkyv
andserialize_rkyv
features to use Rkyv serialization. I took the opportunity of the major version bump to rationalize the feature names to make them easier to use.serialize_rkyv
has been removed and now onlyrkyv
feature is needed to enable Rkyv serialization.serialize
has been renamed toserde
in line with ecosystem conventions.half
has been renamed tof16
for clarity.max_qty
Changed toNonZero<usize>
BREAKING CHANGE: It was noted by @ezrasingh that specifying
max_qty
as0
in version 4.2.1 alongsidesorted = false
resulted in a panic. Since requesting amax_qty
of zero makes no sense, and to avoid adding a run-time check for users who have no possibility of specifying amax_qty
of0
, the type ofmax_qty
has been changed toNonZero<usize>
to make this a compile-time check instead.