Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defend against numerical instability when computing number of clusters #166

Merged
merged 1 commit into from
May 8, 2024

Conversation

adamreichold
Copy link
Member

@adamreichold adamreichold commented May 8, 2024

  • I agree to follow the project's code of conduct.
  • I added an entry to rstar/CHANGELOG.md if knowledge of this change could be valuable to users.

Apparently, it can happen that n.log(k) yields j+eps even though k^j=n exactly. After the call to ceil, this ends up with depth being one too large so that the number of clusters can also end up as one in which case the bulk-loading algorithm enters infinite recursion eventually overflowing the stack.

This change defends against this by ensuring that we always build at least two clusters per recursion step thereby actually reducing the number of objects the next steps has to consider.

Copy link
Member

@urschrei urschrei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea how you found this, but: great!

@adamreichold
Copy link
Member Author

I have no idea how you found this, but: great!

Every seventh or so rebuild of the current per-segment spatial indexes at https://md.umwelt.info led to a stack overflow starting a week ago. We first thought it was due to Rayon's work stealing, background threads and a continuously growing number of documents. But debugging turned out that this is due to 216.log(6) == 3.000 .. 04 triggering a number of clusters of one. So basically, bad luck...

@adamreichold
Copy link
Member Author

adamreichold commented May 8, 2024

So basically, bad luck...

Actually, now that I think of it, the trigger was probably an upgrade of the base OS to Ubuntu 24.04 which includes a newer glibc and thereby most likely updated transcendental function approximations like log.

@adamreichold adamreichold force-pushed the numerical-instability-infinite-recursion branch from ac6cefe to c5354cb Compare May 8, 2024 20:02
Apparently, it can happen that n.log(k) yields j+eps even though k^j=n exactly.
After the call to ceil, this ends up with `depth` being one too large so that
the number of clusters can also end up as one in which case the bulk-loading
algorithm enters infinite recursion eventually overflowing the stack.

This change defends against this by ensuring that we always build at least two
clusters per recursion step thereby actually reducing the number of objects the
next steps has to consider.
@adamreichold adamreichold force-pushed the numerical-instability-infinite-recursion branch from c5354cb to 6812c6f Compare May 8, 2024 20:04
@adamreichold adamreichold added this pull request to the merge queue May 8, 2024
Merged via the queue into master with commit 0139255 May 8, 2024
6 checks passed
@adamreichold adamreichold deleted the numerical-instability-infinite-recursion branch May 8, 2024 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants