Use in-place radix sort instead of quicksort #31

Closed
mourner wants to merge 2 commits

mourner
Owner

@mourner mourner commented Apr 1, 2020

An experiment that adapts MSD in-place radix sort from this article to Flatbush, replacing quick sort. Closes #30. cc @jbuckmccready

Makes indexing large amounts of data faster, while being slower on small amounts. A few quick results:

  • 10 million: ~40% faster
  • 1 million: ~15% faster
  • 100k: ~15% slower

Not sure why the smaller amounts are slower... Perhaps we could include both sorts and pick the one that's faster based on the number of items. However, this would make the project's code quite a bit more complicated and verbose, and I'm not sure whether this is a good tradeoff.

@mourner mourner added the enhancement New feature or request label Apr 1, 2020
@jbuckmccready
Collaborator

jbuckmccready commented Apr 1, 2020

I tried these changes in C++ and compared them against quicksort that falls back to insertion sort when right - left < 8, as mentioned in #32.

I found the following:

  • Less than 1 million: always slower
  • 1 million: ~0%
  • 4.5 million: ~3% faster
  • 14.4 million: ~7% faster
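
For reference, the baseline in that comparison (quicksort that switches to insertion sort when right - left < 8) could look roughly like the sketch below. This is an illustration only, not the code from #32 or from the C++ port: the function names are hypothetical, it sorts a plain array of numbers, and it uses a Hoare-style partition around the middle element.

```javascript
// Hypothetical helper: insertion sort on the inclusive range [left, right].
function insertionSortRange(values, left, right) {
    for (let i = left + 1; i <= right; i++) {
        const v = values[i];
        let j = i - 1;
        // Shift larger elements one slot to the right, then drop v in place.
        while (j >= left && values[j] > v) {
            values[j + 1] = values[j];
            j--;
        }
        values[j + 1] = v;
    }
}

// Hypothetical sketch: quicksort with an insertion sort fallback
// for ranges where right - left < 8.
function hybridQuicksort(values, left = 0, right = values.length - 1) {
    if (right - left < 8) {
        insertionSortRange(values, left, right);
        return;
    }
    const pivot = values[(left + right) >> 1];
    let i = left - 1;
    let j = right + 1;
    // Hoare partition: afterwards values[left..j] <= pivot <= values[j+1..right].
    while (true) {
        do i++; while (values[i] < pivot);
        do j--; while (values[j] > pivot);
        if (i >= j) break;
        const tmp = values[i];
        values[i] = values[j];
        values[j] = tmp;
    }
    hybridQuicksort(values, left, j);
    hybridQuicksort(values, j + 1, right);
}
```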

@mourner
Owner Author

mourner commented Apr 1, 2020

Hmm. Not sure if I did anything wrong in the implementation (let me know if you discover anything there), but I'm currently leaning towards not merging this, given the mixed results and the code complexity overhead.

@jbuckmccready
Collaborator

Yeah, I don't plan to add it to my C++ computational geometry code. The algorithms rarely deal with anything having more than a few thousand items anyway, and certainly no more than 100k.

Nothing jumps out as being wrong in the implementation. However I did a quick check and the number of swaps is significantly higher.

The following is for an index with 4444 items:

  • Quick sort as originally implemented (no early return for partial sort): 7666 swaps
  • Quick sort with early return for partial sorted node: 6365 swaps
  • Radix sort: 10346 swaps

The following is for an index with 44.4k items:

  • Quick sort with early return for partial sorted node: 86.4k swaps
  • Radix sort: 572k swaps

This is mostly due to the aggressive use of insertion sort when !(i - j > 64). Changing 64 to 16 brings the radix sort down to 112.7k swaps and makes it run about 30% faster, but it is still not as fast as the quicksort in my tests, even at 140k items.
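
To make the cutoff concrete, here is a minimal sketch of an in-place MSD radix sort with an insertion sort fallback. It is not the code from this PR: it assumes unsigned 32-bit keys in a `Uint32Array`, processes one byte per level, and the names `radixSortInPlace`, `insertionSortKeys` and `cutoff` are hypothetical. The `cutoff` parameter plays the role of the 64 versus 16 threshold. It also allocates small bookkeeping arrays on every call, which a tuned implementation would avoid.

```javascript
// Hypothetical helper: insertion sort on the half-open range [lo, hi).
function insertionSortKeys(keys, lo, hi) {
    for (let i = lo + 1; i < hi; i++) {
        const v = keys[i];
        let j = i - 1;
        while (j >= lo && keys[j] > v) {
            keys[j + 1] = keys[j];
            j--;
        }
        keys[j + 1] = v;
    }
}

// Hypothetical sketch: in-place MSD radix sort of unsigned 32-bit keys,
// one byte per level, starting with the most significant byte.
function radixSortInPlace(keys, lo = 0, hi = keys.length, shift = 24, cutoff = 16) {
    if (hi - lo <= cutoff) {
        insertionSortKeys(keys, lo, hi);
        return;
    }
    // Count how many keys fall into each of the 256 buckets.
    const counts = new Uint32Array(256);
    for (let i = lo; i < hi; i++) counts[(keys[i] >>> shift) & 255]++;

    // starts[b] is where bucket b begins; next[b] is its next unfilled slot.
    const starts = new Uint32Array(256);
    const next = new Uint32Array(256);
    let offset = lo;
    for (let b = 0; b < 256; b++) {
        starts[b] = next[b] = offset;
        offset += counts[b];
    }

    // Permute in place: swap each misplaced key into its bucket's next slot.
    for (let b = 0; b < 256; b++) {
        const end = starts[b] + counts[b];
        while (next[b] < end) {
            const i = next[b];
            const digit = (keys[i] >>> shift) & 255;
            if (digit === b) {
                next[b]++;
            } else {
                const target = next[digit]++;
                const tmp = keys[i];
                keys[i] = keys[target];
                keys[target] = tmp;
            }
        }
    }

    // Recurse into each bucket on the next, less significant byte.
    if (shift === 0) return;
    for (let b = 0; b < 256; b++) {
        if (counts[b] > 1) {
            radixSortInPlace(keys, starts[b], starts[b] + counts[b], shift - 8, cutoff);
        }
    }
}
```

A larger `cutoff` hands bigger ranges to insertion sort, which moves elements one position at a time, so it is one plausible place for the extra data movement to come from.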

Thanks for investigating this. I still feel as though there must be a better sort algorithm for doing this type of specialized partial sort, but I don't know what it is. Maybe there is some kind of variation of this radix sort. At least the early return after partial sorting in the quicksort yielded some real benefit.
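
The early return mentioned here can be sketched as follows: the recursion stops as soon as a range lies entirely inside one tree node, so items are only ordered across nodes, not within them. This is a simplified illustration that sorts a single array of numbers; `partialSort`, `nodeSize` and the `stats` counter are hypothetical names, and the counter is only there to show one way swap counts like the ones above could be measured.

```javascript
// Hypothetical sketch: quicksort that stops once [left, right] lies inside
// a single node of nodeSize items. stats.swaps counts element swaps.
function partialSort(values, left, right, nodeSize, stats) {
    // Early return: both ends fall into the same node, so order inside
    // the range no longer matters.
    if (Math.floor(left / nodeSize) >= Math.floor(right / nodeSize)) return;

    const pivot = values[(left + right) >> 1];
    let i = left - 1;
    let j = right + 1;
    // Hoare partition: afterwards values[left..j] <= pivot <= values[j+1..right].
    while (true) {
        do i++; while (values[i] < pivot);
        do j--; while (values[j] > pivot);
        if (i >= j) break;
        const tmp = values[i];
        values[i] = values[j];
        values[j] = tmp;
        stats.swaps++;
    }
    partialSort(values, left, j, nodeSize, stats);
    partialSort(values, j + 1, right, nodeSize, stats);
}

// Usage: order 4444 random values into nodes of 16 and report the swap count.
const demoValues = Array.from({length: 4444}, () => Math.floor(Math.random() * 65536));
const demoStats = {swaps: 0};
partialSort(demoValues, 0, demoValues.length - 1, 16, demoStats);
console.log(demoStats.swaps);
```

With `nodeSize` set to 1 the same function performs a full sort, which gives a baseline for comparing swap counts.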

@leeoniya

leeoniya commented Apr 2, 2020

there's also TimSort (dynamically chosen merge & insertion sorts) which has an n log n worst case compared to n^2 for quicksort, but i think the typical case for TimSort is slower, and the code size is larger (not sure by how much). TimSort also excels at partially or mostly-sorted lists and is also stable, but that might not matter here.

an improved variant of TimSort was made the default algo in Rust: rust-lang/rust#38192

@jbuckmccready
Collaborator

@leeoniya Yeah, that's another consideration: items are likely to be inserted with some amount of spatial locality and therefore be patterned relative to their Hilbert values (not completely random, e.g. loading the items in row by row or column by column from a 2D grid). I'm not sure how much a sort other than quicksort could exploit that to speed things up.

Rust uses another algorithm for unstable sort, see here: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.sort_unstable

Link to a c++ implementation of the algorithm it's based on here: https://github.com/orlp/pdqsort

It also supports branchless sorting for arithmetic types which is intriguing.

@jbuckmccready
Collaborator

I tried pdqsort and std::sort in my C++ version. pdqsort was faster than std::sort by a significant margin (around 10%) but still about 1% slower than the simple quicksort implemented in this repository 😕. I'm only testing input items that are bounding boxes placed around a circle perimeter, so maybe there is some advantage with differently patterned input. I need to build out my benchmarks to see.

@mourner
Owner Author

mourner commented Apr 2, 2020

One more thing we could try is an idea from RBush: use divide-and-conquer combined with quickselect, which targets this kind of "sort an array into N unsorted groups" task specifically. I'm guessing it might be slower because we already avoid a lot of iterations with the "stop inside node" quicksort heuristic, but it's worth a shot anyway.
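
A rough sketch of that divide-and-conquer idea, under a few assumptions: it works on a plain array of numbers, `multiSelect` and `quickselect` are hypothetical names written for this example (RBush's actual code and the quickselect library it relies on may differ), and the quickselect here is a simple Hoare-style variant rather than a tuned one. Each step selects the element at a group boundary near the middle, then recurses on both halves until every range fits in one group of `n` items.

```javascript
// Hypothetical helper: rearrange arr[left..right] so that arr[k] holds the
// value it would have if the range were sorted, with no larger values
// before it and no smaller values after it.
function quickselect(arr, k, left, right) {
    while (left < right) {
        const pivot = arr[(left + right) >> 1];
        let i = left - 1;
        let j = right + 1;
        // Hoare partition: afterwards arr[left..j] <= pivot <= arr[j+1..right].
        while (true) {
            do i++; while (arr[i] < pivot);
            do j--; while (arr[j] > pivot);
            if (i >= j) break;
            const tmp = arr[i];
            arr[i] = arr[j];
            arr[j] = tmp;
        }
        // Keep only the side that contains index k.
        if (k <= j) right = j;
        else left = j + 1;
    }
}

// Hypothetical sketch: split arr[left..right] (inclusive) into consecutive
// groups of n items so that no item in a group is larger than any item in
// the next group. Items inside a group stay unsorted.
function multiSelect(arr, left, right, n) {
    if (right - left < n) return; // the range already fits in one group
    const groups = Math.ceil((right - left + 1) / n);
    // Pick the group boundary closest to the middle of the range.
    const mid = left + Math.floor(groups / 2) * n;
    quickselect(arr, mid, left, right);
    multiSelect(arr, left, mid - 1, n);
    multiSelect(arr, mid, right, n);
}
```

Calling `multiSelect(values, 0, values.length - 1, 16)` would then group the values into runs of 16, which is the same end state the "stop inside node" quicksort reaches by a different route.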

Successfully merging this pull request may close these issues.

Investigate radix sorting for faster indexing
3 participants