Use in-place radix sort instead of quicksort #31

mourner · 2020-04-01T13:48:37Z

An experiment that adapts MSD in-place radix sort from this article to Flatbush, replacing quick sort. Closes #30. cc @jbuckmccready

Makes indexing large amounts of data faster, while being slower on small amounts. A few quick results:

10 million: ~40% faster
1 million: ~15% faster
100k: ~15% slower

Not sure why the smaller amounts are slower... Perhaps we could include both sorts and pick the faster one based on the number of items, although this would make the project's code quite a bit more complicated and verbose, and I'm not sure whether that's a good tradeoff.
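To make the "include both sorts" idea concrete, here is a minimal sketch of the dispatch step only. The function name and the 500k threshold are assumptions for illustration: the numbers above only say the crossover lies somewhere between 100k and 1 million items, so the real value would have to be benchmarked.

```python
# Hypothetical threshold: the quick results above put the crossover
# somewhere between 100k items (radix slower) and 1 million (radix faster).
RADIX_THRESHOLD = 500_000


def choose_sort(num_items, threshold=RADIX_THRESHOLD):
    """Return which sort to run for an index with `num_items` items."""
    return "radix" if num_items >= threshold else "quick"


for n in (100_000, 1_000_000, 10_000_000):
    print(n, choose_sort(n))
```

The dispatch itself is tiny; the extra complexity comes from shipping and maintaining two sort implementations.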

makes indexing faster on large amounts of data

jbuckmccready · 2020-04-01T19:18:07Z

I tried these changes in c++ and compared against quicksort + insertion sort when right - left < 8 as mentioned in #32.

I found the following:

Less than 1 million: always slower
1 million: ~0%
4.5 million: ~3% faster
14.4 million: ~7% faster

mourner · 2020-04-01T20:49:50Z

Hmm. Not sure if I did anything wrong in the implementation (let me know if you discover anything there), but I'm currently leaning towards not merging this, given the mixed results and the code complexity overhead.

jbuckmccready · 2020-04-01T23:59:12Z

Yeah, I don't plan to add it to my c++ computational geometry code; the algorithms rarely deal with anything having more than a few thousand items anyway, certainly no more than 100k.

Nothing jumps out as being wrong in the implementation. However I did a quick check and the number of swaps is significantly higher.

The following is for an index with 4444 items:

Quick sort as originally implemented (no early return for partial sort): 7666 swaps
Quick sort with early return for partial sorted node: 6365 swaps
Radix sort: 10346 swaps

The following is for an index with 44.4k items:

Quick sort with early return for partial sorted node: 86.4k swaps
Radix sort: 572k swaps

This is mostly due to the aggressive use of insertion sort when !(i - j > 64), changing 64 to 16 brings the Radix sort down to 112.7k swaps, and it runs about 30% faster, but still not as fast as the quick sort in my tests even at 140k items.
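The effect of the insertion sort cutoff on swap counts can be reproduced with a small sketch. This is not the code from this PR or from my c++ port: it is a generic in-place MSD radix sort over 32-bit unsigned keys with one byte per pass, the names (`radix_sort`, `cutoff`, `stats`) are made up for illustration, and every exchange of two elements is counted as one swap. It runs on random keys rather than Hilbert values, so the absolute counts will not match the numbers above.

```python
import random


def insertion_sort(a, lo, hi, stats):
    """Sort a[lo:hi] in place; each adjacent exchange counts as one swap."""
    for i in range(lo + 1, hi):
        j = i
        while j > lo and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            stats["swaps"] += 1
            j -= 1


def radix_sort(a, lo=0, hi=None, shift=24, cutoff=64, stats=None):
    """In-place MSD radix sort of 32-bit unsigned ints in a[lo:hi]."""
    if hi is None:
        hi = len(a)
    if stats is None:
        stats = {"swaps": 0}
    if hi - lo <= cutoff:
        # Small ranges fall back to insertion sort.
        insertion_sort(a, lo, hi, stats)
        return stats

    # Histogram of the current byte.
    counts = [0] * 256
    for i in range(lo, hi):
        counts[(a[i] >> shift) & 0xFF] += 1

    # Start and end offsets of each bucket.
    starts = [0] * 256
    pos = lo
    for b in range(256):
        starts[b] = pos
        pos += counts[b]
    ends = [starts[b] + counts[b] for b in range(256)]

    # Move every element into its bucket by swapping.
    nxt = starts[:]  # next unfilled slot of each bucket
    for b in range(256):
        while nxt[b] < ends[b]:
            i = nxt[b]
            d = (a[i] >> shift) & 0xFF
            if d == b:
                nxt[b] += 1  # already in the right bucket
            else:
                j = nxt[d]
                a[i], a[j] = a[j], a[i]
                nxt[d] += 1
                stats["swaps"] += 1

    # Recurse into each bucket on the next byte.
    if shift > 0:
        for b in range(256):
            if counts[b] > 1:
                radix_sort(a, starts[b], ends[b], shift - 8, cutoff, stats)
    return stats


rng = random.Random(42)
keys = [rng.getrandbits(32) for _ in range(5000)]
for cutoff in (64, 16):
    swaps = radix_sort(keys[:], cutoff=cutoff)["swaps"]
    print(f"cutoff={cutoff}: {swaps} swaps")
```

With a large cutoff most buckets are finished by insertion sort, which costs roughly a quadratic number of swaps per bucket; a smaller cutoff sends more of them through another radix pass, which needs at most one swap per element.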

Thanks for investigating this. I still feel as though there must be a better sort algorithm for doing this type of specialized partial sort, but I don't know what it is; maybe there is some kind of variation of this Radix sort. At least the early return after partial sorting in the quick sort yielded some real benefit.
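For readers following along, the early return for a partially sorted node can be sketched as below. This is an illustrative Python version, not the repository's code: the function name and the `node_size` parameter are assumptions, and it sorts a plain list of keys rather than the values, boxes and indices together. The idea is that once a range lies entirely inside one node, the order within it no longer matters, so the recursion stops there.

```python
import random


def partial_quicksort(values, left, right, node_size):
    """Quicksort values[left..right] only until each node holds the right items."""
    # Early return: the whole range falls inside a single node.
    if left // node_size >= right // node_size:
        return

    # Hoare partition around the middle element.
    pivot = values[(left + right) >> 1]
    i, j = left - 1, right + 1
    while True:
        i += 1
        while values[i] < pivot:
            i += 1
        j -= 1
        while values[j] > pivot:
            j -= 1
        if i >= j:
            break
        values[i], values[j] = values[j], values[i]

    partial_quicksort(values, left, j, node_size)
    partial_quicksort(values, j + 1, right, node_size)


rng = random.Random(7)
keys = [rng.randrange(10_000) for _ in range(1000)]
partial_quicksort(keys, 0, len(keys) - 1, 16)
# Items are grouped correctly per node, but not ordered within a node.
print(max(keys[:16]) <= min(keys[16:32]))
```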

leeoniya · 2020-04-02T00:24:17Z

there's also TimSort (dynamically chosen merge & insertion sorts) which has an n log n worst case compared to n^2 for quicksort, but i think the typical case for TimSort is slower, and the code size is larger (not sure by how much). TimSort also excels at partially or mostly-sorted lists and is also stable, but that might not matter here.
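a quick way to see the "excels at mostly-sorted lists" part is to count comparisons. the sketch below uses CPython's built-in sort, which is TimSort-based, with a small wrapper class (`Counted`, a made-up name) that counts `<` calls. it says nothing about how a JS TimSort would compare against the quicksort here, it only illustrates the adaptivity.

```python
import random


class Counted:
    """Wraps a key and counts how many times the sort compares items."""

    comparisons = 0

    def __init__(self, key):
        self.key = key

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.key < other.key


def count_comparisons(keys):
    """Sort `keys` with the built-in sort and return the comparison count."""
    Counted.comparisons = 0
    sorted(Counted(k) for k in keys)
    return Counted.comparisons


rng = random.Random(3)
n = 2000

# Mostly sorted input: ascending keys with a handful of random exchanges.
nearly_sorted = list(range(n))
for _ in range(10):
    i, j = rng.randrange(n), rng.randrange(n)
    nearly_sorted[i], nearly_sorted[j] = nearly_sorted[j], nearly_sorted[i]

shuffled = list(range(n))
rng.shuffle(shuffled)

print("nearly sorted:", count_comparisons(nearly_sorted))
print("shuffled:     ", count_comparisons(shuffled))
```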

an improved variant of TimSort was made the default algo in Rust: rust-lang/rust#38192

jbuckmccready · 2020-04-02T01:36:48Z

@leeoniya Yeah, that's another consideration: items are likely to be inserted with some amount of spatial locality and therefore be patterned relative to their Hilbert values (not completely random, e.g. loading the items in row by row or column by column from a 2D grid). How much that can speed up the sort by using something other than quick sort I'm not sure.

Rust uses another algorithm for unstable sort, see here: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.sort_unstable

Link to a c++ implementation of the algorithm it's based on here: https://github.com/orlp/pdqsort

It also supports branchless sorting for arithmetic types which is intriguing.

jbuckmccready · 2020-04-02T02:27:52Z

I tried the pdqsort and std::sort in my c++ version. pdqsort was faster than std::sort (by a significant margin of around 10%) but still slower than the simple quick sort implemented in this repository by about 1% 😕. I'm only testing input items that are bounding boxes going around a circle perimeter so maybe with some different patterned input there is some advantage. I need to build out my benchmarks to see.

mourner · 2020-04-02T07:39:36Z

One more thing we could try is an idea from RBush — use divide-and-conquer combined with quickselect for this kind of "sort an array into N unsorted groups" task specifically. I'm guessing it might be slower because we already avoid a lot of iterations with the "stop inside node" quicksort heuristic, but worth giving a shot anyway.
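A rough sketch of what that could look like, for discussion only. This is not RBush's code: `quickselect` and `group_into_nodes` are illustrative names, the sketch works on a plain list of keys, and it always splits at the node boundary closest to the middle of the current range.

```python
import random


def quickselect(a, k, left, right):
    """Rearrange a[left..right] so that a[k] is in its sorted position,
    with nothing larger before it and nothing smaller after it."""
    while left < right:
        pivot = a[(left + right) >> 1]
        i, j = left, right
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Now a[left..j] <= pivot <= a[i..right]; keep the side holding k.
        if k <= j:
            right = j
        elif k >= i:
            left = i
        else:
            return


def group_into_nodes(a, node_size):
    """Partition `a` into consecutive groups of `node_size` items so that
    every group holds the items a full sort would put there, in any order."""
    stack = [(0, len(a))]  # half-open ranges that start on a node boundary
    while stack:
        lo, hi = stack.pop()
        if hi - lo <= node_size:
            continue  # a single node: the order inside does not matter
        nodes = -(-(hi - lo) // node_size)  # ceil division
        mid = lo + (nodes // 2) * node_size  # node boundary near the middle
        quickselect(a, mid, lo, hi - 1)
        stack.append((lo, mid))
        stack.append((mid, hi))


rng = random.Random(0)
values = [rng.randrange(10_000) for _ in range(1000)]
group_into_nodes(values, 16)
print(max(values[:16]) <= min(values[16:32]))
```

Each level of splitting does linear work on average and the number of levels is about log2 of the node count, so the cost profile is similar to the "stop inside node" quicksort; whether it is actually faster would need benchmarking.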

mourner added 2 commits April 1, 2020 16:43

use in-place radix sort instead of quicksort

f266157

makes indexing faster on large amounts of data

fixup partial radix sorting

56a7336

mourner added the enhancement label Apr 1, 2020

mourner mentioned this pull request Apr 2, 2020

Investigate radix sorting for faster indexing #30

Closed

mourner closed this Apr 6, 2020
