Doc: The default sorting alg. is stable from 1.9 (#47579)
* Update doc/src/base/sort.md

* Update docs: The default sorting alg. is stable

* Compat 1.9 for QuickSort to be stable

* Specify the default algorithm

* Use example from InlineStrings.jl

* Change example to jldoctest

* Remove "*appear* to be stable." as slightly misleading.

Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>
petvana and LilithHafner authored Nov 21, 2022
1 parent d18fd47 commit c5fe17b
Showing 1 changed file with 47 additions and 33 deletions.
80 changes: 47 additions & 33 deletions doc/src/base/sort.md
@@ -141,53 +141,67 @@ There are currently four sorting algorithms available in base Julia:
 * [`PartialQuickSort(k)`](@ref)
 * [`MergeSort`](@ref)

-`InsertionSort` is an O(n^2) stable sorting algorithm. It is efficient for very small `n`, and
-is used internally by `QuickSort`.
+`InsertionSort` is an O(n²) stable sorting algorithm. It is efficient for very small `n`,
+and is used internally by `QuickSort`.

-`QuickSort` is an O(n log n) sorting algorithm which is in-place, very fast, but not stable –
-i.e. elements which are considered equal will not remain in the same order in which they originally
-appeared in the array to be sorted. `QuickSort` is the default algorithm for numeric values, including
-integers and floats.
+`QuickSort` is a very fast sorting algorithm with an average-case time complexity of
+O(n log n). `QuickSort` is stable, i.e., elements considered equal will remain in the same
+order. Notice that O(n²) is worst-case complexity, but it gets vanishingly unlikely as the
+pivot selection is randomized.

-`PartialQuickSort(k)` is similar to `QuickSort`, but the output array is only sorted up to index
-`k` if `k` is an integer, or in the range of `k` if `k` is an `OrdinalRange`. For example:
+`PartialQuickSort(k::OrdinalRange)` is similar to `QuickSort`, but the output array is only
+sorted in the range of `k`. For example:

-```julia
-x = rand(1:500, 100)
-k = 50
-k2 = 50:100
-s = sort(x; alg=QuickSort)
-ps = sort(x; alg=PartialQuickSort(k))
-qs = sort(x; alg=PartialQuickSort(k2))
-map(issorted, (s, ps, qs)) # => (true, false, false)
-map(x->issorted(x[1:k]), (s, ps, qs)) # => (true, true, false)
-map(x->issorted(x[k2]), (s, ps, qs)) # => (true, false, true)
-s[1:k] == ps[1:k] # => true
-s[k2] == qs[k2] # => true
+```jldoctest
+julia> x = rand(1:500, 100);
+julia> k = 50:100;
+julia> s1 = sort(x; alg=QuickSort);
+julia> s2 = sort(x; alg=PartialQuickSort(k));
+julia> map(issorted, (s1, s2))
+(true, false)
+julia> map(x->issorted(x[k]), (s1, s2))
+(true, true)
+julia> s1[k] == s2[k]
+true
 ```

+!!! compat "Julia 1.9"
+    The `QuickSort` and `PartialQuickSort` algorithms are stable since Julia 1.9.
+
 `MergeSort` is an O(n log n) stable sorting algorithm but is not in-place – it requires a temporary
 array of half the size of the input array – and is typically not quite as fast as `QuickSort`.
 It is the default algorithm for non-numeric data.

-The default sorting algorithms are chosen on the basis that they are fast and stable, or *appear*
-to be so. For numeric types indeed, `QuickSort` is selected as it is faster and indistinguishable
-in this case from a stable sort (unless the array records its mutations in some way). The stability
-property comes at a non-negligible cost, so if you don't need it, you may want to explicitly specify
-your preferred algorithm, e.g. `sort!(v, alg=QuickSort)`.
+The default sorting algorithms are chosen on the basis that they are fast and stable.
+Usually, `QuickSort` is selected, but `InsertionSort` is preferred for small data.
+You can also explicitly specify your preferred algorithm, e.g.
+`sort!(v, alg=PartialQuickSort(10:20))`.

-The mechanism by which Julia picks default sorting algorithms is implemented via the `Base.Sort.defalg`
-function. It allows a particular algorithm to be registered as the default in all sorting functions
-for specific arrays. For example, here are the two default methods from [`sort.jl`](https://github.com/JuliaLang/julia/blob/master/base/sort.jl):
+The mechanism by which Julia picks default sorting algorithms is implemented via the
+`Base.Sort.defalg` function. It allows a particular algorithm to be registered as the
+default in all sorting functions for specific arrays. For example, here is the default
+method from [`sort.jl`](https://github.com/JuliaLang/julia/blob/master/base/sort.jl):

+```julia
+defalg(v::AbstractArray) = DEFAULT_STABLE
+```
+
+You may change the default behavior for specific types by defining new methods for `defalg`.
+For example, [InlineStrings.jl](https://github.com/JuliaStrings/InlineStrings.jl/blob/v1.3.2/src/InlineStrings.jl#L903)
+defines the following method:
 ```julia
-defalg(v::AbstractArray) = MergeSort
-defalg(v::AbstractArray{<:Number}) = QuickSort
+Base.Sort.defalg(::AbstractArray{<:Union{SmallInlineStrings, Missing}}) = InlineStringSort
 ```

-As for numeric arrays, choosing a non-stable default algorithm for array types for which the notion
-of a stable sort is meaningless (i.e. when two values comparing equal can not be distinguished)
-may make sense.
+!!! compat "Julia 1.9"
+    The default sorting algorithm (returned by `Base.Sort.defalg`) is guaranteed
+    to be stable since Julia 1.9. Previous versions had unstable edge cases when sorting numeric arrays.

 ## Alternate orderings

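The stability guarantee described in the changed documentation can be checked with a small experiment. The following is only an illustrative sketch and is not part of the commit: the variable names and the data are made up, and the expected behavior assumes Julia 1.9 or later, where `QuickSort` and the default algorithm are documented as stable. Each element is a pair `(key, original position)` that is sorted by key only, so a stable sort must leave ties in increasing original position, which is the same as the result being sorted lexicographically.

```julia
# Illustrative data (hypothetical): many duplicate keys, each tagged with its original position.
n = 10_000
ks = rand(1:10, n)
v = collect(zip(ks, 1:n))          # Vector of (key, original position) tuples

# Sort by the key only; elements with equal keys compare as equal under `by=first`.
s_default = sort(v; by=first)                  # default algorithm
s_quick   = sort(v; by=first, alg=QuickSort)   # explicitly requested QuickSort
s_merge   = sort(v; by=first, alg=MergeSort)   # MergeSort, documented as stable

# Tuples compare lexicographically, so `issorted` on the full tuples holds
# exactly when ties between equal keys kept their original relative order.
stable_default = issorted(s_default)
stable_quick   = issorted(s_quick)
stable_merge   = issorted(s_merge)
```

On Julia versions before 1.9, `stable_quick` may well be `false`, since the old documentation shown in the diff describes `QuickSort` as not stable.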

7 comments on commit c5fe17b

@nanosoldier
Collaborator

Executing the daily package evaluation, I will reply here when finished:

@nanosoldier runtests(ALL, isdaily = true, configuration=(rr=true,))

@vtjnash
Member

@nanosoldier runbenchmarks(ALL, isdaily = true)

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@vtjnash
Member

Looks like the ["union", "array", ("perf_sum4", *)] algorithm got a decent speedup. The rest seems like noise.

@maleadt
Member

The PkgEval run got canceled, so let's retry:

@nanosoldier runtests(isdaily = true)

@maleadt
Member

@nanosoldier runtests(isdaily = true)

@nanosoldier
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
