Support sorting iterators #46104
Is the specific |
Without it, |
Outside of the |
See also #16853 for some historical context |
Thanks for the link @ararslan! I hadn't noticed the problematic error messages for My thoughts on the past conversation:
There is a precedent of returning the most specific type that can be reasonably sorted, which may not be the input type. For example, ranges sorted in forward and reverse orders produce ranges;
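To make the range precedent concrete, here is a small sketch. It only uses `sort` on ranges as provided by Base; the variable names are illustrative.

```julia
# Sorting a range can return a range rather than a Vector.
r1 = sort(1:3)       # already ascending
r2 = sort(3:-1:1)    # descending input

println(typeof(r1))
println(typeof(r2))
println(collect(r2))
```

The result type is therefore the most specific type that can hold the sorted data, not necessarily the input type.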
I agree. I feel called to implement a different sorting algorithm for every tuple length that returns a tuple, but for general tuples sorting would not be type stable, and sorting Most people want
This seems pretty sensible to me. The umlaut stays with the a, for example. @ScottPJones or others, do you have an example of a grapheme that is not represented as a single Char?
julia> sort("亀 mäke ☃🙂 bc. she 💞🐢s")
21-element Vector{Char}:
' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
'.': ASCII/Unicode U+002E (category Po: Punctuation, other)
'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
'e': ASCII/Unicode U+0065 (category Ll: Letter, lowercase)
'e': ASCII/Unicode U+0065 (category Ll: Letter, lowercase)
'h': ASCII/Unicode U+0068 (category Ll: Letter, lowercase)
'k': ASCII/Unicode U+006B (category Ll: Letter, lowercase)
'm': ASCII/Unicode U+006D (category Ll: Letter, lowercase)
's': ASCII/Unicode U+0073 (category Ll: Letter, lowercase)
's': ASCII/Unicode U+0073 (category Ll: Letter, lowercase)
'ä': Unicode U+00E4 (category Ll: Letter, lowercase)
'☃': Unicode U+2603 (category So: Symbol, other)
'亀': Unicode U+4E80 (category Lo: Letter, other)
'🐢': Unicode U+1F422 (category So: Symbol, other)
'💞': Unicode U+1F49E (category So: Symbol, other)
'🙂': Unicode U+1F642 (category So: Symbol, other)
In summary, I stand by this PR as is (with better error messages) but can see an argument for making strings throw a MethodError instead.
@gbaraldi, in light of the historical context from @ararslan's PR, do you still think this PR looks good as is? |
That's a tough one, I can see someone calling sort on a string as a way to make it alphabetical order so getting a string back makes sense. The first tutorial for sorting a string (https://www.geeksforgeeks.org/sorting-of-strings-in-julia/) collects it, sorts and then joins, which sounds a bit roundabout. |
Can we just keep this PR but add an error on
with a message like ambiguity or something? |
Seems reasonable. For almost all iterables this PR is clean; no reason to sweat on the String case unless someone really wants to sort strings:
julia> sort("hello world")
ERROR: ArgumentError: sort(x::AbstractString) is ambiguous. Use sort!(collect(x)) or String(sort!(collect(x))) instead.
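The two alternatives named in that error message differ only in the return type. A minimal sketch, with illustrative variable names:

```julia
s = "hello world"

chars = sort!(collect(s))   # Vector{Char}, one entry per code point
str = String(chars)         # back to a String, if that is what the caller wants

println(typeof(chars))
println(repr(str))
```

Choosing between the two explicitly is what removes the ambiguity about whether sorting a string should yield characters or a string.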
How do folks feel about this PR now that we throw an error on |
base/sort.jl
@@ -992,7 +994,13 @@ julia> v
 2
```
"""
sort(v::AbstractVector; kws...) = sort!(copymutable(v); kws...)
function sort(v; kws...)
    IteratorSize(v) == HasShape{0}() && throw(ArgumentError("$v cannot be sorted"))
Shouldn't infinite (and maybe unknown) also be errors? I see that people probably want an error for sort(1), so ok, but kind of strange that that of all things would be disallowed.
Unknown sizes should be supported
sort(w for w in ["karma", "infracostalis", "postencephalon", "Elia"] if all(islowercase, w))
For almost all infinite iterators, we already throw inside copymutable, but perhaps someone could define an infinite iterator that can be copymutabled, returning a sparse vector of infinite length. If they also define a sorting method for that sparse representation, then it would be a mistake to throw on infinite iterators.
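A short sketch of why unknown sizes are harmless: a filtered generator reports `SizeUnknown()` but is still finite, so collecting and sorting it works. The sketch uses `sort!(collect(itr))` rather than `sort(itr)` so that it does not depend on whether this PR is present in a given Julia version.

```julia
words = ["karma", "infracostalis", "postencephalon", "Elia"]
itr = (w for w in words if all(islowercase, w))

# The length is not known up front because of the filter.
println(Base.IteratorSize(itr))

sorted = sort!(collect(itr))
println(sorted)
```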
But sort is an eager operation - surely if someone wants to sort their infinite iterator lazily, they have to implement it either way and not just rely on the generic fallback that's supposed to collect all elements of the iterator? Wouldn't it be better UX to throw early and make them aware where the actual problem lies, instead of having them chase down an (incidental) error from copymutable?
For generic iterables, copymutable collects, but it only needs to make a mutable copy, not necessarily instantiate every element. For example, I believe this is allowed:
# A linked list of Ints that is followed by infinitely many zeros when iterated.
struct WithTrailingZerosIterable
    head::Int
    tail::Union{Nothing, WithTrailingZerosIterable}
end
# The iteration state is the remaining list; `nothing` means only zeros remain.
Base.iterate(x::WithTrailingZerosIterable) = iterate(x, x)
Base.iterate(::WithTrailingZerosIterable, x::WithTrailingZerosIterable) = x.head, x.tail
Base.iterate(::WithTrailingZerosIterable, ::Nothing) = 0, nothing
Base.IteratorSize(::Type{<:WithTrailingZerosIterable}) = Base.IsInfinite()
# A sparse "infinite" vector: stores only the explicit prefix, everything after it is 0.
struct WithTrailingZerosVector <: AbstractVector{Int}
    data::Vector{Int}
end
Base.size(x::WithTrailingZerosVector) = (typemax(Int),)
Base.getindex(x::WithTrailingZerosVector, i::Int) = i <= length(x.data) ? x.data[i] : 0
function Base.show(io::IO, ::MIME"text/plain", x::WithTrailingZerosVector)
    println(io, "infinite-element WithTrailingZerosVector:")
    for i in 1:5
        x[i] >= 0 && print(io, ' ')
        println(io, x[i])
    end
    println(io, " ⋮")
end
function Base.collect(x::WithTrailingZerosIterable)
    data = Int[]
    while x !== nothing
        push!(data, x.head)
        x = x.tail
    end
    WithTrailingZerosVector(data)
end
function Base.sort!(x::WithTrailingZerosVector)
    # Positive entries would sort after infinitely many zeros, so only negatives are kept.
    filter!(x -> x < 0, x.data)
    sort!(x.data)
    return x
end
const X = WithTrailingZerosIterable(1, WithTrailingZerosIterable(-2, WithTrailingZerosIterable(3, nothing)))
display(collect(X))
#=
infinite-element WithTrailingZerosVector:
1
-2
3
0
0
⋮
=#
display(sort!(Base.copymutable(X)))
#=
infinite-element WithTrailingZerosVector:
-2
0
0
0
0
⋮
=#
Nevertheless, we could put this in the same camp as sort(::AbstractString): perhaps it might make sense in some way but for now just throw because it is rarely a good idea to sort an infinite iterable.
Throwing on an infinite iterator seems like the sensible choice. I've made the change.
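The early checks discussed in this review thread can be sketched as follows. `sort_iterable` is a hypothetical stand-alone helper written for illustration; it is not the method added to Base, and it uses `collect` where the PR uses `copymutable`.

```julia
# Hypothetical helper, not Base's implementation.
function sort_iterable(itr; kws...)
    sz = Base.IteratorSize(itr)
    # Zero-dimensional inputs such as a bare number are rejected.
    sz == Base.HasShape{0}() && throw(ArgumentError("$itr cannot be sorted"))
    # Infinite iterators are rejected up front instead of failing later while collecting.
    sz == Base.IsInfinite() && throw(ArgumentError("infinite iterator cannot be sorted"))
    return sort!(collect(itr); kws...)
end

println(sort_iterable(Set([3, 1, 2])))
println(sort_iterable((3, 1, 2); rev=true))
```

Calling `sort_iterable(1)` or `sort_iterable(Iterators.repeated(1))` throws an `ArgumentError` immediately, which is the user-facing behavior the reviewers asked for.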
Triage approves. |
Sorry, I believe this doesn't work, because:
yes, flag emojis can be long, at least 5
I chose a flag at random, from here: i.e. the transgender flag, and we do not want to piss off people, screwing it up.
I believe you want to get the same order back. I'm not completely sure it's sensitive to the order, though pretty confident. This doesn't look right when pasting into the REPL (a minor unrelated issue), so I'm not sure how to easily check after sorting. I didn't see this sooner, hope I was of help, and this can be fixed before the feature freeze. I knew knowing obscure Unicode (flag) emoji trivia would some day come in handy. Ok, not really. Such stuff was in the lead of the Wikipedia Unicode article, where I put it, at some point, before someone shortened it to a short summary:
I put such stuff in the lead, not (just) because it's trivia, but also to show people that Unicode is in practice variable-length, even in UTF-16 (which is to be avoided; some think it's better since it is fixed-length, but that hasn't been true since UCS-2). |
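A small sketch of the concern, under the assumption that the flag in question is encoded as the five code points U+1F3F3, U+FE0F, U+200D, U+26A7, U+FE0F. The combining-accent example is an additional illustration that is not from the thread.

```julia
using Unicode

# One user-perceived character made of two Chars: 'e' plus a combining acute accent.
accented = "e\u0301"
println(length(accented))                       # number of Chars
println(length(collect(graphemes(accented))))   # number of graphemes

# Sorting by Char moves the accent onto a different letter.
println(String(sort!(collect("e\u0301z"))) == "ez\u0301")

# The flag sequence assumed above is five Chars long, and sorting reorders them.
flag = "\U1F3F3\uFE0F\u200D\u26A7\uFE0F"
println(length(flag))
println(String(sort!(collect(flag))) == flag)
```

Sorting by `Char` therefore does not preserve graphemes in general, which is the argument for not returning a `String` from `sort(::AbstractString)`.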
This reverts commit 84bf67c.
Could I ask why there is a discrepancy here, so that I still can't get a sorted list of keys, but can take the keys from a sorted dictionary:
julia> sort(keys(Dict(1=>2)))
ERROR: MethodError: no method matching sort!(::Set{Int64})
Closest candidates are:
sort!(::AbstractUnitRange)
@ Base range.jl:1392
sort!(::AbstractVector, ::Base.Sort.Algorithm, ::Base.Order.Ordering)
@ Base sort.jl:2245
sort!(::AbstractVector{T}, ::Integer, ::Integer, ::Base.Sort.MergeSortAlg, ::Base.Order.Ordering, ::Vector{T}) where T
@ Base sort.jl:2172
...
Stacktrace:
[1] sort(v::Base.KeySet{Int64, Dict{Int64, Int64}}; kws::@Kwargs{})
@ Base.Sort ./sort.jl:1503
[2] sort(v::Base.KeySet{Int64, Dict{Int64, Int64}})
@ Base.Sort ./sort.jl:1499
[3] top-level scope
@ REPL[7]:1
julia> first.(sort(Dict(1=>2)))
1-element Vector{Int64}:
1 |
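Independently of how `sort` treats a `KeySet`, collecting the keys into a vector first sidesteps the `MethodError`. A minimal sketch with an illustrative dictionary:

```julia
d = Dict(3 => "c", 1 => "a", 2 => "b")

# collect produces a Vector, which sort! always accepts.
ks = sort!(collect(keys(d)))
println(ks)
```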
`copymutable` is only defined to return an array for abstract arrays, but abstract arrays are exactly what this method is never called with. For other types, it has a default of `collect`, but can be changed by other types (such as AbstractSet) to do something different. Refs #46104
For folks following along, the answer is because of #52086 (comment) |
Two changes wrapped into one `Base.copymutable` => `Base.copymutable` & `collect` and `Base.copymutable` => `similar` & words. Followup for #52086 and #46104; also fixes #51932 (though we still may want to make `copymutable` public at some point) --------- Co-authored-by: Jameson Nash <vtjnash@gmail.com>
Two changes wrapped into one `Base.copymutable` => `Base.copymutable` & `collect` and `Base.copymutable` => `similar` & words. Followup for #52086 and #46104; also fixes #51932 (though we still may want to make `copymutable` public at some point) --------- Co-authored-by: Jameson Nash <vtjnash@gmail.com> (cherry picked from commit 42c088b)
Backported PRs: - [x] #51213 <!-- Wait for other threads to finish compiling before exiting --> - [x] #51520 <!-- Make allocopt respect the GC verifier rules with non usual address spaces --> - [x] #51598 <!-- Use a simple error when reporting sysimg load failures. --> - [x] #51757 <!-- fix parallel peakflop usage --> - [x] #51781 <!-- Don't make pkgimages global editable --> - [x] #51848 <!-- allow finalizers to take any locks and yield during exit --> - [x] #51847 <!-- add missing wait during Timer and AsyncCondition close --> - [x] #50824 <!-- Add some aliasing warnings to docstrings for mutating functions in Base --> - [x] #51885 <!-- remove chmodding the pkgimages --> - [x] #50207 <!-- [devdocs] Improve documentation about building external forks of LLVM --> - [x] #51967 <!-- further fix to the new promoting method for AbstractDateTime subtraction --> - [x] #51980 <!-- macroexpand: handle const/atomic struct fields correctly --> - [x] #51995 <!-- [Artifacts] Pass artifacts dictionary to `ensure_artifact_installed` dispatch --> - [x] #52098 <!-- Fix errors in `sort` docstring --> - [x] #52136 <!-- Bump JuliaSyntax to 0.4.7 --> - [x] #52140 <!-- Make c func `abspath` consistent on Windows. Fix tracking path conversion. 
--> - [x] #52009 <!-- fix completion that resulted in startpos of 0 for `\\ --> - [x] #52192 <!-- cap the number of GC threads to number of cpu cores --> - [x] #52206 <!-- Make have_fma consistent between interpreter and compiled --> - [x] #52027 <!-- fix Unicode.julia_chartransform for Julia 1.10 --> - [x] #52217 <!-- More helpful error message for empty `cpu_target` in `Base.julia_cmd` --> - [x] #51371 <!-- Memoize `cwstring` when used for env lookup / modification on Windows --> - [x] #52214 <!-- Turn Method Overwritten Error into a PrecompileError -- turning off caching --> - [x] #51895 <!-- Devdocs on fixing precompile hangs, take 2 --> - [x] #51596 <!-- Reland "Don't mark nonlocal symbols as hidden"" --> - [x] #51834 <!-- [REPLCompletions] allow symbol completions within incomplete macrocall expression --> - [x] #52010 <!-- Revert "Support sorting iterators (#46104)" --> - [x] #51430 <!-- add support for async backtraces of Tasks on any thread --> - [x] #51471 <!-- Fix segfault if root task is NULL --> - [x] #52194 <!-- Fix multiversioning issues caused by the parallel llvm work --> - [x] #51035 <!-- refactor GC scanning code to reflect jl_binding_t are now first class --> - [x] #52030 <!-- Bump Statistics --> - [x] #52189 <!-- codegen: ensure i1 bool is widened to i8 before storing --> - [x] #52228 <!-- Widen diagonal var during `Type` unwrapping in `instanceof_tfunc` --> - [x] #52182 <!-- jitlayers: replace sharedbytes intern pool with one that respects alignment --> Contains multiple commits, manual intervention needed: - [ ] #51092 <!-- inference: fix bad effects for recursion --> Non-merged PRs with backport label: - [ ] #52196 <!-- Fix creating custom log level macros --> - [ ] #52170 <!-- fix invalidations related to `ismutable` --> - [ ] #51479 <!-- prevent code loading from lookin in the versioned environment when building Julia -->
Why "throw on other Tuples"? Seems very unnatural to limit to |
IMO the only natural choice is to:
|
Because sorting non-homogeneous tuples is not type stable. |
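The type-stability point can be seen with a small sketch. `sorted_tuple` is a hypothetical helper for illustration only; it is not a Base method and not what the PR implements.

```julia
# Hypothetical helper: sort a tuple's elements and rebuild a tuple.
sorted_tuple(t::Tuple) = Tuple(sort!(Any[t...]))

a = sorted_tuple((1, 2.0))   # the Int stays first
b = sorted_tuple((3, 2.0))   # the Float64 moves to the front

println(typeof(a))
println(typeof(b))
```

Both inputs have type `Tuple{Int, Float64}`, yet the output types differ, so the return type depends on the values and cannot be inferred from the argument type alone.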
That's just the nature of a It also seems to be a precedent, I don't think there are any other such artificial restrictions to |
I'm just reading this PR superficially, and I'm trying to wrap my head around this. What's different about a non-homogeneous tuple? Wouldn't it be possible to support eg. |
That's already supported, and has been for a long time. We're not discussing sorting a collection whose elements are tuples, rather the collection is a tuple. |
All right, I was about sorting iterators of tuples ( |
Closes #38328