map, collect and similar for custom arrays #36106
Comments
Ref:
Collect only uses Line 628 in cfb9b55
Lines 643 to 645 in cfb9b55
Edit: Oh, except that Line 2151 in cfb9b55
|
Wait, my edit there isn't quite right either, because it's explicitly using Line 2151 in cfb9b55
Line 630 in cfb9b55
Lines 632 to 633 in cfb9b55
Lines 601 to 602 in cfb9b55
Why isn't that hitting |
Woops, you're right. So looking at the history I think I actually added these Maybe I could have defined Line 877 in cfb9b55
and passed to `copyto_nonleaf!`, which has to widen it when a `CategoricalValue`. But it doesn't use the `Broadcasted` object when calling `similar`: Line 1032 in cfb9b55
So in that case defining `Base.similar(bc::Broadcasted...)` isn't enough. Do you think that line could be changed to `Base.similar(bc, promote_typejoin(T, typeof(val)), axes(dest))`?
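For context, here is a minimal sketch of the documented two-argument hook being discussed, `Base.similar(bc::Broadcasted{Style}, ::Type{ElType})`. The `Tagged` wrapper is a hypothetical stand-in for a package array type, not CategoricalArrays code, and the example only exercises the case where the element type is inferred.

```julia
# Hypothetical wrapper array standing in for a package-owned array type.
struct Tagged{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end

# Minimal AbstractArray interface.
Base.size(A::Tagged) = size(A.data)
Base.getindex(A::Tagged{T,N}, I::Vararg{Int,N}) where {T,N} = A.data[I...]
Base.setindex!(A::Tagged{T,N}, v, I::Vararg{Int,N}) where {T,N} = (A.data[I...] = v)

# Opt in to a custom broadcast style so that Base asks us for the output container.
Base.BroadcastStyle(::Type{<:Tagged}) = Broadcast.ArrayStyle{Tagged}()

# Allocate the destination from the Broadcasted object and the inferred element type.
function Base.similar(bc::Broadcast.Broadcasted{Broadcast.ArrayStyle{Tagged}},
                      ::Type{ElType}) where {ElType}
    return Tagged(similar(Array{ElType}, axes(bc)))
end

A = Tagged([1, 2, 3])
B = A .+ 1   # element type is inferred as Int, so similar(bc, Int) is called
```

As the comment above points out, when the element type cannot be inferred and has to be widened, the destination is reallocated from `dest` rather than from `bc`, so a method like this one alone would not cover that path.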
|
🏴‍☠️ |
Technically it's not type piracy, as CategoricalArrays own |
I've already mentioned it in #34478 that @mbauman linked, but it'd be nice if In an ideal world,

```julia
using BangBang: append!!, collector, finish!
using MicroCollections: UndefVector
map(f, xs) = finish!(append!!(materializer(xs), (f(x) for x in xs)))
filter(f, xs) = finish!(append!!(materializer(xs), (x for x in xs if f(x))))
materializer(xs::AbstractVector) = collector(UndefVector(size(xs)), true) # unsafe = true
materializer(xs) = haslength(xs) ? collector(UndefVector(size(xs)), true) : collector()
materializer(::Tuple) = ()
materializer(::NamedTuple) = NamedTuple()
materializer(::AbstractDict) = error("nope")
```
|
By the way, why would you want to implement It would be nice if we can do it at least with For generic cases like |
Yeah, that's an optimization that I should definitely implement at some point. But I wanted to raise the general design issue, as it is relevant for mixed-types multiple-input The |
The approach I took in Transducers+BangBang is based on the first principle that you treat everything as a monoid (-ish). It's actually very simple and, as a result, it is composable, works with immutables, and is usable as a building block of parallel computation. There is obviously a lot of overlap in functionality with what Base does, since both do the same thing in basic cases. So, I understand that it looks like it's "reinventing the wheel." However, if you are going to extend and expose low-level
What about
|
"without reinventing the wheel" wasn't a criticism of BangBang, quite the contrary actually: I mean that it would make it possible to write custom implementations with less code duplication. |
Oh, I see. Sorry that I reacted rather strongly based on my misinterpretation. |
The current design of `map`, `collect` and `similar` seems problematic for custom arrays which are associated with a custom element type, like `CategoricalArray` and its `CategoricalValue` type.

In CategoricalArrays.jl, I would like `map(f, ::CategoricalArray)` to return a `CategoricalArray` if `f` returns only `Union{CategoricalValue, Missing}` values. This is natural in particular so that `map(identity, ::CategoricalArray)` returns a `CategoricalArray`.

To achieve this, I could define `Base.map(f, ::CategoricalArray)`, but that would require duplicating tricky code from Base since it needs to handle eltype widening and so on. So I tried to define `similar` so that `collect`, which `map` uses under the hood, returns a `CategoricalArray` when appropriate. But `collect` uses `similar(1:1, T, axes(itr))`, so I have to override `similar(::AbstractRange, ::Type{<:Union{CategoricalValue, Missing}})`. For consistency I also have to define similar methods for `AbstractArray`, `Array`, `Vector` and `Matrix` (due to ambiguities).

Doing that has two consequences:
- `collect(::CategoricalArray)` always returns a `CategoricalArray`. This actually makes sense, since `Array{CategoricalValue}` is an inefficient type. But it seems to go against the docstring for `collect`, which says that it returns an `Array`.
- `getindex(::Array{<:CategoricalValue}, ::Array)` also returns a `CategoricalArray`. This doesn't sound correct.

This leads me to raise two questions/proposals:
- Should the `collect` docstring be made less strict? It sounds useful to be able to collect the contents of an iterator into the most natural/efficient array type. If one really wants an `Array`, it is better to call `Array(itr)`; otherwise `collect` is redundant.
- Should `collect` use a new system like `similar(AbstractArray, T, axes(itr))` instead of `similar(1:1, T, axes(itr))`? That would allow specifying that the most appropriate `AbstractArray{<:T}` is requested rather than `Array`. That would differ from `getindex`, which really wants the type of the input array. This could be introduced without breakage by having it fall back to `similar(Array, ...)`.