Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ArrayPartition violates the documentation of eltype #397

Open
bvdmitri opened this issue Jul 25, 2024 · 8 comments
Open

ArrayPartition violates the documentation of eltype #397

bvdmitri opened this issue Jul 25, 2024 · 8 comments
Assignees
Labels

Comments

@bvdmitri
Copy link
Contributor

bvdmitri commented Jul 25, 2024

Describe the bug 🐞

In some situations ArrayPartition violates the documentation of eltype:

eltype(type)

  Determine the type of the elements generated by iterating a collection...

But this is not always the case (see MWE below).

As I understand the ArrayPartition declares eltype as a type that all udnerlying elements can be promoted to.
This behaviour might be by design, but it has a real practical implication.
Perhaps can be fixed on ForwardDiff side by extra promoting, but this can have other weird edge cases.
At the very least this behaviour can be documented as desired?

Minimal Reproducible Example 👇

julia> v = ArrayPartition([ 0.0 ], [ 1 ])
([0.0], [1])

julia> for e in v
           @show e isa eltype(v)
       end
e isa eltype(v) = true
e isa eltype(v) = false # would expect `true` here

Environment (please complete the following information):

  • Output of using Pkg; Pkg.status()
(jl_ZYBD4c) pkg> st
Status `/private/var/folders/kf/1t2_wf7n49sgxmzzmmgsb1j40000gn/T/jl_ZYBD4c/Project.toml`
  [731186ca] RecursiveArrayTools v3.26.0
  • Output of versioninfo()
julia> versioninfo()
Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 10 × Apple M2 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 6 virtual cores)
@bvdmitri bvdmitri added the bug label Jul 25, 2024
@mcabbott
Copy link
Contributor

As I understand the ArrayPartition declares eltype as a type that all udnerlying elements can be promoted to.

I don't think this is quite true, although I'm not sure what the real rule is:

julia> ap = ArrayPartition([[1,2]], [3,4], [(5,6)])
([[1, 2]], [3, 4], [(5, 6)])

julia> eltype(ap)
Int64

julia> [typeof(x) for x in ap]
4-element Vector{DataType}:
 Vector{Int64} (alias for Array{Int64, 1})
 Int64
 Int64
 Tuple{Int64, Int64}

The simplest fix would be to declare a union eltype, possibly via Base.promote_typejoin. Alternatively, make getindex call convert(eltype(ap), val), although this will fail on read for the example above -- it would be better to fail on construction, if such things aren't supported.

@ChrisRackauckas
Copy link
Member

The reasoning behind this behavior is because there is a common use case with Unitful where a system of second order ODEs is split in two, but some of the values are positions while the others are velocities. This in fact is what started ArrayPartition and still tends to be it's most widely used case in the wild, just without Unitful. Because of this, ArrayPartition operations tend to be specialized to not require the global index. For example, broadcast will not use it, which is why there's not a performance hit for it's uses in SciML. The downside of course is that it cannot guarantee performance on the abstractvector form.

That said, the linear index is never particularly fast because it has branching to consider and this removes SIMD optimizations. So you pretty much only want to use it as a fallback anyways. That makes ArrayPartition an odd AbstractVector: it has all of the interface defined, but most functions like iterators have a faster dispatch by avoiding treating it like a Vector. Thus the Abstractvector piece is more for convenience than performance. The Array interface overloads then confirm this by setting traits to say this does not have fast linear indexing, telling downstream codes to use alternative code paths.

With all of that in mind, eltype is then somewhat not well-defined in this case, though what would make it interface compliant would be to ensure it is matching the iteration definition correctly. That may need a bit of a bug fix and a bit more testing which shouldn't be too hard. But, I would still stress that even if you get a concrete eltype from this, you still likely want to specialize functions as possible if you need performance, and most common array functions (any, all, etc.) already have these specializations defined

@bvdmitri
Copy link
Contributor Author

Everything makes sense, but I don’t think performance is a significant concern here. Linear indexing may not be the fastest, but that’s acceptable. If eltype is not frequently used and specialized dispatch is more common, it might be worthwhile to adjust the definition to be more compatible with AbstractVector.

Also the example provided from @mcabbott is really bad imo which is a gun waiting to blew up someones leg:

julia> ap = ArrayPartition([[1,2]], [3,4], [(5,6)])
([[1, 2]], [3, 4], [(5, 6)])

julia> eltype(ap)
Int64

julia> collect(ap)
ERROR: MethodError: Cannot `convert` an object of type Vector{Int64} to an object of type Int64

Closest candidates are:
  convert(::Type{T}, ::T) where T<:Number
   @ Base number.jl:6
  convert(::Type{T}, ::T) where T
   @ Base Base.jl:84
  convert(::Type{T}, ::Number) where T<:Number
   @ Base number.jl:7
  ...

Stacktrace:
  [1] setindex!(A::Vector{Int64}, x::Vector{Int64}, i1::Int64)
    @ Base ./array.jl:1021
  [2] setindex!

Interestingly enough, this works fine (why the previous one fails then?)

julia> ap = ArrayPartition([1], ["string"])
([1], ["string"])

julia> collect(ap)
2-element Vector{Any}:
 1
  "string"

julia> eltype(ap)
Any

@ChrisRackauckas
Copy link
Member

Okay I see where the confusion is coming from. ArrayPartition has functionality for being nested (like all array types in this package). In that case, the eltype is the eltype found on the bottom, and it walks using the bottom of the tree.

julia> ap = ArrayPartition((ArrayPartition([1,2]), ArrayPartition([3,4]),))
(ArrayPartition{Int64, Tuple{Vector{Int64}}}(([1, 2],)), ArrayPartition{Int64, Tuple{Vector{Int64}}}(([3, 4],)))

julia> eltype(ap)
Int64

julia> ap = ArrayPartition((ArrayPartition([1,2]), ArrayPartition([3.0,4]),))
(ArrayPartition{Int64, Tuple{Vector{Int64}}}(([1, 2],)), ArrayPartition{Float64, Tuple{Vector{Float64}}}(([3.0, 4.0],)))

julia> eltype(ap)
Float64

That is using recursive_bottom_eltype. The vector indexing is then defined on the bottom walk, so this would be length 4.

I think the eltype recursion is mixing up with tuples in there. It's this line of code:

https://github.com/SciML/RecursiveArrayTools.jl/blob/master/src/array_partition.jl#L35

I think the issue here is we need to more cleanly define what the bottom is: i.e. is ap[4] == 5 or ap[4] == (5,6)?

julia> ap = ArrayPartition([[1,2]], [3,4], [(5,6)])
([[1, 2]], [3, 4], [(5, 6)])

julia> ap[4]
(5, 6)

Currently it recurses the eltype definition there but does not recurse the vector definition, so that's the disconnect. I.e. should ap = ArrayPartition([[1,2]], [3,4], [ArrayPartition(5,6)]) give a different vector form? Currently this case is left undefined in ArrayPartition's definition so it does something but it doesn't do something sensible.

There's a few solutions I see for this:

  1. Change the indexing to implicitly recurse collections. This is naturally hard though because "what is a collection" can be hard to asess.
  2. Keep the indexing the same and fix the element type.
  3. Error if a collection is passed in as an element, telling people that if they wish to recurse collections they should make them ArrayPartitions.

It sounds like 2 would be the least breaking here, and what people would expect?

@bvdmitri
Copy link
Contributor Author

I believe that support for better recursive arrays/nesting is a separate issue and deserves its own discussion. This particular issue is about the discrepancy between the eltype declaration and the actual type generated by iterating over the collection. Specifically, the eltype declares a different type than what is produced during iteration. As demonstrated in the original example where eltype declares Float64, but iteration returns Int. Nested collections were just an extreme example of such discrepancy, but this issue occurs even without nested collections.

This violation of the assumption that iterating the collection returns elements of eltype can cause errors (or silently produce incorrect results??) in other libraries (e.g., ForwardDiff).

If this behavior is intentional, it should at least be documented. However, it might be a bug, and eltype should perhaps be Union{Float64, Int} instead or Real. Alternatively, the iteration (and getindex) could explicitly perform type conversion with convert.

@ChrisRackauckas
Copy link
Member

@oscardssmith do you have thoughts on this?

@oscardssmith
Copy link
Contributor

I think the simplest solution would be to do promotion on access or setindex. not fully sure what the implications would be though

@ChrisRackauckas
Copy link
Member

I think that's reasonable. Can you give it a shot?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants