ArrayPartition violates the documentation of eltype #397

bvdmitri · 2024-07-25T18:53:29Z

Describe the bug 🐞

In some situations ArrayPartition violates the documentation of eltype:

eltype(type)

  Determine the type of the elements generated by iterating a collection...

But this is not always the case (see MWE below).

As I understand the ArrayPartition declares eltype as a type that all udnerlying elements can be promoted to.
This behaviour might be by design, but it has a real practical implication.
Perhaps can be fixed on ForwardDiff side by extra promoting, but this can have other weird edge cases.
At the very least this behaviour can be documented as desired?

Minimal Reproducible Example 👇

julia> v = ArrayPartition([ 0.0 ], [ 1 ])
([0.0], [1])

julia> for e in v
           @show e isa eltype(v)
       end
e isa eltype(v) = true
e isa eltype(v) = false # would expect `true` here

Environment (please complete the following information):

Output of using Pkg; Pkg.status()

(jl_ZYBD4c) pkg> st
Status `/private/var/folders/kf/1t2_wf7n49sgxmzzmmgsb1j40000gn/T/jl_ZYBD4c/Project.toml`
  [731186ca] RecursiveArrayTools v3.26.0

Output of versioninfo()

julia> versioninfo()
Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 10 × Apple M2 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 6 virtual cores)

The text was updated successfully, but these errors were encountered:

mcabbott · 2024-07-25T19:36:32Z

As I understand the ArrayPartition declares eltype as a type that all udnerlying elements can be promoted to.

I don't think this is quite true, although I'm not sure what the real rule is:

julia> ap = ArrayPartition([[1,2]], [3,4], [(5,6)])
([[1, 2]], [3, 4], [(5, 6)])

julia> eltype(ap)
Int64

julia> [typeof(x) for x in ap]
4-element Vector{DataType}:
 Vector{Int64} (alias for Array{Int64, 1})
 Int64
 Int64
 Tuple{Int64, Int64}

The simplest fix would be to declare a union eltype, possibly via Base.promote_typejoin. Alternatively, make getindex call convert(eltype(ap), val), although this will fail on read for the example above -- it would be better to fail on construction, if such things aren't supported.

ChrisRackauckas · 2024-07-26T15:25:17Z

The reasoning behind this behavior is because there is a common use case with Unitful where a system of second order ODEs is split in two, but some of the values are positions while the others are velocities. This in fact is what started ArrayPartition and still tends to be it's most widely used case in the wild, just without Unitful. Because of this, ArrayPartition operations tend to be specialized to not require the global index. For example, broadcast will not use it, which is why there's not a performance hit for it's uses in SciML. The downside of course is that it cannot guarantee performance on the abstractvector form.

That said, the linear index is never particularly fast because it has branching to consider and this removes SIMD optimizations. So you pretty much only want to use it as a fallback anyways. That makes ArrayPartition an odd AbstractVector: it has all of the interface defined, but most functions like iterators have a faster dispatch by avoiding treating it like a Vector. Thus the Abstractvector piece is more for convenience than performance. The Array interface overloads then confirm this by setting traits to say this does not have fast linear indexing, telling downstream codes to use alternative code paths.

With all of that in mind, eltype is then somewhat not well-defined in this case, though what would make it interface compliant would be to ensure it is matching the iteration definition correctly. That may need a bit of a bug fix and a bit more testing which shouldn't be too hard. But, I would still stress that even if you get a concrete eltype from this, you still likely want to specialize functions as possible if you need performance, and most common array functions (any, all, etc.) already have these specializations defined

bvdmitri · 2024-07-26T17:37:50Z

Everything makes sense, but I don’t think performance is a significant concern here. Linear indexing may not be the fastest, but that’s acceptable. If eltype is not frequently used and specialized dispatch is more common, it might be worthwhile to adjust the definition to be more compatible with AbstractVector.

Also the example provided from @mcabbott is really bad imo which is a gun waiting to blew up someones leg:

julia> ap = ArrayPartition([[1,2]], [3,4], [(5,6)])
([[1, 2]], [3, 4], [(5, 6)])

julia> eltype(ap)
Int64

julia> collect(ap)
ERROR: MethodError: Cannot `convert` an object of type Vector{Int64} to an object of type Int64

Closest candidates are:
  convert(::Type{T}, ::T) where T<:Number
   @ Base number.jl:6
  convert(::Type{T}, ::T) where T
   @ Base Base.jl:84
  convert(::Type{T}, ::Number) where T<:Number
   @ Base number.jl:7
  ...

Stacktrace:
  [1] setindex!(A::Vector{Int64}, x::Vector{Int64}, i1::Int64)
    @ Base ./array.jl:1021
  [2] setindex!

Interestingly enough, this works fine (why the previous one fails then?)

julia> ap = ArrayPartition([1], ["string"])
([1], ["string"])

julia> collect(ap)
2-element Vector{Any}:
 1
  "string"

julia> eltype(ap)
Any

ChrisRackauckas · 2024-07-28T13:40:24Z

Okay I see where the confusion is coming from. ArrayPartition has functionality for being nested (like all array types in this package). In that case, the eltype is the eltype found on the bottom, and it walks using the bottom of the tree.

julia> ap = ArrayPartition((ArrayPartition([1,2]), ArrayPartition([3,4]),))
(ArrayPartition{Int64, Tuple{Vector{Int64}}}(([1, 2],)), ArrayPartition{Int64, Tuple{Vector{Int64}}}(([3, 4],)))

julia> eltype(ap)
Int64

julia> ap = ArrayPartition((ArrayPartition([1,2]), ArrayPartition([3.0,4]),))
(ArrayPartition{Int64, Tuple{Vector{Int64}}}(([1, 2],)), ArrayPartition{Float64, Tuple{Vector{Float64}}}(([3.0, 4.0],)))

julia> eltype(ap)
Float64

That is using recursive_bottom_eltype. The vector indexing is then defined on the bottom walk, so this would be length 4.

I think the eltype recursion is mixing up with tuples in there. It's this line of code:

https://github.com/SciML/RecursiveArrayTools.jl/blob/master/src/array_partition.jl#L35

I think the issue here is we need to more cleanly define what the bottom is: i.e. is ap[4] == 5 or ap[4] == (5,6)?

julia> ap = ArrayPartition([[1,2]], [3,4], [(5,6)])
([[1, 2]], [3, 4], [(5, 6)])

julia> ap[4]
(5, 6)

Currently it recurses the eltype definition there but does not recurse the vector definition, so that's the disconnect. I.e. should ap = ArrayPartition([[1,2]], [3,4], [ArrayPartition(5,6)]) give a different vector form? Currently this case is left undefined in ArrayPartition's definition so it does something but it doesn't do something sensible.

There's a few solutions I see for this:

Change the indexing to implicitly recurse collections. This is naturally hard though because "what is a collection" can be hard to asess.
Keep the indexing the same and fix the element type.
Error if a collection is passed in as an element, telling people that if they wish to recurse collections they should make them ArrayPartitions.

It sounds like 2 would be the least breaking here, and what people would expect?

bvdmitri · 2024-07-28T21:29:00Z

I believe that support for better recursive arrays/nesting is a separate issue and deserves its own discussion. This particular issue is about the discrepancy between the eltype declaration and the actual type generated by iterating over the collection. Specifically, the eltype declares a different type than what is produced during iteration. As demonstrated in the original example where eltype declares Float64, but iteration returns Int. Nested collections were just an extreme example of such discrepancy, but this issue occurs even without nested collections.

This violation of the assumption that iterating the collection returns elements of eltype can cause errors (or silently produce incorrect results??) in other libraries (e.g., ForwardDiff).

If this behavior is intentional, it should at least be documented. However, it might be a bug, and eltype should perhaps be Union{Float64, Int} instead or Real. Alternatively, the iteration (and getindex) could explicitly perform type conversion with convert.

ChrisRackauckas · 2024-10-25T05:25:37Z

@oscardssmith do you have thoughts on this?

oscardssmith · 2024-10-25T11:58:46Z

I think the simplest solution would be to do promotion on access or setindex. not fully sure what the implications would be though

ChrisRackauckas · 2024-10-27T15:11:02Z

I think that's reasonable. Can you give it a shot?

bvdmitri added the bug label Jul 25, 2024

ChrisRackauckas assigned jClugstor Sep 3, 2024

ChrisRackauckas assigned oscardssmith and unassigned jClugstor Oct 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ArrayPartition violates the documentation of eltype #397

ArrayPartition violates the documentation of eltype #397

bvdmitri commented Jul 25, 2024 •

edited

Loading

mcabbott commented Jul 25, 2024

ChrisRackauckas commented Jul 26, 2024

bvdmitri commented Jul 26, 2024

ChrisRackauckas commented Jul 28, 2024

bvdmitri commented Jul 28, 2024

ChrisRackauckas commented Oct 25, 2024

oscardssmith commented Oct 25, 2024

ChrisRackauckas commented Oct 27, 2024

ArrayPartition violates the documentation of eltype #397

ArrayPartition violates the documentation of eltype #397

Comments

bvdmitri commented Jul 25, 2024 • edited Loading

mcabbott commented Jul 25, 2024

ChrisRackauckas commented Jul 26, 2024

bvdmitri commented Jul 26, 2024

ChrisRackauckas commented Jul 28, 2024

bvdmitri commented Jul 28, 2024

ChrisRackauckas commented Oct 25, 2024

oscardssmith commented Oct 25, 2024

ChrisRackauckas commented Oct 27, 2024

bvdmitri commented Jul 25, 2024 •

edited

Loading