Re-associating matrix-chain multiplication yields unpredictable results #1043

mikmoore · 2023-12-04T17:50:52Z

Version 1.9.4.

*(A, B::AbstractMatrix, C)
  A * B * C * D

  Chained multiplication of 3 or 4 matrices is done in the most efficient sequence, based on the sizes of the
  arrays. That is, the number of scalar multiplications needed for (A * B) * C (with 3 dense matrices) is
  compared to that for A * (B * C) to choose which of these to execute.

  If the last factor is a vector, or the first a transposed vector, then it is efficient to deal with these
  first. In particular x' * B * y means (x' * B) * y for an ordinary column-major B::Matrix. Unlike dot(x, B,
  y), this allocates an intermediate array.

  If the first or last factor is a number, this will be fused with the matrix multiplication, using 5-arg mul!.

  See also muladd, dot.

  │ Julia 1.7
  │
  │  These optimisations require at least Julia 1.7.

Obviously, this transformation slightly changes results for floating point numbers due to their mild non-associativity. However, it is also applied in contexts with mathematics that are decisively non-associative. For example:

julia> using Octonions

julia> x = Octonion(1,2,3,4,5,6,7,8); y = Octonion(1,-2,3,4,5,6,7,8); z = Octonion(1,2,-3,4,5,6,7,8);

julia> [x;;] * [y;;] * [z;;] # left-associative evaluation
1×1 Matrix{Octonion{Int64}}:
 Octonion{Int64}(-652, -1064, 612, -736, -860, -1128, -1036, -1712)

julia> [x;;] * [y;;] * [z;;] * [1;] # trailing vector causes right-associativity
1-element Vector{Octonion{Int64}}:
 Octonion{Int64}(-652, -1064, 612, -736, -1148, -888, -1420, -1376)

We like to compose packages in Julia. Arbitrary re-assocation is incompatible with this. Since "standard" associativity is undefined, it becomes necessary for a "correct" package to explicitly force every association with parentheses. N-ary evaluation cannot be correctly applied without tight control of the context that ensures re-association is mathematically valid. The presence of re-association in any insufficiently-controlled context (like the above matrix-chain specialization) will lead to correctness issues across the ecosystem.

Spiritually related to JuliaLang/julia#52333.

The text was updated successfully, but these errors were encountered:

thofma · 2023-12-05T15:36:17Z

Why is this a correctness bug? Writing A * B * C is ambiguous if the operation is not associative, so returning either is fine. Doesn't this fall under the category of undefined behaviour?

antoine-levitt · 2023-12-05T16:06:46Z

This method could be specialized to Real and Complex. It would lose some extensibility for user-defined associative types that are not real or complex, but are there any?

thofma · 2023-12-05T16:54:45Z

Yes, quaternions for example.

adienes · 2023-12-05T17:18:01Z

this is not necessarily what Julia should nor, nor is it necessarily a reasonable assumption

but purely as a matter of practicality, I suspect most users would expect that in the absence of parens, a * b * c will evaluate left-to-right as (a * b) * c

maybe an isassociative trait is warranted?

mikmoore · 2023-12-05T17:41:03Z

Writing A * B * C is ambiguous if the operation is not associative

Is this really behavior we want to have undefined? If you go by https://docs.julialang.org/en/v1/manual/mathematical-operations/#Operator-Precedence-and-Associativity then * should be left-associative "by default" but is actually non-associative. What use is a default if it isn't even clear when it applies? If you go by

help?> *
search: *

<lines omitted>

  *(A, B::AbstractMatrix, C)
  A * B * C * D

  Chained multiplication of 3 or 4 matrices is done in the most efficient sequence, based on the sizes of the
  arrays. That is, the number of scalar multiplications needed for (A * B) * C (with 3 dense matrices) is
  compared to that for A * (B * C) to choose which of these to execute.

  If the last factor is a vector, or the first a transposed vector, then it is efficient to deal with these
  first. In particular x' * B * y means (x' * B) * y for an ordinary column-major B::Matrix. Unlike dot(x, B,
  y), this allocates an intermediate array.

  If the first or last factor is a number, this will be fused with the matrix multiplication, using 5-arg mul!.

  See also muladd, dot.

  │ Julia 1.7
  │
  │  These optimisations require at least Julia 1.7.

(note that this docstring does not appear in the online Julia docs, which should probably be resolved)
then the association is explicitly undefined.

I'm here to argue that an order that is undefined and results in mathematical errors (i.e., more than just roundoff) is something we should consider not having because it makes generic code very difficult to write. You cannot write any syntactically-chainable * (on purpose or accident) unless you absolutely know that re-association is fine for any input type that can reach that code. Most users will miss this out of ignorance but even the best will sometimes miss it out of forgetfulness. Due to the rarity of non-associative types, these are likely to be missed in tests.

Personally, I always use parentheses to express my chained matrix multiplication because I didn't even know about these specializations until this week (and they've only existed since v1.7 and don't appear in the online docs). I never found it particularly onerous and there are packages that handle this anyway (and much more generally) by introducing dedicated functions for re-associative * on matrices.

Another fun consequence of our special-casing of 3-4 argument chained matmul is this:

julia> [x;;] * [y;;] * [z;;] * [1;] # right-associative
1-element Vector{Octonion{Int64}}:
 Octonion{Int64}(-652, -1064, 612, -736, -1148, -888, -1420, -1376)

julia> [1;;] * [x;;] * [y;;] * [z;;] * [1;] # chain another matrix and it reverts to left-associative
1-element Vector{Octonion{Int64}}:
 Octonion{Int64}(-652, -1064, 612, -736, -860, -1128, -1036, -1712)

I find this behavior very difficult to defend.

mcabbott · 2023-12-05T19:53:56Z

Maybe the first question is whether * is intended to be the multiplication for objects with a non-associative binary operation. Has this been discussed anywhere? (I don't mean the docs telling you about a fallback vararg method.) I suppose this is another kind of interface question.

The assumption that LinearAlgebra deals with associative multiplication also exists in matrix powers, and in dot, and perhaps elsewhere:

using Octonions
begin
  a = Octonion(rand(1:99, 8)...)
  M = [Octonion(rand(1:99, 8)...) for _ in 1:2, _ in 1:2]
  v = [Octonion(rand(1:99, 8)...) for _ in 1:2]
  w = [Octonion(rand(1:99, 8)...) for _ in 1:2]
end;

@assert M^3 == M*(M*M) != (M*M)*M            # power_by_squaring, since before Julia 1.0
@assert M^4 == (M^2) * (M^2) != ((M*M)*M)*M

@assert dot(v,M,w) == (v'*M)*w == v'*M*w != dot(v, M*w)  # disagrees with docstring, Julia 1.4
@assert dot(v,M',w) == v'*(M'*w) == dot(v, M'*w)         # calls adjoint(dot(w, M.parent, v))

[Edit, I misunderstood a comment above]. Quaternions are associative (but non-commutative). The methods for fused * should all be restricted to Union{Real,Complex} when they commute arguments, e.g. @which a*M*v is the fallback not the fused one. (Arrays of matrices are the obvious way to get non-commutative elements.)

What examples are there besides Octonions? Are there some which are useful in the wild? Useful to feed into generic code? (We have a vector cross product × which is non-associative. Lie algebras are useful & non-associative, but surely nobody would write their binary operaion as a*b.)

mikmoore changed the title ~~Optimizing matrix-chain multiplication yields wrong results~~ Re-associating matrix-chain multiplication yields unpredictable results Dec 4, 2023

dkarrasch added the correctness bug ⚠ Bugs that are likely to lead to incorrect results in user code without throwing label Dec 5, 2023

dkarrasch removed the correctness bug ⚠ Bugs that are likely to lead to incorrect results in user code without throwing label Dec 6, 2023

mikmoore mentioned this issue May 13, 2024

Parse 1 + 2 + 3 as +(1, +(2, 3)) JuliaLang/julia#34999

Open

KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Re-associating matrix-chain multiplication yields unpredictable results #1043

Re-associating matrix-chain multiplication yields unpredictable results #1043

mikmoore commented Dec 4, 2023

thofma commented Dec 5, 2023 •

edited

Loading

antoine-levitt commented Dec 5, 2023

thofma commented Dec 5, 2023

adienes commented Dec 5, 2023

mikmoore commented Dec 5, 2023 •

edited

Loading

mcabbott commented Dec 5, 2023 •

edited

Loading

Re-associating matrix-chain multiplication yields unpredictable results #1043

Re-associating matrix-chain multiplication yields unpredictable results #1043

Comments

mikmoore commented Dec 4, 2023

thofma commented Dec 5, 2023 • edited Loading

antoine-levitt commented Dec 5, 2023

thofma commented Dec 5, 2023

adienes commented Dec 5, 2023

mikmoore commented Dec 5, 2023 • edited Loading

mcabbott commented Dec 5, 2023 • edited Loading

thofma commented Dec 5, 2023 •

edited

Loading

mikmoore commented Dec 5, 2023 •

edited

Loading

mcabbott commented Dec 5, 2023 •

edited

Loading