Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-associating matrix-chain multiplication yields unpredictable results #1043

Open
mikmoore opened this issue Dec 4, 2023 · 6 comments
Open

Comments

@mikmoore
Copy link
Contributor

mikmoore commented Dec 4, 2023

Version 1.9.4.

*(A, B::AbstractMatrix, C)
  A * B * C * D

  Chained multiplication of 3 or 4 matrices is done in the most efficient sequence, based on the sizes of the
  arrays. That is, the number of scalar multiplications needed for (A * B) * C (with 3 dense matrices) is
  compared to that for A * (B * C) to choose which of these to execute.

  If the last factor is a vector, or the first a transposed vector, then it is efficient to deal with these
  first. In particular x' * B * y means (x' * B) * y for an ordinary column-major B::Matrix. Unlike dot(x, B,
  y), this allocates an intermediate array.

  If the first or last factor is a number, this will be fused with the matrix multiplication, using 5-arg mul!.

  See also muladd, dot.

  │ Julia 1.7
  │
  │  These optimisations require at least Julia 1.7.

Obviously, this transformation slightly changes results for floating point numbers due to their mild non-associativity. However, it is also applied in contexts with mathematics that are decisively non-associative. For example:

julia> using Octonions

julia> x = Octonion(1,2,3,4,5,6,7,8); y = Octonion(1,-2,3,4,5,6,7,8); z = Octonion(1,2,-3,4,5,6,7,8);

julia> [x;;] * [y;;] * [z;;] # left-associative evaluation
1×1 Matrix{Octonion{Int64}}:
 Octonion{Int64}(-652, -1064, 612, -736, -860, -1128, -1036, -1712)

julia> [x;;] * [y;;] * [z;;] * [1;] # trailing vector causes right-associativity
1-element Vector{Octonion{Int64}}:
 Octonion{Int64}(-652, -1064, 612, -736, -1148, -888, -1420, -1376)

We like to compose packages in Julia. Arbitrary re-assocation is incompatible with this. Since "standard" associativity is undefined, it becomes necessary for a "correct" package to explicitly force every association with parentheses. N-ary evaluation cannot be correctly applied without tight control of the context that ensures re-association is mathematically valid. The presence of re-association in any insufficiently-controlled context (like the above matrix-chain specialization) will lead to correctness issues across the ecosystem.

Spiritually related to JuliaLang/julia#52333.

@mikmoore mikmoore changed the title Optimizing matrix-chain multiplication yields wrong results Re-associating matrix-chain multiplication yields unpredictable results Dec 4, 2023
@dkarrasch dkarrasch added the correctness bug ⚠ Bugs that are likely to lead to incorrect results in user code without throwing label Dec 5, 2023
@thofma
Copy link

thofma commented Dec 5, 2023

Why is this a correctness bug? Writing A * B * C is ambiguous if the operation is not associative, so returning either is fine. Doesn't this fall under the category of undefined behaviour?

@antoine-levitt
Copy link
Collaborator

This method could be specialized to Real and Complex. It would lose some extensibility for user-defined associative types that are not real or complex, but are there any?

@thofma
Copy link

thofma commented Dec 5, 2023

Yes, quaternions for example.

@adienes
Copy link

adienes commented Dec 5, 2023

this is not necessarily what Julia should nor, nor is it necessarily a reasonable assumption

but purely as a matter of practicality, I suspect most users would expect that in the absence of parens, a * b * c will evaluate left-to-right as (a * b) * c

maybe an isassociative trait is warranted?

@mikmoore
Copy link
Contributor Author

mikmoore commented Dec 5, 2023

Writing A * B * C is ambiguous if the operation is not associative

Is this really behavior we want to have undefined? If you go by https://docs.julialang.org/en/v1/manual/mathematical-operations/#Operator-Precedence-and-Associativity then * should be left-associative "by default" but is actually non-associative. What use is a default if it isn't even clear when it applies? If you go by

help?> *
search: *

<lines omitted>

  *(A, B::AbstractMatrix, C)
  A * B * C * D

  Chained multiplication of 3 or 4 matrices is done in the most efficient sequence, based on the sizes of the
  arrays. That is, the number of scalar multiplications needed for (A * B) * C (with 3 dense matrices) is
  compared to that for A * (B * C) to choose which of these to execute.

  If the last factor is a vector, or the first a transposed vector, then it is efficient to deal with these
  first. In particular x' * B * y means (x' * B) * y for an ordinary column-major B::Matrix. Unlike dot(x, B,
  y), this allocates an intermediate array.

  If the first or last factor is a number, this will be fused with the matrix multiplication, using 5-arg mul!.

  See also muladd, dot.

  │ Julia 1.7
  │
  │  These optimisations require at least Julia 1.7.

(note that this docstring does not appear in the online Julia docs, which should probably be resolved)
then the association is explicitly undefined.

I'm here to argue that an order that is undefined and results in mathematical errors (i.e., more than just roundoff) is something we should consider not having because it makes generic code very difficult to write. You cannot write any syntactically-chainable * (on purpose or accident) unless you absolutely know that re-association is fine for any input type that can reach that code. Most users will miss this out of ignorance but even the best will sometimes miss it out of forgetfulness. Due to the rarity of non-associative types, these are likely to be missed in tests.

Personally, I always use parentheses to express my chained matrix multiplication because I didn't even know about these specializations until this week (and they've only existed since v1.7 and don't appear in the online docs). I never found it particularly onerous and there are packages that handle this anyway (and much more generally) by introducing dedicated functions for re-associative * on matrices.

Another fun consequence of our special-casing of 3-4 argument chained matmul is this:

julia> [x;;] * [y;;] * [z;;] * [1;] # right-associative
1-element Vector{Octonion{Int64}}:
 Octonion{Int64}(-652, -1064, 612, -736, -1148, -888, -1420, -1376)

julia> [1;;] * [x;;] * [y;;] * [z;;] * [1;] # chain another matrix and it reverts to left-associative
1-element Vector{Octonion{Int64}}:
 Octonion{Int64}(-652, -1064, 612, -736, -860, -1128, -1036, -1712)

I find this behavior very difficult to defend.

@mcabbott
Copy link
Collaborator

mcabbott commented Dec 5, 2023

Maybe the first question is whether * is intended to be the multiplication for objects with a non-associative binary operation. Has this been discussed anywhere? (I don't mean the docs telling you about a fallback vararg method.) I suppose this is another kind of interface question.

The assumption that LinearAlgebra deals with associative multiplication also exists in matrix powers, and in dot, and perhaps elsewhere:

using Octonions
begin
  a = Octonion(rand(1:99, 8)...)
  M = [Octonion(rand(1:99, 8)...) for _ in 1:2, _ in 1:2]
  v = [Octonion(rand(1:99, 8)...) for _ in 1:2]
  w = [Octonion(rand(1:99, 8)...) for _ in 1:2]
end;

@assert M^3 == M*(M*M) != (M*M)*M            # power_by_squaring, since before Julia 1.0
@assert M^4 == (M^2) * (M^2) != ((M*M)*M)*M

@assert dot(v,M,w) == (v'*M)*w == v'*M*w != dot(v, M*w)  # disagrees with docstring, Julia 1.4
@assert dot(v,M',w) == v'*(M'*w) == dot(v, M'*w)         # calls adjoint(dot(w, M.parent, v))

[Edit, I misunderstood a comment above]. Quaternions are associative (but non-commutative). The methods for fused * should all be restricted to Union{Real,Complex} when they commute arguments, e.g. @which a*M*v is the fallback not the fused one. (Arrays of matrices are the obvious way to get non-commutative elements.)

What examples are there besides Octonions? Are there some which are useful in the wild? Useful to feed into generic code? (We have a vector cross product × which is non-associative. Lie algebras are useful & non-associative, but surely nobody would write their binary operaion as a*b.)

@dkarrasch dkarrasch removed the correctness bug ⚠ Bugs that are likely to lead to incorrect results in user code without throwing label Dec 6, 2023
@KristofferC KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants