Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiplying Adjoint with Symmetric uses slow generic matmul #850

Open
olof3 opened this issue May 22, 2021 · 3 comments · May be fixed by #1191
Open

Multiplying Adjoint with Symmetric uses slow generic matmul #850

olof3 opened this issue May 22, 2021 · 3 comments · May be fixed by #1191
Labels
performance Must go faster

Comments

@olof3
Copy link
Contributor

olof3 commented May 22, 2021

Multiplying an Adjoint with a Symmetric matrix falls back to a very slow generic matmul.

n = 500
A = randn(n, n)
Q = Symmetric(randn(n,n))

@btime $A'*$Q
277.406 ms (2 allocations: 1.91 MiB)

Converting A' to Matrix makes it a lot faster

@btime Matrix($A')*$Q
  4.281 ms (4 allocations: 3.81 MiB)

The break-even poirnt for when converting A' to Matrix is faster is n = 5.

This seems a bit slow? Should this be fixed in Base or should one convert to Matrix explicitly?

Perhaps fallback conversion to Matrix should be used more widely when eltype(A) <: BlasFloat ?

@mcabbott
Copy link
Collaborator

@less mul!(similar(A), A, Q, true, false) leads you to BLAS.symm!('R', Q.uplo, true, Q.data, A, false, similar(A)). It doesn't look like that has a version which accepts transposed A.

Since A' * Q ≈ (Q' * A)', returning that would be one option. How surprising would it be for * to return an Adjoint matrix?

@dkarrasch dkarrasch added the performance Must go faster label May 28, 2021
@olof3
Copy link
Contributor Author

olof3 commented May 29, 2021

Since A' * Q ≈ (Q' * A)', returning that would be one option. How surprising would it be for * to return an Adjoint matrix?

Not, sure, a bit surprising, but also nice with some more performance. I'm primarily worried about avoiding generic matmul, though.

Here is another example with common types where generic matmul strikes again.

julia> n = 500; A = randn(n,n); B = rand(Bool, n, n); @btime $A*$B;
  301.147 ms (8 allocations: 1.91 MiB)

@mcabbott
Copy link
Collaborator

Some eltype mismatches are copied to avoid this, e.g. there's a method (*)(A::StridedMatrix{<:BlasReal}, B::StridedMatrix{<:BlasFloat}) which helps the first of these, maybe you could extend it to capture Bool without producing ambiguities?

julia> A = rand(Float32,100,100); B = rand(100,100);

julia> @btime $A * $B;  # special method which copies
  36.000 μs (4 allocations: 156.41 KiB)

julia> @btime Float64.($A) * $B;  # copy by hand, same speed & same alloc.
  34.875 μs (4 allocations: 156.41 KiB)

julia> A = rand(Bool,100,100); B = rand(100,100);

julia> @btime $A * $B;  # generic matmul
  490.875 μs (8 allocations: 78.53 KiB)

julia> @btime Float64.($A) * $B;
  38.500 μs (4 allocations: 156.41 KiB)

@KristofferC KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024
@dkarrasch dkarrasch marked this as a duplicate of #1189 Feb 3, 2025
@dkarrasch dkarrasch linked a pull request Feb 3, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
3 participants