Using Gram matrices to speed up opnorm #1185

Open

njericha opened this issue Jan 27, 2025 · 8 comments

@njericha

In Julia 1.11.2, it appears that I can speed up the calculation of opnorm by taking advantage of the identity

$$ \left\lVert A^\top A \right\rVert_{op} = \left\lVert A A^\top \right\rVert_{op} = \left\lVert A \right\rVert_{op}^2. $$

See https://en.wikipedia.org/wiki/Operator_norm#Operators_on_a_Hilbert_space .

In the following test, we can speed up the calculation of the operator norm for tall matrices $A$ by using $\sqrt{\left\lVert A^\top A \right\rVert_{op}}$ (and presumably for wide matrices $A$ by using $\sqrt{\left\lVert AA^\top \right\rVert_{op}}$). Note that wrapping the Gram matrix with Symmetric also reduces the memory needed.

using Random: randn
using LinearAlgebra: opnorm, Symmetric
using BenchmarkTools

fopnorm1(X) = opnorm(X)
fopnorm2(X) = sqrt(opnorm(X'X))
fopnorm3(X) = sqrt(opnorm(Symmetric(X'X)))

dimensions = (10000, 10)

b = @benchmark fopnorm1(X) setup=(X=randn(dimensions))
display(b)
b = @benchmark fopnorm2(X) setup=(X=randn(dimensions))
display(b)
b = @benchmark fopnorm3(X) setup=(X=randn(dimensions))
display(b)
BenchmarkTools.Trial: 6522 samples with 1 evaluation per sample.
 Range (min … max):  391.800 μs …   5.152 ms  ┊ GC (min … max): 0.00% … 84.40%
 Time  (median):     463.000 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   503.166 μs ± 153.189 μs  ┊ GC (mean ± σ):  4.63% ± 10.26%

   ▄▆▇██▇▇▆▄▃▂▂▁▁▁ ▁▂▂▁▁  ▁                                     ▂
  ▇█████████████████████████▇▇▇▆▅▆▆▅▄▆▃▅▃▅▄▄▄▁▃▄▄▅▅▆▇▇█▇▇▇██▇██ █
  392 μs        Histogram: log(frequency) by time       1.14 ms <

 Memory estimate: 787.58 KiB, allocs estimate: 13.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  82.300 μs … 483.100 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     97.400 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   99.002 μs ±  16.227 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▄▅▆▇▆▆▅▅▅▄▆▅█▅▆▅▃▃▂▂▁ ▁                                     ▂
  ███████████████████████████▆▇▇██▆█▅▇▇▇▇▆▇▇▇▇▇▆▆▇█▇█▇▆▇▇▇▅▅▄▅ █
  82.3 μs       Histogram: log(frequency) by time       163 μs <

 Memory estimate: 8.09 KiB, allocs estimate: 14.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  79.000 μs … 376.400 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     95.700 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   95.234 μs ±  12.877 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                █
  ▂▂▃▄▂▄▃▃▄▃▃▂▆▃█▅▄▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  79 μs           Histogram: frequency by time          149 μs <

 Memory estimate: 6.08 KiB, allocs estimate: 19.
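
As a quick sanity check (separate from the benchmarks), the three functions agree up to floating-point roundoff:

X = randn(dimensions)
fopnorm1(X) ≈ fopnorm2(X) ≈ fopnorm3(X)   # chained isapprox; should return true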

This trick also works for square matrices, though possibly with higher memory requirements.

dimensions = (100, 100)

b = @benchmark fopnorm1(X) setup=(X=randn(dimensions))
display(b)
b = @benchmark fopnorm2(X) setup=(X=randn(dimensions))
display(b)
b = @benchmark fopnorm3(X) setup=(X=randn(dimensions))
display(b)
BenchmarkTools.Trial: 7594 samples with 1 evaluation per sample.
 Range (min … max):  531.600 μs …   5.870 ms  ┊ GC (min … max): 0.00% … 88.93%
 Time  (median):     597.600 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   623.509 μs ± 133.170 μs  ┊ GC (mean ± σ):  1.44% ±  5.50%

     ▂█▇▃
  ▂▂▄█████▇▅▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▂▁▁▂▁▁▂▂▂▁▁▂▂▂▂▂ ▃
  532 μs           Histogram: frequency by time         1.21 ms <

 Memory estimate: 137.97 KiB, allocs estimate: 14.
BenchmarkTools.Trial: 6847 samples with 1 evaluation per sample.
 Range (min … max):  574.900 μs …   8.099 ms  ┊ GC (min … max): 0.00% … 91.36%
 Time  (median):     669.000 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   695.524 μs ± 163.002 μs  ┊ GC (mean ± σ):  2.15% ±  6.82%

     ▃▄▆█▇▄
  ▂▃████████▆▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  575 μs           Histogram: frequency by time         1.41 ms <

 Memory estimate: 216.17 KiB, allocs estimate: 17.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  308.100 μs …   5.089 ms  ┊ GC (min … max): 0.00% … 90.75%
 Time  (median):     366.400 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   383.650 μs ± 123.907 μs  ┊ GC (mean ± σ):  2.79% ±  7.16%

    ▃▄█▄
  ▄▇████▇▆▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▂▁▁▂▂▂▂▂▂▂▂ ▃
  308 μs           Histogram: frequency by time          1.1 ms <

 Memory estimate: 194.64 KiB, allocs estimate: 24.

More testing might be needed to see whether this performance boost carries over to other array types, like sparse matrices, and other element types, like complex numbers. But it may be worth specializing opnorm if the speedup can be achieved reliably.
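
For concreteness, a rough sketch of what such a specialization could look like (fast_opnorm2 is a hypothetical name, not an existing LinearAlgebra method; it simply picks whichever Gram matrix is smaller):

using LinearAlgebra: opnorm, Hermitian

# Hypothetical sketch: form the smaller of the two Gram matrices and take
# the square root of its operator norm. Hermitian covers complex element
# types as well as real ones.
function fast_opnorm2(X::AbstractMatrix)
    m, n = size(X)
    G = m >= n ? Hermitian(X'X) : Hermitian(X*X')
    return sqrt(opnorm(G))
end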

@ajinkya-k
Contributor

ajinkya-k commented Jan 29, 2025

This would only be true for opnorm(X, 2), right? The opnorm function also supports other values of p, namely p=1 and p=Inf.

@njericha
Author

> This would only be true for opnorm(X, 2), right? The opnorm function also supports other values of p, namely p=1 and p=Inf.

Yes. The opnorms for other values of p are arguably easier to compute: https://en.wikipedia.org/wiki/Operator_norm#Table_of_common_operator_norms
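
For example, p=1 is the maximum absolute column sum and p=Inf the maximum absolute row sum, which can be checked directly:

using LinearAlgebra: opnorm

A = randn(100, 50)
opnorm(A, 1)   ≈ maximum(sum(abs, A; dims=1))   # max absolute column sum
opnorm(A, Inf) ≈ maximum(sum(abs, A; dims=2))   # max absolute row sum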

@dkarrasch
Member

One potential concern could be accuracy. Each additional step comes with its own condition number and adds to the overall error accumulation. What comes to mind is solving tall linear systems $Ax = b$: you could solve them via the normal equations $A^\top A x = A^\top b$, but we don't, because the product $A^\top A$ may be poorly conditioned, and the final result would be worse than solving with QR. I'm not sure this transfers directly, but just a thought.
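
To make the concern concrete, a small sketch of that least-squares effect (an artificially ill-conditioned matrix with cond(A) ≈ 1e8; illustrative only, not a claim about opnorm itself):

using LinearAlgebra: qr, Diagonal, norm

# Artificial tall matrix with condition number ≈ 1e8.
U = Matrix(qr(randn(100, 10)).Q)    # orthonormal columns
V = Matrix(qr(randn(10, 10)).Q)     # random orthogonal matrix
A = U * Diagonal(exp10.(range(-8, 0, length=10))) * V'

x = randn(10)
b = A * x
norm(A \ b - x)           # QR-based solve: error on the order of cond(A) * eps
norm((A'A) \ (A'b) - x)   # normal equations: error on the order of cond(A)^2 * eps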

@andreasnoack
Member

andreasnoack commented Jan 30, 2025

I think the extra error is less of a concern here. The use of A'A ruins the smaller singular values but doesn't affect the larger values much.

julia> Q, _ = qr(randn(100,100));

julia> B = Q*Diagonal(exp10.(range(-200, stop=0, length = size(Q, 1))))*Q';

julia> sqrt(opnorm(B'B))
0.9999999999999997

I've tried a few things and couldn't make this version break down. In contrast, this approach would be fatal for cond:

julia> sqrt(cond(B'B))
1.259066867835988e10

@ajinkya-k
Contributor

I ran this:

using ProgressMeter: @showprogress

lopn = zeros(0)   # results from opnorm(X)
lopg = zeros(0)   # results from the Gram-matrix version

@showprogress for p in 100:100:10000
    dimensions = (10000, p)
    X = randn(dimensions)
    append!(lopn, opnorm(X))
    append!(lopg, fopnorm3(X))
end

and this is what I got for the absolute difference between the two (I cut off the loop around the 45th iteration because it was taking too long):

[plot: absolute difference between opnorm(X) and fopnorm3(X) across p]

@stevengj
Member

stevengj commented Jan 30, 2025

If you want to run this kind of empirical test, I would use X = randn(Float32, dimensions) to do the calculation in single precision, and compare to opnorm(Float64.(X)) as the "exact" values. (But the problem with testing on randn matrices is that they are all well-conditioned, so they aren't representative of cases where $A^T A$ is potentially problematic.)
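
A minimal sketch of that comparison, reusing fopnorm1 and fopnorm3 from above:

X32 = randn(Float32, 10000, 10)
exact = opnorm(Float64.(X32))        # double-precision reference ("exact" value)
abs(fopnorm1(X32) - exact) / exact   # relative error of plain opnorm in Float32
abs(fopnorm3(X32) - exact) / exact   # relative error of the Gram-matrix version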

@ajinkya-k
Contributor

ajinkya-k commented Jan 30, 2025

@stevengj Thanks for the insight! I'll keep that in mind. What's a good library for generating ill-conditioned matrices? The only one I know is https://github.com/JuliaLinearAlgebra/MatrixDepot.jl

@stevengj
Member

stevengj commented Jan 30, 2025

I've used this code in the past:

using LinearAlgebra: qr, diag, Diagonal

# generate a random (Haar-uniform) m×n real orthogonal-column matrix
# using algorithm adapted from https://arxiv.org/abs/math-ph/0609050
function randQ(m::Integer, n::Integer)
    m ≥ n || throw(ArgumentError("matrix must be tall"))
    QR = qr(randn(m, n))
    return QR.Q * Diagonal(sign.(diag(QR.R)))
end

# random m×n matrix with condition number κ and power-law singular values.
function randcond(m::Integer, n::Integer, κ::Real)
    κ ≥ 1 || throw(ArgumentError("κ=$κ should be ≥ 1"))
    r = min(m, n)
    σ = exp.(range(0, -log(κ), length=r))
    U = randQ(m, r)
    V = randQ(n, r)
    return U * Diagonal(σ) * V'
end
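
One could, for instance, plug these into the single-precision comparison above (the sizes and κ here are arbitrary; Symmetric and opnorm as imported earlier in the thread):

A32 = Float32.(randcond(1000, 50, 1e6))   # cond(A) ≈ 1e6, so cond(A'A) ≈ 1e12 ≫ 1/eps(Float32)
exact = opnorm(Float64.(A32))             # double-precision reference
abs(opnorm(A32) - exact) / exact                        # direct opnorm in Float32
abs(sqrt(opnorm(Symmetric(A32'A32))) - exact) / exact   # Gram-matrix trick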
