
Add Angular Central Gaussian family #8

@sethaxen

Definition

Suppose $z \sim \mathbb{F}\mathcal{N}_{n,k}(0, c \Sigma, I_k)$ is a matrix-normally distributed parameter in $\mathbb{F}^{n \times k}$, where $\Sigma$ is a positive-definite matrix in $\mathbb{F}^{n \times n}$ and $c>0$, and suppose $z=xp$ is the unique polar decomposition of $z$, where $x \in \mathrm{St}(n, k, \mathbb{F})$ and $p$ is a positive-definite matrix in $\mathbb{F}^{k \times k}$. Then $x$ follows the angular central Gaussian (ACG) distribution on $\mathrm{St}(n, k, \mathbb{F})$.

ACG was originally defined on the real sphere (and projective space) in [1] and generalized to the complex sphere in [2] and to the real Stiefel/Grassmann manifolds in [3]. I believe [4] were the first to generalize it to the complex Grassmann manifold. The generalization to quaternionic spaces here is, I think, new, but it follows naturally from the same derivations.

The ACG family has the following density with respect to the normalized invariant measure on the Grassmann and Stiefel manifolds [5]:

$$p(x; \Sigma) = c_{k,\mathbb{F}}(\Sigma)^{-1} |x^\mathrm{H}\Sigma^{-1} x|^{-n\mathrm{dim}_\mathbb{F}/2}$$

Note that for any $w \in \mathrm{U}(k, \mathbb{F})$,

$$p(xw; \Sigma) = p(x; \Sigma),$$

so the distribution is invariant to right unitary transformations and is thus a distribution on the Grassmann manifold.
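As a sanity check, the density (real case, $\mathrm{dim}_\mathbb{F} = 1$) and its right-unitary invariance can be sketched numerically; `acg_logpdf` is an illustrative name, not an existing API:

```python
import numpy as np

def acg_logpdf(x, sigma):
    """Real-case ACG log-density on St(n, k, R) wrt the normalized
    invariant measure; log c(Sigma) = (k/2) log|Sigma|."""
    n, k = x.shape
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_inner = np.linalg.slogdet(x.T @ np.linalg.solve(sigma, x))
    return -0.5 * k * logdet_sigma - 0.5 * n * logdet_inner

rng = np.random.default_rng(0)
n, k = 5, 2
a = rng.standard_normal((n, n))
sigma = a @ a.T + n * np.eye(n)          # positive-definite scale matrix
x, _ = np.linalg.qr(rng.standard_normal((n, k)))   # a point on St(n, k, R)

# right-unitary (here: orthogonal) invariance, p(xw) = p(x)
w, _ = np.linalg.qr(rng.standard_normal((k, k)))
print(np.isclose(acg_logpdf(x @ w, sigma), acg_logpdf(x, sigma)))  # True
```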

Note

Representation of points on the Grassmann manifold
Points on the Grassmann manifold can also be expressed as a unique projector matrix $y=xx'$, and in this representation, the density wrt the normalized invariant measure on the Grassmann manifold is written

$$p(y; \Sigma) = c_{k,\mathbb{F}}(\Sigma)^{-1} |y \Sigma^{-1} y|_+^{-n\mathrm{dim}_\mathbb{F}/2},$$

where $|\cdot|_+$ is the pseudodeterminant.
I haven't seen this covered in any of the papers, and for $n \gg k$ this is much less efficient to work with than the representation using Stiefel points.
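The equivalence of the two representations follows from $y \Sigma^{-1} y = x (x' \Sigma^{-1} x) x'$, whose nonzero eigenvalues are exactly those of $x' \Sigma^{-1} x$. A quick numerical check in the real case (the helper `pdet` is illustrative):

```python
import numpy as np

def pdet(m, tol=1e-10):
    # pseudodeterminant: product of the nonzero eigenvalues (symmetric input)
    w = np.linalg.eigvalsh(m)
    return np.prod(w[np.abs(w) > tol])

rng = np.random.default_rng(1)
n, k = 6, 2
a = rng.standard_normal((n, n))
sigma_inv = np.linalg.inv(a @ a.T + n * np.eye(n))
x, _ = np.linalg.qr(rng.standard_normal((n, k)))
y = x @ x.T                              # projector representing span(x)

# |y Sigma^{-1} y|_+ equals |x' Sigma^{-1} x|
print(np.isclose(pdet(y @ sigma_inv @ y),
                 np.linalg.det(x.T @ sigma_inv @ x)))  # True
```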

ACG is mainly useful when one requires:

  1. efficient exact sampling OR
  2. efficient evaluation of the normalization constant (e.g. hierarchical Bayesian models where $\Sigma$ is also inferred)
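Point 1 is concrete: by the definition above, exact sampling needs only a matrix-normal draw and a polar decomposition. A real-case sketch (the name `sample_acg` is illustrative):

```python
import numpy as np

def sample_acg(sigma, k, rng):
    """Draw from ACG(Sigma) on St(n, k, R): take z ~ MN(0, Sigma, I_k),
    then keep the orthogonal factor of the polar decomposition z = x p."""
    n = sigma.shape[0]
    l = np.linalg.cholesky(sigma)
    z = l @ rng.standard_normal((n, k))      # matrix-normal, row covariance Sigma
    u, _, vt = np.linalg.svd(z, full_matrices=False)
    return u @ vt                            # polar factor x in St(n, k, R)

rng = np.random.default_rng(2)
n, k = 5, 2
a = rng.standard_normal((n, n))
sigma = a @ a.T + np.eye(n)
x = sample_acg(sigma, k, rng)
print(np.allclose(x.T @ x, np.eye(k)))       # True: orthonormal columns
```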

Parameterizations

Scale

The scale parameterization (as used above) uses a positive-definite scale matrix $\Sigma$.
Note that $\Sigma$ is only unique up to a positive scalar; that is, $p(x; a\Sigma) = p(x; \Sigma)$ for all $a > 0$.
Therefore, the useful information in $\Sigma$ is its principal axes and their relative scales.
Furthermore, when $\Sigma=aI_n$, ACG reduces to the normalized invariant measure on the manifold.

$\Sigma$ is interesting in its own right. In some domains it is called the scattering matrix or Tyler's transformation matrix.
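The scale ambiguity is easy to verify numerically: in the real case the scalar $a$ cancels between $|a\Sigma|^{k/2}$ and $|x'(a\Sigma)^{-1}x|^{-n/2}$. A sketch, with a hypothetical `acg_logpdf`:

```python
import numpy as np

def acg_logpdf(x, sigma):
    # real-case ACG log-density; illustrative sketch, not an existing API
    n, k = x.shape
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_inner = np.linalg.slogdet(x.T @ np.linalg.solve(sigma, x))
    return -0.5 * k * logdet_sigma - 0.5 * n * logdet_inner

rng = np.random.default_rng(3)
n, k = 4, 2
b = rng.standard_normal((n, n))
sigma = b @ b.T + np.eye(n)
x, _ = np.linalg.qr(rng.standard_normal((n, k)))

# p(x; a*Sigma) = p(x; Sigma) for any a > 0
print(np.isclose(acg_logpdf(x, 3.7 * sigma), acg_logpdf(x, sigma)))  # True
```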

Precision

The precision parameterization $P=\Sigma^{-1}$ is more efficient for density evaluation because one can avoid the explicit matrix inversion.
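For instance (real case; `acg_logpdf_precision` is an illustrative name), the precision form needs only products with $P$:

```python
import numpy as np

def acg_logpdf_precision(x, p):
    # real-case ACG log-density in the precision parameterization P = Sigma^{-1};
    # note that no matrix inversion is required
    n, k = x.shape
    _, logdet_p = np.linalg.slogdet(p)
    _, logdet_inner = np.linalg.slogdet(x.T @ p @ x)
    return 0.5 * k * logdet_p - 0.5 * n * logdet_inner

rng = np.random.default_rng(4)
n, k = 5, 2
b = rng.standard_normal((n, n))
sigma = b @ b.T + np.eye(n)
x, _ = np.linalg.qr(rng.standard_normal((n, k)))

# agrees with the scale parameterization evaluated the long way
via_scale = (-0.5 * k * np.linalg.slogdet(sigma)[1]
             - 0.5 * n * np.linalg.slogdet(x.T @ np.linalg.inv(sigma) @ x)[1])
print(np.isclose(acg_logpdf_precision(x, np.linalg.inv(sigma)), via_scale))  # True
```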

Properties

Closure

For any $A \in \mathrm{U}(n, \mathbb{F})$, we have that if $x \sim \mathrm{ACG}(\Sigma)$, then $Ax \sim \mathrm{ACG}(A \Sigma A^\mathrm{H})$, so the family is closed under left-unitary actions.
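This closure property can be checked at the density level: since $(Ax)^\mathrm{H} (A\Sigma A^\mathrm{H})^{-1} (Ax) = x^\mathrm{H} \Sigma^{-1} x$ and $|A\Sigma A^\mathrm{H}| = |\Sigma|$, the transformed density matches. A real-case sketch:

```python
import numpy as np

def acg_logpdf(x, sigma):
    # real-case ACG log-density, used here to check the closure property
    n, k = x.shape
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_inner = np.linalg.slogdet(x.T @ np.linalg.solve(sigma, x))
    return -0.5 * k * logdet_sigma - 0.5 * n * logdet_inner

rng = np.random.default_rng(5)
n, k = 5, 2
b = rng.standard_normal((n, n))
sigma = b @ b.T + np.eye(n)
x, _ = np.linalg.qr(rng.standard_normal((n, k)))
A, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal: A in U(n, R)

# density of the pushed-forward point under the transformed scale matches
print(np.isclose(acg_logpdf(A @ x, A @ sigma @ A.T), acg_logpdf(x, sigma)))  # True
```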

Normalization constant

The normalization constant in the two parameterizations is

  • $c_{k,\mathbb{F}}(\Sigma) = |\Sigma|^{k\mathrm{dim}_\mathbb{F}/2}$
  • $c_{k,\mathbb{F}}(P) = |P|^{-k\mathrm{dim}_\mathbb{F}/2}$
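These constants can be spot-checked by quadrature in the simplest real case, $n = 2$, $k = 1$, where $\mathrm{St}(2, 1, \mathbb{R})$ is the circle $S^1$. An illustrative sketch, not library code:

```python
import numpy as np

n, k = 2, 1
sigma = np.diag([2.0, 5.0])
c = np.linalg.det(sigma) ** (k / 2)      # claimed normalization |Sigma|^{k/2}
sigma_inv = np.linalg.inv(sigma)

# uniform grid on the circle (endpoint excluded so the mean is a periodic
# trapezoid rule, which is spectrally accurate here)
theta = np.linspace(0.0, 2 * np.pi, 20000, endpoint=False)
xs = np.stack([np.cos(theta), np.sin(theta)])
quad_form = np.einsum("it,ij,jt->t", xs, sigma_inv, xs)
density = quad_form ** (-n / 2) / c

# mean wrt the normalized uniform measure on S^1 should be 1
avg = density.mean()
print(np.isclose(avg, 1.0))  # True
```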

Mode

Let $\Sigma^{-1} = U D U'$ be an eigendecomposition with eigenvalues ordered so that $0 < d_1 \le d_2 \le \ldots \le d_n$.
Then using the Cauchy interlacing theorem (or its generalization to Hermitian quaternion matrices [6]), one can show that the density is globally maximized when $x = UI_{n,k}$.
If the first $k+1$ eigenvalues are distinct, then this maximizer is unique on the Grassmann manifold.
On the Stiefel manifold, ACG is of course not unimodal.
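The mode formula can be checked against random Stiefel points (real case, unnormalized density; parameterized directly by $\Sigma^{-1}$ for convenience):

```python
import numpy as np

def acg_logpdf_unnorm(x, sigma_inv):
    # unnormalized real-case ACG log-density in terms of Sigma^{-1}
    n, k = x.shape
    _, logdet_inner = np.linalg.slogdet(x.T @ sigma_inv @ x)
    return -0.5 * n * logdet_inner

rng = np.random.default_rng(6)
n, k = 6, 2
b = rng.standard_normal((n, n))
sigma_inv = b @ b.T + np.eye(n)
d, u = np.linalg.eigh(sigma_inv)         # eigenvalues ascending: d_1 <= ... <= d_n
mode = u[:, :k]                          # x = U I_{n,k}: smallest-d_i eigenvectors

# the mode attains at least the density of any other Stiefel point
trials = [np.linalg.qr(rng.standard_normal((n, k)))[0] for _ in range(100)]
print(all(acg_logpdf_unnorm(mode, sigma_inv) >= acg_logpdf_unnorm(x, sigma_inv)
          for x in trials))  # True
```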

Moments

Empirically, the Riemannian mean on $\mathrm{Gr}(n, k, \mathbb{F})$ is the mode and is unique under the same conditions that the mode is unique.
On $\mathrm{St}(n, k, \mathbb{F})$ the minimizer of the Riemannian variance is obviously never unique, but empirically it's not clear what the minimizers correspond to.
On $\mathbb{F}\mathbb{S}^n$ the minimizer is never unique, but the set of minimizers is $\{z u_i \mid d_i = d_n, z \in \mathrm{U}(1, \mathbb{F}) \}$.

Of course, this is just empirical. I haven't seen any works derive the Riemannian mean and was not successful in deriving it myself.

Median

Unknown

Fitting

MLE

It suffices to maximize the likelihood on the Grassmann manifold.
Given an IID sample $x_1, \ldots, x_m \sim \mathrm{ACG}(\Sigma)$, the MLE of $\Sigma$ is the solution to the equation

$$\hat{\Sigma} = \frac{n}{km} \sum_{i=1}^m x_i (x_i' \hat{\Sigma}^{-1}x_i)^{-1} x_i',$$

which can be solved with an expectation maximization approach:

$$\begin{aligned} (\hat{v}_i)_t &= \frac{n}{k} (x_i' \hat{\Sigma}_t^{-1} x_i)^{-1}\\ \hat{\Sigma}_{t+1} &= \frac{1}{m} \sum_{i=1}^m x_i (\hat{v}_i)_t x_i', \qquad t = 1, 2, \ldots \end{aligned}$$
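A minimal real-case sketch of this fixed-point/EM iteration, with the scale ambiguity removed by a unit-trace normalization (a convention chosen here for the sketch, not mandated by the references):

```python
import numpy as np

def fit_acg_em(xs, n_iter=200):
    """EM / fixed-point iteration for the real-case ACG MLE of Sigma,
    normalized to unit trace to remove the overall scale ambiguity."""
    m = len(xs)
    n, k = xs[0].shape
    sigma = np.eye(n)
    for _ in range(n_iter):
        sigma_inv = np.linalg.inv(sigma)
        acc = np.zeros((n, n))
        for x in xs:
            v = (n / k) * np.linalg.inv(x.T @ sigma_inv @ x)  # per-sample weight
            acc += x @ v @ x.T
        sigma = acc / m
        sigma /= np.trace(sigma)         # fix the arbitrary overall scale
    return sigma

rng = np.random.default_rng(7)
n, k, m = 4, 2, 500
b = rng.standard_normal((n, n))
l = np.linalg.cholesky(b @ b.T + np.eye(n))
xs = []
for _ in range(m):                       # exact ACG samples via polar factors
    u, _, vt = np.linalg.svd(l @ rng.standard_normal((n, k)), full_matrices=False)
    xs.append(u @ vt)

sigma_hat = fit_acg_em(xs)
# the estimate should (approximately) satisfy the fixed-point equation
sigma_inv = np.linalg.inv(sigma_hat)
rhs = sum(x @ ((n / k) * np.linalg.inv(x.T @ sigma_inv @ x)) @ x.T for x in xs) / m
rhs /= np.trace(rhs)
print(np.allclose(sigma_hat, rhs, atol=1e-4))
```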

For $\mathbb{RP}^{n-1}$, this has been proven to converge for almost all samples when $m > n$ [1].
[4] proved that the log-likelihood on the real and complex Grassmann manifolds is convex along geodesics of SPD matrices with unit determinant, and found that for almost all samples of size $m$, $\hat{\Sigma}$ is unique when $m > \frac{n^2}{k(n-k)}$. They proposed two algorithms for maximizing the likelihood, though their chosen metric and its geodesics don't really make sense to me. [7] proved convexity of the log-likelihood for the affine-invariant metric and gave two more algorithms.

The above algorithms should probably be benchmarked to compare performance and accuracy. Still, it's probably better to have an implementation that defaults to the expectation-maximization algorithm but allows the user to specify a Manopt optimization method, in which case optimization on `SPDFixedDeterminant` with the specified algorithm is performed.

References/Notes

Footnotes

  1. Tyler, David E. "Statistical Analysis for the Angular Central Gaussian Distribution on the Sphere." Biometrika 74, no. 3 (1987): 579–89. https://doi.org/10.2307/2336697

  2. Kent, John T. "Data analysis for shapes and images." Journal of statistical planning and inference 57.2 (1997): 181-193. https://doi.org/10.1016/S0378-3758(96)00043-2

  3. Chikuse, Yasuko. "The matrix angular central Gaussian distribution." Journal of Multivariate Analysis 33.2 (1990): 265-274. https://doi.org/10.1016/0047-259X(90)90050-R

  4. Auderset, Claude, Christian Mazza, and Ernst Ruh. "Grassmannian estimation." arXiv preprint arXiv:0809.3697 (2008). https://doi.org/10.48550/arXiv.0809.3697

  5. Because $\Sigma$ is unique only up to a scale factor, some references require it have unit determinant and therefore unit normalization constant.

  6. Tam, Tin-Yau. "Interlacing inequalities and Cartan subspaces of classical real simple Lie algebras." SIAM Journal on Matrix Analysis and Applications 21.2 (2000): 581-592. https://doi.org/10.1137/S0895479898343528

  7. Ciobotaru, Corina, and Christian Mazza. "Consistency and asymptotic normality of M-estimates of scatter on Grassmann manifolds." Journal of Multivariate Analysis 190 (2022): 104998. https://doi.org/10.1016/j.jmva.2022.104998
