
Add Angular Central Gaussian family #8

@sethaxen

Definition

Suppose $z \sim \mathbb{F}\mathcal{N}_{n,k}(0, c \Sigma, I_k)$ is a matrix-normally distributed parameter in $\mathbb{F}^{n \times k}$, where $\Sigma$ is a positive-definite matrix in $\mathbb{F}^{n \times n}$ and $c>0$, and suppose $z=xp$ is the unique polar decomposition of $z$, where $x \in \mathrm{St}(n, k, \mathbb{F})$ and $p$ is a positive-definite matrix in $\mathbb{F}^{k \times k}$. Then $x$ follows the angular central Gaussian (ACG) distribution on $\mathrm{St}(n, k, \mathbb{F})$.

ACG was originally defined on the real sphere (and projective space) in [1] and generalized to the complex sphere in [2] and to the real Stiefel/Grassmann manifolds in [3]. I believe [4] were the first to generalize it to the complex Grassmann manifold. The generalization to quaternionic spaces here is, I think, new, but it follows naturally from the same derivations.

The ACG family has the following density with respect to the normalized invariant measure on the Grassmann and Stiefel manifolds [5]:

$$p(x; \Sigma) = c_{k,\mathbb{F}}(\Sigma)^{-1} |x^\mathrm{H}\Sigma^{-1} x|^{-n\mathrm{dim}_\mathbb{F}/2}$$

Note that for any $w \in \mathrm{U}(k, \mathbb{F})$,

$$p(xw; \Sigma) = p(x; \Sigma),$$

so the distribution is invariant to right unitary transformations and is thus a distribution on the Grassmann manifold.
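As a sanity check, the density (real case, $\mathrm{dim}_\mathbb{F} = 1$) and its right-unitary invariance can be sketched numerically; `acg_logpdf` is an illustrative name, not an existing API:

```python
import numpy as np

def acg_logpdf(x, sigma):
    """Real-case ACG log-density on St(n, k, R) wrt the normalized
    invariant measure; log c(Sigma) = (k/2) log|Sigma|."""
    n, k = x.shape
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_inner = np.linalg.slogdet(x.T @ np.linalg.solve(sigma, x))
    return -0.5 * k * logdet_sigma - 0.5 * n * logdet_inner

rng = np.random.default_rng(0)
n, k = 5, 2
a = rng.standard_normal((n, n))
sigma = a @ a.T + n * np.eye(n)          # positive-definite scale matrix
x, _ = np.linalg.qr(rng.standard_normal((n, k)))   # a point on St(n, k, R)

# right-unitary (here: orthogonal) invariance, p(xw) = p(x)
w, _ = np.linalg.qr(rng.standard_normal((k, k)))
print(np.isclose(acg_logpdf(x @ w, sigma), acg_logpdf(x, sigma)))  # True
```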

Note

Representation of points on the Grassmann manifold
Points on the Grassmann manifold can also be expressed as a unique projector matrix $y=xx'$, and in this representation, the density wrt the normalized invariant measure on the Grassmann manifold is written

$$p(y; \Sigma) = c_{k,\mathbb{F}}(\Sigma)^{-1} |y \Sigma^{-1} y|_+^{-n\mathrm{dim}_\mathbb{F}/2},$$

where $|\cdot|_+$ is the pseudodeterminant.
I haven't seen this covered in any of the papers, and for $n \gg k$ this is much less efficient to work with than the representation using Stiefel points.
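The equivalence of the two representations follows from $y \Sigma^{-1} y = x (x' \Sigma^{-1} x) x'$, whose nonzero eigenvalues are exactly those of $x' \Sigma^{-1} x$. A quick numerical check in the real case (the helper `pdet` is illustrative):

```python
import numpy as np

def pdet(m, tol=1e-10):
    # pseudodeterminant: product of the nonzero eigenvalues (symmetric input)
    w = np.linalg.eigvalsh(m)
    return np.prod(w[np.abs(w) > tol])

rng = np.random.default_rng(1)
n, k = 6, 2
a = rng.standard_normal((n, n))
sigma_inv = np.linalg.inv(a @ a.T + n * np.eye(n))
x, _ = np.linalg.qr(rng.standard_normal((n, k)))
y = x @ x.T                              # projector representing span(x)

# |y Sigma^{-1} y|_+ equals |x' Sigma^{-1} x|
print(np.isclose(pdet(y @ sigma_inv @ y),
                 np.linalg.det(x.T @ sigma_inv @ x)))  # True
```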

ACG is mainly useful when one requires:

  1. efficient exact sampling OR
  2. efficient evaluation of the normalization constant (e.g. hierarchical Bayesian models where $\Sigma$ is also inferred)
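Point 1 is concrete: by the definition above, exact sampling needs only a matrix-normal draw and a polar decomposition. A real-case sketch (the name `sample_acg` is illustrative):

```python
import numpy as np

def sample_acg(sigma, k, rng):
    """Draw from ACG(Sigma) on St(n, k, R): take z ~ MN(0, Sigma, I_k),
    then keep the orthogonal factor of the polar decomposition z = x p."""
    n = sigma.shape[0]
    l = np.linalg.cholesky(sigma)
    z = l @ rng.standard_normal((n, k))      # matrix-normal, row covariance Sigma
    u, _, vt = np.linalg.svd(z, full_matrices=False)
    return u @ vt                            # polar factor x in St(n, k, R)

rng = np.random.default_rng(2)
n, k = 5, 2
a = rng.standard_normal((n, n))
sigma = a @ a.T + np.eye(n)
x = sample_acg(sigma, k, rng)
print(np.allclose(x.T @ x, np.eye(k)))       # True: orthonormal columns
```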

Parameterizations

Scale

The scale parameterization (as used above) uses a positive-definite scale matrix $\Sigma$.
Note that $\Sigma$ is only unique up to a positive scalar; that is, $p(x; a\Sigma) = p(x; \Sigma)$ for all $a > 0$.
Therefore, the useful information in $\Sigma$ is its principal axes and their relative scales.
Furthermore, when $\Sigma=aI_n$, ACG reduces to the normalized invariant measure on the manifold.

$\Sigma$ is interesting in its own right. In some domains it is called the scattering matrix or Tyler's transformation matrix.
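The scale ambiguity is easy to verify numerically: in the real case the scalar $a$ cancels between $|a\Sigma|^{k/2}$ and $|x'(a\Sigma)^{-1}x|^{-n/2}$. A sketch, with a hypothetical `acg_logpdf`:

```python
import numpy as np

def acg_logpdf(x, sigma):
    # real-case ACG log-density; illustrative sketch, not an existing API
    n, k = x.shape
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_inner = np.linalg.slogdet(x.T @ np.linalg.solve(sigma, x))
    return -0.5 * k * logdet_sigma - 0.5 * n * logdet_inner

rng = np.random.default_rng(3)
n, k = 4, 2
b = rng.standard_normal((n, n))
sigma = b @ b.T + np.eye(n)
x, _ = np.linalg.qr(rng.standard_normal((n, k)))

# p(x; a*Sigma) = p(x; Sigma) for any a > 0
print(np.isclose(acg_logpdf(x, 3.7 * sigma), acg_logpdf(x, sigma)))  # True
```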

Precision

The precision parameterization $P=\Sigma^{-1}$ is more efficient for density evaluation because one can avoid the explicit matrix inversion.
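For instance (real case; `acg_logpdf_precision` is an illustrative name), the precision form needs only products with $P$:

```python
import numpy as np

def acg_logpdf_precision(x, p):
    # real-case ACG log-density in the precision parameterization P = Sigma^{-1};
    # note that no matrix inversion is required
    n, k = x.shape
    _, logdet_p = np.linalg.slogdet(p)
    _, logdet_inner = np.linalg.slogdet(x.T @ p @ x)
    return 0.5 * k * logdet_p - 0.5 * n * logdet_inner

rng = np.random.default_rng(4)
n, k = 5, 2
b = rng.standard_normal((n, n))
sigma = b @ b.T + np.eye(n)
x, _ = np.linalg.qr(rng.standard_normal((n, k)))

# agrees with the scale parameterization evaluated the long way
via_scale = (-0.5 * k * np.linalg.slogdet(sigma)[1]
             - 0.5 * n * np.linalg.slogdet(x.T @ np.linalg.inv(sigma) @ x)[1])
print(np.isclose(acg_logpdf_precision(x, np.linalg.inv(sigma)), via_scale))  # True
```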

Properties

Closure

For any $A \in \mathrm{U}(n, \mathbb{F})$, we have that if $x \sim \mathrm{ACG}(\Sigma)$, then $Ax \sim \mathrm{ACG}(A \Sigma A^\mathrm{H})$, so the family is closed under left-unitary actions.
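This closure property can be checked at the density level: since $(Ax)^\mathrm{H} (A\Sigma A^\mathrm{H})^{-1} (Ax) = x^\mathrm{H} \Sigma^{-1} x$ and $|A\Sigma A^\mathrm{H}| = |\Sigma|$, the transformed density matches. A real-case sketch:

```python
import numpy as np

def acg_logpdf(x, sigma):
    # real-case ACG log-density, used here to check the closure property
    n, k = x.shape
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_inner = np.linalg.slogdet(x.T @ np.linalg.solve(sigma, x))
    return -0.5 * k * logdet_sigma - 0.5 * n * logdet_inner

rng = np.random.default_rng(5)
n, k = 5, 2
b = rng.standard_normal((n, n))
sigma = b @ b.T + np.eye(n)
x, _ = np.linalg.qr(rng.standard_normal((n, k)))
A, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal: A in U(n, R)

# density of the pushed-forward point under the transformed scale matches
print(np.isclose(acg_logpdf(A @ x, A @ sigma @ A.T), acg_logpdf(x, sigma)))  # True
```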

Normalization constant

The normalization constant in the two parameterizations is

  • $c_{k,\mathbb{F}}(\Sigma) = |\Sigma|^{k\mathrm{dim}_\mathbb{F}/2}$
  • $c_{k,\mathbb{F}}(P) = |P|^{-k\mathrm{dim}_\mathbb{F}/2}$
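These constants can be spot-checked by quadrature in the simplest real case, $n = 2$, $k = 1$, where $\mathrm{St}(2, 1, \mathbb{R})$ is the circle $S^1$. An illustrative sketch, not library code:

```python
import numpy as np

n, k = 2, 1
sigma = np.diag([2.0, 5.0])
c = np.linalg.det(sigma) ** (k / 2)      # claimed normalization |Sigma|^{k/2}
sigma_inv = np.linalg.inv(sigma)

# uniform grid on the circle (endpoint excluded so the mean is a periodic
# trapezoid rule, which is spectrally accurate here)
theta = np.linspace(0.0, 2 * np.pi, 20000, endpoint=False)
xs = np.stack([np.cos(theta), np.sin(theta)])
quad_form = np.einsum("it,ij,jt->t", xs, sigma_inv, xs)
density = quad_form ** (-n / 2) / c

# mean wrt the normalized uniform measure on S^1 should be 1
avg = density.mean()
print(np.isclose(avg, 1.0))  # True
```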

Mode

Let $\Sigma^{-1} = U D U'$ be an eigendecomposition with eigenvalues ordered so that $0 < d_1 \le d_2 \le \ldots \le d_n$.
Then using the Cauchy interlacing theorem (or its generalization to Hermitian quaternion matrices [6]), one can show that the density is globally maximized when $x = UI_{n,k}$.
If the first $k+1$ eigenvalues are distinct, then this maximizer is unique on the Grassmann manifold.
On the Stiefel manifold, ACG is of course not unimodal.
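The mode formula can be checked against random Stiefel points (real case, unnormalized density; parameterized directly by $\Sigma^{-1}$ for convenience):

```python
import numpy as np

def acg_logpdf_unnorm(x, sigma_inv):
    # unnormalized real-case ACG log-density in terms of Sigma^{-1}
    n, k = x.shape
    _, logdet_inner = np.linalg.slogdet(x.T @ sigma_inv @ x)
    return -0.5 * n * logdet_inner

rng = np.random.default_rng(6)
n, k = 6, 2
b = rng.standard_normal((n, n))
sigma_inv = b @ b.T + np.eye(n)
d, u = np.linalg.eigh(sigma_inv)         # eigenvalues ascending: d_1 <= ... <= d_n
mode = u[:, :k]                          # x = U I_{n,k}: smallest-d_i eigenvectors

# the mode attains at least the density of any other Stiefel point
trials = [np.linalg.qr(rng.standard_normal((n, k)))[0] for _ in range(100)]
print(all(acg_logpdf_unnorm(mode, sigma_inv) >= acg_logpdf_unnorm(x, sigma_inv)
          for x in trials))  # True
```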

Moments

Empirically, the Riemannian mean on $\mathrm{Gr}(n, k, \mathbb{F})$ is the mode and is unique under the same conditions that the mode is unique.
On $\mathrm{St}(n, k, \mathbb{F})$ the minimizer of the Riemannian variance is obviously never unique, but empirically it's not clear what the minimizers correspond to.
On $\mathbb{F}\mathbb{S}^n$ the minimizer is never unique, but the set of minimizers is $\{z u_i \mid d_i = d_n, z \in \mathrm{U}(1, \mathbb{F}) \}$.

Of course, this is just empirical. I haven't seen any works derive the Riemannian mean and was not successful in deriving it myself.

Median

Unknown

Fitting

MLE

It suffices to maximize the likelihood on the Grassmann manifold.
Given an IID sample $x_1, \ldots, x_m \sim \mathrm{ACG}(\Sigma)$, the MLE of $\Sigma$ is the solution to the equation

$$\hat{\Sigma} = \frac{n}{km} \sum_{i=1}^m x_i (x_i' \hat{\Sigma}^{-1}x_i)^{-1} x_i',$$

which can be solved with an expectation maximization approach:

$$\begin{aligned} (\hat{v}_i)_t &= \frac{n}{k} (x_i' \hat{\Sigma}_t^{-1} x_i)^{-1}\\ \hat{\Sigma}_{t+1} &= \frac{1}{m} \sum_{i=1}^m x_i (\hat{v}_i)_t x_i', \qquad t = 1, 2, \ldots \end{aligned}$$
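A minimal real-case sketch of this fixed-point/EM iteration, with the scale ambiguity removed by a unit-trace normalization (a convention chosen here for the sketch, not mandated by the references):

```python
import numpy as np

def fit_acg_em(xs, n_iter=200):
    """EM / fixed-point iteration for the real-case ACG MLE of Sigma,
    normalized to unit trace to remove the overall scale ambiguity."""
    m = len(xs)
    n, k = xs[0].shape
    sigma = np.eye(n)
    for _ in range(n_iter):
        sigma_inv = np.linalg.inv(sigma)
        acc = np.zeros((n, n))
        for x in xs:
            v = (n / k) * np.linalg.inv(x.T @ sigma_inv @ x)  # per-sample weight
            acc += x @ v @ x.T
        sigma = acc / m
        sigma /= np.trace(sigma)         # fix the arbitrary overall scale
    return sigma

rng = np.random.default_rng(7)
n, k, m = 4, 2, 500
b = rng.standard_normal((n, n))
l = np.linalg.cholesky(b @ b.T + np.eye(n))
xs = []
for _ in range(m):                       # exact ACG samples via polar factors
    u, _, vt = np.linalg.svd(l @ rng.standard_normal((n, k)), full_matrices=False)
    xs.append(u @ vt)

sigma_hat = fit_acg_em(xs)
# the estimate should (approximately) satisfy the fixed-point equation
sigma_inv = np.linalg.inv(sigma_hat)
rhs = sum(x @ ((n / k) * np.linalg.inv(x.T @ sigma_inv @ x)) @ x.T for x in xs) / m
rhs /= np.trace(rhs)
print(np.allclose(sigma_hat, rhs, atol=1e-4))
```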

For $\mathbb{RP}^{n-1}$, this has been proven to converge for almost all samples when $m > n$ [1].
[4] proved that the log-likelihood on the real and complex Grassmann manifolds is convex along geodesics of SPD matrices with unit determinant, and found that for almost all samples of size $m$, $\hat{\Sigma}$ is unique when $m > \frac{n^2}{k(n-k)}$. They proposed two algorithms for maximizing the likelihood, though their chosen metric and its geodesics don't really make sense to me. [7] proved convexity of the log-likelihood for the affine-invariant metric and gave two more algorithms.

The above algorithms should probably be benchmarked to compare performance and accuracy. Still, it's probably better to have an implementation that defaults to the expectation-maximization algorithm but allows the user to specify a Manopt optimization method, in which case optimization on `SPDFixedDeterminant` with the specified algorithm is performed.

References/Notes

Footnotes

  1. Tyler, David E. "Statistical Analysis for the Angular Central Gaussian Distribution on the Sphere." Biometrika 74, no. 3 (1987): 579–89. https://doi.org/10.2307/2336697

  2. Kent, John T. "Data analysis for shapes and images." Journal of statistical planning and inference 57.2 (1997): 181-193. https://doi.org/10.1016/S0378-3758(96)00043-2

  3. Chikuse, Yasuko. "The matrix angular central Gaussian distribution." Journal of Multivariate Analysis 33.2 (1990): 265-274. https://doi.org/10.1016/0047-259X(90)90050-R

  4. Auderset, Claude, Christian Mazza, and Ernst Ruh. "Grassmannian estimation." arXiv preprint arXiv:0809.3697 (2008). https://doi.org/10.48550/arXiv.0809.3697

  5. Because $\Sigma$ is unique only up to a scale factor, some references require it have unit determinant and therefore unit normalization constant.

  6. Tam, Tin-Yau. "Interlacing inequalities and Cartan subspaces of classical real simple Lie algebras." SIAM Journal on Matrix Analysis and Applications 21.2 (2000): 581-592. https://doi.org/10.1137/S0895479898343528

  7. Ciobotaru, Corina, and Christian Mazza. "Consistency and asymptotic normality of M-estimates of scatter on Grassmann manifolds." Journal of Multivariate Analysis 190 (2022): 104998. https://doi.org/10.1016/j.jmva.2022.104998
