Description
The von Mises-Fisher distribution on Stiefel(n, k, 𝔽)
and submanifolds is an exponential family distribution with natural parameters
The names usually given to this family on different submanifolds are:
- Circle: von Mises
- Sphere: von Mises-Fisher/Fisher/Langevin
- Stiefel: Matrix von Mises-Fisher/matrix Fisher/matrix Langevin
Note
Some authors[^1] define the "matrix Langevin" distribution on the Grassmann manifold as the distribution with the above density when the point is a projector matrix. However, upon changing the point to a subspace, one gets the Bingham distribution, so here we keep the distinction that vMF is for Stiefel and its submanifolds and Bingham is for Grassmann and its submanifolds.
Note
For
Parameterizations
Canonical
In the canonical parameterization, we have a single
This corresponds to the
This parameterization is most convenient if the parameter is fixed, and one just needs the unnormalized density.
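To make the "unnormalized density" use case concrete, here is a minimal sketch. It assumes the usual matrix-vMF form $\log p(X) = \Re\,\mathrm{tr}(F^\mathrm{H} X) + \text{const}$ with a single $n \times k$ parameter matrix $F$; the function name and the nested-list representation are purely illustrative and not part of any existing API.

```python
def vmf_logdensity_unnormalized(F, X):
    """Return Re tr(F^H X) for two n-by-k matrices given as nested lists.

    Assumption: the density is proportional to exp(Re tr(F^H X)).
    Entries may be real or complex numbers.
    """
    if len(F) != len(X) or any(len(f) != len(x) for f, x in zip(F, X)):
        raise ValueError("F and X must have the same shape")
    total = 0.0
    for f_row, x_row in zip(F, X):
        for f, x in zip(f_row, x_row):
            # tr(F^H X) is the sum over all entries of conj(F_ij) * X_ij
            total += (complex(f).conjugate() * complex(x)).real
    return total


# Hypothetical parameter and two points on Stiefel(3, 2, R)
F = [[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
X_aligned = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
X_swapped = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
print(vmf_logdensity_unnormalized(F, X_aligned))  # 3.0
print(vmf_logdensity_unnormalized(F, X_swapped))  # 0.0
```

No normalization constant is needed here, which is why this parameterization is the cheap one when the parameter is fixed.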
SVD
Let
It can be made unique by choice of sign/phase convention if all singular values are distinct and non-zero.
This parameterization is most convenient if one needs the normalization constant and/or mode.
For the Sphere and Circle,
For
Thus the sign of the determinant of
Polar/Mode-Concentration (not planned)
The polar decomposition is
It is unique under the same conditions as the SVD of
When the SVD is unique,
Note that
We could call either
This parameterization is most convenient if one needs the mode, but for Stiefel in general, it's really no more convenient than the SVD parameterization. Also, on submanifolds of the Stiefel,
In conclusion, I don't think we should include this parameterization except for the sphere/circle.
But it's useful for talking about some properties of vMF.
Properties
Closure
If
As an example, let
Then
Rotational Symmetry
Let
Normalization constants
The normalization constant
- Stiefel(n, k, 𝔽): $c_{n,k,\mathbb{F}}(D) = {}_0F_1^{(2/\mathrm{dim}_\mathbb{F})}(\frac{1}{2}n \mathrm{dim}_\mathbb{F}; \frac{1}{4} D^2)$ [^3]
- Sphere(n):
- Circle(): $c(\kappa) = c_{2,1,\mathbb{R}}(\kappa) = {}_0F_1(1; \frac{1}{4} \kappa^2) = I_{0}(\kappa)$
- $\mathrm{SO}(2), \mathrm{SU}(2)$: $c(\eta) = c_{2\mathrm{dim}_\mathbb{F}}(\sqrt{\lVert \eta\rVert^2 + 2\Re(\det(\eta))})$, where $c_n(\cdot)$ is the normalization constant of $\mathbb{S}^n$.
- $\mathrm{SO}(3)$: $c(D) = {}_1F_1^{(2)}(\frac{1}{2}; 2; \Lambda(D))$, where $\Lambda(D)$ is given in [^4]. This result is derived using the connection between vMF on $\mathrm{SO}(3)$ and the Bingham distribution on $\mathbb{S}^3$. (Note that because of the double cover they have an extra factor of $\frac{1}{2}$, but since we're writing densities wrt the normalized invariant measure here, this factor is eliminated in the normalization.)

where ${}_pF_q^{(\alpha)}$ is a (real, scalar-valued) hypergeometric function of matrix argument, $A_{n}$ is the surface area of $\mathbb{S}^{n}$, $C_n(\kappa)$ is defined in the Wikipedia article, and $I_\nu$ is the modified Bessel function of the first kind.
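As a sanity check on the Circle and $\mathrm{SO}(2)$ entries, the sketch below compares the series for $I_0(\kappa) = {}_0F_1(1; \frac{1}{4}\kappa^2)$ against direct quadrature over the normalized invariant measure on $\mathrm{SO}(2)$. It assumes the density is proportional to $\exp(\mathrm{tr}(\eta^\top X))$ and uses $\kappa = \sqrt{\lVert\eta\rVert^2 + 2\det(\eta)}$; the function names and the example $\eta$ are made up for illustration.

```python
import math


def bessel_i0_series(kappa, rel_tol=1e-16, max_terms=500):
    """I_0(kappa) = 0F1(1; kappa^2/4) = sum_m (kappa^2/4)^m / (m!)^2."""
    z = 0.25 * kappa * kappa
    term, total = 1.0, 1.0
    for m in range(1, max_terms):
        term *= z / (m * m)
        total += term
        if term < rel_tol * total:
            break
    return total


def so2_normalizer_quadrature(eta, num_points=2000):
    """Average exp(tr(eta^T R(theta))) over equally spaced rotation angles.

    Assumption: the unnormalized density is exp(tr(eta^T X)) and the base
    measure is the normalized invariant (Haar) measure on SO(2).
    """
    (a, b), (c, d) = eta
    acc = 0.0
    for j in range(num_points):
        theta = 2.0 * math.pi * j / num_points
        cs, sn = math.cos(theta), math.sin(theta)
        # R(theta) = [[cs, -sn], [sn, cs]], so tr(eta^T R) = (a + d) cs + (c - b) sn
        acc += math.exp((a + d) * cs + (c - b) * sn)
    return acc / num_points


eta = [[1.0, 0.3], [-0.5, 2.0]]  # hypothetical natural parameter
frob_sq = sum(v * v for row in eta for v in row)
det = eta[0][0] * eta[1][1] - eta[0][1] * eta[1][0]
kappa = math.sqrt(frob_sq + 2.0 * det)
print(bessel_i0_series(kappa), so2_normalizer_quadrature(eta))
```

The two numbers should agree to near machine precision, because the equally spaced rule converges very quickly for smooth periodic integrands.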
Note
In directional statistics, the density on the sphere is conventionally written wrt the Hausdorff measure (or Lebesgue measure on the circle), which is related to the normalized invariant measure by the constant factor
The standard algorithm for computing matrix-argument hypergeometric functions is described in [^5]. HypergeoMat.jl implements this algorithm. It works by computing a truncated series and becomes more expensive for large
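To illustrate the truncated-series idea in the simplest setting, here is a sketch for the scalar case only (a $1 \times 1$ matrix argument, where the matrix-argument function reduces to the ordinary ${}_0F_1$). This is not the Koev-Edelman algorithm and does not use HypergeoMat.jl; the function name is hypothetical. For unit vectors in $\mathbb{R}^3$ the series can be checked against the closed form ${}_0F_1(\frac{3}{2}; \frac{1}{4}\kappa^2) = \sinh(\kappa)/\kappa$.

```python
import math


def hyp0f1_truncated(b, z, rel_tol=1e-15, max_terms=10_000):
    """Return (value, terms_used) for 0F1(; b; z) = sum_m z^m / ((b)_m m!).

    The series is truncated once the latest term is negligible relative to
    the running sum.
    """
    term, total = 1.0, 1.0
    for m in range(max_terms):
        # ratio of consecutive terms: z / ((b + m) (m + 1))
        term *= z / ((b + m) * (m + 1))
        total += term
        if abs(term) <= rel_tol * abs(total):
            return total, m + 2
    raise ArithmeticError("series did not converge within max_terms")


for kappa in (0.5, 2.0, 20.0):
    value, terms = hyp0f1_truncated(1.5, 0.25 * kappa * kappa)
    print(kappa, terms, value, math.sinh(kappa) / kappa)
```

The number of terms needed grows with the argument, which is the scalar version of the cost growth mentioned above.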
I haven't yet found expressions for the normalization constant for other special cases of
Mode
Wrt the invariant/Hausdorff measures, the mode in the SVD representation is
Moments
For the sphere/circle, the Riemannian mean is known to be the mode
The intrinsic (Riemannian) variance and higher moments reduce to 1-dimensional integrals but in general have no closed-form solution.
Using well-known properties of the exponential family, the extrinsic mean and covariance can be obtained through differentiation of the logarithm of the normalization constant, which we can do here using autodiff. For the sphere, the extrinsic statistics are well-studied; however, Manifolds currently only has an API for estimating extrinsic statistics, not for computing them exactly, so for now, these are excluded.
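As a small illustration of the exponential-family identity on the circle, the sketch below differentiates $\log c(\kappa) = \log I_0(\kappa)$ and compares the result with $\mathbb{E}[\cos\theta]$ computed by quadrature. It assumes the density is proportional to $\exp(\kappa\cos\theta)$ and uses a central finite difference as a stand-in for autodiff; all names are illustrative.

```python
import math


def log_normalizer_circle(kappa, max_terms=500):
    """log I_0(kappa), with I_0 evaluated by its power series."""
    z = 0.25 * kappa * kappa
    term, total = 1.0, 1.0
    for m in range(1, max_terms):
        term *= z / (m * m)
        total += term
        if term < 1e-17 * total:
            break
    return math.log(total)


def extrinsic_mean_via_derivative(kappa, h=1e-5):
    """d/dkappa log c(kappa) by central differences (stand-in for autodiff)."""
    return (log_normalizer_circle(kappa + h) - log_normalizer_circle(kappa - h)) / (2.0 * h)


def extrinsic_mean_via_quadrature(kappa, num_points=2000):
    """E[cos(theta)] under a density proportional to exp(kappa cos(theta))."""
    num = den = 0.0
    for j in range(num_points):
        theta = 2.0 * math.pi * j / num_points
        w = math.exp(kappa * math.cos(theta))
        num += math.cos(theta) * w
        den += w
    return num / den


print(extrinsic_mean_via_derivative(2.0), extrinsic_mean_via_quadrature(2.0))
```

Both values are the length of the extrinsic mean vector along the mode direction; a second derivative would give the corresponding variance in the same way.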
Median
I haven't seen this given anywhere, though for the sphere/circle it stands to reason it's the same as the Riemannian mean when
Fitting
Maximum-likelihood estimates
Given
The key obstacle in MLE for vMF is the (in)tractability of
Stiefel
Let
Then
The MLE of
For small
Empirically, an even better approximation for small
which of course has the same first-order approximation.
In both cases, we get
In general the MLE
However, some authors have proposed other approximations for the normalization constant that could be used in an objective to obtain approximate MLE estimates.
Sphere
Note that for the sphere and the complex circle,
Here we're helped by properties of the derivative of the hypergeometric function, namely
so
So the MLE
[^6] gives a nice review of the different techniques that have been employed.
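For reference, here is a sketch of one such technique: solving the standard likelihood equation $A_n(\hat\kappa) = \bar{R}$ by bisection, where $\bar{R}$ is the norm of the sample mean and $A_n(\kappa) = \frac{\mathrm{d}}{\mathrm{d}\kappa}\log {}_0F_1(\frac{n}{2}; \frac{1}{4}\kappa^2) = \frac{\kappa}{n}\,{}_0F_1(\frac{n}{2}+1; \frac{1}{4}\kappa^2) / {}_0F_1(\frac{n}{2}; \frac{1}{4}\kappa^2)$. It assumes unit vectors in $\mathbb{R}^n$ and moderate concentrations (the plain series overflows for very large $\kappa$); the function names are hypothetical and I make no claim that this is the method any particular package uses.

```python
import math


def hyp0f1(b, z, rel_tol=1e-16, max_terms=10_000):
    """Scalar 0F1(; b; z) by its power series (fine for moderate z >= 0)."""
    term, total = 1.0, 1.0
    for m in range(max_terms):
        term *= z / ((b + m) * (m + 1))
        total += term
        if abs(term) <= rel_tol * abs(total):
            break
    return total


def mean_resultant_length(kappa, n):
    """A_n(kappa) = d/dkappa log 0F1(n/2; kappa^2/4) for unit vectors in R^n."""
    z = 0.25 * kappa * kappa
    return (kappa / n) * hyp0f1(0.5 * n + 1.0, z) / hyp0f1(0.5 * n, z)


def fit_kappa(rbar, n, kappa_max=500.0, iters=200):
    """Solve A_n(kappa) = rbar by bisection; A_n is increasing in kappa."""
    if not 0.0 <= rbar < 1.0:
        raise ValueError("rbar must lie in [0, 1)")
    lo, hi = 0.0, 1.0
    while mean_resultant_length(hi, n) < rbar:
        hi *= 2.0
        if hi > kappa_max:
            raise ArithmeticError("kappa exceeds the range this sketch supports")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_resultant_length(mid, n) < rbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_vmf_sphere(points):
    """Return (mu_hat, kappa_hat) from a list of unit vectors."""
    n = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(n)]
    rbar = math.sqrt(sum(m * m for m in mean))
    return [m / rbar for m in mean], fit_kappa(rbar, n)
```

Bisection is slower than Newton-type iterations but needs no derivative of $A_n$ and cannot overshoot, which keeps the sketch short.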
Special Orthogonal
[^7] described how to modify the MLE on the Stiefel manifold to obtain the MLE on
Here the challenge posed by the intractability of the normalization constant is even worse, and they propose an approach using holonomic gradient descent.
Random generation
Algorithms for exact random generation have been worked out for some of the submanifolds:
- Circle: [^8]
- Sphere: [^9]
- Stiefel(n, k, ℝ): The best option seems to be the rejection sampling method of [^10], using a proposal derived from [^9] for von Mises-Fisher on the sphere. It's worth investigating whether Hoff's algorithm straightforwardly generalizes to complex Stiefel.
- $\mathrm{SO}(2)$: This is equivalent to the von Mises distribution on the circle, so the same algorithm can be used.
- $\mathrm{SO}(3)$/$\mathrm{SU}(2)$: Using the connections between $\mathbb{S}^3$ and $\mathrm{SO}(3)$, one finds that a Bingham random variable on $\mathbb{S}^3$ maps to a vMF random variable on $\mathrm{SO}(3)$. Thus one can use a Bingham sampler on $\mathbb{S}^3$ and map to $\mathrm{SO}(3)$.
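For the Sphere entry, the rejection sampler of Wood (1994) can be sketched roughly as follows. It draws the component $w$ along the axis by rejection from a transformed Beta proposal, attaches a uniformly random tangential direction, and then reflects the last coordinate axis onto the mean direction. The function name and interface are hypothetical, and the sketch is written for clarity rather than numerical robustness at very large $\kappa$.

```python
import math
import random


def sample_vmf_sphere(mu, kappa, rng=random):
    """Draw one vMF sample on the unit sphere in R^m (m = len(mu) >= 2)."""
    m = len(mu)
    if m < 2 or kappa <= 0.0:
        raise ValueError("need len(mu) >= 2 and kappa > 0")
    norm_mu = math.sqrt(sum(c * c for c in mu))
    mu = [c / norm_mu for c in mu]

    # Rejection step for w, the component along the mean direction
    b = (-2.0 * kappa + math.sqrt(4.0 * kappa * kappa + (m - 1) ** 2)) / (m - 1)
    x0 = (1.0 - b) / (1.0 + b)
    c = kappa * x0 + (m - 1) * math.log(1.0 - x0 * x0)
    while True:
        z = rng.betavariate(0.5 * (m - 1), 0.5 * (m - 1))
        w = (1.0 - (1.0 + b) * z) / (1.0 - (1.0 - b) * z)
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
        if kappa * w + (m - 1) * math.log(1.0 - x0 * w) - c >= math.log(u):
            break

    # Uniform direction in the tangent space orthogonal to the last axis
    v = [rng.gauss(0.0, 1.0) for _ in range(m - 1)]
    norm_v = math.sqrt(sum(c * c for c in v))
    scale = math.sqrt(max(0.0, 1.0 - w * w)) / norm_v
    x = [scale * c for c in v] + [w]

    # Householder reflection that maps the last axis e_m onto mu
    h = [-c for c in mu]
    h[-1] += 1.0
    hh = sum(c * c for c in h)
    if hh < 1e-24:  # mu already equals e_m
        return x
    coef = 2.0 * sum(a * b_ for a, b_ in zip(h, x)) / hh
    return [a - coef * b_ for a, b_ in zip(x, h)]


rng = random.Random(0)
print(sample_vmf_sphere([0.0, 0.0, 1.0], 10.0, rng))
```

Using a reflection instead of a proper rotation is harmless here because the distribution is symmetric about the mean direction.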
Notes
[^11] and [^12] seemingly independently proposed the complex generalization of matrix-vMF. While many of its properties included here follow from the same derivations for the real case, the complex version seems much less well studied.
References
Footnotes
[^1]: Chikuse, Yasuko, and Geoffrey S. Watson. "Large sample asymptotic theory of tests for uniformity on the Grassmann manifold." Journal of Multivariate Analysis 54.1 (1995): 18-31. https://doi.org/10.1006/jmva.1995.1043

[^2]: Mardia, Kanti V., and Peter E. Jupp. Directional statistics. John Wiley & Sons, 2009. https://doi.org/10.1002/9780470316979

[^3]: For $\mathbb{F}=\mathbb{H}$, I'm pretty sure this is the correct definition from reading Theorem 4.3 of [^13], but before implementing it one should verify that their definition of matrix-argument hypergeometric functions really matches the definition used elsewhere.

[^4]: Wood, Andrew T. A. "Estimation of the Concentration Parameters of the Fisher Matrix Distribution on SO(3) and the Bingham Distribution on $S_q$, $q \geq 2$." Australian Journal of Statistics 35.1 (1993): 69-79.

[^5]: Koev, Plamen, and Edelman, Alan. "The efficient evaluation of the hypergeometric function of a matrix argument." Mathematics of Computation 75.254 (2006): 833-846. https://doi.org/10.1090/S0025-5718-06-01824-2 Reference implementation

[^6]: Hornik, K., & Grün, B. (2014). movMF: An R Package for Fitting Mixtures of von Mises-Fisher Distributions. Journal of Statistical Software, 58(10), 1–31. https://doi.org/10.18637/jss.v058.i10

[^7]: Sei, Tomonari, et al. "Properties and applications of Fisher distribution on the rotation group." Journal of Multivariate Analysis 116 (2013): 440-455. https://doi.org/10.1016/j.jmva.2013.01.010

[^8]: D. J. Best, N. I. Fisher, Efficient Simulation of the von Mises Distribution, Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 28, Issue 2, June 1979, Pages 152–157, https://doi.org/10.2307/2346732

[^9]: Wood, A. T. A. (1994). Simulation of the von Mises Fisher distribution. Communications in Statistics - Simulation and Computation, 23(1), 157–164. https://doi.org/10.1080/03610919408813161

[^10]: Hoff, P. D. (2009). Simulation of the Matrix Bingham–von Mises–Fisher Distribution, With Applications to Multivariate and Relational Data. Journal of Computational and Graphical Statistics, 18(2), 438–456. https://doi.org/10.1198/jcgs.2009.07177

[^11]: Bingham, Christopher, Ted Chang, and Donald Richards. "Approximating the matrix Fisher and Bingham distributions: Applications to spherical regression and Procrustes analysis." Journal of Multivariate Analysis 41.2 (1992): 314-337. https://doi.org/10.1016/0047-259X(92)90072-N

[^12]: Chikuse, Yasuko. "Hermite and Laguerre polynomials with complex matrix arguments." Linear Algebra and its Applications 388 (2004): 91-105. https://doi.org/10.1016/j.laa.2004.02.028

[^13]: Li, F., & Xue, Y. (2009). Zonal Polynomials and Hypergeometric Functions of Quaternion Matrix Argument. Communications in Statistics - Theory and Methods, 38(8), 1184–1206. https://doi.org/10.1080/03610920802379185