
Geometry of transforms #9

Open · 3 tasks
mjhajharia opened this issue Jul 6, 2022 · 9 comments

Comments

@mjhajharia (Owner)

Understanding the geometry of the transforms better:

  • tail behavior
  • convexity
  • (needs more thought and discussion)
mjhajharia changed the title from "Discussion" to "Geometry of transforms" on Jul 6, 2022
@adamhaber (Collaborator)

One thing I thought about while comparing Stan's and TFP's Cholesky bijectors was to visualize this geometry directly. Something along these lines:

For a 3x3 correlation matrix, the unconstrained Cholesky factor is 3 numbers - let's call them x, y, z. We could compute, for example, lkj_corr_cholesky_lpdf(f(x,y,z) | eta) and lkj_corr_cholesky_lpdf(g(x,y,z) | eta), where f and g are different transforms, and we can play with different values of eta. If we evaluate these on a 3-dimensional grid (say, x, y, z each going from -10 to 10 in steps of 0.1), we'll get a cube that might capture interesting properties of these different geometries. It might be interesting to visualize different 2d projections of this cube, as well as the ratio between the cube for g and the cube for f.

Hope the explanation makes sense! What do you think @mjhajharia @bob-carpenter ?
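For concreteness, here's a minimal NumPy sketch of the grid idea, using Stan's tanh-based Cholesky-correlation transform as f. The function names are just illustrative (not an existing API), and the lpdf drops its normalizing constant:

```python
import numpy as np

def cholesky_corr_transform(y, K=3):
    """Stan-style map from K*(K-1)/2 unconstrained values to the Cholesky
    factor of a KxK correlation matrix (tanh -> canonical partial correlations)."""
    z = np.tanh(np.asarray(y, dtype=float))
    L = np.zeros((K, K))
    L[0, 0] = 1.0
    idx = 0
    for i in range(1, K):
        norm_sq = 0.0
        for j in range(i):
            L[i, j] = z[idx] * np.sqrt(max(0.0, 1.0 - norm_sq))
            norm_sq += L[i, j] ** 2
            idx += 1
        L[i, i] = np.sqrt(max(0.0, 1.0 - norm_sq))
    return L

def lkj_corr_cholesky_lpdf(L, eta):
    """Unnormalized LKJ log density on the Cholesky factor L."""
    K = L.shape[0]
    k = np.arange(2, K + 1)
    return float(np.sum((K - k + 2.0 * eta - 2.0) * np.log(np.diag(L)[1:])))

# For a 3x3 correlation matrix the unconstrained space is 3-dimensional;
# a coarse grid keeps the example fast (extreme corners can give -inf).
grid = np.arange(-10.0, 10.0, 0.5)
cube = np.array([[[lkj_corr_cholesky_lpdf(cholesky_corr_transform([x, y, z]), eta=2.0)
                   for z in grid] for y in grid] for x in grid])

# One possible 2d projection: a slice of the cube at the middle z value.
mid_slice = cube[:, :, len(grid) // 2]
```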

@bob-carpenter (Collaborator)

@adamhaber: I think anything we can do to help illustrate these transforms would be great.

My thinking is that we want to evaluate geometry in the tail, body, and head of the density. Maybe the geometry is well-behaved around the mode, but not in the tail.

Tail: number of leapfrog steps until we hit the body (a draw whose log density falls in the central 99% interval of the posterior log density); this measures how well the transform removes transient bias.

Body: ESS per leapfrog step, to see how well it samples after adaptation.

Mode: we can evaluate the number of leapfrogs from the mode to the body.
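
A rough sketch of the body criterion, assuming the draws have been converted to an ArviZ InferenceData (where Stan's n_leapfrog__ is renamed to sample_stats["n_steps"]); the function name is just illustrative:

```python
import arviz as az

def ess_per_leapfrog(idata, var_name):
    """Bulk ESS of one scalar parameter divided by total leapfrog steps."""
    ess = float(az.ess(idata, var_names=[var_name])[var_name])
    n_leapfrog = int(idata.sample_stats["n_steps"].sum())
    return ess / n_leapfrog
```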

I think we also want to look at a couple of other things in all of these places. One is the norm of the gradient

$$ f(x) = \big|\big| \nabla_x \log \pi(x) \big|\big| $$

This isn't interesting at the mode, where the gradient is zero. In the body, we can get a distribution over it. What about in the tail?

I think it'd also be interesting to test positive definiteness of the Hessian and, if it's positive definite, compute its condition number. We can do this at the mode, and in the body we can again get a distribution.
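
A quick finite-difference sketch of both diagnostics; log_pi stands in for whichever unconstrained log density is being probed, and nothing here is tied to a particular transform:

```python
import numpy as np

def grad_norm(log_pi, x, h=1e-5):
    """Central-difference approximation of ||grad_x log pi(x)||."""
    x = np.asarray(x, dtype=float)
    g = np.array([(log_pi(x + h * e) - log_pi(x - h * e)) / (2 * h)
                  for e in np.eye(len(x))])
    return np.linalg.norm(g)

def hessian_eigenvalues(log_pi, x, h=1e-4):
    """Eigenvalues of a central-difference Hessian of log pi at x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    I = np.eye(n)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (log_pi(x + h * I[i] + h * I[j])
                       - log_pi(x + h * I[i] - h * I[j])
                       - log_pi(x - h * I[i] + h * I[j])
                       + log_pi(x - h * I[i] - h * I[j])) / (4 * h * h)
    return np.linalg.eigvalsh(0.5 * (H + H.T))  # symmetrize before eigendecomposition

# Example usage at some point x0 on the unconstrained scale:
# eig = hessian_eigenvalues(log_pi, x0)
# definite = np.all(eig < 0) or np.all(eig > 0)
# cond = np.abs(eig).max() / np.abs(eig).min() if definite else np.inf
```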

@adamhaber (Collaborator)

I've pushed a notebook with some examples of the kind of stuff I think we can do, here:

https://github.com/mjhajharia/transforms/blob/feature/corr-cholesky-geometry/transforms/cholesky/visualize%20geometry.ipynb

This is just for specific values of K and eta, but it can easily be generalized... let me know what you think!

@bob-carpenter (Collaborator)

Thanks, @adamhaber. Is there a rendered form somewhere?

@adamhaber (Collaborator)

adamhaber commented Jul 19, 2022 via email

@bob-carpenter (Collaborator)

:-). I was expecting graphics given the notebook title included "visualization". This time I actually read the text.

Negative definite is good. The negative of the inverse Hessian of the log density is what we want to be positive definite, right? I always get tripped up with negations and inversions and log/exp---anything that can go either way.

Isotropy is an issue for simplex transforms, too. There's an isometric log-ratio transform that I still don't understand.

How much does the K = 5 value affect the geometry?

@adamhaber (Collaborator)

> The negative of the inverse Hessian of the log density is what we want to be positive definite, right? I always get tripped up with negations and inversions and log/exp---anything that can go either way.

Why the inverse Hessian? My intuition is that since the mode is the maximum of the log prob function, the Hessian there should be negative definite (so tiny movements from the mode always decrease the log prob at second order, since the first-order term is zero).

> Isotropy is an issue for simplex transforms, too. There's an isometric log-ratio transform that I still don't understand.

Interesting! Do you have any intuition regarding how this might affect the sampler?

> How much does the K = 5 value affect the geometry?

What do you mean?

@sethaxen (Collaborator)

> > The negative of the inverse Hessian of the log density is what we want to be positive definite, right? I always get tripped up with negations and inversions and log/exp---anything that can go either way.
>
> Why the inverse Hessian? My intuition is that since the mode is the maximum of the log prob function, the Hessian there should be negative definite (so tiny movements from the mode always decrease the log prob at second order, since the first-order term is zero).

You're both right! The Hessian of the log density is negative definite at the mode (i.e. the log density is locally concave), and the inverse of a negative/positive definite matrix is also negative/positive definite. I think what we want is for the Hessian to be negative definite everywhere (global log-concavity), which implies unimodality.

I suspect, more strongly, that we would prefer negative definite Hessian < negative diagonal Hessian < negative scalar Hessian, each of which takes us closer to simple multivariate normal geometry, for which metric adaptation is ideal.

Here's a thought. For augmented/expanded distributions like the augmented softmax for the simplex, we have an extra degree of freedom (e.g. $r$) and a transform $f: y \mapsto (x, r)$. The unconstrained density corresponding to a uniform density on the constrained space is $p_Y(y) = |J_f(y)|\, p_R(r(y) \mid x(y))$, and we need to pick a proper prior $p_R(r(y) \mid x(y))$ according to some heuristics. The Hessian of the log density is the sum of the Hessians of its two components:
$$H_y [\log p_Y(\cdot)] = H_y [\log |J_f(\cdot)|] + H_y [\log p_R(r(\cdot) | x(\cdot))].$$

If this is analytically tractable, would it make sense to pick a $p_R$ form that guarantees negative definiteness of $H_y [\log p_Y(\cdot)]$ and also tries to bring its off-diagonal terms to zero (either at every point, the mode, or the prior mean of $H_y [\log p_Y(\cdot)]$)?
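
As a purely illustrative numerical check of this decomposition, here's a NumPy/SciPy sketch for the augmented softmax with the hypothetical choice $p_R(r \mid x) = \textrm{normal}(r \mid 0, 1)$; the log-Jacobian formula used, $\log|J_f(y)| = \sum_i \log \operatorname{softmax}(y)_i$, is an assumption of the sketch:

```python
import numpy as np
from scipy.special import logsumexp

def log_abs_det_jacobian(y):
    # Assumed log |J_f(y)| for the augmented softmax y -> (softmax(y), logsumexp(y)):
    # sum_i log softmax(y)_i.
    return float(np.sum(y - logsumexp(y)))

def log_p_R(y):
    # Hypothetical proper prior on the extra degree of freedom: r ~ normal(0, 1).
    r = logsumexp(y)
    return float(-0.5 * r**2 - 0.5 * np.log(2 * np.pi))

def log_p_Y(y):
    # Unconstrained density for a uniform distribution on the simplex.
    return log_abs_det_jacobian(y) + log_p_R(y)

def hessian(f, y, h=1e-4):
    """Central-difference Hessian of a scalar function f at y."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    I = np.eye(n)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(y + h*I[i] + h*I[j]) - f(y + h*I[i] - h*I[j])
                       - f(y - h*I[i] + h*I[j]) + f(y - h*I[i] - h*I[j])) / (4*h*h)
    return H

y0 = np.array([0.3, -0.5, 1.2])
H_total = hessian(log_p_Y, y0)
H_jac, H_prior = hessian(log_abs_det_jacobian, y0), hessian(log_p_R, y0)
assert np.allclose(H_total, H_jac + H_prior, atol=1e-6)  # the decomposition above

eigs = np.linalg.eigvalsh(0.5 * (H_total + H_total.T))
print("negative definite:", bool(np.all(eigs < 0)))
```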

@bob-carpenter (Collaborator)

Thanks, @sethaxen. I think we're all on the same page up to some of us (me!) being sloppy with signs. By "negative scalar Hessian" do you mean a negative scalar multiple of the identity matrix? That would be nice, but even a simple log transform on a vector of positive values doesn't match this.

If we could pick a $p_R$ that guarantees negative definiteness of the Hessian, that'd be great. I have no idea how to do that.

> How much does the K = 5 value affect the geometry?

What I mean is how much does pulling the probability mass toward the unit matrix help condition the Hessian? Empirically, how much easier is it to sample K = 0.1 (pushes mass to corners) vs. K = 1 (uniform) vs. K = 10 (pushes mass toward unit matrix)?
