Geometry of transforms #9
One thing I thought about while trying to compare Stan and TFP's Cholesky bijectors was to try and visualize this geometry. Something along these lines: for a 3x3 correlation matrix, the unconstrained Cholesky factor is 3 numbers - let's call them x, y, z. We could compute, for example, ...

Hope the explanation makes sense! What do you think @mjhajharia @bob-carpenter ?
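A minimal sketch of the kind of computation being described, using TFP's CorrelationCholesky bijector and CholeskyLKJ distribution (here via the JAX substrate); the choice of eta = 2, the grid range, and fixing z = 0 for a 2-D slice are illustrative assumptions, not taken from the comment:

```python
import numpy as np
import jax.numpy as jnp
from tensorflow_probability.substrates import jax as tfp  # TFP on the JAX backend

tfb, tfd = tfp.bijectors, tfp.distributions

K, eta = 3, 2.0                       # illustrative choices
bij = tfb.CorrelationCholesky()       # unconstrained (x, y, z) -> 3x3 Cholesky factor
lkj = tfd.CholeskyLKJ(dimension=K, concentration=eta)

def unconstrained_log_density(v):
    """LKJ(eta) log density pulled back to the unconstrained vector v = (x, y, z)."""
    L = bij.forward(v)
    return float(lkj.log_prob(L) + bij.forward_log_det_jacobian(v, event_ndims=1))

# One 2-D slice of the geometry: vary x and y with z held at 0.
xs = np.linspace(-3.0, 3.0, 41)
slice_z0 = np.array([[unconstrained_log_density(jnp.array([x, y, 0.0])) for x in xs]
                     for y in xs])
# e.g. plt.contourf(xs, xs, slice_z0) would then visualize this slice.
```

Repeating the same grid evaluation with Stan's transform would give the side-by-side comparison being described.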
@adamhaber: I think anything we can do to help illustrate these transforms would be great.

My thinking is that we want to evaluate geometry in the tail, body, and head of the density. Maybe the geometry is well-behaved around the mode, but not in the tail.

- Tail: leapfrogs until we hit the body (a draw with log density in the central 99% interval of posterior log density); this measures how well the transform works to remove transient bias.
- Body: ESS/leapfrog, to see how well it samples after adaptation.
- Mode: we can evaluate the number of leapfrogs to the body.

I think we also want to look at a couple of other things in all of these places. One is the norm of the gradient. Not interesting at the mode. In the body, we can get a distribution over this. In the tail?

I think it'd also be interesting to test positive definiteness of the Hessian and, if positive definite, compute its condition number. This we can do at the mode, and in the body we can get a distribution again.
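A sketch of the gradient-norm and Hessian diagnostics described above, assuming a differentiable unconstrained log density `lp` is available; the function name and the use of JAX for autodiff are assumptions, not from the thread:

```python
import jax
import jax.numpy as jnp

def geometry_diagnostics(lp, x):
    """Gradient norm, definiteness, and condition number of the Hessian of the
    log density lp at a point x (the mode, a tail draw, or draws from the body)."""
    grad_norm = jnp.linalg.norm(jax.grad(lp)(x))
    H = jax.hessian(lp)(x)
    eigvals = jnp.linalg.eigvalsh(H)            # Hessian is symmetric
    neg_definite = bool(jnp.all(eigvals < 0))   # expected for the log density at a mode
    cond_number = jnp.max(jnp.abs(eigvals)) / jnp.min(jnp.abs(eigvals))
    return grad_norm, neg_definite, cond_number
```

Evaluated at the mode this gives the single condition-number check; mapped over posterior draws it gives the distributions over the body (and tail) mentioned above. Note that the later comments in the thread settle the sign: it is the Hessian of the log density that should be negative definite.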
I've pushed a notebook with some examples of the kind of stuff I think we can do, here:

This is just for specific values of K and eta, but can easily be generalized... let me know what you think!
Thanks, @adamhaber. Is there a rendered form somewhere?
Of the notebook? It's rendered on GitHub for me...
:-). I was expecting graphics given the notebook title included "visualization". This time I actually read the text.

Negative definite is good. The negative of the inverse Hessian of the log density is what we want to be positive definite, right? I always get tripped up with negations and inversions and log/exp - anything that can go either way.

Isotropic is an issue for simplex transforms, too. There's an isotropic log ratio transform that I still don't understand. How much does the K = 5 value affect the geometry?
Why the inverse Hessian? My intuition here is that since the mode is the maximum of the log prob function, the Hessian there should be negative definite: tiny movements away from the mode always decrease the log prob, at second order, since the first-order term is zero at the mode (see the small check sketched after this comment).
Interesting! Do you have any intuition regarding how this might affect the sampler?
What do you mean?
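A toy numerical check of that second-order argument (the bivariate normal example here is purely illustrative and not from the thread):

```python
import jax
import jax.numpy as jnp

# Toy log density: a correlated bivariate normal with its mode at the origin.
Sigma = jnp.array([[1.0, 0.8], [0.8, 2.0]])
lp = lambda x: -0.5 * x @ jnp.linalg.solve(Sigma, x)

mode = jnp.zeros(2)
print(jax.grad(lp)(mode))                # first order: gradient is ~0 at the mode
H = jax.hessian(lp)(mode)
print(jnp.linalg.eigvalsh(H))            # all eigenvalues negative -> negative definite
d = 1e-3 * jnp.array([1.0, -1.0])
print(lp(mode + d) - lp(mode), 0.5 * d @ H @ d)   # small moves decrease the log prob
```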
You're both right! The Hessian of the log density is negative definite at the mode (i.e. the log density is locally concave there), and the inverse of a negative/positive definite matrix is also negative/positive definite.

I think what we want is for the Hessian to be negative definite everywhere (globally concave), which implies unimodality. More strongly, I suspect we would prefer negative definite Hessian < negative diagonal Hessian < negative scalar Hessian, each of which takes us closer to simple multivariate normal geometry, for which metric adaptation is ideal.

Here's a thought. For augmented/expanded distributions like simplex augmented softmax we have an extra DOF (e.g. ...). If this is analytically tractable, would it make sense to pick a ...
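A small numerical illustration of the first point and of the dense/diagonal/scalar ordering (the matrices below are arbitrary examples made up for this purpose): the inverse of a negative definite matrix is again negative definite, and the condition number improves as the Hessian moves from dense to diagonal to a scalar multiple of the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))               # random orthogonal matrix

H_dense = -(Q @ np.diag([0.1, 1.0, 10.0, 100.0]) @ Q.T)    # dense, condition number 1000
H_diag = -np.diag([0.5, 1.0, 4.0, 10.0])                   # diagonal, condition number 20
H_scalar = -2.0 * np.eye(4)                                # scalar * identity, condition number 1

for name, H in [("dense", H_dense), ("diagonal", H_diag), ("scalar", H_scalar)]:
    ev = np.linalg.eigvalsh(H)
    ev_inv = np.linalg.eigvalsh(np.linalg.inv(H))
    print(name,
          "negative definite:", bool(np.all(ev < 0)),
          "inverse negative definite:", bool(np.all(ev_inv < 0)),
          "condition number:", round(float(np.max(np.abs(ev)) / np.min(np.abs(ev))), 2))
```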
Thanks, @sethaxen. I think we're all on the same page up to some of us (me!) being sloppy with signs. By "negative scalar Hessian" do you mean a scalar product of the identity matrix? That would be nice, but even a simple ... If we could pick a ...
What I mean is: how much does pulling the probability mass toward the unit matrix help condition the Hessian? Empirically, how much easier is it to sample ...
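One way to probe the conditioning half of that question empirically (a sketch only; the CorrelationCholesky parameterization, the eta values, and evaluating at the point that maps to the identity matrix are choices made here, not from the thread): compute the condition number of the Hessian of the unconstrained log density as the concentration eta grows, since larger eta pulls the LKJ mass toward the unit matrix.

```python
import jax
import jax.numpy as jnp
from tensorflow_probability.substrates import jax as tfp  # TFP on the JAX backend

tfb, tfd = tfp.bijectors, tfp.distributions
bij = tfb.CorrelationCholesky()

def unconstrained_log_density(v, eta, K=3):
    L = bij.forward(v)
    return (tfd.CholeskyLKJ(dimension=K, concentration=eta).log_prob(L)
            + bij.forward_log_det_jacobian(v, event_ndims=1))

v0 = jnp.zeros(3)                      # maps to the identity correlation matrix
for eta in (1.0, 2.0, 10.0):           # larger eta concentrates mass near the identity
    H = jax.hessian(lambda v: unconstrained_log_density(v, eta))(v0)
    ev = jnp.linalg.eigvalsh(H)
    print(eta, float(jnp.max(jnp.abs(ev)) / jnp.min(jnp.abs(ev))))
```

The sampling half of the question would then be the ESS/leapfrog comparison from earlier in the thread, run across the same eta values.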