How would I reparametrize `nnx.Module` parameters? #4546

aniquetahir · 2025-02-13T23:52:55Z

aniquetahir
Feb 13, 2025

In jax, it's easy to re-parametrize a neural network using something similar to the following:

def loss_fn(params, model, ...):
    ...
    return loss
    
def reparametrized_loss(other_params, model, ....):
    params = sample(other_params)
    loss_fn(params, model, ...)

How do I achieve something similar using nnx since params are part of the model. Ofcourse I can use something like:

def reparametrized_loss(model, other_params, ....):
    _, state, _ = nnx.split(model, nnx.Param)
    # some state kung-fu here?
    nnx.merge(model, new_state)
    ...
    return loss

Now if i want to get the grads w.r.t. other_params, its not possible since nnx.grad works on the model's parameters and not other_params.

Answered by cgarciae

Feb 19, 2025

@aniquetahir can you create a separate optimizer for new_model (maybe call it sampled_model) at the begging and then simply update it after sampling its params? E.g.

# on init
model = ...
sampled_model = nnx.clone(model)
sampled_optimizer = nnx.Optimizer(sampled_model, ...)
... # later
logits, params = model(data)
nnx.update(sampled_model, params)

View full answer

aniquetahir · 2025-02-16T17:07:43Z

aniquetahir
Feb 16, 2025
Author

I also looked into the LoRA implementation to see how this issue might be handled. Here's what happens:

  def __call__(self, x: jax.Array):
    out = x @ self.lora_a @ self.lora_b
    if self.base_module is not None:
      if not callable(self.base_module):
        raise ValueError('`self.base_module` must be callable.')
      out += self.base_module(x)
    return out

But the problem here is that it is essentially calculating $(Wx+b) + ABx$, not $(W+AB)x + b$, i.e., the LoRA is also applied to the bias.

So essentially, my question could be simplified as follows. Given a reparametrization function, $f(.)$, is it possible to get the gradients w.r.t. $f(\theta)$ instead of $\theta$. Given that $\theta$ is the set of trainable nnx.Module parameters.

0 replies

cgarciae · 2025-02-17T16:49:16Z

cgarciae
Feb 17, 2025
Maintainer

Hi @aniquetahir, to get a gradient wrt to any substate you can pass a nnx.DiffState with a filter to nnx.grad or nnx.value_and_grad, that same filter can be passed to nnx.Optimizer if you're using it. Here is an example: #4533 (comment)

7 replies

aniquetahir Feb 17, 2025
Author

@cgarciae You can see here how I implemented reparametrization for BNN using jax/haiku. I wanted to port the code to NNX where the underlying network is nnx.Module.
https://github.com/aniquetahir/GAIA/blob/a4d15b2d2ca781a39219000940e407439138141d/utils/jax/models/bnn.py#L222

cgarciae Feb 18, 2025
Maintainer

@aniquetahir thanks for sharing the code. Lets take a look at this:

logits, params = apply_fn(key, b_params, data)
params = destructure(params, jax.tree_util.tree_structure(params))

So here it seems that you want the network to output a new set of parameters for a new model. There's two things you can do here, you can either use split and merge and keep treating everything functional, or you can create a new model and set its parameters to the given values. The latter could be translated to something like:

logits, params = model(data)

new_model = nnx.clone(model)
nnx.update(new_model, params)

# do stuff with new_model

Is this close to what you are looking for?

aniquetahir Feb 18, 2025
Author

Thanks. The problem is that when gradient is applied, it is applied to the parameters inside the nnx.Module, not the parameters which were used to sample.
e.g. lets say I sample a parameter, $p$, from a normal distribution, $\mathcal{N}(\mu, \sigma)$. I want the gradients for $\mu$ and $\sigma$, not $p$. I hope that makes sense?

Maybe I can set a filter on the internal network params? I'll see what happens in that scenario.

cgarciae Feb 19, 2025
Maintainer

@aniquetahir can you create a separate optimizer for new_model (maybe call it sampled_model) at the begging and then simply update it after sampling its params? E.g.

# on init
model = ...
sampled_model = nnx.clone(model)
sampled_optimizer = nnx.Optimizer(sampled_model, ...)
... # later
logits, params = model(data)
nnx.update(sampled_model, params)

Answer selected by aniquetahir

aniquetahir Feb 20, 2025
Author

I created a small program to see the difference in outputs. Ideally, jax grad should match nnx grads but they are different. Expected value: 2, {a:4, b:4}, Return value: 4, 4:

import jax
from jax import numpy as jnp
from typing import Callable
from flax import nnx


def reparam(p):
    a, b = p
    return 1 * a * b                                                                                                                                                                                           

def pred(param, x):
    return param * x

def pred_reparam(param, x):
    return pred(reparam(param), x)

def main():
    input = jnp.array(2.)
    param = jnp.array(4.)
    g = jax.grad(pred)(param, input)
    r_g = jax.grad(pred_reparam)((2., 2.), input)
    r_g2 = jax.grad(pred_reparam)((4., 1.), input)
    print(f'regular grad: {g}, reparam grad: {r_g}, reparam grad 2: {r_g2}')
                                                                                                                                                                                                             inner = Inner()
    outer = Outer()
    nnx_grad = nnx.grad(lambda x: inner(x))(input)
    nnx_reparam_grad = nnx.grad(lambda x: outer(x, inner))(input)

    print(f'nnx grad: {nnx_grad}, nnx reparam grad: {nnx_reparam_grad}')


class Inner(nnx.Module):
    def __init__(self):
        self.param = nnx.Param(4.)

    def __call__(self, x):
        return self.param * x

class Outer(nnx.Module):
    def __init__(self):
        self.a, self.b = nnx.Param(2.), nnx.Param(2.)
        # self.inner = Inner()

    def __call__(self, x, inner: Callable):
        # reparametrize
        gd, state, _ = nnx.split(inner, nnx.Param, ...)
        state = jax.tree_util.tree_map(lambda x: reparam((self.a, self.b,)), state)
        inner = nnx.merge(gd, state)
        return inner(x)


if __name__ == "__main__":
    main()

Output:

regular grad: 2.0, reparam grad: (Array(4., dtype=float32, weak_type=True), Array(4., dtype=float32, weak_type=True)), reparam grad 2: (Array(2., dtype=float32, weak_type=True), Array(8., dtype=float32, weak_type=True))
nnx grad: 4.0, nnx reparam grad: 4.0

aniquetahir Feb 20, 2025
Author

@cgarciae I figured it out! Thanks for the pointers!
The mistake was calling nnx.grad(...)(input) rather than nnx.grad(...)(model). Also using the inner network as a parameter for the call to the outer network makes sense to me. Let me know if you have any further comments.

How would I reparametrize nnx.Module parameters? #4546

Uh oh!

Uh oh!

aniquetahir Feb 13, 2025

Replies: 2 comments · 7 replies

Uh oh!

Uh oh!

aniquetahir Feb 16, 2025 Author

Uh oh!

cgarciae Feb 17, 2025 Maintainer

Uh oh!

Uh oh!

aniquetahir Feb 17, 2025 Author

Uh oh!

cgarciae Feb 18, 2025 Maintainer

Uh oh!

Uh oh!

aniquetahir Feb 18, 2025 Author

Uh oh!

Uh oh!

cgarciae Feb 19, 2025 Maintainer

Uh oh!

Uh oh!

aniquetahir Feb 20, 2025 Author

Uh oh!

Uh oh!

aniquetahir Feb 20, 2025 Author

How would I reparametrize `nnx.Module` parameters? #4546

aniquetahir
Feb 13, 2025

Replies: 2 comments 7 replies

aniquetahir
Feb 16, 2025
Author

cgarciae
Feb 17, 2025
Maintainer

aniquetahir Feb 17, 2025
Author

cgarciae Feb 18, 2025
Maintainer

aniquetahir Feb 18, 2025
Author

cgarciae Feb 19, 2025
Maintainer

aniquetahir Feb 20, 2025
Author

aniquetahir Feb 20, 2025
Author