How to write LayerList, and abnormal gradients #69174

Open
johnyanccer opened this issue Nov 5, 2024 · 3 comments

@johnyanccer

johnyanccer commented Nov 5, 2024

Please ask your question

[Figure: grad]

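The vertical axis is the L2 norm of the parameter gradients and the horizontal axis is the training step. The input data passes through layer0 to layer9 in order, then through a classifier to produce the output, from which the loss is computed. No parameters are shared.

As the figure shows, the gradients are large near the input and small near the output. I suspected my LayerList usage was wrong, but following both the documentation and the approach in transformer.py gives the same result.

Currently I follow the LayerList documentation: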

import paddle

class MyLayer(paddle.nn.Layer):

    def __init__(self):
        super().__init__()
        self.linears = paddle.nn.LayerList(
            [paddle.nn.Linear(10, 10) for i in range(10)])

    def forward(self, x):
        # LayerList can act as an iterable, or be indexed using ints
        for i, l in enumerate(self.linears):
            x = self.linears[i // 2](x) + l(x)
        return x

However, the nn.layer.transformer file writes it as follows:

    def __init__(self, encoder_layer, num_layers, norm=None):
        super().__init__()
        self.layers = LayerList(
            [
                (
                    encoder_layer
                    if i == 0
                    else type(encoder_layer)(**encoder_layer._config)
                )
                for i in range(num_layers)
            ]
        )
@johnyanccer
Author

I changed how the LayerList is written, but the gradients are still the same.

@xiaoguoguo626807
Contributor

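Could you describe the problem in more detail? The figure has no labels for the horizontal and vertical axes. What do "near the input" and "near the output" mean? Could it be that the layers share parameters, so that gradients are accumulated during backpropagation?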

@johnyanccer
Author

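> Could you describe the problem in more detail? The figure has no labels for the horizontal and vertical axes. What do "near the input" and "near the output" mean? Could it be that the layers share parameters, so that gradients are accumulated during backpropagation?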

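The vertical axis is the L2 norm of the parameter gradients and the horizontal axis is the training step. The input data passes through layer0 to layer9 in order, then through a classifier to produce the output, from which the loss is computed.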
