Problems in reproducing Differential Transformer code #1

Open
YTianZHU opened this issue Oct 8, 2024 · 1 comment
Comments

@YTianZHU

YTianZHU commented Oct 8, 2024

Hi, thanks for reproducing Differential Transformer. There seem to be some problems in your reproduction code. You should split q and k along the n_head dimension, re-parameterize lambda, and add GroupNorm (GN) with gamma. You can refer to the official code (https://github.com/microsoft/unilm/blob/master/Diff-Transformer/multihead_diffattn.py) for details.
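For readers following along, here is a rough NumPy sketch of those three points. It is not the official implementation: the function names (`lambda_init_fn`, `reparam_lambda`, `diff_attention`), the tensor layout, the 0-based `depth` convention, and the use of an RMS-style per-head normalization in place of GN are all assumptions made for illustration, so check them against the linked file before relying on any detail.

```python
import math
import numpy as np

def lambda_init_fn(depth):
    # Depth-dependent initial lambda; `depth` is assumed to be a 0-based layer index.
    return 0.8 - 0.6 * math.exp(-0.3 * depth)

def reparam_lambda(lq1, lk1, lq2, lk2, depth):
    # Re-parameterized lambda: exp(lq1 . lk1) - exp(lq2 . lk2) + lambda_init.
    # lq1, lk1, lq2, lk2 stand in for learnable vectors of size head_dim.
    return (math.exp(float(np.dot(lq1, lk1)))
            - math.exp(float(np.dot(lq2, lk2)))
            + lambda_init_fn(depth))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def diff_attention(q, k, v, lq1, lk1, lq2, lk2, depth, gamma, eps=1e-5):
    """q, k: (2 * n_head, seq, head_dim); v: (n_head, seq, 2 * head_dim)."""
    n_head, seq, _ = v.shape
    head_dim = q.shape[-1]

    # 1) Split q and k along the head dimension: two sub-heads per output head.
    q = q.reshape(n_head, 2, seq, head_dim)
    k = k.reshape(n_head, 2, seq, head_dim)
    scores = q @ k.transpose(0, 1, 3, 2) / math.sqrt(head_dim)

    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    attn = softmax(np.where(mask, -np.inf, scores))  # (n_head, 2, seq, seq)

    # 2) Differential attention with the re-parameterized lambda.
    lam = reparam_lambda(lq1, lk1, lq2, lk2, depth)
    out = (attn[:, 0] - lam * attn[:, 1]) @ v  # (n_head, seq, 2 * head_dim)

    # 3) Per-head normalization with a learnable gain gamma
    #    (RMS-style here, as an assumption), then the fixed (1 - lambda_init) scale.
    rms = np.sqrt((out ** 2).mean(axis=-1, keepdims=True) + eps)
    return out / rms * gamma * (1.0 - lambda_init_fn(depth))
```

A quick way to try it is to pass random `q`, `k`, `v`, zero vectors for the four lambda parameters (so lambda equals its initial value), and `gamma = np.ones(2 * head_dim)`; the result has one row per head and position with width `2 * head_dim`, which a full model would then concatenate across heads and project.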

@Jaykef
Owner

Jaykef commented Oct 9, 2024

You have a point, but the goal of this implementation is not to reproduce the official code. It aims to implement the core components of the architecture for compute-constrained, educational purposes. I went through the official code (and cited it in the notebook), and I will update the code later to include some of the original. Thank you for pointing this out.

@YTianZHU YTianZHU closed this as completed Oct 9, 2024
@YTianZHU YTianZHU reopened this Oct 9, 2024
2 participants