[Feat] Heterogeneous Code Part 1: Add Model and Module Code for Chameleon Lumina #377
base: develop
Conversation
hidden_states = self.norm(hidden_states)

if hasattr(self, "output"):
    hidden_states = self.output(hidden_states).float()
The output should not need an extra cast to fp32 here; our NaiveAMPModel has an output_to_fp32 parameter that is responsible for doing this.
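A hedged sketch of the idea: the cast to fp32 belongs in the AMP wrapper, gated by its output_to_fp32 flag, rather than inside the module's forward. The names amp_wrapper_forward and to_fp32 below are illustrative, not InternEvo's actual API, and tensors are modeled as (value, dtype) tuples to keep the example self-contained.

```python
def to_fp32(t):
    # Illustrative cast: tensors are modeled as (value, dtype) tuples.
    value, _dtype = t
    return (value, "fp32")

def amp_wrapper_forward(module_forward, x, output_to_fp32=True):
    # NaiveAMPModel-style behavior (hypothetical sketch): the module runs
    # in reduced precision and does NOT call .float() on its own output;
    # the wrapper performs the cast when output_to_fp32 is set.
    out = module_forward(x)
    if output_to_fp32:
        out = to_fp32(out)
    return out
```

With this split of responsibilities, the module's forward stays precision-agnostic and the wrapper decides the output dtype once, in one place.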
k_norm_out = self.k_norm(k_all)
k = split_forward_gather_backward(k_norm_out, ParallelMode.TENSOR, dim=-2)

v = rearrange(v, "b s (h d) -> b s h d", d=self.head_dim)
The qk_norm part may need an extra distinction: if is_using_isp() (the wp weight-parallel algorithm) is in use, the gather logic is not needed, because under the ISP algorithm the weights are complete during forward, so the computed qkv heads are already complete and do not need to be gathered.
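The suggested branch can be sketched as follows. This is a hypothetical illustration, not InternEvo's real code: qk_norm_branch and split_heads stand in for the module's norm path and split_forward_gather_backward, heads are modeled as a plain list, and the norm is an arbitrary per-head function.

```python
def split_heads(x, world_size, rank):
    # Stand-in for split_forward_gather_backward along the head dim:
    # each tensor-parallel rank keeps its contiguous slice of heads.
    n = len(x) // world_size
    return x[rank * n:(rank + 1) * n]

def qk_norm_branch(k_all_heads, norm_fn, using_isp, world_size=2, rank=0):
    # Apply the norm over the full set of heads.
    k_normed = [norm_fn(h) for h in k_all_heads]
    if using_isp:
        # ISP/weight-parallel: weights are complete in forward, so the
        # computed heads are already full -- no split/gather needed.
        return k_normed
    # Tensor-parallel path: split the normed heads back across ranks.
    return split_heads(k_normed, world_size, rank)
```

The point is that the gather/split pair is only meaningful when the heads are sharded across ranks; under ISP the post-norm tensor is already the full set of heads.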
k_norm_out = self.k_norm(k_all)
k = split_forward_gather_backward(k_norm_out, ParallelMode.TENSOR, dim=-2)

v = rearrange(v, "b s (h d) -> b s h d", d=self.head_dim)
Same as above.
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just open the pull request and ask the maintainers for help.
Motivation
This PR and the following PRs in the series add heterogeneous-training support and the Chameleon model to InternEvo.
We plan to merge these PRs in sequence; this PR is the first one, adding the Chameleon model code.
Modification
As described above, this PR is the first of a series of PRs: it adds the Chameleon model code.
BC-breaking (Optional)
None
Use cases (Optional)
We will add use cases after the configuration PR, so that we can show the training use case.
Checklist
Before PR:
After PR: