[Feat] Heterogeneous Code Part 1: Add Model and Module Code for Chameleon Lumina #377

Open
wants to merge 5 commits into develop
Conversation

@zhhsplendid commented on Nov 28, 2024

Thanks for your contribution; we appreciate it a lot. Following the instructions below will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just make the pull request and seek help from the maintainers.

Motivation

This PR and the following PR series add heterogeneous support and the Chameleon model to InternEvo.

We plan to merge these PRs:

  1. Adding the Chameleon model code.
  2. Adding the Chameleon DataLoader code.
  3. Adding some training code for Chameleon, for example z_loss, discarding large grad norms, etc.
  4. Configuration of Chameleon and some tests integrating the above.
  5. Heterogeneous support: flag-controlled CPU + gloo p2p communication and unbalanced pipeline parallelism.
  6. More tests if needed.

This PR is the first one: adding Chameleon Model code.

Modification

As described above, this PR is the first of the series: it adds the Chameleon model code.

BC-breaking (Optional)

None

Use cases (Optional)

We will add a use case after the configuration PR so that we can show the training use case.

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix potential lint issues.
  • Bug fixes are fully covered by unit tests; the case that caused the bug should be added to the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • The documentation has been modified accordingly, e.g. docstrings or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
  • CLA has been signed and all committers have signed the CLA in this PR.

@huangting4201 huangting4201 self-requested a review November 29, 2024 06:45
internlm/model/modeling_chameleon.py (3 review comments, outdated and resolved)
hidden_states = self.norm(hidden_states)

if hasattr(self, "output"):
    hidden_states = self.output(hidden_states).float()
Collaborator:
The output shouldn't need an extra cast to fp32 here; our NaiveAMPModel has an output_to_fp32 parameter that takes care of this.
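A minimal sketch of the suggested change, assuming the model is wrapped by NaiveAMPModel with output_to_fp32 enabled (names taken from the quoted diff; illustrative only, not the final patch):

hidden_states = self.norm(hidden_states)

if hasattr(self, "output"):
    # NaiveAMPModel(output_to_fp32=True) already casts the wrapped module's
    # output back to fp32, so the manual .float() cast can be dropped here.
    hidden_states = self.output(hidden_states)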

k_norm_out = self.k_norm(k_all)
k = split_forward_gather_backward(k_norm_out, ParallelMode.TENSOR, dim=-2)

v = rearrange(v, "b s (h d) -> b s h d", d=self.head_dim)
Collaborator:
The qk_norm part may need an extra branch: if is_using_isp() (the weight-parallel algorithm) is enabled, the gather logic is not needed, because with ISP the weights are complete during the forward pass, so the computed q/k/v heads are already complete and do not need to be gathered.
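An illustrative sketch of the suggested branch, assuming is_using_isp(), split_forward_gather_backward and ParallelMode are available in this file as in the quoted diff; the gather helper name (gather_forward_split_backward) is an assumption, and the exact integration is up to the author:

if is_using_isp():
    # With the ISP weight-parallel algorithm the forward pass holds the full
    # weights, so the computed k heads are already complete: apply the norm
    # directly, no gather/split across the tensor-parallel group is needed.
    k = self.k_norm(k)
else:
    # Tensor-parallel path: gather the heads, normalize, then split back.
    k_all = gather_forward_split_backward(k, ParallelMode.TENSOR, dim=-2)
    k_norm_out = self.k_norm(k_all)
    k = split_forward_gather_backward(k_norm_out, ParallelMode.TENSOR, dim=-2)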

k_norm_out = self.k_norm(k_all)
k = split_forward_gather_backward(k_norm_out, ParallelMode.TENSOR, dim=-2)

v = rearrange(v, "b s (h d) -> b s h d", d=self.head_dim)
Collaborator:
Same as above.

internlm/model/ops/norm.py (review comment, resolved)
3 participants