feat(comm/attn_offload.py): support selective ckpt and cpu offload #383

huangting4201 · 2024-12-03T09:59:01Z

Motivation

Support selective checkpointing and asynchronous CPU offload to improve performance.

When checkpointing is enabled and selective_checkpoint is set to True, the intermediate activations of the attention part are kept for the layers that are recomputed. The attention part can then be skipped during recomputation, which improves training performance.
If selective_checkpoint_offload is also set to True, these attention activations of the recomputed layers are asynchronously offloaded to the CPU to save GPU memory.
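The effect of selective checkpointing on recomputation can be illustrated with a small pure-Python simulation. This is not the PR's implementation: `ToyLayer`, its `attention`/`mlp` methods and the cache dictionary are hypothetical stand-ins, and real code would work on tensors inside an autograd checkpoint function. The sketch only counts how often the attention part runs when a layer's forward pass is recomputed for backward.

```python
from typing import Dict, List


class ToyLayer:
    """Hypothetical stand-in for a transformer layer: attention followed by an MLP."""

    def __init__(self, layer_idx: int, selective_checkpoint: bool) -> None:
        self.layer_idx = layer_idx
        self.selective_checkpoint = selective_checkpoint
        self.attn_calls = 0  # how many times the (expensive) attention part ran

    def attention(self, x: List[float]) -> List[float]:
        self.attn_calls += 1
        return [v * 2.0 for v in x]

    def mlp(self, h: List[float]) -> List[float]:
        return [v + 1.0 for v in h]

    def forward(self, x: List[float], attn_cache: Dict[int, List[float]]) -> List[float]:
        # First forward pass: with selective checkpointing, keep the attention output
        # instead of discarding it together with the other activations.
        h = self.attention(x)
        if self.selective_checkpoint:
            attn_cache[self.layer_idx] = h
        return self.mlp(h)

    def recompute(self, x: List[float], attn_cache: Dict[int, List[float]]) -> List[float]:
        # Recomputation during backward: reuse the stored attention output if present,
        # so only the MLP part is recomputed.
        if self.selective_checkpoint and self.layer_idx in attn_cache:
            h = attn_cache.pop(self.layer_idx)
        else:
            h = self.attention(x)
        return self.mlp(h)


def run(selective_checkpoint: bool) -> int:
    layer = ToyLayer(layer_idx=0, selective_checkpoint=selective_checkpoint)
    cache: Dict[int, List[float]] = {}
    x = [1.0, 2.0]
    out = layer.forward(x, cache)
    assert layer.recompute(x, cache) == out  # recomputation reproduces the forward output
    return layer.attn_calls


print(run(selective_checkpoint=False))  # 2: attention runs in forward and again in recompute
print(run(selective_checkpoint=True))   # 1: attention is skipped during recomputation
```

The saving comes at the cost of keeping the attention output alive between forward and backward, which is the memory that selective_checkpoint_offload moves to the CPU.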

Note, however, that current tests show that with selective_checkpoint_offload set to True, the DtoH and HtoD copies compete with allgather and other communication for bandwidth. This lengthens allgather communication and lowers overall performance, so it is advisable to enable selective_checkpoint_offload only when the GPU memory savings are actually needed.

Modification

internlm/core/parallel/comm/attn_offload.py: AttnOffloadManager, a manager for offloading attention outputs to the CPU and prefetching them back to the GPU.
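The offload/prefetch pattern such a manager coordinates can be sketched in pure Python. This is a hypothetical illustration, not the API of AttnOffloadManager: `ToyAttnOffloadManager` and its `offload`/`prefetch`/`get` methods are invented names, Python lists stand in for tensors, and a single background worker thread stands in for a copy stream doing DtoH and HtoD transfers.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List


class ToyAttnOffloadManager:
    """Hypothetical offload/prefetch manager; lists stand in for tensors."""

    def __init__(self) -> None:
        # One worker thread plays the role of a dedicated copy stream; tasks run in FIFO order.
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._host: Dict[int, Future] = {}    # layer_idx -> DtoH copy (pending or done)
        self._device: Dict[int, Future] = {}  # layer_idx -> HtoD copy (pending or done)

    def offload(self, layer_idx: int, attn_out: List[float]) -> None:
        # Launch the "DtoH" copy asynchronously; the caller keeps computing.
        self._host[layer_idx] = self._executor.submit(list, attn_out)

    def prefetch(self, layer_idx: int) -> None:
        # Launch the "HtoD" copy early. With one FIFO worker, the matching DtoH copy
        # was queued first, so waiting on it inside the task cannot deadlock.
        host_future = self._host.pop(layer_idx)
        self._device[layer_idx] = self._executor.submit(lambda: list(host_future.result()))

    def get(self, layer_idx: int) -> List[float]:
        # Block only if the prefetch has not completed yet.
        if layer_idx not in self._device:
            self.prefetch(layer_idx)
        return self._device.pop(layer_idx).result()

    def close(self) -> None:
        self._executor.shutdown(wait=True)


mgr = ToyAttnOffloadManager()
# Forward: offload the attention outputs of recomputed layers in layer order.
for idx in range(3):
    mgr.offload(idx, [float(idx)] * 2)
# Backward runs in reverse layer order: prefetch layer i-1 before consuming layer i.
mgr.prefetch(2)
restored = []
for idx in reversed(range(3)):
    if idx > 0:
        mgr.prefetch(idx - 1)
    restored.append(mgr.get(idx))
mgr.close()
print(restored)  # [[2.0, 2.0], [1.0, 1.0], [0.0, 0.0]]
```

Issuing the prefetch for layer i-1 before layer i is consumed is what lets the copy overlap with computation. On a GPU the copies would typically run on a separate stream, where, as noted in the Motivation section, they share bandwidth with allgather.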

Use cases (Optional)

Example config:

selective_checkpoint = True
selective_checkpoint_offload = False
model = dict(
    num_chunks=1,  # if num_chunks > 1, interleaved pipeline scheduler is used.
    checkpoint=1,  # The proportion of layers for activation checkpointing; optional values are True/False/[0-1]
    ......
)
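The `checkpoint` comment says the option accepts True/False or a proportion in [0, 1]. A minimal sketch of how such a value could be mapped to a number of checkpointed layers is shown here; `num_checkpointed_layers` is a hypothetical helper, and rounding down is an assumption rather than the repository's confirmed behavior.

```python
from typing import Union


def num_checkpointed_layers(checkpoint: Union[bool, float], num_layers: int) -> int:
    """Hypothetical helper: map the `checkpoint` option to a count of checkpointed layers.

    Assumption: True means every layer, False means none, and a number in [0, 1]
    is the fraction of layers, rounded down.
    """
    # Check bool first: in Python, True/False are also ints (1/0).
    if isinstance(checkpoint, bool):
        return num_layers if checkpoint else 0
    if not 0.0 <= checkpoint <= 1.0:
        raise ValueError(f"checkpoint must be True/False or in [0, 1], got {checkpoint}")
    return int(checkpoint * num_layers)


print(num_checkpointed_layers(1, 32))    # 32: the example config checkpoints every layer
print(num_checkpointed_layers(0.5, 32))  # 16
```

Under this reading, `checkpoint=1` in the example config makes every layer a recomputed layer, so selective_checkpoint applies to all of them.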

Note: this should be used with isp, and only GQA is currently supported.

Loss accuracy checking

Checklist

Before PR:

Pre-commit or other linting tools are used to fix the potential lint issues.
Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
The documentation has been modified accordingly, such as docstrings or example tutorials.

After PR:

If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
CLA has been signed and all committers have signed the CLA in this PR.

huangting4201 · 2024-12-05T08:18:18Z

Depends on PR #381.

feat(comm/attn_offload.py): support selective ckpt and cpu offload

88a08d0

mm-assistant bot assigned yhcc Dec 3, 2024

huangting4201 assigned sunpengsdu and unassigned yhcc Dec 3, 2024

huangting4201 requested a review from yingtongxiong December 3, 2024 10:02

feat(comm/attn_offload.py): fix ci lint err

230d81d

huangting4201 closed this Dec 3, 2024

huangting4201 reopened this Dec 3, 2024

yingtongxiong reviewed Dec 5, 2024

internlm/core/parallel/comm/attn_offload.py

internlm/core/parallel/comm/isp.py

internlm/core/parallel/comm/attn_offload.py (outdated)

yingtongxiong reviewed Dec 5, 2024

internlm/model/ops/_flash_attn.py

huangting4201 added 3 commits December 10, 2024 19:31

feat(attn_offload.py): update attn offload manager

d195810

Merge branch 'develop' into feat/selective-ckpt-cpu-offload

132f34c

fix(conflicts): fix conflicts from merging develop

850dec6

huangting4201 mentioned this pull request Dec 17, 2024

feat(cpu_offload.py): support selective layers' activation cpu offload async #391

Merged

6 tasks

yingtongxiong approved these changes Dec 18, 2024

sunpengsdu approved these changes Dec 31, 2024

sunpengsdu merged commit e3f5001 into InternLM:develop Dec 31, 2024
25 checks passed