Which version of Transformer Engine should I use to enable ub_tp_comm_overlap? #11683

sallyjunjun opened this issue Dec 20, 2024 · 0 comments


I am using NeMo v2.0.0rc0. When I set ub_tp_comm_overlap to true, with tensor parallelism (TP) and sequence parallelism (SP) set to 2, I encountered the following error:
[Screenshot of the error message; image not available]

My Transformer Engine version is 1.6.0+c81733f. Should I update to a newer TE version?

When I updated Transformer Engine to 1.13.0+e5edd6c, a different error occurred in NeMo:
[Screenshot of the error message; image not available]

NeMo is meant to be used with CUDA 11.8, but Transformer Engine 1.13.0+e5edd6c requires a CUDA version newer than 12.0.
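A quick way to confirm this kind of mismatch is to compare the version strings directly. The sketch below is a minimal, hypothetical helper, not part of NeMo or Transformer Engine: the function names `parse_release`, `cuda_meets_minimum`, and `installed_version` are made up for illustration, the 12.0 minimum is taken from the statement above rather than from TE's documentation, and the distribution names passed to `importlib.metadata.version` are assumptions that may differ for source builds. The installed CUDA string could come from `torch.version.cuda` or from `nvcc --version`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_release(ver: str, width: int = 3) -> Tuple[int, ...]:
    """Turn '1.13.0+e5edd6c' or '11.8' into a comparable tuple such as (1, 13, 0)."""
    # Drop the local build suffix (the git hash after '+') before splitting.
    public = ver.split("+", 1)[0]
    parts = [int(p) for p in public.split(".")]
    # Pad short versions so that '12' and '12.0' compare as equal.
    parts += [0] * (width - len(parts))
    return tuple(parts)


def cuda_meets_minimum(installed_cuda: str, minimum_cuda: str) -> bool:
    """Return True if the installed CUDA release is at least the required one."""
    return parse_release(installed_cuda) >= parse_release(minimum_cuda)


def installed_version(dist_name: str) -> Optional[str]:
    """Look up an installed distribution's version, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Version strings quoted in this issue.
    print(parse_release("1.13.0+e5edd6c"))     # (1, 13, 0)
    print(cuda_meets_minimum("11.8", "12.0"))  # False
    # Assumed distribution names; adjust them to match your environment.
    for dist in ("nemo_toolkit", "transformer_engine", "torch"):
        print(dist, installed_version(dist))
```

Padding to a fixed width keeps the tuple comparison from treating a short version such as `12` as older than `12.0`.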

I'm stuck on these version issues.
Could you please tell me which versions of NeMo, TE, and CUDA I should use to enable the tp_comm_overlap feature?
