Introducing experimental gradient accumulation API #8584
Conversation
@rpsilva-aws do you plan on merging this into r2.6?
@tengyifei Ideally, yes. It's perfectly fine for the 3-layer MLP, but we're seeing a small difference for Llama runs (the difference is relative to a previous local patch set from just before some of the code was cleaned up), so we're quickly identifying what it is.
Okay, please aim to sort out all critical issues by Jan 21 if you're aiming for 2.6, so that we can review and cherry-pick it by Jan 22. The 2.6 release is quickly drawing near and I would like a few days to test all the builds.
In this PR, we introduce experimental.gradient_accumulation, which leverages XLA's While op to accumulate gradients.
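
For context, here is a minimal sketch of how such an API might be used, assuming a gradient_accumulation(step_fn, batched_tensors, model) helper under torch_xla.experimental. The exact import path, argument order, and return value are assumptions based on the PR description, not the confirmed API.

```python
import torch
import torch_xla.core.xla_model as xm

# Assumed import path and signature for the experimental API introduced
# in this PR; names and argument order may differ in the merged version.
from torch_xla.experimental.gradient_accumulation import gradient_accumulation

device = xm.xla_device()
model = torch.nn.Linear(128, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)


def train_step(inputs, targets):
    # Forward/backward for a single micro-batch. The accumulation across
    # micro-batches is expected to happen inside the XLA While loop
    # rather than in a Python-level for loop.
    logits = model(inputs)
    loss = torch.nn.functional.cross_entropy(logits, targets)
    return loss


# Inputs are stacked along a leading num_gradient_steps dimension,
# e.g. [num_gradient_steps, micro_batch_size, ...] (assumed layout).
inputs = torch.randn(8, 32, 128, device=device)
targets = torch.randint(0, 10, (8, 32), device=device)

# Assumed call: runs train_step once per micro-batch inside a While op,
# accumulating parameter gradients and returning the accumulated loss.
accumulated_loss = gradient_accumulation(train_step, (inputs, targets), model)

optimizer.step()
optimizer.zero_grad()
xm.mark_step()
```

The main appeal of a While-op-based approach is that the micro-batch loop is traced once and executed on device, instead of being unrolled into one large graph or re-traced per micro-batch.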