Optimize DoRA in eval and no dropout #2122
Merged
BenjaminBossan merged 17 commits into huggingface:main from ariG23498:aritra/optimize-dora on Oct 16, 2024
Changes from 6 commits

Commits (17):
9593abf  whether dropout
ba60f0b  chore: adding optimized code for dora
1a01895  formating
4b3af88  chore: adding bias
e61fb97  chore: refactor code
3506f39  chore: adding comments
5b71f6e  change formulation
1f0888c  changes
c8a2700  variable naming
09a1958  chore: review suggestions
24d53c1  chore: fix variable name
4f76aa5  Merge branch 'main' into aritra/optimize-dora
aab5e19  chore: adding dora changed to bnb and hqq
78136f7  chore: style
109cd89  chore: add dora changes to tp and add docs
9f39c51  adding benchmark results to docs
0e68aba  update docs
BenjaminBossan: Hmm, wait, should this not be the exact same calculation as in line 103? I.e. we should leave the condition after calculating the `base_result` and then do the same calculation of `dora_result` for both cases.
ariG23498: I am not sure that I follow.

With the `base_result` in place:
- `dora_result`, where we scale the `base_result` with `mag_norm_scale`

But without the `base_result`:
- `base_result` with the linear forward
- `dora_result`, where we scale the `base_result` with `(1 - mag_norm_scale)`

Aren't they going to be different for each case?
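To make the perceived asymmetry concrete, here is a minimal sketch of the two paths as described above. The function name, the `lora_term` argument, and the plain `F.linear` call are hypothetical stand-ins, not the actual code under review:

```python
import torch.nn.functional as F

def dora_result_sketch(x, weight, lora_term, mag_norm_scale, base_result=None):
    # Hypothetical paraphrase of the two branches being discussed;
    # not the actual PR code.
    if base_result is not None:
        # Case 1: base_result was passed in; scale it with mag_norm_scale.
        return mag_norm_scale * base_result + lora_term
    # Case 2: compute base_result with the linear forward, then scale it
    # with (1 - mag_norm_scale) instead.
    base_result = F.linear(x, weight)
    return (1 - mag_norm_scale) * base_result + lora_term
```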
BenjaminBossan: Okay, so I'm a bit confused, let's try to resolve this.

In the old code, we basically have (variable names slightly changed for clarity):

`dora_result = (mag_norm_scale - 1) * base_result + mag_norm_scale * lora_result * lora_scale`

My thinking is that the `base_result` is either calculated right there (old code) or we use the `base_result` that is being passed as an argument, but the basic equation stays the same. Of course, as you correctly noted, the bias needs to be subtracted first and then added back in the latter case.

In the currently proposed code, in one case we calculate `mag_norm_scale * base_result` and in the other `(mag_norm_scale - 1) * base_result`. This looks inconsistent to me.
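For reference, the old-code equation quoted above can be written as a standalone helper (a sketch with the reviewer's renamed variables; the function itself is illustrative):

```python
def dora_result_old(base_result, lora_result, mag_norm_scale, lora_scale):
    # The single equation from the old code. base_result is assumed to be
    # the base layer output *without* bias; if a precomputed base_result
    # includes a bias, subtract it first and add it back afterwards, as
    # noted in the comment above.
    return (mag_norm_scale - 1) * base_result + mag_norm_scale * lora_result * lora_scale
```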
ariG23498: Hi @BenjaminBossan, I have made the changes as suggested.
BenjaminBossan: I think the change is still not correct. Here is my suggestion:

This way, if `base_result = None`, the computation is exactly the same as it was previously.

I believe the confusion may stem from my comment:

`# result = mag_norm_scale * result + mag_norm_scale * lora_B(lora_A(x)) * scaling`

This comment should have been:

`# result = (mag_norm_scale - 1) * result + mag_norm_scale * lora_B(lora_A(x)) * scaling`

Does that make sense?
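The suggested code block itself is not visible above. The following is only a reconstruction of what a unified computation consistent with this comment could look like; the function signature, the `F.linear` fallback, and the norm computation are assumptions, not the actual suggestion:

```python
import torch
import torch.nn.functional as F

def dora_forward(x, lora_A, lora_B, weight, magnitude, scaling, base_result=None, bias=None):
    # Sketch only: a unified DoRA path in which base_result=None reproduces
    # the previous behavior. All names here are illustrative assumptions.
    lora_result = lora_B(lora_A(x))

    # Column-wise norm of the merged weight; the real implementation may
    # detach and compute this differently.
    weight_norm = torch.linalg.norm(
        weight + scaling * (lora_B.weight @ lora_A.weight), dim=1
    )
    mag_norm_scale = (magnitude / weight_norm).view(1, -1)

    if base_result is None:
        # No precomputed base output: compute it here, as the old code did.
        base_result = F.linear(x, weight)
    elif bias is not None:
        # A precomputed base_result includes the bias; subtract it so the
        # scaling below applies only to the linear part.
        base_result = base_result - bias

    # result = (mag_norm_scale - 1) * base_result
    #          + mag_norm_scale * lora_B(lora_A(x)) * scaling
    return (mag_norm_scale - 1) * base_result + mag_norm_scale * lora_result * scaling
```

In this sketch the returned value is a delta that the caller adds to the full base output (bias included), which is why the bias is removed from a precomputed `base_result` before scaling.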
ariG23498: That makes sense!

About the unit test -- do you want me to add a new test file? Or add a test somewhere?