disable rope scaling for training, add yarn during export #917
h-guo18 wants to merge 1 commit into haoguo/eagle-export from
Conversation
Signed-off-by: h-guo18 <[email protected]>
Force-pushed from dc86aca to cb2086a
template_config["rope_scaling"] = {
    "rope_type": "yarn",
    "rope_theta": 10000,
    "factor": 32.0,
I'm not sure these are the best choices for rope theta and factor. I think these might depend on how long max_position_embeddings actually is.
Some testing may be required; GPT-OSS uses rope theta 150k, for example. This may be some tradeoff between short-context and long-context accuracy.
theta=10k is the default from HF: ref
Actually, my guess is that theta should match the value used in training.
The factor should be a tradeoff, I think.
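To make the tradeoff concrete, here is a minimal sketch of how the YaRN `factor` relates to the extended context length, assuming the usual convention that the scaled context is roughly `factor * original_max_position_embeddings`. The training context length of 4096 is a hypothetical value for illustration, not taken from this PR.

```python
# Hypothetical illustration of what factor=32.0 implies for context length.
# Assumption: scaled context ≈ factor * original_max_position_embeddings.
original_max_position_embeddings = 4096  # assumed training context, not from the PR
factor = 32.0                            # value from the diff above

target_context = int(original_max_position_embeddings * factor)

rope_scaling = {
    "rope_type": "yarn",
    "factor": factor,
    "original_max_position_embeddings": original_max_position_embeddings,
}
print(target_context)  # 131072
```

A larger factor buys a longer usable context but stretches positions more aggressively, which is where the short-context accuracy tradeoff mentioned above comes in.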
# and set yarn during export for inference.
template_config["rope_scaling"] = {
    "rope_type": "yarn",
    "rope_theta": 10000,
Pretty sure rope theta goes on the main config and not in rope_scaling, and should be set the same for training/inference. Where did this template come from?
This is also a version difference. In transformers 4.x it stays outside, while in transformers 5 it has to be in the "rope_scaling" field.
Ref:
transformers 4.55: https://github.com/huggingface/transformers/blob/v4.55-release/src/transformers/modeling_rope_utils.py#L110
5.x:
https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_rope_utils.py#L634
https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L111
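The version difference described above can be sketched as two config shapes. This is an illustration based on the linked references, not the exact config emitted by this PR; the surrounding field values are placeholders.

```python
# Sketch of where rope_theta lives in each transformers generation (assumption
# based on the refs above, not the PR's full export config).

# transformers 4.x: rope_theta is a top-level config field.
config_v4 = {
    "rope_theta": 10000,
    "rope_scaling": {"rope_type": "yarn", "factor": 32.0},
}

# transformers 5.x: rope_theta moves inside the "rope_scaling" dict.
config_v5 = {
    "rope_scaling": {
        "rope_type": "yarn",
        "rope_theta": 10000,
        "factor": 32.0,
    },
}
```

Either way the same theta should be used for training and inference; only the key's location changes between versions.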
| "rope_type": "llama3", | ||
| }, | ||
| "rope_theta": 500000.0, | ||
| "rope_scaling": {"rope_type": "default", "rope_theta": 10000}, |
I think you can go further and actually just set rope scaling to null. Not sure if there's a difference in HF.
Setting it to None triggers an error. We are using the Llama definition from transformers 5.0, and it requires a rope type. "rope_type": "default" here will use the traditional RoPE without scaling.
This is a transformers version difference. In transformers 4.x it's OK to leave it None. We are using transformers 5 here.
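A minimal sketch of the workaround discussed in this thread: under transformers 5.x, `rope_scaling=None` raises because a rope type is required, so scaling is disabled by writing an explicit "default" entry instead. The `template_config` dict here is a stand-in for the export template, not the full file.

```python
# Stand-in for the export template config (values from the diff above).
template_config = {"rope_theta": 500000.0}

# Instead of `template_config["rope_scaling"] = None` (which errors under the
# transformers 5.x Llama definition), select plain RoPE with no scaling:
template_config["rope_scaling"] = {"rope_type": "default", "rope_theta": 10000}
```

Under transformers 4.x the None assignment would have been accepted, so this branch of the config only matters on 5.x.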
What does this PR do?
Type of change: ?
Overview: ?
Usage
# Add a code snippet demonstrating how to use this
Testing
Before your PR is "Ready for review"
Additional Information