Comments

disable rope scaling for training, add yarn during export#917

Draft
h-guo18 wants to merge 1 commit into haoguo/eagle-export from haoguo/eagle-rope

Conversation

Contributor

@h-guo18 h-guo18 commented Feb 23, 2026

What does this PR do?

Type of change: ?

Overview: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information


copy-pr-bot bot commented Feb 23, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


coderabbitai bot commented Feb 23, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@h-guo18 h-guo18 requested a review from benchislett February 23, 2026 06:35
template_config["rope_scaling"] = {
    "rope_type": "yarn",
    "rope_theta": 10000,
    "factor": 32.0,
Contributor

I'm not sure these are the best choices for rope theta and factor. I think these might depend on how long max_position_embeddings actually is.

Some testing may be required. GPT-OSS uses rope theta 150k, for example. This may be a tradeoff between short-context and long-context accuracy.

Contributor Author

theta=10k is the default from HF: ref

Actually my guess is that it should match the theta used in training.

factor should be a tradeoff I think.
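The factor tradeoff above can be made concrete. A minimal sketch, assuming (hypothetically) a 4096-token training context and a 131072-token serving context; the real values depend on the draft model's actual `max_position_embeddings`:

```python
# Hypothetical sketch: choosing the YaRN factor from the context lengths.
# Both lengths below are illustrative assumptions, not values from this PR.
original_max_position_embeddings = 4096   # assumed training-time context
target_context = 131072                   # assumed serving-time context

# YaRN's `factor` is the context-extension ratio: how far beyond the
# original context the scaled RoPE should interpolate.
factor = target_context / original_max_position_embeddings
print(factor)  # 32.0
```

Under these assumptions, factor 32.0 matches the value in the diff; a different training context would call for a different factor.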

# and set yarn during export for inference.
template_config["rope_scaling"] = {
    "rope_type": "yarn",
    "rope_theta": 10000,
Contributor

Pretty sure rope theta goes on the main config and not the rope scaling, and should be set the same for training/inference. Where did this template come from?
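For illustration, a hedged sketch of the placement being suggested, using a plain dict in place of the real HF config object (field names follow the diff above; values are illustrative):

```python
# Hypothetical sketch of the reviewer's suggested layout: rope_theta lives
# on the main config (identical for training and inference), while
# rope_scaling carries only the scaling-specific fields.
template_config = {}

template_config["rope_theta"] = 10000  # top-level, shared by both phases

template_config["rope_scaling"] = {
    "rope_type": "yarn",
    "factor": 32.0,
}

assert "rope_theta" not in template_config["rope_scaling"]
```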


"rope_type": "llama3",
},
"rope_theta": 500000.0,
"rope_scaling": {"rope_type": "default", "rope_theta": 10000},
Contributor

I think you can go further and actually just set rope scaling to null. Not sure if there's a difference in HF

Contributor Author

@h-guo18 h-guo18 Feb 24, 2026

Setting it to None triggers an error. We are using the Llama definition from transformers 5.0, and it requires a rope type. "rope_type": "default" here uses the traditional RoPE without scaling.

ref: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L85

Contributor Author

This is a transformers version difference. In transformers 4.x it's OK to leave it as None. We are using transformers 5 here.
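A minimal sketch of the version difference being described, using plain dicts rather than the real transformers API: a config path that unconditionally reads `rope_scaling["rope_type"]` fails on None, while `{"rope_type": "default"}` selects plain, unscaled RoPE.

```python
# Hypothetical sketch (not the real transformers code): mimic a mandatory
# "rope_type" lookup like the one in the linked modeling_llama.py.
def resolve_rope_type(rope_scaling):
    # Unconditionally index into the dict, as a transformers-5-style
    # config path would.
    return rope_scaling["rope_type"]

# The workaround used in this PR for training: an explicit "default"
# type, which selects traditional (unscaled) RoPE.
training_scaling = {"rope_type": "default", "rope_theta": 10000}
assert resolve_rope_type(training_scaling) == "default"

# Passing None, as transformers 4.x allowed, breaks under this lookup:
try:
    resolve_rope_type(None)
except TypeError:
    pass  # 'NoneType' object is not subscriptable
```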
