disable rope scaling for training, add yarn during export #917
h-guo18 wants to merge 1 commit into haoguo/eagle-export from
Conversation
Signed-off-by: h-guo18 <[email protected]>
Force-pushed from dc86aca to cb2086a
template_config["rope_scaling"] = {
    "rope_type": "yarn",
    "rope_theta": 10000,
    "factor": 32.0,
I'm not sure these are the best choices for rope theta and factor. I think these might depend on how long max_position_embeddings actually is.
Some testing may be required; GPT-OSS uses rope theta 150k, for example. This may be some tradeoff between short-context and long-context accuracy.
theta=10k is the default from HF: ref
Actually, my guess is that theta should match the value used in training.
The factor should be a tradeoff, I think.
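To make the tradeoff concrete, here is a minimal sketch of how the YaRN `factor` relates to the extended context length, assuming the usual convention that the scaled context is roughly `factor * original_max_position_embeddings`. The training context length of 4096 is a hypothetical value for illustration, not taken from this PR.

```python
# Hypothetical illustration of what factor=32.0 implies for context length.
# Assumption: scaled context ≈ factor * original_max_position_embeddings.
original_max_position_embeddings = 4096  # assumed training context, not from the PR
factor = 32.0                            # value from the diff above

target_context = int(original_max_position_embeddings * factor)

rope_scaling = {
    "rope_type": "yarn",
    "factor": factor,
    "original_max_position_embeddings": original_max_position_embeddings,
}
print(target_context)  # 131072
```

A larger factor buys a longer usable context but stretches positions more aggressively, which is where the short-context accuracy tradeoff mentioned above comes in.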
# and set yarn during export for inference.
template_config["rope_scaling"] = {
    "rope_type": "yarn",
    "rope_theta": 10000,
Pretty sure rope theta goes on the main config and not in rope_scaling, and should be set the same for training/inference. Where did this template come from?
This is also a version difference. In transformers 4.x it stays outside, while in transformers 5 it has to be in the "rope_scaling" field.
Ref:
transformers 4.55: https://github.com/huggingface/transformers/blob/v4.55-release/src/transformers/modeling_rope_utils.py#L110
5.x:
https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_rope_utils.py#L634
https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L111
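The version difference described above can be sketched as two config shapes. This is an illustration based on the linked references, not the exact config emitted by this PR; the surrounding field values are placeholders.

```python
# Sketch of where rope_theta lives in each transformers generation (assumption
# based on the refs above, not the PR's full export config).

# transformers 4.x: rope_theta is a top-level config field.
config_v4 = {
    "rope_theta": 10000,
    "rope_scaling": {"rope_type": "yarn", "factor": 32.0},
}

# transformers 5.x: rope_theta moves inside the "rope_scaling" dict.
config_v5 = {
    "rope_scaling": {
        "rope_type": "yarn",
        "rope_theta": 10000,
        "factor": 32.0,
    },
}
```

Either way the same theta should be used for training and inference; only the key's location changes between versions.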
| "rope_type": "llama3", | ||
| }, | ||
| "rope_theta": 500000.0, | ||
| "rope_scaling": {"rope_type": "default", "rope_theta": 10000}, |
I think you can go further and actually just set rope scaling to null. Not sure if there's a difference in HF.
Setting it to None triggers an error. We are using the Llama definition from transformers 5.0, and it requires a rope type. "rope_type": "default" here will use the traditional RoPE without scaling.
This is a transformers version difference. In transformers 4.x it's OK to leave it None. We are using transformers 5 here.
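A minimal sketch of the workaround discussed in this thread: under transformers 5.x, `rope_scaling=None` raises because a rope type is required, so scaling is disabled by writing an explicit "default" entry instead. The `template_config` dict here is a stand-in for the export template, not the full file.

```python
# Stand-in for the export template config (values from the diff above).
template_config = {"rope_theta": 500000.0}

# Instead of `template_config["rope_scaling"] = None` (which errors under the
# transformers 5.x Llama definition), select plain RoPE with no scaling:
template_config["rope_scaling"] = {"rope_type": "default", "rope_theta": 10000}
```

Under transformers 4.x the None assignment would have been accepted, so this branch of the config only matters on 5.x.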
What does this PR do?
Type of change: ?
Overview: ?
Usage
# Add a code snippet demonstrating how to use this
Testing
Before your PR is "Ready for review"
Additional Information