example demonstrating how to train CosmosReason2 Eagle3 #920

Open
skierat wants to merge 3 commits into NVIDIA:main from skierat:skierat/CR2_BYOC

Conversation

@skierat

@skierat skierat commented Feb 23, 2026

What does this PR do?

Type of change: New example

Overview:
Adds examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb, a step-by-step Jupyter notebook that walks through the full EAGLE3 draft-head training workflow for nvidia/Cosmos-Reason2-8B. The notebook covers:

  1. Installing dependencies
  2. Authenticating with Hugging Face
  3. Preparing training data from the Nemotron-Post-Training-Dataset-v2 (chat split) using a curated row-selection mapping (guides/nemotron_mapping.csv)
  4. Inspecting the bundled EAGLE3 config (guides/CR2_eagle_config.json) tuned for Cosmos-Reason2 (YaRN RoPE, FlexAttention, reduced draft vocabulary)
  5. (Optional) Calibrating the draft vocabulary to 32k tokens for faster training and inference
  6. Launching training via launch_train.sh with FSDP2 multi-GPU support
  7. Exporting the checkpoint to HF format and serving with vLLM

Also includes guides/nemotron_mapping.csv and guides/CR2_eagle_config.json as companion files.
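For reference, the row-to-conversation conversion in the data-preparation step can be sketched roughly as follows. This is an illustrative assumption, not the exact implementation in add_nemotron_chat.py: the input field names (`messages`, `role`, `content`) and the output schema are hypothetical.

```python
import hashlib
import json

def to_standard_conversation(row: dict) -> dict:
    """Convert a raw chat row into a standardized conversation record."""
    turns = [
        {"role": msg["role"], "content": msg["content"]}
        for msg in row["messages"]
    ]
    # Deterministic unique ID derived from the conversation content,
    # so re-running the prep step produces stable IDs.
    digest = hashlib.sha256(
        json.dumps(turns, sort_keys=True).encode("utf-8")
    ).hexdigest()[:16]
    return {"conversation_id": digest, "conversations": turns}
```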

Usage

Open and run examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb cell by cell. After training, serve the exported checkpoint with:

vllm serve nvidia/Cosmos-Reason2-8B \
    --host 0.0.0.0 \
    --port 8000 \
    --speculative-model export/cosmos-reason2-8b-eagle3 \
    --num-speculative-tokens 3 \
    --dtype bfloat16
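Once the server is up, it can be queried via vLLM's OpenAI-compatible chat completions endpoint. The sketch below only builds the request (host and port mirror the flags above); uncomment the last lines against a running server.

```python
import json
import urllib.request

# Build (but do not send) a chat-completions request against the server above.
payload = {
    "model": "nvidia/Cosmos-Reason2-8B",
    "messages": [
        {"role": "user", "content": "Explain speculative decoding in one sentence."}
    ],
    "max_tokens": 128,
}
req = urllib.request.Request(
    "http://0.0.0.0:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment once the server is running
# print(json.load(response)["choices"][0]["message"]["content"])
```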

Testing

Tested end-to-end on 4x B100 GPUs. The exported checkpoint was validated with specdec_bench.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: Yes (the notebook is self-documenting)
  • Did you update Changelog?: No

Additional Information

Cosmos-Reason2-8B requires at least one 80 GB GPU (H100/A100).

Summary by CodeRabbit

Release Notes

  • New Features
    • Added comprehensive training guide and configuration for speculative decoding on Cosmos-Reason2-8B models
    • Introduced dataset preparation utility to convert Nemotron chat conversations into standardized format
    • Included example configuration and training instructions for EAGLE3 draft model acceleration

@skierat skierat requested a review from a team as a code owner February 23, 2026 15:16
@skierat skierat requested a review from h-guo18 February 23, 2026 15:16
@copy-pr-bot

copy-pr-bot bot commented Feb 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

Walkthrough

Introduces a complete training workflow for EAGLE3 speculative decoding, including a configuration file, Jupyter notebook documenting the training process, and a data preparation utility for ingesting Nemotron chat conversations into a standardized format.

Changes

  • Configuration & Training Guide (examples/speculative_decoding/guides/CR2_eagle_config.json, examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb): New Eagle model configuration with hyperparameters (vocab size, RoPE scaling, attention implementation) and a comprehensive Jupyter notebook documenting the end-to-end training workflow, including dependencies, data preparation, model configuration inspection, calibration, training execution via accelerate/FSDP, and checkpoint export.
  • Data Preparation Utility (examples/speculative_decoding/prepare_input_conversations/add_nemotron_chat.py): New script to ingest Nemotron-Post-Training-Dataset-v2 chat data with dual access modes (mapping-based and streaming), conversation parsing with role/content field handling, unique conversation ID generation, and a CLI with configurable output options.

Sequence Diagram

sequenceDiagram
    actor User
    participant Notebook as Training Notebook
    participant DataPrep as add_nemotron_chat.py
    participant HF as Hugging Face Hub
    participant Training as launch_train.sh
    participant Model as Eagle Model

    User->>Notebook: Execute training workflow
    Notebook->>DataPrep: Prepare input conversations
    DataPrep->>HF: Resolve Nemotron dataset URLs
    HF-->>DataPrep: Return parquet shard URLs
    DataPrep->>DataPrep: Parse conversations (mapping/streaming mode)
    DataPrep-->>Notebook: Output standardized conversations
    Notebook->>Notebook: Inspect CR2_eagle_config.json
    Notebook->>Notebook: Optional: calibrate draft vocabulary
    Notebook->>Training: Launch training with accelerate/FSDP
    Training->>Model: Train Eagle head on Cosmos-Reason2-8B
    Model-->>Training: Trained checkpoint
    Training-->>Notebook: Export to HF-compatible format
    Notebook-->>User: Training complete, ready for deployment

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 75.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the main change: adding an example demonstrating EAGLE3 training on the Cosmos-Reason2 model, which aligns with all substantive changes in the PR.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb (2)

90-96: os.chdir("..") is fragile and depends on the Jupyter kernel's working directory.

This assumes the notebook is opened/run with cwd = guides/. If a user opens Jupyter from the speculative_decoding/ root (the common default), os.chdir("..") navigates one level above the intended directory, breaking every subsequent relative path (prepare_input_conversations/add_nemotron_chat.py, guides/CR2_eagle_config.json, etc.).

Consider anchoring relative to the notebook file itself:

♻️ Proposed fix
-import os
-
-# Run from the speculative_decoding root so that relative paths resolve correctly.
-os.chdir("..")
-print("Working directory:", os.getcwd())
+import os
+from pathlib import Path
+
+# Navigate to the speculative_decoding root regardless of where Jupyter was launched.
+os.chdir(Path(__file__).parent.parent if "__file__" in dir() else Path.cwd().parent)
+print("Working directory:", os.getcwd())

Or more robustly, use an absolute path derived from globals()["_dh"][0] (the notebook's directory as set by Jupyter):

import os
from pathlib import Path

notebook_dir = Path(globals().get("_dh", [Path.cwd()])[0])
os.chdir(notebook_dir.parent)
print("Working directory:", os.getcwd())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb`
around lines 90 - 96, The current os.chdir("..") is fragile; replace it by
deriving the notebook directory from Jupyter’s runtime state and changing to its
parent: obtain notebook_dir = Path(globals().get("_dh", [Path.cwd()])[0]) (or
equivalent using globals()["_dh"][0] with a safe fallback), then call
os.chdir(notebook_dir.parent) and keep the existing os.getcwd() print; update
imports to include pathlib.Path if not present and remove the hardcoded
os.chdir("..") call (refer to os.chdir, os.getcwd, globals(), and _dh in the
notebook).

162-166: EAGLE_CONFIG is defined twice with the same value.

Lines 163–165 (Step 4) and line 236 (Step 6 setup cell) both assign EAGLE_CONFIG = "guides/CR2_eagle_config.json". The second definition is harmless but creates an implicit assumption that users must run Step 4 first, and it makes the notebook slightly harder to follow. The Step 4 definition is sufficient; the Step 6 cell comment makes it clear the value is passed to launch_train.sh.

Also applies to: 235-237

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb`
around lines 162 - 166, Remove the duplicate assignment of EAGLE_CONFIG by
keeping the first definition (EAGLE_CONFIG = "guides/CR2_eagle_config.json") and
deleting the later, redundant re-assignment in the subsequent cell; ensure any
comments in that later cell reference the existing variable rather than
redefining it so users can run cells independently without relying on a prior
re-definition of EAGLE_CONFIG.
examples/speculative_decoding/prepare_input_conversations/add_nemotron_chat.py (1)

166-166: async def main contains no await — drop the async keyword.

Every I/O operation inside (open(), load_dataset(), tqdm) is synchronous. The asyncio.run() trampoline adds overhead and implies async behavior that doesn't exist.

♻️ Proposed fix
-async def main(args: argparse.Namespace) -> None:
+def main(args: argparse.Namespace) -> None:
-    asyncio.run(main(args))
+    main(args)

And remove the import asyncio since it would no longer be needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/speculative_decoding/prepare_input_conversations/add_nemotron_chat.py`
at line 166, The function main is declared async but contains no await and
should be a regular synchronous function: change "async def main(...)" to "def
main(...)", remove the unused "import asyncio" and replace any invocation of
"asyncio.run(main(...))" with a direct call to "main(...)" (references: main,
asyncio.run, import asyncio, and the synchronous calls open(), load_dataset(),
tqdm).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 02fa362 and 12fda6a.

⛔ Files ignored due to path filters (1)
  • examples/speculative_decoding/guides/nemotron_mapping.csv is excluded by !**/*.csv
📒 Files selected for processing (3)
  • examples/speculative_decoding/guides/CR2_eagle_config.json
  • examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb
  • examples/speculative_decoding/prepare_input_conversations/add_nemotron_chat.py

Comment on lines +315 to +327
"metadata": {
  "kernelspec": {
    "display_name": "Python 3",
    "language": "python",
    "name": "python3"
  },
  "language_info": {
    "name": "python",
    "version": "3.10.0"
  }
},
"nbformat": 4,
"nbformat_minor": 5

⚠️ Potential issue | 🟡 Minor

Several cells are missing the id field required by nbformat 4.5.

nbformat_minor: 5 declares the notebook as nbformat 4.5, which mandates that every cell carry a unique id string. Cells at the Step 1 markdown, Step 1 install code, Step 2 markdown, Step 2 login code, and the Step 7 export code block are all missing id fields. nbformat.validate() will fail with a ValidationError for each missing cell ID, which will break papermill, nbmake, and similar CI-validation tools.

Add a unique id to each affected cell, e.g.:

{
  "cell_type": "markdown",
  "id": "a1b2c3d4",
  ...
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb`
around lines 315 - 327, The notebook declares nbformat_minor: 5 so each cell
must include a unique "id" field; add a short unique string "id" to the missing
cells named in the review (Step 1 markdown, Step 1 install code, Step 2
markdown, Step 2 login code, and Step 7 export code block) so their JSON objects
include an "id": "<unique_id>" property; ensure each id is unique within the
notebook (e.g., "step1-md-1", "step1-install-1", "step2-md-1", "step2-login-1",
"step7-export-1") to satisfy nbformat 4.5 validation and avoid papermill/nbmake
failures.
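A minimal way to backfill the missing cell IDs is to treat the notebook as plain JSON, so no nbformat dependency is needed. This is a one-off sketch, not part of the PR; the commented usage lines assume the notebook's filename.

```python
import json  # used in the example usage below
import uuid

def add_missing_cell_ids(nb: dict) -> dict:
    """Give every cell a unique "id" field, as nbformat 4.5 requires."""
    for cell in nb["cells"]:
        cell.setdefault("id", uuid.uuid4().hex[:8])
    return nb

# Example one-off usage against the notebook file on disk:
# with open("train_eagle_head_cosmos_reason2.ipynb") as f:
#     nb = json.load(f)
# with open("train_eagle_head_cosmos_reason2.ipynb", "w") as f:
#     json.dump(add_missing_cell_ids(nb), f, indent=1)
```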

Comment on lines +87 to +98
parser.add_argument(
    "--mapping-file",
    type=Path,
    default=None,
    help=(
        "Path to a CSV file containing one 0-based dataset row index per line, "
        "e.g.:\n 429801\n 271023\n 432477\n"
        "Rows are loaded in the order they appear in the file. "
        "When provided, the dataset is downloaded (not streamed) to allow random access "
        "and --max-samples is ignored."
    ),
)

⚠️ Potential issue | 🟡 Minor

Mapping file parsed as bare integers — will crash with ValueError if the CSV has a header row.

The --mapping-file argument is documented as "a CSV file" and references guides/nemotron_mapping.csv, a .csv-extension file. The parsing at lines 184–186 does int(line.strip()) unconditionally; any non-numeric line (e.g., a column header like "index") raises ValueError: invalid literal for int().

🛡️ Proposed fix — skip non-numeric lines and document the expected format precisely
         with args.mapping_file.open("r", encoding="utf-8") as f:
             ordered_source_indices: list[int] = [
-                int(line.strip()) for line in f if line.strip()
+                int(line.strip()) for line in f if line.strip().isdigit()
             ]

Alternatively, update the help text to explicitly state that the file must contain no header — one 0-based integer per line — and rename it with a .txt extension to eliminate ambiguity.

Also applies to: 183-186

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/speculative_decoding/prepare_input_conversations/add_nemotron_chat.py`
around lines 87 - 98, The mapping-file reader currently calls int(line.strip())
for each CSV row which will raise ValueError on headers or non-numeric lines;
update the logic that processes the --mapping-file input (the
parser.add_argument('--mapping-file', ...) and the code block that converts file
lines to ints) to robustly skip blank lines and non-numeric rows — either by
wrapping int(...) in a try/except ValueError and ignoring/logging invalid lines
or by filtering with a numeric check (e.g., str.strip().lstrip('-').isdigit()),
and also update the --mapping-file help text to explicitly state the expected
format (one 0-based integer per line, no header) so callers know the
requirement.
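A header-tolerant variant of the mapping-file reader could look like the sketch below. This is illustrative only; the real flag handling lives in add_nemotron_chat.py.

```python
from pathlib import Path

def read_mapping(mapping_file: Path) -> list[int]:
    """Read one 0-based row index per line, skipping blanks and non-numeric rows."""
    indices: list[int] = []
    for line in mapping_file.read_text(encoding="utf-8").splitlines():
        token = line.strip()
        if token.lstrip("-").isdigit():  # tolerates a header row or stray text
            indices.append(int(token))
    return indices
```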
