example demonstrating how to train CosmosReason2 Eagle3 #920

skierat wants to merge 3 commits into NVIDIA:main from
Conversation
📝 Walkthrough

Introduces a complete training workflow for EAGLE3 speculative decoding, including a configuration file, a Jupyter notebook documenting the training process, and a data preparation utility for ingesting Nemotron chat conversations into a standardized format.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant Notebook as Training Notebook
    participant DataPrep as add_nemotron_chat.py
    participant HF as Hugging Face Hub
    participant Training as launch_train.sh
    participant Model as Eagle Model
    User->>Notebook: Execute training workflow
    Notebook->>DataPrep: Prepare input conversations
    DataPrep->>HF: Resolve Nemotron dataset URLs
    HF-->>DataPrep: Return parquet shard URLs
    DataPrep->>DataPrep: Parse conversations (mapping/streaming mode)
    DataPrep-->>Notebook: Output standardized conversations
    Notebook->>Notebook: Inspect CR2_eagle_config.json
    Notebook->>Notebook: Optional: calibrate draft vocabulary
    Notebook->>Training: Launch training with accelerate/FSDP
    Training->>Model: Train Eagle head on Cosmos-Reason2-8B
    Model-->>Training: Trained checkpoint
    Training-->>Notebook: Export to HF-compatible format
    Notebook-->>User: Training complete, ready for deployment
```
Actionable comments posted: 2
🧹 Nitpick comments (3)
examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb (2)
90-96: `os.chdir("..")` is fragile and depends on the Jupyter kernel's working directory.

This assumes the notebook is opened/run with cwd = guides/. If a user opens Jupyter from the speculative_decoding/ root (the common default), `os.chdir("..")` navigates one level above the intended directory, breaking every subsequent relative path (prepare_input_conversations/add_nemotron_chat.py, guides/CR2_eagle_config.json, etc.). Consider anchoring relative to the notebook file itself:
♻️ Proposed fix

```diff
-import os
-
-# Run from the speculative_decoding root so that relative paths resolve correctly.
-os.chdir("..")
-print("Working directory:", os.getcwd())
+import os
+from pathlib import Path
+
+# Navigate to the speculative_decoding root regardless of where Jupyter was launched.
+os.chdir(Path("__file__").parent.parent if "__file__" in dir() else Path.cwd().parent)
+print("Working directory:", os.getcwd())
```

Or more robustly, use an absolute path derived from `globals()["_dh"][0]` (the notebook's directory as set by Jupyter):

```python
import os
from pathlib import Path

notebook_dir = Path(globals().get("_dh", [Path.cwd()])[0])
os.chdir(notebook_dir.parent)
print("Working directory:", os.getcwd())
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb` around lines 90 - 96, The current os.chdir("..") is fragile; replace it by deriving the notebook directory from Jupyter’s runtime state and changing to its parent: obtain notebook_dir = Path(globals().get("_dh", [Path.cwd()])[0]) (or equivalent using globals()["_dh"][0] with a safe fallback), then call os.chdir(notebook_dir.parent) and keep the existing os.getcwd() print; update imports to include pathlib.Path if not present and remove the hardcoded os.chdir("..") call (refer to os.chdir, os.getcwd, globals(), and _dh in the notebook).
162-166: `EAGLE_CONFIG` is defined twice with the same value.

Lines 163–165 (Step 4) and line 236 (Step 6 setup cell) both assign `EAGLE_CONFIG = "guides/CR2_eagle_config.json"`. The second definition is harmless but creates an implicit assumption that users must run Step 4 first, and it makes the notebook slightly harder to follow. The Step 4 definition is sufficient; the Step 6 cell comment makes it clear the value is passed to launch_train.sh.

Also applies to: 235-237
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb` around lines 162 - 166, Remove the duplicate assignment of EAGLE_CONFIG by keeping the first definition (EAGLE_CONFIG = "guides/CR2_eagle_config.json") and deleting the later, redundant re-assignment in the subsequent cell; ensure any comments in that later cell reference the existing variable rather than redefining it so users can run cells independently without relying on a prior re-definition of EAGLE_CONFIG.

examples/speculative_decoding/prepare_input_conversations/add_nemotron_chat.py (1)
166: `async def main` contains no `await`; drop the `async` keyword.

Every I/O operation inside (`open()`, `load_dataset()`, `tqdm`) is synchronous. The `asyncio.run()` trampoline adds overhead and implies async behavior that doesn't exist.

♻️ Proposed fix

```diff
-async def main(args: argparse.Namespace) -> None:
+def main(args: argparse.Namespace) -> None:

-    asyncio.run(main(args))
+    main(args)
```

And remove the `import asyncio`, since it would no longer be needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/speculative_decoding/prepare_input_conversations/add_nemotron_chat.py` at line 166, The function main is declared async but contains no await and should be a regular synchronous function: change "async def main(...)" to "def main(...)", remove the unused "import asyncio" and replace any invocation of "asyncio.run(main(...))" with a direct call to "main(...)" (references: main, asyncio.run, import asyncio, and the synchronous calls open(), load_dataset(), tqdm).
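As a minimal sketch, the synchronous entry-point shape the fix converges on looks like this (the `--mapping-file` flag is borrowed from the script for illustration; the body and everything else here are hypothetical, not code from the PR):

```python
import argparse
from pathlib import Path


def main(args: argparse.Namespace) -> None:
    # All of the real script's work (open(), load_dataset(), tqdm) is
    # blocking I/O, so a plain synchronous function is the right shape:
    # no "async def", no asyncio.run() trampoline.
    target = args.mapping_file if args.mapping_file else "streaming mode"
    print(f"Processing with: {target}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--mapping-file", type=Path, default=None)
    main(parser.parse_args())
```

Calling `main()` directly also makes the function trivially testable, since there is no event loop to stand up.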
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
examples/speculative_decoding/guides/nemotron_mapping.csv is excluded by !**/*.csv
📒 Files selected for processing (3)
examples/speculative_decoding/guides/CR2_eagle_config.json
examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb
examples/speculative_decoding/prepare_input_conversations/add_nemotron_chat.py
| "metadata": { | ||
| "kernelspec": { | ||
| "display_name": "Python 3", | ||
| "language": "python", | ||
| "name": "python3" | ||
| }, | ||
| "language_info": { | ||
| "name": "python", | ||
| "version": "3.10.0" | ||
| } | ||
| }, | ||
| "nbformat": 4, | ||
| "nbformat_minor": 5 |
Several cells are missing the id field required by nbformat 4.5.
nbformat_minor: 5 declares the notebook as nbformat 4.5, which mandates that every cell carry a unique id string. Cells at the Step 1 markdown, Step 1 install code, Step 2 markdown, Step 2 login code, and the Step 7 export code block are all missing id fields. nbformat.validate() will fail with a ValidationError for each missing cell ID, which will break papermill, nbmake, and similar CI-validation tools.
Add a unique id to each affected cell, e.g.:
```json
{
  "cell_type": "markdown",
  "id": "a1b2c3d4",
  ...
}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb`
around lines 315 - 327, The notebook declares nbformat_minor: 5 so each cell
must include a unique "id" field; add a short unique string "id" to the missing
cells named in the review (Step 1 markdown, Step 1 install code, Step 2
markdown, Step 2 login code, and Step 7 export code block) so their JSON objects
include an "id": "<unique_id>" property; ensure each id is unique within the
notebook (e.g., "step1-md-1", "step1-install-1", "step2-md-1", "step2-login-1",
"step7-export-1") to satisfy nbformat 4.5 validation and avoid papermill/nbmake
failures.
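Since a `.ipynb` file is plain JSON, the missing ids can also be backfilled mechanically with the standard library alone; the `backfill_cell_ids` helper below is a sketch of that idea, not code from the PR:

```python
import json
import uuid


def backfill_cell_ids(nb: dict) -> dict:
    """Give every cell a unique 'id', as required by nbformat 4.5."""
    seen: set[str] = set()
    for cell in nb.get("cells", []):
        # Assign a fresh id when one is missing, empty, or duplicated.
        if not cell.get("id") or cell["id"] in seen:
            cell["id"] = uuid.uuid4().hex[:8]
        seen.add(cell["id"])
    return nb


if __name__ == "__main__":
    path = "examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb"
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(backfill_cell_ids(nb), f, indent=1)
```

Running `nbformat.validate()` (or `jupyter nbconvert --to notebook`) afterwards confirms the file now passes 4.5 validation.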
```python
parser.add_argument(
    "--mapping-file",
    type=Path,
    default=None,
    help=(
        "Path to a CSV file containing one 0-based dataset row index per line, "
        "e.g.:\n  429801\n  271023\n  432477\n"
        "Rows are loaded in the order they appear in the file. "
        "When provided, the dataset is downloaded (not streamed) to allow random access "
        "and --max-samples is ignored."
    ),
)
```
Mapping file parsed as bare integers — will crash with ValueError if the CSV has a header row.
The --mapping-file argument is documented as "a CSV file" and references guides/nemotron_mapping.csv, a .csv-extension file. The parsing at lines 184–186 does int(line.strip()) unconditionally; any non-numeric line (e.g., a column header like "index") raises ValueError: invalid literal for int().
🛡️ Proposed fix: skip non-numeric lines and document the expected format precisely

```diff
 with args.mapping_file.open("r", encoding="utf-8") as f:
     ordered_source_indices: list[int] = [
-        int(line.strip()) for line in f if line.strip()
+        int(line.strip()) for line in f if line.strip().isdigit()
     ]
```

Alternatively, update the help text to explicitly state that the file must contain no header (one 0-based integer per line), and rename it with a .txt extension to eliminate ambiguity.
Also applies to: 183-186
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@examples/speculative_decoding/prepare_input_conversations/add_nemotron_chat.py`
around lines 87 - 98, The mapping-file reader currently calls int(line.strip())
for each CSV row which will raise ValueError on headers or non-numeric lines;
update the logic that processes the --mapping-file input (the
parser.add_argument('--mapping-file', ...) and the code block that converts file
lines to ints) to robustly skip blank lines and non-numeric rows — either by
wrapping int(...) in a try/except ValueError and ignoring/logging invalid lines
or by filtering with a numeric check (e.g., str.strip().lstrip('-').isdigit()),
and also update the --mapping-file help text to explicitly state the expected
format (one 0-based integer per line, no header) so callers know the
requirement.
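The try/except variant suggested in the prompt can be sketched as follows; `load_row_indices` is a hypothetical helper name, not code from the PR:

```python
from pathlib import Path


def load_row_indices(mapping_file: Path) -> list[int]:
    """Read one 0-based dataset row index per line, preserving file order.

    Blank lines and non-numeric rows (e.g. a CSV header such as "index")
    are skipped instead of raising ValueError.
    """
    indices: list[int] = []
    with mapping_file.open("r", encoding="utf-8") as f:
        for line in f:
            token = line.strip()
            if not token:
                continue
            try:
                indices.append(int(token))
            except ValueError:
                # Header row or other junk; ignore it rather than crash.
                continue
    return indices
```

Unlike the `isdigit()` filter, this variant also accepts negative integers, since `int()` itself decides what parses.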
What does this PR do?
Type of change: New example
Overview:
Adds examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb, a step-by-step Jupyter notebook that walks through the full EAGLE3 draft-head training workflow for nvidia/Cosmos-Reason2-8B. The notebook covers:
Also includes guides/nemotron_mapping.csv and guides/CR2_eagle_config.json as companion files.
Usage
Open and run examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb cell by cell. After training, serve the exported checkpoint with:
Testing
Tested end-to-end on 4x B100 GPUs. The exported checkpoint was validated with specdec_bench.
Before your PR is "Ready for review"
Additional Information
Cosmos-Reason2-8B requires at least one 80 GB GPU (H100/A100).