
Fix skip softmax defaults #923

Open

rohansjoshi wants to merge 1 commit into main from rohjoshi/sparse

Conversation


@rohansjoshi rohansjoshi commented Feb 24, 2026

Set the default skip softmax attention sparsity settings (in SKIP_SOFTMAX_CALIB) to the correct values: a max_seqlen of 16384 and prefill/decode target sparsity ratios of 0.5.

Summary by CodeRabbit

  • Bug Fixes
    • Updated attention sparsity calibration parameters to reduce target sparse ratio from 0.9 to 0.5 and maximum sequence length from 65536 to 16384.
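As a rough illustration of the change, the new defaults can be sketched as a plain dictionary. Note this is only a sketch: the real SKIP_SOFTMAX_CALIB lives in modelopt/torch/sparsity/attention_sparsity/config.py and its exact key names and nesting may differ.

```python
# Illustrative sketch of the values this PR sets; key names are hypothetical
# and may not match the actual modelopt schema.
skip_softmax_calib_defaults = {
    "target_sparse_ratio": {
        "prefill": 0.5,  # previously 0.9
        "decode": 0.5,   # previously 0.9
    },
    "max_seqlen": 16384,  # previously 65536
}
```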

@rohansjoshi rohansjoshi requested a review from a team as a code owner February 24, 2026 00:51
@rohansjoshi rohansjoshi requested a review from RalphMao February 24, 2026 00:51
copy-pr-bot bot commented Feb 24, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Feb 24, 2026

📝 Walkthrough

Configuration values updated in attention sparsity settings. Target sparse ratio reduced from 0.9 to 0.5 for both prefill and decode modes, and maximum sequence length reduced from 65536 to 16384.

Changes

  • Configuration Updates — modelopt/torch/sparsity/attention_sparsity/config.py: Updated SKIP_SOFTMAX_CALIB sparse_cfg calibration parameters: reduced target_sparse_ratio values from 0.9 to 0.5 for prefill and decode, and decreased max_seqlen from 65536 to 16384.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks: ✅ 3 passed
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title 'Fix skip softmax defaults' directly and concisely describes the main change in the pull request, updating default values for skip softmax attention sparsity settings.
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage; check skipped.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
modelopt/torch/sparsity/attention_sparsity/config.py (1)

404-404: max_seqlen in SKIP_SOFTMAX_CALIB now diverges from CalibrationConfig's default.

SKIP_SOFTMAX_CALIB sets max_seqlen = 16384, but CalibrationConfig.max_seqlen still defaults to 32768 (line 187). A user constructing CalibrationConfig() directly will get a different effective ceiling than a user relying on SKIP_SOFTMAX_CALIB. Consider aligning the two, or add a comment to SKIP_SOFTMAX_CALIB noting the deliberate divergence.

Additionally, for models commonly used with sequences > 16384 tokens (e.g., 32K/128K-context variants), the exponential threshold model will be extrapolating beyond its calibrated range, which may degrade calibration quality at those lengths.
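The divergence the reviewer describes can be sketched with a minimal stand-in for the config class. This is only an illustration: the real CalibrationConfig in modelopt/torch/sparsity/attention_sparsity/config.py is a pydantic model built with ModeloptField, and only the max_seqlen field is modeled here.

```python
from dataclasses import dataclass

# Hypothetical minimal stand-in for CalibrationConfig; only max_seqlen is
# modeled, with the class-level default cited by the review.
@dataclass
class CalibrationConfig:
    max_seqlen: int = 32768

# Constructing the config directly uses the class default...
direct = CalibrationConfig()

# ...while the SKIP_SOFTMAX_CALIB preset overrides it to 16384 after this PR,
# so the two paths yield different effective ceilings.
from_preset = CalibrationConfig(max_seqlen=16384)

print(direct.max_seqlen, from_preset.max_seqlen)
```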

💡 Aligning `CalibrationConfig.max_seqlen` default with `SKIP_SOFTMAX_CALIB`:

```diff
     max_seqlen: int = ModeloptField(
-        default=32768,
+        default=16384,
         title="Maximum sequence length",
         description="Maximum sequence length for calibration (length bins auto-generated as powers of 2).",
     )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/sparsity/attention_sparsity/config.py` at line 404,
SKIP_SOFTMAX_CALIB sets "max_seqlen = 16384" but CalibrationConfig.max_seqlen
defaults to 32768, causing inconsistent ceilings; update them to match or
document the intentional divergence. Fix by either (A) changing
CalibrationConfig.max_seqlen default to 16384 to align with SKIP_SOFTMAX_CALIB,
or (B) updating the SKIP_SOFTMAX_CALIB entry to set max_seqlen =
CalibrationConfig.max_seqlen (or add a comment on why 16384 is intentionally
lower), and add a comment on the extrapolation risk for contexts >16384 so
callers know calibration may degrade; reference the identifiers
SKIP_SOFTMAX_CALIB and CalibrationConfig.max_seqlen when applying the change.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 52e662d and d0c0f51.

📒 Files selected for processing (1)
  • modelopt/torch/sparsity/attention_sparsity/config.py

