📝 Walkthrough

Configuration values updated in attention sparsity settings: the target sparse ratio is reduced from 0.9 to 0.5 for both prefill and decode modes, and the maximum sequence length is reduced from 65536 to 16384.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks: ✅ 3 passed
🧹 Nitpick comments (1)
modelopt/torch/sparsity/attention_sparsity/config.py (1)
Line 404: `max_seqlen` in `SKIP_SOFTMAX_CALIB` now diverges from `CalibrationConfig`'s default.

`SKIP_SOFTMAX_CALIB` sets `max_seqlen = 16384`, but `CalibrationConfig.max_seqlen` still defaults to `32768` (line 187). A user constructing `CalibrationConfig()` directly will get a different effective ceiling than a user relying on `SKIP_SOFTMAX_CALIB`. Consider aligning the two, or add a comment to `SKIP_SOFTMAX_CALIB` noting the deliberate divergence.

Additionally, for models commonly used with sequences > 16384 tokens (e.g., 32K/128K-context variants), the exponential threshold model will be extrapolating beyond its calibrated range, which may degrade calibration quality at those lengths.
💡 Aligning `CalibrationConfig.max_seqlen` default with `SKIP_SOFTMAX_CALIB`
```diff
 max_seqlen: int = ModeloptField(
-    default=32768,
+    default=16384,
     title="Maximum sequence length",
     description="Maximum sequence length for calibration (length bins auto-generated as powers of 2).",
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/sparsity/attention_sparsity/config.py` at line 404, SKIP_SOFTMAX_CALIB sets "max_seqlen = 16384" but CalibrationConfig.max_seqlen defaults to 32768, causing inconsistent ceilings; update them to match or document the intentional divergence. Fix by either (A) changing CalibrationConfig.max_seqlen default to 16384 to align with SKIP_SOFTMAX_CALIB, or (B) updating the SKIP_SOFTMAX_CALIB entry to set max_seqlen = CalibrationConfig.max_seqlen (or add a comment on why 16384 is intentionally lower), and add a comment on the extrapolation risk for contexts >16384 so callers know calibration may degrade; reference the identifiers SKIP_SOFTMAX_CALIB and CalibrationConfig.max_seqlen when applying the change.
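A minimal sketch of option (B) above — deriving the preset value from the config default so the two can never drift apart. The class and dict here are simplified, hypothetical stand-ins for the real `CalibrationConfig` and `SKIP_SOFTMAX_CALIB` (the actual code uses `ModeloptField`, which this sketch does not reproduce):

```python
# Hypothetical, simplified stand-ins for the real ModelOpt definitions,
# illustrating how to keep SKIP_SOFTMAX_CALIB and CalibrationConfig in sync.
from dataclasses import dataclass, fields


@dataclass
class CalibrationConfig:
    # Default ceiling for calibration; length bins are auto-generated
    # as powers of 2 up to this value.
    max_seqlen: int = 16384  # aligned with SKIP_SOFTMAX_CALIB (was 32768)


def _default_max_seqlen() -> int:
    """Read the dataclass default so the preset references a single source of truth."""
    (field,) = [f for f in fields(CalibrationConfig) if f.name == "max_seqlen"]
    return field.default


# Option (B): the preset derives its ceiling from the config default
# instead of hard-coding 16384 in two places.
SKIP_SOFTMAX_CALIB = {
    "target_sparse_ratio": {"prefill": 0.5, "decode": 0.5},
    "max_seqlen": _default_max_seqlen(),
}

# The two ceilings now agree by construction.
assert SKIP_SOFTMAX_CALIB["max_seqlen"] == CalibrationConfig().max_seqlen
```

With this shape, changing the `CalibrationConfig` default automatically updates the preset, removing the class of divergence flagged in the review.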
Summary by CodeRabbit

- Set default skip softmax attention sparsity settings (in `SKIP_SOFTMAX_CALIB`) to correct values: `max_seqlen` 16384 and prefill/decode target sparsity ratios of 0.5.