Add DG_MINIMIZE_NUM_SMS env var to control SM minimization #273

yurekami · 2026-01-01T07:40:34Z

Summary

This PR adds an environment variable DG_MINIMIZE_NUM_SMS to control whether DeepGEMM should minimize the number of SMs used for kernel execution, addressing #239.

Background

The current SM minimization logic in get_best_config() recomputes the minimal number of SMs required:

if (ArchSpec::should_minimize_num_sms()) {
    num_min_sms = ceil_div(ceil_div(m, best_block_m) * ceil_div(n, best_block_n) * num_groups, best_num_waves);
    num_min_sms = align(num_min_sms, best_multicast_config.num_multicast);
}

Because the result depends on the problem shape (m, n, num_groups), different shapes can produce different SM counts. Each new configuration can invoke kernel compilation again, causing unstable time-to-first-token (TTFT) during inference.
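To make the shape dependence concrete, here is a minimal standalone sketch of the same arithmetic. The `ceil_div` and `align` definitions and the `min_sms` wrapper are assumptions written for illustration, not DeepGEMM's actual helpers. The block sizes, wave count, and multicast width are fixed by hand here, whereas the real get_best_config() selects them per shape.

```cpp
// Hypothetical stand-ins for the helpers referenced in the snippet above.
// ceil_div rounds the quotient up; align rounds `a` up to a multiple of `b`.
constexpr int ceil_div(int a, int b) { return (a + b - 1) / b; }
constexpr int align(int a, int b) { return ceil_div(a, b) * b; }

// Illustrative wrapper (not part of DeepGEMM) that mirrors the two
// assignments inside the `if` block: count output tiles, spread them over
// the chosen number of waves, then round up to the multicast width.
constexpr int min_sms(int m, int n, int num_groups,
                      int block_m, int block_n,
                      int num_waves, int num_multicast) {
    const int num_blocks = ceil_div(m, block_m) * ceil_div(n, block_n) * num_groups;
    return align(ceil_div(num_blocks, num_waves), num_multicast);
}

// With everything but m held fixed, doubling m doubles the SM count,
// so each of these shapes would map to a different configuration.
static_assert(min_sms(128, 4096, 1, 128, 128, 2, 2) == 16, "m = 128");
static_assert(min_sms(256, 4096, 1, 128, 128, 2, 2) == 32, "m = 256");
```

In an inference workload where m tracks the number of tokens in a batch, this is the kind of variation that could lead to a new configuration for each new shape.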

Changes

Modified SM90ArchSpec::should_minimize_num_sms() to check env var
Modified SM100ArchSpec::should_minimize_num_sms() to check env var
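The diff itself is not reproduced here, so the following is only a sketch of how such a check could look. `get_env_int` and `ExampleArchSpec` are hypothetical names invented for this illustration; the actual change may use a different helper and may parse the variable differently. The sketch assumes an unset or unparseable value falls back to 1, which matches the stated default of keeping SM minimization enabled.

```cpp
#include <cstdlib>

// Hypothetical helper: read an integer environment variable, falling back to
// `default_value` when it is unset, empty, or does not start with a number.
inline int get_env_int(const char* name, int default_value) {
    const char* raw = std::getenv(name);
    if (raw == nullptr || *raw == '\0')
        return default_value;
    char* end = nullptr;
    const long value = std::strtol(raw, &end, 10);
    return (end == raw) ? default_value : static_cast<int>(value);
}

// Hypothetical stand-in for SM90ArchSpec / SM100ArchSpec.
struct ExampleArchSpec {
    // Assumed semantics: minimization stays on unless the variable is set to 0.
    static bool should_minimize_num_sms() {
        return get_env_int("DG_MINIMIZE_NUM_SMS", 1) != 0;
    }
};
```

Reading the variable on every call keeps the sketch simple. An implementation that caches the value would need the variable to be set before the first call.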

Usage

# Enable SM minimization (default, current behavior)
export DG_MINIMIZE_NUM_SMS=1

# Disable SM minimization for stable TTFT
export DG_MINIMIZE_NUM_SMS=0

Benefits

DG_MINIMIZE_NUM_SMS=1 (default): Better L2 cache usage, reduced GPU frequency drops
DG_MINIMIZE_NUM_SMS=0: More predictable compilation, stable TTFT

Test Plan

Verify default behavior is unchanged (SM minimization enabled)
Verify DG_MINIMIZE_NUM_SMS=0 disables SM minimization
Compare TTFT stability with DG_MINIMIZE_NUM_SMS=0 against the default

Fixes #239

🤖 Generated with Claude Code

This adds an environment variable to control whether DeepGEMM should minimize the number of SMs used for kernel execution.

Background: The SM minimization logic recomputes the minimal number of SMs required for each GEMM configuration, which can cause multiple kernel compilations and result in unstable time-to-first-token (TTFT) during inference.

Usage:
- DG_MINIMIZE_NUM_SMS=1 (default): Enable SM minimization for better L2 cache usage and reduced GPU frequency drops
- DG_MINIMIZE_NUM_SMS=0: Disable SM minimization for more predictable compilation behavior and stable TTFT

The change is backward compatible - the default behavior remains unchanged.

Fixes deepseek-ai#239

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
