Add DG_MINIMIZE_NUM_SMS env var to control SM minimization #273
Summary
This PR adds an environment variable, DG_MINIMIZE_NUM_SMS, that controls whether DeepGEMM minimizes the number of SMs used for kernel execution, addressing #239.
Background
The current SM minimization logic in get_best_config() recomputes the minimal number of SMs required. This can trigger kernel compilation multiple times with different configurations, causing unstable time-to-first-token (TTFT) during inference.
Changes
- Update SM90ArchSpec::should_minimize_num_sms() to check the env var
- Update SM100ArchSpec::should_minimize_num_sms() to check the env var
Usage
Benefits
Test Plan
- DG_MINIMIZE_NUM_SMS=0 disables SM minimization
Fixes #239
🤖 Generated with Claude Code