@tpoisonooo (Contributor) commented Dec 31, 2025

Issue Summary

Example 13_two_tensor_op_fusion uses a hardcoded SM architecture, which causes confusion and extra work for newcomers.

Background

I'm using an H800 GPU (SM90) and initially misunderstood SM compatibility (perhaps due to TensorRT 1.0 and cuDNN 7).

To run the 13_two_tensor_op_fusion example, I spent hours reading the template source code, writing a new version, and fixing compile errors for SM90 (like this code).

Proposal

Modify testRun to use the actual SM architecture of the current GPU rather than a hardcoded value.
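As a rough illustration of the idea, the sketch below shows the arithmetic a runtime check could use. It is not the code in this PR: `sm_number`, `meets_minimum`, and `report` are hypothetical names, and the "at least as new" rule is an assumption that only holds when the example is compiled for the device's own architecture. In a real build the `major` and `minor` values would come from `cudaGetDeviceProperties` (the `major` and `minor` fields of `cudaDeviceProp`); they are passed as plain integers here so the sketch does not need CUDA headers.

```cpp
#include <iostream>

// Hypothetical helpers for illustration; none of these are part of CUTLASS.

// Combine a compute capability (major, minor) into the two-digit SM number
// used in names such as SM80 or SM90.
constexpr int sm_number(int major, int minor) {
  return major * 10 + minor;
}

// Assumption: an example compiled for the device's own architecture can run
// whenever the device is at least as new as the minimum SM its kernels need.
constexpr bool meets_minimum(int device_sm, int required_sm) {
  return device_sm >= required_sm;
}

// Print the decision a runtime check would make for one device.
inline void report(int major, int minor, int required_sm) {
  const int sm = sm_number(major, minor);
  std::cout << "device SM" << sm << ", requires SM" << required_sm << ": "
            << (meets_minimum(sm, required_sm) ? "run" : "skip") << "\n";
}
```

For the H800 mentioned above (SM90), the inputs would be `major = 9` and `minor = 0`, so `sm_number(9, 0)` yields 90 and a check against a minimum of 80 passes instead of being rejected by an exact-match comparison.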

Benefits

  • Improves onboarding experience by eliminating manual code modification for new users
  • Avoids confusion about SM compatibility requirements

@tpoisonooo (Contributor, Author)

For now, the fix applies only to examples/13_two_tensor_op_fusion.
I tested it with:

cd build/examples/13_two_tensor_op_fusion

for f in 13_fused_*; do [ -f "$f" ] && ./"$f" >> run.out; done

# manually check the output file

@hwu36

@hwu36 (Collaborator) commented Jan 7, 2026

@jwang323
