Investigate torch.Tensor.__setitem__ as unsupported in thunderfx #2790

@mattteochen

Description

Adding this operator (torch.Tensor.__setitem__) to

UNSUPPORTED_THUNDER_FUNCTION = ()
gives a 10x decode throughput improvement on openai/gpt-oss-20b and Qwen/Qwen3-32B in SGLang's bench_one_batch.py.
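
For readers unfamiliar with the mechanism, here is a toy sketch of how a membership check against a tuple of unsupported operators could route individual ops to a fallback path. It is not thunderfx's actual splitting logic, which this issue does not show. The op names are plain strings standing in for real callables, and `split_by_support` and the "thunder"/"fallback" labels are hypothetical names chosen for illustration.

```python
from itertools import groupby

# Hypothetical stand-in for the tuple named in the issue. Real entries would be
# callables such as torch.Tensor.__setitem__, not strings.
UNSUPPORTED_THUNDER_FUNCTION = ("Tensor.__setitem__",)


def split_by_support(ops, unsupported=UNSUPPORTED_THUNDER_FUNCTION):
    """Group consecutive ops into (backend, [ops]) regions.

    An op listed in `unsupported` is assigned to the "fallback" region;
    every other op is assigned to the "thunder" region.
    """
    def backend(op):
        return "fallback" if op in unsupported else "thunder"

    return [(name, list(group)) for name, group in groupby(ops, key=backend)]


# Toy op sequence (assumed, not taken from the real models in the issue).
ops = ["linear", "Tensor.__setitem__", "softmax", "matmul"]
regions = split_by_support(ops)
for name, group in regions:
    print(name, group)
```

With an empty tuple every op lands in a single "thunder" region. Listing `__setitem__` splits the sequence into three regions around it, which is the kind of behavior change the issue asks to investigate.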

Repro (this only works on the NV internal thunder-sglang-integration codebase):

SGLANG_USE_THUNDER_GRAPH_RUNNER=1 python3 -m sglang.bench_one_batch \
  --model-path openai/gpt-oss-20b --trust-remote-code \
  --model-impl transformers --dtype bfloat16 \
  --json-model-override-args '{"quantization_config": null}' --cuda-graph-bs 1 --tp-size 4 --tp-strategy dtensor --load-format-dummy
