Description
Describe the bug
I got a "torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow" error
when I ran the examples linked below.
https://github.com/vllm-project/llm-compressor/blob/main/examples/quantizing_moe/deepseek_r1_example.py
https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_w4a16/llama3_example.py
Has anyone experienced similar issues?
Expected behavior
The example scripts should run to completion without raising a TraceError.
Environment
Include all relevant environment information:
- OS [e.g. Ubuntu 20.04]: Ubuntu 22.04.4 LTS
- Python version [e.g. 3.7]: 3.10.14
- LLM Compressor version or commit hash [e.g. 0.1.0, f7245c8]: recent main branch code, after pip install of 0.6.0
- ML framework version(s) [e.g. torch 2.3.1]: torch 2.7.0
- Other Python package versions [e.g. vLLM, compressed-tensors, numpy, ONNX]:
- Other relevant environment information [e.g. hardware, CUDA version]:
pip list
absl-py 2.1.0
accelerate 1.5.2
aiohappyeyeballs 2.6.1
aiohttp 3.11.14
aiohttp-cors 0.8.0
aiosignal 1.3.2
airportsdata 20250224
annotated-types 0.7.0
anyio 4.9.0
astor 0.8.1
attrs 25.3.0
azure-core 1.30.2
azure-identity 1.15.0
bitsandbytes 0.45.3
blake3 1.0.4
blinker 1.4
boto3 1.34.7
botocore 1.34.144
cachetools 5.5.2
certifi 2025.1.31
cffi 1.16.0
charset-normalizer 3.4.1
click 8.1.8
cloudpickle 3.1.1
cmake 3.31.6
colorful 0.5.6
compressed-tensors 0.10.2
cryptography 3.4.8
datasets 3.5.0
dbus-python 1.2.18
depyf 0.18.0
dill 0.3.8
diskcache 5.6.3
distlib 0.3.9
distro 1.7.0
distro-info 1.1+ubuntu0.2
dkmsclient-py 1.8.1
docstring_parser 0.16
durationpy 0.9
einops 0.8.1
fastapi 0.115.11
filelock 3.18.0
flashinfer-python 0.2.0.post1
frozenlist 1.5.0
fsspec 2024.12.0
gguf 0.10.0
google-api-core 2.24.2
google-auth 2.38.0
google-cloud-core 2.4.3
google-cloud-storage 2.19.0
google-crc32c 1.7.1
google-resumable-media 2.7.2
googleapis-common-protos 1.69.2
grpcio 1.71.0
h11 0.14.0
hf_transfer 0.1.9
hf-xet 1.1.5
httpcore 1.0.7
httplib2 0.20.2
httptools 0.6.4
httpx 0.28.1
huggingface-hub 0.33.1
humanize 4.12.1
idna 3.10
importlib-metadata 4.6.4
interegular 0.3.3
jeepney 0.7.1
Jinja2 3.1.6
jiter 0.9.0
jmespath 1.0.1
jsonschema 4.23.0
jsonschema-specifications 2024.10.1
keyring 23.5.0
kfp 2.8.0
kfp-pipeline-spec 0.3.0
kfp-server-api 2.0.3
kubernetes 26.1.0
lark 1.2.2
launchpadlib 1.10.16
lazr.restfulclient 0.14.4
lazr.uri 1.0.6
llmcompressor 0.6.0
lm-format-enforcer 0.10.11
loguru 0.7.3
MarkupSafe 3.0.2
mistral_common 1.5.4
modelscope 1.24.0
more-itertools 8.10.0
mpmath 1.3.0
msal 1.29.0
msal-extensions 1.2.0
msgpack 1.1.0
msgspec 0.19.0
multidict 6.2.0
multiprocess 0.70.16
nest-asyncio 1.6.0
networkx 3.4.2
ninja 1.11.1.3
numpy 1.26.4
nvidia-cublas-cu12 12.6.4.1
nvidia-cuda-cupti-cu12 12.6.80
nvidia-cuda-nvrtc-cu12 12.6.77
nvidia-cuda-runtime-cu12 12.6.77
nvidia-cudnn-cu12 9.5.1.17
nvidia-cufft-cu12 11.3.0.4
nvidia-cufile-cu12 1.11.1.6
nvidia-curand-cu12 10.3.7.77
nvidia-cusolver-cu12 11.7.1.2
nvidia-cusparse-cu12 12.5.4.2
nvidia-cusparselt-cu12 0.6.3
nvidia-ml-py 12.570.86
nvidia-nccl-cu12 2.26.2
nvidia-nvjitlink-cu12 12.6.85
nvidia-nvtx-cu12 12.6.77
oauthlib 3.2.2
openai 1.67.0
opencensus 0.11.4
opencensus-context 0.1.3
opencv-python-headless 4.11.0.86
outlines 0.1.11
outlines_core 0.1.26
packaging 24.2
pandas 2.2.3
partial-json-parser 0.2.1.1.post5
pillow 11.1.0
pip 25.1.1
platformdirs 4.3.7
portalocker 2.10.1
prometheus_client 0.21.1
prometheus-fastapi-instrumentator 7.1.0
propcache 0.3.0
proto-plus 1.26.1
protobuf 4.25.6
psutil 7.0.0
py-cpuinfo 9.0.0
py-spy 0.4.0
pyarrow 19.0.1
pyasn1 0.6.1
pyasn1_modules 0.4.1
pycountry 24.6.1
pycparser 2.22
pydantic 2.10.6
pydantic_core 2.27.2
PyGObject 3.42.1
PyJWT 2.8.0
pynvml 12.0.0
pyparsing 2.4.7
python-apt 2.4.0+ubuntu4
python-dateutil 2.8.2
python-dotenv 1.0.1
pytz 2025.2
PyYAML 6.0.2
pyzmq 26.3.0
ray 2.43.0
referencing 0.36.2
regex 2024.11.6
requests 2.32.3
requests-oauthlib 2.0.0
requests-toolbelt 0.10.1
rpds-py 0.23.1
rsa 4.9
runai-model-streamer 0.11.2
runai-model-streamer-s3 0.11.2
s3transfer 0.10.2
safetensors 0.5.3
SecretStorage 3.3.1
sentencepiece 0.2.0
setuptools 80.9.0
setuptools-scm 8.2.1
six 1.16.0
smart-open 7.1.0
sniffio 1.3.1
starlette 0.46.1
sympy 1.14.0
tabulate 0.9.0
tiktoken 0.9.0
timm 0.9.10
tink 1.10.0
tokenizers 0.21.1
torch 2.7.1
torchaudio 2.7.1
torchvision 0.22.1
tqdm 4.67.1
transformer_engine 2.4.0
transformer_engine_cu12 2.4.0
transformers 4.53.0
triton 3.3.1
typing_extensions 4.12.2
tzdata 2025.2
unattended-upgrades 0.1
urllib3 1.26.19
uvicorn 0.34.0
uvloop 0.21.0
virtualenv 20.29.3
vllm 0.7.3.dev81+g0611dc788.cu124
wadllib 1.3.6
watchfiles 1.0.4
websocket-client 1.8.0
websockets 15.0.1
wheel 0.45.1
wrapt 1.17.2
xformers 0.0.28.post3
xgrammar 0.1.16
xxhash 3.5.0
yarl 1.18.3
zipp 1.0.0
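Only a handful of the packages above actually appear in the traceback. As a convenience, here is a minimal sketch for printing just those versions; the package selection and the helper name `collect_versions` are my own assumptions, not part of LLM Compressor.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed selection: the packages whose files show up in the traceback,
# plus accelerate. Adjust as needed.
PACKAGES = ["llmcompressor", "compressed-tensors", "transformers", "torch", "accelerate"]


def collect_versions(packages):
    """Return {package name: installed version or 'not installed'}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    for name, ver in collect_versions(PACKAGES).items():
        print(f"{name} {ver}")
```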
To Reproduce
Exact steps to reproduce the behavior:
Errors
If applicable, add a full print-out of any errors or exceptions that are raised or include screenshots to help explain your problem.
2025-06-30T04:41:07.286097+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
2025-06-30T04:41:07.286312+0000 | IndependentPipeline | INFO - Inferred SequentialPipeline for GPTQModifier
Traceback (most recent call last):
File "/workspace/yongho/deepseek.py", line 74, in <module>
oneshot(
File "/opt/conda/lib/python3.10/site-packages/compressed_tensors/utils/helpers.py", line 193, in wrapped
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 33, in oneshot
oneshot(**kwargs)
File "/opt/conda/lib/python3.10/site-packages/llmcompressor/entrypoints/oneshot.py", line 309, in oneshot
one_shot()
File "/opt/conda/lib/python3.10/site-packages/llmcompressor/entrypoints/oneshot.py", line 149, in __call__
self.apply_recipe_modifiers(
File "/opt/conda/lib/python3.10/site-packages/llmcompressor/entrypoints/oneshot.py", line 193, in apply_recipe_modifiers
pipeline(self.model, calibration_dataloader, self.dataset_args)
File "/opt/conda/lib/python3.10/site-packages/llmcompressor/pipelines/independent/pipeline.py", line 49, in __call__
pipeline(model, dataloader, dataset_args)
File "/opt/conda/lib/python3.10/site-packages/llmcompressor/pipelines/sequential/pipeline.py", line 66, in __call__
subgraphs = trace_subgraphs(model, sample_input, sequential_targets, ignore)
File "/opt/conda/lib/python3.10/site-packages/llmcompressor/pipelines/sequential/helpers.py", line 126, in trace_subgraphs
tracer.trace(
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/fx.py", line 1315, in trace
self.graph = super().trace(root, concrete_args=concrete_args)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 838, in trace
(self.create_arg(fn(*args)),),
File "DeepseekV3ForCausalLM_8781495187634_autowrapped", line 59, in forward
File "/opt/conda/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 813, in module_call_wrapper
return self.call_module(mod, forward, args, kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/fx.py", line 1179, in call_module
return super().call_module(m, forward, args, kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 531, in call_module
ret_val = forward(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 806, in forward
return _orig_module_call(mod, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "DeepseekV3Model_8781495172093_autowrapped", line 50, in forward
@torch.fx.wrap
File "/opt/conda/lib/python3.10/site-packages/transformers/masking_utils.py", line 719, in create_causal_mask
causal_mask = mask_interface(
File "/opt/conda/lib/python3.10/site-packages/transformers/masking_utils.py", line 466, in eager_mask
mask = sdpa_mask(
File "/opt/conda/lib/python3.10/site-packages/transformers/masking_utils.py", line 329, in sdpa_mask_recent_torch
padding_mask = prepare_padding_mask(attention_mask, kv_length, kv_offset, _slice=False)
File "/opt/conda/lib/python3.10/site-packages/transformers/masking_utils.py", line 172, in prepare_padding_mask
if (padding_length := kv_length + kv_offset - attention_mask.shape[-1]) > 0:
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/fx.py", line 670, in __bool__
return super().__bool__()
File "/opt/conda/lib/python3.10/site-packages/torch/fx/proxy.py", line 555, in __bool__
return self.tracer.to_bool(self)
File "/opt/conda/lib/python3.10/site-packages/torch/fx/proxy.py", line 366, in to_bool
raise TraceError(
torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow
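Reading the traceback, the failure is triggered by the `if` in prepare_padding_mask: during symbolic tracing the operands are proxies with no concrete value, so Python's truth test on the comparison result raises. The sketch below is a toy, pure-Python illustration of that mechanism only. `ToyProxy`, the local `TraceError`, and `needs_padding` are hypothetical stand-ins, not the torch.fx or transformers implementations.

```python
class TraceError(ValueError):
    """Toy stand-in for torch.fx.proxy.TraceError (hypothetical)."""


def _name(value):
    # Render either a symbolic expression or a plain Python value.
    return value.expr if isinstance(value, ToyProxy) else repr(value)


class ToyProxy:
    """Symbolic placeholder: records operations instead of holding a value."""

    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        return ToyProxy(f"({self.expr} + {_name(other)})")

    def __sub__(self, other):
        return ToyProxy(f"({self.expr} - {_name(other)})")

    def __gt__(self, other):
        return ToyProxy(f"({self.expr} > {_name(other)})")

    def __bool__(self):
        # No concrete value exists at trace time, so no branch can be chosen.
        raise TraceError(
            "symbolically traced variables cannot be used as inputs to control flow"
        )


def needs_padding(kv_length, kv_offset, mask_len):
    # Same shape as the condition on line 172 of masking_utils.py in the traceback.
    if (padding_length := kv_length + kv_offset - mask_len) > 0:
        return True
    return False


def try_trace():
    """Call needs_padding with symbolic inputs and return the error message."""
    try:
        needs_padding(ToyProxy("kv_length"), 0, ToyProxy("mask_len"))
    except TraceError as exc:
        return str(exc)
    return None
```

With concrete integers `needs_padding` runs normally; with `ToyProxy` inputs the comparison itself succeeds and only the `if` fails, which matches the last frames of the traceback above.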
Additional context
Add any other context about the problem here. Also include any relevant files.