I tried to run nanoVLM/measure_vram.py as-is on a Kaggle T4 GPU, but it fails with the following error:
_scaled_dot_product_attention: Explicit attn_mask should not be set when is_causal=True
Full error log:
--- VRAM Measurement ---
Testing Batch Size: 1
W0530 05:01:15.467000 19 torch/_inductor/utils.py:1137] [0/0_1] Not enough SMs to use max_autotune_gemm mode
/usr/local/lib/python3.11/dist-packages/torch/_inductor/compile_fx.py:1948: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
warnings.warn(
(the two lines above are repeated 7 times in total)
An unexpected runtime error occurred for batch size 1: Failed running call_function <built-in function scaled_dot_product_attention>(*(FakeTensor(..., device='cuda:0', size=(1, 9, 128, 64), grad_fn=<AddBackward0>), FakeTensor(..., device='cuda:0', size=(1, 9, 128, 64), grad_fn=<ViewBackward0>), FakeTensor(..., device='cuda:0', size=(1, 9, 128, 64), dtype=torch.bfloat16,
grad_fn=<ViewBackward0>)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(1, 1, 1, 128)), 'dropout_p': 0.0, 'is_causal': True}):
_scaled_dot_product_attention: Explicit attn_mask should not be set when is_causal=True
from user code:
File "/tmp/ipykernel_19/3657429462.py", line 596, in torch_dynamo_resume_in_forward_at_589
x, kv_cache[i] = block(x, cos, sin, attention_mask, kv_cache[i])
File "/tmp/ipykernel_19/3657429462.py", line 540, in forward
x, block_kv_cache = self.attn(x, cos, sin, attention_mask, block_kv_cache)
File "/tmp/ipykernel_19/3657429462.py", line 481, in forward
y = torch.nn.functional.scaled_dot_product_attention(
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
Testing Batch Size: 2
An unexpected runtime error occurred for batch size 2: Failed running call_function <built-in function scaled_dot_product_attention>(*(FakeTensor(..., device='cuda:0', size=(s0, 9, 128, 64), grad_fn=<AddBackward0>), FakeTensor(..., device='cuda:0', size=(s0, 9, 128, 64),
grad_fn=<ViewBackward0>), FakeTensor(..., device='cuda:0', size=(s0, 9, 128, 64), dtype=torch.bfloat16,
grad_fn=<ViewBackward0>)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(s5, 1, 1, 128)), 'dropout_p': 0.0, 'is_causal': True}):
_scaled_dot_product_attention: Explicit attn_mask should not be set when is_causal=True
from user code:
File "/tmp/ipykernel_19/3657429462.py", line 596, in torch_dynamo_resume_in_forward_at_589
x, kv_cache[i] = block(x, cos, sin, attention_mask, kv_cache[i])
File "/tmp/ipykernel_19/3657429462.py", line 540, in forward
x, block_kv_cache = self.attn(x, cos, sin, attention_mask, block_kv_cache)
File "/tmp/ipykernel_19/3657429462.py", line 481, in forward
y = torch.nn.functional.scaled_dot_product_attention(
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
Testing Batch Size: 4
An unexpected runtime error occurred for batch size 4: Failed running call_function <built-in function scaled_dot_product_attention>(*(FakeTensor(..., device='cuda:0', size=(s0, 9, 128, 64), grad_fn=<AddBackward0>), FakeTensor(..., device='cuda:0', size=(s0, 9, 128, 64),
grad_fn=<ViewBackward0>), FakeTensor(..., device='cuda:0', size=(s0, 9, 128, 64), dtype=torch.bfloat16,
grad_fn=<ViewBackward0>)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(s5, 1, 1, 128)), 'dropout_p': 0.0, 'is_causal': True}):
_scaled_dot_product_attention: Explicit attn_mask should not be set when is_causal=True
from user code:
File "/tmp/ipykernel_19/3657429462.py", line 596, in torch_dynamo_resume_in_forward_at_589
x, kv_cache[i] = block(x, cos, sin, attention_mask, kv_cache[i])
File "/tmp/ipykernel_19/3657429462.py", line 540, in forward
x, block_kv_cache = self.attn(x, cos, sin, attention_mask, block_kv_cache)
File "/tmp/ipykernel_19/3657429462.py", line 481, in forward
y = torch.nn.functional.scaled_dot_product_attention(
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
--- Summary of VRAM Usage ---
Batch Size 1: Error: (same _scaled_dot_product_attention error as above, mask size (1, 1, 1, 128))
Batch Size 2: Error: (same error as above, with symbolic batch size s0)
Batch Size 4: Error: (same error as above, with symbolic batch size s0)
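From the traceback, the language model's attention block calls torch.nn.functional.scaled_dot_product_attention with both an explicit attn_mask and is_causal=True, a combination SDPA rejects. Below is a minimal sketch of one possible workaround, assuming a boolean padding mask of shape (batch, 1, 1, seq_len) where True means "attend" (the helper name and mask convention here are my assumptions, not nanoVLM's actual code):

```python
import torch
import torch.nn.functional as F

def causal_sdpa(q, k, v, padding_mask=None):
    """q, k, v: (batch, heads, seq_len, head_dim).
    padding_mask: optional bool tensor (batch, 1, 1, seq_len), True = attend.
    """
    if padding_mask is None:
        # No padding: let SDPA derive the causal mask itself.
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)
    # Fold causality into the explicit mask instead of passing both
    # attn_mask and is_causal=True, which SDPA rejects.
    T = q.size(-2)
    causal = torch.tril(torch.ones(T, T, dtype=torch.bool, device=q.device))
    attn_mask = causal & padding_mask  # broadcasts to (batch, 1, T, T)
    return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
```

Note that the torch._dynamo.config.suppress_errors fallback suggested in the log may not help here, since eager-mode SDPA appears to enforce the same restriction.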
To reproduce, run this Kaggle notebook: https://www.kaggle.com/code/asapannarakesh/vram-usage?scriptVersionId=242665513
I am planning to use the same measure_vram function for PaliGemma (see ariG23498/gemma3-object-detection#9 (comment)).
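For context, what I need from measure_vram is essentially a peak-memory sweep over batch sizes, along these lines (a rough sketch using the standard torch.cuda memory APIs; model and make_batch are hypothetical stand-ins for the script's real interface):

```python
import torch

def measure_vram(model, make_batch, batch_sizes=(1, 2, 4)):
    """Run one forward pass per batch size and record peak VRAM in MiB."""
    results = {}
    for bs in batch_sizes:
        torch.cuda.empty_cache()
        torch.cuda.reset_peak_memory_stats()
        try:
            batch = make_batch(bs)  # hypothetical batch builder
            with torch.no_grad():
                model(**batch)
            results[bs] = torch.cuda.max_memory_allocated() / 1024**2
        except RuntimeError as e:
            results[bs] = f"Error: {e}"
    return results
```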