🐛 Bug

networkx seems to fail to complete the minimum cut for an MLP with two torchao.float8 linears and a GELU activation in bf16.
The script below works when dtype is float32.
If the activation is ReLU, then I see a different error.
Traceback (most recent call last):
  File "/opt/pytorch/lightning-thunder/thunder/core/rematerialization.py", line 378, in find_cut
    _, (reachable, non_reachable) = nx.minimum_cut(g, "source", "sink")
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<class 'networkx.utils.decorators.argmap'> compilation 4", line 3, in argmap_minimum_cut_1
  File "/usr/local/lib/python3.12/dist-packages/networkx/utils/backends.py", line 967, in __call__
    return self.orig_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/networkx/algorithms/flow/maxflow.py", line 454, in minimum_cut
    R = flow_func(flowG, _s, _t, capacity=capacity, value_only=True, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<class 'networkx.utils.decorators.argmap'> compilation 8", line 3, in argmap_preflow_push_5
  File "/usr/local/lib/python3.12/dist-packages/networkx/utils/backends.py", line 967, in __call__
    return self.orig_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/networkx/algorithms/flow/preflowpush.py", line 422, in preflow_push
    R = preflow_push_impl(G, s, t, capacity, residual, global_relabel_freq, value_only)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/networkx/algorithms/flow/preflowpush.py", line 41, in preflow_push_impl
    detect_unboundedness(R, s, t)
  File "<class 'networkx.utils.decorators.argmap'> compilation 16", line 3, in argmap_detect_unboundedness_13
  File "/usr/local/lib/python3.12/dist-packages/networkx/utils/backends.py", line 967, in __call__
    return self.orig_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/networkx/algorithms/flow/utils.py", line 173, in detect_unboundedness
    raise nx.NetworkXUnbounded(
networkx.exception.NetworkXUnbounded: Infinite capacity path, flow unbounded above.
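
For context, networkx raises NetworkXUnbounded whenever every edge on some source-to-sink path has infinite capacity (edges with no capacity attribute also count as infinite), which suggests the graph built by find_cut ends up with a fully infinite-capacity path in this dtype. A standalone illustration of the failure mode, not the actual thunder graph:

```python
# Minimal illustration of the networkx error seen above: minimum_cut raises
# NetworkXUnbounded when some source-to-sink path is entirely infinite-capacity.
import networkx as nx

g = nx.DiGraph()
g.add_edge("source", "a", capacity=float("inf"))
g.add_edge("a", "sink", capacity=float("inf"))  # every edge on the path is infinite

try:
    nx.minimum_cut(g, "source", "sink")
except nx.NetworkXUnbounded as e:
    print(e)  # "Infinite capacity path, flow unbounded above."
```
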
To Reproduce
Steps to reproduce the behavior:
Code sample
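
The repro script itself did not survive in this copy of the issue. What follows is a minimal sketch of what it presumably looks like, based on the description above: convert_to_float8_training and thunder.jit are the public torchao/thunder entry points, while the layer sizes, batch shape, and bias settings are assumptions.

```python
# Hypothetical reconstruction of the missing repro script; dimensions,
# device, and module structure are assumptions, not the original code.
import torch
import torch.nn as nn
import thunder
from torchao.float8 import convert_to_float8_training


class MLP(nn.Module):
    # Two linears around an activation; GELU here, ReLU reportedly
    # produces a different error.
    def __init__(self, dim: int = 4096) -> None:
        super().__init__()
        self.fc1 = nn.Linear(dim, dim, bias=False)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.act(self.fc1(x)))


device = "cuda"
dtype = torch.bfloat16  # the failure is bf16-specific; torch.float32 works

model = MLP().to(device=device, dtype=dtype)
convert_to_float8_training(model)  # swap both nn.Linear modules for float8 linears

jitted = thunder.jit(model)
x = torch.randn(16, 4096, device=device, dtype=dtype, requires_grad=True)
jitted(x).sum().backward()  # NetworkXUnbounded is raised from find_cut here
```
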
Error with ReLU --
Expected behavior
Environment

- How you installed PyTorch (conda, pip, source):
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information:

Additional context
For this MLP with the nvfuser executor, I run into either NVIDIA/Fuser#3498 or this issue, depending on whether or not I apply the DCE implemented in 232328c.