I had 30 GB of RAM, and I added a ~26 GB swap file (13,000 blocks of 2 MB) with the following command:
sudo dd if=/dev/zero of=/swapfile bs=2M count=13000 status=progress
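For reference, dd only allocates the file; the swap area still has to be formatted and enabled before the kernel will use it. A minimal sketch of the standard follow-up steps, assuming /swapfile is the target:
sudo chmod 600 /swapfile   # restrict permissions (swapon warns if the file is world-readable)
sudo mkswap /swapfile      # write swap metadata to the file
sudo swapon /swapfile      # enable it; verify with swapon --show or free -h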
Allocating transformer on host
Loading checkpoint 0
Loading checkpoint 1
Loaded in 2590.17 seconds with 13.19 GiB
cuBLAS API failed with status 15
A: torch.Size([72, 5120]), B: torch.Size([5120, 5120]), C: (72, 5120); (lda, ldb, ldc): (c_int(2304), c_int(163840), c_int(2304)); (m, n, k): (c_int(72), c_int(5120), c_int(5120))
error detected
Traceback (most recent call last):
  File "/home/jupyter/llama-int8/example.py", line 117, in <module>
    fire.Fire(main)
  File "/opt/conda/envs/pt/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/envs/pt/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/envs/pt/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/jupyter/llama-int8/example.py", line 107, in main
    results = generator.generate(
  File "/home/jupyter/llama-int8/llama/generation.py", line 42, in generate
    logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
  File "/opt/conda/envs/pt/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/jupyter/llama-int8/llama/model.py", line 281, in forward
    h = layer(h, start_pos, freqs_cis, mask)
  File "/opt/conda/envs/pt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jupyter/llama-int8/llama/model.py", line 221, in forward
    h = x + self.attention.forward(
  File "/home/jupyter/llama-int8/llama/model.py", line 142, in forward
    xq, xk, xv = self.wq(x), self.wk(x), self.wv(x)
  File "/opt/conda/envs/pt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/pt/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/opt/conda/envs/pt/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/opt/conda/envs/pt/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 377, in forward
    out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
  File "/opt/conda/envs/pt/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1410, in igemmlt
    raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!
Any clues?