CANN error when running the llm/peft/lora/lora_seq2seq.ipynb example #1917
Labels: bug
Comments
Similarly, the roberta_sequence_classification.ipynb example fails:
Traceback (most recent call last):
File "/home/usersshared/githubSrc/mindnlp/llm/peft/lora/roberta_sequence_classification.py", line 70, in <module>
print(next(datasets['train'].create_dict_iterator()))
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindspore/dataset/engine/iterators.py", line 152, in __next__
data = self._get_next()
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindspore/dataset/engine/iterators.py", line 277, in _get_next
raise err
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindspore/dataset/engine/iterators.py", line 260, in _get_next
return {k: self._transform_md_to_output(t) for k, t in self._iterator.GetNextAsMap().items()}
RuntimeError: Exception thrown from user defined Python function in dataset.
------------------------------------------------------------------
- Python Call Stack:
------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindspore/dataset/engine/datasets_user_defined.py", line 104, in _cpp_sampler_fn
yield _convert_row(val)
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindspore/dataset/engine/datasets_user_defined.py", line 173, in _convert_row
item = np.array(x, copy=False)
ValueError: Unable to avoid copy while creating an array as requested.
If using `np.array(obj, copy=False)` replace it with `np.asarray(obj)` to allow a copy when needed (no behavior change in NumPy 1.x).
For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.
------------------------------------------------------------------
- Dataset Pipeline Error Message:
------------------------------------------------------------------
[ERROR] Execute user Python code failed, check 'Python Call Stack' above.
------------------------------------------------------------------
- C++ Call Stack: (For framework developers)
------------------------------------------------------------------
mindspore/ccsrc/minddata/dataset/engine/datasetops/source/generator_op.cc(261).
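For context, the dataset failure above comes from NumPy 2.x changing the meaning of `copy=False`: it now raises whenever a copy cannot be avoided, instead of silently copying as NumPy 1.x did. A minimal sketch of the behavior difference (assuming NumPy >= 2.0 is installed); pinning `numpy<2` in the environment is a user-side workaround until the `np.array(x, copy=False)` call in MindSpore's `_convert_row` is switched to `np.asarray`:

```python
import numpy as np

# Converting a plain Python list always requires a copy, so under NumPy 2.x
# copy=False ("never copy") raises the ValueError seen in the traceback above.
row = [1, 2, 3]

try:
    np.array(row, copy=False)
except ValueError as err:
    print("np.array(row, copy=False) raised:", err)

# np.asarray copies only when needed, which is the replacement the
# NumPy 2.0 migration guide recommends.
print(np.asarray(row))
```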
The CANN version is mismatched: HDK 23.0.0 is not compatible. Install an HDK 24.x release that matches the MindSpore version table at https://www.mindspore.cn/versions.
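As a sanity check after reinstalling the driver/firmware/CANN packages, `npu-smi info` on the host reports the installed driver version, and MindSpore's built-in self-check can confirm that the Python package actually reaches the Ascend backend. A minimal sketch (assuming MindSpore is installed for the Ascend backend):

```python
import mindspore as ms

# Select the Ascend backend, then run MindSpore's installation self-check.
# run_check() prints the MindSpore version and executes a small computation;
# it fails with a runtime error similar to the one above when the CANN/driver
# stack does not match the installed MindSpore release.
ms.set_context(device_target="Ascend")
ms.run_check()
```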
After changing the NPU driver version:
[ERROR] RUNTIME(33130,python):2025-01-24-01:36:21.067.639 [driver.cc:65]33130 GetDeviceCount:report error module_type=1, module_name=EL9999
[ERROR] RUNTIME(33130,python):2025-01-24-01:36:21.067.803 [driver.cc:65]33130 GetDeviceCount:Call drvGetDevNum, drvRetCode=7.
[ERROR] RUNTIME(33130,python):2025-01-24-01:36:21.068.050 [api_c_device.cc:21]33130 rtGetDeviceCount:ErrCode=507899, desc=[driver error:internal error], InnerCode=0x7020010
[ERROR] RUNTIME(33130,python):2025-01-24-01:36:21.068.122 [error_message_manage.cc:53]33130 FuncErrorReason:report error module_type=3, module_name=EE8888
[ERROR] RUNTIME(33130,python):2025-01-24-01:36:21.068.209 [error_message_manage.cc:53]33130 FuncErrorReason:rtGetDeviceCount execute failed, reason=[driver error:internal error]
[ERROR] ASCENDCL(33130,python):2025-01-24-01:36:21.068.345 [device.cpp:366]33130 aclrtGetDeviceCount: get device count failed, runtime result = 507899.
Traceback (most recent call last):
File "/home/usersshared/githubSrc/mindnlp/llm/peft/lora/lora_seq2seq.py", line 34, in <module>
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindnlp/transformers/models/auto/auto_factory.py", line 510, in from_pretrained
return model_class.from_pretrained(
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindnlp/transformers/modeling_utils.py", line 3126, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindnlp/transformers/models/mt5/modeling_mt5.py", line 1134, in __init__
self.encoder = MT5Stack(encoder_config, self.shared)
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindnlp/transformers/models/mt5/modeling_mt5.py", line 715, in __init__
[MT5Block(config, has_relative_attention_bias=bool(i == 0)) for i in range(config.num_layers)]
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindnlp/transformers/models/mt5/modeling_mt5.py", line 715, in <listcomp>
[MT5Block(config, has_relative_attention_bias=bool(i == 0)) for i in range(config.num_layers)]
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindnlp/transformers/models/mt5/modeling_mt5.py", line 460, in __init__
self.layer.append(MT5LayerSelfAttention(config, has_relative_attention_bias=has_relative_attention_bias))
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindnlp/transformers/models/mt5/modeling_mt5.py", line 389, in __init__
self.layer_norm = MT5LayerNorm(config.d_model, eps=config.layer_norm_epsilon)
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindnlp/transformers/models/mt5/modeling_mt5.py", line 59, in __init__
self.weight = nn.Parameter(ops.ones(hidden_size))
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindnlp/core/ops/creation.py", line 62, in ones
return mindspore.mint.ones(size, dtype=dtype)
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindspore/mint/__init__.py", line 692, in ones
return ops.auto_generate.ones(size, dtype)
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindspore/ops/auto_generate/gen_ops_def.py", line 3971, in ones
return ones_op(shape, dtype)
File "/home/tridu33/.conda/envs/mindnlp/lib/python3.9/site-packages/mindspore/ops/operations/manually_defined/ops_def.py", line 1817, in __call__
return _convert_stub(pyboost_ones(self, [size, type if type is None \
RuntimeError: Ascend kernel runtime initialization failed. The details refer to 'Ascend Error Message'.
----------------------------------------------------
- Framework Error Message: (For framework developers)
----------------------------------------------------
Call rtGetDeviceCount, ret[507899]
----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:358 Init
mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:642 SetRtDevice
[INFO] RUNTIME(33130,python):2025-01-24-01:36:22.458.448 [runtime.cc:1991] 33130 ~Runtime: deconstruct runtime
[INFO] RUNTIME(33130,python):2025-01-24-01:36:22.463.831 [runtime.cc:1998] 33130 ~Runtime: wait monitor success, use=0.
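Note that the traceback above fails as soon as MindSpore initializes the Ascend kernel runtime (the first `ops.ones` call inside `MT5LayerNorm`), so the lora_seq2seq example itself is not involved. A minimal repro that takes mindnlp out of the picture, assuming the same conda environment:

```python
import mindspore as ms
from mindspore import mint

# Force the Ascend backend and trigger kernel-runtime initialization with a trivial op.
# If this already raises "Ascend kernel runtime initialization failed" / rtGetDeviceCount
# errors, the driver/CANN installation is at fault rather than the mindnlp example.
ms.set_context(device_target="Ascend", device_id=0)
print(mint.ones((2, 2)))
```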
Describe the bug (Mandatory)
The lora_seq2seq example fails with a CANN error.
Environment: CANN, Ascend 910
To Reproduce (Mandatory)
~/workspace/githubSrc/mindnlp/llm/peft/lora$ python lora_seq2seq.py
Screenshots / Logs (Mandatory)
If applicable, add screenshots to help explain your problem.