python interface "DeepPot" cannot load dpa2/dpa3 model #4995
Unanswered
mhkcjjmt-debug
asked this question in Q&A
Replies: 1 comment 1 reply
-
@mhkcjjmt-debug Hi, please note that models need to be run with the corresponding DeePMD-kit version. The …
-
I am working through a Bohrium notebook named "DPA-2: a large atomic model as a multi-task learner" (https://www.bohrium.com/notebooks/83525789178) and ran into a problem that I cannot solve.
When I execute the following script:
import torch
from deepmd.pt.infer.deep_eval import DeepPot
import numpy as np
model = DeepPot("../model/H2O-PD.pt")
the following error appears:
UnpicklingError Traceback (most recent call last)
Cell In[2], line 1
----> 1 model = DeepPot("../model/H2O-PD.pt")
File ~/deepmd3.1.0-cpu/lib/python3.12/site-packages/deepmd/infer/deep_eval.py:341, in DeepEval.__init__(self, model_file, auto_batch_size, neighbor_list, *args, **kwargs)
333 def __init__(
334 self,
335 model_file: str,
(...) 339 **kwargs: Any,
340 ) -> None:
--> 341 self.deep_eval = DeepEvalBackend(
342 model_file,
343 self.output_def,
344 *args,
345 auto_batch_size=auto_batch_size,
346 neighbor_list=neighbor_list,
347 **kwargs,
348 )
349 if self.deep_eval.get_has_spin() and hasattr(self, "output_def_mag"):
350 self.deep_eval.output_def = self.output_def_mag
File ~/deepmd3.1.0-cpu/lib/python3.12/site-packages/deepmd/pt/infer/deep_eval.py:106, in DeepEval.__init__(self, model_file, output_def, auto_batch_size, neighbor_list, head, *args, **kwargs)
104 self.model_path = model_file
105 if str(self.model_path).endswith(".pt"):
--> 106 state_dict = torch.load(
107 model_file, map_location=env.DEVICE, weights_only=True
108 )
109 if "model" in state_dict:
110 state_dict = state_dict["model"]
File ~/deepmd3.1.0-cpu/lib/python3.12/site-packages/torch/serialization.py:1470, in load(f, map_location, pickle_module, weights_only, mmap, **pickle_load_args)
1462 return _load(
1463 opened_zipfile,
1464 map_location,
(...) 1467 **pickle_load_args,
1468 )
1469 except pickle.UnpicklingError as e:
-> 1470 raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
1471 return _load(
1472 opened_zipfile,
1473 map_location,
(...) 1476 **pickle_load_args,
1477 )
1478 if mmap:
UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
(1) In PyTorch 2.6, we changed the default value of the weights_only argument in torch.load from False to True. Re-running torch.load with weights_only set to False will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
(2) Alternatively, to load with weights_only=True please check the recommended steps in the following error message.
WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use torch.serialization.add_safe_globals([scalar]) or the torch.serialization.safe_globals([scalar]) context manager to allowlist this global if you trust this class/function.
Check the documentation of torch.load to learn more about types accepted by default with weights_only: https://pytorch.org/docs/stable/generated/torch.load.html
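For reference, the allowlist route that the error message itself recommends would look roughly like the sketch below. This is only a sketch: it assumes the checkpoint is trusted and that numpy.core.multiarray.scalar is the only blocked global, as the message states.

# The error message recommends allowlisting the blocked global before loading.
# Only do this if you trust the source of H2O-PD.pt.
from numpy.core.multiarray import scalar  # the exact global named in the error
import torch
from deepmd.pt.infer.deep_eval import DeepPot

torch.serialization.add_safe_globals([scalar])  # keeps weights_only=True usable
model = DeepPot("../model/H2O-PD.pt")

The simpler but less safe alternative is option (1), setting weights_only=False, which is what I tried next.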
Following the instructions in the error message, I changed weights_only=True at line 106 of
deepmd3.1.0-cpu/lib/python3.12/site-packages/deepmd/pt/infer/deep_eval.py:
104 self.model_path = model_file
105 if str(self.model_path).endswith(".pt"):
--> 106 state_dict = torch.load(
107 model_file, map_location=env.DEVICE, weights_only=True
108 )
to " model_file, map_location=env.DEVICE, weights_only=False"
However, another error appears:
RuntimeError Traceback (most recent call last)
Cell In[2], line 1
----> 1 model = DeepPot("../model/H2O-PD.pt")
File ~/deepmd3.1.0-cpu/lib/python3.12/site-packages/deepmd/infer/deep_eval.py:341, in DeepEval.__init__(self, model_file, auto_batch_size, neighbor_list, *args, **kwargs)
333 def __init__(
334 self,
335 model_file: str,
(...) 339 **kwargs: Any,
340 ) -> None:
--> 341 self.deep_eval = DeepEvalBackend(
342 model_file,
343 self.output_def,
344 *args,
345 auto_batch_size=auto_batch_size,
346 neighbor_list=neighbor_list,
347 **kwargs,
348 )
349 if self.deep_eval.get_has_spin() and hasattr(self, "output_def_mag"):
350 self.deep_eval.output_def = self.output_def_mag
File ~/deepmd3.1.0-cpu/lib/python3.12/site-packages/deepmd/pt/infer/deep_eval.py:136, in DeepEval.__init__(self, model_file, output_def, auto_batch_size, neighbor_list, head, *args, **kwargs)
134 model = torch.jit.script(model)
135 self.dp = ModelWrapper(model)
--> 136 self.dp.load_state_dict(state_dict)
137 elif str(self.model_path).endswith(".pth"):
138 model = torch.jit.load(model_file, map_location=env.DEVICE)
File ~/deepmd3.1.0-cpu/lib/python3.12/site-packages/torch/nn/modules/module.py:2581, in Module.load_state_dict(self, state_dict, strict, assign)
2573 error_msgs.insert(
2574 0,
2575 "Missing key(s) in state_dict: {}. ".format(
2576 ", ".join(f'"{k}"' for k in missing_keys)
2577 ),
2578 )
2580 if len(error_msgs) > 0:
-> 2581 raise RuntimeError(
2582 "Error(s) in loading state_dict for {}:\n\t{}".format(
2583 self.__class__.__name__, "\n\t".join(error_msgs)
2584 )
2585 )
2586 return _IncompatibleKeys(missing_keys, unexpected_keys)
RuntimeError: Error(s) in loading state_dict for ModelWrapper:
Missing key(s) in state_dict: "model.Default.min_nbor_dist", "model.Default.atomic_model.descriptor.repinit.compress_info.0", "model.Default.atomic_model.descriptor.repinit.compress_data.0", "model.Default.atomic_model.descriptor.repformers.layers.0.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.0.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.1.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.1.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.2.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.2.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.3.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.3.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.4.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.4.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.5.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.5.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.6.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.6.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.7.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.7.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.8.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.8.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.9.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.9.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.10.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.10.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.11.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.11.g1_self_mlp.bias".
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.0.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.0.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.1.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.1.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.2.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.2.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.3.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.3.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.4.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.4.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.5.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.5.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.6.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.6.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.7.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.7.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.8.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.8.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.9.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.9.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.10.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.10.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.11.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.11.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
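For debugging, the checkpoint's stored keys and shapes can be dumped directly and compared against the mismatches above; a minimal inspection sketch, again assuming the file is trusted (the "model" key handling mirrors what deep_eval.py does):

import torch

# Load the raw checkpoint (trusted file assumed) and print a few parameter
# names and shapes to compare with the size-mismatch messages above.
ckpt = torch.load("../model/H2O-PD.pt", map_location="cpu", weights_only=False)
state_dict = ckpt["model"] if "model" in ckpt else ckpt
for name, value in list(state_dict.items())[:20]:
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(name, shape)

If the shapes stored in the file differ from what the current build constructs, that points to a model produced by a different DeePMD-kit/DPA-2 revision rather than a corrupted file.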
Another notebook, "Distillation of DPA-2 models" (https://www.bohrium.com/notebooks/76262686918), runs into the same problem at this Python script:
class DPPTPredict:
def load_model(self, model: Path):
self.dp = DeepPot(model)
Any help would be highly appreciated. Thanks!
Environment: deepmd-kit 3.1.0, PyTorch 2.6.0
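For completeness, the versions above were read off with a quick check (deepmd.__version__ and torch.__version__ are assumed to be the usual version attributes):

import deepmd
import torch

# Print the installed package versions that the tracebacks above come from.
print("deepmd-kit:", deepmd.__version__)
print("pytorch:", torch.__version__)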
The file "H2O-PD.pt" is from the data set of "https://www.bohrium.com/notebooks/83525789178" in qscft_v6 directory. It is beyond 25 MB size limit and connot be uploaded here.