python interface "DeepPot" cannot load dpa2/dpa3 model #4995
Unanswered
mhkcjjmt-debug
asked this question in Q&A
Replies: 1 comment 1 reply
-
@mhkcjjmt-debug Hi, please note that models need to be run with the corresponding DeePMD-kit version. The …
-
I am working through a Bohrium notebook named "DPA-2: a large atomic model as a multi-task learner" (https://www.bohrium.com/notebooks/83525789178) and ran into a problem that I cannot solve.
When I execute the following script:
import torch
from deepmd.pt.infer.deep_eval import DeepPot
import numpy as np
model = DeepPot("../model/H2O-PD.pt")
the following error appears:
UnpicklingError Traceback (most recent call last)
Cell In[2], line 1
----> 1 model = DeepPot("../model/H2O-PD.pt")
File ~/deepmd3.1.0-cpu/lib/python3.12/site-packages/deepmd/infer/deep_eval.py:341, in DeepEval.__init__(self, model_file, auto_batch_size, neighbor_list, *args, **kwargs)
333 def __init__(
334 self,
335 model_file: str,
(...) 339 **kwargs: Any,
340 ) -> None:
--> 341 self.deep_eval = DeepEvalBackend(
342 model_file,
343 self.output_def,
344 *args,
345 auto_batch_size=auto_batch_size,
346 neighbor_list=neighbor_list,
347 **kwargs,
348 )
349 if self.deep_eval.get_has_spin() and hasattr(self, "output_def_mag"):
350 self.deep_eval.output_def = self.output_def_mag
File ~/deepmd3.1.0-cpu/lib/python3.12/site-packages/deepmd/pt/infer/deep_eval.py:106, in DeepEval.__init__(self, model_file, output_def, auto_batch_size, neighbor_list, head, *args, **kwargs)
104 self.model_path = model_file
105 if str(self.model_path).endswith(".pt"):
--> 106 state_dict = torch.load(
107 model_file, map_location=env.DEVICE, weights_only=True
108 )
109 if "model" in state_dict:
110 state_dict = state_dict["model"]
File ~/deepmd3.1.0-cpu/lib/python3.12/site-packages/torch/serialization.py:1470, in load(f, map_location, pickle_module, weights_only, mmap, **pickle_load_args)
1462 return _load(
1463 opened_zipfile,
1464 map_location,
(...) 1467 **pickle_load_args,
1468 )
1469 except pickle.UnpicklingError as e:
-> 1470 raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
1471 return _load(
1472 opened_zipfile,
1473 map_location,
(...) 1476 **pickle_load_args,
1477 )
1478 if mmap:
UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
(1) In PyTorch 2.6, we changed the default value of the weights_only argument in torch.load from False to True. Re-running torch.load with weights_only set to False will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
(2) Alternatively, to load with weights_only=True please check the recommended steps in the following error message.
WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use torch.serialization.add_safe_globals([scalar]) or the torch.serialization.safe_globals([scalar]) context manager to allowlist this global if you trust this class/function.
Check the documentation of torch.load to learn more about types accepted by default with weights_only: https://pytorch.org/docs/stable/generated/torch.load.html
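For reference, the allowlist route that the error message itself recommends would look roughly like the sketch below. This is only a sketch: it assumes the checkpoint is trusted and that numpy.core.multiarray.scalar is the only blocked global, as the message states.

# The error message recommends allowlisting the blocked global before loading.
# Only do this if you trust the source of H2O-PD.pt.
from numpy.core.multiarray import scalar  # the exact global named in the error
import torch
from deepmd.pt.infer.deep_eval import DeepPot

torch.serialization.add_safe_globals([scalar])  # keeps weights_only=True usable
model = DeepPot("../model/H2O-PD.pt")

The simpler but less safe alternative is option (1), setting weights_only=False, which is what I tried next.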
Following the instructions in the error message, I changed weights_only=True at line 106 of
deepmd3.1.0-cpu/lib/python3.12/site-packages/deepmd/pt/infer/deep_eval.py:
104 self.model_path = model_file
105 if str(self.model_path).endswith(".pt"):
--> 106 state_dict = torch.load(
107 model_file, map_location=env.DEVICE, weights_only=True
108 )
to " model_file, map_location=env.DEVICE, weights_only=False"
However, another error appears:
RuntimeError Traceback (most recent call last)
Cell In[2], line 1
----> 1 model = DeepPot("../model/H2O-PD.pt")
File ~/deepmd3.1.0-cpu/lib/python3.12/site-packages/deepmd/infer/deep_eval.py:341, in DeepEval.__init__(self, model_file, auto_batch_size, neighbor_list, *args, **kwargs)
333 def __init__(
334 self,
335 model_file: str,
(...) 339 **kwargs: Any,
340 ) -> None:
--> 341 self.deep_eval = DeepEvalBackend(
342 model_file,
343 self.output_def,
344 *args,
345 auto_batch_size=auto_batch_size,
346 neighbor_list=neighbor_list,
347 **kwargs,
348 )
349 if self.deep_eval.get_has_spin() and hasattr(self, "output_def_mag"):
350 self.deep_eval.output_def = self.output_def_mag
File ~/deepmd3.1.0-cpu/lib/python3.12/site-packages/deepmd/pt/infer/deep_eval.py:136, in DeepEval.__init__(self, model_file, output_def, auto_batch_size, neighbor_list, head, *args, **kwargs)
134 model = torch.jit.script(model)
135 self.dp = ModelWrapper(model)
--> 136 self.dp.load_state_dict(state_dict)
137 elif str(self.model_path).endswith(".pth"):
138 model = torch.jit.load(model_file, map_location=env.DEVICE)
File ~/deepmd3.1.0-cpu/lib/python3.12/site-packages/torch/nn/modules/module.py:2581, in Module.load_state_dict(self, state_dict, strict, assign)
2573 error_msgs.insert(
2574 0,
2575 "Missing key(s) in state_dict: {}. ".format(
2576 ", ".join(f'"{k}"' for k in missing_keys)
2577 ),
2578 )
2580 if len(error_msgs) > 0:
-> 2581 raise RuntimeError(
2582 "Error(s) in loading state_dict for {}:\n\t{}".format(
2583 self.__class__.__name__, "\n\t".join(error_msgs)
2584 )
2585 )
2586 return _IncompatibleKeys(missing_keys, unexpected_keys)
RuntimeError: Error(s) in loading state_dict for ModelWrapper:
Missing key(s) in state_dict: "model.Default.min_nbor_dist", "model.Default.atomic_model.descriptor.repinit.compress_info.0", "model.Default.atomic_model.descriptor.repinit.compress_data.0", "model.Default.atomic_model.descriptor.repformers.layers.0.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.0.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.1.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.1.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.2.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.2.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.3.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.3.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.4.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.4.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.5.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.5.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.6.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.6.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.7.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.7.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.8.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.8.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.9.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.9.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.10.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.10.g1_self_mlp.bias", "model.Default.atomic_model.descriptor.repformers.layers.11.g1_self_mlp.matrix", "model.Default.atomic_model.descriptor.repformers.layers.11.g1_self_mlp.bias".
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.0.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.0.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.1.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.1.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.2.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.2.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.3.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.3.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.4.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.4.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.5.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.5.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.6.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.6.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.7.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.7.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.8.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.8.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.9.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.9.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.10.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.10.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.11.linear1.matrix: copying a param with shape torch.Size([800, 128]) from checkpoint, the shape in current model is torch.Size([640, 128]).
size mismatch for model.Default.atomic_model.descriptor.repformers.layers.11.proj_g1g2.matrix: copying a param with shape torch.Size([128, 32]) from checkpoint, the shape in current model is torch.Size([32, 128]).
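For debugging, the checkpoint's stored keys and shapes can be dumped directly and compared against the mismatches above; a minimal inspection sketch, again assuming the file is trusted (the "model" key handling mirrors what deep_eval.py does):

import torch

# Load the raw checkpoint (trusted file assumed) and print a few parameter
# names and shapes to compare with the size-mismatch messages above.
ckpt = torch.load("../model/H2O-PD.pt", map_location="cpu", weights_only=False)
state_dict = ckpt["model"] if "model" in ckpt else ckpt
for name, value in list(state_dict.items())[:20]:
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(name, shape)

If the shapes stored in the file differ from what the current build constructs, that points to a model produced by a different DeePMD-kit/DPA-2 revision rather than a corrupted file.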
Another notebook, "Distillation of DPA-2 models" (https://www.bohrium.com/notebooks/76262686918), runs into the same problem at this Python script:
class DPPTPredict:
def load_model(self, model: Path):
self.dp = DeepPot(model)
Any help would be highly appreciated. Thanks!
Environment: deepmd-kit 3.1.0, PyTorch 2.6.0
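For completeness, the versions above were read off with a quick check (deepmd.__version__ and torch.__version__ are assumed to be the usual version attributes):

import deepmd
import torch

# Print the installed package versions that the tracebacks above come from.
print("deepmd-kit:", deepmd.__version__)
print("pytorch:", torch.__version__)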
The file "H2O-PD.pt" is from the data set of "https://www.bohrium.com/notebooks/83525789178" in qscft_v6 directory. It is beyond 25 MB size limit and connot be uploaded here.