set.000 vs set.001 #4657
Unanswered
jinalee314
asked this question in
Q&A
Replies: 1 comment 1 reply
-
The tutorial is incorrect.
-
Hi, I am currently using DeePMD-kit v2.2.0 to train an MLP. I am not sure how set.000 and set.001 are handled when they are in the same directory. According to this tutorial, "The last set will be considered as the test set by the DeePMD-kit by default."
In my data folder for each system, I have the following format:
type.raw
type_map.raw
set.000/box.npy
set.000/coord.npy
set.000/energy.npy
set.000/force.npy
set.000/fparam.npy
set.000/virial.npy
set.001/box.npy
set.001/coord.npy
set.001/energy.npy
set.001/force.npy
set.001/fparam.npy
set.001/virial.npy
There are 200 frames total in set.000 + set.001, divided unevenly. My intention is for set.001 to be a test set.
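One way to make that intent explicit, rather than depending on any default, is to move set.001 into a system directory of its own. The following is only a sketch under the layout listed above: the helper `split_last_set` and the directory name `data_valid` are hypothetical and not part of DeePMD-kit.

```python
import shutil
from pathlib import Path


def split_last_set(system_dir, valid_dir):
    """Move the last set.* directory of a system into a separate system directory.

    Hypothetical helper for illustration; not a DeePMD-kit API.
    """
    system_dir, valid_dir = Path(system_dir), Path(valid_dir)
    sets = sorted(p for p in system_dir.glob("set.*") if p.is_dir())
    if len(sets) < 2:
        raise ValueError("need at least two set.* directories to split")

    target = valid_dir / "set.000"
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    valid_dir.mkdir(parents=True, exist_ok=True)

    # The new system needs its own copies of the per-system metadata files.
    for name in ("type.raw", "type_map.raw"):
        src = system_dir / name
        if src.exists():
            shutil.copy2(src, valid_dir / name)

    # The moved set becomes set.000 of the new system.
    shutil.move(str(sets[-1]), str(target))
    return target
```

Calling `split_last_set("data", "data_valid")` would leave set.000 in `data` and place the former set.001 in `data_valid/set.000`, so the two directories can be referenced separately in the input file.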
In my input .json file, I specified this entire data folder in the training_data/systems section, under the assumption that training only uses set.000. When I look at the output file, however, n_bch is displayed as 200, which is the total number of frames in set.000 and set.001 combined:
DEEPMD INFO system natoms bch_sz n_bch prob pbc
DEEPMD INFO -- /data 202 1 200 0.009 T
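The reading of n_bch here can be checked with simple arithmetic. The sketch below assumes that n_bch is the number of frames in each training set divided by bch_sz, summed over the sets; that is an assumption about how the number is computed, not something confirmed in this thread. The 150/50 split is a made-up example, since only the total of 200 is known.

```python
def n_batches(frames_per_set, batch_size, use_all_sets):
    """Batches per pass under two hypotheses about which sets are used for training."""
    counted = frames_per_set if use_all_sets else frames_per_set[:-1]
    return sum(n // batch_size for n in counted)


# Hypothetical uneven split of the 200 frames across set.000 and set.001.
frames = [150, 50]

print(n_batches(frames, 1, use_all_sets=True))   # 200: both sets are counted
print(n_batches(frames, 1, use_all_sets=False))  # 150: the last set is held out
```

With bch_sz equal to 1, a reported n_bch of 200 matches only the first case, whatever the actual uneven split is.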
Is this an indication that set.001 is being used as training data as well? Or does it still hold that the last set, set.001, is automatically considered a test set and the display is just misleading? Should set.001 be explicitly used as validation data? Thank you very much.
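If set.001 is meant to be held out, one option is to list it explicitly under validation_data instead of depending on a default split. Below is a sketch of the relevant part of the input file, built as a Python dict and written out as JSON. The paths `./data` and `./data_valid` are hypothetical, `./data_valid` is assumed to be a separate system directory that contains only the held-out frames, and the batch_size and numb_btch values are placeholders.

```python
import copy
import json


def with_validation(config, train_systems, valid_systems):
    """Return a copy of an input config with explicit training and validation systems."""
    new_config = copy.deepcopy(config)
    training = new_config.setdefault("training", {})
    training["training_data"] = {
        "systems": list(train_systems),
        "batch_size": 1,  # placeholder value
    }
    training["validation_data"] = {
        "systems": list(valid_systems),
        "batch_size": 1,  # placeholder value
        "numb_btch": 1,   # placeholder: batches evaluated per validation step
    }
    return new_config


# Hypothetical paths; the rest of the input file (model, loss, ...) is left out.
config = with_validation({}, ["./data"], ["./data_valid"])
print(json.dumps(config, indent=2))
```

Keeping the two system lists in separate directories means the held-out frames never appear under training_data, regardless of how multiple set.* directories inside one system are treated.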