Increasing performance in molecular dynamics calculation #1156
-
Hello. My system has around 1200 atoms. I set up a molecular dynamics simulation using xtb.
This job was run on my notebook, which has 20 cores. After launching the script, I changed the job to the highest priority. The screenshot below shows that all the processes are running. Is there a way to increase the job's performance so that the cores are used more efficiently? Best,
-
@icamps can you share your input coordinates, parameter file, and command line? Also, what does lscpu (or CPU-Z) report? Basically, what is the processor type?
-
@icamps thanks! What are the CPU specs from lscpu (Linux) or CPU-Z (Windows)? Maybe you are oversubscribing threads, which can lower overall performance: more threads assigned to xtb than actual CPU cores available. You show 20 threads or CPUs in the screenshot, but are they true CPU cores (AMD, Intel)? So what is the specific type of CPU? Plus, for scaling, benchmarking with smaller molecules (200 Da, 400 Da, 800 Da) while checking CPU utilization usually gives better insight. There could also be I/O overhead, unless you use an SSD or an SSD RAID (multiple SSDs set up as RAID).
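To check for oversubscription before launching, one can compare the physical core count to the logical CPU count and size the thread pool accordingly. A minimal sketch for Linux (the xtb command line at the end is a placeholder for the actual input and flags):

```shell
# Count physical cores vs. logical CPUs on Linux; hyperthreading
# inflates the logical count that tools like top display.
physical=$(lscpu -p=CORE | grep -v '^#' | sort -u | wc -l)
logical=$(nproc)
echo "physical=$physical logical=$logical"

# Pin xtb to the physical-core count to avoid oversubscription.
# xtb reads OMP_NUM_THREADS; OMP_STACKSIZE avoids stack overflows
# in larger systems (per the xtb documentation).
export OMP_NUM_THREADS="$physical,1"
export OMP_STACKSIZE=4G
# xtb coord.xyz --omd --input md.inp   # placeholder command line
```

On hybrid Intel CPUs (P-cores with hyperthreading plus E-cores without), the physical and logical counts will differ, which is exactly the situation where oversubscription is easy to trigger.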
-
Hey @tobigithub, the output of lscpu is below. Even though the notebook has only 14 physical cores, it can run 20 threads. In this case, I ran with 18 threads.
-
@icamps thanks for providing the input data. Multiple aspects come to my mind:
From the results with 96 true CPU cores available, it seems that the algorithm does not really benefit from more CPUs (one could, for example, run multiple molecules at the same time instead). The peak optimum is at about 1/4 of all CPUs used (possibly due to boost GHz), and when the defined NUM_THREADS exceeds the true number of CPU cores, there is a sudden drop in performance.
Yeah, final thought: this is a mini benchmark and there could be multiple errors in it, but figuring out the fastest overall way to run the program would be the most practical approach, I guess. Also, older AMD and Intel CPUs on an HPC compute cluster perform very well, even if they run at 2 GHz or so, simply because you can use hundreds of cores and just keep them running, or even run multiple experiments! So for such large molecules I would rather use a cluster.
The Intel i7-13650HX has 14 true CPU cores; see the Intel SKU [Link]
Using more than the provided true CPU cores does not always increase performance; let's say 1 core / 2 threads would not automatically double the speed. In many cases, when oversubscribing CPU cores, the algorithm can even become slower. There are exceptions with hybrid algorithms that use multiple different processes; there, hyperthreading can massively speed up sof…
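The oversubscription effect can be reproduced outside of xtb. A sketch that times a CPU-bound workload on a process pool below, at, and above the core count (the workload and sizes are arbitrary, chosen just to make the trend visible):

```python
import multiprocessing as mp
import time

def burn(n):
    # Small CPU-bound kernel: a tight arithmetic loop.
    s = 0
    for i in range(n):
        s += i * i
    return s

def wall_time(n_workers, tasks=32, work=200_000):
    """Time `tasks` CPU-bound jobs spread over `n_workers` processes."""
    t0 = time.perf_counter()
    with mp.Pool(n_workers) as pool:
        pool.map(burn, [work] * tasks)
    return time.perf_counter() - t0

if __name__ == "__main__":
    cores = mp.cpu_count()  # logical CPUs, not physical cores
    for n in (max(1, cores // 2), cores, 2 * cores):
        print(f"{n:3d} workers: {wall_time(n):6.2f} s")
```

Once every physical core is busy, adding workers only adds scheduling overhead, which is the same "sudden drop" seen in the xtb benchmark.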