GPU Batch System
In many cases training on CPUs is not sufficient, and it is necessary to switch to GPU-supported training, which is roughly 80 times faster than CPU-supported training. For this purpose one can use the phys3b GPU machine integrated in the RWTH Aachen RZ Cluster. It provides 128 GB RAM, a 1 TB SSD, 12 CPU cores and 2 high-performance Nvidia graphics cards.
First, ask an admin (e.g. Jan Auffenberg (IceCube)) for an account on this machine. With the account you can log in via: ssh <tim-kennung>@cluster.rz.rwth-aachen.de. You should then have access to the phys3b directory, which contains several example scripts for submitting trainings to the GPU machine. Copy an example .submit script to your own directory and modify it for your needs. Keep in mind that all paths have to be full (absolute) paths, both in the submission script and in your own script.
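The full-path requirement can be illustrated with a minimal sketch that writes a submission script. It is not a copy of the example scripts on the machine, which remain the reference. The file names train.py and train.submit, the job name gpu_training and the choice of #BSUB options are assumptions made for illustration; only the project name phys3b is taken from this page.

```shell
#!/bin/sh
# Sketch only: writes a minimal LSF submission script with absolute paths.
# train.py, train.submit and the job name are hypothetical placeholders;
# the example scripts in the phys3b directory remain the reference.

WORKDIR="$(pwd)"                        # absolute path of the current directory
SUBMIT_FILE="$WORKDIR/train.submit"     # hypothetical submission script name

# Unquoted EOF: $WORKDIR is expanded now, so the generated file
# contains full paths only. %J is replaced by LSF with the job ID.
cat > "$SUBMIT_FILE" <<EOF
#!/bin/sh
#BSUB -J gpu_training
#BSUB -P phys3b
#BSUB -o $WORKDIR/train.%J.log
cd "$WORKDIR"
python "$WORKDIR/train.py"
EOF

echo "Wrote $SUBMIT_FILE"
```

The generated file would then be submitted with the bsub command from the list of useful commands. Writing the paths at generation time avoids relying on the working directory of the batch job.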
Here are some useful commands:
- For submission:
  bsub < xxxx.submit
- List all running/queued jobs:
  bjobs -u all -P phys3b
- Kill a running job:
  bkill JOBID
- Have a short peek into the script output of the last running job:
  bpeek
IMPORTANT: Only one user at a time can run jobs on this machine; other users have to wait until those jobs are finished. It is therefore very helpful to coordinate with the other users.
For your analysis, you can copy ROOT files to your home directory on the GPU machine with scp or rsync.
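As a sketch of such a transfer, the helper below wraps rsync. The function name copy_root_files and its arguments are hypothetical; the host name is the login host given above, and an empty path after the colon refers to the remote home directory.

```shell
#!/bin/sh
# Hypothetical helper: copy all *.root files from a local directory
# to the home directory on the GPU machine.
REMOTE_HOST="cluster.rz.rwth-aachen.de"

copy_root_files() {
    src_dir="$1"    # local directory containing the ROOT files
    user="$2"       # your TIM-Kennung
    # -a: archive mode (keeps timestamps and permissions), -v: verbose.
    # "user@host:" with an empty path targets the remote home directory.
    rsync -av "$src_dir"/*.root "$user@$REMOTE_HOST:"
}

# Usage (not executed here):
#   copy_root_files /full/path/to/data <tim-kennung>
```

Compared with scp, rsync skips files that are already up to date on the remote side, which helps when a transfer has to be repeated.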