GPU Batch System

In many cases training on CPUs is not sufficient, and it becomes necessary to switch to GPU-based training, which is roughly 80 times faster than CPU-based training. For this purpose one can use the phys3b GPU machine integrated in the RWTH Aachen RZ Cluster. It provides 128 GB RAM, a 1 TB SSD, 12 CPU cores and two high-performance Nvidia graphics cards.

First, ask an admin (e.g. Jan Auffenberg (IceCube)) for an account on this machine. With the account you can log in via: ssh <tim-kennung>@cluster.rz.rwth-aachen.de (where <tim-kennung> is your TIM user ID). You should now have access to the phys3b directory. There you can find several example scripts for submitting trainings to the GPU machine. Copy an example .submit script to your own directory and modify it for your needs. Keep in mind that all paths have to be full (absolute) paths, both in the submission script and in your own script.
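As an illustration of what such a submission script might contain, the sketch below writes a minimal LSF job file from the shell. The function name make_submit, the job name, the two-hour wall-time limit and the log file naming are assumptions made for this example; the project name phys3b is taken from the bjobs command listed on this page. The example scripts in the phys3b directory remain the reference for the options this machine actually needs.

```shell
#!/bin/sh
# Hypothetical helper (not part of the cluster setup): write a minimal LSF
# submission file. Relative paths are rejected, since all paths must be full.
# Usage: make_submit JOB_NAME /full/path/train.sh /full/path/logdir OUTFILE
make_submit() {
    job_name=$1
    train_script=$2
    log_dir=$3
    out_file=$4

    # Refuse anything that does not start with "/".
    for p in "$train_script" "$log_dir"; do
        case $p in
            /*) ;;
            *) echo "error: '$p' is not a full path" >&2; return 1 ;;
        esac
    done

    # %J is replaced by LSF with the job ID when the job runs.
    # The 2:00 (hours:minutes) run limit is an assumed value.
    cat > "$out_file" <<EOF
#!/bin/sh
#BSUB -J $job_name
#BSUB -P phys3b
#BSUB -o $log_dir/$job_name.%J.out
#BSUB -e $log_dir/$job_name.%J.err
#BSUB -W 2:00
$train_script
EOF
}
```

A call such as make_submit mytraining /home/<tim-kennung>/train.sh /home/<tim-kennung>/logs mytraining.submit would produce a file that can then be adapted by hand; the paths shown here are placeholders.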

Here are some useful commands:

Submit a job: bsub < xxxx.submit
List all running/queued jobs: bjobs -u all -P phys3b
Kill a running job: bkill JOBID
Peek at the script output of your most recent job: bpeek
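The commands above can be chained if the job ID is captured at submission time. The sketch below assumes that bsub reports a successful submission with a line of the form "Job <12345> is submitted to queue <...>."; the helper names extract_job_id and submit_job are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read bsub's confirmation message on stdin and print
# only the numeric job ID (prints nothing if no such line is found).
extract_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Hypothetical wrapper: submit a .submit file and print the new job ID.
# Usage: submit_job /full/path/to/xxxx.submit
submit_job() {
    bsub < "$1" | extract_job_id
}

# Example session on the cluster (not executed here):
#   job_id=$(submit_job /full/path/to/xxxx.submit)
#   bpeek "$job_id"    # look at the output of exactly this job
#   bkill "$job_id"    # stop it if something went wrong
```

Passing the ID explicitly to bpeek avoids ambiguity when several of your own jobs are queued.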

IMPORTANT: Only one user at a time can run jobs on this machine; other users have to wait until those jobs have finished. It is therefore very helpful to coordinate with the other users.
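Before submitting, it can help to check whether the machine is already occupied. The following sketch counts running and pending jobs in the output of bjobs. It assumes the default bjobs table layout, with a header line and the job state (RUN, PEND, ...) in the third column; count_active_jobs is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read default-format bjobs output on stdin and print
# the number of jobs that are currently running or waiting in the queue.
count_active_jobs() {
    # NR > 1 skips the header line; $3 is assumed to be the STAT column.
    awk 'NR > 1 && ($3 == "RUN" || $3 == "PEND") { n++ } END { print n + 0 }'
}

# Example use on the cluster (not executed here):
#   bjobs -u all -P phys3b 2>/dev/null | count_active_jobs
```

A result of 0 suggests the machine is free; anything else is a good reason to talk to the other users first.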

For your analysis, you can copy ROOT files to your home directory on the GPU machine with scp or rsync.
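A copy with rsync might look like the sketch below. The function name copy_to_gpu, the DRY_RUN switch and the choice of the -av --partial options are assumptions for this example; the host name is the login host given above, and a destination without a leading slash is interpreted relative to your home directory on the remote side.

```shell
#!/bin/sh
# Hypothetical helper: copy files into a directory below your home directory
# on the cluster. Set DRY_RUN=1 to print the command instead of running it.
# Usage: copy_to_gpu TIM_ID DEST_DIR FILE...
copy_to_gpu() {
    user=$1
    dest_dir=$2
    shift 2
    target="$user@cluster.rz.rwth-aachen.de:$dest_dir/"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo rsync -av --partial "$@" "$target"
    else
        # -a preserves file attributes, -v lists transferred files,
        # --partial keeps partially transferred files after an interruption.
        rsync -av --partial "$@" "$target"
    fi
}
```

With DRY_RUN=1, a call like copy_to_gpu <tim-kennung> data events.root only prints the rsync command, which is a cheap way to check the destination before moving large files. The file and directory names here are placeholders.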
