To install the conda environment, run:

```bash
conda env create -f environment.yml
```

then activate the environment with:

```bash
conda activate drl_artificial_unintelligence
```

On first install, retrieve the additional pip dependencies by running the `post_install.sh` script in a bash terminal:

```bash
bash post_install.sh
```
To monitor training, a TensorBoard logger is provided. It can be accessed by opening a bash terminal and running:

```bash
tensorboard --logdir runs --port 6006
```

The plots can then be viewed at http://localhost:6006/ in your local browser.
This guide walks you through submitting jobs to the HPC cluster using the SLURM workload manager.
Download and install Bitvise SSH Client from the official website. This will be your primary tool for connecting to the cluster and transferring files.
In Bitvise SSH Client, set up your connection:

- Host: `cool.hpc.lrz.de`
- Username: `drlearn001`
- Authentication: `sm-7xwnZP+8s`

Configure the authentication:

- Password: Enter your account password
- MFA (Multi-Factor Authentication): Use `drlearn001` as the MFA token
Once connected, you'll see a sidebar on the left with two important buttons:
- New Terminal Console: Opens a command-line interface for running commands
- New SFTP Window: Opens a file transfer interface for uploading/downloading files
Use the SFTP window to upload the following essential files to your cluster home directory:
- `master.sh` - The main submission script
- `run_bash.cmd` - The SLURM job script
- `learning.py` - Your Python script (or other computational files)
The `run_bash.cmd` file is a SLURM batch script that defines how your job should be executed on the cluster. Here's what each section does:

- `#SBATCH -J test`: Sets the job name to "test"
- `#SBATCH -o ./%x.%j.%N.out`: Defines the output file naming pattern (job name, job ID, node name)
- `#SBATCH -D ./`: Sets the working directory to the current directory
- `#SBATCH --clusters=serial`: Selects the serial cluster
- `#SBATCH --partition=serial_std`: Uses the standard serial partition
- `#SBATCH --mem=5000mb`: Allocates 5000 MB (~5 GB) of memory
- `#SBATCH --cpus-per-task=1`: Requests 1 CPU core
- `#SBATCH --time=10:00:00`: Sets the maximum runtime to 10 hours
- `#SBATCH [email protected]`: Email address for job notifications

- `module load python/3.8.11-base`: Loads Python 3.8.11
- `module load slurm_setup`: Loads the SLURM configuration
- `source ../venv/bin/activate`: Activates the Python virtual environment

```bash
python learning.py --dummy_variable=${VARIABLE} --hyperparameter_value=${HYPERPARAMETER}
```

Runs your Python script with the exported environment variables.
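Put together, a `run_bash.cmd` along these lines matches the directives described above. Treat it as a sketch reassembled from the description, not the exact file:

```shell
#!/bin/bash
#SBATCH -J test
#SBATCH -o ./%x.%j.%N.out
#SBATCH -D ./
#SBATCH --clusters=serial
#SBATCH --partition=serial_std
#SBATCH --mem=5000mb
#SBATCH --cpus-per-task=1
#SBATCH --time=10:00:00
#SBATCH [email protected]

# Environment setup
module load python/3.8.11-base
module load slurm_setup
source ../venv/bin/activate

# Run the script with the variables exported by master.sh
python learning.py --dummy_variable=${VARIABLE} --hyperparameter_value=${HYPERPARAMETER}
```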
The `master.sh` script automates the submission of multiple jobs with different parameters:

```bash
HYPERPARAMETER_A="hyp_A"
HYPERPARAMETER_B="hyp_B"
VARIABLE=4

for HYPERPARAMETER in "$HYPERPARAMETER_A" "$HYPERPARAMETER_B"
do
    sbatch --job-name="test" --export=VARIABLE=$VARIABLE,HYPERPARAMETER=$HYPERPARAMETER run_bash.cmd
done
```

The script:

- Defines two hyperparameter values (`hyp_A` and `hyp_B`)
- Sets a variable value (`4`)
- Loops through each hyperparameter
- Submits a separate SLURM job for each hyperparameter using `sbatch`
- Passes environment variables to each job
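Before touching the cluster, the loop can be checked locally with a dry run that prints each `sbatch` command instead of submitting it (a sketch; only the `echo` is added):

```shell
#!/bin/bash
# Dry run: print the sbatch command the loop would submit for each value.
HYPERPARAMETER_A="hyp_A"
HYPERPARAMETER_B="hyp_B"
VARIABLE=4

for HYPERPARAMETER in "$HYPERPARAMETER_A" "$HYPERPARAMETER_B"
do
    echo sbatch --job-name="test" --export=VARIABLE=$VARIABLE,HYPERPARAMETER=$HYPERPARAMETER run_bash.cmd
done
```

This should print two lines, one per hyperparameter, so you can verify the `--export` string before submitting for real.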
Your Python scripts need to be structured to work with the SLURM environment. Here are the key requirements:
Your script should use `argparse` to handle command-line arguments that will be passed from the SLURM job:

```python
import argparse

def main():
    parser = argparse.ArgumentParser(description="Your script description")
    parser.add_argument('--dummy_variable', type=str, required=False, help='Variable description')
    parser.add_argument('--hyperparameter_value', type=str, required=False, help='Hyperparameter description')
    args = parser.parse_args()
```

Ensure your script produces output that can be captured:
- Print important information to stdout
- Write results to files for persistence
- Handle missing or None arguments gracefully
- Write output files to the current working directory
- Use relative paths when possible
- Ensure proper file permissions
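The "handle missing or None arguments gracefully" point can be sketched as follows; the argument names follow the earlier snippet, while the fallback values are assumptions, not part of the original scripts:

```python
import argparse

def parse_args(argv=None):
    # Both arguments are optional, since SLURM may not export every variable.
    parser = argparse.ArgumentParser(description="Graceful-defaults sketch")
    parser.add_argument('--dummy_variable', type=str, required=False)
    parser.add_argument('--hyperparameter_value', type=str, required=False)
    args = parser.parse_args(argv)
    # Fall back to assumed defaults instead of crashing on None downstream
    if args.dummy_variable is None:
        args.dummy_variable = "default_var"
    if args.hyperparameter_value is None:
        args.hyperparameter_value = "hyp_A"
    return args

args = parse_args([])  # simulate a job launched with no arguments
print(args.dummy_variable, args.hyperparameter_value)  # -> default_var hyp_A
```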
A complete example:

```python
import argparse

def main():
    # Parse command line arguments
    parser = argparse.ArgumentParser(description="Your ML/analysis script")
    parser.add_argument('--param1', type=str, required=False, help='Parameter 1')
    parser.add_argument('--param2', type=str, required=False, help='Parameter 2')
    args = parser.parse_args()

    # Your computation logic here
    print(f"Running with param1: {args.param1}")
    print(f"Running with param2: {args.param2}")

    # Write results to file
    with open("results.txt", "w") as f:
        f.write(f"Results for {args.param1}, {args.param2}\n")

if __name__ == "__main__":
    main()
```

1. Upload all files via SFTP
2. Open a terminal console
3. Submit jobs using either method:

   Option A - Using bash (no chmod needed):

   ```bash
   bash master.sh
   ```

   Option B - Making the scripts executable first:

   ```bash
   chmod +x master.sh run_bash.cmd
   ./master.sh
   ```

4. Monitor jobs:

   ```bash
   squeue -u drlearn001
   ```

5. Check output files when jobs complete
The system will create output files with names like `test.<jobid>.<nodename>.out` containing your script's output and any error messages.
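As a quick illustration of how that filename is assembled from the `%x.%j.%N` pattern (the job ID and node name below are assumed placeholders):

```shell
# Mimic SLURM's %x.%j.%N expansion with assumed values:
# %x = job name, %j = job ID, %N = node name.
JOB_NAME="test"
JOB_ID="12345"     # assumed job ID
NODE="node01"      # assumed node name
OUT_FILE="${JOB_NAME}.${JOB_ID}.${NODE}.out"
echo "$OUT_FILE"   # prints test.12345.node01.out
```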