_uw-research-computing/htcondor-job-submission.md
Lines changed: 80 additions & 22 deletions
@@ -35,6 +35,7 @@ We are going to run the traditional 'hello world' program with a CHTC twist. In
35
35
36
36
> You can follow along with the job submission tutorial outlined in this guide in video format.
37
37
> <iframe width="560" height="315" src="https://www.youtube.com/embed/d5siupeu2kE?si=32FUkZyceV9ROfb1" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
38
+
> You may notice that the example in the video is slightly different: it uses `executable` and `arguments` in the submit file instead of `shell`. This reflects an older submit convention, but either approach still works!
38
39
{:.tip}
39
40
40
41
### Prepare job executable and submit file on an Access Point
@@ -55,50 +56,76 @@ We are going to run the traditional 'hello world' program with a CHTC twist. In
55
56
sleep 180
56
57
```
57
58
58
-
This script would be run locally on our terminal by typing `hello-world.sh <FirstArgument>`.
59
-
However, to run it on CHTC, we will use our HTCondor submit file to run the `hello-world.sh` executable and to automatically pass different arguments to our script.
59
+
Let's test this script locally. First, let's add *executable* permissions to the script with the `chmod` command, which allows us to execute the code.
60
+
61
+
```
62
+
chmod +x hello-world.sh
63
+
```
64
+
{:.term}
65
+
66
+
Test the code by typing the following line:
67
+
```
68
+
./hello-world.sh 0
69
+
```
70
+
{:.term}
71
+
72
+
You should see a message printed to the terminal, like so:
73
+
74
+
```
75
+
[alice@ap2002 hello-world]$ ./hello-world.sh 0
76
+
Hello CHTC from Job 0 running on alice@ap2002
77
+
```
78
+
{:.term}
79
+
80
+
The terminal will pause for 3 minutes, as specified by `sleep 180` in our script. Cancel the pause time by pressing `CTRL + C`. Now we've successfully run the script locally!
81
+
82
+
However, to run it on CHTC, we will use our HTCondor submit file to run the `hello-world.sh` executable and to automatically pass different arguments to our script.
83
+
84
+
> ### ⚠️ Do not test your full workload directly on the Access Points!
85
+
{:.tip-header}
86
+
87
+
> Simple scripts, such as this example, which use few compute resources, are safe to test, but **any script or executable that requires computing power or excessive memory should be tested inside of a job.**
88
+
{:.tip}
60
89
61
90
3. Prepare your HTCondor submit file, which you will use to tell HTCondor what job to run and how to run it.
62
91
Copy the text below, and paste it into a file called `hello-world.sub`.
63
92
This is the file you will submit to HTCondor to describe your jobs (known as the submit file).
64
93
65
94
```
66
95
# hello-world.sub
67
-
# My HTCondor submit file
68
96
69
97
# Specify your executable (single binary or a script that runs several
70
98
# commands) and arguments to be passed to jobs.
71
99
# $(Process) will be an integer number for each job, starting with "0"
72
100
# and increasing for the relevant number of jobs.
73
-
executable = hello-world.sh
74
-
arguments = $(Process)
101
+
shell = ./hello-world.sh $(Process)
75
102
76
-
# Specify the name of the log, standard error, and standard output (or "screen output") files. Wherever you see $(Cluster), HTCondor will insert the
103
+
# Specify the name of the log, standard error, and standard output (or
104
+
# "screen output") files. Wherever you see $(Cluster), HTCondor will insert the
77
105
# queue number assigned to this set of jobs at the time of submission.
78
106
log = hello-world_$(Cluster)_$(Process).log
79
107
error = hello-world_$(Cluster)_$(Process).err
80
108
output = hello-world_$(Cluster)_$(Process).out
81
109
82
-
# This line *would* be used if there were any other files
# Tell HTCondor requirements (e.g., operating system) your job needs,
87
-
# what amount of compute resources each job will need on the computer where it runs.
113
+
# Requirements (e.g., operating system) your job needs and the amount of
114
+
# compute resources each job will need on the computer where it runs.
88
115
request_cpus = 1
89
116
request_memory = 1GB
90
117
request_disk = 5GB
91
118
92
-
# Tell HTCondor to run 3 instances of our job:
119
+
# Run 3 instances of our job:
93
120
queue 3
94
121
```
95
122
{:.sub}
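The submit file above names its log, error, and output files with `$(Cluster)` and `$(Process)`. As a rough illustration of how those names expand for `queue 3`, the sketch below prints the output file names you might expect. The cluster ID `12345` and the helper `list_job_files` are hypothetical; HTCondor assigns the real cluster ID when you submit.

```bash
# list_job_files is a hypothetical helper for illustration only.
# Usage: list_job_files <cluster_id> <number_of_jobs>
list_job_files() {
    cluster="$1"
    njobs="$2"
    process=0
    # $(Process) starts at 0 and increases by 1 for each queued job.
    while [ "$process" -lt "$njobs" ]; do
        echo "hello-world_${cluster}_${process}.out"
        process=$((process + 1))
    done
}

# 12345 is an assumed cluster ID; "queue 3" gives processes 0, 1, and 2.
list_job_files 12345 3
```

The same pattern applies to the `.log` and `.err` files, so each job writes to its own set of files.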
96
123
97
-
By using the "$1" variable in our hello-world.sh executable, we are telling HTCondor to fetch the value of the argument in the first position in the submit file and to insert it in location of "$1" in our executable file.
124
+
By using the `$1` variable in our `hello-world.sh` executable, we are telling the script to take the first argument it receives (here, the value supplied in the submit file) and to insert it in place of `$1`.
98
125
99
-
Therefore, when HTCondor runs this executable, it will pass the $(Process) value for each job and hello-world.sh will insert that value for "$1" in hello-world.sh.
126
+
Therefore, when HTCondor runs this executable, it will pass the `$(Process)` value for each job, and `hello-world.sh` will substitute that value for `$1`.
100
127
101
-
More information on special variables like "$1", "$2", and "$@" can be found [here](https://swcarpentry.github.io/shell-novice/06-script.html).
128
+
More information on special variables like `$1`, `$2`, and `$@` can be found [here](https://swcarpentry.github.io/shell-novice/06-script.html).
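To see how these positional variables behave, here is a minimal sketch. The function `show_args` is a hypothetical name used only for illustration; inside a function, `$1`, `$2`, and `$@` refer to the function's arguments in the same way they refer to a script's arguments.

```bash
# show_args is a hypothetical helper, not part of hello-world.sh.
show_args() {
    echo "first: $1"    # the argument in the first position
    echo "second: $2"   # the argument in the second position
    echo "all: $@"      # every argument, in order
}

show_args 0 alpha beta
```

Running this prints `first: 0`, `second: alpha`, and `all: 0 alpha beta`. In our job, the value in the first position is the `$(Process)` number.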
102
129
103
130
5. Now, submit your job to HTCondor’s queue using `condor_submit`:
104
131
@@ -191,7 +218,7 @@ We are going to run the traditional 'hello world' program with a CHTC twist. In
191
218
192
219
## Important Workflow Elements
193
220
194
-
**A. Removing Jobs**
221
+
### Removing Jobs
195
222
196
223
To remove a specific job, use `condor_rm <JobID, ClusterID, Username>`.
197
224
Example:
@@ -201,13 +228,11 @@ Example:
201
228
```
202
229
{:.term}
203
230
204
-
**B. Importance of Testing & Resource Optimization**
231
+
### Test and Optimize Resources
205
232
206
-
1.**Examine Job Success** Within the log file, you can see information about the completion of each job, including a system error code (as seen in "return value 0").
207
-
You can use this code, as well as information in your ".err" file and other output files, to determine what issues your job(s) may have had, if any.
233
+
1. **Examine Job Success**. Within the log file, you can see information about the completion of each job, including the job's exit code (for example, "return value 0"). You can use this code, as well as information in your ".err" file and other output files, to determine what issues your job(s) may have had, if any.
208
234
209
-
2.**Improve Efficiency** Researchers with input and output files greater than 1GB, should store them in their `/staging` directory instead of `/home` to improve file transfer efficiency.
210
-
See our data transfer guides to learn more.
235
+
2. **Improve Efficiency**. Researchers with input and output files greater than 1GB should store them in their `/staging` directory instead of `/home` to improve file transfer efficiency. See our [data transfer guides](htc-job-file-transfer) to learn more.
211
236
212
237
3. **Get the Right Resource Requests**
213
238
Be sure to always add or modify the following lines in your submit files, as appropriate, and after running a few tests.
@@ -237,4 +262,37 @@ Example:
237
262
To learn more about why a job has gone on hold, use `condor_q -hold`.
238
263
When you request too much, your jobs may not match to as many available "slots" as they could otherwise, and your overall throughput will suffer.
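For the "Examine Job Success" step above, a quick way to scan a log for non-zero return values might look like the sketch below. The helper `count_failures` is hypothetical, and the log excerpt is a simplified illustration; real HTCondor logs contain more events and detail, so treat this as a starting point rather than a complete check.

```bash
# count_failures is a hypothetical helper, not an HTCondor command. It counts
# the "return value N" entries in a job log where N is not 0.
count_failures() {
    grep -o 'return value [0-9]*' "$1" | grep -vc 'return value 0$'
}

# Simplified, illustrative log excerpt written to a temporary file.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
005 (12345.000.000) Job terminated.
	(1) Normal termination (return value 0)
005 (12345.001.000) Job terminated.
	(1) Normal termination (return value 1)
EOF

count_failures "$sample_log"
rm -f "$sample_log"
```

For this excerpt the count is 1. A non-zero count is a cue to look at the matching ".err" and ".out" files for that job.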
239
264
240
-
## You have the basics, now you are ready to run your OWN jobs!
265
+
## Use `shell` or `executable`/`arguments` in your submit file
266
+
267
+
You can use either `shell` or the combination of `executable` and `arguments` in your submit file to specify how to run your jobs.
268
+
269
+
### Option 1: Submit with `shell`
270
+
271
+
You can use `shell` to specify the whole command you want to run.
272
+
273
+
```
274
+
shell = ./hello-world.sh $(Process)
275
+
transfer_input_files = hello-world.sh
276
+
```
277
+
278
+
When using `shell`, consider:
279
+
280
+
* **Do you need to transfer your executable?** You may need to add your executable script (e.g., `hello-world.sh`) in the `transfer_input_files` line, as HTCondor does not have the ability to autodetect scripts to be transferred.
281
+
* If you are using `./` to execute your code, as in the example above, **ensure your shell script has executable permissions** with the `chmod +x <script>` command.
282
+
* Alternatively, **you may use a shell like `bash` to execute your code** (e.g., `shell = bash hello-world.sh 0`). When you use this option, you do not have to give your shell script executable permissions.
283
+
* **Keep your `shell` command simple**; quoting and special characters may throw errors. If you need complex scripting, we recommend writing a wrapper script.
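The permissions point above can be checked before you write the `shell` line. The sketch below uses a hypothetical helper, `shell_line_for`, which is not an HTCondor tool: it suggests the `./` form when the script is executable and the `bash` form otherwise.

```bash
# shell_line_for is a hypothetical helper for illustration only.
# Usage: shell_line_for <script>
shell_line_for() {
    script="$1"
    if [ -x "$script" ]; then
        # Script has executable permissions, so "./" will work.
        echo "shell = ./$script \$(Process)"
    else
        # No executable permissions: run it through bash instead.
        echo "shell = bash $script \$(Process)"
    fi
}

shell_line_for hello-world.sh
```

Either way, remember that the script may still need to be listed in `transfer_input_files`.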
284
+
285
+
### Option 2: `executable` and `arguments`
286
+
287
+
In this convention, you break your command into two parts—the executable and the arguments.
288
+
289
+
```
290
+
executable = hello-world.sh
291
+
arguments = $(Process)
292
+
```
293
+
294
+
When using this option:
295
+
296
+
* **HTCondor will transfer your executable by default.** You do not need to list your executable in `transfer_input_files`.
297
+
* You do not have to add a `./` or `/bin/bash` to the beginning of your `executable` line.
298
+
* You do not have to give your `executable` script executable permissions.