_uw-research-computing/htcondor-job-submission.md
Lines changed: 73 additions & 21 deletions
Original file line number
Diff line number
Diff line change
@@ -35,6 +35,7 @@ We are going to run the traditional 'hello world' program with a CHTC twist. In
35
35
36
36
> You can follow along with the job submission tutorial outlined in this guide in video format.
37
37
> <iframe width="560" height="315" src="https://www.youtube.com/embed/d5siupeu2kE?si=32FUkZyceV9ROfb1" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
38
+
> You may notice that the example in the video is slightly different: it uses `executable` and `arguments` in the submit file instead of `shell`. This reflects an older submit convention; either form still works!
38
39
{:.tip}
39
40
40
41
### Prepare job executable and submit file on an Access Point
@@ -55,7 +56,29 @@ We are going to run the traditional 'hello world' program with a CHTC twist. In
55
56
sleep 180
56
57
```
57
58
58
-
This script would be run locally on our terminal by typing `hello-world.sh <FirstArgument>`.
59
+
Let's test this script locally. First, let's add *executable* permissions to the script with the `chmod` command, which allows us to execute the code.
60
+
61
+
```
62
+
chmod +x hello-world.sh
63
+
```
64
+
{:.term}
65
+
66
+
Test the code by typing the following line:
67
+
```
68
+
./hello-world.sh 0
69
+
```
70
+
{:.term}
71
+
72
+
You should see a message printed to the terminal, like so:
73
+
74
+
```
75
+
[alice@ap2002 hello-world]$ ./hello-world.sh 0
76
+
Hello CHTC from Job 0 running on alice@ap2002
77
+
```
78
+
{:.term}
79
+
80
+
The terminal will pause for 3 minutes, as specified by `sleep 180` in our script. Cancel the pause time by pressing `CTRL + C`. Now we've successfully run the script locally!
81
+
59
82
However, to run it on CHTC, we will use our HTCondor submit file to run the `hello-world.sh` executable and to automatically pass different arguments to our script.
60
83
61
84
3. Prepare your HTCondor submit file, which you will use to tell HTCondor what job to run and how to run it.
@@ -64,41 +87,39 @@ We are going to run the traditional 'hello world' program with a CHTC twist. In
64
87
65
88
```
66
89
# hello-world.sub
67
-
# My HTCondor submit file
68
90
69
91
# Specify your executable (single binary or a script that runs several
70
92
# commands) and arguments to be passed to jobs.
71
93
# $(Process) will be an integer number for each job, starting with "0"
72
94
# and increasing for the relevant number of jobs.
73
-
executable = hello-world.sh
74
-
arguments = $(Process)
95
+
shell = ./hello-world.sh $(Process)
75
96
76
-
# Specify the name of the log, standard error, and standard output (or "screen output") files. Wherever you see $(Cluster), HTCondor will insert the
97
+
# Specify the name of the log, standard error, and standard output (or
98
+
# "screen output") files. Wherever you see $(Cluster), HTCondor will insert the
77
99
# queue number assigned to this set of jobs at the time of submission.
78
100
log = hello-world_$(Cluster)_$(Process).log
79
101
error = hello-world_$(Cluster)_$(Process).err
80
102
output = hello-world_$(Cluster)_$(Process).out
81
103
82
-
# This line *would* be used if there were any other files
# Tell HTCondor requirements (e.g., operating system) your job needs,
87
-
# what amount of compute resources each job will need on the computer where it runs.
107
+
# Requirements (e.g., operating system) your job needs, and the amount of
108
+
# compute resources each job will need on the computer where it runs.
88
109
request_cpus = 1
89
110
request_memory = 1GB
90
111
request_disk = 5GB
91
112
92
-
# Tell HTCondor to run 3 instances of our job:
113
+
# Run 3 instances of our job:
93
114
queue 3
94
115
```
95
116
{:.sub}
96
117
97
-
By using the "$1" variable in our hello-world.sh executable, we are telling HTCondor to fetch the value of the argument in the first position in the submit file and to insert it in location of "$1" in our executable file.
118
+
By using the `$1` variable in our `hello-world.sh` executable, we tell the script to take the value of the first argument it receives (the first argument given in the submit file) and insert it wherever `$1` appears in the executable file.
98
119
99
-
Therefore, when HTCondor runs this executable, it will pass the $(Process) value for each job and hello-world.sh will insert that value for "$1" in hello-world.sh.
120
+
Therefore, when HTCondor runs this executable, it will pass the `$(Process)` value for each job, and `hello-world.sh` will substitute that value for `$1`.
100
121
101
-
More information on special variables like "$1", "$2", and "$@" can be found [here](https://swcarpentry.github.io/shell-novice/06-script.html).
122
+
More information on special variables like `$1`, `$2`, and `$@` can be found [here](https://swcarpentry.github.io/shell-novice/06-script.html).
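
The argument passing described above can be tried locally. The following is a minimal sketch, not the guide's actual script: `hello_world` and `show_args` are hypothetical stand-ins (with the `sleep` left out), and the loop only imitates the `$(Process)` values that `queue 3` would generate.

```shell
#!/bin/bash
# Hypothetical stand-in for hello-world.sh (without the sleep), written as a
# function so the example is self-contained.
hello_world() {
    # Inside a function, as inside a script, $1 is the first argument.
    echo "Hello CHTC from Job $1"
}

# Imitate "queue 3": HTCondor would start one job per $(Process) value 0, 1, 2.
for process in 0 1 2; do
    hello_world "$process"
done

# $# counts the arguments; $* joins all of them into one string
# ("$@" would keep them as separate words instead).
show_args() {
    echo "count=$# first=$1 second=$2 all=$*"
}
show_args alpha beta
```

On CHTC, HTCondor rather than a loop supplies the argument, and each job runs separately with its own `$(Process)` value.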
102
123
103
124
5. Now, submit your job to HTCondor’s queue using `condor_submit`:
104
125
@@ -191,7 +212,7 @@ We are going to run the traditional 'hello world' program with a CHTC twist. In
191
212
192
213
## Important Workflow Elements
193
214
194
-
**A. Removing Jobs**
215
+
### Removing Jobs
195
216
196
217
To remove a specific job, use `condor_rm <JobID, ClusterID, Username>`.
197
218
Example:
@@ -201,13 +222,11 @@ Example:
201
222
```
202
223
{:.term}
203
224
204
-
**B. Importance of Testing & Resource Optimization**
225
+
### Test and Optimize Resources
205
226
206
-
1. **Examine Job Success** Within the log file, you can see information about the completion of each job, including a system error code (as seen in "return value 0").
207
-
You can use this code, as well as information in your ".err" file and other output files, to determine what issues your job(s) may have had, if any.
227
+
1. **Examine Job Success**. Within the log file, you can see information about the completion of each job, including a system error code (as seen in "return value 0"). You can use this code, as well as information in your ".err" file and other output files, to determine what issues your job(s) may have had, if any.
208
228
209
-
2. **Improve Efficiency** Researchers with input and output files greater than 1GB, should store them in their `/staging` directory instead of `/home` to improve file transfer efficiency.
210
-
See our data transfer guides to learn more.
229
+
2. **Improve Efficiency**. Researchers with input and output files greater than 1GB should store them in their `/staging` directory instead of `/home` to improve file transfer efficiency. See our [data transfer guides](htc-job-file-transfer) to learn more.
211
230
212
231
3. **Get the Right Resource Requests**
213
232
Be sure to always add or modify the following lines in your submit files, as appropriate, and after running a few tests.
@@ -237,4 +256,37 @@ Example:
237
256
To learn more about why a job has gone on hold, use `condor_q -hold`.
238
257
When you request too much, your jobs may not match to as many available "slots" as they could otherwise, and your overall throughput will suffer.
239
258
240
-
## You have the basics, now you are ready to run your OWN jobs!
259
+
## Use `shell` or `executable`/`arguments` in your submit file
260
+
261
+
You can use either `shell`, or `executable` together with `arguments`, in your submit file to specify how to run your jobs.
262
+
263
+
### Option 1: Submit with `shell`
264
+
265
+
You can use `shell` to specify the whole command you want to run.
266
+
267
+
```
268
+
shell = ./hello-world.sh $(Process)
269
+
transfer_input_files = hello-world.sh
270
+
```
271
+
272
+
When using `shell`, consider:
273
+
274
+
* **Do you need to transfer your executable?** You may need to add your executable script (e.g., `hello-world.sh`) to the `transfer_input_files` line, as HTCondor does not automatically detect scripts that need to be transferred.
275
+
* If you are using `./` to execute your code, as in the example above, **ensure your shell script has executable permissions** with the `chmod +x <script>` command.
276
+
* Alternatively, **you may use a shell like `bash` to execute your code** (e.g., `shell = bash hello-world.sh 0`). When you use this option, you do not have to give your shell script executable permissions.
277
+
* **Keep your `shell` command simple**; quoting and special characters may throw errors. If you need complex scripting, we recommend writing a wrapper script.
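
The difference between the `./` form and the `bash` form can be checked locally before submitting. This is a small sketch under stated assumptions: `demo.sh` is a hypothetical stand-in for `hello-world.sh`, created in a temporary directory, and the temporary directory is assumed to allow executing files.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for hello-world.sh; the single quotes keep $1 literal.
printf '#!/bin/bash\necho "Hello CHTC from Job $1"\n' > demo.sh

# Without executable permissions, naming the interpreter still works.
via_bash=$(bash demo.sh 0)
echo "$via_bash"

# After chmod +x, the ./ form works as well.
chmod +x demo.sh
via_dot=$(./demo.sh 0)
echo "$via_dot"
```

Both calls print the same message; only the `./` form depends on the executable permission being set.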
278
+
279
+
### Option 2: `executable` and `arguments`
280
+
281
+
In this convention, you break your command into two parts—the executable and the arguments.
282
+
283
+
```
284
+
executable = hello-world.sh
285
+
arguments = $(Process)
286
+
```
287
+
288
+
When using this option:
289
+
290
+
* **HTCondor will transfer your executable by default.** You do not need to list your executable in `transfer_input_files`.
291
+
* You do not have to add a `./` or `/bin/bash` to the beginning of your `executable` line.
292
+
* You do not have to give your `executable` script executable permissions.