Commit 4a7cebf

Add documentation for retry_request_memory
1 parent 8653ea7 commit 4a7cebf

2 files changed, +66 -0 lines changed

_data/htc-guide-menu.yml

Lines changed: 3 additions & 0 deletions
@@ -183,6 +183,9 @@
 - text: "Known issues"
   url: "/uw-research-computing/htc-known-issues"
   icon: ""
+- text: "Request variable memory"
+  url: "/uw-research-computing/variable-memory"
+  icon: ""
 - text: "Windows/Linux incompatibility"
   url: "/uw-research-computing/dos-unix"
   icon: ""
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
---
highlighter: none
layout: guide
title: Request variable memory
guide:
    category: Troubleshooting
    tag:
        - htc
---

## Introduction

Over-requesting memory may cause your jobs to wait idle longer than necessary, while under-requesting memory may cause your jobs to go on hold when they exceed the memory allocated to them.

**This page outlines strategies for requesting variable amounts of memory in jobs.** This guide is for users whose jobs' memory usage may spike unexpectedly or vary depending on inputs or other conditions.

{% capture content %}
- [Introduction](#introduction)
{% endcapture %}
{% include /components/directory.html title="Table of Contents" %}

## Option 1: Use `retry_request_memory`

This submit file option is a good fit for submissions in which **a few jobs have unexpected spikes in memory usage**. To use this feature, add this line to your submit file:

```
retry_request_memory = <memory>
```

If your job is evicted because it uses more memory than it was allocated, the `retry_request_memory` option tells HTCondor to retry the job with the specified, larger amount of memory.

For example, if you use these lines in your submit file:

```
request_memory = 1 GB
retry_request_memory = 4 GB
```

Each job generated in this submission will request 1 GB of memory. If the job is evicted because it uses more than 1 GB of memory, the job will be restarted with 4 GB of memory.
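
To make the two-step behavior in the example above concrete, here is a minimal Python sketch that models it. This is not HTCondor code: `plan_attempts` is a hypothetical helper, `peak_gb` stands in for a job's real peak memory usage, and the outcome label for a job that still exceeds the retry value is an assumption, since this page only describes the first retry.

```python
def plan_attempts(request_gb, retry_gb, peak_gb):
    """Model the memory requests a job goes through with retry_request_memory.

    request_gb: initial request_memory, in GB
    retry_gb:   retry_request_memory, in GB
    peak_gb:    hypothetical peak memory the job actually needs, in GB
    Returns a tuple (attempts, outcome).
    """
    attempts = [request_gb]
    if peak_gb <= request_gb:
        return attempts, "completed"
    # The job exceeded its first allocation, so it is retried once
    # with the larger retry value.
    attempts.append(retry_gb)
    if peak_gb <= retry_gb:
        return attempts, "completed"
    # Assumption: no further retries are modeled past this point.
    return attempts, "exceeded retry memory"


print(plan_attempts(1, 4, 0.8))  # a job that fits in the initial 1 GB
print(plan_attempts(1, 4, 3.0))  # a job that needs the 4 GB retry
```

Most jobs in such a submission would stay on the first, smaller request; only the few that spike would ever reach the second value.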

## Option 2: Use `retry_request_memory_increase` and `retry_request_memory_max`

If you need a more incremental series of memory requests, you can use these two submit file options together.

```
retry_request_memory_increase = <quantity to add or RequestMemory expression>
retry_request_memory_max = <memory>
```

This option works similarly to `retry_request_memory`, except that it allows multiple retries with incrementally larger memory requests.

For example, if you use these lines in your submit file:

```
request_memory = 1 GB
retry_request_memory_increase = RequestMemory*4
retry_request_memory_max = 16 GB
```

Your jobs will run at up to three levels of increasing memory (1 GB, 4 GB, and 16 GB) until they succeed. If a job exceeds 16 GB of memory, it will go on hold.
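
The 1 GB, 4 GB, 16 GB sequence can be reproduced with a short Python sketch. This is only an illustrative model, not HTCondor code: `memory_ladder` and `outcome` are hypothetical helpers, and capping the last step at the maximum is an assumption for cases where the expression would overshoot `retry_request_memory_max`.

```python
def memory_ladder(start_gb, next_request, max_gb):
    """List the memory requests a job would step through, in GB.

    next_request models the retry_request_memory_increase expression as a
    function of the current request, e.g. lambda m: m * 4 for RequestMemory*4.
    """
    ladder = [start_gb]
    current = start_gb
    while current < max_gb:
        proposed = next_request(current)
        if proposed <= current:
            break  # guard: an expression that does not grow would loop forever
        # Assumption: a step that overshoots the maximum is capped at it.
        current = min(proposed, max_gb)
        ladder.append(current)
    return ladder


def outcome(ladder, peak_gb):
    """Report where a job with the given peak memory need would end up."""
    for attempt, memory_gb in enumerate(ladder, start=1):
        if peak_gb <= memory_gb:
            return f"succeeds on attempt {attempt} with {memory_gb} GB"
    return "goes on hold"


ladder = memory_ladder(1, lambda m: m * 4, 16)
print(ladder)
for peak in (0.5, 3, 12, 20):
    print(peak, "GB peak:", outcome(ladder, peak))
```

Swapping in a different function for `next_request`, such as `lambda m: m + 2`, shows how an additive increase produces more, smaller steps before the maximum is reached.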

## Related pages
* [Job submission basics](htcondor-job-submission)
* [Monitor your job](condor_q)
