
Commit 3a633cc

authored
Merge pull request #935 from CHTC/preview-retry-request-memory
Add documentation for retry_request_memory
2 parents d537905 + 5f344b9 commit 3a633cc

2 files changed

+73
-0
lines changed

_data/htc-guide-menu.yml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -183,6 +183,9 @@
183183
- text: "Known issues"
184184
url: "/uw-research-computing/htc-known-issues"
185185
icon: ""
186+
- text: "Request variable memory"
187+
url: "/uw-research-computing/variable-memory"
188+
icon: ""
186189
- text: "Windows/Linux incompatibility"
187190
url: "/uw-research-computing/dos-unix"
188191
icon: ""
Lines changed: 70 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,70 @@
1+
---
2+
highlighter: none
3+
layout: guide
4+
title: Request variable memory
5+
guide:
6+
category: Troubleshooting
7+
tag:
8+
- htc
9+
---
10+
11+
## Introduction
12+
13+
**This page outlines strategies for requesting variable amounts of memory in jobs.** This guide is for users whose jobs' memory usage may spike unexpectedly or vary depending on inputs or other conditions.
14+
15+
{% capture content %}
16+
- [Introduction](#introduction)
17+
- [Why you should care about memory usage](#why-you-should-care-about-memory-usage)
18+
- [Use `retry_request_memory`](#use-retry_request_memory)
19+
- [Related pages](#related-pages)
20+
{% endcapture %}
21+
{% include /components/directory.html title="Table of Contents" %}
22+
23+
If your job has ever gone on hold for exceeding its memory request, you've probably solved it by increasing the `request_memory` attribute in your submit file. You might even routinely over-request memory, just to be on the safe side.
24+
25+
## Why you should care about memory usage
26+
27+
Because CHTC is a shared resource, requesting only the resources your jobs actually need helps ensure that both you and other users have a good experience on the system.
28+
29+
* **Over-requesting memory** may cause your jobs to **sit idle** for longer than needed, since HTCondor has to find and allocate larger resource slots for them. The memory your job requests but does not use could also have gone to other users' jobs.
30+
31+
* **Under-requesting memory** may cause your jobs to **go on hold** when they exceed the memory allocated to them. Any work the job has completed will be lost, but the computing time will still affect your priority.
32+
33+
But what if only a **fraction** of your jobs need more memory than the rest? How can you get the throughput you need without over-requesting memory?
34+
35+
## Use `retry_request_memory`
36+
37+
This submit file option works well for a list of jobs where only **a few of the jobs have unexpected spikes in memory usage**. To use this feature, add this line to your submit file:
38+
39+
```
40+
retry_request_memory = <memory>
41+
```
42+
43+
If your job is evicted because it uses more memory than allocated, the `retry_request_memory` option tells HTCondor to retry the job with the specified increased memory.
44+
45+
For example, if you use these lines in your submit file:
46+
47+
```
48+
request_memory = 1 GB
49+
retry_request_memory = 4 GB
50+
```
51+
52+
Each job generated in this submission will request 1 GB of memory. If the job is evicted because it uses more than 1 GB of memory, the job will be restarted with 4 GB of memory.
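
For context, here is a minimal sketch of where these two lines might sit in a complete submit file. The executable name, file names, disk request, and job count are hypothetical placeholders, not values from this guide; only the `request_memory` and `retry_request_memory` lines come from the example above.

```
# Hypothetical job: my_script.sh and the file names below are placeholders
executable = my_script.sh
arguments = $(Process)

log = job_$(Cluster).log
output = job_$(Cluster)_$(Process).out
error = job_$(Cluster)_$(Process).err

request_cpus = 1
# First attempt: every job asks for 1 GB
request_memory = 1 GB
# Only jobs evicted for exceeding 1 GB are retried with 4 GB
retry_request_memory = 4 GB
request_disk = 2 GB

queue 10
```

In this sketch all 10 jobs start with a 1 GB request, so only the jobs that actually exceed it would be retried with the larger 4 GB request.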
53+
54+
You may also use expressions:
55+
56+
```
57+
request_memory = 1 GB
58+
retry_request_memory = RequestMemory*4
59+
```
60+
61+
When using expressions:
62+
63+
* We recommend *only* multiplying by integers.
64+
* Expressions using addition operators or floating point numbers are not recommended.
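
As an illustration of the integer-multiplier recommendation, the sketch below doubles the original request. It assumes that `RequestMemory` in the expression refers to the job's original request, which HTCondor stores in megabytes, so a 2 GB request (2048 MB) would be retried at 4096 MB, or 4 GB. The specific values are hypothetical.

```
# Original request: 2 GB (stored as 2048 MB)
request_memory = 2 GB
# Assumed retry request: 2048 * 2 = 4096 MB (4 GB)
retry_request_memory = RequestMemory*2
```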
65+
66+
## Related pages
67+
68+
* [HTCondor manual reference](https://htcondor.readthedocs.io/en/main/man-pages/condor_submit.html#retry_request_memory)
69+
* [Job submission basics](htcondor-job-submission)
70+
* [Monitor your job](condor_q)
