Commit b9fdbd5

Add documentation for retry_request_memory
1 parent 4a7cebf commit b9fdbd5

1 file changed, 23 additions and 18 deletions

_uw-research-computing/variable-memory.md

Lines changed: 23 additions & 18 deletions
@@ -10,18 +10,29 @@ guide:
1010

1111
## Introduction
1212

13-
Over-requesting memory may cause your jobs to wait in idle for longer than needed, but under-requesting memory may cause your jobs to go on hold when they do exceed the memory allocated to your job.
14-
1513
**This page outlines strategies for requesting variable amounts of memory in jobs.** This guide is for users whose memory usage for a list of jobs may spike unexpectedly or vary depending on inputs or other conditions.
1614

1715
{% capture content %}
1816
- [Introduction](#introduction)
17+
- [Why you should care about memory usage](#why-you-should-care-about-memory-usage)
18+
- [Use `retry_request_memory`](#use-retry_request_memory)
19+
- [Related pages](#related-pages)
1920
{% endcapture %}
2021
{% include /components/directory.html title="Table of Contents" %}
2122

22-
## Option 1: Use `retry_request_memory`
23+
If your job has ever gone on hold for exceeding memory use, you've probably solved it by increasing your `request_memory` attribute in your submit file. You might even always over-request memory, just to be on the safe side. But have you ever checked your HTCondor `.log` file to see how much memory you actually used?
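
One way to check this is to pull the memory row out of the resource table that HTCondor writes to the job log when a job finishes. The following is a minimal sketch, assuming the log contains a row of the form `Memory (MB) : <usage> <request> <allocated>`. The helper name `memory_usage_from_log` and the sample values are hypothetical, chosen only for illustration.

```python
import re

# Matches the memory row of the resource table in a job-termination event.
# Assumed layout: "Memory (MB)  :  <usage>  <request>  <allocated>"
MEMORY_ROW = re.compile(r"Memory \(MB\)\s*:\s*(\d+)\s+(\d+)")


def memory_usage_from_log(log_text):
    """Return a list of (usage_mb, request_mb) tuples, one per matching row."""
    return [(int(m.group(1)), int(m.group(2))) for m in MEMORY_ROW.finditer(log_text)]


# Illustrative excerpt with hypothetical values, not taken from a real job.
sample_log = """\
    Partitionable Resources :    Usage  Request Allocated
       Cpus                 :        0        1         1
       Disk (KB)            :       14  1048576   1468671
       Memory (MB)          :      212     1024      1024
"""

for usage_mb, request_mb in memory_usage_from_log(sample_log):
    print(f"used {usage_mb} MB of {request_mb} MB requested")
```

In this hypothetical excerpt the job used far less memory than it requested, which is the kind of gap worth looking for before raising `request_memory` again.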
24+
25+
## Why you should care about memory usage
26+
27+
* **Over-requesting memory** may cause your jobs to **sit idle** longer than needed, since HTCondor must find and allocate resources large enough to satisfy these requests. Additionally, CHTC's HTC system is a shared resource, so we encourage you to be a good citizen and request only the resources your jobs need.
28+
29+
* **Under-requesting memory** may cause your jobs to **go on hold** when they exceed the memory allocated to them.
2330

24-
This submit file option is good for jobs where a **few of their jobs have unexpected spikes in memory usage**. To use this feature, add this line to your submit file:
31+
> But what if a fraction of your jobs needs more memory than the rest of the list of jobs? How can you get the throughput you need without over-requesting memory?
32+
33+
## Use `retry_request_memory`
34+
35+
This submit file option is good for jobs where a **few of the jobs have unexpected spikes in memory usage**. To use this feature, add this line to your submit file:
2536

2637
```
2738
retry_request_memory = <memory>
@@ -38,26 +49,20 @@ retry_request_memory = 4 GB
3849

3950
Each job generated in this submission will request 1 GB of memory. If the job is evicted because it uses more than 1 GB of memory, the job will be restarted with 4 GB of memory.
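
The behavior described above can be modeled with a small Python sketch. This is an illustration of the decision logic only, not HTCondor code. The function name `simulate_retry` is hypothetical, and the final branch (a job going on hold when it also exceeds the retry value) is an assumption that this page does not confirm for `retry_request_memory`.

```python
def simulate_retry(peak_usage_mb, request_mb=1024, retry_request_mb=4096):
    """Return (outcome, memory_request_mb) for one job under the assumed model."""
    if peak_usage_mb <= request_mb:
        # The job fits within the original request_memory value.
        return ("completed on first attempt", request_mb)
    if peak_usage_mb <= retry_request_mb:
        # The first attempt is evicted, then restarted with the larger request.
        return ("completed after retry", retry_request_mb)
    # ASSUMPTION: a job that also exceeds the retry value goes on hold.
    return ("on hold", retry_request_mb)


# Hypothetical peak memory usage, in MB, for three jobs in one submission.
for peak in (800, 3000, 6000):
    outcome, request = simulate_retry(peak)
    print(f"peak {peak} MB -> {outcome} (requested {request} MB)")
```

In this model only the jobs that spike are restarted with the 4 GB request, so the rest of the submission keeps the smaller 1 GB request.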
4051

41-
## Option 2: Use `retry_request_memory_increase` and `retry_request_memory_max`
42-
43-
If you need a more incremental list of memory options, you can use these two submit file attributes together.
52+
You may also use expressions:
4453

45-
```
46-
retry_request_memory_increase = <quantity to add or RequestMemory expression>
47-
retry_request_memory_max = <memory>
48-
```
49-
50-
This option works similar to `retry_request_memory`, except allowing multiple retries in increments.
51-
52-
For example, if you use these lines in your submit file:
5354
```
5455
request_memory = 1 GB
55-
retry_request_memory_increase = RequestMemory*4
56-
retry_request_memory_max = 16 GB
56+
retry_request_memory = RequestMemory*4
5757
```
5858

59-
Your jobs will be submitted at three increments of increasing memory (1 GB, 4 GB, and 16 GB) until they succeed. If your jobs exceed 16 GB of memory, they will go on hold.
59+
When using expressions:
60+
61+
* We recommend multiplying *only* by integers.
62+
* Addition expressions and floating-point numbers are not recommended.
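
As a worked example of the arithmetic, the sketch below computes the retry request for an integer multiplier. It assumes `RequestMemory` is evaluated in megabytes, so `request_memory = 1 GB` corresponds to 1024. The function name `retry_request_mb` is hypothetical and only mirrors the expression `RequestMemory*4`.

```python
MB_PER_GB = 1024  # assumption: RequestMemory is expressed in megabytes


def retry_request_mb(request_gb, multiplier):
    """Compute the retry memory request in MB for an integer multiplier."""
    # Mirror the recommendation above: accept only whole-number multipliers.
    if isinstance(multiplier, bool) or not isinstance(multiplier, int) or multiplier < 1:
        raise ValueError("multiplier should be a positive integer")
    return request_gb * MB_PER_GB * multiplier


# request_memory = 1 GB with retry_request_memory = RequestMemory*4
print(retry_request_mb(1, 4))
```

Under that assumption the expression evaluates to 4096 MB, the same retry request as writing `retry_request_memory = 4 GB` directly.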
6063

6164
## Related pages
65+
66+
* [HTCondor manual reference](https://htcondor.readthedocs.io/en/main/man-pages/condor_submit.html#retry_request_memory)
6267
* [Job submission basics](htcondor-job-submission)
6368
* [Monitor your job](condor_q)
