Add documentation for retry_request_memory

xamberl · xamberl · commit b9fdbd5efe78 · 2025-10-21T13:19:31.000-05:00
diff --git a/_uw-research-computing/variable-memory.md b/_uw-research-computing/variable-memory.md
@@ -10,18 +10,29 @@ guide:
 
 ## Introduction
 
-Over-requesting memory may cause your jobs to wait in idle for longer than needed, but under-requesting memory may cause your jobs to go on hold when they do exceed the memory allocated to your job.
-
 **This page outlines strategies for requesting variable amounts of memory in jobs.** This guide is for users whose memory usage for a list of jobs may spike unexpectedly or vary depending on inputs or other conditions.
 
 {% capture content %}
 - [Introduction](#introduction)
+- [Why you should care about memory usage](#why-you-should-care-about-memory-usage)
+- [Use `retry_request_memory`](#use-retry_request_memory)
+- [Related pages](#related-pages)
 {% endcapture %}
 {% include /components/directory.html title="Table of Contents" %}
 
-## Option 1: Use `retry_request_memory`
+If your job has ever gone on hold for exceeding its memory allocation, you probably fixed it by increasing the `request_memory` attribute in your submit file. You might even routinely over-request memory, just to be on the safe side. But have you ever checked your HTCondor `.log` file to see how much memory your job actually used?
+
+## Why you should care about memory usage
+
+* **Over-requesting memory** may cause your jobs to **sit idle** longer than needed, since HTCondor has to find and allocate resources that satisfy these larger requests. Additionally, CHTC's HTC system is a shared resource, so we encourage you to be a good citizen and request only the resources your jobs need.
+
+* **Under-requesting memory** may cause your jobs to **go on hold** when they exceed the memory allocated to them.
 
-This submit file option is good for jobs where a **few of their jobs have unexpected spikes in memory usage**. To use this feature, add this line to your submit file:
+> But what if only a fraction of your jobs need more memory than the rest? How can you get the throughput you need without over-requesting memory for every job?
+
+## Use `retry_request_memory`
+
+This submit file option is good for submissions where **a few of the jobs have unexpected spikes in memory usage**. To use this feature, add this line to your submit file:
 
 ```
 retry_request_memory = <memory>
@@ -38,26 +49,20 @@ retry_request_memory = 4 GB
 
 Each job generated in this submission will request 1 GB of memory. If the job is evicted because it uses more than 1 GB of memory, the job will be restarted with 4 GB of memory.
 
-## Option 2: Use `retry_request_memory_increase` and `retry_request_memory_max`
-
-If you need a more incremental list of memory options, you can use these two submit file attributes together.
+You may also use expressions:
 
-```
-retry_request_memory_increase = <quantity to add or RequestMemory expression>
-retry_request_memory_max = <memory>
-```
-
-This option works similar to `retry_request_memory`, except allowing multiple retries in increments.
-
-For example, if you use these lines in your submit file:
 ```
 request_memory = 1 GB
-retry_request_memory_increase = RequestMemory*4
-retry_request_memory_max = 16 GB
+retry_request_memory = RequestMemory*4
 ```
 
-Your jobs will be submitted at three increments of increasing memory (1 GB, 4 GB, and 16 GB) until they succeed. If your jobs exceed 16 GB of memory, they will go on hold.
+When using expressions:
+
+* We recommend *only* multiplying by integers.
+* Addition expressions and floating-point numbers are not recommended.
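+
+As a minimal sketch of how these lines might fit into a complete submit file (the executable name, file names, resource values, and job count below are hypothetical placeholders, not taken from a real workflow):
+
+```
+# Hypothetical example; adjust names and values for your own jobs
+executable = analyze.sh
+
+log    = job_$(Cluster)_$(Process).log
+output = job_$(Cluster)_$(Process).out
+error  = job_$(Cluster)_$(Process).err
+
+request_cpus   = 1
+request_disk   = 2 GB
+request_memory = 2 GB
+
+# Integer multiplication only, per the recommendation above
+retry_request_memory = RequestMemory*4
+
+queue 10
+```
+
+Under the behavior described above, every job in this sketch would first run with 2 GB, and only the jobs that exceed that request would be restarted with the larger one (8 GB, assuming the expression is evaluated against the original `request_memory`).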
 
 ## Related pages
+
+* [HTCondor manual reference](https://htcondor.readthedocs.io/en/main/man-pages/condor_submit.html#retry_request_memory)
 * [Job submission basics](htcondor-job-submission)
 * [Monitor your job](condor_q)