
Commit 1931f30

Refactor language, add output_destination
1 parent 7dd3b97 commit 1931f30

2 files changed: +63, -30 lines

_layouts/file_avail.html

Lines changed: 10 additions & 1 deletion
@@ -30,13 +30,22 @@ <h2>Which Option is the Best for Your Files?</h2>
 </tr>
 
 <tr>
-<td>30 GB+</td>
+<td>30 - 100 GB</td>
 <td><code>/staging</code></td>
 <td><code>file:///staging/</code></td>
 <td>CHTC only</td>
 <!--<td>special submit "Requirements"</td>-->
 </tr>
 
+<tr>
+<td>100 GB+</td>
+<td>Contact CHTC facilitators</td>
+<td></td>
+<td></td>
+<!--<td>special submit "Requirements"</td>-->
+</tr>
+
+
 <!--To be added once RD is widely available
 <tr>
 <td>Any</td>

_uw-research-computing/htc-job-file-transfer.md

Lines changed: 53 additions & 29 deletions
@@ -8,18 +8,24 @@ guide:
 - htc
 ---
 
+## Introduction
+
+This guide covers general information on using and transferring data on the HTC system. We introduce the two file systems, explain how to determine which one is the best place for your data, and show how to edit your submit file to transfer input and output files.
+
 {% capture content %}
-- [Data Storage Locations](#data-storage-locations)
-- [Transferring Data to Jobs with `transfer_input_files`](#transferring-data-to-jobs-with-transfer_input_files)
-* [Important Note: File Transfers and Caching with `osdf:///`](#important-note-file-transfers-and-caching-with-osdf)
-- [Transferring Data Back from Jobs to `/home` or `/staging`](#transferring-data-back-from-jobs-to-home-or-staging)
-* [Default Behavior for Transferring Output Files](#default-behavior-for-transferring-output-files)
-* [Specify Which Output Files to Transfer with `transfer_output_files` and `transfer_output_remaps`](#specify-which-output-files-to-transfer-with-transfer_output_files-and-transfer_output_remaps)
+- [Introduction](#introduction)
+- [Data storage locations](#data-storage-locations)
+- [Transfer input data to jobs with `transfer_input_files`](#transfer-input-data-to-jobs-with-transfer_input_files)
+- [Transfer output data from jobs](#transfer-output-data-from-jobs)
+* [Default behavior for transferring output files](#default-behavior-for-transferring-output-files)
+* [Specify which output files to transfer with `transfer_output_files`](#specify-which-output-files-to-transfer-with-transfer_output_files)
+* [Transfer files to other locations with `transfer_output_remaps`](#transfer-files-to-other-locations-with-transfer_output_remaps)
+* [Transfer files to other locations with `output_destination`](#transfer-files-to-other-locations-with-output_destination)
 - [Related pages](#related-pages)
 {% endcapture %}
 {% include /components/directory.html title="Table of Contents" %}
 
-## Data Storage Locations
+## Data storage locations
 
 <p style="text-align:center"><img src="/images/htc-data-spaces.png" width=300px></p>
 
@@ -43,17 +49,23 @@ The data management mechanisms behind `/home` and `/staging` are different and a
 </div>
 
 
-## Transferring Data to Jobs with `transfer_input_files`
+## Transfer input data to jobs with `transfer_input_files`
 
-In the HTCondor submit file, `transfer_input_files` should always be used to tell HTCondor what files to transfer to each job, regardless of if that file originates from your `/home` or `/staging` directory. However, the syntax you use to tell HTCondor to fetch files from `/home` and `/staging` and transfer to your job will change depending on the file size.
+To transfer files to jobs, list them with `transfer_input_files` in the HTCondor submit file. The syntax for each file depends on its location and size.
 
-| Input Sizes | File Location | Submit File Syntax to Transfer to Jobs |
+| Input File Size (Per File)* | File Location | Submit File Syntax to Transfer to Jobs |
 | ----------- | ----------- | ----------- | ----------- |
-| 0 - 100 MB | `/home` | `transfer_input_files = input.txt` |
-| 100 MB - 30 GB | `/staging` | `transfer_input_files = osdf:///chtc/staging/NetID/input.txt` |
-| 100 MB - 100 GB | `/staging/groups` | `transfer_input_files = file:///staging/groups/group_dir/input.txt` |
-| > 30 GB | `/staging` | `transfer_input_files = file:///staging/NetID/input.txt` |
-| > 100 GB | | For larger datasets (100GB+ per job), contact the facilitation team about the best strategy to stage your data |
+| 0 - 1 GB | `/home` | `transfer_input_files = input.txt` |
+| 1 - 30 GB | `/staging` | `transfer_input_files = osdf:///chtc/staging/NetID/input.txt` |
+| 30 - 100 GB | `/staging` | `transfer_input_files = file:///staging/NetID/input.txt` |
+| 1 - 100 GB | `/staging/groups`<sup>†</sup> | `transfer_input_files = file:///staging/groups/group_dir/input.txt` |
+| 100 GB+ | | Contact the facilitation team about the best strategy to stage your data |
+
+<caption>
+<sup>*</sup> If you are transferring many small files, we recommend <a href="transfer-files-computer#transfer-multiple-files-using-tarballs">compressing them into a single file (.zip, .tar.gz)</a> before transfer. Use the size of the compressed file to determine where to place it.<br>
+<sup>†</sup> Only files in personal staging directories can be transferred to jobs with the <code>osdf:///</code> protocol. Files in shared directories (i.e. <code>/staging/groups</code>) currently cannot be transferred to jobs with <code>osdf:///</code> and should use <code>file:///</code>.<br>
+<!--<sup>‡</sup> While available on external pools, file transfer performance may be limited.-->
+</caption><br>
 
 Multiple input files and file transfer protocols can be specified and delimited by commas, as shown below:

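As a rough illustration of the revised input table, the fragment below sketches a single `transfer_input_files` line that pulls one file from each size tier, using the protocol the table assigns to that tier. It is a sketch under stated assumptions, not a tested CHTC job: the executable, every file name, `NetID`, and `group_dir` are hypothetical placeholders.

```
# Hypothetical submit file: every name below is a placeholder
executable = run_analysis.sh
log = job.log
output = job.out
error = job.err

# input.txt          0 - 1 GB in /home: plain path relative to the submit directory
# reference.tar.gz   1 - 30 GB in a personal /staging directory: osdf:///
# dataset.tar.gz     30 - 100 GB in /staging (CHTC only): file:///
# shared.tar.gz      1 - 100 GB in /staging/groups: file:/// (osdf:/// is not supported there)
transfer_input_files = input.txt, osdf:///chtc/staging/NetID/reference.tar.gz, file:///staging/NetID/dataset.tar.gz, file:///staging/groups/group_dir/shared.tar.gz

queue
```

Resource requests are left out on purpose; in a real job, `request_disk` has to be large enough to hold every transferred input plus the outputs.
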
@@ -68,41 +80,53 @@ transfer_input_files = file1, osdf:///chtc/staging/username/file2, file:///stagi
 
 Ensure you are using the correct file transfer protocol for efficiency. Failure to use the right protocol can result in slow file transfers or overloading the system.
 
-### Important Note: File Transfers and Caching with `osdf:///`
-The `osdf:///` file transfer protocol uses a [caching](https://en.wikipedia.org/wiki/Cache_(computing)) mechanism for input files to reduce file transfers over the network. This can affect users who refer to input files that are frequently modified.
+> ### ⚠️ File transfers and caching with `osdf:///`
+{:.tip-header}
 
-*If you are changing the contents of the input files frequently, you should rename the file or change its path to ensure the new version is transferred.*
+> The `osdf:///` file transfer protocol uses a [caching](https://en.wikipedia.org/wiki/Cache_(computing)) mechanism for input files to reduce file transfers over the network.
+>
+> The caching mechanism enables faster transfers for frequently used files/containers. However, older versions of frequently modified files may be transferred instead of the latest version.
+>
+> **If you are changing the contents of the input files frequently, you should rename the file or change its path to ensure the new version is transferred.**
+{:.tip}
 
-## Transferring Data Back from Jobs to `/home` or `/staging`
+## Transfer output data from jobs
 
-### Default Behavior for Transferring Output Files
-When a job completes, by default, HTCondor will return **newly created or edited files only in top-level directory** back to your `/home` directory. **Files in subdirectories are *not* transferred.** Ensure that the files you want are in the top-level directory by moving them, [creating tarballs](transfer-files-computer#transfer-multiple-files-using-tarballs), or specifying them in your submit file.
+### Default behavior for transferring output files
+When a job completes, by default, HTCondor will only return **newly created or edited files in the top-level directory** back to your `/home` directory. **Files in subdirectories are *not* transferred.** Ensure that the files you want are in the top-level directory by moving them, [creating tarballs](transfer-files-computer#transfer-multiple-files-using-tarballs), or specifying them in your submit file.
 
 <p style="text-align:center"><img src="/images/htc-output-file.png" width=300px></p>
 <caption>The directory structure of an example job on the execution point. In this example, according to its default behavior, HTCondor will only transfer the newly created "output_file" and will not transfer the subdirectory "output/".</caption>
 
-### Specify Which Output Files to Transfer with `transfer_output_files` and `transfer_output_remaps`
+### Specify which output files to transfer with `transfer_output_files`
 If you don't want to transfer all files but only *specific files*, in your HTCondor submit file, use
 ```
-transfer_output_files = file1.txt, file2.txt, file3.txt
+transfer_output_files = output_file, output/output_file2, output/output_file3
 ```
 {:.sub}
 
-To transfer a file or folder back to `/staging`, you will need an additional line in your HTCondor submit file:
+### Transfer files to other locations with `transfer_output_remaps`
+
+To transfer files back to `/staging` or to another directory, add a `transfer_output_remaps` line to your HTCondor submit file, separating each item with a semicolon (;):
 ```
-transfer_output_remaps = "file1.txt = file:///staging/NetID/output1.txt; file2.txt = /home/NetId/outputs/output2.txt"
+transfer_output_remaps = "output_file = osdf:///chtc/staging/NetID/output1.txt; output_file2 = /home/NetID/outputs/output_file2"
 ```
 {:.sub}
 
-In this example above, `file1.txt` is remapped to the staging directory using the `file:///` transfer protocol and simultaneously renamed `output1.txt`. In addition, `file2.txt` is renamed to `output2.txt`and will be transferred to a different directory on `/home`. Ensure you have the right file transfer syntax (`osdf:///` or `file:///` depending on the anticipated file size).
+In the example above, `output_file` is remapped to the staging directory using the `osdf:///` transfer protocol and simultaneously renamed `output1.txt`. In addition, `output_file2` is transferred to a different directory on `/home`. The last output file, `output_file3`, has no remap, so it is transferred back to the directory from which the job was submitted. Ensure you have the right file transfer syntax (`osdf:///` or `file:///` depending on the anticipated file size).
+
+Wrap the entire value of `transfer_output_remaps` in a single pair of quotation marks.
+
+### Transfer files to other locations with `output_destination`
+
+If you want to transfer *all* files to a specific destination, use `output_destination`:
 
-If you have multiple files or folders to transfer back to `/staging`, use a semicolon (;) to separate each object:
 ```
-transfer_output_remaps = "output1.txt = file:///staging/NetID/output1.txt; output2.txt = file:///staging/NetID/output2.txt"
+output_destination = osdf:///chtc/staging/NetID/
 ```
 {:.sub}
 
-Make sure to only include one set of quotation marks that wraps around the information you are feeding to `transfer_output_remaps`.
+Do not use `output_destination` and `transfer_output_remaps` in the same submit file.
 
 ## Related pages
 - [Manage Large Data in HTC Jobs](/uw-research-computing/file-avail-largedata)
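
One way to follow the caching tip in the hunk above, sketched with hypothetical names: put a version tag in the file name and change it whenever the contents change, so the `osdf:///` path the job requests is new to the cache and cannot be answered with an older copy.

```
# Hypothetical: data_v1.tar.gz in personal /staging was edited and saved as data_v2.tar.gz
# Old line, which may receive a stale cached copy of data_v1.tar.gz:
# transfer_input_files = osdf:///chtc/staging/NetID/data_v1.tar.gz
transfer_input_files = osdf:///chtc/staging/NetID/data_v2.tar.gz
```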

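A minimal sketch that combines the two output commands from the hunk above, reusing its file names. `NetID` and `/home/NetID/outputs/` are placeholders, and the sketch assumes, as the diff's own example does, that files listed with a subdirectory path come back under their base names, which is what the remap entries refer to.

```
# Hypothetical fragment: NetID and the outputs/ directory are placeholders
# Return one top-level file plus two files from the output/ subdirectory
transfer_output_files = output_file, output/output_file2, output/output_file3

# One pair of quotes around the whole value, one ";" between entries
# output_file  -> personal /staging over osdf:///, renamed output1.txt
# output_file2 -> a different directory in /home
# output_file3 -> no entry, so it returns to the submit directory
transfer_output_remaps = "output_file = osdf:///chtc/staging/NetID/output1.txt; output_file2 = /home/NetID/outputs/output_file2"
```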