Commit 706521c

Add UWDF/ResearchDrive guide
1 parent 1931f30 commit 706521c

2 files changed

+117
-0
lines changed

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
---
highlighter: none
layout: guide
title: Directly transfer files between ResearchDrive and your Jobs
guide:
    category: Manage data
    tag:
        - htc
---

## Introduction

CHTC is launching a pilot program in which users can directly transfer files between ResearchDrive and their jobs. This can remove the additional step of uploading or downloading data to and from CHTC data storage locations.

{% capture content %}
- [Introduction](#introduction)
- [Overview](#overview)
- [Is UWDF/ResearchDrive right for me?](#is-uwdfresearchdrive-right-for-me)
   * [Other considerations](#other-considerations)
- [Enable CHTC integration for your PI's ResearchDrive](#enable-chtc-integration-for-your-pis-researchdrive)
- [Transfer input files from ResearchDrive to jobs](#transfer-input-files-from-researchdrive-to-jobs)
- [Transfer output files from jobs to ResearchDrive](#transfer-output-files-from-jobs-to-researchdrive)
- [Related pages](#related-pages)
{% endcapture %}
{% include /components/directory.html title="Table of Contents" %}

> ### ⚗️ UWDF/ResearchDrive Pilot Program
{:.tip-header}

> Currently the UWDF/ResearchDrive feature is in its pilot phase and is not yet widely available to CHTC users. This file transfer feature is still under testing and may have occasional issues or bugs.
>
> If you are interested in testing this feature, contact us at [[email protected]](mailto:[email protected]).
{:.tip}

## Overview

Users can transfer files directly between [ResearchDrive](https://it.wisc.edu/services/researchdrive/) and their jobs by using the UW Data Federation (UWDF) and the [Pelican Platform](https://docs.pelicanplatform.org/about-pelican) (which also powers the `osdf:///` file transfer plugin). This integration can remove the additional step of uploading or downloading data to and from CHTC data storage locations, saving time and disk space.

In this way, ResearchDrive behaves as a "staging" location for data to be used in jobs.

<p style="text-align:center"><img src="/images/uwdf-researchdrive-diagram.png" width=800px alt="A diagram illustrating data transfer between CHTC data spaces, ResearchDrive, and Execution Points (where jobs are run)."></p>
<caption>
A diagram illustrating data transfer between CHTC data spaces, ResearchDrive, and Execution Points (where jobs are run).
</caption>

## Is UWDF/ResearchDrive right for me?

This feature is ideal for researchers who:
* Have existing access to their PI's ResearchDrive
* Need to run computations with data already on ResearchDrive
* Work with large datasets
* Need to transfer the same data to multiple jobs

### Other considerations
* Restricted ResearchDrives are ineligible for this service.
* CHTC will only be able to access files within a “CHTC” subdirectory within the PI's ResearchDrive. Any files outside of this directory are inaccessible to our systems.
* ResearchDrive's free service has a hard limit of 25 TB. Your PI can pay to increase the cap, and is charged only for storage used beyond 25 TB.
* The Pelican Platform uses a [caching](https://en.wikipedia.org/wiki/Cache_(computing)) mechanism for input files to reduce file transfers over the network. Caching speeds up transfers of frequently used files and containers; however, if a file is modified often, an older cached version may be transferred instead of the latest one. **If you are changing the contents of the input files frequently, you should rename the file or change its path to ensure the new version is transferred.**

## Enable CHTC integration for your PI's ResearchDrive

To use this feature, we will need to integrate your PI's ResearchDrive with CHTC systems. The PI should send an email to us at [[email protected]](mailto:[email protected]) giving permission for this integration, as well as a list of CHTC users who are allowed to use this integration. Once we have this permission, we will complete the integration process within 3-5 business days. You will be notified when this integration is ready to use.

## Transfer input files from ResearchDrive to jobs

Any file you place in the "CHTC" directory in the top-level directory of your PI's ResearchDrive is accessible to your CHTC jobs. Your jobs are unable to access any data outside of this directory.

To transfer input files from ResearchDrive to your jobs, specify input files with the `pelican://` plugin:

```
transfer_input_files = pelican://chtc.wisc.edu/researchdrive/<PI NetID>/CHTC/inputfile1.txt
```
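
As a minimal sketch, `transfer_input_files` accepts a comma-separated list, so ResearchDrive URLs can sit alongside files from your submit directory. The names `reference_v2.tar.gz` and `my_script.py` are hypothetical placeholders; the version suffix illustrates the renaming advice from the caching note under "Other considerations".

```
# Hypothetical file names: two inputs from ResearchDrive, one from the submit directory
transfer_input_files = pelican://chtc.wisc.edu/researchdrive/<PI NetID>/CHTC/inputfile1.txt, pelican://chtc.wisc.edu/researchdrive/<PI NetID>/CHTC/reference_v2.tar.gz, my_script.py
```

If the contents of `reference_v2.tar.gz` change later, uploading the new copy under a new name (for example `reference_v3.tar.gz`) and updating the submit file should avoid receiving a stale cached copy.
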

## Transfer output files from jobs to ResearchDrive

To transfer output files from jobs to ResearchDrive, you will need to use `transfer_output_remaps` or `output_destination` with the `pelican://` plugin in your submit file.

For example, if you specify which output files to transfer:

```
transfer_output_files = outputfile1.txt, outputfile2.txt, outputfile3.txt
```

You can use `transfer_output_remaps` to place files in different locations:

```
transfer_output_remaps = "outputfile1.txt = pelican://chtc.wisc.edu/researchdrive/<PI NetID>/CHTC/outputfile1.txt; outputfile2.txt = osdf:///chtc/staging/<NetID>/outputfile2.txt"
```

The example above places `outputfile1.txt` in ResearchDrive and `outputfile2.txt` in `/staging`. Because `outputfile3.txt` is not remapped, it is returned to the submit directory on `/home`.

To send all of your outputs to ResearchDrive, use `output_destination` instead of `transfer_output_remaps`:

```
output_destination = pelican://chtc.wisc.edu/researchdrive/<PI NetID>/CHTC/
```
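
As a sketch, `output_destination` can be paired with `transfer_output_files` to send only a subset of files. The HTCondor documentation describes `output_destination` as applying to the entire output sandbox or to the subset named in `transfer_output_files`. This pairing is an assumption here and has not been confirmed for the pilot.

```
# Assumption: only the files listed here are sent to the output_destination URL
transfer_output_files = outputfile1.txt, outputfile2.txt
output_destination = pelican://chtc.wisc.edu/researchdrive/<PI NetID>/CHTC/
```
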

For more information about transferring output files using HTCondor, [read our guide](/uw-research-computing/htc-job-file-transfer#transfer-output-data-from-jobs).

> ## 💡 Tip: Define the ResearchDrive path as a variable
{:.tip-header}

> Because the Pelican plugin prefix and ResearchDrive path are lengthy, it's useful to define the path as a variable. This variable can then be used in `transfer_input_files`, `transfer_output_remaps`, and `output_destination` with the `$(variable)` syntax.
>
> For example:
>
> ```
> ResearchDrive = pelican://chtc.wisc.edu/researchdrive/<PI NetID>/CHTC
>
> transfer_input_files = $(ResearchDrive)/inputfile1.txt
> ```
{:.tip}
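
Putting the pieces together, a complete submit file might look like the sketch below. The executable name, the `job.*` file names, and the resource requests are hypothetical placeholders, not values recommended by this guide. Only the `pelican://` URL pattern comes from the examples above.

```
# Sketch of a submit file; names and resource requests are placeholders
ResearchDrive = pelican://chtc.wisc.edu/researchdrive/<PI NetID>/CHTC

executable = run_analysis.sh
arguments = inputfile1.txt

# Input is fetched from ResearchDrive directly to the job
transfer_input_files = $(ResearchDrive)/inputfile1.txt

# Output is written back to ResearchDrive instead of the submit directory
transfer_output_files = outputfile1.txt
transfer_output_remaps = "outputfile1.txt = $(ResearchDrive)/outputfile1.txt"

log = job.log
output = job.out
error = job.err

request_cpus = 1
request_memory = 2GB
request_disk = 4GB

queue
```

The `$(ResearchDrive)` macro is expanded inside the quoted `transfer_output_remaps` value as well, so the long URL only needs to be written once.
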

## Related pages
- [Use and transfer data in jobs on the HTC system](/uw-research-computing/htc-job-file-transfer)
- [Manage large data in HTC jobs](/uw-research-computing/file-avail-largedata)
- [Transfer files between CHTC and ResearchDrive](/uw-research-computing/transfer-data-researchdrive)