Use and transfer data in jobs on the HTC system
Which Option is the Best for Your Files?
Input Sizes | Output Sizes | File Location | Syntax for transfer_input_files | Availability, Security |
---|---|---|---|---|
0 - 100 MB per file, up to 500 MB per job | 0 - 5 GB per job | /home | No special syntax | CHTC and external pools |
100 MB - TBs per job-specific file; repeatedly-used files > 1 GB | 4 GB - TBs per job | /staging | osdf:/// | CHTC and external pools |
100 MB - TBs per job-specific file | 4 GB - TBs per job | /staging/groups | file:/// | CHTC only |
Data Storage Locations
The HTC system has two primary locations where users can place their files:
/home
- The default location for files and job submission
- Efficiently handles many files
- Smaller input files (<100 MB) should be placed here
/staging
- Expandable storage system but cannot efficiently handle many small (few MB or less) files
- Larger input files (>100 MB) should be placed here, including container images (.sif)
The data management mechanisms behind /home and /staging are different, and each is optimized for different file sizes and numbers of files. Placing your files in the correct location improves the speed and efficiency with which your data is handled and helps maintain the stability of the HTC filesystem.
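The placement guidance above can be sketched as a small check that you run before submitting. This is a minimal illustration, not a CHTC tool: the function names are hypothetical, 1 MB is assumed to mean 10**6 bytes, and a file of exactly 100 MB is assumed to stay in /home.

```python
import os

# Threshold from the guidance above. Treating 1 MB as 10**6 bytes is an assumption.
STAGING_THRESHOLD_BYTES = 100 * 10**6


def suggest_location(size_bytes):
    """Return the suggested storage location for a file of the given size."""
    # Assumption: a file of exactly 100 MB stays in /home.
    return "/staging" if size_bytes > STAGING_THRESHOLD_BYTES else "/home"


def survey(directory):
    """Map each regular file directly inside `directory` to a suggested location."""
    suggestions = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            suggestions[entry.name] = suggest_location(entry.stat().st_size)
    return suggestions
```

Calling `survey(".")` in a project directory returns a dictionary such as `{"input.txt": "/home", ...}`, which makes it easy to spot files that belong in /staging.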
If you need a /staging directory, request one here.
Transferring Data to Jobs with transfer_input_files
In the HTCondor submit file, transfer_input_files should always be used to tell HTCondor which files to transfer to each job, regardless of whether a file originates from your /home or /staging directory. However, the syntax you use to fetch a file and transfer it to your job depends on the file's location and size.
Input Sizes | File Location | Submit File Syntax to Transfer to Jobs |
---|---|---|
0 - 100 MB | /home | transfer_input_files = input.txt |
100 MB - 30 GB | /staging | transfer_input_files = osdf:///chtc/staging/NetID/input.txt |
100 MB - 100 GB | /staging/groups | transfer_input_files = file:///staging/groups/group_dir/input.txt |
> 30 GB | /staging | transfer_input_files = file:///staging/NetID/input.txt |
> 100 GB | For larger datasets (100GB+ per job), contact the facilitation team about the best strategy to stage your data |
Multiple input files and file transfer protocols can be specified and delimited by commas, as shown below:
# My job submit file
transfer_input_files = file1, osdf:///chtc/staging/username/file2, file:///staging/username/file3, dir1, dir2/
... other submit file details ...
Ensure you are using the correct file transfer protocol for efficiency. Failure to use the right protocol can result in slow file transfers or overloading the system.
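As a rough sketch of the size rules in the table above, the following helper picks the syntax for a file and assembles the comma-delimited line. The function names and the NetID "alice" are hypothetical. Decimal units (1 MB = 10**6 bytes) and the handling of sizes that fall exactly on a boundary are assumptions. Only /home and personal /staging directories are covered, not /staging/groups.

```python
MB = 10**6  # assumption: decimal units, since the table does not say MB vs. MiB
GB = 10**9


def input_url(filename, size_bytes, netid):
    """Return the transfer_input_files entry for a file in /home or /staging/<netid>."""
    if size_bytes <= 100 * MB:
        return filename  # file lives in /home: no special syntax
    if size_bytes <= 30 * GB:
        return f"osdf:///chtc/staging/{netid}/{filename}"
    if size_bytes <= 100 * GB:
        return f"file:///staging/{netid}/{filename}"
    raise ValueError("over 100 GB: contact the facilitation team before staging this data")


def transfer_input_line(files, netid):
    """Build a comma-delimited transfer_input_files line from (filename, size_bytes) pairs."""
    entries = [input_url(name, size, netid) for name, size in files]
    return "transfer_input_files = " + ", ".join(entries)
```

For example, `transfer_input_line([("a.txt", 5 * MB), ("b.sif", 2 * GB)], "alice")` yields a line that mixes the plain and osdf:/// forms, ready to paste into a submit file.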
Important Note: File Transfers and Caching with osdf:///
The osdf:/// file transfer protocol uses a caching mechanism for input files to reduce file transfers over the network. This can affect users who refer to input files that are frequently modified.
If you are changing the contents of the input files frequently, you should rename the file or change its path to ensure the new version is transferred.
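One possible way to rename a file whenever its contents change is to embed a short content digest in the file name, so that every new version gets a new name. This is only a sketch of that idea. The helper name and the naming scheme are assumptions, not a CHTC convention.

```python
import hashlib
from pathlib import Path


def versioned_name(path, digest_chars=8, chunk_size=1 << 20):
    """Return a file name that embeds a short SHA-256 digest of the file's contents."""
    path = Path(path)
    sha = hashlib.sha256()
    # Read in chunks so that large staged files are not loaded into memory at once.
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return f"{path.stem}.{sha.hexdigest()[:digest_chars]}{path.suffix}"
```

Copying the input to /staging under the returned name, and referencing that name in transfer_input_files, means an edited file is never requested under a name that was used for older contents.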
Transferring Data Back from Jobs to /home or /staging
Default Behavior for Transferring Output Files
When a job completes, HTCondor by default returns only the newly created or edited files in the job's top-level directory to your /home directory. Files in subdirectories are not transferred. Ensure that the files you want are in the top-level directory by moving them there or by packing them into a tarball.
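If your job's executable is a Python script, a subdirectory of results can be packed into a tarball in the top-level directory before the job exits. The following is a minimal sketch. The function name and the directory name "results" are hypothetical.

```python
import tarfile
from pathlib import Path


def pack_directory(subdir, archive_name=None):
    """Pack `subdir` into a .tar.gz in the current directory and return the archive path."""
    subdir = Path(subdir)
    archive = Path(archive_name or f"{subdir.name}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the subdirectory name.
        tar.add(subdir, arcname=subdir.name)
    return archive
```

Calling `pack_directory("results")` as the last step of the job writes results.tar.gz next to the script. Because the archive is a new file in the top-level directory, the default output transfer picks it up.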
Specify Which Output Files to Transfer with transfer_output_files and transfer_output_remaps
To transfer only specific files rather than all of them, add this line to your HTCondor submit file:
transfer_output_files = file1.txt, file2.txt, file3.txt
To transfer a file or folder back to /staging, you will need an additional line in your HTCondor submit file:
transfer_output_remaps = "file1.txt = file:///staging/NetID/output1.txt; file2.txt = /home/NetID/outputs/output2.txt"
In the example above, file1.txt is remapped to the staging directory using the file:/// transfer protocol and simultaneously renamed output1.txt. In addition, file2.txt is renamed to output2.txt and transferred to a different directory on /home. Ensure you use the right file transfer syntax (osdf:/// or file:///, depending on the anticipated file size).
If you have multiple files or folders to transfer back to /staging, use a semicolon (;) to separate each object:
transfer_output_remaps = "output1.txt = file:///staging/NetID/output1.txt; output2.txt = file:///staging/NetID/output2.txt"
Make sure to include only one set of quotation marks, wrapped around the entire value you pass to transfer_output_remaps.
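When the remaps are generated by a script, the quoting and semicolon rules can be enforced in one place. The sketch below assumes a hypothetical helper that takes a dictionary of job file names and destinations. It is an illustration, not part of HTCondor.

```python
def output_remaps_line(remaps):
    """Build a transfer_output_remaps line from {job_file: destination} pairs."""
    for name, destination in remaps.items():
        # Embedded quotation marks would break the single quoted value.
        if '"' in name or '"' in destination:
            raise ValueError("entries must not contain quotation marks")
    pairs = "; ".join(f"{name} = {dest}" for name, dest in remaps.items())
    # Exactly one pair of quotation marks wraps the whole value.
    return f'transfer_output_remaps = "{pairs}"'
```

Passing the two output1.txt and output2.txt mappings from the example above reproduces that submit file line exactly.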