Commit bc23b85

Document data containers

* data package
* --data
* push & pull

Fixes #240
1 parent 945f6d7 commit bc23b85

4 files changed

+186
-0
lines changed

data_containers.rst

+165
@@ -0,0 +1,165 @@
1+
.. _sec:data-containers:
2+
3+
###############
4+
Data Containers
5+
###############
6+
7+
*New in {Singularity} 4.2 OCI-Mode.*
8+
9+
********
10+
Overview
11+
********
12+
13+
Workflows in HPC often involve three distinct inputs:
14+
15+
- User data, which needs to be analyzed.
16+
- A software application, which will analyze the user data.
17+
- Reference data, which the software uses to make sense of the user data.
18+
19+
Packaging the software application into an OCI-SIF, with {Singularity} in
20+
OCI-Mode, makes it easy to run and share. User data is also easy to handle with
21+
{Singularity}; simply bind your project directories or files from the HPC system
22+
into the container.
23+
24+
Reference data is a little more complicated, as it tends to be specific to the
25+
software being used and the data being analyzed. Perhaps you are aligning
26+
RNA-Seq data to a reference genome sequence, or passing medical images through a
27+
neural network model. Different reference data might be needed for different
28+
inputs (human vs mouse sequences, CT vs MRI images). Although software is
29+
containerized and ready to go, you will probably have to download reference data
30+
from a third party, assemble it, and often pre-process it before it can be used with
31+
the specific program that you need to run.
32+
33+
Putting all the reference data that might ever be needed into the same container
34+
as the software application could simplify things, but could make that container
35+
very large. What if we could easily distribute different sets of reference data
36+
alongside, but separately from the software application? The solution is a data
37+
container.
38+
39+
*************************
40+
Creating a Data Container
41+
*************************
42+
43+
{Singularity} 4.2 introduces the ``data package`` command, to create a data
44+
container OCI-SIF, by 'packaging' files and directories on the host:
45+
46+
.. code::
47+
48+
$ singularity data package <source file/dir> <data container>
49+
50+
For example, to create a data container from the content of the directory
51+
``mydata/`` on the host:
52+
53+
.. code::
54+
55+
$ singularity data package mydata mydata.oci.sif
56+
INFO: Converting layers to SquashFS
57+
58+
The resulting OCI-SIF file contains the packaged data as a SquashFS image,
59+
stored as an OCI artifact with an associated manifest. This allows it to be pushed and pulled
60+
to and from standard OCI registries.
61+
62+
**********************
63+
Using a Data Container
64+
**********************
65+
66+
.. note::
67+
68+
OCI-SIF data containers can only be used in OCI-Mode (when running
69+
containers with ``--oci``).
70+
71+
To use a data container with an application container, the ``--data`` flag is
72+
passed to ``run``, ``shell``, or ``exec`` in OCI-Mode. The ``--data`` flag takes one or more
73+
comma-separated ``<data container>:<dest>`` pairs, where ``<data container>`` is
74+
the path to the data container to use, and ``<dest>`` is the path in the
75+
application container at which its content should be made available.
76+
77+
For example, to make the content of the ``mydata.oci.sif`` data container
78+
available under ``/mydata`` in an application container:
79+
80+
.. code::
81+
82+
$ singularity run --oci --data mydata.oci.sif:/mydata application.oci.sif
83+
dtrudg-sylabs@mini:~$ ls /mydata/
84+
bar foo
85+
86+
You can use more than one data container by specifying the ``--data`` flag
87+
multiple times, or listing comma-separated ``<data container>:<dest>`` pairs:
88+
89+
.. code::
90+
91+
$ singularity run --oci \
92+
--data mydata.oci.sif:/mydata,otherdata.oci.sif:/otherdata \
93+
application.oci.sif
94+
95+
This is equivalent to:
96+
97+
.. code::
98+
99+
$ singularity run --oci \
100+
--data mydata.oci.sif:/mydata \
101+
--data otherdata.oci.sif:/otherdata \
102+
application.oci.sif
103+
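When the list of data containers is generated by a script, the comma-separated form can be assembled from individual ``<data container>:<dest>`` pairs. The sketch below assumes a POSIX shell; the helper name ``join_data_pairs`` is hypothetical and is not part of {Singularity}. It prints the resulting command instead of running it, so it works without the example images.

```shell
# Hypothetical helper: join its arguments with commas to form a single
# --data value. The body runs in a subshell, so the IFS change does not
# leak into the calling shell.
join_data_pairs() (
    IFS=,
    printf '%s\n' "$*"
)

data_arg=$(join_data_pairs \
    mydata.oci.sif:/mydata \
    otherdata.oci.sif:/otherdata)

# Print the command rather than executing it.
echo "singularity run --oci --data ${data_arg} application.oci.sif"
```

Passing a separate ``--data`` flag per pair, as in the second form above, avoids the joining step and may be easier to read in hand-written commands.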
104+
************************
105+
Sharing a Data Container
106+
************************
107+
108+
As mentioned above, a data container stores a SquashFS filesystem as an OCI
109+
artifact. This means it can be pushed to, and pulled from, standard OCI
110+
registries alongside application container images.
111+
112+
To push to the container library:
113+
114+
.. code::
115+
116+
$ singularity push -U mydata.oci.sif library://example/datac/mydata:latest
117+
WARNING: Skipping container verification
118+
INFO: Pushing an OCI-SIF to the library OCI registry. Use `--oci` to pull this image.
119+
4.0KiB / 4.0KiB [=================================================================] 100 %0s
120+
121+
To pull from the container library:
122+
123+
.. code::
124+
125+
$ singularity pull --oci mydata.oci.sif library://example/datac/mydata:latest
126+
WARNING: OCI image doesn't declare a platform. It may not be compatible with this system.
127+
INFO: Cleaning up.
128+
WARNING: integrity: signature not found for object group 1
129+
WARNING: Skipping container verification
130+
131+
To push to Docker Hub, or a similar OCI registry, :ref:`after authenticating <registry>`:
132+
133+
.. code::
134+
135+
$ singularity push mydata.oci.sif docker://dctrud/mydata:latest
136+
4.0KiB / 4.0KiB [=================================================================] 100 %0s
137+
INFO: Upload complete
138+
139+
To pull from Docker Hub, or a similar OCI registry:
140+
141+
.. code::
142+
143+
$ singularity pull --oci docker://dctrud/mydata:latest
144+
WARNING: OCI image doesn't declare a platform. It may not be compatible with this system.
145+
INFO: Using cached OCI-SIF image
146+
147+
148+
149+
150+
151+
152+
153+
154+
155+
156+
157+
158+
159+
160+
161+
162+
163+
164+
165+

index.rst

+1
@@ -89,6 +89,7 @@ networking and security configuration.
8989

9090
Bind Paths and Mounts <bind_paths_and_mounts>
9191
Persistent Overlays <persistent_overlays>
92+
Data Containers <data_containers>
9293
Instances - Running Services <running_services>
9394
Environment and Metadata <environment_and_metadata>
9495
Plugins <plugins>

new.rst

+12
@@ -20,6 +20,7 @@ OCI-mode
2020
OCI-SIF image to be pushed to ``library://`` and ``docker://`` registries with
2121
layers in the standard OCI tar format. Images pushed with ``--layer-format``
2222
tar can be pulled and run by other OCI runtimes. See :ref:`sec:layer-format`.
23+
2324
- Persistent overlays embedded in OCI-SIF files. See :ref:`overlay-oci-sif`.
2425

2526
- A writable overlay can be added to an OCI-SIF file with the ``singularity
@@ -34,6 +35,17 @@ OCI-mode
3435
an OCI-SIF image into a read-only squashfs layer. This seals changes made to
3536
the image via the overlay, so that they are permanent.
3637

38+
- OCI-SIF data containers provide a way to package reference data into an
39+
OCI-SIF file that can be distributed alongside application containers. See
40+
:ref:`sec:data-containers`.
41+
42+
- A new ``singularity data package`` command allows files and directories to
43+
be packaged into an OCI-SIF data container.
44+
- A new ``--data <data container>:<dest>`` flag for OCI-Mode allows the
45+
contents of a data container to be made available at ``<dest>`` inside an
46+
application container.
47+
48+
3749
*******
3850
Runtime
3951
*******

oci_runtime.rst

+8
@@ -470,6 +470,14 @@ addition to the image manifest and image config:
470470
Multi-layer OCI-SIF images are supported by {Singularity} 4.1 and later. They
471471
cannot be executed using {Singularity} 4.0.
472472

473+
Data Containers
474+
===============
475+
476+
The OCI-SIF format also supports, from {Singularity} 4.2, the creation of
477+
:ref:`data containers <sec:data-containers>`, which can be used to distribute reference data
478+
alongside application containers, in a convenient single file that may be
479+
shared via standard OCI registries. See the :ref:`sec:data-containers` section
480+
for more information.
473481

474482
.. _sec:cdi:
475483
