Working with DUCC and Docker Images (Experimental)
DUCC (Daemon that Unpacks Container Images into CernVM-FS) helps publish
container images in CernVM-FS. The daemon publishes images in their
extracted form so that clients benefit from CernVM-FS's on-demand
loading of files. The DUCC service is deployed as an extra package and
is meant to be co-located with a publisher node that has the
cvmfs-server package installed.
Converted images are usable with Docker through the CernVM-FS Docker graph driver.
Vocabulary
This section introduces the terms used when DUCC publishes container images.
Registry: A Docker image registry, such as https://registry.hub.docker.com.
Image Repository: A group of images. Each image in an image repository is addressed by tag or by digest. Examples are:
- library/redis
- library/ubuntu
The term image repository is unrelated to a CernVM-FS repository.
Image Tag: An image tag identifies an image inside an image repository. Tags are mutable and may refer to different container images over time. Examples are:
- 4
- 3-alpine
Image Digest: A digest is an immutable identifier for a container image. Digests are computed by applying a cryptographic hash function to the content of the image. Examples are:
- sha256:2aa24e8248d5c6483c99b6ce5e905040474c424965ec866f7decd87cb316b541
- sha256:d582aa10c3355604d4133d6ff3530a35571bd95f97aadc5623355e66d92b6d2c
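To make the digest format concrete, the Python sketch below hashes a byte string with SHA-256 and prefixes the algorithm name, producing an identifier of the same shape as the examples above. For images in a registry, the digest is taken over the image manifest, which in turn references each layer by its own digest. The helper name image_digest and the placeholder manifest bytes are illustrative assumptions, not part of DUCC.

```python
import hashlib


def image_digest(content: bytes) -> str:
    """Return a digest in the 'sha256:<hex>' form used by container registries.

    Hypothetical helper: it simply hashes the bytes it is given with SHA-256.
    """
    return "sha256:" + hashlib.sha256(content).hexdigest()


# Placeholder bytes standing in for an image manifest (not a real manifest).
manifest = b'{"schemaVersion": 2}'
print(image_digest(manifest))

# Changing a single byte of the content changes the digest, which is what
# makes a digest an immutable identifier, unlike a tag.
assert image_digest(manifest) != image_digest(manifest + b" ")
```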
To uniquely identify an image, we need to provide:
1. registry
2. image repository
3. image tag or image digest (or both)
We use a slash (/) to separate the registry from the repository, a
colon (:) to separate the repository from the tag, and the at sign (@)
to separate the digest from the tag or from the repository. The syntax
is (optionally preceded by a scheme such as https://, as in the
examples below)
REGISTRY/REPOSITORY[:TAG][@DIGEST]
Examples of fully identified images are:
- https://registry.hub.docker.com/library/redis:4
- https://registry.hub.docker.com/minio/minio@sha256:b1e5dd4a7be831107822243a0675ceb5eabe124356a9815f2519fe02beb3f167
- https://registry.hub.docker.com/wurstmeister/kafka:1.1.0@sha256:3a63b48894bce633fb2f0d2579e162163367113d79ea12ca296120e90952b463
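The separator rules above can be sketched as a small parser. This is a hypothetical helper written for illustration, not DUCC's own parser; it assumes a scheme is always present and treats a colon after the last slash as the tag separator, so that a registry port stays attached to the registry.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    scheme: str
    registry: str
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_image_ref(ref: str) -> ImageRef:
    """Split SCHEME://REGISTRY/REPOSITORY[:TAG][@DIGEST] into its parts (sketch)."""
    scheme, sep, rest = ref.partition("://")
    if not sep:
        raise ValueError(f"missing scheme in {ref!r}")
    # The first slash separates the registry from the repository.
    registry, sep, path = rest.partition("/")
    if not sep or not path:
        raise ValueError(f"missing repository in {ref!r}")
    # '@' separates the digest from everything before it.
    path, _, digest = path.partition("@")
    # A ':' after the last '/' separates the tag from the repository.
    repository, tag = path, None
    colon = path.rfind(":")
    if colon > path.rfind("/"):
        repository, tag = path[:colon], path[colon + 1:]
    if tag is None and not digest:
        raise ValueError(f"need a tag or a digest in {ref!r}")
    return ImageRef(scheme, registry, repository, tag, digest or None)


print(parse_image_ref("https://registry.hub.docker.com/library/redis:4"))
```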
Thin Image: A Docker image that contains only a reference to the image contents in CernVM-FS. It requires the CernVM-FS Docker graph driver in order to start.
Image Wish List
The user specifies the set of images to be published on CernVM-FS in the form of a wish list. The wish list consists of triplets: the input image, the output thin image, and the destination CernVM-FS repository for the unpacked data.
wish => (input_image, output_thin_image, cvmfs_repository)
The input image in your wish should unambiguously specify an image as described above.
Wish List Syntax v1
The wish list is provided as a YAML file. An example of a wish list containing four images is shown below.
version: 1
user: smosciat
cvmfs_repo: unpacked.cern.ch
output_format: '$(scheme)://registry.gitlab.cern.ch/thin/$(image)'
input:
- 'https://registry.hub.docker.com/econtal/numpy-mkl:latest'
- 'https://registry.hub.docker.com/agladstein/simprily:version1'
- 'https://registry.hub.docker.com/library/fedora:latest'
- 'https://registry.hub.docker.com/library/debian:stable'
version: wish list version; at the moment only 1 is supported.
user: the account that will push the thin images into the Docker
registry. The password must be stored in the
DOCKER2CVMFS_DOCKER_REGISTRY_PASS environment variable.
cvmfs_repo: the target CernVM-FS repository to store the layers and the flat root file systems.
output_format: how to name the thin images. It accepts a few variables that refer to the input image.
  - $(scheme): the image URL protocol, most likely http or https
  - $(registry): the Docker registry of the input image; in the example it would be registry.hub.docker.com
  - $(repository): the image repository of the input image, like library/ubuntu or atlas/athena
  - $(tag): the tag of the image, which could be latest, stable or v0.1.4
  - $(image): combines $(repository) and $(tag)
input: the list of Docker images to convert.
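The variable expansion in output_format can be pictured as plain string replacement. The sketch below assumes the input image has already been split into its components (the parts dictionary and the thin_image_name helper are hypothetical) and that $(image) expands to REPOSITORY:TAG; the exact way DUCC combines the two may differ.

```python
def thin_image_name(output_format: str, parts: dict) -> str:
    """Expand the $(...) variables of a v1 output_format (sketch).

    Assumption: $(image) is REPOSITORY:TAG; DUCC's real rule may differ.
    """
    values = {
        "scheme": parts["scheme"],
        "registry": parts["registry"],
        "repository": parts["repository"],
        "tag": parts["tag"],
        "image": f'{parts["repository"]}:{parts["tag"]}',
    }
    name = output_format
    for var, value in values.items():
        # Replace every occurrence of the literal token, e.g. "$(scheme)".
        name = name.replace(f"$({var})", value)
    return name


parts = {
    "scheme": "https",
    "registry": "registry.hub.docker.com",
    "repository": "library/fedora",
    "tag": "latest",
}
print(thin_image_name("$(scheme)://registry.gitlab.cern.ch/thin/$(image)", parts))
```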
The current wish list format requires all the images to be stored in the same CernVM-FS repository and to share the same output format for their thin images.
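Because of this restriction, a v1 wish list expands into one (input_image, output_thin_image, cvmfs_repository) triplet per input image, all sharing the same repository. The sketch below assumes the YAML has already been loaded into a dictionary; expand_wishlist, the Wish type and the thin_name callback are hypothetical names, and demo_name is a toy naming rule used only for the demonstration.

```python
from typing import Callable, List, NamedTuple


class Wish(NamedTuple):
    input_image: str
    output_thin_image: str
    cvmfs_repository: str


def expand_wishlist(doc: dict, thin_name: Callable[[str, str], str]) -> List[Wish]:
    """Expand a parsed v1 wish list into one Wish triplet per input image (sketch).

    thin_name(output_format, input_image) is a hypothetical callback that
    applies the output_format variables to a single input image.
    """
    if doc.get("version") != 1:
        raise ValueError("only wish list version 1 is supported")
    # In v1 the repository and the output format are shared by every input image.
    repo = doc["cvmfs_repo"]
    fmt = doc["output_format"]
    return [Wish(image, thin_name(fmt, image), repo) for image in doc["input"]]


# A subset of the YAML example above, as a YAML parser would return it.
doc = {
    "version": 1,
    "user": "smosciat",
    "cvmfs_repo": "unpacked.cern.ch",
    "output_format": "$(scheme)://registry.gitlab.cern.ch/thin/$(image)",
    "input": [
        "https://registry.hub.docker.com/library/fedora:latest",
        "https://registry.hub.docker.com/library/debian:stable",
    ],
}


def demo_name(fmt: str, image: str) -> str:
    # Toy rule: $(image) becomes everything after the registry host.
    scheme, _, rest = image.partition("://")
    return fmt.replace("$(scheme)", scheme).replace("$(image)", rest.partition("/")[2])


for wish in expand_wishlist(doc, demo_name):
    print(wish)
```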
DUCC Commands
DUCC supports the following commands.
convert
The convert command provides the core functionality of DUCC:
cvmfs_ducc convert wishlist.yaml
where wishlist.yaml is the path of a wish list file.
This command will try to ingest all the specified images into CernVM-FS.
The process consists of:
- downloading the manifest of the image;
- downloading and ingesting the layers that compose each image;
- uploading the thin image;
- creating the flat root file system necessary to work with Singularity;
- writing DUCC-specific metadata in the CernVM-FS repository next to the unpacked image data.
The layers are stored in the .layer subdirectory in the CernVM-FS
repository, while the flat root file systems are stored in the .flat
subdirectory.
loop
The loop command continuously executes the convert command. On each
iteration, the wish list file is read again in order to pick up changes.
cvmfs_ducc loop recipe.yaml
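The re-reading behaviour can be sketched as a loop that loads the wish list file afresh before every conversion pass. In this sketch the convert callback, the pause between iterations and the max_iterations bound are all assumptions added for illustration; the documentation does not say how DUCC paces its loop.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def convert_loop(wishlist: Path, convert: Callable[[str], None],
                 interval: float = 60.0,
                 max_iterations: Optional[int] = None) -> int:
    """Repeatedly convert the wish list, re-reading the file each time (sketch).

    convert, interval and max_iterations are hypothetical parameters.
    Returns the number of completed iterations.
    """
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        # Read the file again on every iteration so that edits are picked up.
        text = wishlist.read_text()
        convert(text)
        iterations += 1
        # Pause before the next pass, but not after the final bounded one.
        if max_iterations is None or iterations < max_iterations:
            time.sleep(interval)
    return iterations
```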
convert-single-image
The convert-single-image command is useful when only a single image
needs to be converted and pushed into a CernVM-FS repository.
cvmfs_ducc convert-single-image image-to-convert repository.cern.ch
The command takes two arguments: the image to convert and the CernVM-FS repository in which to store it.
The image-to-convert argument follows the same syntax as the wish list
input; for instance, it could be
https://registry.hub.docker.com/library/fedora:latest.
Incremental Conversion
The convert command extracts image contents into CernVM-FS only
where necessary. In general, some parts of the wish list will already be
converted while others will need to be converted from scratch.
An image that has already been unpacked in CernVM-FS is skipped. For unconverted images, only the missing layers are unpacked.
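The skip-or-unpack decision can be sketched as a set computation over layer digests. In this hypothetical helper, each image is represented by its ordered list of layer digests, and a layer counts as present if its digest is already in the repository; how DUCC actually records converted images and layers is not described here. The same computation also shows why an image built on top of an already converted one only needs its new upper layers.

```python
from typing import Dict, List, Set, Tuple


def plan_conversion(images: Dict[str, List[str]],
                    converted_images: Set[str],
                    present_layers: Set[str]) -> Tuple[List[str], List[str]]:
    """Decide which images to skip and which layers still need unpacking (sketch).

    images maps an image reference to its ordered list of layer digests.
    Returns (skipped_images, layers_to_unpack).
    """
    skipped: List[str] = []
    to_unpack: List[str] = []
    known = set(present_layers)
    for image, layers in images.items():
        if image in converted_images:
            skipped.append(image)  # already unpacked: nothing to do
            continue
        for layer in layers:
            if layer not in known:
                to_unpack.append(layer)
                known.add(layer)  # a layer shared by several images is unpacked once
    return skipped, to_unpack
```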
Layer Aware
DUCC is aware that container images are built incrementally on top of smaller layers.
Converting an image that is based on an image already in the repository skips most of the work, because the shared lower layers are already present.
As long as the lower layers of an image do not change, this allows very fast ingestion of software images, irrespective of their size.
Notification
DUCC provides a basic notification system to alert external services of updates in the file system.
The notifications are appended to a simple text file as JSON objects.
Human operators or software can follow the file and react to notifications of interest.
The notification file can eventually grow large, so we suggest treating
it as a standard log file and managing it with tools such as logrotate.
Multiple DUCC processes can write to the same notification file at the same time, and multiple consumers can read from it.
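A consumer can follow the notification file by polling it and parsing only what was appended since its last read. The sketch below assumes one JSON object per line and makes no assumption about the fields DUCC writes; read_new_notifications is a hypothetical helper. A trailing line without a newline may still be in the middle of being written, so it is left for the next call.

```python
import json
from pathlib import Path
from typing import List, Tuple


def read_new_notifications(path: Path, offset: int = 0) -> Tuple[List[dict], int]:
    """Parse notifications appended after offset; return them and the new offset (sketch).

    Assumes one JSON object per line.
    """
    with path.open("rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partial last line is read again later.
    end = data.rfind(b"\n") + 1
    objects = [json.loads(line) for line in data[:end].splitlines() if line.strip()]
    return objects, offset + end
```

A caller would keep the returned offset and call the function again periodically. If the file is rotated (for example by logrotate) and becomes shorter than the saved offset, the consumer should reset the offset to zero.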
Notifications are activated if and only if the user provides a file to
write them to, using the -n/--notification-file flag.
Multiprocess
Multiple DUCC processes can run against the same CernVM-FS repository.
Before interacting with the CernVM-FS repository, DUCC takes a
file-system-level lock on /tmp/DUCC.lock.
This makes it possible to run multiple instances of DUCC at the same time; for example, one instance could listen on a web socket while another converts a wish list.
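The locking pattern can be illustrated with an exclusive advisory lock held for the duration of the repository work. This sketch uses flock(2) through Python's fcntl module and a hypothetical repository_lock helper. DUCC's actual locking primitive is not documented here, so this shows the idea only and is not guaranteed to interoperate with a running DUCC instance.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def repository_lock(path: str = "/tmp/DUCC.lock") -> Iterator[None]:
    """Hold an exclusive advisory lock on path while the body runs (sketch).

    Uses flock(2); whether this matches DUCC's own lock is an assumption.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other holder remains
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Any process that wraps its repository work in the same lock waits for the current holder to finish, which serialises access to the repository while the processes themselves keep running concurrently.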