Client Configuration¶
Structure of /etc/cvmfs¶
The local configuration of CernVM-FS is controlled by several files in
/etc/cvmfs listed in the table below. For every .conf file except for
the files in /etc/cvmfs/default.d you can create a corresponding
.local file having the same prefix in order to customize the
configuration. The .local file will be sourced after the corresponding
.conf file.
In a typical installation, a handful of parameters need to be set in
/etc/cvmfs/default.local. Most likely, this is the list of
repositories (CVMFS_REPOSITORIES), HTTP proxies (see the
network settings section), and
perhaps the cache directory and the cache quota (see the
cache settings section). In a few
cases, one might change a parameter for a specific domain or a specific
repository, or provide an exclusive cache for a specific repository. For
a list of all parameters, see Appendix
Client Parameters.
The .conf and .local configuration files are key-value pairs in the
form PARAMETER=value. For boolean parameters, yes/no, on/off,
true/false or 1/0 can be used as truth values. These are
case-insensitive, so TRUE, On, and yes are equivalent.
The configuration files are sourced by /bin/sh. Hence, a limited set of
shell commands can be used inside these files including comments, if
clauses, parameter evaluation, and shell math ($((...))). Special
characters have to be quoted. For instance, instead of
CVMFS_HTTP_PROXY=p1;p2, write CVMFS_HTTP_PROXY='p1;p2' in order to
avoid parsing errors. The shell commands in the configuration files can
use the CVMFS_FQRN parameter, which contains the fully qualified
repository names that is being mounted. The current working directory is
set to the parent directory of the configuration file at hand.
config.sh- Set of internal helper functions.
default.conf- Set of base parameters.
default.d/$config.conf- Adjustments to the default.conf configuration, usually installed by a cvmfs-config-... package. Read before default.local.
domain.d/$domain.conf- Domain-specific parameters and implementations of the functions in
config.sh. config.d/$repository.conf- Repository-specific parameters and implementations of the functions in
config.sh. keys/- Contains domain-specific sub directories with public keys used to verify the digital signature of file catalogs.
The Config Repository¶
In addition to the local system configuration, a client can configure a
dedicated config repository. A config repository is a standard mountable
CernVM-FS repository that resembles the directory structure of
/etc/cvmfs. It can be used to centrally maintain the public keys and
configuration of repositories that should not be distributed with rather
static packages, and also to centrally
blacklist compromised repository keys.
Configuration from the config repository is overwritten
by the local configuration in case of conflicts; see the comments in
/etc/cvmfs/default.conf for the precise ordering of processing the
config files. The config repository is set by the
CVMFS_CONFIG_REPOSITORY parameter. The default configuration rpm
cvmfs-config-default sets this parameter to cvmfs-config.cern.ch.
The CVMFS_CONFIG_REPO_REQUIRED parameter can be used to force
availability of the config repository in order for other repositories to
get mounted.
The config repository is a very convenient method for updating the configuration on a lot of CernVM-FS clients at once. This also means that it is very easy to break configurations on a lot of clients at once. Also note that only one config repository may be used per client, and this is a technical limitation that is not expected to change. For these reasons, it makes the most sense to reserve the use of this feature for large groups of sites that share a common infrastructure with trusted people that maintain the configuration repository. In order to facilitate sharing of configurations between the infrastructures, a github repository has been set up. Infrastructure maintainers are invited to collaborate there.
Some large sites that prefer to maintain control over their own client configurations publish their own config repository but have automated processes to compare it to a repository from a larger infrastructure. They then quickly update their own config repository with whatever changes have been made to the infrastructure's config repository.
Exchanges of configurations between limited numbers of sites that also use their own separate configuration repository are encouraged to be done by making rpm and/or dpkg packages and distributing them through cvmfs-contrib package repositories. Keeping configurations up to date through packages is less convenient than the configuration repository but better than manually maintaining configuration files.
Mounting¶
Mounting of CernVM-FS repositories is typically handled by autofs.
Just by accessing a repository directory under /cvmfs
(/cvmfs/atlas.cern.ch), autofs will take care of mounting. autofs
will also automatically unmount a repository if it is not used for a
while.
Instead of using autofs, CernVM-FS repositories can be mounted
manually with the system's mount command. In order to do so, use the
cvmfs file system type, like
mount -t cvmfs atlas.cern.ch /cvmfs/atlas.cern.ch
Likewise, CernVM-FS repositories can be mounted through entries in /etc/fstab. A sample entry in /etc/fstab:
atlas.cern.ch /mnt/test cvmfs defaults,_netdev,nodev 0 0
Every mount point corresponds to a CernVM-FS process. Using autofs or
the system's mount command, every repository can only be mounted once.
Otherwise, multiple CernVM-FS processes would collide in the same cache
location. If a repository is needed under several paths, use a bind
mount or use a
private file system mount point.
If a configuration repository is required to mount other repositories,
it will need to be mounted first. Since /etc/fstab mounts are done in
parallel at boot time, the order in /etc/fstab is not sufficient to make
sure that happens. On systemd-based systems this can be done by adding
the option x-systemd.requires-mounts-for=<configrepo> on all the other
mounts. For example:
config-egi.egi.eu /cvmfs/config-egi.egi.eu cvmfs defaults,_netdev,nodev 0 0
cms.cern.ch /cvmfs/cms.cern.ch cvmfs defaults,_netdev,nodev,x-systemd.requires-mounts-for=/cvmfs/config-egi.egi.eu 0 0
Private Mount Points¶
In contrast to the system's mount command which requires root
privileges, CernVM-FS can also be mounted like other Fuse file systems
by normal users. In this case, CernVM-FS uses parameters from one or
several user-provided config files instead of using the files under
/etc/cvmfs. CernVM-FS private mount points do not appear as cvmfs2
file systems but as fuse file systems. The cvmfs_config and
cvmfs_talk commands ignore privately mounted CernVM-FS repositories.
On an interactive machine, private mount points are for instance
unaffected by an administrator unmounting all system's CernVM-FS mount
points by cvmfs_config umount.
In order to mount CernVM-FS privately, use the cvmfs2 command like
cvmfs2 -o config=myparams.conf atlas.cern.ch /home/user/myatlas
A minimal sample myparams.conf file could look like this:
CVMFS_CACHE_BASE=/home/user/mycache
CVMFS_RELOAD_SOCKETS=/home/user/mycache
CVMFS_USYSLOG=/home/user/cvmfs.log
CVMFS_CLAIM_OWNERSHIP=yes
CVMFS_SERVER_URL=http://cvmfs-stratum-one.cern.ch/cvmfs/atlas.cern.ch
CVMFS_KEYS_DIR=/etc/cvmfs/keys/cern.ch
CVMFS_HTTP_PROXY=DIRECT
Make sure to use absolute path names for the mount point and for the
cache directory. Use fusermount -u in order to unmount a privately
mounted CernVM-FS repository.
The private mount points can also be used to use the CernVM-FS Fuse
module in case it has not been installed under /usr and /etc. If the
public keys are not installed under /etc/cvmfs/keys, the directory of
the keys needs to be specified in the config file by
CVMFS_KEYS_DIR=<directory>. If the libcvmfs_fuse.so resp.
libcvmfs_fuse3.so library is not installed in one of the standard search
paths, the CVMFS_LIBRARY_PATH variable has to be set accordingly for
the cvmfs2 command.
The easiest way to make use of CernVM-FS private mount points is with
the cvmfsexec package. Read about that in the Security appendix section
on running the client as a normal user.
Pre-mounting¶
In usual deployments, the fusermount utility from the system fuse
package takes care of mounting a repository before handing of control to
the CernVM-FS client. The fusermount utility is a suid binary because
on older kernels and outside user namespaces, mounting is a privileged
operation.
As of libfuse3, the task of mounting /dev/fuse can be performed by any utility. This functionality has been added, for instance, to Singularity 3.4.
An executable that pre-mounts /dev/fuse has to call the mount() system
call in order to open a file descriptor. The file descriptor number is
than passed as command line parameter to the CernVM-FS client. A working
code example is available in the CernVM-FS
tests.
In order to use the pre-mount functionality in Singularity, create a
container that has the cvmfs package and configuration installed in
it, and also the corresponding cvmfs-fuse3 package. Bind-mount scratch
space at /var/run/cvmfs and cache space at /var/lib/cvmfs. For each
desired repository, add a --fusemount option with container:cvmfs2
followed by the repository name and mountpoint, separated by whitespace.
First mount the configuration repository if required. For example:
CONFIGREPO=config-osg.opensciencegrid.org
singularity exec -S /var/run/cvmfs -B $HOME/cvmfs_cache:/var/lib/cvmfs \
--fusemount "container:cvmfs2 $CONFIGREPO /cvmfs/$CONFIGREPO" \
--fusemount "container:cvmfs2 cms.cern.ch /cvmfs/cms.cern.ch" \
docker://davedykstra/cvmfs-fuse3 bash
The singcvmfs command in the cvmfsexec package makes use of fuse
pre-mounting. Read more about that package in the Security appendix
section on running the client as a normal user.
Remounting and Namespaces/Containers¶
It is common practice to use CernVM-FS from within containers,
especially with Singularity. This sometimes results in a problem because
the Linux kernel does not prevent unmounting a CernVM-FS repository if
the only processes accessing it are in mount namespaces, even though the
fuse processes managing the repository need to keep running until all
processes using the repository exit. The problem in that case is that
the repository cannot be remounted as long as the background processes
keep running. This can be easily reproduced by interactively running a
Singularity container out of CernVM-FS (without the -p option),
running sleep in the background, and exiting Singularity. The
repository can then be unmounted, but it cannot be remounted until the
sleep process dies.
When this happens, cvmfs_config fuser <repo> can be used to identify
all the processes using <repo>. The system administrator can then
contact the owners of the processes to ask to change the application
behavior to avoid this situation (for example by using Singularity
-p), and the processes can be killed to enable the repository to be
remounted.
Docker Containers¶
There are two options to mount CernVM-FS in docker containers. The first option is to bind mount a mounted repository as a volume into the container. This has the advantage that the CernVM-FS cache is shared among multiple containers. The second option is to mount a repository inside a container, which requires a privileged container.
Volume Driver¶
There is an external package that provides a Docker Volume Driver for CernVM-FS. This package provides management of repositories in Docker and Kubernetes. It provides a convenient interface to handle CernVM-FS volume definitions.
Bind mount from the host¶
On Docker >= 1.10, the autofs managed area /cvmfs can be directly
mounted into the container as a shared mount point like
docker run -it -v /cvmfs:/cvmfs:shared centos /bin/bash
In order to bind mount an individual repository from the host, turn off
autofs on the host and mount the repository manually, like:
service autofs stop # systemd: systemctl stop autofs
chkconfig autofs off # systemd: systemctl disable autofs
mkdir -p /cvmfs/sft.cern.ch
mount -t cvmfs sft.cern.ch /cvmfs/sft.cern.ch
Start the docker container with the -v option to mount the CernVM-FS
repository inside, like
docker run -it -v /cvmfs/sft.cern.ch:/cvmfs/sft.cern.ch centos /bin/bash
The -v option can be used multiple times with different repositories.
Mount inside a container¶
In order to use mount inside a container, the container must be
started in privileged mode, like
docker run --privileged -i -t centos /bin/bash
In such a container, CernVM-FS can be installed and used the usual way
provided that autofs is turned off.
Parrot Connector to CernVM-FS¶
In case Fuse cannot be installed, the parrot
toolkit provides a means to
"mount" CernVM-FS on Linux in pure user space. Parrot sandboxes are an
application similar to gdb sandboxes. But instead of debugging the
application, parrot transparently rewrites file system calls and can
effectively provide /cvmfs to an application. We recommend using the
latest precompiled
parrot, which has
CernVM-FS support built-in.
In order to sandbox a command <CMD> with options <OPTIONS> in
parrot, use
export PARROT_ALLOW_SWITCHING_CVMFS_REPOSITORIES=yes
export PARROT_CVMFS_REPO="<default-repositories>"
export HTTP_PROXY='<SITE HTTP PROXY>' # or 'DIRECT;' if not on a cluster or grid site
parrot_run <PARROT_OPTIONS> <CMD> <OPTIONS>
Repositories that are not available by default from the built-in
<default-repositories> list can be explicitly added to
PARROT_CVMFS_REPO. The repository name, a stratum 1 URL, and the
public key of the repository need to be provided. For instance, in order
to add alice-ocdb.cern.ch and ilc.desy.de to the list of repositories,
one can write
export CERN_S1="http://cvmfs-stratum-one.cern.ch/cvmfs"
export DESY_S1="http://grid-cvmfs-one.desy.de:8000/cvmfs"
export PARROT_CVMFS_REPO="<default-repositories> \
alice-ocdb.cern.ch:url=${CERN_S1}/alice-ocdb.cern.ch,pubkey=<PATH/key.pub> \
ilc.desy.de:url=${DESY_S1}/ilc.desy.de,pubkey=<PATH/key.pub>"
given that the repository public keys are in the provided paths.
By default, parrot uses a shared CernVM-FS cache for all parrot instances of the same user stored under a temporary directory that is derived from the user ID. In order to place the CernVM-FS cache into a different directory, use
export PARROT_CVMFS_ALIEN_CACHE=</path/to/cache>
In order to share this directory among multiple users, the users have to belong to the same UNIX group.
Network Settings¶
CernVM-FS uses HTTP for the data transfer. Repository data can be replicated to multiple web servers and cached by standard web proxies such as Squid [Guerrero99]. In a typical setup, repositories are replicated to a handful of web servers in different locations. These replicas form the CernVM-FS Stratum 1 service, whereas the replication source server is the CernVM-FS Stratum 0 server. In every cluster of client machines, there should be two or more web proxy servers that CernVM-FS can use (see cpt_squid). These site-local web proxies reduce the network latency for the CernVM-FS clients, and they reduce the load for the Stratum 1 service. CernVM-FS supports WPAD/PAC proxy auto-configuration [Gauthier99], choosing a random proxy for load-balancing, and automatic fail-over to other hosts and proxies in case of network errors. Roaming clients can connect directly to the Stratum 1 service.
IP Protocol Version¶
CernVM-FS can use both IPv4 and IPv6. For dual-stack stratum 1 hosts it
will use the system default settings when connecting directly to the
host. When connecting to a proxy, by default it will try on the IPv4
address unless the proxy only has IPv6 addresses configured. The
CVMFS_IPFAMILY_PREFER=[46] parameter can be used to select the
preferred IP protocol for dual-stack proxies.
Stratum 1 List¶
To specify the Stratum 1 servers, set CVMFS_SERVER_URL to a
semicolon-separated list of known replica servers (enclose in quotes).
The so defined URLs are organized as a ring buffer. Whenever download of
files fails from a server, CernVM-FS automatically switches to the next
mirror server. For repositories under the cern.ch domain, the Stratum 1
servers are specified in /etc/cvmfs/domain.d/cern.ch.conf.
It is recommended to adjust the order of Stratum 1 servers so that the
closest servers are used with priority. This can be done automatically
by using
geographic ordering.
Alternatively, for roaming clients (clients not using a proxy server),
the Stratum 1 servers can be automatically sorted according to round
trip time by cvmfs_talk host probe (see the Auxiliary Tools section). Otherwise, the proxy server would invalidate round trip
time measurement.
The special sequence @fqrn@ in the CVMFS_SERVER_URL string is
replaced by fully qualified repository name (atlas.cern.ch, cms.cern.ch,
...). That allows to use the same parameter for many repositories
hosted under the same domain. For instance,
http://cvmfs-stratum-one.cern.ch/cvmfs/@fqrn@ can resolve to
http://cvmfs-stratum-one.cern.ch/cvmfs/atlas.cern.ch,
http://cvmfs-stratum-one.cern.ch/cvmfs/cms.cern.ch, and so on
depending on the repository that is being mounted. The same works for
the sequence @org@ which is replaced by the unqualified repository
name (atlas, cms, ...).
Proxy Lists¶
CernVM-FS uses a dedicated HTTP proxy configuration, independent of
system-wide settings. Instead of a single proxy, CernVM-FS uses a chain
of load-balanced proxy groups. The CernVM-FS proxies are set by the
CVMFS_HTTP_PROXY parameter.
Proxy groups are used for load-balancing among several proxies of equal priority. Starting with the first group, one proxy within a group is selected at random. By default, this randomly selected proxy will be used for all requests. If proxy sharding is enabled, then the proxy is instead selected on a per-request basis to distribute the requests across all proxies within the current group.
If a proxy fails, CernVM-FS automatically switches to another proxy from the current group. If all proxies in a group have failed, CernVM-FS switches to the next proxy group. After probing the last proxy group in the chain, the first is probed again. To avoid endless loops, for each file download the number of switches is limited by the total number of proxies.
Proxies within the same group are separated by a pipe character |,
while groups are separated from each other by a semicolon character
; (for example, http://proxy1:8080|http://proxy2:8080;DIRECT). Note that it is possible for a proxy group to consist of only
one proxy. In the case of proxies that use a DNS round-robin entry,
wherein a single host name resolves to multiple IP addresses, CVMFS
automatically internally transforms the name into a load-balanced group,
so you should use the host name and a semicolon. In order to limit the
number of individual proxy servers used in a round-robin DNS entry, set
CVMFS_MAX_IPADDR_PER_PROXY. This can also limit the perceived "hang
duration" while CernVM-FS performs fail-overs.
The DIRECT keyword for a hostname avoids using a proxy altogether.
Note that CVMFS_HTTP_PROXY must be defined in order to mount CVMFS,
but to avoid using any proxies, you can set the parameter to DIRECT.
However, note that this is not recommended for large numbers of clients
accessing remote stratum servers, and stratum server administrators may
ask you to deploy and use proxies.
CVMFS_HTTP_PROXY is typically configured with a primary proxy group
listed first, and potentially other proxy groups listed after that for
backup. In order to prevent CernVM-FS from permanently using the backup
proxies after a fail-over, CernVM-FS will automatically retry the first
proxy group in the list after some time. The delay for re-trying is set
in seconds by CVMFS_PROXY_RESET_AFTER. This reset behavior can be
disabled by setting this parameter to 0.
Proxy List Examples¶
Suppose there are two proxy servers local to your site,
p1.site.example.org and p2.site.example.org, and two regional proxy
servers nearby available for backup use, p3.region.example.org and
p4.region.example.org. In this example all proxy servers are
configured to listen on port 3128. If the two local proxies are equally
preferable to use and configured identically to each other, and the same
applies for the two regional proxies, use :
CVMFS_HTTP_PROXY="http://p1.site.example.org:3128|http://p2.site.example.org:3128;http://p3.region.example.org:3128|http://p4.region.example.org:3128"
However, if p1 should always be preferred over p2 (for example if it
has a faster network or larger cache), use :
CVMFS_HTTP_PROXY="http://p1.site.example.org:3128;http://p2.site.example.org:3128;http://p3.region.example.org:3128|http://p4.region.example.org:3128"
Moreover, if p3 should always be preferred over p4 (for example if
it is significantly closer to your site), use :
CVMFS_HTTP_PROXY="http://p1.site.example.org:3128;http://p2.site.example.org:3128;http://p3.region.example.org:3128;http://p4.region.example.org:3128"
Automatic Proxy Configuration¶
The proxy settings can be automatically gathered through WPAD. The
special proxy server "auto" in CVMFS_HTTP_PROXY is resolved
according to the proxy server specification loaded from a PAC file. PAC
files can be on a file system or accessible via HTTP. CernVM-FS looks
for PAC files in the order given by the semicolon separated URLs in the
CVMFS_PAC_URLS environment variable. This variable defaults to
http://wpad/wpad.dat. The auto keyword used as a URL in
CVMFS_PAC_URLS is resolved to http://wpad/wpad.dat, too, in order to
be compatible with Frontier [Blumenfeld08].
Fallback Proxy List¶
In addition to the regular proxy list set by CVMFS_HTTP_PROXY, a
fallback proxy list is supported in CVMFS_FALLBACK_PROXY. The syntax
of both lists is the same. The fallback proxy list is appended to the
regular proxy list, and if the fallback proxy list is set, any DIRECT is
removed from both lists. The automatic proxy configuration of the
previous section only sets the regular proxy list, not the fallback
proxy list. Also, the fallback proxy list can be automatically
reordered; see the next section.
Ordering of Servers according to Geographic Proximity¶
CernVM-FS Stratum 1 servers provide a RESTful service for geographic
ordering. Clients can request
http://<HOST>/cvmfs/<FQRN>/api/v1.0/geo/<proxy_address>/<server_list>.
The proxy address can be replaced by a UUID if no proxies are used, and
the CernVM-FS client does that if there are no regular proxies. The
server list is comma-separated. The result is an ordered list of indexes
of the input host names. Use of this API can be enabled in a CernVM-FS
client with CVMFS_USE_GEOAPI=yes. That will geographically sort both
the servers set by CVMFS_SERVER_URL and the fallback proxies set by
CVMFS_FALLBACK_PROXY.
Timeouts¶
CernVM-FS tries to gracefully recover from broken network links and
temporarily overloaded paths. The timeout for connection attempts and
for very slow downloads can be set by CVMFS_TIMEOUT and
CVMFS_TIMEOUT_DIRECT. The two timeout parameters apply to a connection
with a proxy server and to a direct connection to a Stratum 1 server,
respectively. A download is considered to be "very slow" if the
transfer rate is below for more than the timeout interval. The threshold
can be adjusted with the CVMFS_LOW_SPEED_LIMIT parameter. A very slow
download is treated like a broken connection.
On timeout errors and on connection failures (but not on name resolving
failures), CernVM-FS will retry the path using an exponential backoff
algorithm. This introduces a jitter in case there are many concurrent
requests by a cluster of nodes, allowing a proxy server or web server to
serve all the nodes consecutively. CVMFS_MAX_RETRIES sets the number
of retries on a given path before CernVM-FS tries to switch to another
proxy or host. The overall number of requests with a given proxy/host
combination is $CVMFS_MAX_RETRIES+1. CVMFS_BACKOFF_INIT sets the
maximum initial backoff (time) in seconds. The actual initial backoff is
picked with milliseconds precision randomly in the interval
[1, CVMFS_BACKOFF_INIT * 1000]. With every retry, the
backoff is then doubled.
DNS Nameserver Changes¶
CernVM-FS can watch /etc/resolv.conf and automatically follow changes
to the DNS servers. This behavior is controlled by the
CVMFS_DNS_ROAMING client configuration. It is by default turned on
macOS and turned off on Linux.
Network Path Selection¶
This section summarizes the CernVM-FS mechanics to select a network path from the client through an HTTP forward proxy to an HTTP endpoint. At any given point in time, there is only one combination of web proxy and web host that a new request will utilize. In this section, it is this combination of proxy and host that is called "network path". The network path is chosen from the collection of web proxies and hosts in the CernVM-FS configuration according to the following rules.
Host Selection¶
The hosts specified as an ordered list. CernVM-FS will always start with the first host and fail-over one by one to the next hosts in the list.
Proxy Selection¶
Web proxies are treated as an ordered list of load-balance groups. Like
the hosts, load-balance groups will be probed one after another. Within
a load-balance group, a proxy is chosen at random. DNS proxy names that
resolve to multiple IP addresses are automatically transformed into a
proxy load-balance group, whose maximum size can be limited by
CVMFS_MAX_IPADDR_PER_PROXY.
Proxy Sharding¶
In the default (non-sharded) configuration, each CernVM-FS client will independently choose a single proxy to be used for all requests. For sites with many clients that are likely to access the same content, this can result in unnecessary duplication of cached content across multiple proxies.
If proxy sharding is enabled via the CVMFS_PROXY_SHARD parameter, all
proxies within a load-balancing group are used concurrently. Each proxy
handles a subset of the requests. Proxies are selected using consistent
hashing so that multiple clients will independently select the same
proxy for a given request, to maximize cache efficiency. If any proxy
fails, CernVM-FS automatically removes it from the load-balancing group
and distributes its requests evenly across the remaining proxies.
Failover Rules¶
On download failures, CernVM-FS tries to figure out if the failure is caused by the host or by the proxy.
- Failures of host name resolution, HTTP 5XX and 404 return codes, and any connection/timeout error, partial file transfer, or non 2XX return code in case no proxy is in use are classified as host failure.
- Failures of proxy name resolution and any connection/timeout error, partial file transfer, or non 2XX return code (except 5XX and 404) are classified as proxy failure if a proxy server is used.
- Explicit proxy errors (indicated via the [X-Squid-Error]{.title-ref} or [Proxy-Status]{.title-ref} headers) will always be classified as proxy failure.
If CernVM-FS detects a host failure, it will fail over to the next host in the list while keeping the proxy server untouched. If it detects a proxy failure, it will fail over to another proxy while keeping the host untouched. CernVM-FS will try all proxies of the current load-balance group in random order before trying proxies from the next load-balance group.
The change of host or proxy is a global change affecting all subsequent requests. In order to avoid concurrent requests changing the global network path at the same time, the actual change of path is only performed if the global host/proxy is equal to the currently used host/proxy of the request. Otherwise, the request assumes that another request already performed the fail-over and only the request's fail-over counter is increased.
In order to avoid endless loops, every request carries a host fail-over counter and a proxy fail-over counter. Once this counter reaches the number of host/proxies, CernVM-FS gives up and returns a failure.
The failure classification can mistakenly take a host failure for a proxy failure. Therefore, after all proxies have been probed, a connection/timeout error, partial file transfer, or non 2XX return code is treated like a host failure in any case and the proxy server as well as the proxy server failure counter of the request at hand is reset. This way, eventually all possible network paths are examined.
Network Path Reset Rules¶
On host or proxy fail-over, CernVM-FS will remember the timestamp of the failover. The first request after a given grace period (see the Network Settings section) will reset the proxy to a random proxy of the first load-balance group or the host to the first host, respectively. If the default proxy/host is still unavailable, the fail-over routines again switch to a working network path.
Retry and Backoff¶
On connection and timeout errors, CernVM-FS retries a fixed, limited number of times on the same network path before performing a fail-over. Retrying involves an exponential backoff with a minimum and maximum waiting time.
Default Values¶
- Network timeout for connections using a proxy: 5 seconds (adjustable
by
CVMFS_TIMEOUT) - Network timeout for connections without a proxy: 10 seconds
(adjustable by
CVMFS_TIMEOUT_DIRECT) - Grace period for proxy reset after fail-over: 5 minutes (adjustable
by
CVMFS_PROXY_RESET_AFTER) - Grace period for host reset after fail-over: 30 minutes (adjustable
by
CVMFS_HOST_RESET_AFTER) - Maximum number of retries on the same network path: 1 (adjustable by
CVMFS_MAX_RETRIES) - Minimum waiting time on a retry: 2 seconds (adjustable by CVMFS_BACKOFF_MIN)
- Maximum waiting time on a retry: 10 seconds (adjustable by CVMFS_BACKOFF_MAX)
- Minimum/Maximum DNS name cache: 1 minute / 1 day
Note
A continuous transfer rate below 1 kB/s is treated like a network timeout.
Cache Settings¶
Downloaded files will be stored in a local cache directory. The CernVM-FS cache has a soft quota; as a safety margin, the partition hosting the cache should provide more space than the soft quota limit; we recommend to leave at least 20% + 1 GB.
Once the quota limit is reached, CernVM-FS will automatically remove
files from the cache according to the least recently used policy.
Removal of files is performed bunch-wise until half of the maximum cache
size has been freed. The quota limit can be set in Megabytes by
CVMFS_QUOTA_LIMIT. For typical repositories, a few Gigabytes make a
good quota limit.
The cache directory needs to be on a local file system in order to allow
each host the accurate accounting of the cache contents; on a network
file system, the cache can potentially be modified by other hosts.
Furthermore, the cache directory is used to create (transient) sockets
and pipes, which is usually only supported by a local file system. The
location of the cache directory can be set by CVMFS_CACHE_BASE.
On SELinux enabled systems, the cache directory and its content need to
be labeled as cvmfs_cache_t. During the installation of CernVM-FS
RPMs, this label is set for the default cache directory
/var/lib/cvmfs. For other directories, the label needs to be set
manually by chcon -Rv --type=cvmfs_cache_t $CVMFS_CACHE_BASE.
Each repository can either have an exclusive cache or join the CernVM-FS
shared cache. The shared cache enforces a common quota for all
repositories used on the host. File duplicates across repositories are
stored only once in the shared cache. The quota limit of the shared
directory should be at least the maximum of the recommended limits of
its participating repositories. In order to have a repository not join
the shared cache but use an exclusive cache, set
CVMFS_SHARED_CACHE=no.
Alien Cache¶
An "alien cache" provides the possibility to use a data cache outside the control of CernVM-FS. This can be necessary, for instance, in HPC environments where local disk space is not available or scarce but powerful cluster file systems are available. The alien cache directory is a directory in addition to the ordinary cache directory. The ordinary cache directory is still used to store control files.
The alien cache directory is set by the CVMFS_ALIEN_CACHE option. It
can be located anywhere including cluster and network file systems. If
configured, all data chunks are stored there. CernVM-FS ensures atomic
access to the cache directory. It is safe to have the alien directory
shared by multiple CernVM-FS processes, and it is safe to unlink files
from the alien cache directory anytime. The contents of files, however,
must not be touched by third-party programs.
In contrast to normal cache mode where files are store in mode 0600, in the alien cache files are stored in mode 0660. So all users being part of the alien cache directory's owner group can use it.
The skeleton of the alien cache directory should be created upfront.
Otherwise, the first CernVM-FS process accessing the alien cache
determines the ownership. The cvmfs2 binary can create such a skeleton
using
cvmfs2 __MK_ALIEN_CACHE__ $alien_cachedir $owner_uid $owner_gid
Since the alien cache is unmanaged, there is no automatic quota
management provided by CernVM-FS; the alien cache directory is
ever-growing. The CVMFS_ALIEN_CACHE requires CVMFS_QUOTA_LIMIT=-1
and CVMFS_SHARED_CACHE=no.
The alien cache might be used in combination with a special repository replication mode that preloads a cache directory (Section cpt_replica). This allows to propagate an entire repository into the cache of a cluster file system for HPC setups that do not allow outgoing connectivity.
Advanced Cache Configuration¶
For exotic cache configurations, CernVM-FS supports specifying multiple,
independent "cache manager instances" of different types. Such cache
manager instances replace the local cache directory. Since the local
cache directory is also used to store transient special files,
CVMFS_WORKSPACE=$local_path must be used when advanced cache
configuration is used.
A concrete cache manager instance has a user-defined name, and it is specified like
CVMFS_CACHE_PRIMARY=myInstanceName
CVMFS_CACHE_myInstanceName_TYPE=posix
Multiple instances can thus be safely defined with different names, but only one is selected when the client boots. The following table lists the valid cache manager instance types.
posix- Uses a cache directory with the standard cache implementation.
tiered- Uses two other cache manager instances in a layered configuration.
external- Uses an external cache plugin process (see Plugins).
The instance name "default" is blocked because the regular cache
configuration syntax is automatically mapped to
CVMFS_CACHE_default_... parameters. The command
sudo cvmfs_talk cache instance can be used to show the currently used
cache manager instance.
Refcounted Cache Mode¶
The default posix cache manager has a "refcounted" mode, which uses additional maps to count references to open file descriptors. Multiple processes reading the same cached files will then no longer create new duplicated file descriptors for the same opened file, which can be useful for highly parallelized workloads. This functionality comes with a small memory overhead, which should however not exceed a few MBs.
The refcount mode can be turned on by setting
CVMFS_CACHE_REFCOUNT=yes
and reloading the cvmfs configuration. To switch it off, the
repositories have to be remounted. Switching it off and doing
cvmfs_config reload will not fail, but silently ignore the option
until the next remount in order to properly work with already open file
descriptors.
Tiered Cache¶
The tiered cache manager combines two other cache manager instances as an upper layer and a lower layer into a single functional cache manager. Usually, a small and fast upper layer (SSD, memory) is combined with a larger and slower lower layer (HDD, network drive). The upper layer needs to be large enough to serve all currently open files. On an upper layer cache miss, CernVM-FS tries to copy the missing object from the lower into the upper layer. On a lower layer cache miss, CernVM-FS download and stores objects either in both layers or in the upper layer only, depending on the configuration.
The parameters CVMFS_CACHE_$tieredInstanceName_UPPER and
CVMFS_CACHE_$tieredInstanceName_LOWER set the names of the upper and
the lower instances. The parameter
CVMFS_CACHE_$tieredInstanceName_LOWER_READONLY=[yesno] controls
whether the lower layer can be populated by the client or not.
Streaming Cache Manager¶
This mode uses a download manager and a backing cache manager to deliver data. Pinned files and catalogs use the backing cache manager. Regular data blocks are downloaded on read, the required data window copied to the user. In order to use the streaming cache manager, set:
CVMFS_STREAMING_CACHE=yes
Note: The streaming cache manager is not ideal when doing multiple small reads of a large chunk, as each read will trigger a re-download of the entire chunk.
External Cache Plugin¶
A CernVM-FS cache manager instance can be provided by an external process. The cache manager process and the CernVM-FS client are connected through a socket, whose address is called "locator". The locator can either address a UNIX domain socket on the local file system, or a TCP socket, as in the following examples
CVMFS_CACHE_instanceName_LOCATOR=unix=/var/lib/cvmfs/cache.socket
# or
CVMFS_CACHE_instanceName_LOCATOR=tcp=192.168.0.24:4242
If a UNIX domain socket is used, both the CernVM-FS client and the cache manager need to be able to access the socket file. Usually that means they have to run under the same user.
Instead of manually starting the cache manager, the CernVM-FS client can
optionally automatically start and stop the cache manager process. This
is called a "supervised cache manager". The first booting CernVM-FS
client starts the cache manager process, the last terminating client
stops the cache manager process. In order to start the cache manager in
supervised mode, use
CVMFS_CACHE_instanceName_CMDLINE=<executable and arguments>, using a
comma (,) instead of a space to separate the command line parameters.
Example¶
The following example configures a tiered cache with an external cache plugin as an upper layer and a read-only, network drive as a lower layer. The cache plugin uses memory to cache data and is part of the CernVM-FS client. This configuration could be used in a data center with diskless nodes and a preloaded cache on a network drive (see Chapter cpt_hpc)
CVMFS_WORKSPACE=/var/lib/cvmfs
CVMFS_CACHE_PRIMARY=hpc
CVMFS_CACHE_hpc_TYPE=tiered
CVMFS_CACHE_hpc_UPPER=memory
CVMFS_CACHE_hpc_LOWER=preloaded
CVMFS_CACHE_hpc_LOWER_READONLY=yes
CVMFS_CACHE_memory_TYPE=external
CVMFS_CACHE_memory_CMDLINE=/usr/libexec/cvmfs/cache/cvmfs_cache_ram,/etc/cvmfs/cache-mem.conf
CVMFS_CACHE_memory_LOCATOR=unix=/var/lib/cvmfs/cvmfs-cache.socket
CVMFS_CACHE_preloaded_TYPE=posix
CVMFS_CACHE_preloaded_ALIEN=/gpfs/cvmfs/alien
CVMFS_CACHE_preloaded_SHARED=no
CVMFS_CACHE_preloaded_QUOTA_LIMIT=-1
The example configuration for the in-memory cache plugin in /etc/cvmfs/cache-mem.conf is
CVMFS_CACHE_PLUGIN_LOCATOR=unix=/var/lib/cvmfs/cvmfs-cache.socket
# 2G RAM
CVMFS_CACHE_PLUGIN_SIZE=2000
NFS Server Mode¶
In case there is no local hard disk space available on a cluster of worker nodes, a single CernVM-FS client can be exported via NFS [Callaghan95] [Shepler03] to these worker nodes. This mode of deployment will inevitably introduce a performance bottleneck and a single point of failure and should be only used if necessary.
NFS export requires Linux kernel >= 2.6.27 on the NFS server. For
instance, exporting works for Scientific Linux 6 but not for Scientific
Linux 5. The NFS server should run a lock server as well. For proper NFS
support, set CVMFS_NFS_SOURCE=yes. On the client side, all available
NFS implementations should work.
In the NFS mode, upon mount an additional directory
nfs_maps.$repository_name appears in the CernVM-FS cache directory.
These NFS maps use leveldb to store the virtual inode CernVM-FS issues
for any accessed path. The virtual inode may be requested by NFS clients
anytime later. As the NFS server has no control over the lifetime of
client caches, entries in the NFS maps cannot be removed.
Typically, every entry in the NFS maps requires some 150-200 Bytes. A
recursive find on /cvmfs/atlas.cern.ch with 50 million entries, for
instance, would add up 8 GB in the cache directory. For a CernVM-FS
instance that is exported via NFS, the safety margin for the NFS maps
needs be taken into account. It also might be necessary to monitor the
actual space consumption.
Note
The NFS share should be mounted with the mount option nordirplus.
Without this option, traversals of directories with large number of
files can slow down significantly.
Tuning¶
The default settings in CernVM-FS are tailored to the normal, non-NFS
use case. For decent performance in the NFS deployment, the amount of
memory given to the metadata cache should be increased. By default, this
is 16M. It can be increased, for instance, to 256M by setting
CVMFS_MEMCACHE_SIZE to 256. Furthermore, the maximum number of
download retries should be increased to at least 2.
The number of NFS daemons should be increased as well. A value of 128
NFS daemons has shown perform well. In Scientific Linux, the number of
NFS daemons is set by the RPCNFSDCOUNT parameter in
/etc/sysconfig/nfs.
The performance will benefit from large RAM on the NFS server (>= 16 GB) and CernVM-FS caches hosted on an SSD hard drive.
Export of /cvmfs with Cray DVS¶
On Cray DVS and possibly other systems that export /cvmfs as a whole
instead of individual repositories as separate volumes, an additional
effort is needed to ensure that inodes are distinct from each other
across multiple repositories. The CVMFS_NFS_INTERLEAVED_INODES
parameter can be used to configure repositories to only issue inodes of
a particular residue class. To ensure pairwise distinct inodes across
repositories, each repository should be configured with a different
residue class. For instance, in order to avoid inode clashes between the
atlas.cern.ch and the cms.cern.ch repositories, there can be a
configuration file /etc/cvmfs/config.d/atlas.cern.ch.local with
CVMFS_NFS_INTERLEAVED_INODES=0%2 # issue inodes 0, 2, 4, ...
and a configuration file /etc/cvmfs/config.d/cms.cern.ch.local with
CVMFS_NFS_INTERLEAVED_INODES=1%2 # issue inodes 1, 3, 5, ...
The maximum number of possibly exported repositories needs to be known
in advance. The CVMFS_NFS_INTERLEAVED_INODES only has an effect in NFS
mode.
Shared NFS Maps (HA-NFS)¶
As an alternative to the existing,
leveldb managed NFS maps, the NFS
maps can optionally be managed out of the CernVM-FS cache directory by
SQLite. This allows the NFS maps to be placed on shared storage and
accessed by multiple CernVM-FS NFS export nodes simultaneously for
clustering and active high-availability setups. In order to enable
shared NFS maps, set CVMFS_NFS_SHARED to the path that should be used
to host the SQLite database. If the path is on shared storage, the
shared storage has to support POSIX file locks. The drawback of the
SQLite managed NFS maps is a significant performance penalty which in
practice can be covered by the memory caches.
Example¶
An example entry /etc/exports (note: the fsid needs to be different for every exported CernVM-FS repository)
/cvmfs/atlas.cern.ch 172.16.192.0/24(ro,sync,no_root_squash,\
no_subtree_check,fsid=101)
A sample entry /etc/fstab entry on a client:
172.16.192.210:/cvmfs/atlas.cern.ch /cvmfs/atlas.cern.ch nfs4 \
ro,ac,actimeo=60,lookupcache=all,nolock,rsize=1048576,wsize=1048576 0 0
File Ownership¶
By default, cvmfs presents all files and directories as belonging to the
mounting user, which for system mounts under /cvmfs is the user
cvmfs. Alternatively, CernVM-FS can present the uid and gid of file
owners as they have been at the time of publication by setting
CVMFS_CLAIM_OWNERSHIP=no.
If the real uid and gid values are shown, stable uid and gid values
across nodes are recommended; otherwise the owners shown on clients can
be confusing. The client can also dynamically remap uid and gid values.
To do so, the parameters CVMFS_UID_MAP and CVMFS_GID_MAP should
provide the path to text files that specify the mapping. The format of
the map files is identical to the map files used for
bulk changes of ownership on release manager machines.
Hotpatching and Reloading¶
Hotpatching a running CernVM-FS instance allows reloading most of the code without unmounting the file system. The current active code is unloaded and the code from the currently (newly) installed binaries is loaded. Hotpatching is logged to syslog. Since CernVM-FS is re-initialized during hotpatching and configuration parameters are re-read, hotpatching can be also seen as a "reload".
Note
During reload not all client config parameters can be changed, some
need a remount to take effect.
Since CernVM-FS 2.11, reloading the client considers the status of
CVMFS_DEBUGLOG. Independent of if the client runs in debug mode or not
before the reload, after the reload the debug mode is only selected if
CVMFS_DEBUGLOG is set. For earlier versions before CernVM-FS 2.11, the
client mode was static and reload was not able to switch from or to
debug mode.
Hotpatching has to be done for all repositories concurrently by
cvmfs_config reload [-c]
The optional parameter -c specifies if the CernVM-FS cache should be
wiped out during the hotpatch. Reloading of the parameters of a specific
repository can be done like
cvmfs_config reload atlas.cern.ch
In order to see the history of loaded CernVM-FS Fuse modules, run
cvmfs_talk hotpatch history
The currently loaded set of parameters can be shown by
cvmfs_talk parameters
The CernVM-FS packages use hotpatching in the package upgrade process.
Auxiliary Tools¶
cvmfs_fsck¶
CernVM-FS assumes that the local cache directory is trustworthy.
However, it might happen that files get corrupted in the cache directory
caused by errors outside the scope of CernVM-FS. CernVM-FS stores files
in the local disk cache with their cryptographic content hash key as
name, which makes it easy to verify file integrity. CernVM-FS contains
the cvmfs_fsck utility to do so for a specific cache directory. Its
return value is comparable to the system's fsck. For example,
cvmfs_fsck -j 8 /var/lib/cvmfs/shared
checks all the data files and catalogs in /var/lib/cvmfs/shared using
8 concurrent threads. Supported options are:
-v Produce more verbose output.
-j #threads Sets the number of concurrent threads that check files in the cache directory. Defaults to 4.
-p Tries to automatically fix problems.
-f Unlinks the cache database. The database will be automatically rebuilt by CernVM-FS on next mount.
The cvmfs_config fsck command can be used to verify all configured
repositories.
cvmfs_config¶
The cvmfs_config utility provides commands in order to set up the
system for use with CernVM-FS.
- setup
-
The
setupcommand takes care of basic setup tasks, such as creating the cvmfs user and allowing access to CernVM-FS mount points by all users. - chksetup
-
The
chksetupcommand inspects the system and the CernVM-FS configuration in/etc/cvmfsfor common problems. - showconfig
-
The
showconfigcommand prints the CernVM-FS parameters for all repositories or for the specific repository given as argument. With the [-s]{.title-ref} option, only non-empty parameters are shown. - stat
-
The
statcommand prints file system and network statistics for currently mounted repositories. - status
-
The
statuscommand shows all currently mounted repositories and the process ID (PID) of the CernVM-FS processes managing a mount point. - probe
-
The
probecommand tries to access/cvmfs/$repositoryfor all repositories specified inCVMFS_REPOSITORIESor the ones specified as a space separated list on the command line, respectively. - fsck
-
Run
cvmfs_fsckon all repositories specified inCVMFS_REPOSITORIES. - fuser
-
Identify all the processes that are accessing a cvmfs repository, preventing it from either being unmounted or mounted. See the remounting and namespaces/containers section.
- reload
-
The
reloadcommand is used to reload or hotpatch CernVM-FS instances; see Hotpatching and Reloading. - umount
-
The
umountcommand unmounts all currently mounted CernVM-FS repositories, which will only succeed if there are no open file handles on the repositories. - wipecache
-
The
wipecachecommand is an alias forreload -c. - killall
-
The
killallcommand immediately unmounts all repositories under/cvmfsand terminates the associated processes. It is meant to escape from a hung state without the need to reboot a machine. However, all processes that use CernVM-FS at the time will be terminated, too. The need to use this command very likely points to a network problem or a bug in cvmfs. - bugreport
-
The
bugreportcommand creates a tarball with collected system information which can be attached to a bug report.
cvmfs_talk¶
The cvmfs_talk command provides a way to control a currently running
CernVM-FS process and to extract information about the status of the
corresponding mount point. Most of the commands are for special purposes
only or covered by more convenient commands, such as
cvmfs_config showconfig or cvmfs_config stat. Four commands might be
of particular interest though.
cvmfs_talk cleanup 0
will, without interruption of service, immediately clean up the cache from all files that are not currently pinned in the cache.
cvmfs_talk cleanup rate 120
shows the number of cache cleanups in the last two hours (120 minutes). If this value is larger than one or two, the cache size is probably two small and the client experiences cache thrashing.
cvmfs_talk internal affairs
prints the internal status information and performance counters. It can be helpful for performance engineering. They can also be exported in regular intervals (see cpt_telemetry).
cvmfs_talk -i <repo> remount
starts the catalog update routine. When using remount sync the system
waits for the new file system snapshot to be served (if there is a new
one).
Kernel Cache Tuning¶
Using efficiently the kernel cache can increase the overall performance.
Requests that would normally be answered by cvmfs, can - if cached -be
directly answered by the kernel which shortens the overall request time.
There are multiple client config parameters that influence the kernel
cache behavior.
Parameter Meaning
CVMFS_KCACHE_TIMEOUT Timeout in seconds for path names and file attributes in the kernel file system buffers.
CVMFS_CACHE_SYMLINKS If set to yes, enables symlink caching in the kernel.
CVMFS_STATFS_CACHE_TIMEOUT seconds (no caching by default). can be expensive. Caching time of statfs() in Calling statfs() in high frequency
Caching of symlink in the kernel means that the mangled name is stored, so that there is no need to resolve it again when it is requested for another time. Activating this option makes only sense if symlinks are heavily accessed. First performance measurement showed a slightly slower performance on the very first access (cold cache) but a better performance for multiple accesses (warm and hot cache).
Warning
Symlink caching works best with kernel >= 6.2rc1 and
libfuse >= 3.16. It already works from version libfuse 3.10.0 on but
has restriction, e.g. mounts on top of mounts will be destroyed if
they are a symlink.
File System Information¶
Information about the current cache usage can be gathered using the df
utility. For repositories created with the CernVM-FS 2.1 toolchain,
information about the overall number of file system entries in the
repository as well as the number of entries covered by currently loaded
metadata can be gathered by df -i.
Monitoring¶
CernVM-FS offers multiple options to remotely monitor client status and behavior.
Since the early days, CernVM-FS supports the Nagios monitoring system [Schubert08]. A checker plugin is available on our website.
Since CernVM-FS 2.11 there are two more options: 1)
Telemetry Aggregator cvmfs_talk internal affairs, and 2) sending an extended CURL HTTP
header for each download request. For this, CVMFS_HTTP_TRACING must be
set. It will then include uid, gid, and pid with each download
request.
Note
Depending on which CernVM-FS component sends the CURL request, uid,
gid or pid might not be set. Based on the platform, their default
value -1 might change to a large number if the base type is
unsigned.
Furthermore, CVMFS_HTTP_TRACING_HEADERS can be set. This parameter
allows for user-defined, static key-value pairs to be added to the
header, e.g. to identify the client that send the request. As key, only
alphanumeric sequences are accepted and white space around the key is
ignored. Invalid keys are ignored. An example is given below
# client config
CVMFS_HTTP_TRACING=on #(default off)
# illegal headers are: CVMFS-X-h2:ff and X-CVMFS-h3:12_ad
CVMFS_HTTP_TRACING_HEADERS='h1:testCVMFS-X-h2:ffX-CVMFS-h3:12_ad h4 : 12fs_?'
# debug output
(download) CURL Header for URL: /data/81/7c882d4a2e9dd7f9c5c2bfb4e04ff316e436dfC is:
Connection: Keep-Alive
Pragma:
User-Agent: cvmfs Fuse 2.11.0
X-CVMFS-h1: test
X-CVMFS-h4: 12fs_?
X-CVMFS-PID: 561710
X-CVMFS-GID: 0
X-CVMFS-UID: 0
Debug Logs¶
The cvmfs2 binary forks a watchdog process on start. Using this
watchdog, CernVM-FS is able to create a stack trace in case certain
signals (such as a segmentation fault) are received. The watchdog writes
the stack trace into syslog as well as into a file stacktrace in the
cache directory.
CernVM-FS can be started in debug mode. In the debug mode, CernVM-FS
will log with high verbosity which makes the debug mode unsuitable for
production use. In order to turn on the debug mode, set
CVMFS_DEBUGLOG=/tmp/cvmfs.log.