GPU support

Below you will find more information on the actions that must be performed to ensure that the GPU software included in EESSI can use the GPU(s) in your system.

Please open a support issue if you need help or have questions regarding GPU support.

Make sure the ${EESSI_VERSION} version placeholder is defined!

On this page, we use ${EESSI_VERSION} as a placeholder for the version of the EESSI repository, for example:

/cvmfs/software.eessi.io/versions/${EESSI_VERSION}

Before inspecting paths or executing any of the specified commands, you should define $EESSI_VERSION first, for example with:

export EESSI_VERSION=2023.06

Support for using NVIDIA GPUs

EESSI supports running CUDA-enabled software. All CUDA-enabled modules are marked with the (gpu) feature, which is visible in the output produced by module avail.
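
For example, assuming you have already initialised the EESSI environment (so that the module command provided by EESSI is available), you can list the available modules and look for the (gpu) marker, optionally narrowing the output down to a specific package:

module avail
module avail CUDA-Samples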

NVIDIA GPU drivers

For CUDA-enabled software to run, it needs to be able to find the NVIDIA GPU drivers of the host system. The challenge here is that the NVIDIA GPU drivers are not always in a standard system location, and that we cannot install the GPU drivers in EESSI (since they are too closely tied to the client OS and GPU hardware).
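
If you are not sure where the GPU drivers live on a particular host, a quick check (assuming ldconfig is available on the host) is to ask the dynamic linker whether it knows about the driver libraries:

ldconfig -p | grep -E 'libcuda|libnvidia'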

Compiling CUDA software

An additional requirement applies if you want to compile CUDA-enabled software using a CUDA installation included in EESSI. This requires a full CUDA SDK, but the CUDA SDK End User License Agreement (EULA) does not allow full redistribution. In EESSI, we are (currently) only allowed to redistribute the files needed to run CUDA software.

Full CUDA SDK only needed to compile CUDA software

Without a full CUDA SDK on the host system, you will still be able to run CUDA-enabled software from the EESSI stack, you just won't be able to compile additional CUDA software.

Below, we describe how to make sure that the EESSI software stack can find your NVIDIA GPU drivers and (optionally) full installations of the CUDA SDK.

host_injections variant symlink

In the EESSI repository, a special directory has been prepared where system administrators can install files that can be picked up by software installations included in EESSI. This gives administrators the ability to influence the behaviour (and capabilities) of the EESSI software stack.

This special directory is located in /cvmfs/software.eessi.io/host_injections, and it is a CernVM-FS Variant Symlink: a symbolic link for which the target can be controlled by the CernVM-FS client configuration (for more info, see 'Variant Symlinks' in the official CernVM-FS documentation).

Default target for host_injections variant symlink

Unless otherwise configured in the CernVM-FS client configuration for the EESSI repository, the host_injections symlink points to /opt/eessi on the client system:

$ ls -l /cvmfs/software.eessi.io/host_injections
lrwxrwxrwx 1 cvmfs cvmfs 10 Oct  3 13:51 /cvmfs/software.eessi.io/host_injections -> /opt/eessi

As an example, let's imagine that we want to use an architecture-specific location on a shared filesystem as the target for the symlink. This has the advantage that changes made under host_injections affect all nodes that share that CernVM-FS configuration. Configuring this would mean adding the following line to the CernVM-FS client configuration file:

EESSI_HOST_INJECTIONS=/shared_fs/path

Don't forget to reload the CernVM-FS configuration

After making a change to a CernVM-FS configuration file, you also need to reload the configuration:

sudo cvmfs_config reload
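
To verify that the change was picked up, list the symlink again; with the example configuration above, it should now point to /shared_fs/path:

ls -l /cvmfs/software.eessi.io/host_injections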

All CUDA-enabled software in EESSI expects the CUDA drivers to be available in a specific subdirectory of this host_injections directory. In addition, installations of the CUDA SDK included in EESSI are stripped down to the files that we are allowed to redistribute; all other files are replaced by symbolic links that point to another specific subdirectory of host_injections. For example:

$ ls -l /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/CUDA/12.1.1/bin/nvcc
lrwxrwxrwx 1 cvmfs cvmfs 109 Dec 21 14:49 /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/CUDA/12.1.1/bin/nvcc -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/CUDA/12.1.1/bin/nvcc

If the corresponding full installation of the CUDA SDK is available there, the CUDA installation included in EESSI can be used to build CUDA software.

Using NVIDIA GPUs via a native EESSI installation

Here, we describe the steps to enable GPU support when you have a native EESSI installation on your system.

Required permissions

To enable GPU support for EESSI on your system, you will typically need to have system administration rights, since you need write permissions to the target directory of the host_injections symlink.
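
For example, with the default target of the host_injections symlink, that means being able to create and write to /opt/eessi; a minimal sketch, assuming you run the GPU support scripts below with the same (privileged) account:

sudo mkdir -p /opt/eessi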

Exposing NVIDIA GPU drivers

To install the symlinks to your GPU drivers in host_injections, run the link_nvidia_host_libraries.sh script that is included in EESSI:

/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh

This script uses ldconfig on your host system to locate your GPU drivers, and creates symbolic links to them in the correct location under the host_injections directory. It also stores the CUDA version supported by the driver for which the symlinks were created.
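
To get an impression of what the script has created, you can inspect the target of the host_injections symlink; this is just a sketch, and the exact subdirectory layout may differ between EESSI versions:

ls -lR /cvmfs/software.eessi.io/host_injections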

Re-run link_nvidia_host_libraries.sh after NVIDIA GPU driver update

You should re-run this script every time you update the NVIDIA GPU drivers on the host system.

Note that it is safe to re-run the script even if no driver updates were done: the script should detect that the current version of the drivers has already been symlinked.

Installing full CUDA SDK (optional)

To install a full CUDA SDK under host_injections, use the install_cuda_host_injections.sh script that is included in EESSI:

/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh

For example, to install CUDA 12.1.1 in the directory that the host_injections variant symlink points to, using /tmp/$USER/EESSI as directory to store temporary files:

/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh --cuda-version 12.1.1 --temp-dir /tmp/$USER/EESSI --accept-cuda-eula

You should choose the CUDA version you wish to install according to the CUDA versions that are included in EESSI; see the output of module avail CUDA/ after setting up your environment for using EESSI.

You can run /cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh --help to check all of the options.
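
Once the installation has completed, you can verify that the CUDA installation included in EESSI now provides a working compiler; a minimal sketch, assuming CUDA 12.1.1 as in the example above and an environment set up for using EESSI:

module load CUDA/12.1.1
nvcc --version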

Tip

This script uses EasyBuild to install the CUDA SDK. For this to work, two requirements need to be satisfied:

  • module load EasyBuild should work (or the eb command is already available in the environment);
  • The version of EasyBuild being used should provide the easyconfig file for the requested CUDA version (in the example above, that is CUDA-12.1.1.eb).

You can rely on the EasyBuild installation that is included in EESSI for this.

Alternatively, you may load an EasyBuild module manually before running the install_cuda_host_injections.sh script to make an eb command available.
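
For example, a minimal sketch of making the eb command available via the EasyBuild installation included in EESSI before running the script:

module load EasyBuild
eb --version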

Using NVIDIA GPUs via EESSI in a container

We focus here on the Apptainer/Singularity use case, and have only tested the --nv option to enable access to GPUs from within the container.

If you are using the EESSI container to access the EESSI software, the procedure for enabling GPU support is slightly different and will be documented here eventually.

Exposing NVIDIA GPU drivers

When running a container with apptainer or singularity, it is not necessary to run the link_nvidia_host_libraries.sh script, since both of these tools use $LD_LIBRARY_PATH internally to make the host GPU drivers available in the container.

The only scenario where this would be required is if $LD_LIBRARY_PATH is modified or undefined.
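
For example, a minimal sketch of starting a container with GPU access enabled via apptainer; here <container_image> is a placeholder for whichever container image you use to access EESSI:

apptainer shell --nv <container_image>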

Testing the GPU support

The quickest way to test if software installations included in EESSI can access and use your GPU is to run the deviceQuery executable that is part of the CUDA-Samples module:

module load CUDA-Samples
deviceQuery

If both commands run successfully, you should see information about your GPU printed to your terminal.
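
If deviceQuery does not behave as expected, a first sanity check of the host driver setup (outside of EESSI) is to run nvidia-smi, assuming it is installed on the host:

nvidia-smi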