WARNING: development of the software test suite has only just started and is a work in progress. This page describes how the test suite will be designed, but many things are not implemented yet and the design may still change.
Description of the software test suite
The EESSI project uses the ReFrame framework for software testing. ReFrame is designed particularly for testing HPC software and thus has well-integrated support for interacting with schedulers, as well as for various launchers for MPI programs.
The EESSI software stack can be used in various ways, e.g. by using the container or when the CVMFS software stack is mounted natively. This means that the commands needed to test an application differ between the two cases. Similarly, systems may have different hardware (CPUs vs. GPUs, system size, etc.). Thus, tests (e.g. a GROMACS test) may have different variants: one designed to run on CPUs, one on GPUs, one designed to run through the container, etc.
The main goal of the EESSI test suite is to test the software stack on systems that have the EESSI CVMFS mounted natively. Some tests may also have variants that can run the same test through the container, but note that this setup is technically much more difficult. Thus, the main focus is on tests that run with a native CVMFS mount of the EESSI stack.
By default, ReFrame runs all test variants it finds. Thus, in our test suite, we predefine a number of tags that can be used to select an appropriate subset of tests for your system. We recognize the following tags:
- `container`: tests that use the EESSI container to run the software. E.g. one variant of our GROMACS test uses `singularity exec` to launch the EESSI container, load the GROMACS module, and run the GROMACS test.
- `native`: tests that rely on the EESSI software stack being available through the modules system. E.g. one variant of the GROMACS test loads the GROMACS module and runs the GROMACS test.
- `singlecore`: tests designed to run on a single core.
- `singlenode`: tests designed to run on a single (multicore) node (note: may still use MPI for multiprocessing).
- `small`: tests designed to run on 2-8 nodes.
- `large`: tests designed to run on >9 nodes.
- `cpu`: tests designed to run on CPUs.
- `gpu`, `gpu_nvidia`, `gpu_amd`: tests designed to run on GPUs / NVIDIA GPUs / AMD GPUs.
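To make the tag semantics concrete, here is a simplified, pure-Python model of tag-based selection; it is not ReFrame's implementation. It assumes that each `-t` value is treated as a regular expression (modelled here with `re.search`) and that, when `-t` is given several times, a test must match all of the patterns. The test names in `TESTS` are hypothetical.

```python
import re

# Hypothetical catalogue of test variants and their tags (names are illustrative only).
TESTS = {
    "GROMACS_native_cpu_singlenode": {"native", "cpu", "singlenode"},
    "GROMACS_native_gpu_singlenode": {"native", "gpu", "gpu_nvidia", "singlenode"},
    "GROMACS_container_cpu_singlenode": {"container", "cpu", "singlenode"},
    "GROMACS_native_cpu_small": {"native", "cpu", "small"},
}


def select(tests, tag_patterns):
    """Return the sorted names of tests for which every pattern matches at least one tag.

    Simplified model of repeated -t options: each pattern is a regular expression,
    and all patterns must match for a test to be selected.
    """
    regexes = [re.compile(p) for p in tag_patterns]
    return sorted(
        name
        for name, tags in tests.items()
        if all(any(rx.search(tag) for tag in tags) for rx in regexes)
    )


# Roughly corresponds to: -t native -t single -t cpu
print(select(TESTS, ["native", "single", "cpu"]))  # prints ['GROMACS_native_cpu_singlenode']
```

In this model a pattern such as `single` matches both `singlecore` and `singlenode`, so combining several tags is what narrows the selection down to the variants that suit your system.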
How to run the test suite
- A copy of the `tests` directory from the software-layer repository
Requirements for container-based tests
Specifically for container-based tests, there are some requirements on the host system:
- An installation of ReFrame
- An MPI installation (to launch MPI tests) or a PMIx-based launcher (e.g. SLURM compiled with PMIx support)
The container-based tests use a so-called shared alien CVMFS cache to store temporary data. In addition, they use a local CVMFS cache for speed. For this reason, the container tests need to be pointed to one directory that is shared between the nodes of your system, and one directory that is node-specific (preferably on a local disk). The `shared_alien_cache_minimal.sh` script that is part of the test suite defines these directories and sets up the correct CVMFS configuration. You will have to adapt the `LOCALSPACE` variables in that script for your system, and point them to a shared and a node-local directory.
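The split between a shared and a node-local directory can be illustrated with the CVMFS client settings involved. The hypothetical helper below renders such settings from two directories. It assumes that the shared directory hosts the alien cache (`CVMFS_ALIEN_CACHE`, which the CVMFS documentation pairs with `CVMFS_SHARED_CACHE=no` and `CVMFS_QUOTA_LIMIT=-1`) and that the node-local directory serves as `CVMFS_CACHE_BASE`. The subdirectory names are made up, and `shared_alien_cache_minimal.sh` may configure things differently.

```python
from pathlib import Path


def alien_cache_config(shared_dir: str, local_dir: str) -> str:
    """Return CVMFS client settings (KEY=value lines) for the two directories.

    Hypothetical helper: the option set and subdirectory names are assumptions,
    not taken from shared_alien_cache_minimal.sh.
    """
    settings = {
        # Cache data in a directory that all nodes can see (the "alien" cache).
        "CVMFS_ALIEN_CACHE": str(Path(shared_dir) / "alien"),
        # The CVMFS documentation requires these two settings with an alien cache.
        "CVMFS_SHARED_CACHE": "no",
        "CVMFS_QUOTA_LIMIT": "-1",
        # Assumption: keep the client's cache base on fast, node-local storage.
        "CVMFS_CACHE_BASE": str(Path(local_dir) / "cvmfs-cache"),
    }
    return "".join(f"{key}={value}\n" for key, value in settings.items())


if __name__ == "__main__":
    # Example paths only: use a shared file system path and a local disk path.
    print(alien_cache_config("/shared/eessi", "/tmp/eessi"), end="")
```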
Setting up a ReFrame configuration file
Once the prerequisites have been met, you'll need to create a ReFrame configuration file that matches your system (see the ReFrame documentation). If you want to use the container-based tests, you have to define a partition programming environment called `container` and make sure it loads any modules needed to provide the MPI installation and the `singularity` command. For an example configuration file, check `tests/reframe/config/settings.py` in the software-layer repository. Other than (potential) adaptations to the `container` environment, you should only really need to change the
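As a starting point, a minimal ReFrame site configuration with a `container` environment might look like the sketch below. It follows the dictionary layout of ReFrame 3.x configuration files (`systems`, `partitions`, `environments`, `logging`); the system name, hostname patterns, scheduler, launcher and module names are placeholders, so compare it with `tests/reframe/config/settings.py` and the ReFrame documentation for your ReFrame version.

```python
# Minimal sketch of a ReFrame 3.x-style site configuration.
# All names marked "hypothetical" are placeholders for your own system.
site_configuration = {
    "systems": [
        {
            "name": "example_cluster",  # hypothetical system name
            "descr": "Example cluster with EESSI mounted natively",
            "hostnames": [r"login\d+", r"node\d+"],  # hypothetical hostname regexes
            "modules_system": "lmod",
            "partitions": [
                {
                    "name": "cpu",
                    "descr": "CPU nodes",
                    "scheduler": "slurm",
                    "launcher": "mpirun",
                    "access": [],
                    "environs": ["default", "container"],
                    "max_jobs": 4,
                },
            ],
        },
    ],
    "environments": [
        # Used by native tests: the tests load the EESSI modules themselves.
        {"name": "default"},
        # Used by container-based tests: must provide MPI and the singularity command.
        {"name": "container", "modules": ["OpenMPI", "Singularity"]},  # hypothetical module names
    ],
    "logging": [
        {
            "level": "debug",
            "handlers": [
                {"type": "stream", "name": "stdout", "level": "info", "format": "%(message)s"},
            ],
        },
    ],
}
```

Every environment listed in a partition's `environs` must be defined under `environments`; otherwise ReFrame cannot set it up for that partition.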
Adapting the tests to your system
For now, you will have to adapt the number of tasks specified in full-node tests to match the number of cores your machine has in a single node (in the future, you should be able to do this through the ReFrame configuration file). To do so, change every `self.num_tasks_per_node` you find in the various tests to that core count (unless it is 1, in which case the test is specifically intended to run only 1 process per node).
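Editing `self.num_tasks_per_node` by hand across many files is error-prone. The hypothetical helper below (not part of the test suite) rewrites literal assignments such as `self.num_tasks_per_node = 24` to a given core count while leaving values of 1 untouched. It assumes the values are written as plain integer literals; review the resulting changes before running the tests.

```python
import re
from pathlib import Path

# Matches literal assignments such as "self.num_tasks_per_node = 24".
# Comparisons ("== 1") and values set via variables are not matched.
ASSIGNMENT = re.compile(r"(self\.num_tasks_per_node\s*=\s*)(\d+)")


def set_tasks_per_node(source: str, cores: int) -> str:
    """Replace every literal num_tasks_per_node value with `cores`, except values of 1."""

    def _replace(match):
        # A value of 1 means the test deliberately runs one process per node: keep it.
        if int(match.group(2)) == 1:
            return match.group(0)
        return f"{match.group(1)}{cores}"

    return ASSIGNMENT.sub(_replace, source)


def update_checks(root: str, cores: int) -> None:
    """Apply set_tasks_per_node to every .py file below `root` (hypothetical helper)."""
    for path in Path(root).rglob("*.py"):
        text = path.read_text()
        new_text = set_tasks_per_node(text, cores)
        if new_text != text:
            path.write_text(new_text)
            print(f"updated {path}")


if __name__ == "__main__":
    example = "self.num_tasks_per_node = 24\nself.num_tasks_per_node = 1\n"
    print(set_tasks_per_node(example, 128), end="")
```

The core count to pass in could be obtained, for example, by running `nproc` on a compute node; note that this reports logical CPUs, which may include hardware threads.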
An example run
In this example, we assume your current directory is the `tests/reframe` folder. To list, e.g., all single-node, CPU-based application tests on a system that has the EESSI software environment available natively, you execute:
reframe --config-file=config/settings.py --checkpath eessi-checks/applications/ -l -t native -t single -t cpu
This should list the tests that are selected based on the provided tags (make sure `config/settings.py` has been adapted for your system). To run the tests, change the `-l` argument into `-r`:
reframe --config-file=config/settings.py --checkpath eessi-checks/applications/ -r -t native -t single -t cpu --performance-report
To run the same tests through the EESSI container instead, select the `container` tag:
reframe --config-file=config/settings.py --checkpath eessi-checks/applications/ -r -t container -t single -t cpu --performance-report
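Listing and running the tests differ only in a few arguments: `-l` versus `-r`, the tags, and the performance report (which is only meaningful after an actual run). The hypothetical helper below (`reframe_command` is not part of the test suite) builds these command lines as argument lists, which could then be passed to, for example, `subprocess.run`.

```python
import shlex


def reframe_command(tags, run=False, config="config/settings.py",
                    checkpath="eessi-checks/applications/"):
    """Build a reframe command line that lists (-l) or runs (-r) the tests matching `tags`."""
    cmd = ["reframe", f"--config-file={config}", "--checkpath", checkpath]
    cmd.append("-r" if run else "-l")
    for tag in tags:
        cmd += ["-t", tag]
    if run:
        # A performance report only makes sense when the tests are actually run.
        cmd.append("--performance-report")
    return cmd


print(shlex.join(reframe_command(["native", "single", "cpu"], run=True)))
# prints: reframe --config-file=config/settings.py --checkpath eessi-checks/applications/ -r -t native -t single -t cpu --performance-report
```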