Hardware considerations

EXESS is designed to use GPUs as efficiently as possible. Through CUDA and HIP, it currently supports both NVIDIA and AMD GPUs.

The points to consider for each type of hardware are outlined below:

NVIDIA

  • EXESS supports NVIDIA GPUs from the Volta architecture (compute capability 7.0, e.g. Tesla V100) onwards

  • A consumer GPU with the required compute capability will run EXESS

    • However, a consumer GPU with less than 6 GB of memory will severely limit the system sizes and levels of theory that can be run

  • EXESS supports up to the Hopper GPU architecture (compute capability 9.0). If you have access to newer hardware, please open an issue

  • EXESS supports CUDA 11.1 and later

  • EXESS supports the NVHPC toolkit

Note that EXESS performance correlates directly with the double-precision (FP64) floating-point performance of the GPU being used.
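
The NVIDIA requirements above can be expressed as a small pre-flight check. The following is only a minimal sketch in Python and is not part of EXESS: the names `GpuInfo` and `check_nvidia_gpu` are hypothetical, and the GPU properties are passed in by hand rather than queried from the driver. The thresholds are the ones listed above, with compute capability written as a (major, minor) pair.

```python
from dataclasses import dataclass

# Thresholds taken from the list above.
MIN_COMPUTE_CAPABILITY = (7, 0)   # oldest supported compute capability
MAX_COMPUTE_CAPABILITY = (9, 0)   # Hopper, newest supported architecture
MIN_CUDA_VERSION = (11, 1)
LOW_MEMORY_GB = 6.0               # below this, expect severe limitations


@dataclass
class GpuInfo:
    """Hypothetical container for the properties the check needs."""
    name: str
    compute_capability: tuple  # (major, minor), e.g. (8, 0)
    memory_gb: float


def check_nvidia_gpu(gpu, cuda_version):
    """Return (supported, notes) for one GPU and a CUDA (major, minor) version."""
    supported = True
    notes = []

    if gpu.compute_capability < MIN_COMPUTE_CAPABILITY:
        supported = False
        notes.append(f"{gpu.name}: compute capability is below 7.0")
    elif gpu.compute_capability > MAX_COMPUTE_CAPABILITY:
        supported = False
        notes.append(f"{gpu.name}: newer than Hopper, please open an issue")

    if cuda_version < MIN_CUDA_VERSION:
        supported = False
        notes.append("CUDA 11.1 or later is required")

    # Low memory does not prevent a run, so it is only reported as a note.
    if gpu.memory_gb < LOW_MEMORY_GB:
        notes.append(f"{gpu.name}: less than 6 GB of memory will be very limiting")

    return supported, notes


if __name__ == "__main__":
    # Example values are illustrative, not measured.
    ok, notes = check_nvidia_gpu(GpuInfo("example-gpu", (8, 6), 4.0), (12, 2))
    print("supported:", ok)
    for note in notes:
        print(" -", note)
```

Tuples are used for versions so that comparisons such as (11, 1) <= (12, 2) behave correctly, which plain string comparison would not guarantee.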

AMD

  • EXESS requires the MAGMA library to run on AMD GPUs

  • Performance varies widely across ROCm versions on AMD GPUs. Currently, the most stable version is 5.7.0

  • EXESS is developed and tested on the MI250X (gfx90a). Other gfx architectures are NOT tested

  • There is currently a bug in the ROCm runtime that causes large four-center kernels for gradients to crash the GPU

    • To work around this on AMD, we recommend using RI-HF, which avoids the four-center kernels
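
The AMD points above can be collected into a similar advisory check. Again, this is only a sketch under stated assumptions: `amd_notes` is a hypothetical helper, the method labels "HF" and "RI-HF" are plain strings used for illustration and not EXESS input keywords, and the architecture and ROCm version are supplied by the caller.

```python
TESTED_GFX_ARCH = "gfx90a"       # MI250X, the only tested architecture
MOST_STABLE_ROCM = (5, 7, 0)     # version named as most stable above


def parse_version(text):
    """Turn a dotted version string such as '5.7.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def amd_notes(gfx_arch, rocm_version, needs_gradients, method="HF"):
    """Return a list of advisory notes for an AMD setup (empty if none apply)."""
    notes = []

    # Target names may carry feature suffixes (e.g. 'gfx90a:sramecc+:xnack-');
    # only the base architecture matters here.
    base_arch = gfx_arch.split(":")[0]
    if base_arch != TESTED_GFX_ARCH:
        notes.append(f"{base_arch} is not tested; only gfx90a (MI250X) is")

    if parse_version(rocm_version) != MOST_STABLE_ROCM:
        notes.append("ROCm 5.7.0 is currently the most stable version")

    # Large four-center gradient kernels can crash the GPU; RI-HF avoids them.
    if needs_gradients and method.upper() != "RI-HF":
        notes.append("use RI-HF for gradients to avoid the four-center kernels")

    return notes


if __name__ == "__main__":
    for note in amd_notes("gfx942", "6.0.2", needs_gradients=True):
        print(" -", note)
```

The function only reports notes and never refuses a configuration, since the list above describes untested or unstable setups rather than hard requirements.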