Hardware considerations
EXESS is designed to use GPUs as efficiently as possible. Through CUDA and HIP, it currently supports both NVIDIA and AMD GPUs.
The considerations that depend on the hardware being used are outlined here:
NVIDIA
EXESS supports NVIDIA GPUs from the Volta architecture (compute capability 7.0, e.g. Tesla V100) onwards
A consumer GPU with the required compute capability will run EXESS
However, a consumer GPU with less than 6 GB of memory will severely limit the system sizes and levels of theory that can be run
EXESS supports GPU architectures up to Hopper (compute capability 9.0). If you have access to newer hardware, please open an issue
EXESS supports CUDA 11.1 and later
EXESS supports the NVHPC toolkit
Note that the performance of EXESS is directly correlated with the double-precision floating-point (FP64) performance of the GPU being used.
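The NVIDIA requirements above can be checked before running EXESS. The following is a minimal sketch and not part of EXESS: the function names `assess_gpu` and `query_gpus` are hypothetical, and it assumes a driver recent enough for `nvidia-smi` to expose the `compute_cap` query field, which reports the capability in dotted form (for example `7.0`).

```python
import subprocess

# Limits taken from the notes above; adjust if the supported range changes.
MIN_CC = (7, 0)      # lowest supported compute capability
MAX_CC = (9, 0)      # highest supported compute capability (Hopper)
MIN_MEM_GIB = 6.0    # below this, memory is expected to be very limiting


def assess_gpu(name, compute_cap, memory_mib):
    """Return a list of warnings for one GPU; an empty list means no concerns."""
    major, minor = (int(part) for part in compute_cap.strip().split("."))
    cc = (major, minor)
    warnings = []
    if cc < MIN_CC:
        warnings.append(f"{name}: compute capability {compute_cap} is below 7.0")
    elif cc > MAX_CC:
        warnings.append(f"{name}: compute capability {compute_cap} is newer than 9.0")
    if memory_mib / 1024 < MIN_MEM_GIB:
        warnings.append(f"{name}: less than 6 GB of memory")
    return warnings


def query_gpus():
    """Ask nvidia-smi for (name, compute capability, memory in MiB) per GPU."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,compute_cap,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    gpus = []
    for line in out.strip().splitlines():
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, cc, mem = (field.strip() for field in line.rsplit(",", 2))
        gpus.append((name, cc, float(mem)))
    return gpus
```

On a machine with an NVIDIA driver installed, calling `assess_gpu(*gpu)` for each entry returned by `query_gpus()` lists any concerns per device. The check does not cover FP64 throughput, which has to be looked up for the specific GPU model.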
AMD
EXESS needs the MAGMA library to work on AMD
Performance varies wildly across ROCm versions on AMD GPUs; currently the most stable version is 5.7.0
EXESS is tested and developed for the MI250X (gfx90a); other gfx architectures are NOT tested
There is currently a bug in the ROCm runtime that causes large four-center kernels for gradients to crash the GPU
To work around this on AMD, we recommend using RI-HF, which avoids the four-center kernels
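The AMD notes above can be turned into a similar pre-flight check. This is a sketch under stated assumptions, not part of EXESS: `gfx_targets` and `assess_amd` are hypothetical helpers, the input is the text printed by the `rocminfo` tool, and the ROCm version string is supplied by the caller.

```python
import re

# Values taken from the notes above.
TESTED_ARCHS = {"gfx90a"}   # MI250X, the only architecture listed as tested
STABLE_ROCM = "5.7.0"       # version described as currently most stable


def gfx_targets(rocminfo_text):
    """Extract the distinct gfx architecture names from rocminfo output."""
    return sorted(set(re.findall(r"\bgfx[0-9a-f]+\b", rocminfo_text)))


def assess_amd(rocminfo_text, rocm_version):
    """Return a list of notes; an empty list means the setup matches the tested one."""
    notes = []
    for arch in gfx_targets(rocminfo_text):
        if arch not in TESTED_ARCHS:
            notes.append(f"{arch}: not a tested architecture")
    if rocm_version != STABLE_ROCM:
        notes.append(f"ROCm {rocm_version}: differs from {STABLE_ROCM}")
    return notes
```

A non-empty result does not mean EXESS will fail, only that the configuration differs from the one described here as tested.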