What happened?
Running ectrans on GPU with default arguments works fine.
But as soon as we scale up, we run into memory-transfer problems, with the following error:
ACC: ERROR non contiguous transfer from ../../../pfs/lustrep2/scratch/project_462000713/ectrans-main-acc/src/ectrans-HEAD^1.6.0.src/build/src/trans/gpu/generated/ectrans_gpu_dp/internal/ltinv_mod.F90:366
This was the error message for double precision, but we observe the same behavior for single precision:
ACC: ERROR non contiguous transfer from ../../../pfs/lustrep2/scratch/project_462000713/ectrans-main-acc/src/ectrans-HEAD^1.6.0.src/build/src/trans/gpu/generated/ectrans_gpu_sp/internal/ltinv_mod.F90:366
This line corresponds to the following call to PRFI1B:
364 IF(PRESENT(PSPSC3A) .AND. NF_SC3A > 0) THEN
365 DO J3=1,UBOUND(PSPSC3A,3)
366 CALL PRFI1B(PSCALARS(IFIRST:IFIRST+2*NF_SC3A-1,:,:),PSPSC3A(:,:,J3),NF_SC3A,UBOUND(PSPSC3A,2))
367 IFIRST = IFIRST+2*NF_SC3A
368 ENDDO
369 ENDIF
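The error is consistent with the array section passed as the first argument: PSCALARS(IFIRST:IFIRST+2*NF_SC3A-1,:,:) takes a partial range in the *first* dimension, and because Fortran arrays are column-major, such a section is strided rather than contiguous in memory, which the OpenACC runtime refuses to move in a single transfer. A minimal sketch of the column-major offset arithmetic (our illustration; the array shape is made up) shows the difference:

```python
# Column-major (Fortran-order) linear offset for a (NI, NJ, NK) array:
# the first index varies fastest in memory.
NI, NJ, NK = 8, 4, 3

def offset(i, j, k):
    return i + NI * (j + NJ * k)

# Full first dimension, one trailing slab: offsets are consecutive.
slab = [offset(i, j, 0) for j in range(NJ) for i in range(NI)]
print(slab == list(range(NI * NJ)))   # True: contiguous

# Partial first-dimension range (like IFIRST:IFIRST+2*NF_SC3A-1, :, :):
# offsets jump at every column boundary, so the section is strided.
part = [offset(i, j, k) for k in range(NK) for j in range(NJ)
        for i in range(2, 6)]
gaps = [b - a for a, b in zip(part, part[1:])]
print(all(g == 1 for g in gaps))      # False: non-contiguous
```

A common workaround for this class of error (not verified against this code) is to copy such a section into a contiguous temporary before the device transfer.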
What are the steps to reproduce the bug?
Environment
We are running on the LUMI dev-g partition using 1 node, and load the following modules before installation and execution:
module load LUMI/24.03 partition/G PrgEnv-cray
module load cpe/24.03 craype-x86-trento craype-accel-amd-gfx90a
module load cray-mpich cray-libsci cray-fftw cray-python
module load buildtools
module load rocm/6.0.3
module load cce/17.0.1
These differ from the modules we found in the GitHub Actions workflow, so we also tried those, but unfortunately both sets led to the same error.
The ones from GitHub actions for reference:
module load CrayEnv
module load PrgEnv-cray
module load cce/17.0.1
module load craype-accel-amd-gfx90a
module load rocm/6.0.3
module load cray-fftw
module load buildtools
We also set the following environment variables:
export FC=ftn
export FC90=ftn
export CXX=CC
export CC=cc
Installation
The subsections below contain the installation instructions for each module.
Note that the environment variables for linking OpenMP (LDFLAGS) are set separately for each module.
Ecbuild
Version: 3.8.5
Installation through git pull, followed by:
mkdir -p build
cd build
cmake ..
make clean
make -j16
make install
Fiat
Version: 1.4.1
Installation through git pull followed by:
export LDFLAGS="-fopenmp"
mkdir -p build
cd build
$DNB_INSTALL_DIR/ecbuild.bin/bin/ecbuild \
-DCMAKE_BUILD_TYPE=Release \
-DENABLE_MPI=ON \
-DENABLE_OMP=ON \
-DENABLE_TESTS=OFF \
..
make clean
make -j16
make install
Ectrans
Version: 1.6.1
Installation through git pull followed by:
export LDFLAGS="-fopenmp -lcraymp"
mkdir -p build
cd build
$DNB_INSTALL_DIR/ecbuild.bin/bin/ecbuild \
-DCMAKE_BUILD_TYPE=Release \
-Dfiat_ROOT=$DNB_INSTALL_DIR/fiat.bin \
-DENABLE_TESTS=OFF \
-DENABLE_SINGLE_PRECISION=ON \
-DENABLE_MKL=OFF \
-DENABLE_FFTW=OFF \
-DENABLE_OMP=OFF \
-DENABLE_ACC=ON \
-DENABLE_GPU_AWARE_MPI=ON \
..
make clean
make -j16
make install
Execution
These are examples of our executions using double precision; the behavior is the same for single precision.
All executions used 1 node, and the environment described above is loaded beforehand.
When we run with default arguments, the following line is executed:
numactl -l --all --physcpubind=49-55 -- ./bin/ectrans-benchmark-gpu-dp --norms --nlev 137 --vordiv --scders -t 600 --niter 20
The numactl invocation is required on LUMI for optimal CPU bindings.
For our "scaled-up" executions, we run the following:
numactl -l --all --physcpubind=49-55 -- ./bin/ectrans-benchmark-gpu-dp --vordiv --scders --uvders --nfld 1 --norms --niter 10 --nlev 79 --truncation 1279
Version
v1.6.0
Platform (OS and architecture)
LUMI, Cray-OS, running kernel 5.14.21-150500.55.49_13.0.56-cray_shasta_c
Relevant log output
ACC: ERROR non contiguous transfer from ../../../pfs/lustrep2/scratch/project_462000713/ectrans-main-acc/src/ectrans-HEAD^1.6.0.src/build/src/trans/gpu/generated/ectrans_gpu_dp/internal/ltinv_mod.F90:366
Accompanying data
No response
Organisation
Barcelona Supercomputing Center