"ACC: ERROR non contiguous transfer" from internal/ltinv_mod.F90:366 #262

@okkevaneck

Description

What happened?

Running ectrans on GPU with the default arguments works fine.
As soon as we scale up the problem size, however, the run aborts with the following memory-transfer error:

ACC: ERROR non contiguous transfer from ../../../pfs/lustrep2/scratch/project_462000713/ectrans-main-acc/src/ectrans-HEAD^1.6.0.src/build/src/trans/gpu/generated/ectrans_gpu_dp/internal/ltinv_mod.F90:366

This is the message from the double-precision build; the single-precision build fails in the same way:

ACC: ERROR non contiguous transfer from ../../../pfs/lustrep2/scratch/project_462000713/ectrans-main-acc/src/ectrans-HEAD^1.6.0.src/build/src/trans/gpu/generated/ectrans_gpu_sp/internal/ltinv_mod.F90:366

Line 366 of the generated ltinv_mod.F90 is the call to PRFI1B inside the following loop:

364         IF(PRESENT(PSPSC3A) .AND. NF_SC3A > 0) THEN
365           DO J3=1,UBOUND(PSPSC3A,3)
366             CALL PRFI1B(PSCALARS(IFIRST:IFIRST+2*NF_SC3A-1,:,:),PSPSC3A(:,:,J3),NF_SC3A,UBOUND(PSPSC3A,2))
367             IFIRST  = IFIRST+2*NF_SC3A
368           ENDDO
369         ENDIF
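
One possible reading of the message, not confirmed by the ectrans developers, is that the first actual argument on line 366, PSCALARS(IFIRST:IFIRST+2*NF_SC3A-1,:,:), is an array section that is not contiguous in memory. Fortran stores arrays in column-major order, so restricting the first index to a sub-range leaves gaps between successive columns unless the range spans the whole first dimension. The shell sketch below lists the linear offsets of such a section; the function name `section_offsets` and the small extents are illustrative assumptions and are not taken from ectrans.

```shell
#!/bin/sh
# Illustrative only: list the 0-based linear offsets, in array element order,
# of the section A(lo:hi,:,:) of a column-major array A(n1,n2,n3).
# Usage: section_offsets n1 n2 n3 lo hi
section_offsets() {
  n1=$1; n2=$2; n3=$3; lo=$4; hi=$5
  out=""
  k=1
  while [ "$k" -le "$n3" ]; do
    j=1
    while [ "$j" -le "$n2" ]; do
      i=$lo
      while [ "$i" -le "$hi" ]; do
        # Column-major offset of element A(i,j,k)
        off=$(( (i - 1) + n1 * ((j - 1) + n2 * (k - 1)) ))
        out="${out:+$out }$off"
        i=$((i + 1))
      done
      j=$((j + 1))
    done
    k=$((k + 1))
  done
  echo "$out"
}

# Hypothetical extents: A(6,2,1), rows 3..4 of every column.
section_offsets 6 2 1 3 4   # prints: 2 3 8 9
```

The jump from offset 3 to offset 8 is the gap that makes the section non-contiguous. Selecting the full first dimension (`section_offsets 6 2 1 1 6`) yields the consecutive offsets 0 to 11, which is why PSPSC3A(:,:,J3) in the same call does not have this problem as long as PSPSC3A itself is contiguous.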

What are the steps to reproduce the bug?

Environment

We run on one node of the LUMI dev-g partition and load the following modules before both installation and execution:

module load LUMI/24.03 partition/G PrgEnv-cray
module load cpe/24.03 craype-x86-trento craype-accel-amd-gfx90a
module load cray-mpich cray-libsci cray-fftw cray-python
module load buildtools
module load rocm/6.0.3
module load cce/17.0.1

These differ from the modules used in the project's GitHub Actions, so we also tried those; both sets lead to the same error.
The modules from GitHub Actions, for reference:

module load CrayEnv
module load PrgEnv-cray
module load cce/17.0.1
module load craype-accel-amd-gfx90a
module load rocm/6.0.3
module load cray-fftw
module load buildtools

We also set the following environment variables:

export FC=ftn
export FC90=ftn
export CXX=CC
export CC=cc

Installation

The subsections below give the installation steps for each component.
Note that the LDFLAGS used to link OpenMP are set separately for each component.

Ecbuild

Version: 3.8.5
Installation through git pull, followed by:

mkdir -p build
cd build
cmake .. 
make clean
make -j16
make install

Fiat

Version: 1.4.1
Installation through git pull followed by:

export LDFLAGS="-fopenmp"
mkdir -p build
cd build
$DNB_INSTALL_DIR/ecbuild.bin/bin/ecbuild \
    -DCMAKE_BUILD_TYPE=Release \
    -DENABLE_MPI=ON \
    -DENABLE_OMP=ON \
    -DENABLE_TESTS=OFF \
    .. 
make clean
make -j16
make install

Ectrans

Version: 1.6.1
Installation through git pull followed by:

export LDFLAGS="-fopenmp -lcraymp"
mkdir -p build
cd build
$DNB_INSTALL_DIR/ecbuild.bin/bin/ecbuild \
    -DCMAKE_BUILD_TYPE=Release \
    -Dfiat_ROOT=$DNB_INSTALL_DIR/fiat.bin \
    -DENABLE_TESTS=OFF \
    -DENABLE_SINGLE_PRECISION=ON \
    -DENABLE_MKL=OFF \
    -DENABLE_FFTW=OFF \
    -DENABLE_OMP=OFF \
    -DENABLE_ACC=ON \
    -DENABLE_GPU_AWARE_MPI=ON \
    .. 
make clean
make -j16
make install

Execution

The commands below use the double-precision binary; the single-precision binary behaves the same way.
All runs use one node, with the environment described above loaded beforehand.

With the default arguments, which run without error, we execute:

numactl -l --all --physcpubind=49-55 -- ./bin/ectrans-benchmark-gpu-dp --norms --nlev 137 --vordiv --scders -t 600 --niter 20

numactl is required on LUMI for optimal CPU and memory binding.
For the scaled-up run, which triggers the error, we execute:

numactl -l --all --physcpubind=49-55 -- ./bin/ectrans-benchmark-gpu-dp --vordiv --scders --uvders --nfld 1 --norms --niter 10 --nlev 79 --truncation 1279
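
To compare the two runs in a scripted way, one option is to capture each run's combined output and search it for the error message quoted in this report. The helper below is a hypothetical sketch: the name `run_and_check` and the log file argument are not part of ectrans, and the function simply runs whatever command it is given.

```shell
#!/bin/sh
# Hypothetical helper: run a command, save stdout and stderr to a log file,
# and report whether the error message from this report appears in it.
# Usage: run_and_check LOGFILE COMMAND [ARGS...]
run_and_check() {
  log=$1
  shift
  # The command's own exit status is ignored; only the log content is checked.
  "$@" > "$log" 2>&1
  if grep -q "ACC: ERROR non contiguous transfer" "$log"; then
    echo "non-contiguous transfer error found (see $log)"
    return 1
  fi
  echo "no non-contiguous transfer error in $log"
  return 0
}
```

For example, `run_and_check scaled.log numactl -l --all --physcpubind=49-55 -- ./bin/ectrans-benchmark-gpu-dp --vordiv --scders --uvders --nfld 1 --norms --niter 10 --nlev 79 --truncation 1279` would return 1 for the failing configuration described above.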

Version

v1.6.0

Platform (OS and architecture)

LUMI, Cray-OS, running kernel 5.14.21-150500.55.49_13.0.56-cray_shasta_c

Relevant log output

ACC: ERROR non contiguous transfer from ../../../pfs/lustrep2/scratch/project_462000713/ectrans-main-acc/src/ectrans-HEAD^1.6.0.src/build/src/trans/gpu/generated/ectrans_gpu_dp/internal/ltinv_mod.F90:366

Accompanying data

No response

Organisation

Barcelona Supercomputing Center

Metadata

Labels

bug (Something isn't working)
