
nfld0 segfaults on GPU with GPU-aware comms for NPROC > 2 #362

@samhatfield

Description

What happened?

Of the nfld0 tests on GPU, the mpi2 cases currently fail on ECMWF's Grace-Hopper platform:

      Start 183: ectrans-benchmark-gpu-dp_T47_O48_mpi0_omp1_callmode1_nfld0
 1/12 Test #183: ectrans-benchmark-gpu-dp_T47_O48_mpi0_omp1_callmode1_nfld0 ...   Passed    3.61 sec
      Start 188: ectrans-benchmark-gpu-dp_T47_O48_mpi0_omp1_callmode2_nfld0
 2/12 Test #188: ectrans-benchmark-gpu-dp_T47_O48_mpi0_omp1_callmode2_nfld0 ...   Passed    3.87 sec
      Start 193: ectrans-benchmark-gpu-dp_T47_O48_mpi1_omp1_callmode1_nfld0
 3/12 Test #193: ectrans-benchmark-gpu-dp_T47_O48_mpi1_omp1_callmode1_nfld0 ...   Passed    4.33 sec
      Start 198: ectrans-benchmark-gpu-dp_T47_O48_mpi1_omp1_callmode2_nfld0
 4/12 Test #198: ectrans-benchmark-gpu-dp_T47_O48_mpi1_omp1_callmode2_nfld0 ...   Passed    5.15 sec
      Start 203: ectrans-benchmark-gpu-dp_T47_O48_mpi2_omp1_callmode1_nfld0
 5/12 Test #203: ectrans-benchmark-gpu-dp_T47_O48_mpi2_omp1_callmode1_nfld0 ...***Failed    7.42 sec
      Start 208: ectrans-benchmark-gpu-dp_T47_O48_mpi2_omp1_callmode2_nfld0
 6/12 Test #208: ectrans-benchmark-gpu-dp_T47_O48_mpi2_omp1_callmode2_nfld0 ...***Failed    6.88 sec
      Start 213: ectrans-benchmark-gpu-sp_T47_O48_mpi0_omp1_callmode1_nfld0
 7/12 Test #213: ectrans-benchmark-gpu-sp_T47_O48_mpi0_omp1_callmode1_nfld0 ...   Passed   10.15 sec
      Start 218: ectrans-benchmark-gpu-sp_T47_O48_mpi0_omp1_callmode2_nfld0
 8/12 Test #218: ectrans-benchmark-gpu-sp_T47_O48_mpi0_omp1_callmode2_nfld0 ...   Passed    7.47 sec
      Start 223: ectrans-benchmark-gpu-sp_T47_O48_mpi1_omp1_callmode1_nfld0
 9/12 Test #223: ectrans-benchmark-gpu-sp_T47_O48_mpi1_omp1_callmode1_nfld0 ...   Passed   11.22 sec
      Start 228: ectrans-benchmark-gpu-sp_T47_O48_mpi1_omp1_callmode2_nfld0
10/12 Test #228: ectrans-benchmark-gpu-sp_T47_O48_mpi1_omp1_callmode2_nfld0 ...   Passed    5.43 sec
      Start 233: ectrans-benchmark-gpu-sp_T47_O48_mpi2_omp1_callmode1_nfld0
11/12 Test #233: ectrans-benchmark-gpu-sp_T47_O48_mpi2_omp1_callmode1_nfld0 ...***Failed   13.04 sec
      Start 238: ectrans-benchmark-gpu-sp_T47_O48_mpi2_omp1_callmode2_nfld0
12/12 Test #238: ectrans-benchmark-gpu-sp_T47_O48_mpi2_omp1_callmode2_nfld0 ...***Failed    7.40 sec

These tests pass when GPU_AWARE_MPI is disabled.

The crash is a segfault occurring here (line 802). It can be reproduced by

mpiexec -n 2 ./bin/ectrans-benchmark-gpu-dp -t 47 --nfld 0

Interestingly, higher resolutions (e.g. -t 95) don't show the crash. The lowest resolution I can run without experiencing the segfault is T70.

If I add --nlev 2 or --nprtrv 2, the crash goes away. So it is probably related to the W-set splitting of grid-point arrays.
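For anyone triaging, the observations above (fails at T47, passes from T70, passes with --nlev 2 or --nprtrv 2) can be swept in one go. A sketch, assuming the benchmark binary path from the reproducer and an allocation where mpiexec -n 2 works; adjust for your build tree:

```shell
#!/usr/bin/env bash
# Sweep truncations and the workaround flags mentioned above to map out
# which configurations segfault. All flags are taken from this report;
# BIN is a placeholder for your build's benchmark binary.
BIN=./bin/ectrans-benchmark-gpu-dp

for t in 47 63 70 95; do
  for extra in "" "--nlev 2" "--nprtrv 2"; do
    echo "== T${t} ${extra} =="
    if mpiexec -n 2 "$BIN" -t "$t" --nfld 0 $extra; then
      echo "PASS"
    else
      echo "FAIL (exit $?)"
    fi
  done
done
```

This is environment-dependent (MPI launcher, GPUs), so it is only a triage sketch, not something runnable outside the cluster.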

Our CI suite on ECMWF's AC cluster (A100-based) doesn't show this crash.

Possibilities:

  • There is a bug in the NVHPC version we're using (25.9).
  • There is a bug in the way ZCOMBUFS is allocated which only manifests in edge cases such as low resolutions and/or minimal field counts.

Suspicious:

What are the steps to reproduce the bug?

ECMWF / AG (Grace Hopper cluster)

Currently Loaded Modules:

  1) prgenv/expert                         4) fftw/3.3.10:nvidia:25.11
  2) nvidia/25.9                           5) cmake/3.31.6
  3) hpcx-openmpi/2.21.3-cuda:nvidia:25.9

FIAT at version develop:230b015.

Configure with -DENABLE_GPU=ON -DENABLE_ACC=ON.
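Put together, the setup amounts to something like the following. The module names and CMake options are as listed above; the source and build directory names, and the exact `module load` spellings for the version-variant modules, are assumptions that may need adjusting on the cluster:

```shell
# Environment and configure sketch for ECMWF's AG (Grace Hopper) cluster.
# Module versions match the "Currently Loaded Modules" list above; the
# variant suffixes (":nvidia:25.9" etc.) may be resolved automatically
# once the nvidia module is loaded.
module load prgenv/expert nvidia/25.9 hpcx-openmpi/2.21.3-cuda \
            fftw/3.3.10 cmake/3.31.6

# "ectrans" and "build" are placeholder directory names.
cmake -S ectrans -B build -DENABLE_GPU=ON -DENABLE_ACC=ON
cmake --build build

# Run only the failing test family from this report.
ctest --test-dir build -R nfld0
```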

Version

develop:40d6bc2

Platform (OS and architecture)

ECMWF / AG

Relevant log output

Accompanying data

No response

Organisation

No response

Labels: bug