Skip to content

add big-endian CI run and fix endian issues in test and hdf5var.c#3285

Open
edhartnett wants to merge 40 commits intoUnidata:mainfrom
edhartnett:master
Open

add big-endian CI run and fix endian issues in test and hdf5var.c#3285
edhartnett wants to merge 40 commits intoUnidata:mainfrom
edhartnett:master

Conversation

@edhartnett
Copy link
Contributor

@edhartnett edhartnett commented Feb 21, 2026

Part of #3283
Fixes #3282
Fixes #3286
Fixes #3284
Fixes #3287
Fixes #1687
Fixes #3062
Fixes #1896
Fixes #2770
Fixes #1802
Fixes #2696
Fixes #2670

This adds a big endian workflow to the CI system. It turns up some bugs, so this is just a draft PR until I get the fixes...

Also fix the run_reclaim_tests problem in the autotools build, so that I can run with --disable-utilities.

These changes were generated with AI assistance but with human in the loop (me!) for every decision. I have personally reviewed every line of change in the PR and it is all of benefit to netcdf-c. These changes represent a serious attempt to improve netCDF and are in accordance with project practices and customs.

@edhartnett edhartnett changed the title add big-endian CI run add big-endian CI run and fix endian issues in test and nc4hdf5.c Feb 21, 2026
@edhartnett edhartnett changed the title add big-endian CI run and fix endian issues in test and nc4hdf5.c add big-endian CI run and fix endian issues in test and hdf5var.c Feb 21, 2026
@edhartnett edhartnett marked this pull request as ready for review February 21, 2026 23:36
@WardF
Copy link
Member

WardF commented Feb 23, 2026

Have you tried running the tests like this through a local qemu installation? I'd be curious to know what sort of timings you see. I'm running a test against this PR using docker + --platform linux/s390x to get a rough idea of the timing, given the optimized nature of the qemu engine they use under the hood.

@edhartnett
Copy link
Contributor Author

No, I was happier to get them running on GitHub since it can be made part of the CI, to prevent future regressions.

The endiness bugs were quite subtle.

@WardF
Copy link
Member

WardF commented Feb 23, 2026

I'll need to take a look and check the timing; the last time I ran similar tests through qemu on my home hardware (pre-Docker Desktop licensing) it took around 4 hours. Making changes to the image to install libhdf5 via package manager if requested instead of building from source, and then I'll generate a timing for that as well. The CI is great although if it takes multiple hours to run (one hopes it doesn't, but we'll need to check) then we'll need to evaluate that.

@edhartnett
Copy link
Contributor Author

This CI run? It's not the slowest in the current netcdf-c CI collection. The MacOS runs take longer. This run finishes in about 30 minutes.

It could be sped up by using --disable-utilities, but since it is not the slowest, it seems better to test the utilities on big-endian anyway.

@WardF
Copy link
Member

WardF commented Feb 23, 2026

I see now it ran, thanks. Agreed re: testing the utilities.

@WardF
Copy link
Member

WardF commented Feb 25, 2026

After reviewing this, I'm going to delay merging it; there are still some test failures if we enable ncZarr and DAP testing. For an easy way to examine these locally, assuming you have access to Docker Desktop, you can run the following (assuming you are in the top level netcdf-c/ directory, in a *nix shell):

$ docker run --rm -it -v $(pwd):/netcdf-c -e USE_BUILDSYSTEM=autotools -e H5PACKAGE=TRUE --platform linux/s390x unidata/nctests

The H5PACKAGE environmental variable means we will install libhdf5 from the package manager. You can also use this docker image interactively if you pass --entrypoint bash as an argument; if you do this, you can run the run_tests.sh script in the top level directory you find yourself in, and then investigate the results in the directory pointed to by the working_directory symlink created after run_tests.sh completes.

@edhartnett
Copy link
Contributor Author

edhartnett commented Feb 26, 2026

I may be misunderstanding something, but all tests are passing with this PR. Are there additional tests or configurations you’re running that I might not be aware of?

This fix is entirely within the netCDF-4 codebase and is unrelated to Zarr or DAP. If you have a moment, please take a look at the specific line in hdf5var.c. It’s actually correcting an old mistake of mine from many years ago—one I’ve been trying to track down and resolve properly.

I’m aware there have been CI instabilities, particularly on Windows platforms. Many of those issues are being addressed in other open PRs. However, some of those fixes will need to be merged in order to stabilize CI. In several cases, the failures appear to be dependency-related bugs that surface under parallel execution, especially on slower platforms like Windows.

If you’re seeing CI instability, a good first step would be merging #3273. That PR addresses a dependency issue where a test script can run concurrently with the two programs that generate the file it depends on. The fix ensures the build system waits for those programs to complete before running the test script.

I’m trying to understand any concerns that are preventing this PR from being merged. Since this issue stems from my earlier mistake, I’d really appreciate the opportunity to correct it. I’m confident this is the right fix, and I’m happy to address any questions or concerns you may have.

Thank you for taking the time to review it.

@WardF
Copy link
Member

WardF commented Feb 26, 2026

Ed, the PR disables nczarr and dap tests; if you re-enable them, you'll see that big-endian related issues persist. The change in hdf5var.c is a good one, but it is misleading to say this PR fixes big endian issues. We should enable all of the tests we wish to run, and then introduce a PR which fixes all the test failures we are observing.

@edhartnett
Copy link
Contributor Author

edhartnett commented Feb 26, 2026

OK, but this PR does not claim to fix ALL endianness issues, just the HDF5 ones, and that is a worthy goal. So none of the Zarr or DAP endianness issues are closed by this PR. Note that the PR does not disable any existing testing, it adds big-endian testing which was not present before.

The Zarr and DAP problems are totally separate issues. Once we have this PR merged and the big endian test for HDF5 working, I'll be happy to swing by again and take a look at the zarr and dap code.

It's hard to solve all bugs at once. This PR solves a big bug, and closes many issues. It's certainly a big step forward. Let's get it merged and them I will move on to the other big-endian issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants