zfs-2.4.2 patchset by tonyhutter · Pull Request #18462 · openzfs/zfs

tonyhutter · 2026-04-24T23:44:39Z

Motivation and Context

Proposed patchset for zfs-2.4.2

Description

7.0 kernel support
CI updates
dRAID fixes

How Has This Been Tested?

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Quality assurance (non-breaking change which makes the code more robust against bugs)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes openzfs#18077

The upcoming 7.0 kernel will no longer fall back to generic_setlease(), instead returning EINVAL if .setlease is NULL. So, we set it explicitly. To ensure that we catch any future kernel change, adds a sanity test for F_SETLEASE and F_GETLEASE too. Since this is a Linux-specific test, also a small adjustment to the test runner to allow OS-specific helper programs. Sponsored-by: TrueNAS Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes openzfs#18215

On systems where `$kernelsrc` is different than `$kernelbuild`, the objtool binary will be located in `$kernelbuild` as it's the result of running `make prepare` during kernel build. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Attila Fülöp <attila@fueloep.org> Signed-off-by: Louis Leseur <louis.leseur@gmail.com> Closes openzfs#18248 Closes openzfs#18249

It does exactly the same thing, just inverts the return. Detect its presence or absence and call the right one. Sponsored-by: TrueNAS Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes openzfs#18216

Kernel devs noted that almost all callers to posix_acl_to_xattr() would check the ACL value size and allocate a buffer before make the call. To reduce the repetition, they've changed it to allocate this buffer internally and return it. Unfortunately that's not true for us; most of our calls are from xattr_handler->get() to convert a stored ACL to an xattr, and that call provides a buffer. For now we have no other option, so this commit detects the new version and wraps to copy the value back into the provided buffer and then free it. Sponsored-by: TrueNAS Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes openzfs#18216

The traditional mount API has been removed, so detect when its not available and instead use a small adapter to allow our existing mount functions to keep working. Sponsored-by: TrueNAS Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes openzfs#18216

Fedora 41 was deprecated on Dec 15 2025. Remove it from CI tests. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes openzfs#18261

This API has been available since kernel 5.2, and having it available (almost) everywhere should give us a lot more flexibility for mount management in the future. Sponsored-by: TrueNAS Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes openzfs#18260

Checking for LD_VERSION in unreliable as not all distros define it on the compiler's preprocessor. Explicitly check it via autoconf. This fixes support for Ubuntu 18.04 on arm64. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Juhyung Park <qkrwngud825@gmail.com> Closes openzfs#18262

Provide intuitive log search keywords and increased system consistency. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Alexander Ziaee <ziaee@FreeBSD.org> Closes openzfs#18290

When a namespace property is changed via zfs set, libzfs remounts the filesystem to propagate the new VFS mount flags. The current approach uses mount(2) with MS_REMOUNT, which reads all namespace properties from ZFS and applies them together. This has two problems: 1. Linux VFS resets unspecified per-mount flags on remount. If an administrator sets a temporary flag (e.g. mount -o remount,noatime), a subsequent zfs set on any namespace property clobbers it. 2. Two concurrent zfs set operations on different namespace properties can overwrite each other's mount flags. Additionally, legacy datasets (mountpoint=legacy) were never remounted on namespace property changes since zfs_is_mountable() returns false for them. Add zfs_mount_setattr() which uses mount_setattr(2) to selectively update only the mount flags that correspond to the changed property. For legacy datasets, /proc/mounts is iterated to update all mountpoints. On kernels without mount_setattr (ENOSYS), non-legacy datasets fall back to a full remount; legacy mounts are skipped to avoid clobbering temporary flags. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes openzfs#18257

The autoconf checks are more than enough to decide whether or not we can work with this kernel or not. Sponsored-by: TrueNAS Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes openzfs#18295

Sponsored-by: TrueNAS Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes openzfs#18295

vdev_rebuild() is always called with spa_config_lock held in RW_WRITER mode. However, when it tries to call dmu_tx_assign() the latter may hang on dmu_tx_wait() waiting for available txg. But that available txg may not happen because txg_sync takes spa_config_lock in order to process the current txg. So we have a deadlock case here: - dmu_tx_assign() waits for txg holding spa_config_lock; - txg_sync waits for spa_config_lock not progressing with txg. Here are the stacks: __schedule+0x24e/0x590 schedule+0x69/0x110 cv_wait_common+0xf8/0x130 [spl] __cv_wait+0x15/0x20 [spl] dmu_tx_wait+0x8e/0x1e0 [zfs] dmu_tx_assign+0x49/0x80 [zfs] vdev_rebuild_initiate+0x39/0xc0 [zfs] vdev_rebuild+0x84/0x90 [zfs] spa_vdev_attach+0x305/0x680 [zfs] zfs_ioc_vdev_attach+0xc7/0xe0 [zfs] cv_wait_common+0xf8/0x130 [spl] __cv_wait+0x15/0x20 [spl] spa_config_enter+0xf9/0x120 [zfs] spa_sync+0x6d/0x5b0 [zfs] txg_sync_thread+0x266/0x2f0 [zfs] The solution is to pass txg returned by spa_vdev_enter(spa) at the top of spa_vdev_attach() to vdev_rebuild() and call dmu_tx_create_assigned(txg) which doesn't wait for txg. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com> Reviewed-by: Alek Pinchuk <apinchuk@axcient.com> Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com> Closes openzfs#18210 Closes openzfs#18258

Currently, when there there are several faulted disks with attached dRAID spares, and one of those disks is cleared from errors (zpool clear), followed by its spare being detached, the data in all the remaining spares that were attached while the cleared disk was in FAULTED state might get corrupted (which can be seen by running scrub). In some cases, when too many disks get cleared at a time, this can result in data corruption/loss. dRAID spare is a virtual device whose blocks are distributed among other disks. Those disks can be also in FAULTED state with attached spares on their own. When a disk gets sequentially resilvered (rebuilt), the changes made by that resilvering won't get captured in the DTL (Dirty Time Log) of other FAULTED disks with the attached spares to which the data is written during the resilvering (as it would normally be done for the changes made by the user if a new file is written or some existing one is deleted). It is because sequential resilvering works on the block level, without touching or looking into metadata, so it doesn't know anything about the old BPs or transactions groups that it is resilvering. So later on, when that disk gets cleared from errors and healing resilvering is trying to sync all the data from its spare onto it, all the changes made on its spare during the resilvering of other disks will be missed because they won't be captured in its DTL. That's why other dRAID spares may get corrupted. Here's another way to explain it that might be helpful. Imagine a scenario: 1. d1 fails and gets resilvered to some spare s1 - OK. 2. d2 fails and gets sequentially resilvered on draid spare s2. Now, in some slices, s2 would map to d1, which is failed. But d1 has s1 spare attached, so the data from that resilvering goes to s1, but not recorded in d1's DTL. 3. Now, d1 gets cleared and its s1 gets detached. All the changes done by the user (writes or deletions) have their txgs captured in d1's DTL, so they will be resilvered by the healing resilver from its spare (s1) - that part works fine. But the data which was written during resilvering of d2 and went to s1 - that one will be missed from d1's DTL and won't get resilvered to it. So here we are: 4. s2 under d2 is corrupted in the slices which map to d1, because d1 doesn't have that data resilvered from s1. Now, if there are more failed disks with draid spares attached which were sequentially resilvered while d1 was failed, d3+s3, d4+s4 and so on - all their spares will be corrupted. Because, in some slices, each of them will map to d1 which will miss their data. Solution: add all known txgs starting from TXG_INITIAL to DTLs of non-writable devices during sequential resilvering so when healing resilver starts on disk clear, it would be able to check and heal blocks from all txgs. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com> Closes openzfs#18286 Closes openzfs#18294

statx(2) requires _GNU_SOURCE to be defined in order for sys/stat.h to produce a definition for struct statx and the STATX_* defines. We get that at compile time because we pass -D_GNU_SOURCE through to everything, but in the configure check we aren't setting _GNU_SOURCE, so we don't find STATX_MNT_ID, and so don't set HAVE_STATX_MNT_ID. (This was fine before ccf5a8a, because linux/stat.h does not require _GNU_SOURCE). Simple fix: in the check, define _GNU_SOURCE before including sys/stat.h. Sponsored-by: TrueNAS Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes openzfs#18312

Update the redundancy_draid_spare1 exception to reference an issue which describes the failure. Remove the exception for the redundancy_draid_spare3 test. I have not observed it in local testing. If it reproduces in the CI we can create a new issue for it and put back the exception. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#18308

Observed again in the CI. Put the maybe exception back in place and reference a newly created issue for this sporadic failure. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#18320

It turns out the kernel can also take directory leases, most notably in the NFS server. Without a setlease handler on the directory file ops, attempts to open a directory over NFS can fail with EINVAL. Adding a directory setlease handler was missed in 168023b. This fixes that, allowing directories to be properly accessed over NFS. Sponsored-by: TrueNAS Reported-by: Satadru Pramanik <satadru@gmail.com> Signed-off-by: Rob Norris <rob.norris@truenas.com> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>

Sponsored-by: TrueNAS Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes openzfs#18339

Removing the HAVE_FS_CONTEXT gates and anything that would be used if it wasn't set. Sponsored-by: TrueNAS Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes openzfs#18339

With the old API gone, there's no need to massage new-style calls into its shape and call another function; we can just make those handlers work directly. Sponsored-by: TrueNAS Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes openzfs#18339

Target of opportunity; with no other callers, there's no need for it to be a static function. Sponsored-by: TrueNAS Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes openzfs#18339

Allow restricting ZTS OS targets by setting the vars.ZTS_OS_OVERRIDE repository variable (e.g. '["debian13"]') to reduce shared runner contention when running the full OS matrix is unnecessary. When unset, the existing ci_type-based OS selection is used unchanged. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes openzfs#18342

Do a ZFS build inside of an ARM runner. This only does a simple build, it does not run the test suite. The build runs on the runner itself rather than in a VM, since nesting is not supported on Github ARM runners. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes openzfs#18343

…fs#18348) (openzfs#18354) This function was removed in c6442bd: "Removing old code outside of 4.18 kernsls", but fails at present on PowerPC builds due to the recent inclusion of 6bc9c0a90522: "powerpc: fix KUAP warning in VMX usercopy path" in the upstream kernel, which introduces a use of cpu_feature_keys[], which is a GPL-only symbol. Removing the API check as it doesn't appear necessary. Signed-off-by: John Cabaj <john.cabaj@canonical.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>

Update FreeBSD CI targets from 14.3 to 14.4 in both the QEMU start script and the workflow configuration. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christos Longros <chris.longros@gmail.com> Closes openzfs#18362

Normally, kernel gives any LSM registering a `sb_eat_lsm_opts` hook a first look at mount options coming in from a userspace mount request. The LSM may process and/or remove any options. Whatever is left is passed to the filesystem. This is how the dataset properties `context`, `fscontext`, `defcontext` and `rootcontext` are used to configure ZFS mounts for SELinux. libzfs will fetch those properties from the dataset, then add them to the mount options. In 0f608aa (openzfs#18216) we added our own mount shims to cover the loss of the kernel-provided ones. It turns out that if a filesystem provides a `.parse_monolithic callback`, it is expected to do _all_ mount option parameter processing - the kernel will not get involved at all. Because of that, LSMs are never given a chance to process mount options. The `context` properties are never seen by SELinux, nor are any other options targetting other LSMs. Fix this by calling `security_sb_eat_lsm_opts()` in `zpl_parse_monolithic()`, before we stash the remaining options for `zfs_domount()`. Sponsored-by: TrueNAS Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes openzfs#18376

Free 35GB of unused files, mostly from unused development environments. This helps with the out of disk space problems we were seeing on FreeBSD runners. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes openzfs#18400

Similar to FreeBSD stop issuing prefetches on POSIX_FADV_SEQUENTIAL. It should not have this semantics, only hint speculative prefetcher, if access ever happen later. Instead after POSIX_FADV_WILLNEED handling call generic_fadvise(), if available, to do all the generic stuff, including setting f_mode in struct file, that we could later use to control prefetcher as part of read/write operations. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes openzfs#18395

When sequentially resilvering allow a dRAID child to be read as long as the DTLs indicate it should have a good copy of the data and the leaf isn't being rebuilt. The previous check was slightly too broad and would skip dRAID spare and replacing vdevs if one of their children was being replaced. As long as there exists enough additional redundancy this is fine, but when there isn't this vdev must be read in order to correctly reconstruct the missing data. A new test case has been added which exhausts the available redundancy, faults another device causing it to be degraded, and then performs a sequential resilver for the degraded device. In such a situation enough redundancy exists to perform the replacement and a scrub should detect no checksum errors. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Andriy Tkachuk <andriy.tkachuk@seagate.com> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#18405

ztest can enable and disable the multihost property when testing. This can result in a failure when attempting to import an existing pool when multihost=on but no /etc/hostid file exists. Update the workflow to use zgenhostid to create /etc/hostid when not present. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#18413

The GH artifacts action now lets you disable auto-zipping your artifacts. Previously, GH would always automatically put your artifacts in a ZIP file. This is annoying when your artifacts are already in a tarball. Also update the following action versions checkout: v4 -> v6 upload-artifact: v4 -> v7 download-artifact: v4 -> v8 Lastly, fix a issue where zfs-qmeu-packages now needs to power cycle the VM. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes openzfs#18411

Currently, when more than nparity disks get faulted during the rebuild, only first nparity disks would go to faulted state, and all the remaining disks would go to degraded state. When a hot spare is attached to that degraded disk for rebuild creating the spare mirror, only that hot spare is getting rebuilt, but not the degraded device. So when later during scrub some other attached draid spare happens to map to that spare, it will end up with cksum error. Moreover, if the user clears the degraded disk from errors, the data won't be resilvered to it, hot spare will be detached almost immediately and the data that was resilvered only to it will be lost. Solution: write to all mirrored devices during rebuild, similar to traditional/healing resilvering, but only if we can verify the integrity of the data, or when it's the draid spare we are writing to, in which case we are writing to a reserved spare space, and there is no danger to overwrite any good data. The argument that writing only to rebuilding draid spare vdev is faster than writing to normal device doesn't hold since, at a specific offset being rebuilt, draid spare will be mapped to a normal device anyway. redundancy_draid_degraded2 automation test is added also to cover the scenario. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andriy Tkachuk <atkachuk@wasabi.com> Closes openzfs#18414

Replace semicolons with && so build failures are not masked by the subsequent lockfile cleanup. Use trap to ensure the lockfile is removed on both success and failure. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christos Longros <chris.longros@gmail.com> Closes openzfs#18206 Closes openzfs#18424

Update the META file to reflect compatibility with the 7.0 kernel. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#18435

We've seen some qemu-1-setup failures while trying to change the runner's block device scheduler value to 'none': We have a single 150GB block device Setting up swapspace version 1, size = 16 GiB (17179865088 bytes) no label, UUID=7a790bfe-79e5-4e38-b208-9c63fe523294 tee: '/sys/block/s*/queue/scheduler': No such file or directory Luckily, we don't need to set the scheduler anymore on modern kernels: openzfs#9778 (comment) This commit just removes the code that sets the scheduler. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes openzfs#18437

When a VM fails to launch or is unreachable the qemu-7-prepare.sh script will fail to collect the artifacts due to the missing vm* directories. We want to collect as much diagnostic information as possible, when missing create the directory to allow the subsequent steps to proceed normally. Additionally, we don't want to fail if the /tmp/summary.txt file is missing. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#18438

- Remove line where we disable stdout at the end of qemu-1-setup.sh - Fix comment switching the 2x75GB -> 1x150GB cases - Add some more debug to the end of the script Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes openzfs#18441

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Alan Somers <asomers@freebsd.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alek Pinchuk <apinchuk@axcient.com> Closes openzfs#18403

When getzfsvfs() succeeds (incrementing s_active via zfs_vfs_ref()), but z_unmounted is subsequently found to be B_TRUE, zfsvfs_hold() returns EBUSY without calling zfs_vfs_rele(). This permanently leaks the VFS superblock s_active reference, preventing generic_shutdown_super() from ever firing, which blocks dmu_objset_disown() and makes the pool permanently unexportable (EBUSY). Add the missing zfs_vfs_rele() call, guarded by zfs_vfs_held() to handle the zfsvfs_create() fallback path where no VFS reference exists. This matches the existing cleanup pattern in zfsvfs_rele(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: mischivus <1205832+mischivus@users.noreply.github.com> Closes openzfs#18309 Closes openzfs#18310

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes openzfs#18415

zfsctl_snapshot_mount() holds z_teardown_lock(R) across call_usermodehelper(), which spawns a mount process that needs namespace_sem(W) via move_mount. Reading /proc/self/mountinfo holds namespace_sem(R) and needs z_teardown_lock(R) via zpl_show_devname. When zfs_suspend_fs (from zfs recv or zfs rollback) queues z_teardown_lock(W), the rrwlock blocks new readers, completing the deadlock cycle. Fix by releasing z_teardown_lock(R) after gathering the dataset name and mount path, before any blocking operation. Everything after the release operates on local string copies or uses its own synchronization. The parent zfsvfs pointer remains valid because the caller holds a path reference to the automount trigger dentry. Releasing the lock allows zfs_suspend_fs to proceed concurrently with the mount helper, so dmu_objset_hold in zpl_get_tree can transiently fail with ENOENT during the clone swap. The mount helper fails, EISDIR is returned, and the VFS falls back to the ctldir stub (empty directory) until the next access retries. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes openzfs#18415

When copy_file_range overwrites a recent truncation, subsequent reads can incorrectly determine that it is read hole instead of reading the cloned blocks. This can happen when the following conditions are met: - Truncate adds blkid to dn_free_ranges - A new TXG is created - copy_file_range calls dmu_brt_clone which override the block pointer and set DB_NOFILL - Subsequent read, given DB_NOFILL, hits dbuf_read_impl and dbuf_read_hole - dbuf_read_hole calls dnode_block_freed, which returns TRUE because the truncated blkids are still in dn_free_ranges This will not happen if the clone and truncate are in the same TXG, because the block clone would update the current TXG's dn_free_ranges, which is why this bug only triggers under high IO load (such as compilation). Fix this by skipping the dnode_block_freed call if the block is overridden. The fix shouldn't cause an issue when the cloned block is subsequently freed in later TXGs, as dbuf_undirty would remove the override. This requires a dedicated test program as it is much harder to trigger with scripts (this needs to generate a lot of I/O in short period of time for the bug to trigger reliably). Assisted-by: Gemini:gemini-3.1-pro Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Gary Guo <gary@kernel.org> Closes openzfs#18412 Closes openzfs#18421

The Makefile.am files from libshare, libtpool, libunicode, and libuutil do not have SPDX lines. This is because those Makefiles only got SPDX lines after the big Makefile merge in commits like 309006a and 0d44b58 (which have not been ported to this branch). Add the Makefiles to the whitelist here so spdxcheck.pl passes. Signed-off-by: Tony Hutter <hutter2@llnl.gov>

satmandu · 2026-04-25T13:57:17Z

Apologies if I'm missing that these have already been included, but I've been using these patches on top of 2.4.1 with Linux kernel 7.0 without issues:

Linux 7.0: also set setlease handler on directories #18331
range_tree: use zfs_panic_recover() for partial-overlap remove #18255
prevent range tree corruption race by updating dnode_sync() #18235

I've also been using these on top of 2.4.1 with Linux kernel 7.0, but I can also see these being held back for the next major release:

Bridge speculative and prescient prefetchers #18351
Add support for POSIX_FADV_DONTNEED #18399 (But I guess this has maybe been causing issues on FreeBSD as per #18459 )

These more recent bug fixes might also be nice to have:

draid: fix cksum errors after rebuild with degraded disks #18414
dmu_direct: avoid UAF in dmu_write_direct_done() #18440
Fix 'kernel BUG at mm/usercopy.c' #18408

zfs_range_tree_remove_impl() used a bare panic() when a segment to be removed was not completely overlapped by an existing tree entry. Every other consistency check in range_tree.c uses zfs_panic_recover(), which respects the zfs_recover tunable and allows pools with on-disk corruption to be imported and recovered. This one call was inconsistent, making the partial-overlap case unrecoverable regardless of zfs_recover. Replace panic() with zfs_panic_recover() so that operators can set zfs_recover=1 to import a corrupted pool and reclaim data, consistent with all other range tree error paths. Related-to: openzfs#13483 Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Closes openzfs#18255

Switch to incremental range tree processing in dnode_sync() to avoid unsafe lock dropping during zfs_range_tree_walk(). This also ensures the free ranges remain visible to dnode_block_freed() throughout the sync process, preventing potential stale data reads. This patch: - Keeps the range tree attached during processing for visibility. - Processes segments one-by-one by restarting from the tree head. - Uses zfs_range_tree_clear() to safely handle ranges that may have been modified while the lock was dropped. - adds ASSERT()s to document that we don't expect dn_free_ranges modification outside of sync context. Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alek Pinchuk <apinchuk@axcient.com> Issue openzfs#18186 Closes openzfs#18235

dmu_write_direct_done() passes dmu_sync_arg_t to dmu_sync_done(), which updates the override state and frees the completion context. The Direct I/O error path then still dereferences dsa->dsa_tx while rolling the dirty record back with dbuf_undirty(), resulting in a use-after-free. Save dsa->dsa_tx in a local variable before calling dmu_sync_done() and use that saved tx for the error rollback. This preserves the existing ownership model for dsa and does not change the Direct I/O write semantics. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: gality369 <gality369@example.com> Signed-off-by: ZhengYuan Huang <gality369@gmail.com> Closes openzfs#18440

Fix a bug where an cgroup-OOM-killed process can cause a panic: usercopy: Kernel memory exposure attempt detected from vmalloc (offset 1007584, size 217120)! kernel BUG at mm/usercopy.c:102! This was caused by zfs_uiomove() not correctly returning EFAULT for short copies. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes openzfs#15918 Closes openzfs#18408

- We've seen occasional 'ERROR 502: Bad Gateway' from the runner trying to download an image with axel. Axel can open multiple connections for a faster download, so maybe that's causing problems. This commit adds in a fallback to curl if the axel download doesn't work. - Update merge_summary.awk to print out killed tests in the summary. We've seen cases where the summary page was red but there were no test failures printed. This is because one of the VMs had too may killed tests, which caused the total test time to run too long and caused the runner to timeout qemu-6-test.sh. When the runner kills off qemu-6-tests.sh, it means we never generate the nice summary page for that VM listing the killed off tests. This commit parses the partial test logs for killed off tests and includes them in the merge_summary.awk output. - Print an error message in the summary page if one of the VMs didn't complete ZTS. This helps draw attention to a VM crash. - FreeBSD sometimes has broken links to their CI image. When that happens, select the newest nightly snapshot image as an alternative. This is needed right now, since the current images in the FreeBSD 16 "current/" directory are returning 404 errors. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes openzfs#18460

Automake's default tar formats (v7 pre-1.18, ustar since) impose path length limits that drop several long test filenames from the release tarball when `make dist` runs. Pax format has no such limit and is read by GNU tar 1.14+ and libarchive/bsdtar. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christos Longros <chris.longros@gmail.com> Closes: openzfs#17276 Closes: openzfs#18465

META file and changelog updated. Signed-off-by: Tony Hutter <hutter2@llnl.gov>

satmandu · 2026-04-28T00:39:01Z

Looks like #18469 fixes a regression introduced by 2b930f6, which is included in this patchset...

tonyhutter · 2026-04-28T01:04:16Z

@satmandu thanks for the heads-up on 2b930f6. 1644e2f may have issues as well: (#18421 (comment)). Both of those are in the proposed 2.3.7 and 2.4.2 patchsets, so we'll hold the releases until those get resolved.

robn and others added 30 commits April 23, 2026 14:29

build: add SPDX license tags to build system files

fc44c73

Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes openzfs#18077

spdxcheck: enforce SPDX license tags on build system files

4234660

Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes openzfs#18077

README: describe specific kernels/distros we target

12cd6ff

Sponsored-by: TrueNAS Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes openzfs#18295

ZTS: Add back redundancy_draid_spare3 exception

1bc9225

Observed again in the CI. Put the maybe exception back in place and reference a newly created issue for this sporadic failure. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#18320

config: refuse to build without fs_context

bec56a4

Sponsored-by: TrueNAS Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes openzfs#18339

linux/super: remove support for old mount API

0edbfbf

Removing the HAVE_FS_CONTEXT gates and anything that would be used if it wasn't set. Sponsored-by: TrueNAS Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes openzfs#18339

tonyhutter and others added 17 commits April 23, 2026 14:59

Linux 7.0 compat: META

eb3331a

Update the META file to reflect compatibility with the 7.0 kernel. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#18435

fix memleak in spa_errlog.c

aba3ed3

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Alan Somers <asomers@freebsd.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alek Pinchuk <apinchuk@axcient.com> Closes openzfs#18403

Fix options memory leak in zfsctl_snapshot_mount

5d56935

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes openzfs#18415

clefru and others added 7 commits April 27, 2026 10:57

Tag zfs-2.4.2

602ed78

META file and changelog updated. Signed-off-by: Tony Hutter <hutter2@llnl.gov>

tonyhutter force-pushed the zfs-2.4.2-staging branch from 6ce9e2c to 602ed78 Compare April 27, 2026 18:06

satmandu mentioned this pull request Apr 28, 2026

libspl/mnttab: follow symlinks when resolving path via statx #18469

Merged

14 tasks

andrey-podko mentioned this pull request Apr 30, 2026

Linux 7.1 compat #18471

Merged

14 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

zfs-2.4.2 patchset#18462

zfs-2.4.2 patchset#18462
tonyhutter wants to merge 62 commits intoopenzfs:zfs-2.4-releasefrom
tonyhutter:zfs-2.4.2-staging

tonyhutter commented Apr 24, 2026

Uh oh!

satmandu commented Apr 25, 2026

Uh oh!

satmandu commented Apr 28, 2026

Uh oh!

tonyhutter commented Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

17 participants

Conversation

tonyhutter commented Apr 24, 2026

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist:

Uh oh!

satmandu commented Apr 25, 2026

Uh oh!

satmandu commented Apr 28, 2026

Uh oh!

tonyhutter commented Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

17 participants