- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.9k
2.2.8 staging prep #17325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
2.2.8 staging prep #17325
Conversation
For UIO_ITER, we are just wrapping a kernel iterator. It will take care of its own offsets if necessary. We don't need to do anything, and if we do try to do anything with it (like advancing the iterator by the skip in zfs_uio_advance) we're just confusing the kernel iterator, ending up at the wrong position or worse, off the end of the memory region. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17298 (cherry picked from commit 2ee5b51)
This is a convenience for filesystems that need the inode of their parent or their own name, as its often complicated to get that information. We don't need those things, so this is just detecting which prototype is expected and adjusting our callback to match. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> (cherry picked from commit 7ef6b70)
According to the upstream change, all callers set it, and all block devices either honoured it or ignored it, so removing it entirely allows a bunch of handling for the "unset" case to be removed, and it becomes effectively implied. We follow suit, and keep setting it for older kernels. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> (cherry picked from commit 2ca91ba)
since 4.10, bio->bi_opf needs to be checked to determine all kinds of flush requests. this was the case prior to the commit referenced below, but the order of ifdefs was not the usual one (newest up top), which might have caused this to slip through. this fixes a regression when using zvols as Qemu block devices, but might have broken other use cases as well. the symptoms are that all sync writes from within a VM configured to use such a virtual block devices are ignored and treated as async writes by the host ZFS layer. this can be verified using fio in sync mode inside the VM, for example with fio \ --filename=/dev/sda --ioengine=libaio --loops=1 --size=10G \ --time_based --runtime=60 --group_reporting --stonewall --name=cc1 \ --description="CC1" --rw=write --bs=4k --direct=1 --iodepth=1 \ --numjobs=1 --sync=1 which shows an IOPS number way above what the physical device underneath supports, with "zpool iostat -r 1" on the hypervisor side showing no sync IO occuring during the benchmark. with the regression fixed, both fio inside the VM and the IO stats on the host show the expected numbers. Fixes: 846b598 "config: remove HAVE_REQ_OP_* and HAVE_REQ_*" Signed-off-by: Fabian-Gruenbichler <[email protected]> Co-authored-by: Alexander Motin <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Closes openzfs#17131 (cherry picked from commit 41823a0)
It turns out that approach taken in the original version of the patch was wrong. So now, we're taking approach in-line with how kernel actually does it - when sb is being torn down, access to it is serialized via sb->s_umount rwsem, only when that lock is taken is it okay to work with s_flags - and the other mistake I was doing was trying to make SB_ACTIVE work, but apparently the kernel checks the negative variant - not SB_DYING and not SB_BORN. Kernels pre-6.6 don't have SB_DYING, but check if sb is hashed instead. Signed-off-by: Pavel Snajdr <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Closes: openzfs#17121 (cherry picked from commit a0e6271)
Avoids using fallback_migrate_folio, which starts unnecessary writeback (leading to BUG in migrate_folio_extra). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: tstabrawa <[email protected]> Closes openzfs#16568 Closes openzfs#16723 (cherry picked from commit 7b6e967)
The 6.0 kernel removes the 'migratepage' VFS op. Check for migratepage. Signed-off-by: Tony Hutter <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected] Closes openzfs#17217 (cherry picked from commit ab9bb19)
| @tonyhutter I dunno if a 2.2.8 is planned (probably entirely dependent on demand). If it does happen, I'm very happy to assist with testing to get it out the door. | 
| I tried this on Fedora 42 (6.14 kernel) and wasn't able to get it to build. Pulling in 155847c helped, but then I hit:  | 
| 
 I didn't have any plans one way or the other, but I'd be open to a 2.2.8 release. | 
| It seems #17334 would be good too. Though it is not in 2.3 yet. | 
| We might as well include #17353 once it is merged. | 
| What about #17255 ? For what it is worth, I welcome a 2.2.8 release - maybe even as the new default/stable branch for EL 9 (which is at 2.1.16 actually). | 
I've found that QEMU/KVM guest memory accounted as shared also included into NR_FILE_PAGES. But it is actually a non-evictable anonymous memory. Using it as a base for zfs_arc_pc_percent parameter makes ARC to ignore shrinker requests while page cache does not really have anything to evict, ending up in OOM killer killing the QEMU process. Instead use of NR_ACTIVE_FILE + NR_INACTIVE_FILE should represent the part of a page cache that is actually evictable, which should be safer to use as a reference for ARC scaling. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#17334 (cherry picked from commit 0aa83dc)
Bisecting identified the redacted send/receive as the source of the bug for issue openzfs#12014. Specifically the call to dsl_dataset_hold_obj(&fromds) has been replaced by dsl_dataset_hold_obj_flags() which passes a DECRYPT flag and creates a key mapping. A subsequent dsl_dataset_rele_flag(&fromds) is missing and the key mapping is not cleared. This may be inadvertedly used, which results in arc_untransform failing with ECKSUM in: arc_untransform+0x96/0xb0 [zfs] dbuf_read_verify_dnode_crypt+0x196/0x350 [zfs] dbuf_read+0x56/0x770 [zfs] dmu_buf_hold_by_dnode+0x4a/0x80 [zfs] zap_lockdir+0x87/0xf0 [zfs] zap_lookup_norm+0x5c/0xd0 [zfs] zap_lookup+0x16/0x20 [zfs] zfs_get_zplprop+0x8d/0x1d0 [zfs] setup_featureflags+0x267/0x2e0 [zfs] dmu_send_impl+0xe7/0xcb0 [zfs] dmu_send_obj+0x265/0x360 [zfs] zfs_ioc_send+0x10c/0x280 [zfs] Fix this by restoring the call to dsl_dataset_hold_obj(). The same applies for to_ds: here replace dsl_dataset_rele(&to_ds) with dsl_dataset_rele_flags(). Both leaked key mappings will cause a panic when exporting the sending pool or unloading the zfs module after a non-raw send from an encrypted filesystem. Contributions-by: Hank Barta <[email protected]> Contributions-by: Paul Dagnelie <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Richard Yao <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes openzfs#12014 Closes openzfs#17340 (cherry picked from commit ea74cde)
d634d20 had been intended to fix a potential information leak issue where the compiler's optimization passes appeared to remove `memset()` operations that sanitize sensitive data before memory is freed for use by the rest of the kernel. When I wrote it, I had assumed that the compiler would not remove the other `memset()` operations, but upon reflection, I have realized that this was a bad assumption to make. I would rather have a very slight amount of additional overhead when calling `gcm_clear_ctx()` than risk a future compiler remove `memset()` calls. This is likely to happen if someone decides to try doing link time optimization and the person will not think to audit the assembly output for issues like this, so it is best to preempt the possibility before it happens. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Jorgen Lundman <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes openzfs#17343 (cherry picked from commit d8a33bc)
The intent is that the filesystem may have a reference to an "old" version of the new directory, eg if it was keeping it alive because a remote NFS client still had it open. We don't need anything like that, so this really just changes things so we return error codes encoded in pointers. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17229 (cherry picked from commit bb740d6)
Renamed in 6.2, and the compat wrapper removed in 6.15. No signature or functional change apart from that, so a very minimal update for us. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17229 (cherry picked from commit 841be1d) (cherry picked from commit 1dc9bec)
This was caught when doing a manual check to see if openzfs#17352 needed to be improved to catch mismatches across stack frames of the kind that were first found in openzfs#17340. Reviewed-by: George Amanakis <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Richard Yao <[email protected]> Closes openzfs#17353 (cherry picked from commit 83fa80a) (cherry picked from commit 07d815f)
With certain combinations of target ARC states balance and ghost hit rates it was possible to get the fractions outside of allowed range. This patch limits maximum balance adjustment speed, which should make it impossible, and also asserts it. Fixes openzfs#17210 Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Tony Hutter <[email protected]> (cherry picked from commit b1ccab1) (cherry picked from commit 677cb91)
| All suggestions included, top posted updated. Looks like we're putting a release together! I went back through the ~600 commits on master that aren't on the 2.2 series, and there's a good amount of fixes for leaks, overflows, null derefs and so on. I didn't really want to load up 2.2 this late though, feels like we should be very sparing with on the LTS branch. Anyway, feel free to nominate stuff I guess :) | 
| Could you include 8bf1e83 please too? It fixes some ZTS failures on Debian. | 
| Is #17015 too much for a point stable release? | 
| Merged as: | 
Motivation and Context
Bringing various things back to 2.2. Linux kernel version update, the recent dataset hold fixes, handful of other bits.
Description
Includes:
UIO_ITER#17298How Has This Been Tested?
Compile checked latest release of all LTS kernels back to 4.19, and all 6.x kernels.
Also compile checked on FreeBSD 14.2.
Types of changes
Checklist:
Signed-off-by.