summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
47 hoursMerge tag 'v6.17-rc-part1-smb3-client-fixes' of ↵Linus Torvalds
git://git.samba.org/sfrench/cifs-2.6 Pull smb client updates from Steve French: - Fix network namespace refcount leak - Multichannel reconnect fix - Perf improvement to not do unneeded EA query on native symlinks - Performance improvement for directory leases to allow extending lease for actively queried directories - Improve debugging of directory leases by adding pseudofile to show them - Five minor mount cleanup patches - Minor directory lease cleanup patch - Allow creating special files via reparse points over SMB1 - Two minor improvements to FindFirst over SMB1 - Two NTLMSSP session setup fixes * tag 'v6.17-rc-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3 client: add way to show directory leases for improved debugging smb: client: get rid of kstrdup() when parsing iocharset mount option smb: client: get rid of kstrdup() when parsing domain mount option smb: client: get rid of kstrdup() when parsing pass2 mount option smb: client: get rid of kstrdup() when parsing pass mount option smb: client: get rid of kstrdup() when parsing user mount option cifs: Add support for creating reparse points over SMB1 cifs: Do not query WSL EAs for native SMB symlink cifs: Optimize CIFSFindFirst() response when not searching cifs: Fix calling CIFSFindFirst() for root path without msearch smb: client: fix session setup against servers that require SPN smb: client: allow parsing zero-length AV pairs cifs: add new field to track the last access time of cfid smb: change return type of cached_dir_lease_break() to bool cifs: reset iface weights when we cannot find a candidate smb: client: fix netns refcount leak after net_passive changes
2 daysMerge tag 'mm-stable-2025-07-30-15-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "As usual, many cleanups. The below blurbiage describes 42 patchsets. 21 of those are partially or fully cleanup work. "cleans up", "cleanup", "maintainability", "rationalizes", etc. I never knew the MM code was so dirty. "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes) addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly mapped VMAs were not eligible for merging with existing adjacent VMAs. "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park) adds a new kernel module which simplifies the setup and usage of DAMON in production environments. "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig) is a cleanup to the writeback code which removes a couple of pointers from struct writeback_control. "drivers/base/node.c: optimization and cleanups" (Donet Tom) contains largely uncorrelated cleanups to the NUMA node setup and management code. "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman) does some maintenance work on the userfaultfd code. "Readahead tweaks for larger folios" (Ryan Roberts) implements some tuneups for pagecache readahead when it is reading into order>0 folios. "selftests/mm: Tweaks to the cow test" (Mark Brown) provides some cleanups and consistency improvements to the selftests code. "Optimize mremap() for large folios" (Dev Jain) does that. A 37% reduction in execution time was measured in a memset+mremap+munmap microbenchmark. "Remove zero_user()" (Matthew Wilcox) expunges zero_user() in favor of the more modern memzero_page(). "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand) addresses some warts which David noticed in the huge page code. These were not known to be causing any issues at this time. "mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park) provides some cleanup and consolidation work in DAMON. "use vm_flags_t consistently" (Lorenzo Stoakes) uses vm_flags_t in places where we were inappropriately using other types. "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy) increases the reliability of large page allocation in the memfd code. "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple) removes several now-unneeded PFN_* flags. "mm/damon: decouple sysfs from core" (SeongJae Park) implememnts some cleanup and maintainability work in the DAMON sysfs layer. "madvise cleanup" (Lorenzo Stoakes) does quite a lot of cleanup/maintenance work in the madvise() code. "madvise anon_name cleanups" (Vlastimil Babka) provides additional cleanups on top or Lorenzo's effort. "Implement numa node notifier" (Oscar Salvador) creates a standalone notifier for NUMA node memory state changes. Previously these were lumped under the more general memory on/offline notifier. "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan) cleans up the pageblock isolation code and fixes a potential issue which doesn't seem to cause any problems in practice. "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park) adds additional drgn- and python-based DAMON selftests which are more comprehensive than the existing selftest suite. "Misc rework on hugetlb faulting path" (Oscar Salvador) fixes a rather obscure deadlock in the hugetlb fault code and follows that fix with a series of cleanups. "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport) rationalizes and cleans up the highmem-specific code in the CMA allocator. "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand) provides cleanups and future-preparedness to the migration code. "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park) adds some tracepoints to some DAMON auto-tuning code. "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park) does that. "mm/damon: misc cleanups" (SeongJae Park) also does what it claims. "mm: folio_pte_batch() improvements" (David Hildenbrand) cleans up the large folio PTE batching code. "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park) facilitates dynamic alteration of DAMON's inter-node allocation policy. "Remove unmap_and_put_page()" (Vishal Moola) provides a couple of page->folio conversions. "mm: per-node proactive reclaim" (Davidlohr Bueso) implements a per-node control of proactive reclaim - beyond the current memcg-based implementation. "mm/damon: remove damon_callback" (SeongJae Park) replaces the damon_callback interface with a more general and powerful damon_call()+damos_walk() interface. "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes) implements a number of mremap cleanups (of course) in preparation for adding new mremap() functionality: newly permit the remapping of multiple VMAs when the user is specifying MREMAP_FIXED. It still excludes some specialized situations where this cannot be performed reliably. "drop hugetlb_free_pgd_range()" (Anthony Yznaga) switches some sparc hugetlb code over to the generic version and removes the thus-unneeded hugetlb_free_pgd_range(). "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park) augments the present userspace-requested update of DAMON sysfs monitoring files. Automatic update is now provided, along with a tunable to control the update interval. "Some randome fixes and cleanups to swapfile" (Kemeng Shi) does what is claims. "mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand) provides (and uses) a means by which debug-style functions can grab a copy of a pageframe and inspect it locklessly without tripping over the races inherent in operating on the live pageframe directly. "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan) addresses the large contention issues which can be triggered by reads from that procfs file. Latencies are reduced by more than half in some situations. The series also introduces several new selftests for the /proc/pid/maps interface. "__folio_split() clean up" (Zi Yan) cleans up __folio_split()! "Optimize mprotect() for large folios" (Dev Jain) provides some quite large (>3x) speedups to mprotect() when dealing with large folios. "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian) does some cleanup work in the selftests code. "tools/testing: expand mremap testing" (Lorenzo Stoakes) extends the mremap() selftest in several ways, including adding more checking of Lorenzo's recently added "permit mremap() move of multiple VMAs" feature. "selftests/damon/sysfs.py: test all parameters" (SeongJae Park) extends the DAMON sysfs interface selftest so that it tests all possible user-requested parameters. Rather than the present minimal subset" * tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits) MAINTAINERS: add missing headers to mempory policy & migration section MAINTAINERS: add missing file to cgroup section MAINTAINERS: add MM MISC section, add missing files to MISC and CORE MAINTAINERS: add missing zsmalloc file MAINTAINERS: add missing files to page alloc section MAINTAINERS: add missing shrinker files MAINTAINERS: move memremap.[ch] to hotplug section MAINTAINERS: add missing mm_slot.h file THP section MAINTAINERS: add missing interval_tree.c to memory mapping section MAINTAINERS: add missing percpu-internal.h file to per-cpu section mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info() selftests/damon: introduce _common.sh to host shared function selftests/damon/sysfs.py: test runtime reduction of DAMON parameters selftests/damon/sysfs.py: test non-default parameters runtime commit selftests/damon/sysfs.py: generalize DAMON context commit assertion selftests/damon/sysfs.py: generalize monitoring attributes commit assertion selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion selftests/damon/sysfs.py: test DAMOS filters commitment selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion selftests/damon/sysfs.py: test DAMOS destinations commitment ...
2 daysMerge tag 'fsnotify_for_v6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "A couple of small improvements for fsnotify subsystem. The most interesting is probably Amir's change modifying the meaning of fsnotify fmode bits (and I spell it out specifically because I know you care about those). There's no change for the common cases of no fsnotify watches or no permission event watches. But when there are permission watches (either for open or for pre-content events) but no FAN_ACCESS_PERM watch (which nobody uses in practice) we are now able optimize away unnecessary cache loads from the read path" * tag 'fsnotify_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: optimize FMODE_NONOTIFY_PERM for the common cases fsnotify: merge file_set_fsnotify_mode_from_watchers() with open perm hook samples: fix building fs-monitor on musl systems fanotify: sanitize handle_type values when reporting fid
2 daysMerge tag 'jfs-6.17' of github.com:kleikamp/linux-shaggyLinus Torvalds
Pull jfs updates from Dave Kleikamp: "Fixes and cleanups for JFS filesystem" * tag 'jfs-6.17' of github.com:kleikamp/linux-shaggy: jfs: fix metapage reference count leak in dbAllocCtl jfs: stop using write_cache_pages jfs: truncate good inode pages when hard link is 0 jfs: jfs_xtree: replace XT_GETPAGE macro with xt_getpage() jfs: Regular file corruption check jfs: upper bound check of tree index in dbAllocAG
2 daysMerge tag 'for-linus-6.17-ofs1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "Fixes for string handling in debugfs and sysfs: - change scnprintf to sysfs_emit in sysfs code. - change sprintf to scnprintf in debugfs code. - refactor debugfs mask-to-string code for readability and slightly improved functionality" * tag 'for-linus-6.17-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: fs/orangefs: Allow 2 more characters in do_c_string() fs: orangefs: replace scnprintf() with sysfs_emit() fs/orangefs: use snprintf() instead of sprintf()
2 daysMerge tag 'ubifs-for-linus-6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBI and UBIFS updates from Richard Weinberger: "UBIFS: - No longer use write_cache_pages() UBI: - Remove an unused function" * tag 'ubifs-for-linus-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: stop using write_cache_pages mtd: ubi: Remove unused ubi_flush
2 daysMerge tag 'ext4_for_linus_6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Major ext4 changes for 6.17: - Better scalability for ext4 block allocation - Fix insufficient credits when writing back large folios Miscellaneous bug fixes, especially when handling exteded attriutes, inline data, and fast commit" * tag 'ext4_for_linus_6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (39 commits) ext4: do not BUG when INLINE_DATA_FL lacks system.data xattr ext4: implement linear-like traversal across order xarrays ext4: refactor choose group to scan group ext4: convert free groups order lists to xarrays ext4: factor out ext4_mb_scan_group() ext4: factor out ext4_mb_might_prefetch() ext4: factor out __ext4_mb_scan_group() ext4: fix largest free orders lists corruption on mb_optimize_scan switch ext4: fix zombie groups in average fragment size lists ext4: merge freed extent with existing extents before insertion ext4: convert sbi->s_mb_free_pending to atomic_t ext4: fix typo in CR_GOAL_LEN_SLOW comment ext4: get rid of some obsolete EXT4_MB_HINT flags ext4: utilize multiple global goals to reduce contention ext4: remove unnecessary s_md_lock on update s_mb_last_group ext4: remove unnecessary s_mb_last_start ext4: separate stream goal hits from s_bal_goals for better tracking ext4: add ext4_try_lock_group() to skip busy groups ext4: initialize superblock fields in the kballoc-test.c kunit tests ext4: refactor the inline directory conversion and new directory codepaths ...
3 dayssmb3 client: add way to show directory leases for improved debuggingSteve French
When looking at performance issues around directory caching, or debugging directory lease issues, it is helpful to be able to display the current directory leases (as we can e.g. or open files). Create pseudo-file /proc/fs/cifs/open_dirs that displays current directory leases. Here is sample output: cat /proc/fs/cifs/open_dirs Version:1 Format: <tree id> <sess id> <persistent fid> <path> Num entries: 3 0xce4c1c68 0x7176aa54 0xd95ef58e \dira valid file info, valid dirents 0xce4c1c68 0x7176aa54 0xd031e211 \dir5 valid file info, valid dirents 0xce4c1c68 0x7176aa54 0x96533a90 \dir1 valid file info Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
3 daysMerge tag 'bpf-next-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Remove usermode driver (UMD) framework (Thomas Weißschuh) - Introduce Strongly Connected Component (SCC) in the verifier to detect loops and refine register liveness (Eduard Zingerman) - Allow 'void *' cast using bpf_rdonly_cast() and corresponding '__arg_untrusted' for global function parameters (Eduard Zingerman) - Improve precision for BPF_ADD and BPF_SUB operations in the verifier (Harishankar Vishwanathan) - Teach the verifier that constant pointer to a map cannot be NULL (Ihor Solodrai) - Introduce BPF streams for error reporting of various conditions detected by BPF runtime (Kumar Kartikeya Dwivedi) - Teach the verifier to insert runtime speculation barrier (lfence on x86) to mitigate speculative execution instead of rejecting the programs (Luis Gerhorst) - Various improvements for 'veristat' (Mykyta Yatsenko) - For CONFIG_DEBUG_KERNEL config warn on internal verifier errors to improve bug detection by syzbot (Paul Chaignon) - Support BPF private stack on arm64 (Puranjay Mohan) - Introduce bpf_cgroup_read_xattr() kfunc to read xattr of cgroup's node (Song Liu) - Introduce kfuncs for read-only string opreations (Viktor Malik) - Implement show_fdinfo() for bpf_links (Tao Chen) - Reduce verifier's stack consumption (Yonghong Song) - Implement mprog API for cgroup-bpf programs (Yonghong Song) * tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (192 commits) selftests/bpf: Migrate fexit_noreturns case into tracing_failure test suite selftests/bpf: Add selftest for attaching tracing programs to functions in deny list bpf: Add log for attaching tracing programs to functions in deny list bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions bpf: Fix various typos in verifier.c comments bpf: Add third round of bounds deduction selftests/bpf: Test invariants on JSLT crossing sign selftests/bpf: Test cross-sign 64bits range refinement selftests/bpf: Update reg_bound range refinement logic bpf: Improve bounds when s64 crosses sign boundary bpf: Simplify bounds refinement from s32 selftests/bpf: Enable private stack tests for arm64 bpf, arm64: JIT support for private stack bpf: Move bpf_jit_get_prog_name() to core.c bpf, arm64: Fix fp initialization for exception boundary umd: Remove usermode driver framework bpf/preload: Don't select USERMODE_DRIVER selftests/bpf: Fix test dynptr/test_dynptr_memset_xdp_chunks failure selftests/bpf: Fix test dynptr/test_dynptr_copy_xdp failure selftests/bpf: Increase xdp data size for arm64 64K page size ...
3 daysMerge tag 'net-next-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Wrap datapath globals into net_aligned_data, to avoid false sharing - Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container) - Add SO_INQ and SCM_INQ support to AF_UNIX - Add SIOCINQ support to AF_VSOCK - Add TCP_MAXSEG sockopt to MPTCP - Add IPv6 force_forwarding sysctl to enable forwarding per interface - Make TCP validation of whether packet fully fits in the receive window and the rcv_buf more strict. With increased use of HW aggregation a single "packet" can be multiple 100s of kB - Add MSG_MORE flag to optimize large TCP transmissions via sockmap, improves latency up to 33% for sockmap users - Convert TCP send queue handling from tasklet to BH workque - Improve BPF iteration over TCP sockets to see each socket exactly once - Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code - Support enabling kernel threads for NAPI processing on per-NAPI instance basis rather than a whole device. Fully stop the kernel NAPI thread when threaded NAPI gets disabled. Previously thread would stick around until ifdown due to tricky synchronization - Allow multicast routing to take effect on locally-generated packets - Add output interface argument for End.X in segment routing - MCTP: add support for gateway routing, improve bind() handling - Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink - Add a new neighbor flag ("extern_valid"), which cedes refresh responsibilities to userspace. This is needed for EVPN multi-homing where a neighbor entry for a multi-homed host needs to be synced across all the VTEPs among which the host is multi-homed - Support NUD_PERMANENT for proxy neighbor entries - Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM - Add sequence numbers to netconsole messages. Unregister netconsole's console when all net targets are removed. Code refactoring. Add a number of selftests - Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol should be used for an inbound SA lookup - Support inspecting ref_tracker state via DebugFS - Don't force bonding advertisement frames tx to ~333 ms boundaries. Add broadcast_neighbor option to send ARP/ND on all bonded links - Allow providing upcall pid for the 'execute' command in openvswitch - Remove DCCP support from Netfilter's conntrack - Disallow multiple packet duplications in the queuing layer - Prevent use of deprecated iptables code on PREEMPT_RT Driver API: - Support RSS and hashing configuration over ethtool Netlink - Add dedicated ethtool callbacks for getting and setting hashing fields - Add support for power budget evaluation strategy in PSE / Power-over-Ethernet. Generate Netlink events for overcurrent etc - Support DPLL phase offset monitoring across all device inputs. Support providing clock reference and SYNC over separate DPLL inputs - Support traffic classes in devlink rate API for bandwidth management - Remove rtnl_lock dependency from UDP tunnel port configuration Device drivers: - Add a new Broadcom driver for 800G Ethernet (bnge) - Add a standalone driver for Microchip ZL3073x DPLL - Remove IBM's NETIUCV device driver - Ethernet high-speed NICs: - Broadcom (bnxt): - support zero-copy Tx of DMABUF memory - take page size into account for page pool recycling rings - Intel (100G, ice, idpf): - idpf: XDP and AF_XDP support preparations - idpf: add flow steering - add link_down_events statistic - clean up the TSPLL code - preparations for live VM migration - nVidia/Mellanox: - support zero-copy Rx/Tx interfaces (DMABUF and io_uring) - optimize context memory usage for matchers - expose serial numbers in devlink info - support PCIe congestion metrics - Meta (fbnic): - add 25G, 50G, and 100G link modes to phylink - support dumping FW logs - Marvell/Cavium: - support for CN20K generation of the Octeon chips - Amazon: - add HW clock (without timestamping, just hypervisor time access) - Ethernet virtual: - VirtIO net: - support segmentation of UDP-tunnel-encapsulated packets - Google (gve): - support packet timestamping and clock synchronization - Microsoft vNIC: - add handler for device-originated servicing events - allow dynamic MSI-X vector allocation - support Tx bandwidth clamping - Ethernet NICs consumer, and embedded: - AMD: - amd-xgbe: hardware timestamping and PTP clock support - Broadcom integrated MACs (bcmgenet, bcmasp): - use napi_complete_done() return value to support NAPI polling - add support for re-starting auto-negotiation - Broadcom switches (b53): - support BCM5325 switches - add bcm63xx EPHY power control - Synopsys (stmmac): - lots of code refactoring and cleanups - TI: - icssg-prueth: read firmware-names from device tree - icssg: PRP offload support - Microchip: - lan78xx: convert to PHYLINK for improved PHY and MAC management - ksz: add KSZ8463 switch support - Intel: - support similar queue priority scheme in multi-queue and time-sensitive networking (taprio) - support packet pre-emption in both - RealTek (r8169): - enable EEE at 5Gbps on RTL8126 - Airoha: - add PPPoE offload support - MDIO bus controller for Airoha AN7583 - Ethernet PHYs: - support for the IPQ5018 internal GE PHY - micrel KSZ9477 switch-integrated PHYs: - add MDI/MDI-X control support - add RX error counters - add cable test support - add Signal Quality Indicator (SQI) reporting - dp83tg720: improve reset handling and reduce link recovery time - support bcm54811 (and its MII-Lite interface type) - air_en8811h: support resume/suspend - support PHY counters for QCA807x and QCA808x - support WoL for QCA807x - CAN drivers: - rcar_canfd: support for Transceiver Delay Compensation - kvaser: report FW versions via devlink dev info - WiFi: - extended regulatory info support (6 GHz) - add statistics and beacon monitor for Multi-Link Operation (MLO) - support S1G aggregation, improve S1G support - add Radio Measurement action fields - support per-radio RTS threshold - some work around how FIPS affects wifi, which was wrong (RC4 is used by TKIP, not only WEP) - improvements for unsolicited probe response handling - WiFi drivers: - RealTek (rtw88): - IBSS mode for SDIO devices - RealTek (rtw89): - BT coexistence for MLO/WiFi7 - concurrent station + P2P support - support for USB devices RTL8851BU/RTL8852BU - Intel (iwlwifi): - use embedded PNVM in (to be released) FW images to fix compatibility issues - many cleanups (unused FW APIs, PCIe code, WoWLAN) - some FIPS interoperability - MediaTek (mt76): - firmware recovery improvements - more MLO work - Qualcomm/Atheros (ath12k): - fix scan on multi-radio devices - more EHT/Wi-Fi 7 features - encapsulation/decapsulation offload - Broadcom (brcm80211): - support SDIO 43751 device - Bluetooth: - hci_event: add support for handling LE BIG Sync Lost event - ISO: add socket option to report packet seqnum via CMSG - ISO: support SCM_TIMESTAMPING for ISO TS - Bluetooth drivers: - intel_pcie: support Function Level Reset - nxpuart: add support for 4M baudrate - nxpuart: implement powerup sequence, reset, FW dump, and FW loading" * tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits) dpll: zl3073x: Fix build failure selftests: bpf: fix legacy netfilter options ipv6: annotate data-races around rt->fib6_nsiblings ipv6: fix possible infinite loop in fib6_info_uses_dev() ipv6: prevent infinite loop in rt6_nlmsg_size() ipv6: add a retry logic in net6_rt_notify() vrf: Drop existing dst reference in vrf_ip6_input_dst net/sched: taprio: align entry index attr validation with mqprio net: fsl_pq_mdio: use dev_err_probe selftests: rtnetlink.sh: remove esp4_offload after test vsock: remove unnecessary null check in vsock_getname() igb: xsk: solve negative overflow of nb_pkts in zerocopy mode stmmac: xsk: fix negative overflow of budget in zerocopy mode dt-bindings: ieee802154: Convert at86rf230.txt yaml format net: dsa: microchip: Disable PTP function of KSZ8463 net: dsa: microchip: Setup fiber ports for KSZ8463 net: dsa: microchip: Write switch MAC address differently for KSZ8463 net: dsa: microchip: Use different registers for KSZ8463 net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver dt-bindings: net: dsa: microchip: Add KSZ8463 switch support ...
4 daysMerge tag 'driver-core-6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Danilo Krummrich: "debugfs: - Remove unneeded debugfs_file_{get,put}() instances - Remove last remnants of debugfs_real_fops() - Allow storing non-const void * in struct debugfs_inode_info::aux sysfs: - Switch back to attribute_group::bin_attrs (treewide) - Switch back to bin_attribute::read()/write() (treewide) - Constify internal references to 'struct bin_attribute' Support cache-ids for device-tree systems: - Add arch hook arch_compact_of_hwid() - Use arch_compact_of_hwid() to compact MPIDR values on arm64 Rust: - Device: - Introduce CoreInternal device context (for bus internal methods) - Provide generic drvdata accessors for bus devices - Provide Driver::unbind() callbacks - Use the infrastructure above for auxiliary, PCI and platform - Implement Device::as_bound() - Rename Device::as_ref() to Device::from_raw() (treewide) - Implement fwnode and device property abstractions - Implement example usage in the Rust platform sample driver - Devres: - Remove the inner reference count (Arc) and use pin-init instead - Replace Devres::new_foreign_owned() with devres::register() - Require T to be Send in Devres<T> - Initialize the data kept inside a Devres last - Provide an accessor for the Devres associated Device - Device ID: - Add support for ACPI device IDs and driver match tables - Split up generic device ID infrastructure - Use generic device ID infrastructure in net::phy - DMA: - Implement the dma::Device trait - Add DMA mask accessors to dma::Device - Implement dma::Device for PCI and platform devices - Use DMA masks from the DMA sample module - I/O: - Implement abstraction for resource regions (struct resource) - Implement resource-based ioremap() abstractions - Provide platform device accessors for I/O (remap) requests - Misc: - Support fallible PinInit types in Revocable - Implement Wrapper<T> for Opaque<T> - Merge pin-init blanket dependencies (for Devres) Misc: - Fix OF node leak in auxiliary_device_create() - Use util macros in device property iterators - Improve kobject sample code - Add device_link_test() for testing device link flags - Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits - Hint to prefer container_of_const() over container_of()" * tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (84 commits) rust: io: fix broken intra-doc links to `platform::Device` rust: io: fix broken intra-doc link to missing `flags` module rust: io: mem: enable IoRequest doc-tests rust: platform: add resource accessors rust: io: mem: add a generic iomem abstraction rust: io: add resource abstraction rust: samples: dma: set DMA mask rust: platform: implement the `dma::Device` trait rust: pci: implement the `dma::Device` trait rust: dma: add DMA addressing capabilities rust: dma: implement `dma::Device` trait rust: net::phy Change module_phy_driver macro to use module_device_table macro rust: net::phy represent DeviceId as transparent wrapper over mdio_device_id rust: device_id: split out index support into a separate trait device: rust: rename Device::as_ref() to Device::from_raw() arm64: cacheinfo: Provide helper to compress MPIDR value into u32 cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id cacheinfo: Set cache 'id' based on DT data container_of: Document container_of() is not to be used in new code driver core: auxiliary bus: fix OF node leak ...
5 daysjfs: fix metapage reference count leak in dbAllocCtlZheng Yu
In dbAllocCtl(), read_metapage() increases the reference count of the metapage. However, when dp->tree.budmin < 0, the function returns -EIO without calling release_metapage() to decrease the reference count, leading to a memory leak. Add release_metapage(mp) before the error return to properly manage the metapage reference count and prevent the leak. Fixes: a5f5e4698f8abbb25fe4959814093fb5bfa1aa9d ("jfs: fix shift-out-of-bounds in dbSplit") Signed-off-by: Zheng Yu <zheng.yu@northwestern.edu> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
5 daysMerge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linuxLinus Torvalds
Pull fscrypt updates from Eric Biggers: "Simplify how fscrypt uses the crypto API, resulting in some significant performance improvements: - Drop the incomplete and problematic support for asynchronous algorithms. These drivers are bug-prone, and it turns out they are actually much slower than the CPU-based code as well. - Allocate crypto requests on the stack instead of the heap. This improves encryption and decryption performance, especially for filenames. This also eliminates a point of failure during I/O" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: ceph: Remove gfp_t argument from ceph_fscrypt_encrypt_*() fscrypt: Remove gfp_t argument from fscrypt_encrypt_block_inplace() fscrypt: Remove gfp_t argument from fscrypt_crypt_data_unit() fscrypt: Switch to sync_skcipher and on-stack requests fscrypt: Drop FORBID_WEAK_KEYS flag for AES-ECB fscrypt: Don't use asynchronous CryptoAPI algorithms fscrypt: Don't use problematic non-inline crypto engines fscrypt: Drop obsolete recommendation to enable optimized SHA-512 fscrypt: Explicitly include <linux/export.h>
5 daysMerge tag 'libcrypto-conversions-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library conversions from Eric Biggers: "Convert fsverity and apparmor to use the SHA-2 library functions instead of crypto_shash. This is simpler and also slightly faster" * tag 'libcrypto-conversions-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: fsverity: Switch from crypto_shash to SHA-2 library fsverity: Explicitly include <linux/export.h> apparmor: use SHA-256 library API instead of crypto_shash API
5 daysMerge tag 'crc-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull CRC updates from Eric Biggers: - Reorganize the architecture-optimized CRC code It now lives in lib/crc/$(SRCARCH)/ rather than arch/$(SRCARCH)/lib/, and it is no longer artificially split into separate generic and arch modules. This allows better inlining and dead code elimination The generic CRC code is also no longer exported, simplifying the API. (This mirrors the similar changes to SHA-1 and SHA-2 in lib/crypto/, which can be found in the "Crypto library updates" pull request) - Improve crc32c() performance on newer x86_64 CPUs on long messages by enabling the VPCLMULQDQ optimized code - Simplify the crypto_shash wrappers for crc32_le() and crc32c() Register just one shash algorithm for each that uses the (fully optimized) library functions, instead of unnecessarily providing direct access to the generic CRC code - Remove unused and obsolete drivers for hardware CRC engines - Remove CRC-32 combination functions that are no longer used - Add kerneldoc for crc32_le(), crc32_be(), and crc32c() - Convert the crc32() macro to an inline function * tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (26 commits) lib/crc: x86/crc32c: Enable VPCLMULQDQ optimization where beneficial lib/crc: x86: Reorganize crc-pclmul static_call initialization lib/crc: crc64: Add include/linux/crc64.h to kernel-api.rst lib/crc: crc32: Change crc32() from macro to inline function and remove cast nvmem: layouts: Switch from crc32() to crc32_le() lib/crc: crc32: Document crc32_le(), crc32_be(), and crc32c() lib/crc: Explicitly include <linux/export.h> lib/crc: Remove ARCH_HAS_* kconfig symbols lib/crc: x86: Migrate optimized CRC code into lib/crc/ lib/crc: sparc: Migrate optimized CRC code into lib/crc/ lib/crc: s390: Migrate optimized CRC code into lib/crc/ lib/crc: riscv: Migrate optimized CRC code into lib/crc/ lib/crc: powerpc: Migrate optimized CRC code into lib/crc/ lib/crc: mips: Migrate optimized CRC code into lib/crc/ lib/crc: loongarch: Migrate optimized CRC code into lib/crc/ lib/crc: arm64: Migrate optimized CRC code into lib/crc/ lib/crc: arm: Migrate optimized CRC code into lib/crc/ lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/ lib/crc: Move files into lib/crc/ lib/crc32: Remove unused combination support ...
5 daysMerge tag 'hardening-v6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: - Introduce and start using TRAILING_OVERLAP() helper for fixing embedded flex array instances (Gustavo A. R. Silva) - mux: Convert mux_control_ops to a flex array member in mux_chip (Thorsten Blum) - string: Group str_has_prefix() and strstarts() (Andy Shevchenko) - Remove KCOV instrumentation from __init and __head (Ritesh Harjani, Kees Cook) - Refactor and rename stackleak feature to support Clang - Add KUnit test for seq_buf API - Fix KUnit fortify test under LTO * tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits) sched/task_stack: Add missing const qualifier to end_of_stack() kstack_erase: Support Clang stack depth tracking kstack_erase: Add -mgeneral-regs-only to silence Clang warnings init.h: Disable sanitizer coverage for __init and __head kstack_erase: Disable kstack_erase for all of arm compressed boot code x86: Handle KCOV __init vs inline mismatches arm64: Handle KCOV __init vs inline mismatches s390: Handle KCOV __init vs inline mismatches arm: Handle KCOV __init vs inline mismatches mips: Handle KCOV __init vs inline mismatch powerpc/mm/book3s64: Move kfence and debug_pagealloc related calls to __init section configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON configs/hardening: Enable CONFIG_KSTACK_ERASE stackleak: Split KSTACK_ERASE_CFLAGS from GCC_PLUGINS_CFLAGS stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth stackleak: Rename STACKLEAK to KSTACK_ERASE seq_buf: Introduce KUnit tests string: Group str_has_prefix() and strstarts() kunit/fortify: Add back "volatile" for sizeof() constants acpi: nfit: intel: avoid multiple -Wflex-array-member-not-at-end warnings ...
5 daysMerge tag 'execve-v6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve updates from Kees Cook: - Introduce regular REGSET note macros arch-wide (Dave Martin) - Remove arbitrary 4K limitation of program header size (Yin Fengwei) - Reorder function qualifiers for copy_clone_args_from_user() (Dishank Jogi) * tag 'execve-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (25 commits) fork: reorder function qualifiers for copy_clone_args_from_user binfmt_elf: remove the 4k limitation of program header size binfmt_elf: Warn on missing or suspicious regset note names xtensa: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names um: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names x86/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names sparc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names sh: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names s390/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names riscv: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names powerpc/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names parisc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names openrisc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names nios2: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names MIPS: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names m68k: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names LoongArch: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names hexagon: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names csky: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names arm64: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names ...
5 daysMerge tag 'zonefs-6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs update from Damien Le Moal: - Use ZONEFS_SUPER_SIZE instead of PAGE_SIZE to read from disk the super block (Johannes). * tag 'zonefs-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: use ZONEFS_SUPER_SIZE instead of PAGE_SIZE
5 daysMerge tag 'for-6.17/block-20250728' of git://git.kernel.dk/linuxLinus Torvalds
Pull block updates from Jens Axboe: - MD pull request via Yu: - call del_gendisk synchronously (Xiao) - cleanup unused variable (John) - cleanup workqueue flags (Ryo) - fix faulty rdev can't be removed during resync (Qixing) - NVMe pull request via Christoph: - try PCIe function level reset on init failure (Keith Busch) - log TLS handshake failures at error level (Maurizio Lombardi) - pci-epf: do not complete commands twice if nvmet_req_init() fails (Rick Wertenbroek) - misc cleanups (Alok Tiwari) - Removal of the pktcdvd driver This has been more than a decade coming at this point, and some recently revealed breakages that had it causing issues even for cases where it isn't required made me re-pull the trigger on this one. It's known broken and nobody has stepped up to maintain the code - Series for ublk supporting batch commands, enabling the use of multishot where appropriate - Speed up ublk exit handling - Fix for the two-stage elevator fixing which could leak data - Convert NVMe to use the new IOVA based API - Increase default max transfer size to something more reasonable - Series fixing write operations on zoned DM devices - Add tracepoints for zoned block device operations - Prep series working towards improving blk-mq queue management in the presence of isolated CPUs - Don't allow updating of the block size of a loop device that is currently under exclusively ownership/open - Set chunk sectors from stacked device stripe size and use it for the atomic write size limit - Switch to folios in bcache read_super() - Fix for CD-ROM MRW exit flush handling - Various tweaks, fixes, and cleanups * tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux: (94 commits) block: restore two stage elevator switch while running nr_hw_queue update cdrom: Call cdrom_mrw_exit from cdrom_release function sunvdc: Balance device refcount in vdc_port_mpgroup_check nvme-pci: try function level reset on init failure dm: split write BIOs on zone boundaries when zone append is not emulated block: use chunk_sectors when evaluating stacked atomic write limits dm-stripe: limit chunk_sectors to the stripe size md/raid10: set chunk_sectors limit md/raid0: set chunk_sectors limit block: sanitize chunk_sectors for atomic write limits ilog2: add max_pow_of_two_factor() nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails nvme-tcp: log TLS handshake failures at error level docs: nvme: fix grammar in nvme-pci-endpoint-target.rst nvme: fix typo in status code constant for self-test in progress nvmet: remove redundant assignment of error code in nvmet_ns_enable() nvme: fix incorrect variable in io cqes error message nvme: fix multiple spelling and grammar issues in host drivers block: fix blk_zone_append_update_request_bio() kernel-doc md/raid10: fix set but not used variable in sync_request_write() ...
5 daysMerge tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring updates from Jens Axboe: - Optimization to avoid reference counts on non-cloned registered buffers. This is how these buffers were handled prior to having cloning support, and we can still use that approach as long as the buffers haven't been cloned to another ring. - Cleanup and improvement for uring_cmd, where btrfs was the only user of storing allocated data for the lifetime of the uring_cmd. Clean that up so we can get rid of the need to do that. - Avoid unnecessary memory copies in uring_cmd usage. This is particularly important as a lot of uring_cmd usage necessitates the use of 128b SQEs. - A few updates for recv multishot, where it's now possible to add fairness limits for limiting how much is transferred for each retry loop. Additionally, recv multishot now supports an overall cap as well, where once reached the multishot recv will terminate. The latter is useful for buffer management and juggling many recv streams at the same time. - Add support for returning the TX timestamps via a new socket command. This feature can work in either singleshot or multishot mode, where the latter triggers a completion whenever new timestamps are available. This is an alternative to using the existing error queue. - Add support for an io_uring "mock" file, which is the start of being able to do 100% targeted testing in terms of exercising io_uring request handling. The idea is to have a file type that can be anything the tester would like, and behave exactly how you want it to behave in terms of hitting the code paths you want. - Improve zcrx by using sgtables to de-duplicate and improve dma address handling. - Prep work for supporting larger pages for zcrx. - Various little improvements and fixes. * tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux: (42 commits) io_uring/zcrx: fix leaking pages on sg init fail io_uring/zcrx: don't leak pages on account failure io_uring/zcrx: fix null ifq on area destruction io_uring: fix breakage in EXPERT menu io_uring/cmd: remove struct io_uring_cmd_data btrfs/ioctl: store btrfs_uring_encoded_data in io_btrfs_cmd io_uring/cmd: introduce IORING_URING_CMD_REISSUE flag io_uring/zcrx: account area memory io_uring: export io_[un]account_mem io_uring/net: Support multishot receive len cap io_uring: deduplicate wakeup handling io_uring/net: cast min_not_zero() type io_uring/poll: cleanup apoll freeing io_uring/net: allow multishot receive per-invocation cap io_uring/net: move io_sr_msg->retry_flags to io_sr_msg->flags io_uring/net: use passed in 'len' in io_recv_buf_select() io_uring/zcrx: prepare fallback for larger pages io_uring/zcrx: assert area type in io_zcrx_iov_page io_uring/zcrx: allocate sgtable for umem areas io_uring/zcrx: introduce io_populate_area_dma ...
5 daysMerge tag 'v6.17-rc-smb3-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server updates from Steve French: - Fix mtime/ctime reporting issue - Auth fixes, including two session setup race bugs reported by ZDI - Locking improvement in query directory - Fix for potential deadlock in creating hardlinks - Improvements to path name processing * tag 'v6.17-rc-smb3-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix corrupted mtime and ctime in smb2_open ksmbd: fix Preauh_HashValue race condition ksmbd: check return value of xa_store() in krb5_authenticate ksmbd: fix null pointer dereference error in generate_encryptionkey smb/server: add ksmbd_vfs_kern_path() smb/server: avoid deadlock when linking with ReplaceIfExists smb/server: simplify ksmbd_vfs_kern_path_locked() smb/server: use lookup_one_unlocked()
5 daysMerge tag 'hfs-v6.17-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs Pull hfs/hfsplus updates from Viacheslav Dubeyko: "Johannes Thumshirn has made nice cleanup in hfsplus_submit_bio(). Tetsuo Handa has fixed the syzbot reported issue in hfsplus_create_attributes_file() for the case of corruption the Attributes File's metadata. Yangtao Li has fixed the syzbot reported issue by removing the uneccessary WARN_ON() in hfsplus_free_extents(). Other fixes: - restore generic/001 successful execution by erasing deleted b-tree nodes - eliminate slab-out-of-bounds issue in hfs_bnode_read() and hfsplus_bnode_read() by checking correctness of offset and length when accessing b-tree node contents - eliminate slab-out-of-bounds read in hfsplus_uni2asc() if the b-tree node record has corrupted length of a name that could be bigger than HFSPLUS_MAX_STRLEN - eliminate general protection fault in hfs_find_init() for the case of initial b-tree object creation" * tag 'hfs-v6.17-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs: hfs: fix general protection fault in hfs_find_init() hfs: fix slab-out-of-bounds in hfs_bnode_read() hfsplus: fix slab-out-of-bounds in hfsplus_bnode_read() hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc() hfsplus: don't use BUG_ON() in hfsplus_create_attributes_file() hfsplus: don't set REQ_SYNC for hfsplus_submit_bio() hfsplus: remove mutex_lock check in hfsplus_free_extents hfs: make splice write available again hfsplus: make splice write available again hfs: fix not erasing deleted b-tree node issue
5 daysMerge tag 'fs_for_v6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull udf and ext2 updates from Jan Kara: "A few udf and ext2 fixes and cleanups" * tag 'fs_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Verify partition map count udf: stop using write_cache_pages ext2: Handle fiemap on empty files to prevent EINVAL
5 daysfuse: remove page alignment check for writeback lenJoanne Koong
Remove incorrect page alignment check for the writeback len arg in fuse_iomap_writeback_range(). len will always be block-aligned as passed in by iomap. On regular fuse filesystems, i_blkbits is set to PAGE_SHIFT so this is not a problem but for fuseblk filesystems, the block size is set to a default of 512 bytes or a block size passed in at mount time. Please note that non-page-aligned lengths are fine for the logic in fuse_iomap_writeback_range(). The check was originally added as a safeguard to detect conspicuously wrong ranges. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Fixes: ef7e7cbb323f ("fuse: use iomap for writeback") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/linux-fsdevel/CA+G9fYs5AdVM-T2Tf3LciNCwLZEHetcnSkHsjZajVwwpM2HmJw@mail.gmail.com/ Reported-by: Sasha Levin <sashal@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 daysMerge tag 'vfs-6.17-rc1.iomap' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs iomap updates from Christian Brauner: - Refactor the iomap writeback code and split the generic and ioend/bio based writeback code. There are two methods that define the split between the generic writeback code, and the implemementation of it, and all knowledge of ioends and bios now sits below that layer. - Add fuse iomap support for buffered writes and dirty folio writeback. This is needed so that granular uptodate and dirty tracking can be used in fuse when large folios are enabled. This has two big advantages. For writes, instead of the entire folio needing to be read into the page cache, only the relevant portions need to be. For writeback, only the dirty portions need to be written back instead of the entire folio. * tag 'vfs-6.17-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fuse: refactor writeback to use iomap_writepage_ctx inode fuse: hook into iomap for invalidating and checking partial uptodateness fuse: use iomap for folio laundering fuse: use iomap for writeback fuse: use iomap for buffered writes iomap: build the writeback code without CONFIG_BLOCK iomap: add read_folio_range() handler for buffered writes iomap: improve argument passing to iomap_read_folio_sync iomap: replace iomap_folio_ops with iomap_write_ops iomap: export iomap_writeback_folio iomap: move folio_unlock out of iomap_writeback_folio iomap: rename iomap_writepage_map to iomap_writeback_folio iomap: move all ioend handling to ioend.c iomap: add public helpers for uptodate state manipulation iomap: hide ioends from the generic writeback code iomap: refactor the writeback interface iomap: cleanup the pending writeback tracking in iomap_writepage_map_blocks iomap: pass more arguments using the iomap writeback context iomap: header diet
5 daysMerge tag 'vfs-6.17-rc1.super' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull superblock callback update from Christian Brauner: "Currently all filesystems which implement super_operations::shutdown() can not afford losing a device. Thus fs_bdev_mark_dead() will just call the ->shutdown() callback for the involved filesystem. But it will no longer be the case, as multi-device filesystems like btrfs can handle certain device loss without the need to shutdown the whole filesystem. To allow those multi-device filesystems to be integrated to use fs_holder_ops: - Add a new super_operations::remove_bdev() callback - Try ->remove_bdev() callback first inside fs_bdev_mark_dead(). If the callback returned 0, meaning the fs can handling the device loss, then exit without doing anything else. If there is no such callback or the callback returned non-zero value, continue to shutdown the filesystem as usual. This means the new remove_bdev() should only do the check on whether the operation can continue, and if so do the fs specific handlings. The shutdown handling should still be handled by the existing ->shutdown() callback. For all existing filesystems with shutdown callback, there is no change to the code nor behavior. Btrfs is going to implement both the ->remove_bdev() and ->shutdown() callbacks soon" * tag 'vfs-6.17-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: add a new remove_bdev() callback
5 daysMerge tag 'vfs-6.17-rc1.fileattr' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fileattr updates from Christian Brauner: "This introduces the new file_getattr() and file_setattr() system calls after lengthy discussions. Both system calls serve as successors and extensible companions to the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have started to show their age in addition to being named in a way that makes it easy to conflate them with extended attribute related operations. These syscalls allow userspace to set filesystem inode attributes on special files. One of the usage examples is the XFS quota projects. XFS has project quotas which could be attached to a directory. All new inodes in these directories inherit project ID set on parent directory. The project is created from userspace by opening and calling FS_IOC_FSSETXATTR on each inode. This is not possible for special files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left with empty project ID. Those inodes then are not shown in the quota accounting but still exist in the directory. This is not critical but in the case when special files are created in the directory with already existing project quota, these new inodes inherit extended attributes. This creates a mix of special files with and without attributes. Moreover, special files with attributes don't have a possibility to become clear or change the attributes. This, in turn, prevents userspace from re-creating quota project on these existing files. In addition, these new system calls allow the implementation of additional attributes that we couldn't or didn't want to fit into the legacy ioctls anymore" * tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: tighten a sanity check in file_attr_to_fileattr() tree-wide: s/struct fileattr/struct file_kattr/g fs: introduce file_getattr and file_setattr syscalls fs: prepare for extending file_get/setattr() fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP selinux: implement inode_file_[g|s]etattr hooks lsm: introduce new hooks for setting/getting inode fsxattr fs: split fileattr related helpers into separate file
5 daysMerge tag 'vfs-6.17-rc1.bpf' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs bpf updates from Christian Brauner: "These changes allow bpf to read extended attributes from cgroupfs. This is useful in redirecting AF_UNIX socket connections based on cgroup membership of the socket. One use-case is the ability to implement log namespaces in systemd so services and containers are redirected to different journals" * tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests/kernfs: test xattr retrieval selftests/bpf: Add tests for bpf_cgroup_read_xattr bpf: Mark cgroup_subsys_state->cgroup RCU safe bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node kernfs: remove iattr_mutex
5 daysMerge tag 'vfs-6.17-rc1.pidfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pidfs updates from Christian Brauner: - persistent info Persist exit and coredump information independent of whether anyone currently holds a pidfd for the struct pid. The current scheme allocated pidfs dentries on-demand repeatedly. This scheme is reaching it's limits as it makes it impossible to pin information that needs to be available after the task has exited or coredumped and that should not be lost simply because the pidfd got closed temporarily. The next opener should still see the stashed information. This is also a prerequisite for supporting extended attributes on pidfds to allow attaching meta information to them. If someone opens a pidfd for a struct pid a pidfs dentry is allocated and stashed in pid->stashed. Once the last pidfd for the struct pid is closed the pidfs dentry is released and removed from pid->stashed. So if 10 callers create a pidfs dentry for the same struct pid sequentially, i.e., each closing the pidfd before the other creates a new one then a new pidfs dentry is allocated every time. Because multiple tasks acquiring and releasing a pidfd for the same struct pid can race with each another a task may still find a valid pidfs entry from the previous task in pid->stashed and reuse it. Or it might find a dead dentry in there and fail to reuse it and so stashes a new pidfs dentry. Multiple tasks may race to stash a new pidfs dentry but only one will succeed, the other ones will put their dentry. The current scheme aims to ensure that a pidfs dentry for a struct pid can only be created if the task is still alive or if a pidfs dentry already existed before the task was reaped and so exit information has been was stashed in the pidfs inode. That's great except that it's buggy. If a pidfs dentry is stashed in pid->stashed after pidfs_exit() but before __unhash_process() is called we will return a pidfd for a reaped task without exit information being available. The pidfds_pid_valid() check does not guard against this race as it doens't sync at all with pidfs_exit(). The pid_has_task() check might be successful simply because we're before __unhash_process() but after pidfs_exit(). Introduce a new scheme where the lifetime of information associated with a pidfs entry (coredump and exit information) isn't bound to the lifetime of the pidfs inode but the struct pid itself. The first time a pidfs dentry is allocated for a struct pid a struct pidfs_attr will be allocated which will be used to store exit and coredump information. If all pidfs for the pidfs dentry are closed the dentry and inode can be cleaned up but the struct pidfs_attr will stick until the struct pid itself is freed. This will ensure minimal memory usage while persisting relevant information. The new scheme has various advantages. First, it allows to close the race where we end up handing out a pidfd for a reaped task for which no exit information is available. Second, it minimizes memory usage. Third, it allows to remove complex lifetime tracking via dentries when registering a struct pid with pidfs. There's no need to get or put a reference. Instead, the lifetime of exit and coredump information associated with a struct pid is bound to the lifetime of struct pid itself. - extended attributes Now that we have a way to persist information for pidfs dentries we can start supporting extended attributes on pidfds. This will allow userspace to attach meta information to tasks. One natural extension would be to introduce a custom pidfs.* extended attribute space and allow for the inheritance of extended attributes across fork() and exec(). The first simple scheme will allow privileged userspace to set trusted extended attributes on pidfs inodes. - Allow autonomous pidfs file handles Various filesystems such as pidfs and drm support opening file handles without having to require a file descriptor to identify the filesystem. The filesystem are global single instances and can be trivially identified solely on the information encoded in the file handle. This makes it possible to not have to keep or acquire a sentinal file descriptor just to pass it to open_by_handle_at() to identify the filesystem. That's especially useful when such sentinel file descriptor cannot or should not be acquired. For pidfs this means a file handle can function as full replacement for storing a pid in a file. Instead a file handle can be stored and reopened purely based on the file handle. Such autonomous file handles can be opened with or without specifying a a file descriptor. If no proper file descriptor is used the FD_PIDFS_ROOT sentinel must be passed. This allows us to define further special negative fd sentinels in the future. Userspace can trivially test for support by trying to open the file handle with an invalid file descriptor. - Allow pidfds for reaped tasks with SCM_PIDFD messages This is a logical continuation of the earlier work to create pidfds for reaped tasks through the SO_PEERPIDFD socket option merged in 923ea4d4482b ("Merge patch series "net, pidfs: enable handing out pidfds for reaped sk->sk_peer_pid""). - Two minor fixes: * Fold fs_struct->{lock,seq} into a seqlock * Don't bother with path_{get,put}() in unix_open_file() * tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (37 commits) don't bother with path_get()/path_put() in unix_open_file() fold fs_struct->{lock,seq} into a seqlock selftests: net: extend SCM_PIDFD test to cover stale pidfds af_unix: enable handing out pidfds for reaped tasks in SCM_PIDFD af_unix: stash pidfs dentry when needed af_unix/scm: fix whitespace errors af_unix: introduce and use scm_replace_pid() helper af_unix: introduce unix_skb_to_scm helper af_unix: rework unix_maybe_add_creds() to allow sleep selftests/pidfd: decode pidfd file handles withou having to specify an fd fhandle, pidfs: support open_by_handle_at() purely based on file handle uapi/fcntl: add FD_PIDFS_ROOT uapi/fcntl: add FD_INVALID fcntl/pidfd: redefine PIDFD_SELF_THREAD_GROUP uapi/fcntl: mark range as reserved fhandle: reflow get_path_anchor() pidfs: add pidfs_root_path() helper fhandle: rename to get_path_anchor() fhandle: hoist copy_from_user() above get_path_from_fd() fhandle: raise FILEID_IS_DIR in handle_type ...
5 daysMerge tag 'vfs-6.17-rc1.mmap_prepare' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull mmap_prepare updates from Christian Brauner: "Last cycle we introduce f_op->mmap_prepare() in c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"). This is preferred to the existing f_op->mmap() hook as it does require a VMA to be established yet, thus allowing the mmap logic to invoke this hook far, far earlier, prior to inserting a VMA into the virtual address space, or performing any other heavy handed operations. This allows for much simpler unwinding on error, and for there to be a single attempt at merging a VMA rather than having to possibly reattempt a merge based on potentially altered VMA state. Far more importantly, it prevents inappropriate manipulation of incompletely initialised VMA state, which is something that has been the cause of bugs and complexity in the past. The intent is to gradually deprecate f_op->mmap, and in that vein this series coverts the majority of file systems to using f_op->mmap_prepare. Prerequisite steps are taken - firstly ensuring all checks for mmap capabilities use the file_has_valid_mmap_hooks() helper rather than directly checking for f_op->mmap (which is now not a valid check) and secondly updating daxdev_mapping_supported() to not require a VMA parameter to allow ext4 and xfs to be converted. Commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for nested file systems") handles the nasty edge-case of nested file systems like overlayfs, which introduces a compatibility shim to allow f_op->mmap_prepare() to be invoked from an f_op->mmap() callback. This allows for nested filesystems to continue to function correctly with all file systems regardless of which callback is used. Once we finally convert all file systems, this shim can be removed. As a result, ecryptfs, fuse, and overlayfs remain unaltered so they can nest all other file systems. We additionally do not update resctl - as this requires an update to remap_pfn_range() (or an alternative to it) which we defer to a later series, equally we do not update cramfs which needs a mixed mapping insertion with the same issue, nor do we update procfs, hugetlbfs, syfs or kernfs all of which require VMAs for internal state and hooks. We shall return to all of these later" * tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: doc: update porting, vfs documentation to describe mmap_prepare() fs: replace mmap hook with .mmap_prepare for simple mappings fs: convert most other generic_file_*mmap() users to .mmap_prepare() fs: convert simple use of generic_file_*_mmap() to .mmap_prepare() mm/filemap: introduce generic_file_*_mmap_prepare() helpers fs/xfs: transition from deprecated .mmap hook to .mmap_prepare fs/ext4: transition from deprecated .mmap hook to .mmap_prepare fs/dax: make it possible to check dev dax support without a VMA fs: consistently use can_mmap_file() helper mm/nommu: use file_has_valid_mmap_hooks() helper mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
5 daysMerge tag 'vfs-6.17-rc1.fallocate' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fallocate updates from Christian Brauner: "fallocate() currently supports creating preallocated files efficiently. However, on most filesystems fallocate() will preallocate blocks in an unwriten state even if FALLOC_FL_ZERO_RANGE is specified. The extent state must later be converted to a written state when the user writes data into this range, which can trigger numerous metadata changes and journal I/O. This may leads to significant write amplification and performance degradation in synchronous write mode. At the moment, the only method to avoid this is to create an empty file and write zero data into it (for example, using 'dd' with a large block size). However, this method is slow and consumes a considerable amount of disk bandwidth. Now that more and more flash-based storage devices are available it is possible to efficiently write zeros to SSDs using the unmap write zeroes command if the devices do not write physical zeroes to the media. For example, if SCSI SSDs support the UMMAP bit or NVMe SSDs support the DEAC bit[1], the write zeroes command does not write actual data to the device, instead, NVMe converts the zeroed range to a deallocated state, which works fast and consumes almost no disk write bandwidth. This series implements the BLK_FEAT_WRITE_ZEROES_UNMAP feature and BLK_FLAG_WRITE_ZEROES_UNMAP_DISABLED flag for SCSI, NVMe and device-mapper drivers, and add the FALLOC_FL_WRITE_ZEROES and STATX_ATTR_WRITE_ZEROES_UNMAP support for ext4 and raw bdev devices. fallocate() is subsequently extended with the FALLOC_FL_WRITE_ZEROES flag. FALLOC_FL_WRITE_ZEROES zeroes a specified file range in such a way that subsequent writes to that range do not require further changes to the file mapping metadata. This flag is beneficial for subsequent pure overwriting within this range, as it can save on block allocation and, consequently, significant metadata changes" * tag 'vfs-6.17-rc1.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: ext4: add FALLOC_FL_WRITE_ZEROES support block: add FALLOC_FL_WRITE_ZEROES support block: factor out common part in blkdev_fallocate() fs: introduce FALLOC_FL_WRITE_ZEROES to fallocate dm: clear unmap write zeroes limits when disabling write zeroes scsi: sd: set max_hw_wzeroes_unmap_sectors if device supports SD_ZERO_*_UNMAP nvmet: set WZDS and DRB if device enables unmap write zeroes operation nvme: set max_hw_wzeroes_unmap_sectors if device supports DEAC bit block: introduce max_{hw|user}_wzeroes_unmap_sectors to queue limits
5 daysMerge tag 'vfs-6.17-rc1.async.dir' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull async directory updates from Christian Brauner: "This contains preparatory changes for the asynchronous directory locking scheme. While the locking scheme is still very much controversial and we're still far away from landing any actual changes in that area the preparatory work that we've been upstreaming for a while now has been very useful. This is another set of minor changes and cleanups" * tag 'vfs-6.17-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: exportfs: use lookup_one_unlocked() coda: use iterate_dir() in coda_readdir() VFS: Minor fixes for porting.rst VFS: merge lookup_one_qstr_excl_raw() back into lookup_one_qstr_excl()
5 daysMerge tag 'vfs-6.17-rc1.nsfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull namespace updates from Christian Brauner: "This contains namespace updates. This time specifically for nsfs: - Userspace heavily relies on the root inode numbers for namespaces to identify the initial namespaces. That's already a hard dependency. So we cannot change that anymore. Move the initial inode numbers to a public header and align the only two namespaces that currently don't do that with all the other namespaces. - The root inode of /proc having a fixed inode number has been part of the core kernel ABI since its inception, and recently some userspace programs (mainly container runtimes) have started to explicitly depend on this behaviour. The main reason this is useful to userspace is that by checking that a suspect /proc handle has fstype PROC_SUPER_MAGIC and is PROCFS_ROOT_INO, they can then use openat2() together with RESOLVE_{NO_{XDEV,MAGICLINK},BENEATH} to ensure that there isn't a bind-mount that replaces some procfs file with a different one. This kind of attack has lead to security issues in container runtimes in the past (such as CVE-2019-19921) and libraries like libpathrs[1] use this feature of procfs to provide safe procfs handling functions" * tag 'vfs-6.17-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: uapi: export PROCFS_ROOT_INO mntns: use stable inode number for initial mount ns netns: use stable inode number for initial mount ns nsfs: move root inode number to uapi
5 daysMerge tag 'vfs-6.17-rc1.ovl' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull overlayfs updates from Christian Brauner: "This contains overlayfs updates for this cycle. The changes for overlayfs in here are primarily focussed on preparing for some proposed changes to directory locking. Overlayfs currently will sometimes lock a directory on the upper filesystem and do a few different things while holding the lock. This is incompatible with the new potential scheme. This series narrows the region of code protected by the directory lock, taking it multiple times when necessary. This theoretically opens up the possibilty of other changes happening on the upper filesytem between the unlock and the lock. To some extent the patches guard against that by checking the dentries still have the expect parent after retaking the lock. In general, concurrent changes to the upper and lower filesystems aren't supported properly anyway" * tag 'vfs-6.17-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits) ovl: properly print correct variable ovl: rename ovl_cleanup_unlocked() to ovl_cleanup() ovl: change ovl_create_real() to receive dentry parent ovl: narrow locking in ovl_check_rename_whiteout() ovl: narrow locking in ovl_whiteout() ovl: change ovl_cleanup_and_whiteout() to take rename lock as needed ovl: narrow locking on ovl_remove_and_whiteout() ovl: change ovl_workdir_cleanup() to take dir lock as needed. ovl: narrow locking in ovl_workdir_cleanup_recurse() ovl: narrow locking in ovl_indexdir_cleanup() ovl: narrow locking in ovl_workdir_create() ovl: narrow locking in ovl_cleanup_index() ovl: narrow locking in ovl_cleanup_whiteouts() ovl: narrow locking in ovl_rename() ovl: simplify gotos in ovl_rename() ovl: narrow locking in ovl_create_over_whiteout() ovl: narrow locking in ovl_clear_empty() ovl: narrow locking in ovl_create_upper() ovl: narrow the locked region in ovl_copy_up_workdir() ovl: Call ovl_create_temp() without lock held. ...
5 daysMerge tag 'vfs-6.17-rc1.coredump' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull coredump updates from Christian Brauner: "This contains an extension to the coredump socket and a proper rework of the coredump code. - This extends the coredump socket to allow the coredump server to tell the kernel how to process individual coredumps. This allows for fine-grained coredump management. Userspace can decide to just let the kernel write out the coredump, or generate the coredump itself, or just reject it. * COREDUMP_KERNEL The kernel will write the coredump data to the socket. * COREDUMP_USERSPACE The kernel will not write coredump data but will indicate to the parent that a coredump has been generated. This is used when userspace generates its own coredumps. * COREDUMP_REJECT The kernel will skip generating a coredump for this task. * COREDUMP_WAIT The kernel will prevent the task from exiting until the coredump server has shutdown the socket connection. The flexible coredump socket can be enabled by using the "@@" prefix instead of the single "@" prefix for the regular coredump socket: @@/run/systemd/coredump.socket - Cleanup the coredump code properly while we have to touch it anyway. Split out each coredump mode in a separate helper so it's easy to grasp what is going on and make the code easier to follow. The core coredump function should now be very trivial to follow" * tag 'vfs-6.17-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits) cleanup: add a scoped version of CLASS() coredump: add coredump_skip() helper coredump: avoid pointless variable coredump: order auto cleanup variables at the top coredump: add coredump_cleanup() coredump: auto cleanup prepare_creds() cred: add auto cleanup method coredump: directly return coredump: auto cleanup argv coredump: add coredump_write() coredump: use a single helper for the socket coredump: move pipe specific file check into coredump_pipe() coredump: split pipe coredumping into coredump_pipe() coredump: move core_pipe_count to global variable coredump: prepare to simplify exit paths coredump: split file coredumping into coredump_file() coredump: rename do_coredump() to vfs_coredump() selftests/coredump: make sure invalid paths are rejected coredump: validate socket path in coredump_parse() coredump: don't allow ".." in coredump socket path ...
5 daysMerge tag 'vfs-6.17-rc1.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc VFS updates from Christian Brauner: "This contains the usual selections of misc updates for this cycle. Features: - Add ext4 IOCB_DONTCACHE support This refactors the address_space_operations write_begin() and write_end() callbacks to take const struct kiocb * as their first argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate to the filesystem's buffered I/O path. Ext4 is updated to implement handling of the IOCB_DONTCACHE flag and advertises support via the FOP_DONTCACHE file operation flag. Additionally, the i915 driver's shmem write paths are updated to bypass the legacy write_begin/write_end interface in favor of directly calling write_iter() with a constructed synchronous kiocb. Another i915 change replaces a manual write loop with kernel_write() during GEM shmem object creation. Cleanups: - don't duplicate vfs_open() in kernel_file_open() - proc_fd_getattr(): don't bother with S_ISDIR() check - fs/ecryptfs: replace snprintf with sysfs_emit in show function - vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes() - filelock: add new locks_wake_up_waiter() helper - fs: Remove three arguments from block_write_end() - VFS: change old_dir and new_dir in struct renamedata to dentrys - netfs: Remove unused declaration netfs_queue_write_request() Fixes: - eventpoll: Fix semi-unbounded recursion - eventpoll: fix sphinx documentation build warning - fs/read_write: Fix spelling typo - fs: annotate data race between poll_schedule_timeout() and pollwake() - fs/pipe: set FMODE_NOWAIT in create_pipe_files() - docs/vfs: update references to i_mutex to i_rwsem - fs/buffer: remove comment about hard sectorsize - fs/buffer: remove the min and max limit checks in __getblk_slow() - fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable - fs_context: fix parameter name in infofc() macro - fs: Prevent file descriptor table allocations exceeding INT_MAX" * tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits) netfs: Remove unused declaration netfs_queue_write_request() eventpoll: fix sphinx documentation build warning ext4: support uncached buffered I/O mm/pagemap: add write_begin_get_folio() helper function fs: change write_begin/write_end interface to take struct kiocb * drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter drm/i915: Use kernel_write() in shmem object create eventpoll: Fix semi-unbounded recursion vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes() fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable fs/buffer: remove the min and max limit checks in __getblk_slow() fs: Prevent file descriptor table allocations exceeding INT_MAX fs: Remove three arguments from block_write_end() fs/ecryptfs: replace snprintf with sysfs_emit in show function fs: annotate suspected data race between poll_schedule_timeout() and pollwake() docs/vfs: update references to i_mutex to i_rwsem fs/buffer: remove comment about hard sectorsize fs_context: fix parameter name in infofc() macro VFS: change old_dir and new_dir in struct renamedata to dentrys proc_fd_getattr(): don't bother with S_ISDIR() check ...
5 daysMerge tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs mount updates from Al Viro: - mount hash conflicts rudiments are gone now - we do not allow multiple mounts with the same parent/mountpoint to be hashed at the same time. - 'struct mount' changes: - mnt_umounting is gone - mnt_slave_list/mnt_slave is an hlist now - overmounts are kept track of by explicit pointer in mount - a bunch of flags moved out of mnt_flags to a new field, with only namespace_sem for protection - mnt_expiry is protected by mount_lock now (instead of namespace_sem) - MNT_LOCKED is used only for mounts that need to remain attached to their parents to prevent mountpoint exposure - no more overloading it for absolute roots - all mnt_list uses are transient now - it's used only to represent temporary sets during umount_tree() - mount refcounting change: children no longer pin parents for any mounts, whether they'd passed through umount_tree() or not - 'struct mountpoint' changes: - refcount is no more; what matters is ->m_list emptiness - instead of temporary bumping the refcount, we insert a new object (pinned_mountpoint) into ->m_list - new calling conventions for lock_mount() and friends - do_move_mount()/attach_recursive_mnt() seriously cleaned up - globals in fs/pnode.c are gone - propagate_mnt(), change_mnt_propagation() and propagate_umount() cleaned up (in the last case - pretty much completely rewritten). - freeing of emptied mnt_namespace is done in namespace_unlock(). For one thing, there are subtle ordering requirements there; for another it simplifies cleanups. - assorted cleanups - restore the machinery for long-term mounts from accumulated bitrot. This is going to get a followup come next cycle, when the change of vfs_fs_parse_string() calling conventions goes into -next * tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (48 commits) statmount_mnt_basic(): simplify the logics for group id invent_group_ids(): zero ->mnt_group_id always implies !IS_MNT_SHARED() get rid of CL_SHARE_TO_SLAVE take freeing of emptied mnt_namespace to namespace_unlock() copy_tree(): don't link the mounts via mnt_list change_mnt_propagation(): move ->mnt_master assignment into MS_SLAVE case mnt_slave_list/mnt_slave: turn into hlist_head/hlist_node turn do_make_slave() into transfer_propagation() do_make_slave(): choose new master sanely change_mnt_propagation(): do_make_slave() is a no-op unless IS_MNT_SHARED() change_mnt_propagation() cleanups, step 1 propagate_mnt(): fix comment and convert to kernel-doc, while we are at it propagate_mnt(): get rid of last_dest fs/pnode.c: get rid of globals propagate_one(): fold into the sole caller propagate_one(): separate the "what should be the master for this copy" part propagate_one(): separate the "do we need secondary here?" logics propagate_mnt(): handle all peer groups in the same loop propagate_one(): get rid of dest_master mount: separate the flags accessed only under namespace_sem ...
5 daysMerge tag 'pull-ceph-d_name-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull ceph dentry->d_name fixes from Al Viro: "Stuff that had fallen through the cracks back in February; ceph folks tested that pile and said they prefer to have it go through my tree..." * tag 'pull-ceph-d_name-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ceph: fix a race with rename() in ceph_mdsc_build_path() prep for ceph_encode_encrypted_fname() fixes [ceph] parse_longname(): strrchr() expects NUL-terminated string
5 daysMerge tag 'pull-rpc_pipefs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull rpc_pipefs updates from Al Viro: "Massage rpc_pipefs to use saner primitives and clean up the APIs provided to the rest of the kernel" * tag 'pull-rpc_pipefs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: rpc_create_client_dir(): return 0 or -E... rpc_create_client_dir(): don't bother with rpc_populate() rpc_new_dir(): the last argument is always NULL rpc_pipe: expand the calls of rpc_mkdir_populate() rpc_gssd_dummy_populate(): don't bother with rpc_populate() rpc_mkpipe_dentry(): switch to simple_start_creating() rpc_pipe: saner primitive for creating regular files rpc_pipe: saner primitive for creating subdirectories rpc_pipe: don't overdo directory locking rpc_mkpipe_dentry(): saner calling conventions rpc_unlink(): saner calling conventions rpc_populate(): lift cleanup into callers rpc_unlink(): use simple_recursive_removal() rpc_{rmdir_,}depopulate(): use simple_recursive_removal() instead rpc_pipe: clean failure exits in fill_super new helper: simple_start_creating()
5 daysMerge tag 'pull-simple_recursive_removal' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull simple_recursive_removal() update from Al Viro: "Removing subtrees of kernel filesystems is done in quite a few places; unfortunately, it's easy to get wrong. A number of open-coded attempts are out there, with varying amount of bogosities. simple_recursive_removal() had been introduced for doing that with all precautions needed; it does an equivalent of rm -rf, with sufficient locking, eviction of anything mounted on top of the subtree, etc. This series converts a bunch of open-coded instances to using that" * tag 'pull-simple_recursive_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: functionfs, gadgetfs: use simple_recursive_removal() kill binderfs_remove_file() fuse_ctl: use simple_recursive_removal() pstore: switch to locked_recursive_removal() binfmt_misc: switch to locked_recursive_removal() spufs: switch to locked_recursive_removal() add locked_recursive_removal() better lockdep annotations for simple_recursive_removal() simple_recursive_removal(): saner interaction with fsnotify
5 daysMerge tag 'pull-dcache' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dentry d_flags updates from Al Viro: "The current exclusion rules for dentry->d_flags stores are rather unpleasant. The basic rules are simple: - stores to dentry->d_flags are OK under dentry->d_lock - stores to dentry->d_flags are OK in the dentry constructor, before becomes potentially visible to other threads Unfortunately, there's a couple of exceptions to that, and that's where the headache comes from. The main PITA comes from d_set_d_op(); that primitive sets ->d_op of dentry and adjusts the flags that correspond to presence of individual methods. It's very easy to misuse; existing uses _are_ safe, but proof of correctness is brittle. Use in __d_alloc() is safe (we are within a constructor), but we might as well precalculate the initial value of 'd_flags' when we set the default ->d_op for given superblock and set 'd_flags' directly instead of messing with that helper. The reasons why other uses are safe are bloody convoluted; I'm not going to reproduce it here. See [1] for gory details, if you care. The critical part is using d_set_d_op() only just prior to d_splice_alias(), which makes a combination of d_splice_alias() with setting ->d_op, etc a natural replacement primitive. Better yet, if we go that way, it's easy to take setting ->d_op and modifying 'd_flags' under ->d_lock, which eliminates the headache as far as 'd_flags' exclusion rules are concerned. Other exceptions are minor and easy to deal with. What this series does: - d_set_d_op() is no longer available; instead a new primitive (d_splice_alias_ops()) is provided, equivalent to combination of d_set_d_op() and d_splice_alias(). - new field of struct super_block - 's_d_flags'. This sets the default value of 'd_flags' to be used when allocating dentries on this filesystem. - new primitive for setting 's_d_op': set_default_d_op(). This replaces stores to 's_d_op' at mount time. All in-tree filesystems converted; out-of-tree ones will get caught by the compiler ('s_d_op' is renamed, so stores to it will be caught). 's_d_flags' is set by the same primitive to match the 's_d_op'. - a lot of filesystems had sb->s_d_op->d_delete equal to always_delete_dentry; that is equivalent to setting DCACHE_DONTCACHE in 'd_flags', so such filesystems can bloody well set that bit in 's_d_flags' and drop 'd_delete()' from dentry_operations. In quite a few cases that results in empty dentry_operations, which means that we can get rid of those. - kill simple_dentry_operations - not needed anymore - massage d_alloc_parallel() to get rid of the other exception wrt 'd_flags' stores - we can set DCACHE_PAR_LOOKUP as soon as we allocate the new dentry; no need to delay that until we commit to using the sucker. As the result, 'd_flags' stores are all either under ->d_lock or done before the dentry becomes visible in any shared data structures" Link: https://lore.kernel.org/all/20250224010624.GT1977892@ZenIV/ [1] * tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (21 commits) configfs: use DCACHE_DONTCACHE debugfs: use DCACHE_DONTCACHE efivarfs: use DCACHE_DONTCACHE instead of always_delete_dentry() 9p: don't bother with always_delete_dentry ramfs, hugetlbfs, mqueue: set DCACHE_DONTCACHE kill simple_dentry_operations devpts, sunrpc, hostfs: don't bother with ->d_op shmem: no dentry retention past the refcount reaching zero d_alloc_parallel(): set DCACHE_PAR_LOOKUP earlier make d_set_d_op() static simple_lookup(): just set DCACHE_DONTCACHE tracefs: Add d_delete to remove negative dentries set_default_d_op(): calculate the matching value for ->d_flags correct the set of flags forbidden at d_set_d_op() time split d_flags calculation out of d_set_d_op() new helper: set_default_d_op() fuse: no need for special dentry_operations for root dentry switch procfs from d_set_d_op() to d_splice_alias_ops() new helper: d_splice_alias_ops() procfs: kill ->proc_dops ...
5 daysfsnotify: optimize FMODE_NONOTIFY_PERM for the common casesAmir Goldstein
The most unlikely watched permission event is FAN_ACCESS_PERM, because at the time that it was introduced there were no evictable ignore mark, so subscribing to FAN_ACCESS_PERM would have incured a very high overhead. Yet, when we set the fmode to FMODE_NOTIFY_HSM(), we never skip trying to send FAN_ACCESS_PERM, which is almost always a waste of cycles. We got to this logic because of bundling FAN_OPEN*_PERM and FAN_ACCESS_PERM in the same category and because FAN_OPEN_PERM is a commonly used event. By open coding fsnotify_open_perm() in fsnotify_open_perm_and_set_mode(), we no longer need to regard FAN_OPEN*_PERM when calculating fmode. This leaves the case of having pre-content events and not having any other permission event in the object masks a more likely case than the other way around. Rework the fmode macros and code so that their meaning now refers only to hooks on an already open file: - FMODE_NOTIFY_NONE() skip all events - FMODE_NOTIFY_ACCESS_PERM() send all permission events including FAN_ACCESS_PERM - FMODE_NOTIFY_HSM() send pre-content permission events Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250708143641.418603-3-amir73il@gmail.com
5 daysfsnotify: merge file_set_fsnotify_mode_from_watchers() with open perm hookAmir Goldstein
Create helper fsnotify_open_perm_and_set_mode() that moves the fsnotify_open_perm() hook into file_set_fsnotify_mode_from_watchers(). This will allow some more optimizations. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250708143641.418603-2-amir73il@gmail.com
5 daysMerge tag 'nfsd-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "NFSD is finally able to offer write delegations to clients that open files with O_WRONLY, thanks to patches from Dai Ngo. We're expecting this to accelerate a few interesting corner cases. The cap on the number of operations per NFSv4 COMPOUND has been lifted. Now, clients that send COMPOUNDs containing dozens of operations (for example, a long stream of LOOKUP operations to walk a pathname in a single round trip) will no longer be rejected. This release re-enables the ability for NFSD to perform NFSv4.2 COPY operations asynchronously. This feature has been disabled to mitigate the risk of denial-of-service when too many such requests arrive. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.17 development cycle" * tag 'nfsd-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (32 commits) nfsd: Drop dprintk in blocklayout xdr functions sunrpc: make svc_tcp_sendmsg() take a signed sentp pointer sunrpc: rearrange struct svc_rqst for fewer cachelines sunrpc: return better error in svcauth_gss_accept() on alloc failure sunrpc: reset rq_accept_statp when starting a new RPC sunrpc: remove SVC_SYSERR sunrpc: fix handling of unknown auth status codes NFSD: Simplify struct knfsd_fh NFSD: Access a knfsd_fh's fsid by pointer Revert "NFSD: Force all NFSv4.2 COPY requests to be synchronous" NFSD: Avoid multiple -Wflex-array-member-not-at-end warnings NFSD: Use vfs_iocb_iter_write() NFSD: Use vfs_iocb_iter_read() NFSD: Clean up kdoc for nfsd_open_local_fh() NFSD: Clean up kdoc for nfsd_file_put_local() NFSD: Remove definition for trace_nfsd_ctl_maxconn NFSD: Remove definition for trace_nfsd_file_gc_recent NFSD: Remove definitions for unused trace_nfsd_file_lru trace points NFSD: Remove definition for trace_nfsd_file_unhash_and_queue nfsd: Use correct error code when decoding extents ...
5 daysMerge tag 'gfs2-for-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Prevent cluster nodes from trying to recover their own filesystems during a withdraw - Add two missing migrate_folio aops and an additional exhash directory consistency check (both triggered by syzbot bug reports) - Sanitize how dlm results are processed and clean up a few quirks in the glock code - Minor stuff: Get rid of the GIF_ALLOC_FAILED flag; use SECTOR_SIZE and SECTOR_SHIFT * tag 'gfs2-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: No more self recovery gfs2: Validate i_depth for exhash directories gfs2: Set .migrate_folio in gfs2_{rgrp,meta}_aops gfs2: a minor finish_xmote cleanup gfs2: simplify finish_xmote gfs2: sanitize the gdlm_ast -> finish_xmote interface gfs2: Minor do_xmote cancelation fix gfs2: Remove GIF_ALLOC_FAILED flag gfs2: Use SECTOR_SIZE and SECTOR_SHIFT
5 daysMerge tag 'xfs-merge-6.17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Carlos Maiolino: "This doesn't contain any new features. It mostly is a collection of clean ups and code refactoring that I preferred to postpone to the merge window. It includes removal of several unused tracepoints, refactoring key comparing routines under the B-Trees management and cleanup of xfs journaling code" * tag 'xfs-merge-6.17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (44 commits) xfs: don't use a xfs_log_iovec for ri_buf in log recovery xfs: don't use a xfs_log_iovec for attr_item names and values xfs: use better names for size members in xfs_log_vec xfs: cleanup the ordered item logic in xlog_cil_insert_format_items xfs: don't pass the old lv to xfs_cil_prepare_item xfs: remove unused trace event xfs_reflink_cow_enospc xfs: remove unused trace event xfs_discard_rtrelax xfs: remove unused trace event xfs_log_cil_return xfs: remove unused trace event xfs_dqreclaim_dirty fs/xfs: replace strncpy with memtostr_pad() xfs: Remove unused label in xfs_dax_notify_dev_failure xfs: improve the comments in xfs_select_zone_nowait xfs: improve the comments in xfs_max_open_zones xfs: stop passing an inode to the zone space reservation helpers xfs: rename oz_write_pointer to oz_allocated xfs: use a uint32_t to cache i_used_blocks in xfs_init_zone xfs: improve the xg_active_ref check in xfs_group_free xfs: remove the xlog_ticket_t typedef xfs: remove xrep_trans_{alloc,cancel}_hook_dummy xfs: return the allocated transaction from xchk_trans_alloc_empty ...
5 daysMerge tag 'erofs-for-6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "We now support metadata compression. It can be useful for embedded use cases or archiving a large number of small files. Additionally, readdir performance has been improved by enabling readahead (note that it was already common practice for ext3/4 non-dx and f2fs directories). We may consider further improvements later to align with ext4's s_inode_readahead_blks behavior for slow devices too. The remaining commits are minor. Summary: - Add support for metadata compression - Enable readahead for directories to improve readdir performance - Minor fixes and cleanups" * tag 'erofs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: support to readahead dirent blocks in erofs_readdir() erofs: implement metadata compression erofs: add on-disk definition for metadata compression erofs: fix build error with CONFIG_EROFS_FS_ZIP_ACCEL=y erofs: remove ENOATTR definition erofs: refine erofs_iomap_begin() erofs: unify meta buffers in z_erofs_fill_inode() erofs: remove need_kmap in erofs_read_metabuf() erofs: do sanity check on m->type in z_erofs_load_compact_lcluster() erofs: get rid of {get,put}_page() for ztailpacking data
5 daysMerge tag 'ntfs3_for_6.17' of ↵Linus Torvalds
https://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs3 updates from Konstantin Komarov: "Added: - sanity check for file name - mark live inode as bad and avoid any operations Fixed: - handling of symlinks created in windows - creation of symlinks for relative path Changed: - cancel setting inode as bad after removing name fails - revert 'replace inode_trylock with inode_lock'" * tag 'ntfs3_for_6.17' of https://github.com/Paragon-Software-Group/linux-ntfs3: Revert "fs/ntfs3: Replace inode_trylock with inode_lock" fs/ntfs3: Exclude call make_bad_inode for live nodes. fs/ntfs3: cancle set bad inode after removing name fails fs/ntfs3: Add sanity check for file name fs/ntfs3: correctly create symlink for relative path fs/ntfs3: fix symlinks cannot be handled correctly
5 daysMerge tag 'for-6.17-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "A number of usability and feature updates, scattered performance improvements and fixes. Highlight of the core changes is getting closer to enabling large folios (now behind a config option). User visible changes: - update defrag ioctl, add new flag to request no compression on existing extents - restrict writes to block devices after mount - in experimental config, enable large folios for data, almost complete but not widely tested - add stats tracking duration of critical section in transaction commit to /sys/fs/btrfs/FSID/commit_stats Performance improvements: - caching of lookup results of free space bitmap (20% runtime improvement on an empty file creation benchmark) - accessors to metadata (b-tree items) simplified and optimized, minor improvement in metadata-heavy workloads - readahead on compressed data improves sequential read - the xarray for extent buffers is indexed by denser keys, leading to better packing of the nodes (50-70% reduction of leaf nodes) Notable fixes: - stricter compression mount option parsing - send properly emits fallocate command for file holes when protocol v2 is used - fix overallocation of chunks with mount option 'ssd_spread', due to interaction with size classes not finding the right chunk (workaround: manual reclaim by 'usage' balance filter) - various quota enable/disable races with rescan, more verbose notifications about inconsistent state - populate otime in tree-log during log replay - handle ENOSPC when NOCOW file is used with mmap() Core: - large data folios enabled in experimental config - improved error handling, transaction abort call sites - in zoned mode, allocate reloc block group on mount to make sure there's always one available for zone reclaim under heavy load - rework device opening, they're always open as read-only and delayed until the super block is created, allowing the restricted writes after mount - preparatory work for adding blk_holder_ops, allowing device freeze/thaw in the future Cleanups, refactoring: - type and naming unifications (int/bool, return variables) - rb-tree helper refactoring and simplifications - reorder memory allocations to less critical places - RCU string (used for device name) refactoring and API removal - replace all remaining use of strcpy()" * tag 'for-6.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (209 commits) btrfs: send: use fallocate for hole punching with send stream v2 btrfs: unfold transaction aborts when writing dirty block groups btrfs: use saner variable type and name to indicate extrefs at add_inode_ref() btrfs: don't skip remaining extrefs if dir not found during log replay btrfs: don't ignore inode missing when replaying log tree btrfs: enable large data folios for data reloc inode btrfs: output more info when btrfs_subpage_assert() failed btrfs: reloc: unconditionally invalidate the page cache for each cluster btrfs: defrag: add flag to force no-compression btrfs: fix ssd_spread overallocation btrfs: zoned: requeue to unused block group list if zone finish failed btrfs: zoned: do not remove unwritten non-data block group btrfs: remove btrfs_clear_extent_bits() btrfs: use cached state when falling back from NOCoW write to CoW write btrfs: set EXTENT_NORESERVE before range unlock in btrfs_truncate_block() btrfs: don't print relocation messages from auto reclaim btrfs: remove redundant auto reclaim log message btrfs: make btrfs_check_nocow_lock() check more than one extent btrfs: assert we can NOCOW the range in btrfs_truncate_block() btrfs: update function comment for btrfs_check_nocow_lock() ...
6 dayssmb: client: get rid of kstrdup() when parsing iocharset mount optionPaulo Alcantara
Steal string reference from @param->string rather than duplicating it. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>