summaryrefslogtreecommitdiff
path: root/fs/nfsd/nfsctl.c
AgeCommit message (Collapse)Author
3 daysMerge tag 'net-next-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Wrap datapath globals into net_aligned_data, to avoid false sharing - Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container) - Add SO_INQ and SCM_INQ support to AF_UNIX - Add SIOCINQ support to AF_VSOCK - Add TCP_MAXSEG sockopt to MPTCP - Add IPv6 force_forwarding sysctl to enable forwarding per interface - Make TCP validation of whether packet fully fits in the receive window and the rcv_buf more strict. With increased use of HW aggregation a single "packet" can be multiple 100s of kB - Add MSG_MORE flag to optimize large TCP transmissions via sockmap, improves latency up to 33% for sockmap users - Convert TCP send queue handling from tasklet to BH workque - Improve BPF iteration over TCP sockets to see each socket exactly once - Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code - Support enabling kernel threads for NAPI processing on per-NAPI instance basis rather than a whole device. Fully stop the kernel NAPI thread when threaded NAPI gets disabled. Previously thread would stick around until ifdown due to tricky synchronization - Allow multicast routing to take effect on locally-generated packets - Add output interface argument for End.X in segment routing - MCTP: add support for gateway routing, improve bind() handling - Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink - Add a new neighbor flag ("extern_valid"), which cedes refresh responsibilities to userspace. This is needed for EVPN multi-homing where a neighbor entry for a multi-homed host needs to be synced across all the VTEPs among which the host is multi-homed - Support NUD_PERMANENT for proxy neighbor entries - Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM - Add sequence numbers to netconsole messages. Unregister netconsole's console when all net targets are removed. Code refactoring. Add a number of selftests - Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol should be used for an inbound SA lookup - Support inspecting ref_tracker state via DebugFS - Don't force bonding advertisement frames tx to ~333 ms boundaries. Add broadcast_neighbor option to send ARP/ND on all bonded links - Allow providing upcall pid for the 'execute' command in openvswitch - Remove DCCP support from Netfilter's conntrack - Disallow multiple packet duplications in the queuing layer - Prevent use of deprecated iptables code on PREEMPT_RT Driver API: - Support RSS and hashing configuration over ethtool Netlink - Add dedicated ethtool callbacks for getting and setting hashing fields - Add support for power budget evaluation strategy in PSE / Power-over-Ethernet. Generate Netlink events for overcurrent etc - Support DPLL phase offset monitoring across all device inputs. Support providing clock reference and SYNC over separate DPLL inputs - Support traffic classes in devlink rate API for bandwidth management - Remove rtnl_lock dependency from UDP tunnel port configuration Device drivers: - Add a new Broadcom driver for 800G Ethernet (bnge) - Add a standalone driver for Microchip ZL3073x DPLL - Remove IBM's NETIUCV device driver - Ethernet high-speed NICs: - Broadcom (bnxt): - support zero-copy Tx of DMABUF memory - take page size into account for page pool recycling rings - Intel (100G, ice, idpf): - idpf: XDP and AF_XDP support preparations - idpf: add flow steering - add link_down_events statistic - clean up the TSPLL code - preparations for live VM migration - nVidia/Mellanox: - support zero-copy Rx/Tx interfaces (DMABUF and io_uring) - optimize context memory usage for matchers - expose serial numbers in devlink info - support PCIe congestion metrics - Meta (fbnic): - add 25G, 50G, and 100G link modes to phylink - support dumping FW logs - Marvell/Cavium: - support for CN20K generation of the Octeon chips - Amazon: - add HW clock (without timestamping, just hypervisor time access) - Ethernet virtual: - VirtIO net: - support segmentation of UDP-tunnel-encapsulated packets - Google (gve): - support packet timestamping and clock synchronization - Microsoft vNIC: - add handler for device-originated servicing events - allow dynamic MSI-X vector allocation - support Tx bandwidth clamping - Ethernet NICs consumer, and embedded: - AMD: - amd-xgbe: hardware timestamping and PTP clock support - Broadcom integrated MACs (bcmgenet, bcmasp): - use napi_complete_done() return value to support NAPI polling - add support for re-starting auto-negotiation - Broadcom switches (b53): - support BCM5325 switches - add bcm63xx EPHY power control - Synopsys (stmmac): - lots of code refactoring and cleanups - TI: - icssg-prueth: read firmware-names from device tree - icssg: PRP offload support - Microchip: - lan78xx: convert to PHYLINK for improved PHY and MAC management - ksz: add KSZ8463 switch support - Intel: - support similar queue priority scheme in multi-queue and time-sensitive networking (taprio) - support packet pre-emption in both - RealTek (r8169): - enable EEE at 5Gbps on RTL8126 - Airoha: - add PPPoE offload support - MDIO bus controller for Airoha AN7583 - Ethernet PHYs: - support for the IPQ5018 internal GE PHY - micrel KSZ9477 switch-integrated PHYs: - add MDI/MDI-X control support - add RX error counters - add cable test support - add Signal Quality Indicator (SQI) reporting - dp83tg720: improve reset handling and reduce link recovery time - support bcm54811 (and its MII-Lite interface type) - air_en8811h: support resume/suspend - support PHY counters for QCA807x and QCA808x - support WoL for QCA807x - CAN drivers: - rcar_canfd: support for Transceiver Delay Compensation - kvaser: report FW versions via devlink dev info - WiFi: - extended regulatory info support (6 GHz) - add statistics and beacon monitor for Multi-Link Operation (MLO) - support S1G aggregation, improve S1G support - add Radio Measurement action fields - support per-radio RTS threshold - some work around how FIPS affects wifi, which was wrong (RC4 is used by TKIP, not only WEP) - improvements for unsolicited probe response handling - WiFi drivers: - RealTek (rtw88): - IBSS mode for SDIO devices - RealTek (rtw89): - BT coexistence for MLO/WiFi7 - concurrent station + P2P support - support for USB devices RTL8851BU/RTL8852BU - Intel (iwlwifi): - use embedded PNVM in (to be released) FW images to fix compatibility issues - many cleanups (unused FW APIs, PCIe code, WoWLAN) - some FIPS interoperability - MediaTek (mt76): - firmware recovery improvements - more MLO work - Qualcomm/Atheros (ath12k): - fix scan on multi-radio devices - more EHT/Wi-Fi 7 features - encapsulation/decapsulation offload - Broadcom (brcm80211): - support SDIO 43751 device - Bluetooth: - hci_event: add support for handling LE BIG Sync Lost event - ISO: add socket option to report packet seqnum via CMSG - ISO: support SCM_TIMESTAMPING for ISO TS - Bluetooth drivers: - intel_pcie: support Function Level Reset - nxpuart: add support for 4M baudrate - nxpuart: implement powerup sequence, reset, FW dump, and FW loading" * tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits) dpll: zl3073x: Fix build failure selftests: bpf: fix legacy netfilter options ipv6: annotate data-races around rt->fib6_nsiblings ipv6: fix possible infinite loop in fib6_info_uses_dev() ipv6: prevent infinite loop in rt6_nlmsg_size() ipv6: add a retry logic in net6_rt_notify() vrf: Drop existing dst reference in vrf_ip6_input_dst net/sched: taprio: align entry index attr validation with mqprio net: fsl_pq_mdio: use dev_err_probe selftests: rtnetlink.sh: remove esp4_offload after test vsock: remove unnecessary null check in vsock_getname() igb: xsk: solve negative overflow of nb_pkts in zerocopy mode stmmac: xsk: fix negative overflow of budget in zerocopy mode dt-bindings: ieee802154: Convert at86rf230.txt yaml format net: dsa: microchip: Disable PTP function of KSZ8463 net: dsa: microchip: Setup fiber ports for KSZ8463 net: dsa: microchip: Write switch MAC address differently for KSZ8463 net: dsa: microchip: Use different registers for KSZ8463 net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver dt-bindings: net: dsa: microchip: Add KSZ8463 switch support ...
2025-07-14NFSD: Make nfsd_genl_rqstp::rq_ops array best-effortChuck Lever
To enable NFSD to handle NFSv4 COMPOUNDs of unrestricted size, resize the array in struct nfsd_genl_rqstp so it saves only up to 16 operations per COMPOUND. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14NFSD: Rename a function parameterChuck Lever
Clean up: A function parameter called "rqstp" typically refers to an object of type "struct svc_rqst", so it's confusing when such an parameter refers to a different struct type with field names that are very similar to svc_rqst. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-02netlink: introduce type-checking attribute iteration for nlmsgCarolina Jubran
Add the nlmsg_for_each_attr_type() macro to simplify iteration over attributes of a specific type in a Netlink message. Convert existing users in vxlan and nfsd to use the new macro. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250629142138.361537-2-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-19nfsd: use threads array as-is in netlink interfaceJeff Layton
The old nfsdfs interface for starting a server with multiple pools handles the special case of a single entry array passed down from userland by distributing the threads over every NUMA node. The netlink control interface however constructs an array of length nfsd_nrpools() and fills any unprovided slots with 0's. This behavior defeats the special casing that the old interface relies on. Change nfsd_nl_threads_set_doit() to pass down the array from userland as-is. Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink") Cc: stable@vger.kernel.org Reported-by: Mike Snitzer <snitzer@kernel.org> Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/ Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: Add /sys/kernel/debug/nfsdChuck Lever
Create a small sandbox under /sys/kernel/debug for experimental NFS server feature settings. There is no API/ABI compatibility guarantee for these settings. The only documentation for such settings, if any documentation exists, is in the kernel source code. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: fix race between nfsd registration and exports_procManinder Singh
As of now nfsd calls create_proc_exports_entry() at start of init_nfsd and cleanup by remove_proc_entry() at last of exit_nfsd. Which causes kernel OOPs if there is race between below 2 operations: (i) exportfs -r (ii) mount -t nfsd none /proc/fs/nfsd for 5.4 kernel ARM64: CPU 1: el1_irq+0xbc/0x180 arch_counter_get_cntvct+0x14/0x18 running_clock+0xc/0x18 preempt_count_add+0x88/0x110 prep_new_page+0xb0/0x220 get_page_from_freelist+0x2d8/0x1778 __alloc_pages_nodemask+0x15c/0xef0 __vmalloc_node_range+0x28c/0x478 __vmalloc_node_flags_caller+0x8c/0xb0 kvmalloc_node+0x88/0xe0 nfsd_init_net+0x6c/0x108 [nfsd] ops_init+0x44/0x170 register_pernet_operations+0x114/0x270 register_pernet_subsys+0x34/0x50 init_nfsd+0xa8/0x718 [nfsd] do_one_initcall+0x54/0x2e0 CPU 2 : Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 PC is at : exports_net_open+0x50/0x68 [nfsd] Call trace: exports_net_open+0x50/0x68 [nfsd] exports_proc_open+0x2c/0x38 [nfsd] proc_reg_open+0xb8/0x198 do_dentry_open+0x1c4/0x418 vfs_open+0x38/0x48 path_openat+0x28c/0xf18 do_filp_open+0x70/0xe8 do_sys_open+0x154/0x248 Sometimes it crashes at exports_net_open() and sometimes cache_seq_next_rcu(). and same is happening on latest 6.14 kernel as well: [ 0.000000] Linux version 6.14.0-rc5-next-20250304-dirty ... [ 285.455918] Unable to handle kernel paging request at virtual address 00001f4800001f48 ... [ 285.464902] pc : cache_seq_next_rcu+0x78/0xa4 ... [ 285.469695] Call trace: [ 285.470083] cache_seq_next_rcu+0x78/0xa4 (P) [ 285.470488] seq_read+0xe0/0x11c [ 285.470675] proc_reg_read+0x9c/0xf0 [ 285.470874] vfs_read+0xc4/0x2fc [ 285.471057] ksys_read+0x6c/0xf4 [ 285.471231] __arm64_sys_read+0x1c/0x28 [ 285.471428] invoke_syscall+0x44/0x100 [ 285.471633] el0_svc_common.constprop.0+0x40/0xe0 [ 285.471870] do_el0_svc_compat+0x1c/0x34 [ 285.472073] el0_svc_compat+0x2c/0x80 [ 285.472265] el0t_32_sync_handler+0x90/0x140 [ 285.472473] el0t_32_sync+0x19c/0x1a0 [ 285.472887] Code: f9400885 93407c23 937d7c27 11000421 (f86378a3) [ 285.473422] ---[ end trace 0000000000000000 ]--- It reproduced simply with below script: while [ 1 ] do /exportfs -r done & while [ 1 ] do insmod /nfsd.ko mount -t nfsd none /proc/fs/nfsd umount /proc/fs/nfsd rmmod nfsd done & So exporting interfaces to user space shall be done at last and cleanup at first place. With change there is no Kernel OOPs. Co-developed-by: Shubham Rana <s9.rana@samsung.com> Signed-off-by: Shubham Rana <s9.rana@samsung.com> Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: unregister filesystem in case genl_register_family() failsManinder Singh
With rpc_status netlink support, unregister of register_filesystem() was missed in case of genl_register_family() fails. Correcting it by making new label. Fixes: bd9d6a3efa97 ("NFSD: add rpc_status netlink support") Cc: stable@vger.kernel.org Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10nfsd: don't ignore the return code of svc_proc_register()Jeff Layton
Currently, nfsd_proc_stat_init() ignores the return value of svc_proc_register(). If the procfile creation fails, then the kernel will WARN when it tries to remove the entry later. Fix nfsd_proc_stat_init() to return the same type of pointer as svc_proc_register(), and fix up nfsd_net_init() to check that and fail the nfsd_net construction if it occurs. svc_proc_register() can fail if the dentry can't be allocated, or if an identical dentry already exists. The second case is pretty unlikely in the nfsd_net construction codepath, so if this happens, return -ENOMEM. Reported-by: syzbot+e34ad04f27991521104c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-nfs/67a47501.050a0220.19061f.05f9.GAE@google.com/ Cc: stable@vger.kernel.org # v6.9 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10nfsd: fix management of listener transportsOlga Kornievskaia
Currently, when no active threads are running, a root user using nfsdctl command can try to remove a particular listener from the list of previously added ones, then start the server by increasing the number of threads, it leads to the following problem: [ 158.835354] refcount_t: addition on 0; use-after-free. [ 158.835603] WARNING: CPU: 2 PID: 9145 at lib/refcount.c:25 refcount_warn_saturate+0x160/0x1a0 [ 158.836017] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace overlay isofs uinput snd_seq_dummy snd_hrtimer nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables qrtr sunrpc vfat fat uvcvideo videobuf2_vmalloc videobuf2_memops uvc videobuf2_v4l2 videodev videobuf2_common snd_hda_codec_generic mc e1000e snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore sg loop dm_multipath dm_mod nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock xfs libcrc32c crct10dif_ce ghash_ce vmwgfx sha2_ce sha256_arm64 sr_mod sha1_ce cdrom nvme drm_client_lib drm_ttm_helper ttm nvme_core drm_kms_helper nvme_auth drm fuse [ 158.840093] CPU: 2 UID: 0 PID: 9145 Comm: nfsd Kdump: loaded Tainted: G B W 6.13.0-rc6+ #7 [ 158.840624] Tainted: [B]=BAD_PAGE, [W]=WARN [ 158.840802] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24006586.BA64.2406042154 06/04/2024 [ 158.841220] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 158.841563] pc : refcount_warn_saturate+0x160/0x1a0 [ 158.841780] lr : refcount_warn_saturate+0x160/0x1a0 [ 158.842000] sp : ffff800089be7d80 [ 158.842147] x29: ffff800089be7d80 x28: ffff00008e68c148 x27: ffff00008e68c148 [ 158.842492] x26: ffff0002e3b5c000 x25: ffff600011cd1829 x24: ffff00008653c010 [ 158.842832] x23: ffff00008653c000 x22: 1fffe00011cd1829 x21: ffff00008653c028 [ 158.843175] x20: 0000000000000002 x19: ffff00008653c010 x18: 0000000000000000 [ 158.843505] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 158.843836] x14: 0000000000000000 x13: 0000000000000001 x12: ffff600050a26493 [ 158.844143] x11: 1fffe00050a26492 x10: ffff600050a26492 x9 : dfff800000000000 [ 158.844475] x8 : 00009fffaf5d9b6e x7 : ffff000285132493 x6 : 0000000000000001 [ 158.844823] x5 : ffff000285132490 x4 : ffff600050a26493 x3 : ffff8000805e72bc [ 158.845174] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000098588000 [ 158.845528] Call trace: [ 158.845658] refcount_warn_saturate+0x160/0x1a0 (P) [ 158.845894] svc_recv+0x58c/0x680 [sunrpc] [ 158.846183] nfsd+0x1fc/0x348 [nfsd] [ 158.846390] kthread+0x274/0x2f8 [ 158.846546] ret_from_fork+0x10/0x20 [ 158.846714] ---[ end trace 0000000000000000 ]--- nfsd_nl_listener_set_doit() would manipulate the list of transports of server's sv_permsocks and close the specified listener but the other list of transports (server's sp_xprts list) would not be changed leading to the problem above. Instead, determined if the nfsdctl is trying to remove a listener, in which case, delete all the existing listener transports and re-create all-but-the-removed ones. Fixes: 16a471177496 ("NFSD: add listener-{set,get} netlink command") Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-28Merge tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New Features: - Enable using direct IO with localio - Added localio related tracepoints Bugfixes: - Sunrpc fixes for working with a very large cl_tasks list - Fix a possible buffer overflow in nfs_sysfs_link_rpc_client() - Fixes for handling reconnections with localio - Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT - Fix COPY_NOTIFY xdr_buf size calculations - pNFS/Flexfiles fix for retrying requesting a layout segment for reads - Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is expired Cleanups: - Various other nfs & nfsd localio cleanups - Prepratory patches for async copy improvements that are under development - Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other xprts - Add netns inum and srcaddr to debugfs rpc_xprt info" * tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (28 commits) SUNRPC: do not retry on EKEYEXPIRED when user TGT ticket expired sunrpc: add netns inum and srcaddr to debugfs rpc_xprt info pnfs/flexfiles: retry getting layout segment for reads NFSv4.2: make LAYOUTSTATS and LAYOUTERROR MOVEABLE NFSv4.2: mark OFFLOAD_CANCEL MOVEABLE NFSv4.2: fix COPY_NOTIFY xdr buf size calculation NFS: Rename struct nfs4_offloadcancel_data NFS: Fix typo in OFFLOAD_CANCEL comment NFS: CB_OFFLOAD can return NFS4ERR_DELAY nfs: Make NFS_FSCACHE select NETFS_SUPPORT instead of depending on it nfs: fix incorrect error handling in LOCALIO nfs: probe for LOCALIO when v3 client reconnects to server nfs: probe for LOCALIO when v4 client reconnects to server nfs/localio: remove redundant code and simplify LOCALIO enablement nfs_common: add nfs_localio trace events nfs_common: track all open nfsd_files per LOCALIO nfs_client nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_ nfsd: update percpu_ref to manage references on nfsd_net ...
2025-01-14nfs_common: track all open nfsd_files per LOCALIO nfs_clientMike Snitzer
This tracking enables __nfsd_file_cache_purge() to call nfs_localio_invalidate_clients(), upon shutdown or export change, to nfs_close_local_fh() all open nfsd_files that are still cached by the LOCALIO nfs clients associated with nfsd_net that is being shutdown. Now that the client must track all open nfsd_files there was more work than necessary being done with the global nfs_uuids_lock contended. This manifested in various RCU issues, e.g.: hrtimer: interrupt took 47969440 ns rcu: INFO: rcu_sched detected stalls on CPUs/tasks: Use nfs_uuid->lock to protect all nfs_uuid_t members, instead of nfs_uuids_lock, once nfs_uuid_is_local() adds the client to nn->local_clients. Also add 'local_clients_lock' to 'struct nfsd_net' to protect nn->local_clients. And store a pointer to spinlock in the 'list_lock' member of nfs_uuid_t so nfs_localio_disable_client() can use it to avoid taking the global nfs_uuids_lock. In combination, these split out locks eliminate the use of the single nfslocalio.c global nfs_uuids_lock in the IO paths (open and close). Also refactored associated fs/nfs_common/nfslocalio.c methods' locking to reduce work performed with spinlocks held in general. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-14nfs_common: rename functions that invalidate LOCALIO nfs_clientsMike Snitzer
Rename nfs_uuid_invalidate_one_client to nfs_localio_disable_client. Rename nfs_uuid_invalidate_clients to nfs_localio_invalidate_clients. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-06sunrpc: remove all connection limit configurationNeilBrown
Now that the connection limit only apply to unconfirmed connections, there is no need to configure it. So remove all the configuration and fix the number of unconfirmed connections as always 64 - which is now given a name: XPT_MAX_TMP_CONN Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-23nfsd: add LOCALIO supportWeston Andros Adamson
Add server support for bypassing NFS for localhost reads, writes, and commits. This is only useful when both the client and server are running on the same host. If nfsd_open_local_fh() fails then the NFS client will both retry and fallback to normal network-based read, write and commit operations if localio is no longer supported. Care is taken to ensure the same NFS security mechanisms are used (authentication, etc) regardless of whether localio or regular NFS access is used. The auth_domain established as part of the traditional NFS client access to the NFS server is also used for localio. Store auth_domain for localio in nfsd_uuid_t and transfer it to the client if it is local to the server. Relative to containers, localio gives the client access to the network namespace the server has. This is required to allow the client to access the server's per-namespace nfsd_net struct. This commit also introduces the use of NFSD's percpu_ref to interlock nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is not destroyed while in use by nfsd_open_local_fh and other LOCALIO client code. CONFIG_NFS_LOCALIO enables NFS server support for LOCALIO. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Co-developed-by: NeilBrown <neilb@suse.de> Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23SUNRPC: replace program list with program arrayNeilBrown
A service created with svc_create_pooled() can be given a linked list of programs and all of these will be served. Using a linked list makes it cumbersome when there are several programs that can be optionally selected with CONFIG settings. After this patch is applied, API consumers must use only svc_create_pooled() when creating an RPC service that listens for more than one RPC program. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-20sunrpc: change sp_nrthreads from atomic_t to unsigned int.NeilBrown
sp_nrthreads is only ever accessed under the service mutex nlmsvc_mutex nfs_callback_mutex nfsd_mutex so these is no need for it to be an atomic_t. The fact that all code using it is single-threaded means that we can simplify svc_pool_victim and remove the temporary elevation of sp_nrthreads. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: don't allocate the versions array.NeilBrown
Instead of using kmalloc to allocate an array for storing active version info, just declare an array to the max size - it is only 5 or so. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-01nfsd: move nfsd_pool_stats_open into nfsctl.cNeilBrown
nfsd_pool_stats_open() is used in nfsctl.c, so move it there. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-07-22nfsd: don't set SVC_SOCK_ANONYMOUS when creating nfsd socketsJeff Layton
When creating nfsd sockets via the netlink interface, we do want to register with the portmapper. Don't set SVC_SOCK_ANONYMOUS. Reported-by: Steve Dickson <steved@redhat.com> Fixes: 16a471177496 ("NFSD: add listener-{set,get} netlink command") Cc: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-07-08nfsd: new netlink ops to get/set server pool_modeJeff Layton
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-07-08nfsd: allow passing in array of thread counts via netlinkJeff Layton
Now that nfsd_svc can handle an array of thread counts, fix up the netlink threads interface to construct one from the netlink call and pass it through so we can start a pooled server the same way we would start a normal one. Note that any unspecified values in the array are considered zeroes, so it's possible to shut down a pooled server by passing in a short array that has only zeros, or even an empty array. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-07-08nfsd: make nfsd_svc take an array of thread countsJeff Layton
Now that the refcounting is fixed, rework nfsd_svc to use the same thread setup as the pool_threads interface. Have it take an array of thread counts instead of just a single value, and pass that from the netlink threads set interface. Since the new netlink interface doesn't have the same restriction as pool_threads, move the guard against shutting down all threads to write_pool_threads. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-06-25nfsd: initialise nfsd_info.mutex early.NeilBrown
nfsd_info.mutex can be dereferenced by svc_pool_stats_start() immediately after the new netns is created. Currently this can trigger an oops. Move the initialisation earlier before it can possibly be dereferenced. Fixes: 7b207ccd9833 ("svc: don't hold reference for poolstats, only mutex.") Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com> Closes: https://lore.kernel.org/all/c2e9f6de-1ec4-4d3a-b18d-d5a6ec0814a0@linux.ibm.com/ Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-06-17NFSD: grab nfsd_mutex in nfsd_nl_rpc_status_get_dumpit()Lorenzo Bianconi
Grab nfsd_mutex lock in nfsd_nl_rpc_status_get_dumpit routine and remove nfsd_nl_rpc_status_get_start() and nfsd_nl_rpc_status_get_done(). This patch fix the syzbot log reported below: INFO: task syz-executor.1:17770 blocked for more than 143 seconds. Not tainted 6.10.0-rc3-syzkaller-00022-gcea2a26553ac #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor.1 state:D stack:23800 pid:17770 tgid:17767 ppid:11381 flags:0x00000006 Call Trace: <TASK> context_switch kernel/sched/core.c:5408 [inline] __schedule+0x17e8/0x4a20 kernel/sched/core.c:6745 __schedule_loop kernel/sched/core.c:6822 [inline] schedule+0x14b/0x320 kernel/sched/core.c:6837 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6894 __mutex_lock_common kernel/locking/mutex.c:684 [inline] __mutex_lock+0x6a4/0xd70 kernel/locking/mutex.c:752 nfsd_nl_listener_get_doit+0x115/0x5d0 fs/nfsd/nfsctl.c:2124 genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline] genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline] genl_rcv_msg+0xb16/0xec0 net/netlink/genetlink.c:1210 netlink_rcv_skb+0x1e5/0x430 net/netlink/af_netlink.c:2564 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0x7ec/0x980 net/netlink/af_netlink.c:1361 netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x223/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585 ___sys_sendmsg net/socket.c:2639 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2668 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f24ed27cea9 RSP: 002b:00007f24ee0080c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f24ed3b3f80 RCX: 00007f24ed27cea9 RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000005 RBP: 00007f24ed2ebff4 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 Fixes: 1bd773b4f0c9 ("nfsd: hold nfsd_mutex across entire netlink operation") Fixes: bd9d6a3efa97 ("NFSD: add rpc_status netlink support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06NFSD: add listener-{set,get} netlink commandLorenzo Bianconi
Introduce write_ports netlink command. For listener-set, userspace is expected to provide a NFS listeners list it wants enabled. All other sockets will be closed. Reviewed-by: Jeff Layton <jlayton@kernel.org> Co-developed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06NFSD: add write_version to netlink commandLorenzo Bianconi
Introduce write_version netlink command through a "declarative" interface. This patch introduces a change in behavior since for version-set userspace is expected to provide a NFS major/minor version list it wants to enable while all the other ones will be disabled. (procfs write_version command implements imperative interface where the admin writes +3/-3 to enable/disable a single version. Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06NFSD: convert write_threads to netlink commandLorenzo Bianconi
Introduce write_threads netlink command similar to the one available through the procfs. Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Co-developed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06NFSD: allow callers to pass in scope string to nfsd_svcJeff Layton
Currently admins set this by using unshare to create a new uts namespace, and then resetting the hostname. With the new netlink interface we can just pass this in directly. Prepare nfsd_svc for this change. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06NFSD: move nfsd_mutex handling into nfsd_svc callersJeff Layton
Currently nfsd_svc holds the nfsd_mutex over the whole function. For some of the later netlink patches though, we want to do some other things to the server before starting it. Move the mutex handling into the callers. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06nfsd: don't create nfsv4recoverydir in nfsdfs when not used.NeilBrown
When CONFIG_NFSD_LEGACY_CLIENT_TRACKING is not set, the virtual file /proc/fs/nfsd/nfsv4recoverydir is created but responds EINVAL to any access. This is not useful, is somewhat surprising, and it causes ltp to complain. The only known user of this file is in nfs-utils, which handles non-existence and read-failure equally well. So there is nothing to gain from leaving the file present but inaccessible. So this patch removes the file when its content is not available - i.e. when that config option is not selected. Also remove the #ifdef which hides some of the enum values when CONFIG_NFSD_V$ not selection. simple_fill_super() quietly ignores array entries that are not present, so having slots in the array that don't get used is perfectly acceptable. So there is no value in this #ifdef. Reported-by: Petr Vorel <pvorel@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: 74fd48739d04 ("nfsd: new Kconfig option for legacy client tracking") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06fs: nfsd: use group allocation/free of per-cpu counters APIKefeng Wang
Use group allocation/free of per-cpu counters api to accelerate nfsd percpu_counters init/destroy(), and also squash the nfsd_percpu_counters_init/reset/destroy() and nfsd_counters_init/destroy() into callers to simplify code. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: prepare for supporting admin-revocation of stateNeilBrown
The NFSv4 protocol allows state to be revoked by the admin and has error codes which allow this to be communicated to the client. This patch - introduces a new state-id status SC_STATUS_ADMIN_REVOKED which can be set on open, lock, or delegation state. - reports NFS4ERR_ADMIN_REVOKED when these are accessed - introduces a per-client counter of these states and returns SEQ4_STATUS_ADMIN_STATE_REVOKED when the counter is not zero. Decrements this when freeing any admin-revoked state. - introduces stub code to find all interesting states for a given superblock so they can be revoked via the 'unlock_filesystem' file in /proc/fs/nfsd/ No actual states are handled yet. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: make svc_stat per-network namespace instead of globalJosef Bacik
The final bit of stats that is global is the rpc svc_stat. Move this into the nfsd_net struct and use that everywhere instead of the global struct. Remove the unused global struct. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: make all of the nfsd stats per-network namespaceJosef Bacik
We have a global set of counters that we modify for all of the nfsd operations, but now that we're exposing these stats across all network namespaces we need to make the stats also be per-network namespace. We already have some caching stats that are per-network namespace, so move these definitions into the same counter and then adjust all the helpers and users of these stats to provide the appropriate nfsd_net struct so that the stats are maintained for the per-network namespace objects. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: expose /proc/net/sunrpc/nfsd in net namespacesJosef Bacik
We are running nfsd servers inside of containers with their own network namespace, and we want to monitor these services using the stats found in /proc. However these are not exposed in the proc inside of the container, so we have to bind mount the host /proc into our containers to get at this information. Separate out the stat counters init and the proc registration, and move the proc registration into the pernet operations entry and exit points so that these stats can be exposed inside of network namespaces. This is an intermediate step, this just exposes the global counters in the network namespace. Subsequent patches will move these counters into the per-network namespace container. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-11Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull misc filesystem updates from Al Viro: "Misc cleanups (the part that hadn't been picked by individual fs trees)" * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: apparmorfs: don't duplicate kfree_link() orangefs: saner arguments passing in readdir guts ocfs2_find_match(): there's no such thing as NULL or negative ->d_parent reiserfs_add_entry(): get rid of pointless namelen checks __ocfs2_add_entry(), ocfs2_prepare_dir_for_insert(): namelen checks ext4_add_entry(): ->d_name.len is never 0 befs: d_obtain_alias(ERR_PTR(...)) will do the right thing affs: d_obtain_alias(ERR_PTR(...)) will do the right thing /proc/sys: use d_splice_alias() calling conventions to simplify failure exits hostfs: use d_splice_alias() calling conventions to simplify failure exits udf_fiiter_add_entry(): check for zero ->d_name.len is bogus... udf: d_obtain_alias(ERR_PTR(...)) will do the right thing... udf: d_splice_alias() will do the right thing on ERR_PTR() inode nfsd: kill stale comment about simple_fill_super() requirements bfs_add_entry(): get rid of pointless ->d_name.len checks nilfs2: d_obtain_alias(ERR_PTR(...)) will do the right thing... zonefs: d_splice_alias() will do the right thing on ERR_PTR() inode
2024-01-11Merge tag 'pull-dcache' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache updates from Al Viro: "Change of locking rules for __dentry_kill(), regularized refcounting rules in that area, assorted cleanups and removal of weird corner cases (e.g. now ->d_iput() on child is always called before the parent might hit __dentry_kill(), etc)" * tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits) dcache: remove unnecessary NULL check in dget_dlock() kill DCACHE_MAY_FREE __d_unalias() doesn't use inode argument d_alloc_parallel(): in-lookup hash insertion doesn't need an RCU variant get rid of DCACHE_GENOCIDE d_genocide(): move the extern into fs/internal.h simple_fill_super(): don't bother with d_genocide() on failure nsfs: use d_make_root() d_alloc_pseudo(): move setting ->d_op there from the (sole) caller kill d_instantate_anon(), fold __d_instantiate_anon() into remaining caller retain_dentry(): introduce a trimmed-down lockless variant __dentry_kill(): new locking scheme d_prune_aliases(): use a shrink list switch select_collect{,2}() to use of to_shrink_list() to_shrink_list(): call only if refcount is 0 fold dentry_kill() into dput() don't try to cut corners in shrink_lock_dentry() fold the call of retain_dentry() into fast_dput() Call retain_dentry() with refcount 0 dentry_kill(): don't bother with retain_dentry() on slow path ...
2024-01-07nfsd: rename nfsd_last_thread() to nfsd_destroy_serv()NeilBrown
As this function now destroys the svc_serv, this is a better name. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07SUNRPC: discard sv_refcnt, and svc_get/svc_putNeilBrown
sv_refcnt is no longer useful. lockd and nfs-cb only ever have the svc active when there are a non-zero number of threads, so sv_refcnt mirrors sv_nrthreads. nfsd also keeps the svc active between when a socket is added and when the first thread is started, but we don't really need a refcount for that. We can simply not destroy the svc while there are any permanent sockets attached. So remove sv_refcnt and the get/put functions. Instead of a final call to svc_put(), call svc_destroy() instead. This is changed to also store NULL in the passed-in pointer to make it easier to avoid use-after-free situations. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07svc: don't hold reference for poolstats, only mutex.NeilBrown
A future patch will remove refcounting on svc_serv as it is of little use. It is currently used to keep the svc around while the pool_stats file is open. Change this to get the pointer, protected by the mutex, only in seq_start, and the release the mutex in seq_stop. This means that if the nfsd server is stopped and restarted while the pool_stats file it open, then some pool stats info could be from the first instance and some from the second. This might appear odd, but is unlikely to be a problem in practice. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07nfsd: new Kconfig option for legacy client trackingJeff Layton
We've had a number of attempts at different NFSv4 client tracking methods over the years, but now nfsdcld has emerged as the clear winner since the others (recoverydir and the usermodehelper upcall) are problematic. As a case in point, the recoverydir backend uses MD5 hashes to encode long form clientid strings, which means that nfsd repeatedly gets dinged on FIPS audits, since MD5 isn't considered secure. Its use of MD5 is not cryptographically significant, so there is no danger there, but allowing us to compile that out allows us to sidestep the issue entirely. As a prelude to eventually removing support for these client tracking methods, add a new Kconfig option that enables them. Mark it deprecated and make it default to N. Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-05Merge tag 'nfsd-6.7-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Fix another regression in the NFSD administrative API * tag 'nfsd-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: drop the nfsd_put helper
2024-01-04nfsd: drop the nfsd_put helperJeff Layton
It's not safe to call nfsd_put once nfsd_last_thread has been called, as that function will zero out the nn->nfsd_serv pointer. Drop the nfsd_put helper altogether and open-code the svc_put in its callers instead. That allows us to not be reliant on the value of that pointer when handling an error. Fixes: 2a501f55cd64 ("nfsd: call nfsd_last_thread() before final nfsd_put()") Reported-by: Zhi Li <yieli@redhat.com> Cc: NeilBrown <neilb@suse.de> Signed-off-by: Jeffrey Layton <jlayton@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-12-20Merge tag 'nfsd-6.7-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Address a few recently-introduced issues * tag 'nfsd-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Revert 5f7fc5d69f6e92ec0b38774c387f5cf7812c5806 NFSD: Revert 738401a9bd1ac34ccd5723d69640a4adbb1a4bc0 NFSD: Revert 6c41d9a9bd0298002805758216a9c44e38a8500d nfsd: hold nfsd_mutex across entire netlink operation nfsd: call nfsd_last_thread() before final nfsd_put()
2023-12-20nfsd: kill stale comment about simple_fill_super() requirementsAl Viro
That went into the tree back in 2005; the comment used to be true for predecessor of simple_fill_super() that happened to live in nfsd; that one didn't take care to skip the array entries with NULL ->name, so it could not tolerate any gaps. That had been fixed in 2003 when nfsd_fill_super() had been abstracted into simple_fill_super(); if Neil's patch lived out of tree during that time, he probably replaced the name of function when rebasing it and didn't notice that restriction in question was no longer there. Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-12-15nfsd: hold nfsd_mutex across entire netlink operationNeilBrown
Rather than using svc_get() and svc_put() to hold a stable reference to the nfsd_svc for netlink lookups, simply hold the mutex for the entire time. The "entire" time isn't very long, and the mutex is not often contented. This makes way for us to remove the refcounts of svc, which is more confusing than useful. Reported-by: Jeff Layton <jlayton@kernel.org> Closes: https://lore.kernel.org/linux-nfs/5d9bbb599569ce29f16e4e0eef6b291eda0f375b.camel@kernel.org/T/#u Fixes: bd9d6a3efa97 ("NFSD: add rpc_status netlink support") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-12-15nfsd: call nfsd_last_thread() before final nfsd_put()NeilBrown
If write_ports_addfd or write_ports_addxprt fail, they call nfsd_put() without calling nfsd_last_thread(). This leaves nn->nfsd_serv pointing to a structure that has been freed. So remove 'static' from nfsd_last_thread() and call it when the nfsd_serv is about to be destroyed. Fixes: ec52361df99b ("SUNRPC: stop using ->sv_nrthreads as a refcount") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-11-18switch nfsd_client_rmdir() to use of simple_recursive_removal()Al Viro
nfsd_client_rmdir() open-codes a subset of simple_recursive_removal(). Conversion to calling simple_recursive_removal() allows to clean things up quite a bit. While we are at it, nfsdfs_create_files() doesn't need to mess with "pick the reference to struct nfsdfs_client from the already created parent" - the caller already knows it (that's where the parent got it from, after all), so we might as well just pass it as an explicit argument. So __get_nfsdfs_client() is only needed in get_nfsdfs_client() and can be folded in there. Incidentally, the locking in get_nfsdfs_client() is too heavy - we don't need ->i_rwsem for that, ->i_lock serves just fine. Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-10-30Merge tag 'nfsd-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "This release completes the SunRPC thread scheduler work that was begun in v6.6. The scheduler can now find an svc thread to wake in constant time and without a list walk. Thanks again to Neil Brown for this overhaul. Lorenzo Bianconi contributed infrastructure for a netlink-based NFSD control plane. The long-term plan is to provide the same functionality as found in /proc/fs/nfsd, plus some interesting additions, and then migrate the NFSD user space utilities to netlink. A long series to overhaul NFSD's NFSv4 operation encoding was applied in this release. The goals are to bring this family of encoding functions in line with the matching NFSv4 decoding functions and with the NFSv2 and NFSv3 XDR functions, preparing the way for better memory safety and maintainability. A further improvement to NFSD's write delegation support was contributed by Dai Ngo. This adds a CB_GETATTR callback, enabling the server to retrieve cached size and mtime data from clients holding write delegations. If the server can retrieve this information, it does not have to recall the delegation in some cases. The usual panoply of bug fixes and minor improvements round out this release. As always I am grateful to all contributors, reviewers, and testers" * tag 'nfsd-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (127 commits) svcrdma: Fix tracepoint printk format svcrdma: Drop connection after an RDMA Read error NFSD: clean up alloc_init_deleg() NFSD: Fix frame size warning in svc_export_parse() NFSD: Rewrite synopsis of nfsd_percpu_counters_init() nfsd: Clean up errors in nfs3proc.c nfsd: Clean up errors in nfs4state.c NFSD: Clean up errors in stats.c NFSD: simplify error paths in nfsd_svc() NFSD: Clean up nfsd4_encode_seek() NFSD: Clean up nfsd4_encode_offset_status() NFSD: Clean up nfsd4_encode_copy_notify() NFSD: Clean up nfsd4_encode_copy() NFSD: Clean up nfsd4_encode_test_stateid() NFSD: Clean up nfsd4_encode_exchange_id() NFSD: Clean up nfsd4_do_encode_secinfo() NFSD: Clean up nfsd4_encode_access() NFSD: Clean up nfsd4_encode_readdir() NFSD: Clean up nfsd4_encode_entry4() NFSD: Add an nfsd4_encode_nfs_cookie4() helper ...