summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-09-23pnfs/flexfiles: enable localio supportTrond Myklebust
If the DS is local to this client use localio to write the data. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs: enable localio for non-pNFS IOTrond Myklebust
Try a local open of the file being written to, and if it succeeds, then use localio to issue IO. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs: add LOCALIO supportWeston Andros Adamson
Add client support for bypassing NFS for localhost reads, writes, and commits. This is only useful when the client and the server are running on the same host. nfs_local_probe() is stubbed out, later commits will enable client and server handshake via a Linux-only LOCALIO auxiliary RPC protocol. This has dynamic binding with the nfsd module (via nfs_localio module which is part of nfs_common). LOCALIO will only work if nfsd is already loaded. The "localio_enabled" nfs kernel module parameter can be used to disable and enable the ability to use LOCALIO support. CONFIG_NFS_LOCALIO enables NFS client support for LOCALIO. Lastly, LOCALIO uses an nfsd_file to initiate all IO. To make proper use of nfsd_file (and nfsd's filecache) its lifetime (duration before nfsd_file_put is called) must extend until after commit, read and write operations. So rather than immediately drop the nfsd_file reference in nfs_local_open_fh(), that doesn't happen until nfs_local_pgio_release() for read/write and not until nfs_local_release_commit_data() for commit. The same applies to the reference held on nfsd's nn->nfsd_serv. Both objects' lifetimes and associated references are managed through calls to nfs_to->nfsd_file_put_local(). Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> # nfs_open_local_fh Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commitMike Snitzer
The nfsd_file will be passed, in future commits, by callers that enable LOCALIO support (for both regular NFS and pNFS IO). [Derived from patch authored by Weston Andros Adamson, but switched from passing struct file to struct nfsd_file] Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfsd: implement server support for NFS_LOCALIO_PROGRAMMike Snitzer
The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL" RPC method that allows the Linux NFS client to verify the local Linux NFS server can see the nonce (single-use UUID) the client generated and made available in nfs_common. The server expects this protocol to use the same transport as NFS and NFSACL for its RPCs. This protocol isn't part of an IETF standard, nor does it need to be considering it is Linux-to-Linux auxiliary RPC protocol that amounts to an implementation detail. The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of the fixed UUID_SIZE (16 bytes). The fixed size opaque encode and decode XDR methods are used instead of the less efficient variable sized methods. The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ): Linux Kernel Organization 400122 nfslocalio Signed-off-by: Mike Snitzer <snitzer@kernel.org> [neilb: factored out and simplified single localio protocol] Co-developed-by: NeilBrown <neilb@suse.de> Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfsd: add LOCALIO supportWeston Andros Adamson
Add server support for bypassing NFS for localhost reads, writes, and commits. This is only useful when both the client and server are running on the same host. If nfsd_open_local_fh() fails then the NFS client will both retry and fallback to normal network-based read, write and commit operations if localio is no longer supported. Care is taken to ensure the same NFS security mechanisms are used (authentication, etc) regardless of whether localio or regular NFS access is used. The auth_domain established as part of the traditional NFS client access to the NFS server is also used for localio. Store auth_domain for localio in nfsd_uuid_t and transfer it to the client if it is local to the server. Relative to containers, localio gives the client access to the network namespace the server has. This is required to allow the client to access the server's per-namespace nfsd_net struct. This commit also introduces the use of NFSD's percpu_ref to interlock nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is not destroyed while in use by nfsd_open_local_fh and other LOCALIO client code. CONFIG_NFS_LOCALIO enables NFS server support for LOCALIO. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Co-developed-by: NeilBrown <neilb@suse.de> Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs_common: prepare for the NFS client to use nfsd_file for LOCALIOMike Snitzer
The next commit will introduce nfsd_open_local_fh() which returns an nfsd_file structure. This commit exposes LOCALIO's required NFSD symbols to the NFS client: - Make nfsd_open_local_fh() symbol and other required NFSD symbols available to NFS in a global 'nfs_to' nfsd_localio_operations struct (global access suggested by Trond, nfsd_localio_operations suggested by NeilBrown). The next commit will also introduce nfsd_localio_ops_init() that init_nfsd() will call to initialize 'nfs_to'. - Introduce nfsd_file_file() that provides access to nfsd_file's backing file. Keeps nfsd_file structure opaque to NFS client (as suggested by Jeff Layton). - Introduce nfsd_file_put_local() that will put the reference to the nfsd_file's associated nn->nfsd_serv and then put the reference to the nfsd_file (as suggested by NeilBrown). Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> # nfs_to Suggested-by: NeilBrown <neilb@suse.de> # nfsd_localio_operations Suggested-by: Jeff Layton <jlayton@kernel.org> # nfsd_file_file Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs_common: add NFS LOCALIO auxiliary protocol enablementMike Snitzer
fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS client to generate a nonce (single-use UUID) and associated nfs_uuid_t struct, register it with nfs_common for subsequent lookup and verification by the NFS server and if matched the NFS server populates members in the nfs_uuid_t struct. nfs_common's nfs_uuids list is the basis for localio enablement, as such it has members that point to nfsd memory for direct use by the client (e.g. 'net' is the server's network namespace, through it the client can access nn->nfsd_serv). This commit also provides the base nfs_uuid_t interfaces to allow proper net namespace refcounting for the LOCALIO use case. CONFIG_NFS_LOCALIO controls the nfs_common, NFS server and NFS client enablement for LOCALIO. If both NFS_FS=m and NFSD=m then NFS_COMMON_LOCALIO_SUPPORT=m and nfs_localio.ko is built (and provides nfs_common's LOCALIO support). # lsmod | grep nfs_localio nfs_localio 12288 2 nfsd,nfs sunrpc 745472 35 nfs_localio,nfsd,auth_rpcgss,lockd,nfsv3,nfs Signed-off-by: Mike Snitzer <snitzer@kernel.org> Co-developed-by: NeilBrown <neilb@suse.de> Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23SUNRPC: replace program list with program arrayNeilBrown
A service created with svc_create_pooled() can be given a linked list of programs and all of these will be served. Using a linked list makes it cumbersome when there are several programs that can be optionally selected with CONFIG settings. After this patch is applied, API consumers must use only svc_create_pooled() when creating an RPC service that listens for more than one RPC program. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23SUNRPC: add svcauth_map_clnt_to_svc_cred_localWeston Andros Adamson
Add new funtion svcauth_map_clnt_to_svc_cred_local which maps a generic cred to a svc_cred suitable for use in nfsd. This is needed by the localio code to map nfs client creds to nfs server credentials. Following from net/sunrpc/auth_unix.c:unx_marshal() it is clear that ->fsuid and ->fsgid must be used (rather than ->uid and ->gid). In addition, these uid and gid must be translated with from_kuid_munged() so local client uses correct uid and gid when acting as local server. Jeff Layton noted: This is where the magic happens. Since we're working in kuid_t/kgid_t, we don't need to worry about further idmapping. Suggested-by: NeilBrown <neilb@suse.de> # to approximate unx_marshal() Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23SUNRPC: remove call_allocate() BUG_ONsMike Snitzer
Remove BUG_ON if p_arglen=0 to allow RPC with void arg. Remove BUG_ON if p_replen=0 to allow RPC with void return. The former was needed for the first revision of the LOCALIO protocol which had an RPC that took a void arg: /* raw RFC 9562 UUID */ typedef u8 uuid_t<UUID_SIZE>; program NFS_LOCALIO_PROGRAM { version LOCALIO_V1 { void NULL(void) = 0; uuid_t GETUUID(void) = 1; } = 1; } = 400122; The latter is needed for the final revision of the LOCALIO protocol which has a UUID_IS_LOCAL RPC which returns a void: /* raw RFC 9562 UUID */ typedef u8 uuid_t<UUID_SIZE>; program NFS_LOCALIO_PROGRAM { version LOCALIO_V1 { void NULL(void) = 0; void UUID_IS_LOCAL(uuid_t) = 1; } = 1; } = 400122; There is really no value in triggering a BUG_ON in response to either of these previously unsupported conditions. NeilBrown would like the entire 'if (proc->p_proc != 0)' branch removed (not just the one BUG_ON that must be removed for LOCALIO's immediate needs of returning void). Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfsd: add nfsd_serv_try_get and nfsd_serv_putMike Snitzer
Introduce nfsd_serv_try_get and nfsd_serv_put and update the nfsd code to prevent nfsd_destroy_serv from destroying nn->nfsd_serv until any caller of nfsd_serv_try_get releases their reference using nfsd_serv_put. A percpu_ref is used to implement the interlock between nfsd_destroy_serv and any caller of nfsd_serv_try_get. This interlock is needed to properly wait for the completion of client initiated localio calls to nfsd (that are _not_ in the context of nfsd). Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfsd: add nfsd_file_acquire_local()NeilBrown
nfsd_file_acquire_local() can be used to look up a file by filehandle without having a struct svc_rqst. This can be used by NFS LOCALIO to allow the NFS client to bypass the NFS protocol to directly access a file provided by the NFS server which is running in the same kernel. In nfsd_file_do_acquire() care is taken to always use fh_verify() if rqstp is not NULL (as is the case for non-LOCALIO callers). Otherwise the non-LOCALIO callers will not supply the correct and required arguments to __fh_verify (e.g. gssclient isn't passed). Introduce fh_verify_local() wrapper around __fh_verify to make it clear that LOCALIO is intended caller. Also, use GC for nfsd_file returned by nfsd_file_acquire_local. GC offers performance improvements if/when a file is reopened before launderette cleans it from the filecache's LRU. Suggested-by: Jeff Layton <jlayton@kernel.org> # use filecache's GC Signed-off-by: NeilBrown <neilb@suse.de> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfsd: factor out __fh_verify to allow NULL rqstp to be passedNeilBrown
__fh_verify() offers an interface like fh_verify() but doesn't require a struct svc_rqst *, instead it also takes the specific parts as explicit required arguments. So it is safe to call __fh_verify() with a NULL rqstp, but the net, cred, and client args must not be NULL. __fh_verify() does not use SVC_NET(), nor does the functions it calls. Rather than using rqstp->rq_client pass the client and gssclient explicitly to __fh_verify and then to nfsd_set_fh_dentry(). Lastly, it should be noted that the previous commit prepared for 4 associated tracepoints to only be used if rqstp is not NULL (this is a stop-gap that should be properly fixed so localio also benefits from the utility these tracepoints provide when debugging fh_verify issues). Signed-off-by: NeilBrown <neilb@suse.de> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23NFSD: Short-circuit fh_verify tracepoints for LOCALIOChuck Lever
LOCALIO will be able to call fh_verify() with a NULL rqstp. In this case, the existing trace points need to be skipped because they want to dereference the address fields in the passed-in rqstp. Temporarily make these trace points conditional to avoid a seg fault in this case. Putting the "rqstp != NULL" check in the trace points themselves makes the check more efficient. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry()Chuck Lever
Currently, fh_verify() makes some daring assumptions about which version of file handle the caller wants, based on the things it can find in the passed-in rqstp. The about-to-be-introduced LOCALIO use case sometimes has no svc_rqst context, so this logic won't work in that case. Instead, examine the passed-in file handle. It's .max_size field should carry information to allow nfsd_set_fh_dentry() to initialize the file handle appropriately. The file handle used by lockd and the one created by write_filehandle never need any of the version-specific fields (which affect things like write and getattr requests and pre/post attributes). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23NFSD: Refactor nfsd_setuser_and_check_port()NeilBrown
There are several places where __fh_verify unconditionally dereferences rqstp to check that the connection is suitably secure. They look at rqstp->rq_xprt which is not meaningful in the target use case of "localio" NFS in which the client talks directly to the local server. Prepare these to always succeed when rqstp is NULL. Signed-off-by: NeilBrown <neilb@suse.de> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23NFSD: Handle @rqstp == NULL in check_nfsd_access()NeilBrown
LOCALIO-initiated open operations are not running in an nfsd thread and thus do not have an associated svc_rqst context. Signed-off-by: NeilBrown <neilb@suse.de> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.hMike Snitzer
Eliminates duplicate functions in various files to allow for additional callers. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs_common: factor out nfs4_errtbl and nfs4_stat_to_errnoMike Snitzer
Common nfs4_stat_to_errno() is used by fs/nfs/nfs4xdr.c and will be used by fs/nfs/localio.c Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs_common: factor out nfs_errtbl and nfs_stat_to_errnoMike Snitzer
Common nfs_stat_to_errno() is used by both fs/nfs/nfs2xdr.c and fs/nfs/nfs3xdr.c Will also be used by fs/nfsd/localio.c Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs: add 'noalignwrite' option for lock-less 'lost writes' preventionDan Aloni
There are some applications that write to predefined non-overlapping file offsets from multiple clients and therefore don't need to rely on file locking. However, if these applications want non-aligned offsets and sizes they need to either use locks or risk data corruption, as the NFS client defaults to extending writes to whole pages. This commit adds a new mount option `noalignwrite`, which allows to turn that off and avoid the need of locking, as long as these applications don't overlap on offsets. Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs: fix the comment of nfs_get_rootLi Lingfeng
The comment for nfs_get_root() needs to be updated as it would also be used by NFS4 as follows: @x[ nfs_get_root+1 nfs_get_tree_common+1819 nfs_get_tree+2594 vfs_get_tree+73 fc_mount+23 do_nfs4_mount+498 nfs4_try_get_tree+134 nfs_get_tree+2562 vfs_get_tree+73 path_mount+2776 do_mount+226 __se_sys_mount+343 __x64_sys_mount+106 do_syscall_64+69 entry_SYSCALL_64_after_hwframe+97 , mount.nfs4]: 1 Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23NFSv4.2: Fix detection of "Proxying of Times" server supportRoi Azarzar
According to draft-ietf-nfsv4-delstid-07: If a server informs the client via the fattr4_open_arguments attribute that it supports OPEN_ARGS_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS and it returns a valid delegation stateid for an OPEN operation which sets the OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS flag, then it MUST query the client via a CB_GETATTR for the fattr4_time_deleg_access (see Section 5.2) attribute and fattr4_time_deleg_modify attribute (see Section 5.2). Thus, we should look that the server supports proxying of times via OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS. We want to be extra pedantic and continue to check that FATTR4_TIME_DELEG_ACCESS and FATTR4_TIME_DELEG_MODIFY are set. The server needs to expose both for the client to correctly detect "Proxying of Times" support. Signed-off-by: Roi Azarzar <roi.azarzar@vastdata.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: dcb3c20f7419 ("NFSv4: Add a capability for delegated attributes") Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23NFSv4: Fail mounts if the lease setup times outTrond Myklebust
If the server is down when the client is trying to mount, so that the calls to exchange_id or create_session fail, then we should allow the mount system call to fail rather than hang and block other mount/umount calls. Reported-by: Oleksandr Tymoshenko <ovt@google.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23fs: nfs: fix missing refcnt by replacing folio_set_private by ↵Zhaoyang Huang
folio_attach_private This patch is inspired by a code review of fs codes which aims at folio's extra refcnt that could introduce unwanted behavious when judging refcnt, such as[1].That is, the folio passed to mapping_evict_folio carries the refcnts from find_lock_entries, page_cache, corresponding to PTEs and folio's private if has. However, current code doesn't take the refcnt for folio's private which could have mapping_evict_folio miss the one to only PTE and lead to call filemap_release_folio wrongly. [1] long mapping_evict_folio(struct address_space *mapping, struct folio *folio) { ... //current code will misjudge here if there is one pte on the folio which is be deemed as the one as folio's private if (folio_ref_count(folio) > folio_nr_pages(folio) + folio_has_private(folio) + 1) return 0; if (!filemap_release_folio(folio, 0)) return 0; return remove_mapping(mapping, folio); } Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs: Remove obsoleted declaration for nfs_read_prepareGaosheng Cui
The nfs_read_prepare() have been removed since commit a4cdda59111f ("NFS: Create a common pgio_rpc_prepare function"), and now it is useless, so remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23net/sunrpc: make use of the helper macro LIST_HEAD()Hongbo Li
list_head can be initialized automatically with LIST_HEAD() instead of calling INIT_LIST_HEAD(). Here we can simplify the code. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23SUNRPC: clnt.c: Remove misleading commentSiddh Raman Pant
destroy_wait doesn't store all RPC clients. There was a list named "all_clients" above it, which got moved to struct sunrpc_net in 2012, but the comment was never removed. Fixes: 70abc49b4f4a ("SUNRPC: make SUNPRC clients list per network namespace context") Signed-off-by: Siddh Raman Pant <siddh.raman.pant@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23SUNRPC: convert RPC_TASK_* constants to enumStephen Brennan
The RPC_TASK_* constants are defined as macros, which means that most kernel builds will not contain their definitions in the debuginfo. However, it's quite useful for debuggers to be able to view the task state constant and interpret it correctly. Conversion to an enum will ensure the constants are present in debuginfo and can be interpreted by debuggers without needing to hard-code them and track their changes. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23SUNRPC: Fix -Wformat-truncation warningKunwu Chan
Increase size of the servername array to avoid truncated output warning. net/sunrpc/clnt.c:582:75: error:‘%s’ directive output may be truncated writing up to 107 bytes into a region of size 48 [-Werror=format-truncation=] 582 | snprintf(servername, sizeof(servername), "%s", | ^~ net/sunrpc/clnt.c:582:33: note:‘snprintf’ output between 1 and 108 bytes into a destination of size 48 582 | snprintf(servername, sizeof(servername), "%s", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 583 | sun->sun_path); Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Suggested-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs: Remove unnecessary NULL check before kfree()Thorsten Blum
Since kfree() already checks if its argument is NULL, an additional check before calling kfree() is unnecessary and can be removed. Remove it and thus also the following Coccinelle/coccicheck warning reported by ifnullfree.cocci: WARNING: NULL check before some freeing functions is not needed Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs: Annotate struct nfs_cache_array with __counted_by()Thorsten Blum
Add the __counted_by compiler attribute to the flexible array member array to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Increment size before adding a new struct to the array. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs: simplify and guarantee owner uniqueness.NeilBrown
I have evidence of an Linux NFS client getting NFS4ERR_BAD_SEQID to a v4.0 LOCK request to a Linux server (which had fixed the problem with RELEASE_LOCKOWNER bug fixed). The LOCK request presented a "new" lock owner so there are two seq ids in the request: that for the open file, and that for the new lock. Given the context I am confident that the new lock owner was reported to have the wrong seqid. As lock owner identifiers are reused, the server must still have a lock owner active which the client thinks is no longer active. I wasn't able to determine a root-cause but the simplest fix seems to be to ensure lock owners are always unique much as open owners are (thanks to a time stamp). The easiest way to ensure uniqueness is with a 64bit counter for each server. That will never cycle (if updated once a nanosecond the last 584 years. A single NFS server would not handle open/lock requests nearly that fast, and a Linux node is unlikely to have an uptime approaching that). This patch removes the 2 ida and instead uses a per-server atomic64_t to provide uniqueness. Note that the lock owner already encodes the id as 64 bits even though it is a 32bit value. So changing to a 64bit value does not change the encoding of the lock owner. The open owner encoding is now 4 bytes larger. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs: fix memory leak in error path of nfs4_do_reclaimLi Lingfeng
Commit c77e22834ae9 ("NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()") separate out the freeing of the state owners from nfs4_purge_state_owners() and finish it outside the rcu lock. However, the error path is omitted. As a result, the state owners in "freeme" will not be released. Fix it by adding freeing in the error path. Fixes: c77e22834ae9 ("NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23Merge tag 'nfsd-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "Notable features of this release include: - Pre-requisites for automatically determining the RPC server thread count - Clean-up and preparation for supporting LOCALIO, which will be merged via the NFS client tree - Enhancements and fixes to NFSv4.2 COPY offload - A new Python-based tool for generating kernel SunRPC XDR encoding and decoding functions, added as an aid for prototyping features in protocols based on the Linux kernel's SunRPC implementation As always I am grateful to the NFSD contributors, reviewers, testers, and bug reporters who participated during this cycle" * tag 'nfsd-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (57 commits) xdrgen: Prevent reordering of encoder and decoder functions xdrgen: typedefs should use the built-in string and opaque functions xdrgen: Fix return code checking in built-in XDR decoders tools: Add xdrgen nfsd: fix delegation_blocked() to block correctly for at least 30 seconds nfsd: fix initial getattr on write delegation nfsd: untangle code in nfsd4_deleg_getattr_conflict() nfsd: enforce upper limit for namelen in __cld_pipe_inprogress_downcall() nfsd: return -EINVAL when namelen is 0 NFSD: Wrap async copy operations with trace points NFSD: Clean up extra whitespace in trace_nfsd_copy_done NFSD: Record the callback stateid in copy tracepoints NFSD: Display copy stateids with conventional print formatting NFSD: Limit the number of concurrent async COPY operations NFSD: Async COPY result needs to return a write verifier nfsd: avoid races with wake_up_var() nfsd: use clear_and_wake_up_bit() sunrpc: xprtrdma: Use ERR_CAST() to return NFSD: Annotate struct pnfs_block_deviceaddr with __counted_by() nfsd: call cache_put if xdr_reserve_space returns NULL ...
2024-09-23Merge tag 'nfsd-6.12' into linux-next-with-localioAnna Schumaker
NFSD 6.12 Release Notes Notable features of this release include: - Pre-requisites for automatically determining the RPC server thread count - Clean-up and preparation for supporting LOCALIO, which will be merged via the NFS client tree - Enhancements and fixes to NFSv4.2 COPY offload - A new Python-based tool for generating kernel SunRPC XDR encoding and decoding functions, added as an aid for prototyping features in protocols based on the Linux kernel's SunRPC implementation. As always I am grateful to the NFSD contributors, reviewers, testers, and bug reporters who participated during this cycle.
2024-09-23Merge tag 'gfs2-v6.10-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 update from Andreas Gruenbacher: - Convert the writepage address space operation to writepages (Matthew Wilcox) - A syzkaller fix (by Julian Sun) and a minor cleanup (Andreas Gruenbacher) * tag 'gfs2-v6.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Remove gfs2_aspace_writepage() gfs2: Remove gfs2_jdata_writepage() gfs2: Remove __gfs2_writepage() gfs2: Add gfs2_aspace_writepages() gfs2: fix double destroy_workqueue error gfs2: Minor gfs2_glock_cb cleanup
2024-09-23Merge tag 'for-6.12-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix dangling pointer to rb-tree of defragmented inodes after cleanup - a followup fix to handle concurrent lseek on the same fd that could leak memory under some conditions - fix wrong root id reported in tree checker when verifying dref * tag 'for-6.12-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix use-after-free on rbtree that tracks inodes for auto defrag btrfs: tree-checker: fix the wrong output of data backref objectid btrfs: fix race setting file private on concurrent lseek using same fd
2024-09-24kbuild: doc: replace "gcc" in external module descriptionMasahiro Yamada
Avoid "gcc" since it is not the only compiler supported by Kbuild. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <n.schier@avm.de>
2024-09-24kbuild: doc: describe the -C option precisely for external module buildsMasahiro Yamada
Building external modules is typically done using this command: $ make -C <KERNEL_DIR> M=<EXTMOD_DIR> Here, <KERNEL_DIR> refers to the output directory where the kernel was built, not the kernel source directory. When the kernel is built in the source tree, there is no ambiguity, as the output directory and the source directory are the same. If the kernel was built in a separate build directory, <KERNEL_DIR> should be the kernel output directory. Otherwise, Kbuild cannot locate necessary build artifacts. This has been the method for building external modules against a pre-built kernel in a separate directory for over 20 years. [1] If you pass the kernel source directory to the -C option, you must also specify the kernel build directory using the O= option. This approach works as well, though it results in a slightly longer command: $ make -C <KERNEL_SOURCE_DIR> O=<KERNEL_BUILD_DIR> M=<EXTMOD_DIR> Some people mistakenly believe that O= should specify a build directory for external modules when used together with M=. This commit adds more clarification to Documentation/kbuild/kbuild.rst. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=e321b2ec2eb2993b3d0116e5163c78ad923e3c54 Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <n.schier@avm.de>
2024-09-24kbuild: doc: remove the description about shipped filesMasahiro Yamada
The use of shipped files is discouraged in the upstream kernel these days. [1] Downstream Makefiles have the freedom to use shipped files or other options to handle binaries, but this should not be advertised in the upstream document. [1]: https://lore.kernel.org/all/CAHk-=wgSEi_ZrHdqr=20xv+d6dr5G895CbOAi8ok+7-CQUN=fQ@mail.gmail.com/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <n.schier@avm.de>
2024-09-24kbuild: doc: drop section numbering, use references in modules.rstMasahiro Yamada
Do similar to commit 1a4c1c9df72e ("docs/kbuild/makefiles: drop section numbering, use references"). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-09-24kbuild: doc: throw out the local table of contents in modules.rstMasahiro Yamada
Do similar to commit 5e8f0ba38a4d ("docs/kbuild/makefiles: throw out the local table of contents"). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-09-24kbuild: doc: remove outdated description of the limitation on -I usageMasahiro Yamada
Kbuild used to manipulate header search paths, enforcing the odd limitation of "no space after -I". Commit cdd750bfb1f7 ("kbuild: remove 'addtree' and 'flags' magic for header search paths") stopped doing that. This limitation no longer exists. Instead, you need to accurately specify the header search path. (In this case, $(src)/include) Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <n.schier@avm.de>
2024-09-24kbuild: doc: remove description about grepping CONFIG optionsMasahiro Yamada
This description was added 20 years ago [1]. It does not convey any useful information except for a feeling of nostalgia. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=65e433436b5794ae056d22ddba60fe9194bba007 Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <n.schier@avm.de>
2024-09-24kbuild: doc: update the description about Kbuild/Makefile splitMasahiro Yamada
The phrase "In newer versions of the kernel" was added 14 years ago, by commit efdf02cf0651 ("Documentation/kbuild: major edit of modules.txt sections 1-4"). This feature is no longer new, so remove it and update the paragraph. Example 3 was written 20 years ago [1]. There is no need to note about backward compatibility with such an old build system. Remove Example 3 entirely. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=65e433436b5794ae056d22ddba60fe9194bba007 Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <n.schier@avm.de>
2024-09-24kbuild: remove unnecessary export of RUST_LIB_SRCMasahiro Yamada
If RUST_LIB_SRC is defined in the top-level Makefile (via an environment variable or command line), it is already exported. The only situation where it is defined but not exported is when the top-level Makefile is wrapped by another Makefile (e.g., GNUmakefile). I cannot think of any other use cases. I know some people use this tip to define custom variables. However, even in that case, you can export it directly in the wrapper Makefile. Example GNUmakefile: export RUST_LIB_SRC = /path/to/your/sysroot/lib/rustlib/src/rust/library include Makefile Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> Reviewed-by: Alice Ryhl <aliceryhl@google.com>
2024-09-23Merge tag 'fs_for_v6.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota and isofs updates from Jan Kara: "A few small cleanups in quota and isofs" * tag 'fs_for_v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: isofs: Annotate struct SL_component with __counted_by() quota: remove unnecessary error code translation in dquot_quota_enable quota: remove redundant return at end of void function quota: remove unneeded return value of register_quota_format quota: avoid missing put_quota_format when DQUOT_SUSPENDED is passed
2024-09-23Merge tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs updates from Kent Overstreet: - rcu_pending, btree key cache rework: this solves lock contenting in the key cache, eliminating the biggest source of the srcu lock hold time warnings, and drastically improving performance on some metadata heavy workloads - on multithreaded creates we're now 3-4x faster than xfs. - We're now using an rhashtable instead of the system inode hash table; this is another significant performance improvement on multithreaded metadata workloads, eliminating more lock contention. - for_each_btree_key_in_subvolume_upto(): new helper for iterating over keys within a specific subvolume, eliminating a lot of open coded "subvolume_get_snapshot()" and also fixing another source of srcu lock time warnings, by running each loop iteration in its own transaction (as the existing for_each_btree_key() does). - More work on btree_trans locking asserts; we now assert that we don't hold btree node locks when trans->locked is false, which is important because we don't use lockdep for tracking individual btree node locks. - Some cleanups and improvements in the bset.c btree node lookup code, from Alan. - Rework of btree node pinning, which we use in backpointers fsck. The old hacky implementation, where the shrinker just skipped over nodes in the pinned range, was causing OOMs; instead we now use another shrinker with a much higher seeks number for pinned nodes. - Rebalance now uses BCH_WRITE_ONLY_SPECIFIED_DEVS; this fixes an issue where rebalance would sometimes fall back to allocating from the full filesystem, which is not what we want when it's trying to move data to a specific target. - Use __GFP_ACCOUNT, GFP_RECLAIMABLE for btree node, key cache allocations. - Idmap mounts are now supported (Hongbo Li) - Rename whiteouts are now supported (Hongbo Li) - Erasure coding can now handle devices being marked as failed, or forcibly removed. We still need the evacuate path for erasure coding, but it's getting very close to ready for people to start using. * tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefs: (99 commits) bcachefs: return err ptr instead of null in read sb clean bcachefs: Remove duplicated include in backpointers.c bcachefs: Don't drop devices with stripe pointers bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices bcachefs: bch_fs.rw_devs_change_count bcachefs: bch2_dev_remove_stripes() bcachefs: bch2_trigger_ptr() calculates sectors even when no device bcachefs: improve error messages in bch2_ec_read_extent() bcachefs: improve error message on too few devices for ec bcachefs: improve bch2_new_stripe_to_text() bcachefs: ec_stripe_head.nr_created bcachefs: bch_stripe.disk_label bcachefs: stripe_to_mem() bcachefs: EIO errcode cleanup bcachefs: Rework btree node pinning bcachefs: split up btree cache counters for live, freeable bcachefs: btree cache counters should be size_t bcachefs: Don't count "skipped access bit" as touched in btree cache scan bcachefs: Failed devices no longer require mounting in degraded mode bcachefs: bch2_dev_rcu_noerror() ...