summaryrefslogtreecommitdiff
path: root/fs/nfsd/nfs4proc.c
AgeCommit message (Collapse)Author
2025-07-14Revert "NFSD: Force all NFSv4.2 COPY requests to be synchronous"Chuck Lever
In the past several kernel releases, we've made NFSv4.2 async copy reliable: - The Linux NFS client and server now both implement and use the NFSv4.2 OFFLOAD_STATUS operation - The Linux NFS server keeps copy stateids around longer - The Linux NFS client and server now both implement referring call lists And resilient against DoS: - The Linux NFS server limits the number of concurrent async copy operations Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14NFSD: Remove the cap on number of operations per NFSv4 COMPOUNDChuck Lever
This limit has always been a sanity check; in nearly all cases a large COMPOUND is a sign of a malfunctioning client. The only real limit on COMPOUND size and complexity is the size of NFSD's send and receive buffers. However, there are a few cases where a large COMPOUND is sane. For example, when a client implementation wants to walk down a long file pathname in a single round trip. A small risk is that now a client can construct a COMPOUND request that can keep a single nfsd thread busy for quite some time. Suggested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-28Merge tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "The marquee feature for this release is that the limit on the maximum rsize and wsize has been raised to 4MB. The default remains at 1MB, but risk-seeking administrators now have the ability to try larger I/O sizes with NFS clients that support them. Eventually the default setting will be increased when we have confidence that this change will not have negative impact. With v6.16, NFSD now has its own debugfs file system where we can add experimental features and make them available outside of our development community without impacting production deployments. The first experimental setting added is one that makes all NFS READ operations use vfs_iter_read() instead of the NFSD splice actor. The plan is to eventually retire the splice actor, as that will enable a number of new capabilities such as the use of struct bio_vec from the top to the bottom of the NFSD stack. Jeff Layton contributed a number of observability improvements. The use of dprintk() in a number of high-traffic code paths has been replaced with static trace points. This release sees the continuation of efforts to harden the NFSv4.2 COPY operation. Soon, the restriction on async COPY operations can be lifted. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.16 development cycle" * tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (60 commits) xdrgen: Fix code generated for counted arrays SUNRPC: Bump the maximum payload size for the server NFSD: Add a "default" block size NFSD: Remove NFSSVC_MAXBLKSIZE_V2 macro NFSD: Remove NFSD_BUFSIZE sunrpc: Remove the RPCSVC_MAXPAGES macro svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages sunrpc: Adjust size of socket's receive page array dynamically SUNRPC: Remove svc_rqst :: rq_vec SUNRPC: Remove svc_fill_write_vector() NFSD: Use rqstp->rq_bvec in nfsd_iter_write() SUNRPC: Export xdr_buf_to_bvec() NFSD: De-duplicate the svc_fill_write_vector() call sites NFSD: Use rqstp->rq_bvec in nfsd_iter_read() sunrpc: Replace the rq_bvec array with dynamically-allocated memory sunrpc: Replace the rq_pages array with dynamically-allocated memory sunrpc: Remove backchannel check in svc_init_buffer() sunrpc: Add a helper to derive maxpages from sv_max_mesg svcrdma: Reduce the number of rdma_rw contexts per-QP ...
2025-05-15NFSD: Remove NFSD_BUFSIZEChuck Lever
Clean up: The documenting comment for NFSD_BUFSIZE is quite stale. NFSD_BUFSIZE is used only for NFSv4 Reply these days; never for NFSv2 or v3, and never for RPC Calls. Even so, the byte count estimate does not include the size of the NFSv4 COMPOUND Reply HEADER or the RPC auth flavor. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15NFSD: De-duplicate the svc_fill_write_vector() call sitesChuck Lever
All three call sites do the same thing. I'm struggling with this a bit, however. struct xdr_buf is an XDR layer object and unmarshaling a WRITE payload is clearly a task intended to be done by the proc and xdr functions, not by VFS. This feels vaguely like a layering violation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: add tracepoint for getattr and statfs eventsJeff Layton
There isn't a common helper for getattrs, so add these into the protocol-specific helpers. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: add tracepoint to nfsd_readdirJeff Layton
Observe the start of NFS READDIR operations. The NFS READDIR's count argument can be interesting when tuning a client's readdir behavior. However, the count argument is not passed to nfsd_readdir(). To properly capture the count argument, this tracepoint must appear in each proc function before the nfsd_readdir() call. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: nfsd4_spo_must_allow() must check this is a v4 compound requestNeilBrown
If the request being processed is not a v4 compound request, then examining the cstate can have undefined results. This patch adds a check that the rpc procedure being executed (rq_procinfo) is the NFSPROC4_COMPOUND procedure. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: remove redundant WARN_ON_ONCE in nfsd4_writeGuoqing Jiang
It can be removed since svc_fill_write_vector already has the same WARN_ON_ONCE. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: Record each NFSv4 call's session slot indexChuck Lever
Help the client resolve the race between the reply to an asynchronous COPY reply and the associated CB_OFFLOAD callback by planting the session, slot, and sequence number of the COPY in the CB_SEQUENCE contained in the CB_OFFLOAD COMPOUND. Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: Shorten CB_OFFLOAD response to NFS4ERR_DELAYChuck Lever
Try not to prolong the wait for completion of a COPY or COPY_NOTIFY operation. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: OFFLOAD_CANCEL should mark an async COPY as completedChuck Lever
Update the status of an async COPY operation when it has been stopped. OFFLOAD_STATUS needs to indicate that the COPY is no longer running. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-04-07nfsd: Use lookup_one() rather than lookup_one_len()NeilBrown
nfsd uses some VFS interfaces (such as vfs_mkdir) which take an explicit mnt_idmap, and it passes &nop_mnt_idmap as nfsd doesn't yet support idmapped mounts. It also uses the lookup_one_len() family of functions which implicitly use &nop_mnt_idmap. This mixture of implicit and explicit could be confusing. When we eventually update nfsd to support idmap mounts it would be best if all places which need an idmap determined from the mount point were similar and easily found. So this patch changes nfsd to use lookup_one(), lookup_one_unlocked(), and lookup_one_positive_unlocked(), passing &nop_mnt_idmap. This has the benefit of removing some uses of the lookup_one_len functions where permission checking is actually needed. Many callers don't care about permission checking and using these function only where permission checking is needed is a valuable simplification. This change requires passing the name in a qstr. Currently this is a little clumsy, but if nfsd is changed to use qstr more broadly it will result in a net improvement. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/r/20250319031545.2999807-3-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-10nfsd: prevent callback tasks running concurrentlyJeff Layton
The nfsd4_callback workqueue jobs exist to queue backchannel RPCs to rpciod. Because they run in different workqueue contexts, the rpc_task can run concurrently with the workqueue job itself, should it become requeued. This is problematic as there is no locking when accessing the fields in the nfsd4_callback. Add a new unsigned long to nfsd4_callback and declare a new NFSD4_CALLBACK_RUNNING flag to be set in it. When attempting to run a workqueue job, do a test_and_set_bit() on that flag first, and don't queue the workqueue job if it returns true. Clear NFSD4_CALLBACK_RUNNING in nfsd41_destroy_cb(). This also gives us a more reliable mechanism for handling queueing failures in codepaths where we have to take references under spinlocks. We can now do the test_and_set_bit on NFSD4_CALLBACK_RUNNING first, and only take references to the objects if that returns false. Most of the nfsd4_run_cb() callers are converted to use this new flag or the nfsd4_try_run_cb() wrapper. The main exception is the callback channel probe, which has its own synchronization. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21nfsd: handle delegated timestamps in SETATTRJeff Layton
Allow SETATTR to handle delegated timestamps. This patch assumes that only the delegation holder has the ability to set the timestamps in this way, so we allow this only if the SETATTR stateid refers to a *_ATTRS_DELEG delegation. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-12-17NFSD: fix management of pending async copiesOlga Kornievskaia
Currently the pending_async_copies count is decremented just before a struct nfsd4_copy is destroyed. After commit aa0ebd21df9c ("NFSD: Add nfsd4_copy time-to-live") nfsd4_copy structures sticks around for 10 lease periods after the COPY itself has completed, the pending_async_copies count stays high for a long time. This causes NFSD to avoid the use of background copy even though the actual background copy workload might no longer be running. In this patch, decrement pending_async_copies once async copy thread is done processing the copy work. Fixes: aa0ebd21df9c ("NFSD: Add nfsd4_copy time-to-live") Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18NFSD: Add nfsd4_copy time-to-liveChuck Lever
Keep async copy state alive for a few lease cycles after the copy completes so that OFFLOAD_STATUS returns something meaningful. This means that NFSD's client shutdown processing needs to purge any of this state that happens to be waiting to die. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18NFSD: Add a laundromat reaper for async copy stateChuck Lever
RFC 7862 Section 4.8 states: > A copy offload stateid will be valid until either (A) the client > or server restarts or (B) the client returns the resource by > issuing an OFFLOAD_CANCEL operation or the client replies to a > CB_OFFLOAD operation. Instead of releasing async copy state when the CB_OFFLOAD callback completes, now let it live until the next laundromat run after the callback completes. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18NFSD: Block DESTROY_CLIENTID only when there are ongoing async COPY operationsChuck Lever
Currently __destroy_client() consults the nfs4_client's async_copies list to determine whether there are ongoing async COPY operations. However, NFSD now keeps copy state in that list even when the async copy has completed, to enable OFFLOAD_STATUS to find the COPY results for a while after the COPY has completed. DESTROY_CLIENTID should not be blocked if the client's async_copies list contains state for only completed copy operations. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18NFSD: Handle an NFS4ERR_DELAY response to CB_OFFLOADChuck Lever
RFC 7862 permits callback services to respond to CB_OFFLOAD with NFS4ERR_DELAY. Currently NFSD drops the CB_OFFLOAD in that case. To improve the reliability of COPY offload, NFSD should rather send another CB_OFFLOAD completion notification. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18NFSD: Free async copy information in nfsd4_cb_offload_release()Chuck Lever
RFC 7862 Section 4.8 states: > A copy offload stateid will be valid until either (A) the client > or server restarts or (B) the client returns the resource by > issuing an OFFLOAD_CANCEL operation or the client replies to a > CB_OFFLOAD operation. Currently, NFSD purges the metadata for an async COPY operation as soon as the CB_OFFLOAD callback has been sent. It does not wait even for the client's CB_OFFLOAD response, as the paragraph above suggests that it should. This makes the OFFLOAD_STATUS operation ineffective during the window between the completion of an asynchronous COPY and the server's receipt of the corresponding CB_OFFLOAD response. This is important if, for example, the client responds with NFS4ERR_DELAY, or the transport is lost before the server receives the response. A client might use OFFLOAD_STATUS to query the server about the still pending asynchronous COPY, but NFSD will respond to OFFLOAD_STATUS as if it had never heard of the presented copy stateid. This patch starts to address this issue by extending the lifetime of struct nfsd4_copy at least until the server has seen the client's CB_OFFLOAD response, or the CB_OFFLOAD has timed out. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18NFSD: Fix nfsd4_shutdown_copy()Chuck Lever
nfsd4_shutdown_copy() is just this: while ((copy = nfsd4_get_copy(clp)) != NULL) nfsd4_stop_copy(copy); nfsd4_get_copy() bumps @copy's reference count, preventing nfsd4_stop_copy() from releasing @copy. A while loop like this usually works by removing the first element of the list, but neither nfsd4_get_copy() nor nfsd4_stop_copy() alters the async_copies list. Best I can tell, then, is that nfsd4_shutdown_copy() continues to loop until other threads manage to remove all the items from this list. The spinning loop blocks shutdown until these items are gone. Possibly the reason we haven't seen this issue in the field is because client_has_state() prevents __destroy_client() from calling nfsd4_shutdown_copy() if there are any items on this list. In a subsequent patch I plan to remove that restriction. Fixes: e0639dc5805a ("NFSD introduce async copy feature") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18NFSD: Add a tracepoint to record canceled async COPY operationsChuck Lever
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18nfsd: Fix NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOTPali Rohár
Currently NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT do not bypass only GSS, but bypass any method. This is a problem specially for NFS3 AUTH_NULL-only exports. The purpose of NFSD_MAY_BYPASS_GSS_ON_ROOT is described in RFC 2623, section 2.3.2, to allow mounting NFS2/3 GSS-only export without authentication. So few procedures which do not expose security risk used during mount time can be called also with AUTH_NONE or AUTH_SYS, to allow client mount operation to finish successfully. The problem with current implementation is that for AUTH_NULL-only exports, the NFSD_MAY_BYPASS_GSS_ON_ROOT is active also for NFS3 AUTH_UNIX mount attempts which confuse NFS3 clients, and make them think that AUTH_UNIX is enabled and is working. Linux NFS3 client never switches from AUTH_UNIX to AUTH_NONE on active mount, which makes the mount inaccessible. Fix the NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT implementation and really allow to bypass only exports which have enabled some real authentication (GSS, TLS, or any other). The result would be: For AUTH_NULL-only export if client attempts to do mount with AUTH_UNIX flavor then it will receive access errors, which instruct client that AUTH_UNIX flavor is not usable and will either try other auth flavor (AUTH_NULL if enabled) or fails mount procedure. Similarly if client attempt to do mount with AUTH_NULL flavor and only AUTH_UNIX flavor is enabled then the client will receive access error. This should fix problems with AUTH_NULL-only or AUTH_UNIX-only exports if client attempts to mount it with other auth flavor (e.g. with AUTH_NULL for AUTH_UNIX-only export, or with AUTH_UNIX for AUTH_NULL-only export). Signed-off-by: Pali Rohár <pali@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18nfsd: Fill NFSv4.1 server implementation fields in OP_EXCHANGE_ID responsePali Rohár
NFSv4.1 OP_EXCHANGE_ID response from server may contain server implementation details (domain, name and build time) in optional nfs_impl_id4 field. Currently nfsd does not fill this field. Send these information in NFSv4.1 OP_EXCHANGE_ID response. Fill them with the same values as what is Linux NFSv4.1 client doing. Domain is hardcoded to "kernel.org", name is composed in the same way as "uname -srvm" output and build time is hardcoded to zeros. NFSv4.1 client and server implementation fields are useful for statistic purposes or for identifying type of clients and servers. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18nfsd: new tracepoint for after op_func in compound processingJeff Layton
Turn nfsd_compound_encode_err tracepoint into a class and add a new nfsd_compound_op_err tracepoint. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-10-30NFSD: Never decrement pending_async_copies on errorChuck Lever
The error flow in nfsd4_copy() calls cleanup_async_copy(), which already decrements nn->pending_async_copies. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-10-29NFSD: Initialize struct nfsd4_copy earlierChuck Lever
Ensure the refcount and async_copies fields are initialized early. cleanup_async_copy() will reference these fields if an error occurs in nfsd4_copy(). If they are not correctly initialized, at the very least, a refcount underflow occurs. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20NFSD: Wrap async copy operations with trace pointsChuck Lever
Add an nfsd_copy_async_done to record the timestamp, the final status code, and the callback stateid of an async copy. Rename the nfsd_copy_do_async tracepoint to match that naming convention to make it easier to enable both of these with a single glob. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20NFSD: Limit the number of concurrent async COPY operationsChuck Lever
Nothing appears to limit the number of concurrent async COPY operations that clients can start. In addition, AFAICT each async COPY can copy an unlimited number of 4MB chunks, so can run for a long time. Thus IMO async COPY can become a DoS vector. Add a restriction mechanism that bounds the number of concurrent background COPY operations. Start simple and try to be fair -- this patch implements a per-namespace limit. An async COPY request that occurs while this limit is exceeded gets NFS4ERR_DELAY. The requesting client can choose to send the request again after a delay or fall back to a traditional read/write style copy. If there is need to make the mechanism more sophisticated, we can visit that in future patches. Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20NFSD: Async COPY result needs to return a write verifierChuck Lever
Currently, when NFSD handles an asynchronous COPY, it returns a zero write verifier, relying on the subsequent CB_OFFLOAD callback to pass the write verifier and a stable_how4 value to the client. However, if the CB_OFFLOAD never arrives at the client (for example, if a network partition occurs just as the server sends the CB_OFFLOAD operation), the client will never receive this verifier. Thus, if the client sends a follow-up COMMIT, there is no way for the client to assess the COMMIT result. The usual recovery for a missing CB_OFFLOAD is for the client to send an OFFLOAD_STATUS operation, but that operation does not carry a write verifier in its result. Neither does it carry a stable_how4 value, so the client /must/ send a COMMIT in this case -- which will always fail because currently there's still no write verifier in the COPY result. Thus the server needs to return a normal write verifier in its COPY result even if the COPY operation is to be performed asynchronously. If the server recognizes the callback stateid in subsequent OFFLOAD_STATUS operations, then obviously it has not restarted, and the write verifier the client received in the COPY result is still valid and can be used to assess a COMMIT of the copied data, if one is needed. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: track the main opcode for callbacksJeff Layton
Keep track of the "main" opcode for the callback, and display it in the tracepoint. This makes it simpler to discern what's happening when there is more than one callback in flight. The one special case is the CB_NULL RPC. That's not a CB_COMPOUND opcode, so designate the value 0 for that. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: move error choice for incorrect object types to version-specific code.NeilBrown
If an NFS operation expects a particular sort of object (file, dir, link, etc) but gets a file handle for a different sort of object, it must return an error. The actual error varies among NFS versions in non-trivial ways. For v2 and v3 there are ISDIR and NOTDIR errors and, for NFSv4 only, INVAL is suitable. For v4.0 there is also NFS4ERR_SYMLINK which should be used if a SYMLINK was found when not expected. This take precedence over NOTDIR. For v4.1+ there is also NFS4ERR_WRONG_TYPE which should be used in preference to EINVAL when none of the specific error codes apply. When nfsd_mode_check() finds a symlink where it expected a directory it needs to return an error code that can be converted to NOTDIR for v2 or v3 but will be SYMLINK for v4. It must be different from the error code returns when it finds a symlink but expects a regular file - that must be converted to EINVAL or SYMLINK. So we introduce an internal error code nfserr_symlink_not_dir which each version converts as appropriate. nfsd_check_obj_isreg() is similar to nfsd_mode_check() except that it is only used by NFSv4 and only for OPEN. NFSERR_INVAL is never a suitable error if the object is the wrong time. For v4.0 we use nfserr_symlink for non-dirs even if not a symlink. For v4.1 we have nfserr_wrong_type. We handle this difference in-place in nfsd_check_obj_isreg() as there is nothing to be gained by delaying the choice to nfsd4_map_status(). As a result of these changes, nfsd_mode_check() doesn't need an rqstp arg any more. Note that NFSv4 operations are actually performed in the xdr code(!!!) so to the only place that we can map the status code successfully is in nfsd4_encode_operation(). Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: Don't pass all of rqst into rqst_exp_find()NeilBrown
Rather than passing the whole rqst, pass the pieces that are actually needed. This makes the inputs to rqst_exp_find() more obvious. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: don't assume copy notify when preprocessing the stateidSagi Grimberg
Move the stateid handling to nfsd4_copy_notify. If nfs4_preprocess_stateid_op did not produce an output stateid, error out. Copy notify specifically does not permit the use of special stateids, so enforce that outside generic stateid pre-processing. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-07-08NFSD: Support write delegations in LAYOUTGETChuck Lever
I noticed LAYOUTGET(LAYOUTIOMODE4_RW) returning NFS4ERR_ACCESS unexpectedly. The NFS client had created a file with mode 0444, and the server had returned a write delegation on the OPEN(CREATE). The client was requesting a RW layout using the write delegation stateid so that it could flush file modifications. Creating a read-only file does not seem to be problematic for NFSv4.1 without pNFS, so I began looking at NFSD's implementation of LAYOUTGET. The failure was because fh_verify() was doing a permission check as part of verifying the FH presented during the LAYOUTGET. It uses the loga_iomode value to specify the @accmode argument to fh_verify(). fh_verify(MAY_WRITE) on a file whose mode is 0444 fails with -EACCES. To permit LAYOUT* operations in this case, add OWNER_OVERRIDE when checking the access permission of the incoming file handle for LAYOUTGET and LAYOUTCOMMIT. Cc: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org # v6.6+ Message-Id: 4E9C0D74-A06D-4DC3-A48A-73034DC40395@oracle.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-09NFSD: Force all NFSv4.2 COPY requests to be synchronousChuck Lever
We've discovered that delivering a CB_OFFLOAD operation can be unreliable in some pretty unremarkable situations. Examples include: - The server dropped the connection because it lost a forechannel NFSv4 request and wishes to force the client to retransmit - The GSS sequence number window under-flowed - A network partition occurred When that happens, all pending callback operations, including CB_OFFLOAD, are lost. NFSD does not retransmit them. Moreover, the Linux NFS client does not yet support sending an OFFLOAD_STATUS operation to probe whether an asynchronous COPY operation has finished. Thus, on Linux NFS clients, when a CB_OFFLOAD is lost, asynchronous COPY can hang until manually interrupted. I've tried a couple of remedies, but so far the side-effects are worse than the disease and they have had to be reverted. So temporarily force COPY operations to be synchronous so that the use of CB_OFFLOAD is avoided entirely. This is a fix that can easily be backported to LTS kernels. I am working on client patches that introduce an implementation of OFFLOAD_STATUS. Note that NFSD arbitrarily limits the size of a copy_file_range to 4MB to avoid indefinitely blocking an nfsd thread. A short COPY result is returned in that case, and the client can present a fresh COPY request for the remainder. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06NFSD: Add COPY status code to OFFLOAD_STATUS responseChuck Lever
Clients that send an OFFLOAD_STATUS might want to distinguish between an async COPY operation that is still running, has completed successfully, or that has failed. The intention of this patch is to make NFSD behave like this: * Copy still running: OFFLOAD_STATUS returns NFS4_OK, the number of bytes copied so far, and an empty osr_status array * Copy completed successfully: OFFLOAD_STATUS returns NFS4_OK, the number of bytes copied, and an osr_status of NFS4_OK * Copy failed: OFFLOAD_STATUS returns NFS4_OK, the number of bytes copied, and an osr_status other than NFS4_OK * Copy operation lost, canceled, or otherwise unrecognized: OFFLOAD_STATUS returns NFS4ERR_BAD_STATEID NB: Though RFC 7862 Section 11.2 lists a small set of NFS status codes that are valid for OFFLOAD_STATUS, there do not seem to be any explicit spec limits on the status codes that may be returned in the osr_status field. At this time we have no unit tests for COPY and its brethren, as pynfs does not yet implement support for NFSv4.2. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06NFSD: Record status of async copy operation in struct nfsd4_copyChuck Lever
After a client has started an asynchronous COPY operation, a subsequent OFFLOAD_STATUS operation will need to report the status code once that COPY operation has completed. The recorded status record will be used by a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06nfsd: trivial GET_DIR_DELEGATION supportJeff Layton
This adds basic infrastructure for handing GET_DIR_DELEGATION calls from clients, including the decoders and encoders. For now, it always just returns NFS4_OK + GDD4_UNAVAIL. Eventually clients may start sending this operation, and it's better if we can return GDD4_UNAVAIL instead of having to abort the whole compound. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: Fix NFSv3 atomicity bugs in nfsd_setattr()Trond Myklebust
The main point of the guarded SETATTR is to prevent races with other WRITE and SETATTR calls. That requires that the check of the guard time against the inode ctime be done after taking the inode lock. Furthermore, we need to take into account the 32-bit nature of timestamps in NFSv3, and the possibility that files may change at a faster rate than once a second. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: Fix a regression in nfsd_setattr()Trond Myklebust
Commit bb4d53d66e4b ("NFSD: use (un)lock_inode instead of fh_(un)lock for file operations") broke the NFSv3 pre/post op attributes behaviour when doing a SETATTR rpc call by stripping out the calls to fh_fill_pre_attrs() and fh_fill_post_attrs(). Fixes: bb4d53d66e4b ("NFSD: use (un)lock_inode instead of fh_(un)lock for file operations") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Message-ID: <20240216012451.22725-1-trondmy@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: make all of the nfsd stats per-network namespaceJosef Bacik
We have a global set of counters that we modify for all of the nfsd operations, but now that we're exposing these stats across all network namespaces we need to make the stats also be per-network namespace. We already have some caching stats that are per-network namespace, so move these definitions into the same counter and then adjust all the helpers and users of these stats to provide the appropriate nfsd_net struct so that the stats are maintained for the per-network namespace objects. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07NFSD: Modify NFSv4 to use nfsd_read_splice_ok()Chuck Lever
Avoid the use of an atomic bitop, and prepare for adding a run-time switch for using splice reads. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-30Merge tag 'nfsd-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "This release completes the SunRPC thread scheduler work that was begun in v6.6. The scheduler can now find an svc thread to wake in constant time and without a list walk. Thanks again to Neil Brown for this overhaul. Lorenzo Bianconi contributed infrastructure for a netlink-based NFSD control plane. The long-term plan is to provide the same functionality as found in /proc/fs/nfsd, plus some interesting additions, and then migrate the NFSD user space utilities to netlink. A long series to overhaul NFSD's NFSv4 operation encoding was applied in this release. The goals are to bring this family of encoding functions in line with the matching NFSv4 decoding functions and with the NFSv2 and NFSv3 XDR functions, preparing the way for better memory safety and maintainability. A further improvement to NFSD's write delegation support was contributed by Dai Ngo. This adds a CB_GETATTR callback, enabling the server to retrieve cached size and mtime data from clients holding write delegations. If the server can retrieve this information, it does not have to recall the delegation in some cases. The usual panoply of bug fixes and minor improvements round out this release. As always I am grateful to all contributors, reviewers, and testers" * tag 'nfsd-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (127 commits) svcrdma: Fix tracepoint printk format svcrdma: Drop connection after an RDMA Read error NFSD: clean up alloc_init_deleg() NFSD: Fix frame size warning in svc_export_parse() NFSD: Rewrite synopsis of nfsd_percpu_counters_init() nfsd: Clean up errors in nfs3proc.c nfsd: Clean up errors in nfs4state.c NFSD: Clean up errors in stats.c NFSD: simplify error paths in nfsd_svc() NFSD: Clean up nfsd4_encode_seek() NFSD: Clean up nfsd4_encode_offset_status() NFSD: Clean up nfsd4_encode_copy_notify() NFSD: Clean up nfsd4_encode_copy() NFSD: Clean up nfsd4_encode_test_stateid() NFSD: Clean up nfsd4_encode_exchange_id() NFSD: Clean up nfsd4_do_encode_secinfo() NFSD: Clean up nfsd4_encode_access() NFSD: Clean up nfsd4_encode_readdir() NFSD: Clean up nfsd4_encode_entry4() NFSD: Add an nfsd4_encode_nfs_cookie4() helper ...
2023-10-18nfsd: convert to new timestamp accessorsJeff Layton
Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-50-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-16NFSD: Clean up nfsd4_encode_copy_notify()Chuck Lever
Replace open-coded encoding logic with the use of conventional XDR utility functions. Note that if we replace the cpn_sec and cpn_nsec fields with a single struct timespec64 field, the encoder can use nfsd4_encode_nfstime4(), as that is the data type specified by the XDR spec. NFS4ERR_INVAL seems inappropriate if the encoder doesn't support encoding the response. Instead use NFS4ERR_SERVERFAULT, since this condition is a software bug on the server. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16NFSD: Remove a layering violation when encoding lock_deniedChuck Lever
An XDR encoder is responsible for marshaling results, not releasing memory that was allocated by the upper layer. We have .op_release for that purpose. Move the release of the ld_owner.data string to op_release functions for LOCK and LOCKT. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16NFSD: Clean up nfsd4_encode_layoutcommit()Chuck Lever
Adopt the use of conventional XDR utility functions. Restructure the encoder to better align with the XDR definition of the result. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16SUNRPC: change how svc threads are asked to exit.NeilBrown
svc threads are currently stopped using kthread_stop(). This requires identifying a specific thread. However we don't care which thread stops, just as long as one does. So instead, set a flag in the svc_pool to say that a thread needs to die, and have each thread check this flag instead of calling kthread_should_stop(). The first thread to find and clear this flag then moves towards exiting. This removes an explicit dependency on sp_all_threads which will make a future patch simpler. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>