Age | Commit message (Collapse) | Author |
|
->buf_free is called nearly once per RPC. Only rarely does
xprt_rdma_free() have to do anything, thus tracing every one of
these calls seems unnecessary. Instead, just throw a trace event
when that one occasional RPC still has MRs that need to be
released.
xprt_rdma_free() is further micro-optimized to reduce the amount of
work done in the common case.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Tie each MR event to the requesting rpc_task to make it easier to
follow MR ownership and control flow.
MR unmapping and recycling can happen in the background, after an
MR's mr_req field is stale, so set up a separate tracepoint class
for those events.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
- Rename it following the "_err" suffix convention
- Replace display of kernel memory addresses
- Tie MR exhaustion to a peer IP address, similar to the createmrs
tracepoint
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
- Replace displayed kernel memory addresses
- Tie the XID and event with the peer's IP address
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Replace unnecessary display of kernel memory addresses.
Also, there are no longer any trace_xprtrdma_defer_cmp() call sites.
And remove the trace_xprtrdma_leaked_rep() tracepoint because there
doesn't seem to be an overwhelming need to have a tracepoint for
catching a software bug that has long since been fixed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
- Rename the tracepoints with the "_err" suffix to indicate these
are rare error events
- Replace display of kernel memory addresses
- Tie the XID and error to a connection IP address instead
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
- Replace the display of kernel memory addresses
- Add "_err" to the end of its name to indicate that it's a
tracepoint that fires only when there's an error
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Set up a completion ID in each rpcrdma_frwr. The ID is used to match
an incoming completion to a transport (CQ) and other MR-related
activity.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Set up a completion ID in each rpcrdma_req. The ID is used to match
an incoming Send completion to a transport and to a previous
ib_post_send().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Set up a completion ID in each rpcrdma_rep. The ID is used to match
an incoming Receive completion to a transport and to a previous
ib_post_recv().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Pull nfsd updates from Bruce Fields:
"The one new feature this time, from Anna Schumaker, is READ_PLUS,
which has the same arguments as READ but allows the server to return
an array of data and hole extents.
Otherwise it's a lot of cleanup and bugfixes"
* tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux: (43 commits)
NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy
SUNRPC: fix copying of multiple pages in gss_read_proxy_verf()
sunrpc: raise kernel RPC channel buffer size
svcrdma: fix bounce buffers for unaligned offsets and multiple pages
nfsd: remove unneeded break
net/sunrpc: Fix return value for sysctl sunrpc.transports
NFSD: Encode a full READ_PLUS reply
NFSD: Return both a hole and a data segment
NFSD: Add READ_PLUS hole segment encoding
NFSD: Add READ_PLUS data support
NFSD: Hoist status code encoding into XDR encoder functions
NFSD: Map nfserr_wrongsec outside of nfsd_dispatch
NFSD: Remove the RETURN_STATUS() macro
NFSD: Call NFSv2 encoders on error returns
NFSD: Fix .pc_release method for NFSv2
NFSD: Remove vestigial typedefs
NFSD: Refactor nfsd_dispatch() error paths
NFSD: Clean up nfsd_dispatch() variables
NFSD: Clean up stale comments in nfsd_dispatch()
NFSD: Clean up switch statement in nfsd_dispatch()
...
|
|
This was discovered using O_DIRECT at the client side, with small
unaligned file offsets or IOs that span multiple file pages.
Fixes: e248aa7be86 ("svcrdma: Remove max_sge check at connect time")
Signed-off-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Drop duplicate words in net/sunrpc/.
Also fix "Anyone" to be "Any one".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
sg_init_table zeroes its first argument, so the allocation of that argument
doesn't have to.
the semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x,n,flags;
@@
x =
- kcalloc
+ kmalloc_array
(n,sizeof(*x),flags)
...
sg_init_table(x,n)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
These instruments don't appear to add any substantial value.
We already have this at the termination of each RPC:
iozone-2617 [002] 975.713126: rpc_stats_latency: task:418@5 xid=0x260eab5d nfsv3 LOOKUP backlog=15 rtt=32 execute=58
iozone-2617 [002] 975.713127: xprt_release_cong: task:418@5 snd_task:4294967295 cong=256 cwnd=16384
iozone-2617 [002] 975.713127: xprt_put_cong: task:418@5 snd_task:4294967295 cong=0 cwnd=16384
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Introduce a tracepoint in call_allocate that reports the exact
sizes in the RPC buffer allocation request and the status of the
result. This helps catch problems with XDR buffer provisioning,
and replaces transport-specific debugging instrumentation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Pull NFS client bugfixes from Trond Myklebust:
- Fix an NFS/RDMA resource leak
- Fix the error handling during delegation recall
- NFSv4.0 needs to return the delegation on a zero-stateid SETATTR
- Stop printk reading past end of string
* tag 'nfs-for-5.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: stop printk reading past end of string
NFS: Zero-stateid SETATTR should first return delegation
NFSv4.1 handle ERR_DELAY error reclaiming locking state on delegation recall
xprtrdma: Release in-flight MRs on disconnect
|
|
Dan Aloni reports that when a server disconnects abruptly, a few
memory regions are left DMA mapped. Over time this leak could pin
enough I/O resources to slow or even deadlock an NFS/RDMA client.
I found that if a transport disconnects before pending Send and
FastReg WRs can be posted, the to-be-registered MRs are stranded on
the req's rl_registered list and never released -- since they
weren't posted, there's no Send completion to DMA unmap them.
Reported-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.
[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
|
Pull NFS server updates from Chuck Lever:
"Highlights:
- Support for user extended attributes on NFS (RFC 8276)
- Further reduce unnecessary NFSv4 delegation recalls
Notable fixes:
- Fix recent krb5p regression
- Address a few resource leaks and a rare NULL dereference
Other:
- De-duplicate RPC/RDMA error handling and other utility functions
- Replace storage and display of kernel memory addresses by tracepoints"
* tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6: (38 commits)
svcrdma: CM event handler clean up
svcrdma: Remove transport reference counting
svcrdma: Fix another Receive buffer leak
SUNRPC: Refresh the show_rqstp_flags() macro
nfsd: netns.h: delete a duplicated word
SUNRPC: Fix ("SUNRPC: Add "@len" parameter to gss_unwrap()")
nfsd: avoid a NULL dereference in __cld_pipe_upcall()
nfsd4: a client's own opens needn't prevent delegations
nfsd: Use seq_putc() in two functions
svcrdma: Display chunk completion ID when posting a rw_ctxt
svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send()
svcrdma: Introduce Send completion IDs
svcrdma: Record Receive completion ID in svc_rdma_decode_rqst
svcrdma: Introduce Receive completion IDs
svcrdma: Introduce infrastructure to support completion IDs
svcrdma: Add common XDR encoders for RDMA and Read segments
svcrdma: Add common XDR decoders for RDMA and Read segments
SUNRPC: Add helpers for decoding list discriminators symbolically
svcrdma: Remove declarations for functions long removed
svcrdma: Clean up trace_svcrdma_send_failed() tracepoint
...
|
|
Now that there's a core tracepoint that reports these events, there's
no need to maintain dprintk() call sites in each arm of the switch
statements.
We also refresh the documenting comments.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Jason tells me that a ULP cannot rely on getting an ESTABLISHED
and DISCONNECTED event pair for each connection, so transport
reference counting in the CM event handler will never be reliable.
Now that we have ib_drain_qp(), svcrdma should no longer need to
hold transport references while Sends and Receives are posted. So
remove the get/put call sites in the CM event handlers.
This eliminates a significant source of locked memory bus traffic.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
During a connection tear down, the Receive queue is flushed before
the device resources are freed. Typically, all the Receives flush
with IB_WR_FLUSH_ERR.
However, any pending successful Receives flush with IB_WR_SUCCESS,
and the server automatically posts a fresh Receive to replace the
completing one. This happens even after the connection has closed
and the RQ is drained. Receives that are posted after the RQ is
drained appear never to complete, causing a Receive resource leak.
The leaked Receive buffer is left DMA-mapped.
To prevent these late-posted recv_ctxt's from leaking, block new
Receive posting after XPT_CLOSE is set.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Currently the header size calculations are using an assignment
operator instead of a += operator when accumulating the header
size leading to incorrect sizes. Fix this by using the correct
operator.
Addresses-Coverity: ("Unused value")
Fixes: 302d3deb2068 ("xprtrdma: Prevent inline overflow")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Re-use the post_rw tracepoint (safely) to trace cc_info lifetime
events, including completion IDs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
First, refactor: Dereference the svc_rdma_send_ctxt inside
svc_rdma_send() instead of at every call site.
Then, it can be passed into trace_svcrdma_post_send() to get the
proper completion ID.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Set up a completion ID in each svc_rdma_send_ctxt. The ID is used
to match an incoming Send completion to a transport and to a
previous ib_post_send().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
When recording a trace event in the Receive path, tie decoding
results and errors to an incoming Receive completion.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Set up a completion ID in each svc_rdma_recv_ctxt. The ID is used
to match an incoming Receive completion to a transport and to a
previous ib_post_recv().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up: De-duplicate some code.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up: De-duplicate some code.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Use these helpers in a few spots to demonstrate their use.
The remaining open-coded discriminator checks in rpcrdma will be
addressed in subsequent patches.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
- Use the _err naming convention instead
- Remove display of kernel memory address of the controlling xprt
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Final refactor: Replace internals of svc_rdma_send_error() with a
simple call to svc_rdma_send_error_msg().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Prepare for svc_rdma_send_error_msg() to be invoked from another
source file.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Like svc_rdma_send_error(), have svc_rdma_send_error_msg() handle
any error conditions internally, rather than duplicating that
recovery logic at every call site.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The common "send RDMA_ERR" function should be in svc_rdma_sendto.c,
since that is where the other Send-related functions are located.
So from here, I will beef up svc_rdma_send_error_msg() and deprecate
svc_rdma_send_error().
A generic svc_rdma_send_error_msg() will need to handle both
ERR_CHUNK and ERR_VERS. Copy that logic from svc_rdma_send_error()
to svc_rdma_send_error_msg().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Another step towards making svc_rdma_send_error_msg() and
svc_rdma_send_error() similar enough to eliminate one of them.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Commit 4757d90b15d8 ("svcrdma: Report Write/Reply chunk overruns")
made an effort to preserve I/O pages until RDMA Write completion.
In a subsequent patch, I intend to de-duplicate the two functions
that send ERR_CHUNK responses. Pull the save_io_pages() call out of
svc_rdma_send_error_msg() to make it more like
svc_rdma_send_error().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Commit 07d0ff3b0cd2 ("svcrdma: Clean up Read chunk path") moved the
page saver logic so that it gets executed event when an error occurs.
In that case, the I/O is never posted, and those pages are then
leaked. Errors in this path, however, are quite rare.
Fixes: 07d0ff3b0cd2 ("svcrdma: Clean up Read chunk path")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Ensure that the connect worker is awoken if an attempt to establish
a connection is unsuccessful. Otherwise the worker waits forever
and the transport workload hangs.
Connect errors should not attempt to destroy the ep, since the
connect worker continues to use it after the handler runs, so these
errors are now handled independently of DISCONNECTED events.
Reported-by: Dan Aloni <dan@kernelim.com>
Fixes: e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
I noticed that when rpcrdma_xprt_connect() returns -ENOMEM,
instead of retrying the connect, the RPC client kills the
RPC task that requested the connection. We want a retry
here.
Fixes: cb586decbb88 ("xprtrdma: Make sendctx queue lifetime the same as connection lifetime")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Both Dan and I have observed two processes invoking
rpcrdma_xprt_disconnect() concurrently. In my case:
1. The connect worker invokes rpcrdma_xprt_disconnect(), which
drains the QP and waits for the final completion
2. This causes the newly posted Receive to flush and invoke
xprt_force_disconnect()
3. xprt_force_disconnect() sets CLOSE_WAIT and wakes up the RPC task
that is holding the transport lock
4. The RPC task invokes xprt_connect(), which calls ->ops->close
5. xprt_rdma_close() invokes rpcrdma_xprt_disconnect(), which tries
to destroy the QP.
Deadlock.
To prevent xprt_force_disconnect() from waking anything, handle the
clean up after a failed connection attempt in the xprt's sndtask.
The retry loop is removed from rpcrdma_xprt_connect() to ensure
that the newly allocated ep and id are properly released before
a REJECTED connection attempt can be retried.
Reported-by: Dan Aloni <dan@kernelim.com>
Fixes: e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
In the error paths, there's no need to call kfree(ep) after calling
rpcrdma_ep_put(ep).
Fixes: e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The RPC client currently doesn't handle ERR_CHUNK replies correctly.
rpcrdma_complete_rqst() incorrectly passes a negative number to
xprt_complete_rqst() as the number of bytes copied. Instead, set
task->tk_status to the error value, and return zero bytes copied.
In these cases, return -EIO rather than -EREMOTEIO. The RPC client's
finite state machine doesn't know what to do with -EREMOTEIO.
Additional clean ups:
- Don't double-count RDMA_ERROR replies
- Remove a stale comment
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@kernel.vger.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
1. Ensure that only rpcrdma_cm_event_handler() modifies
ep->re_connect_status to avoid racy changes to that field.
2. Ensure that xprt_force_disconnect() is invoked only once as a
transport is closed or destroyed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Refactor: Pass struct rpcrdma_xprt instead of an IB layer object.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up: Sometimes creating a fresh rpcrdma_ep can fail. That's why
xprt_rdma_connect() always checks if the r_xprt->rx_ep pointer is
valid before dereferencing it. Instead, xprt_rdma_connect() can
simply check rpcrdma_xprt_connect()'s return value.
Also, there's no need to set re_connect_status to zero just after
the rpcrdma_ep is created, since it is allocated with kzalloc.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|