path: root/net/sunrpc/xprtrdma
Age | Commit message | Author
2020-11-30 | svcrdma: Use parsed chunk lists to derive the inv_rkey | Chuck Lever
Refactor: Don't duplicate header decoding smarts here. Instead, use the new parsed chunk lists. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 | svcrdma: Add a "parsed chunk list" data structure | Chuck Lever
This simple data structure binds the location of each data payload inside of an RPC message to the chunk that will be used to push it to or pull it from the client. There are several benefits to this small additional overhead:
* It enables support for more than one chunk in incoming Read and Write lists.
* It translates the version-specific on-the-wire format into a generic in-memory structure, enabling support for multiple versions of the RPC/RDMA transport protocol.
* It enables the server to re-organize a chunk list if it needs to adjust where Read chunk data lands in server memory without altering the contents of the XDR-encoded Receive buffer.
Construction of these lists is done while sanity checking each incoming RPC/RDMA header. Subsequent patches will make use of the generated data structures.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
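The binding this commit message describes, from a payload's position in the RPC message to the chunk that carries it, can be pictured with a small user-space sketch. Every name below (`pcl_segment`, `pcl_chunk`, `pcl_chunk_length`, `pcl_lookup`) is hypothetical and chosen for illustration; this is not the kernel's actual definition, only one plausible shape for a version-neutral in-memory form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one registered memory segment advertised by the client. */
struct pcl_segment {
    uint32_t handle;   /* remote key of the client's memory region */
    uint32_t length;   /* bytes covered by this segment */
    uint64_t offset;   /* remote address where the segment starts */
};

/* Hypothetical: a chunk binds one payload position to its segments. */
struct pcl_chunk {
    uint32_t position;               /* XDR offset of the payload in the message */
    size_t nsegs;                    /* number of entries in segs[] */
    const struct pcl_segment *segs;  /* segments that make up this chunk */
};

/* Total number of bytes this chunk can carry. */
static uint64_t pcl_chunk_length(const struct pcl_chunk *chunk)
{
    uint64_t total = 0;

    assert(chunk != NULL);
    for (size_t i = 0; i < chunk->nsegs; i++)
        total += chunk->segs[i].length;
    return total;
}

/* Find the chunk bound to a payload at the given XDR position, or NULL. */
static const struct pcl_chunk *
pcl_lookup(const struct pcl_chunk *chunks, size_t nchunks, uint32_t position)
{
    for (size_t i = 0; i < nchunks; i++)
        if (chunks[i].position == position)
            return &chunks[i];
    return NULL;
}
```

Because the list is an array of chunks rather than a single chunk, a lookup by position works the same way whether the client sent one chunk or several, which is the first benefit the message lists.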
2020-11-30 | svcrdma: Clean up svc_rdma_encode_reply_chunk() | Chuck Lever
Refactor: Match the control flow of svc_rdma_encode_write_list(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 | svcrdma: Post RDMA Writes while XDR encoding replies | Chuck Lever
The only RPC/RDMA ordering requirement between RDMA Writes and RDMA Sends is that the responder must post the Writes on the Send queue before posting the Send that conveys the RPC Reply for that Write payload.
The Linux NFS server implementation now has a transport method that can post result Payload Writes earlier than svc_rdma_sendto:
    ->xpo_result_payload()
This gets RDMA Writes going earlier so they are more likely to be complete at the remote end before the Send completes.
Some care must be taken with pulled-up Replies. We don't want to push the Write chunk and then send the same payload data via Send.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 | NFSD: Invoke svc_encode_result_payload() in "read" NFSD encoders | Chuck Lever
Have the NFSD encoders annotate the boundaries of every direct-data-placement eligible result data payload. Then change svcrdma to use that annotation instead of the xdr->page_len when handling Write chunks. For NFSv4 on RDMA, that enables the ability to recognize multiple result payloads per compound. This is a pre-requisite for supporting multiple Write chunks per RPC transaction. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 | SUNRPC: Rename svc_encode_read_payload() | Chuck Lever
Clean up: "result payload" is a less confusing name for these payloads. "READ payload" reflects only the NFS usage. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 | svcrdma: Refactor the RDMA Write path | Chuck Lever
Refactor for subsequent changes. Constify the xdr_buf argument to ensure the code here does not modify it, and to enable callers to pass in a "const struct xdr_buf *". At the same time, rename the helper functions, which emit RDMA Writes, not RDMA Sends, and add documenting comments. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 | svcrdma: Const-ify the xdr_buf arguments | Chuck Lever
Clean up: Ensure the code in rw.c does not modify the argument, and enable callers to also use "const struct xdr_buf *". Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 | svcrdma: Catch another Reply chunk overflow case | Chuck Lever
When space in the Reply chunk runs out in the middle of a segment, we end up passing a zero-length SGL to rdma_rw_ctx_init(), and it oopses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-11 | xprtrdma: Micro-optimize MR DMA-unmapping | Chuck Lever
Now that rpcrdma_ep is no longer part of rpcrdma_xprt, there are four or five serial address dereferences needed to get to the IB device needed for DMA unmapping. Instead, let's use the same pattern that regbufs use: cache a pointer to the device in the MR, and use that as the indication that unmapping is necessary. This also guarantees that the exact same device is used for DMA mapping and unmapping, even if the r_xprt's ep has been replaced. I don't think this can happen today, but future changes might break this assumption. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
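The pattern described here, caching the device pointer in the MR and treating a non-NULL pointer as "this MR is mapped", can be sketched in user-space C. The types and function names (`fake_device`, `sketch_mr`, `sketch_mr_map`, `sketch_mr_unmap`) are stand-ins invented for this illustration; the real code operates on kernel structures and calls the DMA API.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the DMA-capable device; counts unmap operations. */
struct fake_device {
    int unmap_calls;
};

/* Hypothetical MR: the cached device doubles as the "is mapped" flag. */
struct sketch_mr {
    struct fake_device *mr_device;  /* non-NULL only while DMA-mapped */
};

static void sketch_mr_map(struct sketch_mr *mr, struct fake_device *dev)
{
    assert(mr->mr_device == NULL);  /* must not map twice */
    /* A real implementation would DMA-map the scatterlist here. */
    mr->mr_device = dev;
}

static void sketch_mr_unmap(struct sketch_mr *mr)
{
    if (mr->mr_device) {
        /* Unmap with the same device that was used for mapping. */
        mr->mr_device->unmap_calls++;
        mr->mr_device = NULL;
    }
}
```

Unmapping needs only one pointer load from the MR itself, and calling `sketch_mr_unmap()` on an MR that was never mapped, or twice in a row, is harmless.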
2020-11-11 | xprtrdma: Move rpcrdma_mr_put() | Chuck Lever
Clean up: This function is now invoked only in frwr_ops.c. The move enables deduplication of the trace_xprtrdma_mr_unmap() call site. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-11 | xprtrdma: Trace unmap_sync calls | Chuck Lever
->buf_free is called nearly once per RPC. Only rarely does xprt_rdma_free() have to do anything, thus tracing every one of these calls seems unnecessary. Instead, just throw a trace event when that one occasional RPC still has MRs that need to be released. xprt_rdma_free() is further micro-optimized to reduce the amount of work done in the common case. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-11 | xprtrdma: Display the task ID when reporting MR events | Chuck Lever
Tie each MR event to the requesting rpc_task to make it easier to follow MR ownership and control flow. MR unmapping and recycling can happen in the background, after an MR's mr_req field is stale, so set up a separate tracepoint class for those events. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-11 | xprtrdma: Clean up trace_xprtrdma_nomrs() | Chuck Lever
- Rename it following the "_err" suffix convention
- Replace display of kernel memory addresses
- Tie MR exhaustion to a peer IP address, similar to the createmrs tracepoint
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-11 | xprtrdma: Clean up xprtrdma callback tracepoints | Chuck Lever
- Replace displayed kernel memory addresses
- Tie the XID and event with the peer's IP address
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-11 | xprtrdma: Clean up tracepoints in the reply path | Chuck Lever
Replace unnecessary display of kernel memory addresses. Also, there are no longer any trace_xprtrdma_defer_cmp() call sites. And remove the trace_xprtrdma_leaked_rep() tracepoint because there doesn't seem to be an overwhelming need to have a tracepoint for catching a software bug that has long since been fixed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-11 | xprtrdma: Clean up reply parsing error tracepoints | Chuck Lever
- Rename the tracepoints with the "_err" suffix to indicate these are rare error events
- Replace display of kernel memory addresses
- Tie the XID and error to a connection IP address instead
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-11 | xprtrdma: Clean up trace_xprtrdma_post_linv | Chuck Lever
- Replace the display of kernel memory addresses
- Add "_err" to the end of its name to indicate that it's a tracepoint that fires only when there's an error
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-11 | xprtrdma: Introduce FRWR completion IDs | Chuck Lever
Set up a completion ID in each rpcrdma_frwr. The ID is used to match an incoming completion to a transport (CQ) and other MR-related activity. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-11 | xprtrdma: Introduce Send completion IDs | Chuck Lever
Set up a completion ID in each rpcrdma_req. The ID is used to match an incoming Send completion to a transport and to a previous ib_post_send(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-11 | xprtrdma: Introduce Receive completion IDs | Chuck Lever
Set up a completion ID in each rpcrdma_rep. The ID is used to match an incoming Receive completion to a transport and to a previous ib_post_recv(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
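The completion IDs introduced for FRWR, Send, and Receive all serve the same purpose: a posted work request and its later completion can be matched without printing a kernel pointer. A minimal sketch follows, assuming an ID made of a queue number plus a per-object number. The struct layout, the names `sketch_cid`, `sketch_cid_equal`, and `sketch_cid_format`, and the output format are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical completion ID: which queue, and which object on it. */
struct sketch_cid {
    uint32_t queue_id;       /* identifies the completion queue (transport) */
    int32_t  completion_id;  /* identifies the req, rep, or MR */
};

/* True when a completion carries the same ID that was recorded at post time. */
static int sketch_cid_equal(const struct sketch_cid *a,
                            const struct sketch_cid *b)
{
    return a->queue_id == b->queue_id &&
           a->completion_id == b->completion_id;
}

/* Render the ID for a log line; no memory addresses are exposed. */
static int sketch_cid_format(const struct sketch_cid *cid,
                             char *buf, size_t len)
{
    assert(cid != NULL && buf != NULL);
    return snprintf(buf, len, "cq.id=%u cid=%d",
                    (unsigned int)cid->queue_id, (int)cid->completion_id);
}
```

Logging the same pair at post time and at completion time lets someone reading a trace line up the two events by eye or with a simple text filter.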
2020-11-11 | xprtrdma: Replace dprintk call sites in ERR_CHUNK path | Chuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-10-22 | Merge tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux | Linus Torvalds
Pull nfsd updates from Bruce Fields: "The one new feature this time, from Anna Schumaker, is READ_PLUS, which has the same arguments as READ but allows the server to return an array of data and hole extents. Otherwise it's a lot of cleanup and bugfixes"
* tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux: (43 commits)
  NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy
  SUNRPC: fix copying of multiple pages in gss_read_proxy_verf()
  sunrpc: raise kernel RPC channel buffer size
  svcrdma: fix bounce buffers for unaligned offsets and multiple pages
  nfsd: remove unneeded break
  net/sunrpc: Fix return value for sysctl sunrpc.transports
  NFSD: Encode a full READ_PLUS reply
  NFSD: Return both a hole and a data segment
  NFSD: Add READ_PLUS hole segment encoding
  NFSD: Add READ_PLUS data support
  NFSD: Hoist status code encoding into XDR encoder functions
  NFSD: Map nfserr_wrongsec outside of nfsd_dispatch
  NFSD: Remove the RETURN_STATUS() macro
  NFSD: Call NFSv2 encoders on error returns
  NFSD: Fix .pc_release method for NFSv2
  NFSD: Remove vestigial typedefs
  NFSD: Refactor nfsd_dispatch() error paths
  NFSD: Clean up nfsd_dispatch() variables
  NFSD: Clean up stale comments in nfsd_dispatch()
  NFSD: Clean up switch statement in nfsd_dispatch()
  ...
2020-10-16 | svcrdma: fix bounce buffers for unaligned offsets and multiple pages | Dan Aloni
This was discovered using O_DIRECT at the client side, with small unaligned file offsets or IOs that span multiple file pages. Fixes: e248aa7be86 ("svcrdma: Remove max_sge check at connect time") Signed-off-by: Dan Aloni <dan@kernelim.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-09-25 | net: sunrpc: delete repeated words | Randy Dunlap
Drop duplicate words in net/sunrpc/. Also fix "Anyone" to be "Any one". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-09-21 | xprtrdma: drop double zeroing | Julia Lawall
sg_init_table zeroes its first argument, so the allocation of that argument doesn't have to.
the semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/)
    // <smpl>
    @@
    expression x,n,flags;
    @@

    x =
    - kcalloc
    + kmalloc_array
      (n,sizeof(*x),flags)
    ...
    sg_init_table(x,n)
    // </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 | SUNRPC: Hoist trace_xprtrdma_op_setport into generic code | Chuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 | SUNRPC: Remove debugging instrumentation from xprt_release | Chuck Lever
These instruments don't appear to add any substantial value. We already have this at the termination of each RPC:
    iozone-2617 [002] 975.713126: rpc_stats_latency: task:418@5 xid=0x260eab5d nfsv3 LOOKUP backlog=15 rtt=32 execute=58
    iozone-2617 [002] 975.713127: xprt_release_cong: task:418@5 snd_task:4294967295 cong=256 cwnd=16384
    iozone-2617 [002] 975.713127: xprt_put_cong: task:418@5 snd_task:4294967295 cong=0 cwnd=16384
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 | SUNRPC: Hoist trace_xprtrdma_op_allocate into generic code | Chuck Lever
Introduce a tracepoint in call_allocate that reports the exact sizes in the RPC buffer allocation request and the status of the result. This helps catch problems with XDR buffer provisioning, and replaces transport-specific debugging instrumentation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-09 | Merge tag 'nfs-for-5.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds
Pull NFS client bugfixes from Trond Myklebust:
- Fix an NFS/RDMA resource leak
- Fix the error handling during delegation recall
- NFSv4.0 needs to return the delegation on a zero-stateid SETATTR
- Stop printk reading past end of string
* tag 'nfs-for-5.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: stop printk reading past end of string
  NFS: Zero-stateid SETATTR should first return delegation
  NFSv4.1 handle ERR_DELAY error reclaiming locking state on delegation recall
  xprtrdma: Release in-flight MRs on disconnect
2020-08-26 | xprtrdma: Release in-flight MRs on disconnect | Chuck Lever
Dan Aloni reports that when a server disconnects abruptly, a few memory regions are left DMA mapped. Over time this leak could pin enough I/O resources to slow or even deadlock an NFS/RDMA client. I found that if a transport disconnects before pending Send and FastReg WRs can be posted, the to-be-registered MRs are stranded on the req's rl_registered list and never released -- since they weren't posted, there's no Send completion to DMA unmap them. Reported-by: Dan Aloni <dan@kernelim.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
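The leak described here comes from MRs that sit on a request's list of registered MRs but were never posted, so no completion ever arrives to clean them up. One way to picture a remedy is an explicit drain of that list at disconnect time. The sketch below is a user-space illustration with invented names (`leak_mr`, `leak_req`, `leak_req_release_mrs`); it is not the code from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical MR that may still hold a DMA mapping. */
struct leak_mr {
    struct leak_mr *next;
    int mapped;            /* 1 while DMA-mapped */
};

/* Hypothetical request with a singly linked list of registered MRs. */
struct leak_req {
    struct leak_mr *registered;
};

/*
 * Pop every MR still on the request's list and drop its mapping.
 * Returns the number of MRs released.
 */
static int leak_req_release_mrs(struct leak_req *req)
{
    struct leak_mr *mr;
    int released = 0;

    assert(req != NULL);
    while ((mr = req->registered) != NULL) {
        req->registered = mr->next;
        mr->next = NULL;
        mr->mapped = 0;    /* stand-in for the DMA unmap */
        released++;
    }
    return released;
}
```

Calling the drain a second time finds an empty list and returns zero, so it is safe to invoke from more than one teardown path.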
2020-08-23 | treewide: Use fallthrough pseudo-keyword | Gustavo A. R. Silva
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-09 | Merge tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6 | Linus Torvalds
Pull NFS server updates from Chuck Lever:
"Highlights:
- Support for user extended attributes on NFS (RFC 8276)
- Further reduce unnecessary NFSv4 delegation recalls
Notable fixes:
- Fix recent krb5p regression
- Address a few resource leaks and a rare NULL dereference
Other:
- De-duplicate RPC/RDMA error handling and other utility functions
- Replace storage and display of kernel memory addresses by tracepoints"
* tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6: (38 commits)
  svcrdma: CM event handler clean up
  svcrdma: Remove transport reference counting
  svcrdma: Fix another Receive buffer leak
  SUNRPC: Refresh the show_rqstp_flags() macro
  nfsd: netns.h: delete a duplicated word
  SUNRPC: Fix ("SUNRPC: Add "@len" parameter to gss_unwrap()")
  nfsd: avoid a NULL dereference in __cld_pipe_upcall()
  nfsd4: a client's own opens needn't prevent delegations
  nfsd: Use seq_putc() in two functions
  svcrdma: Display chunk completion ID when posting a rw_ctxt
  svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send()
  svcrdma: Introduce Send completion IDs
  svcrdma: Record Receive completion ID in svc_rdma_decode_rqst
  svcrdma: Introduce Receive completion IDs
  svcrdma: Introduce infrastructure to support completion IDs
  svcrdma: Add common XDR encoders for RDMA and Read segments
  svcrdma: Add common XDR decoders for RDMA and Read segments
  SUNRPC: Add helpers for decoding list discriminators symbolically
  svcrdma: Remove declarations for functions long removed
  svcrdma: Clean up trace_svcrdma_send_failed() tracepoint
  ...
2020-07-28 | svcrdma: CM event handler clean up | Chuck Lever
Now that there's a core tracepoint that reports these events, there's no need to maintain dprintk() call sites in each arm of the switch statements. We also refresh the documenting comments. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-28 | svcrdma: Remove transport reference counting | Chuck Lever
Jason tells me that a ULP cannot rely on getting an ESTABLISHED and DISCONNECTED event pair for each connection, so transport reference counting in the CM event handler will never be reliable. Now that we have ib_drain_qp(), svcrdma should no longer need to hold transport references while Sends and Receives are posted. So remove the get/put call sites in the CM event handlers. This eliminates a significant source of locked memory bus traffic. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-28 | svcrdma: Fix another Receive buffer leak | Chuck Lever
During a connection tear down, the Receive queue is flushed before the device resources are freed. Typically, all the Receives flush with IB_WR_FLUSH_ERR. However, any pending successful Receives flush with IB_WR_SUCCESS, and the server automatically posts a fresh Receive to replace the completing one. This happens even after the connection has closed and the RQ is drained. Receives that are posted after the RQ is drained appear never to complete, causing a Receive resource leak. The leaked Receive buffer is left DMA-mapped. To prevent these late-posted recv_ctxt's from leaking, block new Receive posting after XPT_CLOSE is set. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
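The fix described in this message amounts to a gate in front of the Receive-posting path: once the transport is marked as closing, no replacement Receive is posted. A minimal user-space sketch of that gate follows. The struct and function names (`gate_xprt`, `gate_post_recv`) and the boolean flag are illustrative assumptions standing in for the transport and its XPT_CLOSE flag bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transport state. */
struct gate_xprt {
    bool closing;      /* stands in for the XPT_CLOSE flag */
    int posted_recvs;  /* Receives currently posted */
};

/*
 * Post a replacement Receive unless the transport is closing.
 * Returns true when a Receive was posted.
 */
static bool gate_post_recv(struct gate_xprt *xprt)
{
    assert(xprt != NULL);
    if (xprt->closing)
        return false;  /* a Receive posted now might never complete */
    xprt->posted_recvs++;
    return true;
}
```

A caller that gets `false` back should release the buffer it intended to post rather than keep it queued, otherwise the leak simply moves to the caller.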
2020-07-15 | xprtrdma: fix incorrect header size calculations | Colin Ian King
Currently the header size calculations are using an assignment operator instead of a += operator when accumulating the header size leading to incorrect sizes. Fix this by using the correct operator. Addresses-Coverity: ("Unused value") Fixes: 302d3deb2068 ("xprtrdma: Prevent inline overflow") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
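The class of bug fixed here is easy to reproduce in isolation. In the sketch below, the constants and function names (`HDR_FIXED`, `SEG_SIZE`, `hdr_size_buggy`, `hdr_size_fixed`) are made up for illustration and do not match the kernel's values; only the `=` versus `+=` difference mirrors the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sizes only; not the real RPC/RDMA header constants. */
enum { HDR_FIXED = 28, SEG_SIZE = 16 };

/* Buggy pattern: the second '=' discards the fixed header size. */
static size_t hdr_size_buggy(size_t nsegs)
{
    size_t size;

    assert(nsegs <= (SIZE_MAX - HDR_FIXED) / SEG_SIZE);
    size = HDR_FIXED;
    size = nsegs * SEG_SIZE;   /* overwrites instead of accumulating */
    return size;
}

/* Corrected pattern: '+=' accumulates each component. */
static size_t hdr_size_fixed(size_t nsegs)
{
    size_t size;

    assert(nsegs <= (SIZE_MAX - HDR_FIXED) / SEG_SIZE);
    size = HDR_FIXED;
    size += nsegs * SEG_SIZE;
    return size;
}
```

With two segments the buggy version yields 32 and the corrected one 60. The first assignment in the buggy version is never read, which is the kind of dead store a static analyzer reports as an unused value.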
2020-07-13 | svcrdma: Display chunk completion ID when posting a rw_ctxt | Chuck Lever
Re-use the post_rw tracepoint (safely) to trace cc_info lifetime events, including completion IDs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 | svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send() | Chuck Lever
First, refactor: Dereference the svc_rdma_send_ctxt inside svc_rdma_send() instead of at every call site. Then, it can be passed into trace_svcrdma_post_send() to get the proper completion ID. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 | svcrdma: Introduce Send completion IDs | Chuck Lever
Set up a completion ID in each svc_rdma_send_ctxt. The ID is used to match an incoming Send completion to a transport and to a previous ib_post_send(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 | svcrdma: Record Receive completion ID in svc_rdma_decode_rqst | Chuck Lever
When recording a trace event in the Receive path, tie decoding results and errors to an incoming Receive completion. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 | svcrdma: Introduce Receive completion IDs | Chuck Lever
Set up a completion ID in each svc_rdma_recv_ctxt. The ID is used to match an incoming Receive completion to a transport and to a previous ib_post_recv(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 | svcrdma: Add common XDR encoders for RDMA and Read segments | Chuck Lever
Clean up: De-duplicate some code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 | svcrdma: Add common XDR decoders for RDMA and Read segments | Chuck Lever
Clean up: De-duplicate some code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 | SUNRPC: Add helpers for decoding list discriminators symbolically | Chuck Lever
Use these helpers in a few spots to demonstrate their use. The remaining open-coded discriminator checks in rpcrdma will be addressed in subsequent patches. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 | svcrdma: Clean up trace_svcrdma_send_failed() tracepoint | Chuck Lever
- Use the _err naming convention instead
- Remove display of kernel memory address of the controlling xprt
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 | svcrdma: Consolidate send_error helper functions | Chuck Lever
Final refactor: Replace internals of svc_rdma_send_error() with a simple call to svc_rdma_send_error_msg(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 | svcrdma: Make svc_rdma_send_error_msg() a global function | Chuck Lever
Prepare for svc_rdma_send_error_msg() to be invoked from another source file. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 | svcrdma: Eliminate return value for svc_rdma_send_error_msg() | Chuck Lever
Like svc_rdma_send_error(), have svc_rdma_send_error_msg() handle any error conditions internally, rather than duplicating that recovery logic at every call site. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 | svcrdma: Add a @status parameter to svc_rdma_send_error_msg() | Chuck Lever
The common "send RDMA_ERR" function should be in svc_rdma_sendto.c, since that is where the other Send-related functions are located. So from here, I will beef up svc_rdma_send_error_msg() and deprecate svc_rdma_send_error(). A generic svc_rdma_send_error_msg() will need to handle both ERR_CHUNK and ERR_VERS. Copy that logic from svc_rdma_send_error() to svc_rdma_send_error_msg(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>