summaryrefslogtreecommitdiff
path: root/net/sunrpc/xprtrdma/xprt_rdma.h
AgeCommit message (Collapse)Author
2015-03-31xprtrdma: Add a "deregister_external" op for each memreg modeChuck Lever
There is very little common processing among the different external memory deregistration functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com> Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com> Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31xprtrdma: Add a "register_external" op for each memreg modeChuck Lever
There is very little common processing among the different external memory registration functions. Have rpcrdma_create_chunks() call the registration method directly. This removes a stack frame and a switch statement from the external registration path. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com> Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com> Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31xprtrdma: Add a "max_payload" op for each memreg modeChuck Lever
The max_payload computation is generalized to ensure that the payload maximum is the lesser of RPC_MAX_DATA_SEGS and the number of data segments that can be transmitted in an inline buffer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com> Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com> Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31xprtrdma: Add vector of ops for each memory registration strategyChuck Lever
Instead of employing switch() statements, let's use the typical Linux kernel idiom for handling behavioral variation: virtual functions. Start by defining a vector of operations for each supported memory registration mode, and by adding a source file for each mode. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com> Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com> Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-03-31xprtrdma: Perform a full marshal on retransmitChuck Lever
Commit 6ab59945f292 ("xprtrdma: Update rkeys after transport reconnect" added logic in the ->send_request path to update the chunk list when an RPC/RDMA request is retransmitted. Note that rpc_xdr_encode() resets and re-encodes the entire RPC send buffer for each retransmit of an RPC. The RPC send buffer is not preserved from the previous transmission of an RPC. Revert 6ab59945f292, and instead, just force each request to be fully marshaled every time through ->send_request. This should preserve the fix from 6ab59945f292, while also performing pullup during retransmits. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com> Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com> Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-02-23xprtrdma: Store RDMA credits in unsigned variablesChuck Lever
Dan Carpenter's static checker pointed out: net/sunrpc/xprtrdma/rpc_rdma.c:879 rpcrdma_reply_handler() warn: can 'credits' be negative? "credits" is defined as an int. The credits value comes from the server as a 32-bit unsigned integer. A malicious or broken server can plant a large unsigned integer in that field which would result in an underflow in the following logic, potentially triggering a deadlock of the mount point by blocking the client from issuing more RPC requests. net/sunrpc/xprtrdma/rpc_rdma.c: 876 credits = be32_to_cpu(headerp->rm_credit); 877 if (credits == 0) 878 credits = 1; /* don't deadlock */ 879 else if (credits > r_xprt->rx_buf.rb_max_requests) 880 credits = r_xprt->rx_buf.rb_max_requests; 881 882 cwnd = xprt->cwnd; 883 xprt->cwnd = credits << RPC_CWNDSHIFT; 884 if (xprt->cwnd > cwnd) 885 xprt_release_rqst_cong(rqst->rq_task); Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: eba8ff660b2d ("xprtrdma: Move credit update to RPC . . .") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-02-05xprtrdma: Address sparse complaint in rpcr_to_rdmar()Chuck Lever
With "make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__": linux-2.6/net/sunrpc/xprtrdma/xprt_rdma.h:273:30: warning: incorrect type in initializer (different base types) linux-2.6/net/sunrpc/xprtrdma/xprt_rdma.h:273:30: expected restricted __be32 [usertype] *buffer linux-2.6/net/sunrpc/xprtrdma/xprt_rdma.h:273:30: got unsigned int [usertype] *rq_buffer As far as I can tell this is a false positive. Reported-by: kbuild-all@01.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30xprtrdma: Clean up after adding regbuf managementChuck Lever
rpcrdma_{de}register_internal() are used only in verbs.c now. MAX_RPCRDMAHDR is no longer used and can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30xprtrdma: Allocate zero pad separately from rpcrdma_bufferChuck Lever
Use the new rpcrdma_alloc_regbuf() API to shrink the amount of contiguous memory needed for a buffer pool by moving the zero pad buffer into a regbuf. This is for consistency with the other uses of internally registered memory. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30xprtrdma: Allocate RPC/RDMA receive buffer separately from struct rpcrdma_repChuck Lever
The rr_base field is currently the buffer where RPC replies land. An RPC/RDMA reply header lands in this buffer. In some cases an RPC reply header also lands in this buffer, just after the RPC/RDMA header. The inline threshold is an agreed-on size limit for RDMA SEND operations that pass from server and client. The sum of the RPC/RDMA reply header size and the RPC reply header size must be less than this threshold. The largest RDMA RECV that the client should have to handle is the size of the inline threshold. The receive buffer should thus be the size of the inline threshold, and not related to RPCRDMA_MAX_SEGS. RPC replies received via RDMA WRITE (long replies) are caught in rq_rcv_buf, which is the second half of the RPC send buffer. Ie, such replies are not involved in any way with rr_base. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30xprtrdma: Allocate RPC/RDMA send buffer separately from struct rpcrdma_reqChuck Lever
The rl_base field is currently the buffer where each RPC/RDMA call header is built. The inline threshold is an agreed-on size limit to for RDMA SEND operations that pass between client and server. The sum of the RPC/RDMA header size and the RPC header size must be less than or equal to this threshold. Increasing the r/wsize maximum will require MAX_SEGS to grow significantly, but the inline threshold size won't change (both sides agree on it). The server's inline threshold doesn't change. Since an RPC/RDMA header can never be larger than the inline threshold, make all RPC/RDMA header buffers the size of the inline threshold. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30xprtrdma: Allocate RPC send buffer separately from struct rpcrdma_reqChuck Lever
Because internal memory registration is an expensive and synchronous operation, xprtrdma pre-registers send and receive buffers at mount time, and then re-uses them for each RPC. A "hardway" allocation is a memory allocation and registration that replaces a send buffer during the processing of an RPC. Hardway must be done if the RPC send buffer is too small to accommodate an RPC's call and reply headers. For xprtrdma, each RPC send buffer is currently part of struct rpcrdma_req so that xprt_rdma_free(), which is passed nothing but the address of an RPC send buffer, can find its matching struct rpcrdma_req and rpcrdma_rep quickly via container_of / offsetof. That means that hardway currently has to replace a whole rpcrmda_req when it replaces an RPC send buffer. This is often a fairly hefty chunk of contiguous memory due to the size of the rl_segments array and the fact that both the send and receive buffers are part of struct rpcrdma_req. Some obscure re-use of fields in rpcrdma_req is done so that xprt_rdma_free() can detect replaced rpcrdma_req structs, and restore the original. This commit breaks apart the RPC send buffer and struct rpcrdma_req so that increasing the size of the rl_segments array does not change the alignment of each RPC send buffer. (Increasing rl_segments is needed to bump up the maximum r/wsize for NFS/RDMA). This change opens up some interesting possibilities for improving the design of xprt_rdma_allocate(). xprt_rdma_allocate() is now the one place where RPC send buffers are allocated or re-allocated, and they are now always left in place by xprt_rdma_free(). A large re-allocation that includes both the rl_segments array and the RPC send buffer is no longer needed. Send buffer re-allocation becomes quite rare. Good send buffer alignment is guaranteed no matter what the size of the rl_segments array is. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30xprtrdma: Add struct rpcrdma_regbuf and helpersChuck Lever
There are several spots that allocate a buffer via kmalloc (usually contiguously with another data structure) and then register that buffer internally. I'd like to split the buffers out of these data structures to allow the data structures to scale. Start by adding functions that can kmalloc and register a buffer, and can manage/preserve the buffer's associated ib_sge and ib_mr fields. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30xprtrdma: Simplify synopsis of rpcrdma_buffer_create()Chuck Lever
Clean up: There is one call site for rpcrdma_buffer_create(). All of the arguments there are fields of an rpcrdma_xprt. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30xprtrdma: Take struct ib_qp_attr and ib_qp_init_attr off the stackChuck Lever
Reduce stack footprint of the connection upcall handler function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30xprtrdma: Take struct ib_device_attr off the stackChuck Lever
Device attributes are large, and are used in more than one place. Stash a copy in dynamically allocated memory. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30xprtrdma: Remove rpcrdma_ep::rep_func and ::rep_xprtChuck Lever
Clean up: The rep_func field always refers to rpcrdma_conn_func(). rep_func should have been removed by commit b45ccfd25d50 ("xprtrdma: Remove MEMWINDOWS registration modes"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30xprtrdma: Move credit update to RPC reply handlerChuck Lever
Reduce work in the receive CQ handler, which can be run at hardware interrupt level, by moving the RPC/RDMA credit update logic to the RPC reply handler. This has some additional benefits: More header sanity checking is done before trusting the incoming credit value, and the receive CQ handler no longer touches the RPC/RDMA header (the CPU stalls while waiting for the header contents to be brought into the cache). This further extends work begun by commit e7ce710a8802 ("xprtrdma: Avoid deadlock when credit window is reset"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30xprtrdma: Remove rl_mr field, and the mr_chunk unionChuck Lever
Clean up: Since commit 0ac531c18323 ("xprtrdma: Remove REGISTER memory registration mode"), the rl_mr pointer is no longer used anywhere. After removal, there's only a single member of the mr_chunk union, so mr_chunk can be removed as well, in favor of a single pointer field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30xprtrdma: Remove rpcrdma_ep::rep_iaChuck Lever
Clean up: This field is not used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30xprtrdma: Rename "xprt" and "rdma_connect" fields in struct rpcrdma_xprtChuck Lever
Clean up: Use consistent field names in struct rpcrdma_xprt. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-11-25xprtrdma: Cap req_cqinitChuck Lever
Recent work made FRMR registration and invalidation completions unsignaled. This greatly reduces the adapter interrupt rate. Every so often, however, a posted send Work Request is allowed to signal. Otherwise, the provider's Work Queue will wrap and the workload will hang. The number of Work Requests that are allowed to remain unsignaled is determined by the value of req_cqinit. Currently, this is set to the size of the send Work Queue divided by two, minus 1. For FRMR, the send Work Queue is the maximum number of concurrent RPCs (currently 32) times the maximum number of Work Requests an RPC might use (currently 7, though some adapters may need more). For mlx4, this is 224 entries. This leaves completion signaling disabled for 111 send Work Requests. Some providers hold back dispatching Work Requests until a CQE is generated. If completions are disabled, then no CQEs are generated for quite some time, and that can stall the Work Queue. I've seen this occur running xfstests generic/113 over NFSv4, where eventually, posting a FAST_REG_MR Work Request fails with -ENOMEM because the Work Queue has overflowed. The connection is dropped and re-established. Cap the rep_cqinit setting so completions are not left turned off for too long. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=269 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-09-29svcrdma: advertise the correct max payloadSteve Wise
Svcrdma currently advertises 1MB, which is too large. The correct value is the minimum of RPCSVC_MAXPAYLOAD and the max scatter-gather allowed in an NFSRDMA IO chunk * the host page size. This bug is usually benign because the Linux X64 NFSRDMA client correctly limits the payload size to the correct value (64*4096 = 256KB). But if the Linux client is PPC64 with a 64KB page size, then the client will indeed use a payload size that will overflow the server. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31xprtrdma: Make rpcrdma_ep_disconnect() return voidChuck Lever
Clean up: The return code is used only for dprintk's that are already redundant. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Reset FRMRs when FAST_REG_MR is flushed by a disconnectChuck Lever
FAST_REG_MR Work Requests update a Memory Region's rkey. Rkey's are used to block unwanted access to the memory controlled by an MR. The rkey is passed to the receiver (the NFS server, in our case), and is also used by xprtrdma to invalidate the MR when the RPC is complete. When a FAST_REG_MR Work Request is flushed after a transport disconnect, xprtrdma cannot tell whether the WR actually hit the adapter or not. So it is indeterminant at that point whether the existing rkey is still valid. After the transport connection is re-established, the next FAST_REG_MR or LOCAL_INV Work Request against that MR can sometimes fail because the rkey value does not match what xprtrdma expects. The only reliable way to recover in this case is to deregister and register the MR before it is used again. These operations can be done only in a process context, so handle it in the transport connect worker. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Chain together all MWs in same buffer poolChuck Lever
During connection loss recovery, need to visit every MW in a buffer pool. Any MW that is in use by an RPC will not be on the rb_mws list. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Unclutter struct rpcrdma_mr_segChuck Lever
Clean ups: - make it obvious that the rl_mw field is a pointer -- allocated separately, not as part of struct rpcrdma_mr_seg - promote "struct {} frmr;" to a named type - promote the state enum to a named type - name the MW state field the same way other fields in rpcrdma_mw are named Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Update rkeys after transport reconnectChuck Lever
Various reports of: rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 ep ffff8800bfd3e848 Ensure that rkeys in already-marshalled RPC/RDMA headers are refreshed after the QP has been replaced by a reconnect. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=249 Suggested-by: Selvin Xavier <Selvin.Xavier@Emulex.Com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Limit data payload size for ALLPHYSICALChuck Lever
When the client uses physical memory registration, each page in the payload gets its own array entry in the RPC/RDMA header's chunk list. Therefore, don't advertise a maximum payload size that would require more array entries than can fit in the RPC buffer where RPC/RDMA headers are built. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=248 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Protect ia->ri_id when unmapping/invalidating MRsChuck Lever
Ensure ia->ri_id remains valid while invoking dma_unmap_page() or posting LOCAL_INV during a transport reconnect. Otherwise, ia->ri_id->device or ia->ri_id->qp is NULL, which triggers a panic. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=259 Fixes: ec62f40 'xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting' Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Avoid deadlock when credit window is resetChuck Lever
Update the cwnd while processing the server's reply. Otherwise the next task on the xprt_sending queue is still subject to the old credit window. Currently, no task is awoken if the old congestion window is still exceeded, even if the new window is larger, and a deadlock results. This is an issue during a transport reconnect. Servers don't normally shrink the credit window, but the client does reset it to 1 when reconnecting so the server can safely grow it again. As a minor optimization, remove the hack of grabbing the initial cwnd size (which happens to be RPC_CWNDSCALE) and using that value as the congestion scaling factor. The scaling value is invariant, and we are better off without the multiplication operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Limit work done by completion handlerChuck Lever
Sagi Grimberg <sagig@dev.mellanox.co.il> points out that a steady stream of CQ events could starve other work because of the boundless loop pooling in rpcrdma_{send,recv}_poll(). Instead of a (potentially infinite) while loop, return after collecting a budgeted number of completions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrmda: Reduce calls to ib_poll_cq() in completion handlersChuck Lever
Change the completion handlers to grab up to 16 items per ib_poll_cq() call. No extra ib_poll_cq() is needed if fewer than 16 items are returned. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Split the completion queueChuck Lever
The current CQ handler uses the ib_wc.opcode field to distinguish between event types. However, the contents of that field are not reliable if the completion status is not IB_WC_SUCCESS. When an error completion occurs on a send event, the CQ handler schedules a tasklet with something that is not a struct rpcrdma_rep. This is never correct behavior, and sometimes it results in a panic. To resolve this issue, split the completion queue into a send CQ and a receive CQ. The send CQ handler now handles only struct rpcrdma_mw wr_id's, and the receive CQ handler now handles only struct rpcrdma_rep wr_id's. Fix suggested by Shirley Ma <shirley.ma@oracle.com> Reported-by: Rafael Reiter <rafael.reiter@ims.co.at> Fixes: 5c635e09cec0feeeb310968e51dad01040244851 BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=73211 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Klemens Senn <klemens.senn@ims.co.at> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Make rpcrdma_ep_destroy() return voidChuck Lever
Clean up: rpcrdma_ep_destroy() returns a value that is used only to print a debugging message. rpcrdma_ep_destroy() already prints debugging messages in all error cases. Make rpcrdma_ep_destroy() return void instead. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Simplify rpcrdma_deregister_external() synopsisChuck Lever
Clean up: All remaining callers of rpcrdma_deregister_external() pass NULL as the last argument, so remove that argument. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Remove MEMWINDOWS registration modesChuck Lever
The MEMWINDOWS and MEMWINDOWS_ASYNC memory registration modes were intended as stop-gap modes before the introduction of FRMR. They are now considered obsolete. MEMWINDOWS_ASYNC is also considered unsafe because it can leave client memory registered and exposed for an indeterminant time after each I/O. At this point, the MEMWINDOWS modes add needless complexity, so remove them. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: RPC/RDMA must invoke xprt_wake_pending_tasks() in process contextChuck Lever
An IB provider can invoke rpcrdma_conn_func() in an IRQ context, thus rpcrdma_conn_func() cannot be allowed to directly invoke generic RPC functions like xprt_wake_pending_tasks(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: mind the device's max fast register page list depthSteve Wise
Some rdma devices don't support a fast register page list depth of at least RPCRDMA_MAX_DATA_SEGS. So xprtrdma needs to chunk its fast register regions according to the minimum of the device max supported depth or RPCRDMA_MAX_DATA_SEGS. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2013-02-01SUNRPC: Eliminate task->tk_xprt accesses that bypass rcu_dereference()Trond Myklebust
tk_xprt is just a shortcut for tk_client->cl_xprt, however cl_xprt is defined as an __rcu variable. Replace dereferences of tk_xprt with non-rcu dereferences where it is safe to do so. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-17svcrdma: Cleanup sparse warnings in the svcrdma moduleTom Tucker
The svcrdma transport was un-marshalling requests in-place. This resulted in sparse warnings due to __beXX data containing both NBO and HBO data. The code has been restructured to do byte-swapping as the header is parsed instead of when the header is validated immediately after receipt. Also moved extern declarations for the workqueue and memory pools to the private header file. Signed-off-by: Tom Tucker <tom@ogc.us> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-07-27Merge branch 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
* 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits) NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation() nfs: don't use d_move in nfs_async_rename_done RDMA: Increasing RPCRDMA_MAX_DATA_SEGS SUNRPC: Replace xprt->resend and xprt->sending with a priority queue SUNRPC: Allow caller of rpc_sleep_on() to select priority levels SUNRPC: Support dynamic slot allocation for TCP connections SUNRPC: Clean up the slot table allocation SUNRPC: Initalise the struct xprt upon allocation SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot pnfs: simplify pnfs files module autoloading nfs: document nfsv4 sillyrename issues NFS: Convert nfs4_set_ds_client to EXPORT_SYMBOL_GPL SUNRPC: Convert the backchannel exports to EXPORT_SYMBOL_GPL SUNRPC: sunrpc should not explicitly depend on NFS config options NFS: Clean up - simplify the switch to read/write-through-MDS NFS: Move the pnfs write code into pnfs.c NFS: Move the pnfs read code into pnfs.c NFS: Allow the nfs_pageio_descriptor to signal that a re-coalesce is needed NFS: Use the nfs_pageio_descriptor->pg_bsize in the read/write request NFS: Cache rpc_ops in struct nfs_pageio_descriptor ...
2011-07-26atomic: use <linux/atomic.h>Arun Sharma
This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-25RDMA: Increasing RPCRDMA_MAX_DATA_SEGSSteve Dickson
Our performance team has noticed that increasing RPCRDMA_MAX_DATA_SEGS from 8 to 64 significantly increases throughput when using the RDMA transport. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11RPCRDMA: Fix FRMR registration/invalidate handling.Tom Tucker
When the rpc_memreg_strategy is 5, FRMR are used to map RPC data. This mode uses an FRMR to map the RPC data, then invalidates (i.e. unregisers) the data in xprt_rdma_free. These FRMR are used across connections on the same mount, i.e. if the connection goes away on an idle timeout and reconnects later, the FRMR are not destroyed and recreated. This creates a problem for transport errors because the WR that invalidate an FRMR may be flushed (i.e. fail) leaving the FRMR valid. When the FRMR is later used to map an RPC it will fail, tearing down the transport and starting over. Over time, more and more of the FRMR pool end up in the wrong state resulting in seemingly random disconnects. This fix keeps track of the FRMR state explicitly by setting it's state based on the successful completion of a reg/inv WR. If the FRMR is ever used and found to be in the wrong state, an invalidate WR is prepended, re-syncing the FRMR state and avoiding the connection loss. Signed-off-by: Tom Tucker <tom@ogc.us> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10RPC/RDMA: harden connection logic against missing/late rdma_cm upcalls.Tom Talpey
Add defensive timeouts to wait_for_completion() calls in RDMA address resolution, and make them interruptible. Fix the timeout units to milliseconds (formerly jiffies) and move to private header. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10RPC/RDMA: adhere to protocol for unpadded client trailing write chunks.Tom Talpey
The RPC/RDMA protocol allows clients and servers to avoid RDMA operations for data which is purely the result of XDR padding. On the client, automatically insert the necessary padding for such server replies, and optionally don't marshal such chunks. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10RPC/RDMA: suppress retransmit on RPC/RDMA clients.Tom Talpey
An RPC/RDMA client cannot retransmit on an unbroken connection, doing so violates its flow control with the server. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10RPC/RDMA: add data types and new FRMR memory registration enum.Tom Talpey
Internal RPC/RDMA structure updates in preparation for FRMR support. Signed-off-by: Tom Talpey <talpey@netapp.com> Acked-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09RPCRDMA: rpc rdma transport switch\"Talpey, Thomas\
This implements the configuration and building of the core transport switch implementation of the rpcrdma transport. Stubs are provided for the rpcrdma protocol handling, and the infiniband/iwarp verbs interface. These are provided in following patches. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>