|
There is an upper bound on the number of rdma_rw contexts that can
be created per QP.
This upper bound is invisible because rdma_create_qp() adds one or
more additional SQEs for each ctxt that the ULP requests via
qp_attr.cap.max_rdma_ctxs. The QP's actual Send Queue length is on
the order of the sum of qp_attr.cap.max_send_wr and a factor times
qp_attr.cap.max_rdma_ctxs. The factor can be up to three, depending
on whether MR operations are required before RDMA Reads.
This limit is not visible to RDMA consumers via dev->attrs. When the
limit is surpassed, QP creation fails with -ENOMEM. For example:
svcrdma's estimate of the number of rdma_rw contexts it needs is
three times the number of pages in RPCSVC_MAXPAGES. When MAXPAGES
is about 260, the internally-computed SQ length should be:
64 credits + 10 backlog + 3 * (3 * 260) = 2414
Which is well below the advertised qp_max_wr of 32768.
If RPCSVC_MAXPAGES is increased to 4MB, that's 1040 pages:
64 credits + 10 backlog + 3 * (3 * 1040) = 9434
However, QP creation fails. Dynamic printk for mlx5 shows:
calc_sq_size:618:(pid 1514): send queue size (9326 * 256 / 64 -> 65536) exceeds limits(32768)
Although 9326 is still far below qp_max_wr, the driver converts the
WQE count into 64-byte units (9326 * 256 / 64 = 37304) and rounds up
to the next power of two, 65536, which exceeds the 32768 limit, so
QP creation fails.
Because the total SQ length calculation is opaque to RDMA consumers,
there doesn't seem to be much that can be done about this except for
consumers to try to keep the requested rdma_rw ctxt count low.
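To make the hidden arithmetic concrete, here is a minimal sketch
(plain C; the factor of three is the worst case, covering MR
operations around each RDMA Read):

    /* Worst-case SQ length computed inside rdma_create_qp() */
    static unsigned int effective_sq_length(unsigned int max_send_wr,
                                            unsigned int max_rdma_ctxs)
    {
            return max_send_wr + 3 * max_rdma_ctxs;
    }

    /* The first example above: effective_sq_length(64 + 10, 3 * 260)
     * returns 2414.
     */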
Fixes: 2da0f610e733 ("svcrdma: Increase the per-transport rw_ctx count")
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
To handle device removal, svc_rdma_accept() requests removal
notification for the underlying device when accepting a connection.
However, svc_rdma_free() is not invoked when svc_rdma_accept() fails.
There needs to be a matching unregister on that error path; otherwise
the device cannot be removed.
Fixes: c4de97f7c454 ("svcrdma: Handle device removal outside of the CM event handler")
Cc: stable@vger.kernel.org
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
On an RDMA-capable machine, starting, stopping, and then restarting
a knfsd server triggers a kref underflow warning on the subsequent
stop. svc_rdma_free() indiscriminately unregisters the RDMA device,
but a listening transport never calls rdma_rn_register(). The kref
therefore drops to zero on the first stop of the server, and the
second stop underflows it.
Suggested-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: c4de97f7c454 ("svcrdma: Handle device removal outside of the CM event handler")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Using ERR_CAST() is more reasonable and safer when it is necessary
to convert the type of an error pointer and return it.
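For example, a minimal sketch (the foo/bar types and helpers are
hypothetical):

    #include <linux/err.h>

    struct bar *bar_create(void);        /* may return ERR_PTR(-ENOMEM) */
    struct foo *foo_from_bar(struct bar *b);

    struct foo *foo_create(void)
    {
            struct bar *b = bar_create();

            if (IS_ERR(b))
                    return ERR_CAST(b);  /* not ERR_PTR(PTR_ERR(b)) */
            return foo_from_bar(b);
    }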
Signed-off-by: Yan Zhen <yanzhen@vivo.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Synchronously wait for all disconnects to complete to ensure the
transports have divested all hardware resources before the
underlying RDMA device can safely be removed.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Sagi tells me that when a bonded device reports an address change,
the consumer must destroy its listener IDs and create new ones.
See commit a032e4f6d60d ("nvmet-rdma: fix bonding failover possible
NULL deref").
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
In a moment, I will add a second consumer of CMA ID creation in
svcrdma. Refactor so this code can be reused.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
rdma_rw_mr_factor() returns the smallest number of MRs needed to
move a particular number of pages. svcrdma currently asks for the
number of MRs needed to move RPCSVC_MAXPAGES (a little over one
megabyte), as that is the number of pages in the largest r/wsize
the server supports.
This call assumes that the client's NIC can bundle a full one
megabyte payload in a single rdma_segment. In fact, most NICs cannot
handle a full megabyte with a single rkey / rdma_segment. Clients
will typically split even a single Read chunk into many segments.
The server needs one MR to read each rdma_segment in a Read chunk,
and thus each one needs an rw_ctx.
svcrdma has been vastly underestimating the number of rw_ctxs needed
to handle 64 RPC requests with large Read chunks using small
rdma_segments.
Unfortunately there doesn't seem to be a good way to estimate this
number without knowing the client NIC's capabilities. Even then,
the client RPC/RDMA implementation is still free to split a chunk
into smaller segments (for example, it might be using physical
registration, which needs an rdma_segment per page).
The best we can do for now is choose a number that will guarantee
forward progress in the worst case (one page per segment).
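A sketch of the sizing change (illustrative; the multiplier of three
matches the "3 * RPCSVC_MAXPAGES" estimate quoted in the first entry
of this log):

    /* Before: the fewest MRs able to move a max-size payload, which
     * assumes the client uses as few rdma_segments as possible:
     *
     *      ctxts = rdma_rw_mr_factor(dev, port_num, RPCSVC_MAXPAGES);
     *
     * After: assume the worst case of one rdma_segment (and one
     * rw_ctx) per page, so rw_ctx exhaustion cannot stall progress.
     */
    ctxts = 3 * RPCSVC_MAXPAGES;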
At some later point, we could add some mechanisms to make this
much less of a problem:
- Add a core API to add more rw_ctxs to an already-established QP
- svcrdma could treat rw_ctx exhaustion as a temporary error and
try again
- Limit the number of Reads in flight
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
rdma_create_qp() can modify cap.max_send_sge. Copy the new value
to the svcrdma transport so it is bound by the new limit instead
of the requested one.
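A sketch of the copy-back (the helper name is illustrative; field
names follow struct svcxprt_rdma):

    static int svc_rdma_qp_create(struct svcxprt_rdma *newxprt,
                                  struct ib_qp_init_attr *qp_attr)
    {
            int ret;

            /* Request the SGE depth svcrdma would like to use */
            qp_attr->cap.max_send_sge = newxprt->sc_max_send_sges;
            ret = rdma_create_qp(newxprt->sc_cm_id, newxprt->sc_pd,
                                 qp_attr);
            if (ret)
                    return ret;

            /* The core may have reduced the cap; honor its value */
            newxprt->sc_max_send_sges = qp_attr->cap.max_send_sge;
            return 0;
    }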
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Check that svc_rdma_accept() is allocating an appropriate number of
CQEs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Do as other ULPs already do: ensure there is an extra Receive WQE
reserved for the tear-down drain WR. I haven't heard reports of
problems but it can't hurt.
Note that rq_depth is used to compute the Send Queue depth as well,
so this fix should affect both the SQ and RQ.
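The arithmetic change is a one-liner (a sketch; the variable names
are assumptions modeled on svc_rdma_accept()):

    /* Reserve an extra Receive WQE for the tear-down drain WR.
     * rq_depth also feeds the Send Queue depth calculation.
     */
    rq_depth = newxprt->sc_max_requests + newxprt->sc_max_bc_requests +
               newxprt->sc_recv_batch + 1 /* drain */;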
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Having an nfsd thread waiting for an RDMA Read completion is
problematic if the Read responder (ie, the client) stops responding.
We need to go back to handling RDMA Reads by allowing the nfsd
thread to return to the svc scheduler, then waking a second thread
to finish the RPC message once the Read completion fires.
As a next step, add a list_head upon which completed Reads are queued.
A subsequent patch will make use of this queue.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The comment that starts "Qualify ..." applies to only some of the
following code paragraph. Re-arrange the lines so the comment makes
more sense.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
These won't have much diagnostic value for site administrators.
Since they can't be disabled, they become noise.
What's more, the subsequent rdma_create_qp() call adjusts the Send
Queue size (possibly downward) without warning, making the size
reported by these pr_warns inaccurate.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
There are a couple of dprintk() call sites in svc_rdma_accept()
that show pointer addresses. These days, displayed pointer addresses
are hashed and thus have little or no diagnostic value, especially
for site administrators.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Two svcrdma-related transport locks can become quite contended.
Collate their use and make them easy to find in /proc/lock_stat for
better observability.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
To handle work in the background, set up an UNBOUND workqueue for
svcrdma. Subsequent patches will make use of it.
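A sketch of the setup (the init helper is illustrative):

    #include <linux/workqueue.h>

    static struct workqueue_struct *svcrdma_wq;

    static int svc_rdma_init_wq(void)
    {
            svcrdma_wq = alloc_workqueue("svcrdma", WQ_UNBOUND, 0);
            return svcrdma_wq ? 0 : -ENOMEM;
    }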
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The physical device's NUMA node ID is available when allocating an
svc_xprt for an incoming connection. Use that value to ensure the
svc_xprt structure is allocated on the NUMA node closest to the
device.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Since the ->xprt_ctxt pointer was added to svc_deferred_req, it has not
been sufficient to use kfree() to free a deferred request. We may need
to free the ctxt as well.
As freeing the ctxt is all that ->xpo_release_rqst() does, we
repurpose it to do that explicitly even when the ctxt is not stored
in an rqst.
So we now have ->xpo_release_ctxt() which is given an xprt and a ctxt,
which may have been taken either from an rqst or from a dreq. The
caller is now responsible for clearing that pointer after the call to
->xpo_release_ctxt.
We also clear dr->xprt_ctxt when the ctxt is moved into a new rqst when
revisiting a deferred request. This ensures there is only one pointer
to the ctxt, so the risk of double freeing in future is reduced. The
new code in svc_xprt_release which releases both the ctxt and any
rq_deferred depends on this.
Fixes: 773f91b2cf3f ("SUNRPC: Fix NFSD's request deferral on RDMA transports")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
There's no need for the cost of this extra virtual function call
during every RPC transaction: the RQ_SECURE bit can be set properly
in ->xpo_recvfrom() instead.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The premise that "Once an svc thread is scheduled and executing an
RPC, no other processes will touch svc_rqst::rq_flags" is false.
svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd
threads when determining which thread to wake up next.
Found via KCSAN.
Fixes: 28df0988815f ("SUNRPC: Use RMW bitops in single-threaded hot paths")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
I noticed CPU pipeline stalls while using perf.
Once an svc thread is scheduled and executing an RPC, no other
processes will touch svc_rqst::rq_flags. Thus bus-locked atomics are
not needed outside the svc thread scheduler.
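For instance, in a section owned by a single svc thread (a sketch;
note the preceding entry, which restores atomics where this
single-threaded assumption does not hold):

    /* Non-atomic RMW bitops avoid LOCK-prefixed bus traffic */
    __set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags);    /* was set_bit() */
    __clear_bit(RQ_DROPME, &rqstp->rq_flags);       /* was clear_bit() */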
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
svc_xprt_free() already "puts" the bc_xprt before calling the
transport's "free" method. No need to do it twice.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Relieve contention on sc_rw_ctxt_lock by converting rdma->sc_rw_ctxts
to an llist.
The goal is to reduce the average overhead of Send completions,
because a transport's completion handlers are single-threaded on
one CPU core. This change reduces CPU utilization of each Send
completion by 2-3% on my server.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-By: Tom Talpey <tom@talpey.com>
|
|
/proc/lock_stat indicates that the sc_send_lock is heavily
contended when the server is under load from a single client.
To address this, convert the send_ctxt free list to an llist.
Returning an item to the send_ctxt cache is now waitless, which
reduces the instruction path length in the single-threaded Send
handler (svc_rdma_wc_send).
The goal is to enable the ib_comp_wq worker to handle a higher
RPC/RDMA Send completion rate given the same CPU resources. This
change reduces CPU utilization of Send completion by 2-3% on my
server.
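A sketch of the conversion (field names follow mainline
svcxprt_rdma and svc_rdma_send_ctxt):

    #include <linux/llist.h>

    /* Returning a ctxt is now a single waitless atomic push */
    static void svc_rdma_send_ctxt_put(struct svcxprt_rdma *rdma,
                                       struct svc_rdma_send_ctxt *ctxt)
    {
            llist_add(&ctxt->sc_node, &rdma->sc_send_ctxts);
    }

    /* llist_del_first() requires a single consumer or serialized
     * consumers, so the get side retains sc_send_lock
     */
    static struct svc_rdma_send_ctxt *
    svc_rdma_send_ctxt_get(struct svcxprt_rdma *rdma)
    {
            struct llist_node *node;

            spin_lock(&rdma->sc_send_lock);
            node = llist_del_first(&rdma->sc_send_ctxts);
            spin_unlock(&rdma->sc_send_lock);
            if (!node)
                    return NULL;    /* caller allocates a fresh ctxt */
            return llist_entry(node, struct svc_rdma_send_ctxt, sc_node);
    }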
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-By: Tom Talpey <tom@talpey.com>
|
|
Now that svc_rdma_recvfrom() waits for Read completion,
sc_read_complete_q is no longer used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Refactor a bit of commonly used logic so that every site that wants
a close deferred to an nfsd thread does all the right things
(set_bit(XPT_CLOSE) then enqueue).
Also, once XPT_CLOSE is set on a transport, it is never cleared. If
XPT_CLOSE is already set, then the close is already being handled
and the enqueue can be skipped.
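The helper boils down to this (mainline names it
svc_xprt_deferred_close()):

    void svc_xprt_deferred_close(struct svc_xprt *xprt)
    {
            if (!test_and_set_bit(XPT_CLOSE, &xprt->xpt_flags))
                    svc_xprt_enqueue(xprt);
    }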
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Post more Receives when the number of pending Receives drops below
a low-water mark. The batch mechanism is disabled if the underlying
device cannot support a reasonably-sized Receive Queue.
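In outline, the Receive completion path becomes (all names here are
illustrative):

    static void svc_rdma_refresh_recvs(struct svcxprt_rdma *rdma)
    {
            /* Batching is disabled (batch == 0) when the device's
             * Receive Queue cannot be made large enough.
             */
            if (!rdma->sc_recv_batch)
                    return;
            if (atomic_read(&rdma->sc_pending_recvs) <
                rdma->sc_recv_lowater)
                    svc_rdma_post_recvs(rdma, rdma->sc_recv_batch);
    }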
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up: explain why svc_xprt_enqueue() is invoked in the event
handler even though no xpt_flags bits are toggled here.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
RDMA core mutex locking was restructured by commit d114c6feedfe
("RDMA/cma: Add missing locking to rdma_accept()") [Aug 2020]. When
lock debugging is enabled, the RPC/RDMA server trips over the new
lockdep assertion in rdma_accept() because it doesn't call
rdma_accept() from its CM event handler.
As a temporary fix, have svc_rdma_accept() take the handler_mutex
explicitly. In the meantime, let's consider how to restructure the
RPC/RDMA transport to invoke rdma_accept() from the proper context.
Calls to svc_rdma_accept() are serialized with calls to
svc_rdma_free() by the generic RPC server layer.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/linux-rdma/20210209154014.GO4247@nvidia.com/
Fixes: d114c6feedfe ("RDMA/cma: Add missing locking to rdma_accept()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up: "result payload" is a less confusing name for these
payloads. "READ payload" reflects only the NFS usage.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Now that there's a core tracepoint that reports these events, there's
no need to maintain dprintk() call sites in each arm of the switch
statements.
We also refresh the documenting comments.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Jason tells me that a ULP cannot rely on getting an ESTABLISHED
and DISCONNECTED event pair for each connection, so transport
reference counting in the CM event handler will never be reliable.
Now that we have ib_drain_qp(), svcrdma should no longer need to
hold transport references while Sends and Receives are posted. So
remove the get/put call sites in the CM event handlers.
This eliminates a significant source of locked memory bus traffic.
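With a drain, tear-down simply waits for posted WRs to flush (a
sketch of the pattern in svc_rdma_free()):

    /* Quiesce the QP so no completions race with release */
    if (rdma->sc_qp && !IS_ERR(rdma->sc_qp))
            ib_drain_qp(rdma->sc_qp);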
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up: De-duplicate some code.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
In lieu of dprintks or tracepoints in each individual transport
implementation, introduce tracepoints in the generic part of the RPC
layer. These typically fire for connection lifetime events, so
shouldn't contribute a lot of noise.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Failure to accept a connection is typically due to a problem
specific to a transport type. Also, ->xpo_accept returns NULL
on error rather than reporting a specific problem.
So, add failure-specific tracepoints in svc_rdma_accept().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up: After commit 1e091c3bbf51 ("svcrdma: Ignore source port
when computing DRC hash"), the IP address stored in xpt_remote
always has a port number of zero. Thus, there's no need to display
the port number when displaying the IP address of a remote NFS/RDMA
client.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Utilize the xpo_release_rqst transport method to ensure that each
rqstp's svc_rdma_recv_ctxt object is released even when the server
cannot return a Reply for that rqstp.
Without this fix, each RPC whose Reply cannot be sent leaks one
svc_rdma_recv_ctxt. This is a 2.5KB structure, a 4KB DMA-mapped
Receive buffer, and any pages that might be part of the Reply
message.
The leak is infrequent unless the network fabric is unreliable or
Kerberos is in use, as GSS sequence window overruns, which result
in connection loss, are more common on fast transports.
Fixes: 3a88092ee319 ("svcrdma: Preserve Receive buffer until svc_rdma_sendto")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up. This trace point is no longer needed because the RDMA/core
CMA code has an equivalent trace point that was added by commit
ed999f820a6c ("RDMA/cma: Add trace points in RDMA Connection
Manager").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
svcrdma expects that the payload falls precisely into the xdr_buf
page vector. This does not seem to be the case for
nfsd4_encode_readv().
This code is called only when fops->splice_read is missing or when
RQ_SPLICE_OK is clear, so it's not a noticeable problem in many
common cases.
Add a new transport method, ->xpo_read_payload, so that when a READ
payload does not fit exactly in rq_res's page vector, the XDR
encoder can inform the RPC transport exactly where that payload is,
without the payload's XDR pad.
That way, when a Write chunk is present, the transport knows what
byte range in the Reply message is supposed to be matched with the
chunk.
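The shape of the new method (a sketch; the exact signature is an
assumption based on the description above):

    struct svc_xprt_ops {
            /* ... existing methods ... */
            int (*xpo_read_payload)(struct svc_rqst *rqstp,
                                    unsigned int offset,
                                    unsigned int length);
    };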
Note that the Linux NFS server implementation of NFS/RDMA can
currently handle only one Write chunk per RPC-over-RDMA message.
This simplifies the implementation of this fix.
Fixes: b04209806384 ("nfsd4: allow exotic read compounds")
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Pull nfsd updates from Bruce Fields:
"Highlights:
- Add a new knfsd file cache, so that we don't have to open and close
on each (NFSv2/v3) READ or WRITE. This can speed up read and write
in some cases. It also replaces our readahead cache.
- Prevent silent data loss on write errors, by treating write errors
like server reboots for the purposes of write caching, thus forcing
clients to resend their writes.
- Tweak the code that allocates sessions to be more forgiving, so
that NFSv4.1 mounts are less likely to hang when a server already
has a lot of clients.
- Eliminate an arbitrary limit on NFSv4 ACL sizes; they should now be
limited only by the backend filesystem and the maximum RPC size.
- Allow the server to enforce use of the correct kerberos credentials
when a client reclaims state after a reboot.
And some miscellaneous smaller bugfixes and cleanup"
* tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux: (34 commits)
sunrpc: clean up indentation issue
nfsd: fix nfs read eof detection
nfsd: Make nfsd_reset_boot_verifier_locked static
nfsd: degraded slot-count more gracefully as allocation nears exhaustion.
nfsd: handle drc over-allocation gracefully.
nfsd: add support for upcall version 2
nfsd: add a "GetVersion" upcall for nfsdcld
nfsd: Reset the boot verifier on all write I/O errors
nfsd: Don't garbage collect files that might contain write errors
nfsd: Support the server resetting the boot verifier
nfsd: nfsd_file cache entries should be per net namespace
nfsd: eliminate an unnecessary acl size limit
Deprecate nfsd fault injection
nfsd: remove duplicated include from filecache.c
nfsd: Fix the documentation for svcxdr_tmpalloc()
nfsd: Fix up some unused variable warnings
nfsd: close cached files prior to a REMOVE or RENAME that would replace target
nfsd: rip out the raparms cache
nfsd: have nfsd_test_lock use the nfsd_file cache
nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
...
|
|
Use a wait-free mechanism for managing the svc_rdma_recv_ctxts free
list. Subsequently, sc_recv_lock can be eliminated.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Clean up: the system workqueue will work just as well.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Send and Receive completion is handled on a single CPU selected at
the time each Completion Queue is allocated. Typically this is when
an initiator instantiates an RDMA transport, or when a target
accepts an RDMA connection.
Some ULPs cannot open a connection per CPU to spread completion
workload across available CPUs and MSI vectors. For such ULPs,
provide an API that allows the RDMA core to select a completion
vector based on the device's complement of available comp_vecs.
ULPs that invoke ib_alloc_cq() with only comp_vector 0 are converted
to use the new API so that their completion workloads interfere less
with each other.
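Conversion for a ULP is mechanical (a sketch):

    /* Was: ib_alloc_cq(dev, priv, nr_cqe, 0, IB_POLL_WORKQUEUE);
     * the core now spreads consumers across dev's comp_vectors.
     */
    cq = ib_alloc_cq_any(dev, priv, nr_cqe, IB_POLL_WORKQUEUE);
    if (IS_ERR(cq))
            return PTR_ERR(cq);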
Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Cc: <linux-cifs@vger.kernel.org>
Cc: <v9fs-developer@lists.sourceforge.net>
Link: https://lore.kernel.org/r/20190729171923.13428.52555.stgit@manet.1015granger.net
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
The DRC appears to be effectively empty after an RPC/RDMA transport
reconnect. The problem is that each connection uses a different
source port, which defeats the DRC hash.
Clients always have to disconnect before they send retransmissions
to reset the connection's credit accounting, thus every retransmit
on NFS/RDMA will miss the DRC.
An NFS/RDMA client's IP source port is meaningless for RDMA
transports. The transport layer typically sets the source port value
on the connection to a random ephemeral port. The server already
ignores it for the "secure port" check. See commit 16e4d93f6de7
("NFSD: Ignore client's source port on RDMA transports").
The Linux NFS server's DRC resolves XID collisions from the same
source IP address by using the checksum of the first 200 bytes of
the RPC call header.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
These can result in a lot of log noise, and are able to be triggered
by client misbehavior. Since there are trace points in these
handlers now, there's no need to spam the log.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
CC [M] net/sunrpc/xprtrdma/svc_rdma_transport.o
linux/net/sunrpc/xprtrdma/svc_rdma_transport.c: In function ‘svc_rdma_accept’:
linux/net/sunrpc/xprtrdma/svc_rdma_transport.c:452:19: warning: variable ‘sap’ set but not used [-Wunused-but-set-variable]
struct sockaddr *sap;
^
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Two and a half years ago, the client was changed to use gathered
Send for larger inline messages, in commit 655fec6987b ("xprtrdma:
Use gathered Send for large inline messages"). Several fixes were
required because there are a few in-kernel device drivers whose
max_sge is 3, and these were broken by the change.
Apparently my memory is going, because some time later, I submitted
commit 25fd86eca11c ("svcrdma: Don't overrun the SGE array in
svc_rdma_send_ctxt"), and after that, commit f3c1fd0ee294 ("svcrdma:
Reduce max_send_sges"). These too incorrectly assumed in-kernel
device drivers would have more than a few Send SGEs available.
The fix for the server side is not the same. This is because the
fundamental problem on the server is that, whether or not the client
has provisioned a chunk for the RPC reply, the server must squeeze
even the most complex RPC replies into a single RDMA Send. Failing
in the send path because of Send SGE exhaustion should never be an
option.
Therefore, instead of failing when the send path runs out of SGEs,
switch to using a bounce buffer mechanism to handle RPC replies that
are too complex for the device to send directly. That allows us to
remove the max_sge check to enable drivers with small max_sge to
work again.
Reported-by: Don Dutile <ddutile@redhat.com>
Fixes: 25fd86eca11c ("svcrdma: Don't overrun the SGE array in ...")
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The xpo_prep_reply_hdr callback is no longer used. It was defined
only for the tcp transport and need not be invoked indirectly, so
move its logic into the caller and remove the unused callback.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|