summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-07-03RDMA/hns: Remove set but not used variable 'fclr_write_fail_flag'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: drivers/infiniband/hw/hns/hns_roce_hw_v2.c: In function 'hns_roce_function_clear': drivers/infiniband/hw/hns/hns_roce_hw_v2.c:1135:7: warning: variable 'fclr_write_fail_flag' set but not used [-Wunused-but-set-variable] It is never used, so can be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03RDMA/i40iw: Set queue pair state when being queriedLiu, Changcheng
The API for ib_query_qp requires the driver to set qp_state and cur_qp_state on return, add the missing sets. Fixes: d37498417947 ("i40iw: add files for iwarp interface") Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03IB/i40iw: Use kmemdup rather than open codingFuqian Huang
Use kmemdump instead of kzmalloc + memcpy. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03IB/ipoib: Remove memset after vzalloc in ipoib_cm.cFuqian Huang
vzalloc has already zeroed the memory. So a memset is unneeded. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03IB: Remove unneeded memsetFuqian Huang
In commit af7ddd8a627c ("Merge tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping"), dma_alloc_coherent/dmam_alloc_coherent always zeroed the returned memory. So the memset after a coherent allocation function is not needed. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-02Merge branch 'siw' into rdma.git for-nextJason Gunthorpe
Bernard Metzler says: ==================== This patch set contributes the SoftiWarp driver rebased for latest rdma-next. SoftiWarp (siw) implements the iWarp RDMA protocol over kernel TCP sockets. The driver integrates with the linux-rdma framework. A matching userlevel driver is available as PR at https://github.com/linux-rdma/rdma-core/pull/536 Many thanks for reviewing and testing the driver, especially to Leon, Jason, Steve, Doug, Olga, Dennis, Gal. You all helped to significantly improve the driver over the last year. Please find below a list of changes and comments, compared to older versions of the siw driver. Many thanks! Bernard. CHANGES: ======== v3 (this version) ----------------- - Rebased to rdma-next - Removed unneccessary initialization of enums in siw-abi.h - Added comment on sizing of all work queues to power of two. v2 ----------------- - Changed recieve path CRC calculation to compute CRC32c not on target buffer after placement, but on original skbuf. This change severely hurts performance, if CRC is switched on, since skb must now be walked twice. It is planned to work on an extension to skb_copy_bits() to fold in CRC computation. - Moved debugging to using ibdev_dbg(). - Dropped detailed packet debug printing. - Removed siw_debug.[ch] files. - Removed resource tracking, code now relies on restrack of RDMA midlayer. Only object counting to enforce reported device limits is left in place. - Removed all nested switch-case statements. - Cleaned up header file #include's - Moved CQ create/destroy to new semantics, where midlayer creates/destroys containing object. - Set siw's ABI version to 1 (was 0 before) - Removed all enum initialization where not needed. - Fixed MAINTANERS entry for siw driver - This version stays with the current siw specific management of user memory (siw_umem_get() vs. ib_umem_get(), etc.). This, since the current ib_umem implementation is less efficient for user page lookup on the fast path, where effciency is important for a SW RDMA driver. It is planned to contribute enhancements to the ib_umem framework, wich makes it suitable for SW drivers as well. v1 (first version after v9 of siw RFC) -------------------------------------- - Rebased to 5.2-rc1 - All IDR code got removed. - Both MR and QP deallocation verbs now synchronously free the resources referenced by the RDMA mid-layer. - IPv6 support was added. - For compatibility with Chelsio iWarp hardware, the RX path was slightly reworked. It now allows packet intersection between tagged and untagged RDMAP operations. While not a defined behavior as of IETF RFC 5040/5041, some RDMA hardware may intersect an ongoing outbound (large) tagged message, such as an multisegment RDMA Read Response with sending an untagged message, such as an RDMA Send frame. This behavior was only detected in an NVMeF setup, where siw was used at target side, and RDMA hardware at client side (during file write). siw now implements two input paths for tagged and untagged messages each, and allows the intersected placement of both messages. - The siw kernel abi file got renamed from siw_user.h to siw-abi.h. ==================== * branch 'siw': SIW addition to kernel build environment SIW completion queue methods SIW receive path SIW transmit path SIW queue pair methods SIW application buffer management SIW application interface SIW connection management SIW network and RDMA core interface SIW main include file iWarp wire packet format
2019-07-02rdma/siw: addition to kernel build environmentBernard Metzler
Broken up commit to add the Soft iWarp RDMA driver. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-02rdma/siw: completion queue methodsBernard Metzler
Broken up commit to add the Soft iWarp RDMA driver. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-02rdma/siw: receive pathBernard Metzler
Broken up commit to add the Soft iWarp RDMA driver. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-02rdma/siw: transmit pathBernard Metzler
Broken up commit to add the Soft iWarp RDMA driver. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-02rdma/siw: queue pair methodsBernard Metzler
Broken up commit to add the Soft iWarp RDMA driver. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-02rdma/siw: application buffer managementBernard Metzler
Broken up commit to add the Soft iWarp RDMA driver. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-02rdma/siw: application interfaceBernard Metzler
Broken up commit to add the Soft iWarp RDMA driver. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-02rdma/siw: connection managementBernard Metzler
Broken up commit to add the Soft iWarp RDMA driver. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-02rdma/siw: network and RDMA core interfaceBernard Metzler
Broken up commit to add the Soft iWarp RDMA driver. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-02rdma/siw: main include fileBernard Metzler
Broken up commit to add the Soft iWarp RDMA driver. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-02rdma/siw: iWarp wire packet formatBernard Metzler
Broken up commit to add the Soft iWarp RDMA driver. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/hfi1: No need to use try_module_get for debugfsDennis Dalessandro
The call in debugfs.c for try_module_get() is not needed. A reference to the module will be taken by the VFS layer as long as the owner field is set in the file ops struct. So set this as well as remove the call. Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/rdmavt: Add trace for map_mr_sgMike Marciniszyn
Add trace to debug map_mr_sg handling. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/rdmavt: Enhance trace information for FRWR debugMike Marciniszyn
This patch enhances the MR trace information to enable more focused debug of MR issues. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/hfi1: Add missing INVALIDATE opcodes for traceMike Marciniszyn
This was missed in the original implementation of the memory management extensions. Fixes: 0db3dfa03c08 ("IB/hfi1: Work request processing for fast register mr and invalidate") Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/hfi1: Reduce excessive aspm inlinesMichael J. Ruhl
Uninline the aspm API since it increases code space for no reason. Move the aspm module param to the new aspm C file. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/{rdmavt, hfi1, qib}: Add helpers to hide SWQE WR detailsMichael J. Ruhl
Add some helper functions to hide struct rvt_swqe details. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/{rdmavt, hfi1, qib}: Remove AH refcount for UD QPsMichael J. Ruhl
Historically rdmavt destroy_ah() has returned an -EBUSY when the AH has a non-zero reference count. IBTA 11.2.2 notes no such return value or error case: Output Modifiers: - Verb results: - Operation completed successfully. - Invalid HCA handle. - Invalid address handle. ULPs never test for this error and this will leak memory. The reference count exists to allow for driver independent progress mechanisms to process UD SWQEs in parallel with post sends. The SWQE will hold a reference count until the UD SWQE completes and then drops the reference. Fix by removing need to reference count the AH. Add a UD specific allocation to each SWQE entry to cache the necessary information for independent progress. Copy the information during the post send processing. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/rdmavt: Set QP allowed opcodes after QP allocationMichael J. Ruhl
Currently QP allowed_ops is set after the QP is completely initialized. This curtails the use of this optimization for any initialization before allowed_ops is set. Fix by adding a helper to determine the correct allowed_ops and moving the setting of the allowed_ops to just after QP allocation. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/{hfi1, qib, rdmavt}: Put qp in error state when cq is fullKamenee Arumugam
When a completion queue is full, the associated queue pairs are not put into the error state. According to the IBTA specification, this is a violation. Quote from IBTA spec: C9-218: A Requester Class F error occurs when the CQ is inaccessible or full and an attempt is made to complete a WQE. The Affected QP shall be moved to the error state and affiliated asynchronous errors generated as described in 11.6.3.1 Affiliated Asynchronous Events on page 678. The current WQE and any subsequent WQEs are left in an unknown state. C11-37: The CI shall generate a CQ Error when a CQ overrun is detected. This condition will result in an Affiliated Asynchronous Error for any associated Work Queues when they attempt to use that CQ. Completions can no longer be added to the CQ. It is not guaranteed that completions present in the CQ at the time the error occurred can be retrieved. Possible causes include a CQ overrun or a CQ protection error. Put the qp in error state when cq is full. Implement a state called full to continue to put other associated QPs in error state. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/rdmavt: Fracture single lock used for posting and processing RWQEsKamenee Arumugam
Usage of single lock prevents fetching posted and processing receive work queue entries from progressing simultaneously and impacts overall performance. Fracture the single lock used for posting and processing Receive Work Queue Entries (RWQEs) to allow the circular buffer to be filled and emptied at the same time. Two new spinlocks - one for the producers and one for the consumers used for posting and processing RWQEs simultaneously and the two indices are define on two different cache lines. The threshold count is used to avoid reading other index in different cache line every time. Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/hfi1: Move receive work queue struct into uapi directoryKamenee Arumugam
The rvt_rwqe and rvt_rwq struct elements are shared between rdmavt and the providers but are not in uapi directory. As per the comment in https://marc.info/?l=linux-rdma&m=152296522708522&w=2, The hfi1 driver and the rdma core driver are not using shared structures in the uapi directory. Move rvt_rwqe and rvt_rwq struct into rvt-abi.h header in uapi directory. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/hfi1: Move rvt_cq_wc struct into uapi directoryKamenee Arumugam
The rvt_cq_wc struct elements are shared between rdmavt and the providers but not in uapi directory. As per the comment in https://marc.info/?l=linux-rdma&m=152296522708522&w=2 The hfi1 driver and the rdma core driver are not using shared structures in the uapi directory. In that case, move rvt_cq_wc struct into the rvt-abi.h header file and create a rvt_k_cq_w for the kernel completion queue. Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28Merge tag 'v5.2-rc6' into rdma.git for-nextJason Gunthorpe
For dependencies in next patches. Resolve conflicts: - Use uverbs_get_cleared_udata() with new cq allocation flow - Continue to delete nes despite SPDX conflict - Resolve list appends in mlx5_command_str() - Use u16 for vport_rule stuff - Resolve list appends in struct ib_client Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25RDMA/hns: fix spelling mistake "attatch" -> "attach"Colin Ian King
There is a spelling mistake in an dev_err message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25RDMA/netlink: Audit policy settings for netlink attributesDoug Ledford
For all string attributes for which we don't currently accept the element as input, we only use it as output, set the string length to RDMA_NLDEV_ATTR_EMPTY_STRING which is defined as 1. That way we will only accept a null string for that element. This will prevent someone from writing a new input routine that uses the element without also updating the policy to have a valid value. Also while there, make sure the existing entries that are valid have the correct policy, if not, correct the policy. Remove unnecessary checks for nla_strlcpy() overflow once the policy has been set correctly. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25RDMA/hns: Cleanup unnecessary exported symbolsLijun Ou
This patch removes the hns-roce.ko for cleanup all the exported symbols in common part. Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25docs: infiniband: convert docs to ReST and rename to *.rstMauro Carvalho Chehab
The InfiniBand docs are plain text with no markups. So, all we needed to do were to add the title markups and some markup sequences in order to properly parse tables, lists and literal blocks. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25RDMA/hns: Fix an error code in hns_roce_set_user_sq_size()Dan Carpenter
This function is supposed to return negative kernel error codes but here it returns CMD_RST_PRC_EBUSY (2). The error code eventually gets passed to IS_ERR() and since it's not an error pointer it leads to an Oops in hns_roce_v1_rsv_lp_qp() Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25RDMA/hns: fix potential integer overflow on left shiftColin Ian King
There is a potential integer overflow when int i is left shifted as this is evaluated using 32 bit arithmetic but is being used in a context that expects an expression of type dma_addr_t. Fix this by casting integer i to dma_addr_t before shifting to avoid the overflow. Addresses-Coverity: ("Unintentional integer overflow") Fixes: 2ac0bc5e725e ("RDMA/hns: Add a group interfaces for optimizing buffers getting flow") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Refactor MR descriptors allocationMax Gurtovoy
Improve code readability using static helpers for each memory region type. Re-use the common logic to get smaller functions that are easy to maintain and reduce code duplication. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Use PA mapping for PI handoverMax Gurtovoy
If possibe, avoid doing a UMR operation to register data and protection buffers (via MTT/KLM mkeys). Instead, use the local DMA key and map the SG lists using PA access. This is safe, since the internal key for data and protection never exposed to the remote server (only signature key might be exposed). If PA mappings are not possible, perform mapping using MTT/KLM descriptors. The setup of the tested benchmark (using iSER ULP): - 2 servers with 24 cores (1 initiator and 1 target) - ConnectX-4/ConnectX-5 adapters - 24 target sessions with 1 LUN each - ramdisk backstore - PI active Performance results running fio (24 jobs, 128 iodepth) using write_generate=1 and read_verify=1 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1266.4K/1262.4K 1720.1K/1732.1K 4k 793139/570902 1129.6K/773982 32k 72660/72086 97229/96164 Using write_generate=0 and read_verify=0 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1590.2K/1600.1K 1828.2K/1830.3K 4k 1078.1K/937272 1142.1K/815304 32k 77012/77369 98125/97435 Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Suggested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Improve PI handover performanceIsrael Rukshin
In some loads, there is performance degradation when using KLM mkey instead of MTT mkey. This is because KLM descriptor access is via indirection that might require more HW resources and cycles. Using KLM descriptor is not necessary when there are no gaps at the data/metadata sg lists. As an optimization, use MTT mkey whenever it is possible. For that matter, allocate internal MTT mkey and choose the effective pi_mr for in transaction according to the required mapping scheme. The setup of the tested benchmark (using iSER ULP): - 2 servers with 24 cores (1 initiator and 1 target) - ConnectX-4/ConnectX-5 adapters - 24 target sessions with 1 LUN each - ramdisk backstore - PI active Performance results running fio (24 jobs, 128 iodepth) using write_generate=1 and read_verify=1 (w/w.o/baseline): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1262.4K/1243.3K/1147.1K 1732.1K/1725.1K/1423.8K 4k 570902/571233/457874 773982/743293/642080 32k 72086/72388/71933 96164/71789/93249 Using write_generate=0 and read_verify=0 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1600.1K/1572.1K/1393.3K 1830.3K/1823.5K/1557.2K 4k 937272/921992/762934 815304/753772/646071 32k 77369/75052/72058 97435/73180/94612 Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Suggested-by: Max Gurtovoy <maxg@mellanox.com> Suggested-by: Idan Burstein <idanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Remove unused IB_WR_REG_SIG_MR codeIsrael Rukshin
IB_WR_REG_SIG_MR is not needed after IB_WR_REG_MR_INTEGRITY was used. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/rw: Use IB_WR_REG_MR_INTEGRITY for PI handoverIsrael Rukshin
Replace the old signature handover API with the new one. The new API simplifes PI handover code complexity for ULPs and improve performance. For RW API it will reduce the maximum number of work requests per task and the need of dealing with multiple MRs (and their registrations and invalidations) per task. All the mappings and registration of the data and the protection buffers is done by the LLD using a single WR and a special MR type (IB_MR_TYPE_INTEGRITY) for the PI handover operation. The setup of the tested benchmark (using iSER ULP): - 2 servers with 24 cores (1 initiator and 1 target) - ConnectX-4/ConnectX-5 adapters - 24 target sessions with 1 LUN each - ramdisk backstore - PI active Performance results running fio (24 jobs, 128 iodepth) using write_generate=1 and read_verify=1 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1243.3K/1182.3K 1725.1K/1680.2K 4k 571233/528835 743293/748259 32k 72388/71086 71789/93573 Using write_generate=0 and read_verify=0 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1572.1K/1427.2K 1823.5K/1724.3K 4k 921992/916194 753772/768267 32k 75052/73960 73180/95484 There is a performance degradation when writing big block sizes. Degradation is caused by the complexity of combining multiple indirections and perform RDMA READ operation from it. This will be fixed in the following patches by reducing the indirections if possible. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/rw: Introduce rdma_rw_inv_key helperIsrael Rukshin
This is a preparation for adding new signature API to the rw-API. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/core: Validate integrity handover device capMax Gurtovoy
Protect the case that a ULP tries to allocate a QP with signature enabled flag while the LLD doesn't support this feature. While we're here, also move integrity_en attribute from mlx5_qp to ib_qp as a preparation for adding new integrity API to the rw-API (that is part of ib_core module). Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/core: Rename signature qp create flag and signature device capabilityIsrael Rukshin
Rename IB_QP_CREATE_SIGNATURE_EN to IB_QP_CREATE_INTEGRITY_EN and IB_DEVICE_SIGNATURE_HANDOVER to IB_DEVICE_INTEGRITY_HANDOVER. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/core: Add an integrity MR pool supportIsrael Rukshin
This is a preparation for adding new signature API to the rw-API. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24IB/iser: Unwind WR union at iser_tx_descIsrael Rukshin
After decreasing WRs array size from 7 to 3 it is more readable to give each WR a descriptive name. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24IB/iser: Use IB_WR_REG_MR_INTEGRITY for PI handoverIsrael Rukshin
Using this new API reduces iSER code complexity. It also reduces the maximum number of work requests per task and the need of dealing with multiple MRs (and their registrations and invalidations) per task. It is done by using a single WR and a special MR type (IB_MR_TYPE_INTEGRITY) for PI operation. The setup of the tested benchmark: - 2 servers with 24 cores (1 initiator and 1 target) - 24 target sessions with 1 LUN each - ramdisk backstore - PI active Performance results running fio (24 jobs, 128 iodepth) using write_generate=0 and read_verify=0 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1236.6K/1164.3K 1357.2K/1332.8K 1k 1196.5K/1163.8K 1348.4K/1262.7K 2k 1016.7K/921950 1003.7K/931230 4k 662728/600545 595423/501513 8k 385954/384345 333775/277090 16k 222864/222820 170317/170671 32k 116869/114896 82331/82244 64k 55205/54931 40264/40021 Using write_generate=1 and read_verify=1 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1090.1K/1030.9K 1303.9K/1101.4K 1k 1057.7K/904583 1318.4K/988085 2k 965226/638799 1008.6K/692514 4k 555479/410151 542414/414517 8k 298675/224964 264729/237508 16k 133485/122481 164625/138647 32k 74329/67615 80143/78743 64k 35716/35519 39294/37334 We get performance improvement at all block sizes. The most significant improvement is when writing 4k bs (almost 30% more iops). Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Introduce and implement new IB_WR_REG_MR_INTEGRITY work requestMax Gurtovoy
This new WR will be used to perform PI (protection information) handover using the new API. Using the new API, the user will post a single WR that will internally perform all the needed actions to complete PI operation. This new WR will use a memory region that was allocated as IB_MR_TYPE_INTEGRITY and was mapped using ib_map_mr_sg_pi to perform the registration. In the old API, in order to perform a signature handover operation, each ULP should perform the following: 1. Map and register the data buffers. 2. Map and register the protection buffers. 3. Post a special reg WR to configure the signature handover operation layout. 4. Invalidate the signature memory key. 5. Invalidate protection buffers memory key. 6. Invalidate data buffers memory key. In the new API, the mapping of both data and protection buffers is performed using a single call to ib_map_mr_sg_pi function. Also the registration of the buffers and the configuration of the signature operation layout is done by a single new work request called IB_WR_REG_MR_INTEGRITY. This patch implements this operation for mlx5 devices that are capable to offload data integrity generation/validation while performing the actual buffer transfer. This patch will not remove the old signature API that is used by the iSER initiator and target drivers. This will be done in the future. In the internal implementation, for each IB_WR_REG_MR_INTEGRITY work request, we are using a single UMR operation to register both data and protection buffers using KLM's. Afterwards, another UMR operation will describe the strided block format. These will be followed by 2 SET_PSV operations to set the memory/wire domains initial signature parameters passed by the user. In the end of the whole transaction, only the signature memory key (the one that exposed for the RDMA operation) will be invalidated. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Update set_sig_data_segment attribute for new signature APIMax Gurtovoy
Explicitly pass the sig_mr and the access flags for the mkey segment configuration. This function will be used also in the new signature API, so modify it in order to use it in both APIs. This is a preparation commit before adding new signature API. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Pass UMR segment flags instead of booleanMax Gurtovoy
UMR ctrl segment flags can vary between UMR operations. for example, using inline UMR or adding free/not-free checks for a memory key. This is a preparation commit before adding new signature API that will not need not-free checks for the internal memory key during the UMR operation. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>