summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/mlx5/qp.c
AgeCommit message (Collapse)Author
2020-04-28RDMA/mlx5: Organize QP types checks in one placeLeon Romanovsky
Perform check if QP type is supported in one place at the beginning of the create_qp function instead of current implementation with checks buried inside of the code. Link: https://lore.kernel.org/r/20200427154636.381474-2-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-23net/mlx5: Update transobj.c new cmd interfaceLeon Romanovsky
Do mass update of transobj.c to reuse newly introduced mlx5_cmd_exec_in*() interfaces. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-22RDMA/mlx5: Set GRH fields in query QP on RoCEAharon Landau
GRH fields such as sgid_index, hop limit, et. are set in the QP context when QP is created/modified. Currently, when query QP is performed, we fill the GRH fields only if the GRH bit is set in the QP context, but this bit is not set for RoCE. Adjust the check so we will set all relevant data for the RoCE too. Since this data is returned to userspace, the below is an ABI regression. Fixes: d8966fcd4c25 ("IB/core: Use rdma_ah_attr accessor functions") Link: https://lore.kernel.org/r/20200413132028.930109-1-leon@kernel.org Signed-off-by: Aharon Landau <aharonl@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-19net/mlx5: Move QP logic to mlx5_ibLeon Romanovsky
The mlx5_core doesn't need any functionality coded in qp.c, so move that file to drivers/infiniband/ be under mlx5_ib responsibility. Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "The majority of the patches are cleanups, refactorings and clarity improvements. This cycle saw some more activity from Syzkaller, I think we are now clean on all but one of those bugs, including the long standing and obnoxious rdma_cm locking design defect. Continue to see many drivers getting cleanups, with a few new user visible features. Summary: - Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1 - Lots of cleanup patches for hns - Convert more places to use refcount - Aggressively lock the RDMA CM code that syzkaller says isn't working - Work to clarify ib_cm - Use the new ib_device lifecycle model in bnxt_re - Fix mlx5's MR cache which seems to be failing more often with the new ODP code - mlx5 'dynamic uar' and 'tx steering' user interfaces" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (144 commits) RDMA/bnxt_re: make bnxt_re_ib_init static IB/qib: Delete struct qib_ivdev.qp_rnd RDMA/hns: Fix uninitialized variable bug RDMA/hns: Modify the mask of QP number for CQE of hip08 RDMA/hns: Reduce the maximum number of extend SGE per WQE RDMA/hns: Reduce PFC frames in congestion scenarios RDMA/mlx5: Add support for RDMA TX flow table net/mlx5: Add support for RDMA TX steering IB/hfi1: Call kobject_put() when kobject_init_and_add() fails IB/hfi1: Fix memory leaks in sysfs registration and unregistration IB/mlx5: Move to fully dynamic UAR mode once user space supports it IB/mlx5: Limit the scope of struct mlx5_bfreg_info to mlx5_ib IB/mlx5: Extend QP creation to get uar page index from user space IB/mlx5: Extend CQ creation to get uar page index from user space IB/mlx5: Expose UAR object and its alloc/destroy commands IB/hfi1: Get rid of a warning RDMA/hns: Remove redundant judgment of qp_type RDMA/hns: Remove redundant assignment of wc->smac when polling cq RDMA/hns: Remove redundant qpc setup operations RDMA/hns: Remove meaningless prints ...
2020-03-27IB/mlx5: Move to fully dynamic UAR mode once user space supports itYishai Hadas
Move to fully dynamic UAR mode once user space supports it. In this case we prevent any legacy mode of UARs on the allocated context and prevent redundant allocation of the static ones. Link: https://lore.kernel.org/r/20200324060143.1569116-6-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27IB/mlx5: Extend QP creation to get uar page index from user spaceYishai Hadas
Extend QP creation to get uar page index from user space, this mode can be used with the UAR dynamic mode APIs to allocate/destroy a UAR object. As part of enabling this option blocked the weird/un-supported cross channel option which uses index 0 hard-coded. This QP flag wasn't exposed to user space as part of any formal upstream release, the dynamic option can allow having valid UAR page index instead. Link: https://lore.kernel.org/r/20200324060143.1569116-4-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-25RDMA/mlx5: Block delay drop to unprivileged usersMaor Gottlieb
It has been discovered that this feature can globally block the RX port, so it should be allowed for highly privileged users only. Fixes: 03404e8ae652("IB/mlx5: Add support to dropless RQ") Link: https://lore.kernel.org/r/20200322124906.1173790-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-24RDMA/mlx5: Fix access to wrong pointer while performing flush due to errorLeon Romanovsky
The main difference between send and receive SW completions is related to separate treatment of WQ queue. For receive completions, the initial index to be flushed is stored in "tail", while for send completions, it is in deleted "last_poll". CPU: 54 PID: 53405 Comm: kworker/u161:0 Kdump: loaded Tainted: G OE --------- -t - 4.18.0-147.el8.ppc64le #1 Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core] NIP: c000003c7c00a000 LR: c00800000e586af4 CTR: c000003c7c00a000 REGS: c0000036cc9db940 TRAP: 0400 Tainted: G OE --------- -t - (4.18.0-147.el8.ppc64le) MSR: 9000000010009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24004488 XER: 20040000 CFAR: c00800000e586af0 IRQMASK: 0 GPR00: c00800000e586ab4 c0000036cc9dbbc0 c00800000e5f1a00 c0000037d8433800 GPR04: c000003895a26800 c0000037293f2000 0000000000000201 0000000000000011 GPR08: c000003895a26c80 c000003c7c00a000 0000000000000000 c00800000ed30438 GPR12: c000003c7c00a000 c000003fff684b80 c00000000017c388 c00000396ec4be40 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: c00000000151e498 0000000000000010 c000003895a26848 0000000000000010 GPR24: 0000000000000010 0000000000010000 c000003895a26800 0000000000000000 GPR28: 0000000000000010 c0000037d8433800 c000003895a26c80 c000003895a26800 NIP [c000003c7c00a000] 0xc000003c7c00a000 LR [c00800000e586af4] __ib_process_cq+0xec/0x1b0 [ib_core] Call Trace: [c0000036cc9dbbc0] [c00800000e586ab4] __ib_process_cq+0xac/0x1b0 [ib_core] (unreliable) [c0000036cc9dbc40] [c00800000e586c88] ib_cq_poll_work+0x40/0xb0 [ib_core] [c0000036cc9dbc70] [c000000000171f44] process_one_work+0x2f4/0x5c0 [c0000036cc9dbd10] [c000000000172a0c] worker_thread+0xcc/0x760 [c0000036cc9dbdc0] [c00000000017c52c] kthread+0x1ac/0x1c0 [c0000036cc9dbe30] [c00000000000b75c] ret_from_kernel_thread+0x5c/0x80 Fixes: 8e3b68830186 ("RDMA/mlx5: Delete unreachable handle_atomic code by simplifying SW completion") Link: https://lore.kernel.org/r/20200318091640.44069-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04Merge tag 'v5.6-rc4' into rdma.git for-nextJason Gunthorpe
Required due to dependencies in following patches. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04RDMA/providers: Fix return value when QP type isn't supportedKamal Heib
The proper return code is "-EOPNOTSUPP" when the requested QP type is not supported by the provider. Link: https://lore.kernel.org/r/20200130082049.463-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11IB/mlx5: Return failure when rts2rts_qp_counters_set_id is not supportedMark Zhang
When binding a QP with a counter and the QP state is not RESET, return failure if the rts2rts_qp_counters_set_id is not supported by the device. This is to prevent cases like manual bind for Connect-IB devices from returning success when the feature is not supported. Fixes: d14133dd4161 ("IB/mlx5: Support set qp counter") Link: https://lore.kernel.org/r/20200126171708.5167-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-21Merge tag 'rds-odp-for-5.5' into rdma.git for-nextJason Gunthorpe
From https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma Leon Romanovsky says: ==================== Use ODP MRs for kernel ULPs The following series extends MR creation routines to allow creation of user MRs through kernel ULPs as a proxy. The immediate use case is to allow RDS to work over FS-DAX, which requires ODP (on-demand-paging) MRs to be created and such MRs were not possible to create prior this series. The first part of this patchset extends RDMA to have special verb ib_reg_user_mr(). The common use case that uses this function is a userspace application that allocates memory for HCA access but the responsibility to register the memory at the HCA is on an kernel ULP. This ULP acts as an agent for the userspace application. The second part provides advise MR functionality for ULPs. This is integral part of ODP flows and used to trigger pagefaults in advance to prepare memory before running working set. The third part is actual user of those in-kernel APIs. ==================== * tag 'rds-odp-for-5.5': net/rds: Use prefetch for On-Demand-Paging MR net/rds: Handle ODP mr registration/unregistration net/rds: Detect need of On-Demand-Paging memory registration RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs RDMA/mlx5: Don't fake udata for kernel path IB/mlx5: Add ODP WQE handlers for kernel QPs IB/core: Add interface to advise_mr for kernel users IB/core: Introduce ib_reg_user_mr IB: Allow calls to ib_umem_get from kernel ULPs Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16RDMA/mlx5: Set relaxed ordering when requestedMichael Guralnik
Enable relaxed ordering in the mkey context when requested. As relaxed ordering is not currently supported in UMR, disable UMR usage for relaxed ordering MRs. Link: https://lore.kernel.org/r/1578506740-22188-11-git-send-email-yishaih@mellanox.com Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16IB/mlx5: Add ODP WQE handlers for kernel QPsMoni Shoua
One of the steps in ODP page fault handler for WQEs is to read a WQE from a QP send queue or receive queue buffer at a specific index. Since the implementation of this buffer is different between kernel and user QP the implementation of the handler needs to be aware of that and handle it in a different way. ODP for kernel MRs is currently supported only for RDMA_READ and RDMA_WRITE operations so change the handler to - read a WQE from a kernel QP send queue - fail if access to receive queue or shared receive queue is required for a kernel QP Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16IB: Allow calls to ib_umem_get from kernel ULPsMoni Shoua
So far the assumption was that ib_umem_get() and ib_umem_odp_get() are called from flows that start in UVERBS and therefore has a user context. This assumption restricts flows that are initiated by ULPs and need the service that ib_umem_get() provides. This patch changes ib_umem_get() and ib_umem_odp_get() to get IB device directly by relying on the fact that both UVERBS and ULPs sets that field correctly. Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-03RDMA/mlx5: use true,false for bool variablezhengbin
Fixes coccicheck warning: drivers/infiniband/hw/mlx5/mr.c:150:2-26: WARNING: Assignment of 0/1 to bool variable drivers/infiniband/hw/mlx5/mr.c:1455:2-26: WARNING: Assignment of 0/1 to bool variable drivers/infiniband/hw/mlx5/qp.c:1874:6-20: WARNING: Assignment of 0/1 to bool variable Link: https://lore.kernel.org/r/1577176812-2238-6-git-send-email-zhengbin13@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Again another fairly quiet cycle with few notable core code changes and the usual variety of driver bug fixes and small improvements. - Various driver updates and bug fixes for siw, bnxt_re, hns, qedr, iw_cxgb4, vmw_pvrdma, mlx5 - Improvements in SRPT from working with iWarp - SRIOV VF support for bnxt_re - Skeleton kernel-doc files for drivers/infiniband - User visible counters for events related to ODP - Common code for tracking of mmap lifetimes so that drivers can link HW object liftime to a VMA - ODP bug fixes and rework - RDMA READ support for efa - Removal of the very old cxgb3 driver" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (168 commits) RDMA/hns: Delete unnecessary callback functions for cq RDMA/hns: Rename the functions used inside creating cq RDMA/hns: Redefine the member of hns_roce_cq struct RDMA/hns: Redefine interfaces used in creating cq RDMA/efa: Expose RDMA read related attributes RDMA/efa: Support remote read access in MR registration RDMA/efa: Store network attributes in device attributes IB/hfi1: remove redundant assignment to variable ret RDMA/bnxt_re: Fix missing le16_to_cpu RDMA/bnxt_re: Fix stat push into dma buffer on gen p5 devices RDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 series RDMA/bnxt_re: Fix Kconfig indentation IB/mlx5: Implement callbacks for getting VFs GUID attributes IB/ipoib: Add ndo operation for getting VFs GUID attributes IB/core: Add interfaces to get VF node and port GUIDs net/core: Add support for getting VF GUIDs RDMA/qedr: Fix null-pointer dereference when calling rdma_user_mmap_get_offset RDMA/cm: Use refcount_t type for refcount variable IB/mlx5: Support extended number of strides for Striding RQ IB/mlx4: Update HW GID table while adding vlan GID ...
2019-11-19IB/mlx5: Support extended number of strides for Striding RQMark Zhang
Extends the minimum single WQE strides from 64 to 8, which is exposed by the "min_single_wqe_log_num_of_strides" field of striding_rq_caps. Choose right number of strides based on FW capability. Link: https://lore.kernel.org/r/20191115154555.247856-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-17IB/umem: remove the dmasync argument to ib_umem_getChristoph Hellwig
The argument is always ignored, so remove it. Link: https://lore.kernel.org/r/20191113073214.9514-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-31IB/mlx5: Test write combining supportMichael Guralnik
Linux can run in all sorts of physical machines and VMs where write combining may or may not be supported. Currently there is no way to reliably tell if the system supports WC, or not. The driver uses WC to optimize posting work to the HCA, and getting this wrong in either direction can cause a significant performance loss. Add a test in mlx5_ib initialization process to test whether write-combining is supported on the machine. The test will run as part of the enable_driver callback to ensure that the test runs after the device is setup and can create and modify the QP needed, but runs before the device is exposed to the users. The test opens UD QP and posts NOP WQEs, the WQE written to the BlueFlame is different from the WQE in memory, requesting CQE only on the BlueFlame WQE. By checking whether we received a completion on one of these WQEs we can know if BlueFlame succeeded and this write-combining must be supported. Change reporting of BlueFlame support to be dependent on write-combining support instead of the FW's guess as to what the machine can do. Link: https://lore.kernel.org/r/20191027062234.10993-1-leon@kernel.org Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22IB/mlx5: Align usage of QP1 create flags with rest of mlx5 definesMichael Guralnik
There is little value in keeping separate function for one flag, provide it directly like any other mlx5 define. Link: https://lore.kernel.org/r/20191020064400.8344-2-leon@kernel.org Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-17RDMA/mlx5: Clear old rate limit when closing QPRafi Wiener
Before QP is closed it changes to ERROR state, when this happens the QP was left with old rate limit that was already removed from the table. Fixes: 7d29f349a4b9 ("IB/mlx5: Properly adjust rate limit on QP state transitions") Signed-off-by: Rafi Wiener <rafiw@mellanox.com> Signed-off-by: Oleg Kuporosov <olegk@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191002120243.16971-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-09-13Merge tag 'v5.3-rc8' into rdma.git for-nextJason Gunthorpe
To resolve dependencies in following patches mlx5_ib.h conflict resolved by keeing both hunks Linux 5.3-rc8 Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-20IB/mlx5: Block MR WR if UMR is not possibleMoni Shoua
Check conditions that are mandatory to post_send UMR WQEs. 1. Modifying page size. 2. Modifying remote atomic permissions if atomic access is required. If either condition is not fulfilled then fail to post_send() flow. Fixes: c8d75a980fab ("IB/mlx5: Respect new UMR capabilities") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-9-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-29Merge branch 'wip/dl-for-rc' into wip/dl-for-nextDoug Ledford
The fix for IB port statistics initialization ("IB/core: Fix querying total rdma stats") is needed before we take a follow-on patch to for-next. Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-29IB/mlx5: Support per device q counters in switchdev modeParav Pandit
When parent mlx5_core_dev is in switchdev mode, q_counters are not applicable to multiple non uplink vports. Hence, have make them limited to device level. While at it, correct __mlx5_ib_qp_set_counter() and __mlx5_ib_modify_qp() to use u16 set_id as defined by the device. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190723073117.7175-3-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-25IB/mlx5: Fix RSS Toeplitz setup to be aligned with the HW specificationYishai Hadas
The specification for the Toeplitz function doesn't require to set the key explicitly to be symmetric. In case a symmetric functionality is required a symmetric key can be simply used. Wrongly forcing the algorithm to symmetric causes the wrong packet distribution and a performance degradation. Link: https://lore.kernel.org/r/20190723065733.4899-7-leon@kernel.org Cc: <stable@vger.kernel.org> # 4.7 Fixes: 28d6137008b2 ("IB/mlx5: Add RSS QP support") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Alex Vainman <alexv@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-24IB/mlx5: Fix unreg_umr to ignore the mkey stateYishai Hadas
Fix unreg_umr to ignore the mkey state and do not fail if was freed. This prevents a case that a user space application already changed the mkey state to free and then the UMR operation will fail leaving the mkey in an inappropriate state. Link: https://lore.kernel.org/r/20190723065733.4899-3-leon@kernel.org Cc: <stable@vger.kernel.org> # 3.19 Fixes: 968e78dd9644 ("IB/mlx5: Enhance UMR support to allow partial page table update") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05IB/mlx5: Support set qp counterMark Zhang
Support bind a qp with counter. If counter is null then bind the qp to the default counter. Different QP state has different operation: - RESET: Set the counter field so that it will take effective during RST2INIT change; - RTS: Issue an RTS2RTS change to update the QP counter; - Other: Set the counter field and mark the counter_pending flag, when QP is moved to RTS state and this flag is set, then issue an RTS2RTS modification to update the counter. Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03Merge mlx5-next into rdma for-nextJason Gunthorpe
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Required for dependencies in the next patches. Resolved the conflicts: - esw_destroy_offloads_acl_tables() use the newer mlx5_esw_for_all_vports() version - esw_offloads_steering_init() drop the cap test - esw_offloads_init() drop the extra function arguments * branch 'mlx5-next': (39 commits) net/mlx5: Expose device definitions for object events net/mlx5: Report EQE data upon CQ completion net/mlx5: Report a CQ error event only when a handler was set net/mlx5: mlx5_core_create_cq() enhancements net/mlx5: Expose the API to register for ANY event net/mlx5: Use event mask based on device capabilities net/mlx5: Fix mlx5_core_destroy_cq() error flow net/mlx5: E-Switch, Handle UC address change in switchdev mode net/mlx5: E-Switch, Consider host PF for inline mode and vlan pop net/mlx5: E-Switch, Use iterator for vlan and min-inline setups net/mlx5: E-Switch, Reg/unreg function changed event at correct stage net/mlx5: E-Switch, Consolidate eswitch function number of VFs net/mlx5: E-Switch, Refactor eswitch SR-IOV interface net/mlx5: Handle host PF vport mac/guid for ECPF net/mlx5: E-Switch, Use correct flags when configuring vlan net/mlx5: Reduce dependency on enabled_vfs counter and num_vfs net/mlx5: Don't handle VF func change if host PF is disabled net/mlx5: Limit scope of mlx5_get_next_phys_dev() to PCI PF devices net/mlx5: Move pci status reg access mutex to mlx5_pci_init net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_type ... Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03net/mlx5: Report EQE data upon CQ completionYishai Hadas
Report EQE data upon CQ completion to let upper layers use this data. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-06-24RDMA/mlx5: Use PA mapping for PI handoverMax Gurtovoy
If possibe, avoid doing a UMR operation to register data and protection buffers (via MTT/KLM mkeys). Instead, use the local DMA key and map the SG lists using PA access. This is safe, since the internal key for data and protection never exposed to the remote server (only signature key might be exposed). If PA mappings are not possible, perform mapping using MTT/KLM descriptors. The setup of the tested benchmark (using iSER ULP): - 2 servers with 24 cores (1 initiator and 1 target) - ConnectX-4/ConnectX-5 adapters - 24 target sessions with 1 LUN each - ramdisk backstore - PI active Performance results running fio (24 jobs, 128 iodepth) using write_generate=1 and read_verify=1 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1266.4K/1262.4K 1720.1K/1732.1K 4k 793139/570902 1129.6K/773982 32k 72660/72086 97229/96164 Using write_generate=0 and read_verify=0 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1590.2K/1600.1K 1828.2K/1830.3K 4k 1078.1K/937272 1142.1K/815304 32k 77012/77369 98125/97435 Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Suggested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Improve PI handover performanceIsrael Rukshin
In some loads, there is performance degradation when using KLM mkey instead of MTT mkey. This is because KLM descriptor access is via indirection that might require more HW resources and cycles. Using KLM descriptor is not necessary when there are no gaps at the data/metadata sg lists. As an optimization, use MTT mkey whenever it is possible. For that matter, allocate internal MTT mkey and choose the effective pi_mr for in transaction according to the required mapping scheme. The setup of the tested benchmark (using iSER ULP): - 2 servers with 24 cores (1 initiator and 1 target) - ConnectX-4/ConnectX-5 adapters - 24 target sessions with 1 LUN each - ramdisk backstore - PI active Performance results running fio (24 jobs, 128 iodepth) using write_generate=1 and read_verify=1 (w/w.o/baseline): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1262.4K/1243.3K/1147.1K 1732.1K/1725.1K/1423.8K 4k 570902/571233/457874 773982/743293/642080 32k 72086/72388/71933 96164/71789/93249 Using write_generate=0 and read_verify=0 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1600.1K/1572.1K/1393.3K 1830.3K/1823.5K/1557.2K 4k 937272/921992/762934 815304/753772/646071 32k 77369/75052/72058 97435/73180/94612 Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Suggested-by: Max Gurtovoy <maxg@mellanox.com> Suggested-by: Idan Burstein <idanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Remove unused IB_WR_REG_SIG_MR codeIsrael Rukshin
IB_WR_REG_SIG_MR is not needed after IB_WR_REG_MR_INTEGRITY was used. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/core: Validate integrity handover device capMax Gurtovoy
Protect the case that a ULP tries to allocate a QP with signature enabled flag while the LLD doesn't support this feature. While we're here, also move integrity_en attribute from mlx5_qp to ib_qp as a preparation for adding new integrity API to the rw-API (that is part of ib_core module). Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/core: Rename signature qp create flag and signature device capabilityIsrael Rukshin
Rename IB_QP_CREATE_SIGNATURE_EN to IB_QP_CREATE_INTEGRITY_EN and IB_DEVICE_SIGNATURE_HANDOVER to IB_DEVICE_INTEGRITY_HANDOVER. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Introduce and implement new IB_WR_REG_MR_INTEGRITY work requestMax Gurtovoy
This new WR will be used to perform PI (protection information) handover using the new API. Using the new API, the user will post a single WR that will internally perform all the needed actions to complete PI operation. This new WR will use a memory region that was allocated as IB_MR_TYPE_INTEGRITY and was mapped using ib_map_mr_sg_pi to perform the registration. In the old API, in order to perform a signature handover operation, each ULP should perform the following: 1. Map and register the data buffers. 2. Map and register the protection buffers. 3. Post a special reg WR to configure the signature handover operation layout. 4. Invalidate the signature memory key. 5. Invalidate protection buffers memory key. 6. Invalidate data buffers memory key. In the new API, the mapping of both data and protection buffers is performed using a single call to ib_map_mr_sg_pi function. Also the registration of the buffers and the configuration of the signature operation layout is done by a single new work request called IB_WR_REG_MR_INTEGRITY. This patch implements this operation for mlx5 devices that are capable to offload data integrity generation/validation while performing the actual buffer transfer. This patch will not remove the old signature API that is used by the iSER initiator and target drivers. This will be done in the future. In the internal implementation, for each IB_WR_REG_MR_INTEGRITY work request, we are using a single UMR operation to register both data and protection buffers using KLM's. Afterwards, another UMR operation will describe the strided block format. These will be followed by 2 SET_PSV operations to set the memory/wire domains initial signature parameters passed by the user. In the end of the whole transaction, only the signature memory key (the one that exposed for the RDMA operation) will be invalidated. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Update set_sig_data_segment attribute for new signature APIMax Gurtovoy
Explicitly pass the sig_mr and the access flags for the mkey segment configuration. This function will be used also in the new signature API, so modify it in order to use it in both APIs. This is a preparation commit before adding new signature API. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Pass UMR segment flags instead of booleanMax Gurtovoy
UMR ctrl segment flags can vary between UMR operations. for example, using inline UMR or adding free/not-free checks for a memory key. This is a preparation commit before adding new signature API that will not need not-free checks for the internal memory key during the UMR operation. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-20RDMA: Check umem pointer validity prior to releaseLeon Romanovsky
Update ib_umem_release() to behave similarly to kfree() and allow submitting NULL pointer as safe input to this function. Fixes: a52c8e2469c3 ("RDMA: Clean destroy CQ in drivers do not return errors") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-20RDMA: Convert destroy_wq to be voidLeon Romanovsky
All callers of destroy WQ are always success and there is no need to check their return value, so convert destroy_wq to be void. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-05-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "This has been a smaller cycle than normal. One new driver was accepted, which is unusual, and at least one more driver remains in review on the list. Summary: - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4, vmw_pvrdma - Many patches from MatthewW converting radix tree and IDR users to use xarray - Introduction of tracepoints to the MAD layer - Build large SGLs at the start for DMA mapping and get the driver to split them - Generally clean SGL handling code throughout the subsystem - Support for restricting RDMA devices to net namespaces for containers - Progress to remove object allocation boilerplate code from drivers - Change in how the mlx5 driver shows representor ports linked to VFs - mlx5 uapi feature to access the on chip SW ICM memory - Add a new driver for 'EFA'. This is HW that supports user space packet processing through QPs in Amazon's cloud" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (186 commits) RDMA/ipoib: Allow user space differentiate between valid dev_port IB/core, ipoib: Do not overreact to SM LID change event RDMA/device: Don't fire uevent before device is fully initialized lib/scatterlist: Remove leftover from sg_page_iter comment RDMA/efa: Add driver to Kconfig/Makefile RDMA/efa: Add the efa module RDMA/efa: Add EFA verbs implementation RDMA/efa: Add common command handlers RDMA/efa: Implement functions that submit and complete admin commands RDMA/efa: Add the ABI definitions RDMA/efa: Add the com service API definitions RDMA/efa: Add the efa_com.h file RDMA/efa: Add the efa.h header file RDMA/efa: Add EFA device definitions RDMA: Add EFA related definitions RDMA/umem: Remove hugetlb flag RDMA/bnxt_re: Use core helpers to get aligned DMA address RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks RDMA/umem: Add API to find best driver supported page size in an MR ...
2019-05-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Support AES128-CCM ciphers in kTLS, from Vakul Garg. 2) Add fib_sync_mem to control the amount of dirty memory we allow to queue up between synchronize RCU calls, from David Ahern. 3) Make flow classifier more lockless, from Vlad Buslov. 4) Add PHY downshift support to aquantia driver, from Heiner Kallweit. 5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces contention on SLAB spinlocks in heavy RPC workloads. 6) Partial GSO offload support in XFRM, from Boris Pismenny. 7) Add fast link down support to ethtool, from Heiner Kallweit. 8) Use siphash for IP ID generator, from Eric Dumazet. 9) Pull nexthops even further out from ipv4/ipv6 routes and FIB entries, from David Ahern. 10) Move skb->xmit_more into a per-cpu variable, from Florian Westphal. 11) Improve eBPF verifier speed and increase maximum program size, from Alexei Starovoitov. 12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit spinlocks. From Neil Brown. 13) Allow tunneling with GUE encap in ipvs, from Jacky Hu. 14) Improve link partner cap detection in generic PHY code, from Heiner Kallweit. 15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan Maguire. 16) Remove SKB list implementation assumptions in SCTP, your's truly. 17) Various cleanups, optimizations, and simplifications in r8169 driver. From Heiner Kallweit. 18) Add memory accounting on TX and RX path of SCTP, from Xin Long. 19) Switch PHY drivers over to use dynamic featue detection, from Heiner Kallweit. 20) Support flow steering without masking in dpaa2-eth, from Ioana Ciocoi. 21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri Pirko. 22) Increase the strict parsing of current and future netlink attributes, also export such policies to userspace. From Johannes Berg. 23) Allow DSA tag drivers to be modular, from Andrew Lunn. 24) Remove legacy DSA probing support, also from Andrew Lunn. 25) Allow ll_temac driver to be used on non-x86 platforms, from Esben Haabendal. 26) Add a generic tracepoint for TX queue timeouts to ease debugging, from Cong Wang. 27) More indirect call optimizations, from Paolo Abeni" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits) cxgb4: Fix error path in cxgb4_init_module net: phy: improve pause mode reporting in phy_print_status dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings net: macb: Change interrupt and napi enable order in open net: ll_temac: Improve error message on error IRQ net/sched: remove block pointer from common offload structure net: ethernet: support of_get_mac_address new ERR_PTR error net: usb: smsc: fix warning reported by kbuild test robot staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check net: dsa: support of_get_mac_address new ERR_PTR error net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats vrf: sit mtu should not be updated when vrf netdev is the link net: dsa: Fix error cleanup path in dsa_init_module l2tp: Fix possible NULL pointer dereference taprio: add null check on sched_nest to avoid potential null pointer dereference net: mvpp2: cls: fix less than zero check on a u32 variable net_sched: sch_fq: handle non connected flows net_sched: sch_fq: do not assume EDT packets are ordered net: hns3: use devm_kcalloc when allocating desc_cb net: hns3: some cleanup for struct hns3_enet_ring ...
2019-05-06Merge tag 'arm64-mmiowb' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull mmiowb removal from Will Deacon: "Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Remove mmiowb() from the kernel memory barrier API and instead, for architectures that need it, hide the barrier inside spin_unlock() when MMIO has been performed inside the critical section. The only relatively recent changes have been addressing review comments on the documentation, which is in a much better shape thanks to the efforts of Ben and Ingo. I was initially planning to split this into two pull requests so that you could run the coccinelle script yourself, however it's been plain sailing in linux-next so I've just included the whole lot here to keep things simple" * tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (23 commits) docs/memory-barriers.txt: Update I/O section to be clearer about CPU vs thread docs/memory-barriers.txt: Fix style, spacing and grammar in I/O section arch: Remove dummy mmiowb() definitions from arch code net/ethernet/silan/sc92031: Remove stale comment about mmiowb() i40iw: Redefine i40iw_mmiowb() to do nothing scsi/qla1280: Remove stale comment about mmiowb() drivers: Remove explicit invocations of mmiowb() drivers: Remove useless trailing comments from mmiowb() invocations Documentation: Kill all references to mmiowb() riscv/mmiowb: Hook up mmwiob() implementation to asm-generic code powerpc/mmiowb: Hook up mmwiob() implementation to asm-generic code ia64/mmiowb: Add unconditional mmiowb() to arch_spin_unlock() mips/mmiowb: Add unconditional mmiowb() to arch_spin_unlock() sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock() m68k/io: Remove useless definition of mmiowb() nds32/io: Remove useless definition of mmiowb() x86/io: Remove useless definition of mmiowb() arm64/io: Remove useless definition of mmiowb() ARM/io: Remove useless definition of mmiowb() mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors ...
2019-05-03IB/mlx5: Add missing XRC options to QP optional params maskJack Morgenstein
The QP transition optional parameters for the various transition for XRC QPs are identical to those for RC QPs. Many of the XRC QP transition optional parameter bits are missing from the QP optional mask table. These omissions caused failures when doing XRC QP state transitions. For example, when trying to change the response timer of an XRC receive QP via the RTS2RTS transition, the new timer value was ignored because MLX5_QP_OPTPAR_RNR_TIMEOUT bit was missing from the optional params mask for XRC qps for the RTS2RTS transition. Fix this by adding the missing XRC optional parameters for all QP transitions to the opt_mask table. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Fixes: a4774e9095de ("IB/mlx5: Fix opt param mask according to firmware spec") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Three trivial overlapping conflicts. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-25Merge branch 'mlx5_tir_icm' into rdma.git for-nextJason Gunthorpe
Ariel Levkovich says: ==================== The series exposes the ICM address of the receive transport interface (TIR) of Raw Packet and RSS QPs to the user since they are required to properly create and insert steering rules that direct flows to these QPs. ==================== For dependencies this branch is based on mlx5-next from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux * branch 'mlx5_tir_icm': IB/mlx5: Expose TIR ICM address to user space net/mlx5: Introduce new TIR creation core API net/mlx5: Expose TIR ICM address in command outbox Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-25IB/mlx5: Expose TIR ICM address to user spaceAriel Levkovich
This patch exposes the TIR ICM address of raw packet and RSS QPs to user space. In order to pass the new field, the patch extends the mlx5 specific QP creation response structure and fills it with the icm address returned by the FW command, if available. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24Merge branch 'rdma_mmap' into rdma.git for-nextJason Gunthorpe
Jason Gunthorpe says: ==================== Upon review it turns out there are some long standing problems in BAR mapping area: * BAR pages intended for read-only can be switched to writable via mprotect. * Missing use of rdma_user_mmap_io for the mlx5 clock BAR page. * Disassociate causes SIGBUS when touching the pages. * CPU pages are being mapped through to the process via remap_pfn_range instead of the more appropriate vm_insert_page, causing weird behaviors during disassociation. This series adds the missing VM_* flag manipulation, adds faulting a zero page for disassociation and revises the CPU page mappings to use vm_insert_page. ==================== For dependencies this branch is based on for-rc from git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git * branch 'rdma_mmap': RDMA: Remove rdma_user_mmap_page RDMA/mlx5: Use get_zeroed_page() for clock_info RDMA/ucontext: Fix regression with disassociate RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages RDMA/mlx5: Do not allow the user to write to the clock page Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>