summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/mlx5
AgeCommit message (Collapse)Author
2020-10-26RDMA: Convert sysfs device * show functions to use sysfs_emit()Joe Perches
Done with cocci script: @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - sprintf(buf, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - strcpy(buf, chr); + sysfs_emit(buf, chr); ...> } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - sprintf(buf, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... - len += scnprintf(buf + len, PAGE_SIZE - len, + len += sysfs_emit_at(buf, len, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { ... - strcpy(buf, chr); - return strlen(buf); + return sysfs_emit(buf, chr); } Link: https://lore.kernel.org/r/7f406fa8e3aa2552c022bec680f621e38d1fe414.1602122879.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Remove AH from uverbs_cmd_maskJason Gunthorpe
Drivers that need a uverbs AH should instead set the create_user_ah() op similar to reg_user_mr(). MODIFY_AH and QUERY_AH cmds were never implemented so are just deleted. Link: https://lore.kernel.org/r/11-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Check create_flags during create_qpJason Gunthorpe
Each driver should check that the QP attrs create_flags is supported. Unfortuantely when create_flags was added to the QP attrs the drivers were not updated. uverbs_ex_cmd_mask was used to block it - even though kernel drivers use these flags too. Check that flags is zero in all drivers that don't use it, remove IB_USER_VERBS_EX_CMD_CREATE_QP from uverbs_ex_cmd_mask. Fix the error code to be EOPNOTSUPP. Link: https://lore.kernel.org/r/8-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Check flags during create_cqJason Gunthorpe
Each driver should check that the CQ attrs is supported. Unfortuantely when flags was added to the CQ attrs the drivers were not updated, uverbs_ex_cmd_mask was used to block it. This was missed when create CQ was converted to ioctl, so non-zero flags could have been passed into drivers. Check that flags is zero in all drivers that don't use it, remove IB_USER_VERBS_EX_CMD_CREATE_CQ from uverbs_ex_cmd_mask. Fixes: 41b2a71fc848 ("IB/uverbs: Move ioctl path of create_cq and destroy_cq to a new file") Link: https://lore.kernel.org/r/7-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Check attr_mask during modify_qpJason Gunthorpe
Each driver should check that it can support the provided attr_mask during modify_qp. IB_USER_VERBS_EX_CMD_MODIFY_QP was being used to block modify_qp_ex because the driver didn't check RATE_LIMIT. Link: https://lore.kernel.org/r/6-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Check srq_type during create_srqJason Gunthorpe
uverbs was blocking srq_types the driver doesn't support based on the CREATE_XSRQ cmd_mask. Fix all drivers to check for supported srq_types during create_srq and move CREATE_XSRQ to the core code. Link: https://lore.kernel.org/r/5-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Move more uverbs_cmd_mask settings to the coreJason Gunthorpe
These functions all depend on the driver providing a specific op: - REREG_MR is rereg_user_mr(). bnxt_re set this without providing the op - ATTACH/DEATCH_MCAST is attach_mcast()/detach_mcast(). usnic set this without providing the op - OPEN_QP doesn't involve the driver but requires a XRCD. qedr provides xrcd but forgot to set it, usnic doesn't provide XRCD but set it anyhow. - OPEN/CLOSE_XRCD are the ops alloc_xrcd()/dealloc_xrcd() - CREATE_SRQ/DESTROY_SRQ are the ops create_srq()/destroy_srq() - QUERY/MODIFY_SRQ is op query_srq()/modify_srq(). hns sets this but sometimes supplies a NULL op. - RESIZE_CQ is op resize_cq(). bnxt_re sets this boes doesn't supply an op - ALLOC/DEALLOC_MW is alloc_mw()/dealloc_mw(). cxgb4 provided an (now deleted) implementation but no userspace All drivers were checked that no drivers provide the op without also setting uverbs_cmd_mask so this should have no functional change. Link: https://lore.kernel.org/r/4-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Remove elements in uverbs_cmd_mask that all drivers setJason Gunthorpe
This is a step toward eliminating uverbs_cmd_mask. Preset this list in the core code. Only the op reg_user_mr wasn't already being required from the drivers. Link: https://lore.kernel.org/r/3-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Remove uverbs_ex_cmd_mask values that are linked to functionsJason Gunthorpe
Since a while now the uverbs layer checks if the driver implements a function before allowing the ucmd to proceed. This largely obsoletes the cmd_mask stuff, but there is some tricky bits in drivers preventing it from being removed. Remove the easy elements of uverbs_ex_cmd_mask by pre-setting them in the core code. These are triggered soley based on the related ops function pointer. query_device_ex is not triggered based on an op, but all drivers already implement something compatible with the extension, so enable it globally too. Link: https://lore.kernel.org/r/2-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA/mlx5: Fix devlink deadlock on net namespace deletionParav Pandit
When a mlx5 core devlink instance is reloaded in different net namespace, its associated IB device is deleted and recreated. Example sequence is: $ ip netns add foo $ devlink dev reload pci/0000:00:08.0 netns foo $ ip netns del foo mlx5 IB device needs to attach and detach the netdevice to it through the netdev notifier chain during load and unload sequence. A below call graph of the unload flow. cleanup_net() down_read(&pernet_ops_rwsem); <- first sem acquired ops_pre_exit_list() pre_exit() devlink_pernet_pre_exit() devlink_reload() mlx5_devlink_reload_down() mlx5_unload_one() [...] mlx5_ib_remove() mlx5_ib_unbind_slave_port() mlx5_remove_netdev_notifier() unregister_netdevice_notifier() down_write(&pernet_ops_rwsem);<- recurrsive lock Hence, when net namespace is deleted, mlx5 reload results in deadlock. When deadlock occurs, devlink mutex is also held. This not only deadlocks the mlx5 device under reload, but all the processes which attempt to access unrelated devlink devices are deadlocked. Hence, fix this by mlx5 ib driver to register for per net netdev notifier instead of global one, which operats on the net namespace without holding the pernet_ops_rwsem. Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload") Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-16RDMA: Explicitly pass in the dma_device to ib_register_deviceJason Gunthorpe
The code in setup_dma_device has become rather convoluted, move all of this to the drivers. Drives now pass in a DMA capable struct device which will be used to setup DMA, or drivers must fully configure the ibdev for DMA and pass in NULL. Other than setting the masks in rvt all drivers were doing this already anyhow. mthca, mlx4 and mlx5 were already setting up maximum DMA segment size for DMA based on their hardweare limits in: __mthca_init_one() dma_set_max_seg_size (1G) __mlx4_init_one() dma_set_max_seg_size (1G) mlx5_pci_init() set_dma_caps() dma_set_max_seg_size (2G) Other non software drivers (except usnic) were extended to UINT_MAX [1, 2] instead of 2G as was before. [1] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/ [2] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/ Link: https://lore.kernel.org/r/20201008082752.275846-1-leon@kernel.org Link: https://lore.kernel.org/r/6b2ed339933d066622d5715903870676d8cc523a.1602590106.git.mchehab+huawei@kernel.org Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01RDMA/core: Modify enum ib_gid_type and enum rdma_network_typeAvihai Horon
Separate IB_GID_TYPE_IB and IB_GID_TYPE_ROCE to two different values, so enum ib_gid_type will match the gid types of the new query GID table API which will be introduced in the following patches. This change in enum ib_gid_type requires to change also enum rdma_network_type by separating RDMA_NETWORK_IB and RDMA_NETWORK_ROCE_V1 values. Link: https://lore.kernel.org/r/20200923165015.2491894-3-leon@kernel.org Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01RDMA/mlx5: Sync device with CPU pages upon ODP MR registrationYishai Hadas
Sync device with CPU pages upon ODP MR registration. mlx5 already has to zero the HW's version of the PAS list, may as well deliver a PAS list that matches the current CPU page tables configuration. Link: https://lore.kernel.org/r/20200930163828.1336747-5-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01RDMA/mlx5: Extend advice MR to support non faulting modeYishai Hadas
Extend advice MR to support non faulting mode, this can improve performance by increasing the populated page tables in the device. Link: https://lore.kernel.org/r/20200930163828.1336747-4-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01IB/core: Enable ODP sync without faultingYishai Hadas
Enable ODP sync without faulting, this improves performance by reducing the number of page faults in the system. The gain from this option is that the device page table can be aligned with the presented pages in the CPU page table without causing page faults. As of that, the overhead on data path from hardware point of view to trigger a fault which end-up by calling the driver to bring the pages will be dropped. Link: https://lore.kernel.org/r/20200930163828.1336747-3-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01IB/core: Improve ODP to use hmm_range_fault()Yishai Hadas
Move to use hmm_range_fault() instead of get_user_pags_remote() to improve performance in a few aspects: This includes: - Dropping the need to allocate and free memory to hold its output - No need any more to use put_page() to unpin the pages - The logic to detect contiguous pages is done based on the returned order, no need to run per page and evaluate. In addition, moving to use hmm_range_fault() enables to reduce page faults in the system with it's snapshot mode, this will be introduced in next patches from this series. As part of this, cleanup some flows and use the required data structures to work with hmm_range_fault(). Link: https://lore.kernel.org/r/20200930163828.1336747-2-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29RDMA/drivers: Remove udata check from special QPLeon Romanovsky
GSI QP can't be created from the user space, hence the udata check is always false (udata == NULL). Remove that check and simplify the flow. Link: https://lore.kernel.org/r/20200926102450.2966017-9-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29RDMA/mlx5: Delete not needed GSI QP signal QP typeLeon Romanovsky
GSI QP doesn't need signal QP type because it is initialized statically to zero, which is IB_SIGNAL_ALL_WR also wr->send_flags isn't set too. This means that the GSI QP signal QP type can be removed. Link: https://lore.kernel.org/r/20200926102450.2966017-5-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29RDMA/mlx5: Change GSI QP to have same creation flow like other QPsLeon Romanovsky
There is no reason to have separate create flow for the GSI QP, while general create_qp routine has all needed checks and ability to allocate and free the proper struct mlx5_ib_qp. Link: https://lore.kernel.org/r/20200926102450.2966017-4-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29RDMA/mlx5: Reuse existing fields in parent QP storage objectLeon Romanovsky
Remove duplication of mlx5_ib_qp and mlx5_ib_gsi_qp fields. This change returns the memory footprint of mlx5_ib QP to be as it was before embedding GSI QP. Link: https://lore.kernel.org/r/20200926102450.2966017-3-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29RDMA/mlx5: Embed GSI QP into general mlx5_ib QPLeon Romanovsky
The GSI QPs have different create flow from the regular QPs, but it is not really needed. Update the code to use mlx5_ib_qp as a storage class for all outside of GSI calls. Link: https://lore.kernel.org/r/20200926102450.2966017-2-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-25RDMA/mlx5: Fix type warning of sizeof in __mlx5_ib_alloc_counters()Liu Shixin
sizeof() when applied to a pointer typed expression should give the size of the pointed data, even if the data is a pointer. Fixes: e1f24a79f424 ("IB/mlx5: Support congestion related counters") Link: https://lore.kernel.org/r/20200917081354.2083293-1-liushixin2@huawei.com Signed-off-by: Liu Shixin <liushixin2@huawei.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22RDMA/mlx5: Don't call to restrack recursivelyLeon Romanovsky
The restrack is going to manage memory of all IB objects and must be called before object is created. GSI QP in the mlx5_ib separated between creating dummy interface and HW object beneath. This was achieved by double call to ib_create_qp(). In order to skip such reentry call to internal driver create_qp code. Link: https://lore.kernel.org/r/20200922091106.2152715-3-leon@kernel.org Reviewed-by: Mark Zhang <markz@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18RDMA/mlx5: Clarify what the UMR is for when creating MRsJason Gunthorpe
Once a mkey is created it can be modified using UMR. This is desirable for performance reasons. However, different hardware has restrictions on what modifications are possible using UMR. Make sense of these checks: - mlx5_ib_can_reconfig_with_umr() returns true if the access flags can be altered. Most cases create MRs using 0 access flags (now made clear by consistent use of set_mkc_access_pd_addr_fields()), but the old logic here was tormented. Make it clear that this is checking if the current access_flags can be modified using UMR to different access_flags. It is always OK to use UMR to change flags that all HW supports. - mlx5_ib_can_load_pas_with_umr() returns true if UMR can be used to enable and update the PAS/XLT. Enabling requires updating the entity size, so UMR ends up completely disabled on this old hardware. Make it clear why it is disabled. FRWR, ODP and cache always requires mlx5_ib_can_load_pas_with_umr(). - mlx5_ib_pas_fits_in_mr() is used to tell if an existing MR can be resized to hold a new PAS list. This only works for cached MR's because we don't store the PAS list size in other cases. To be very clear, arrange things so any pre-created MR's in the cache check the newly requested access_flags before allowing the MR to leave the cache. If UMR cannot set the required access_flags the cache fails to create the MR. This in turn means relaxed ordering and atomic are now correctly blocked early for implicit ODP on older HW. Link: https://lore.kernel.org/r/20200914112653.345244-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18RDMA/mlx5: Disable IB_DEVICE_MEM_MGT_EXTENSIONS if IB_WR_REG_MR can't workJason Gunthorpe
set_reg_wr() always fails if !umr_modify_entity_size_disabled because mlx5_ib_can_use_umr() always fails. Without set_reg_wr() IB_WR_REG_MR doesn't work and that means the device should not advertise IB_DEVICE_MEM_MGT_EXTENSIONS. Fixes: 841b07f99a47 ("IB/mlx5: Block MR WR if UMR is not possible") Link: https://lore.kernel.org/r/20200914112653.345244-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18RDMA/mlx5: Make mkeys always owned by the kernel's PD when not enabledJason Gunthorpe
Any mkey that is not enabled and assigned to userspace should have the PD set to a kernel owned PD. When cache entries are created for the first time the PDN is set to 0, which is probably a kernel PD, but be explicit. When a MR is registered using the hybrid reg_create with UMR xlt & enable the disabled mkey is pointing at the user PD, keep it pointing at the kernel until a UMR enables it and sets the user PD. Fixes: 9ec4483a3f0f ("IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache") Link: https://lore.kernel.org/r/20200914112653.345244-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18RDMA/mlx5: Use set_mkc_access_pd_addr_fields() in reg_create()Jason Gunthorpe
reg_create() open codes this helper, use the shared code. Link: https://lore.kernel.org/r/20200914112653.345244-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18RDMA/mlx5: Remove dead check for EAGAIN after alloc_mr_from_cache()Jason Gunthorpe
alloc_mr_from_cache() no longer returns EAGAIN, this is just dead code now. Fixes: aad719dcf379 ("RDMA/mlx5: Allow MRs to be created in the cache synchronously") Link: https://lore.kernel.org/r/20200914112653.345244-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18Merge branch 'mlx_sw_owner_v2' into rdma.git for-nextJason Gunthorpe
Leon Romanovsky says: ==================== This series from Alex extends software steering interface to support devices with extra capability "sw_owner_2" which will replace existing "sw_owner". ==================== Based on the mlx5-next branch at git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux due to dependencies. * branch 'mlx5_sw_owner_v2: RDMA/mlx5: Expose TIR and QP ICM address for sw_owner_v2 devices RDMA/mlx5: Allow DM allocation for sw_owner_v2 enabled devices RDMA/mlx5: Add sw_owner_v2 bit capability
2020-09-18Merge branch 'mlx5_active_speed' into rdma.git for-nextJason Gunthorpe
Leon Romanovsky says: ==================== IBTA declares speed as 16 bits, but kernel stores it in u8. This series fixes in-kernel declaration while keeping external interface intact. ==================== Based on the mlx5-next branch at git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux due to dependencies. * branch 'mlx5_active_speed': RDMA: Fix link active_speed size RDMA/mlx5: Delete duplicated mlx5_ptys_width enum net/mlx5: Refactor query port speed functions
2020-09-18RDMA: Fix link active_speed sizeAharon Landau
According to the IB spec active_speed size should be u16 and not u8 as before. Changing it to allow further extensions in offered speeds. Link: https://lore.kernel.org/r/20200917090223.1018224-4-leon@kernel.org Signed-off-by: Aharon Landau <aharonl@mellanox.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18RDMA/mlx5: Expose TIR and QP ICM address for sw_owner_v2 devicesAlex Vesker
Expose the ICM address to access TIR and QP, this will allow sw_owned_v2 devices to steer traffic to TIRs and QPs same as done with sw_owner capability. Link: https://lore.kernel.org/r/20200903073857.1129166-4-leon@kernel.org Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18RDMA/mlx5: Allow DM allocation for sw_owner_v2 enabled devicesAlex Vesker
sw_owner_v2 will replace sw_owner for future devices, this means that if sw_owner_v2 is set sw_owner should be ignored and DM allocation is required for sw_owner_v2 devices to function. Link: https://lore.kernel.org/r/20200903073857.1129166-3-leon@kernel.org Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17RDMA: Convert RWQ table logic to ib_core allocation schemeLeon Romanovsky
Move struct ib_rwq_ind_table allocation to ib_core. Link: https://lore.kernel.org/r/20200902081623.746359-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17RDMA: Clean MW allocation and free flowsLeon Romanovsky
Move allocation and destruction of memory windows under ib_core responsibility and clean drivers to ensure that no updates to MW ib_core structures are done in driver layer. Link: https://lore.kernel.org/r/20200902081623.746359-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17RDMA/mlx5: Delete duplicated mlx5_ptys_width enumAharon Landau
Combine two same enums to avoid duplication. Signed-off-by: Aharon Landau <aharonl@mellanox.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-09-17net/mlx5: Refactor query port speed functionsAharon Landau
The functions mlx5_query_port_link_width_oper and mlx5_query_port_ib_proto_oper are always called together, so combine them to a new function called mlx5_query_port_oper to avoid duplication. And while the mlx5i_get_port_settings is the same as mlx5_query_port_oper therefore let's remove it. According to the IB spec link_width_oper and ib_proto_oper should be u16 and not as written u8, so perform casting as a preparation to cross-RDMA patch which will fix that type for all drivers in the RDMA subsystem. Fixes: ada68c31ba9c ("net/mlx5: Introduce a new header file for physical port functions") Signed-off-by: Aharon Landau <aharonl@mellanox.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-09-11RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks()Jason Gunthorpe
ib_umem_num_pages() should only be used by things working with the SGL in CPU pages directly. Drivers building DMA lists should use the new ib_num_dma_blocks() which returns the number of blocks rdma_umem_for_each_block() will return. To make this general for DMA drivers requires a different implementation. Computing DMA block count based on umem->address only works if the requested page size is < PAGE_SIZE and/or the IOVA == umem->address. Instead the number of DMA pages should be computed in the IOVA address space, not umem->address. Thus the IOVA has to be stored inside the umem so it can be used for these calculations. For now set it to umem->address by default and fix it up if ib_umem_find_best_pgsz() was called. This allows drivers to be converted to ib_umem_num_dma_blocks() safely. Link: https://lore.kernel.org/r/6-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA: Make counters destroy symmetricalLeon Romanovsky
Change counters to return failure like any other verbs destroy, however this flow shouldn't return error at all. Link: https://lore.kernel.org/r/20200907120921.476363-10-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA: Restore ability to return error for destroy WQLeon Romanovsky
Make this interface symmetrical to other destroy paths. Fixes: a49b1dc7ae44 ("RDMA: Convert destroy_wq to be void") Link: https://lore.kernel.org/r/20200907120921.476363-9-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA: Change XRCD destroy return valueLeon Romanovsky
Update XRCD destroy flow to allow command failure. Fixes: 28ad5f65c314 ("RDMA: Move XRCD to be under ib_core responsibility") Link: https://lore.kernel.org/r/20200907120921.476363-8-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA: Allow fail of destroy CQLeon Romanovsky
Like any other verbs objects, CQ shouldn't fail during destroy, but mlx5_ib didn't follow this contract with mixed IB verbs objects with DEVX. Such mix causes to the situation where FW and kernel are fully interdependent on the reference counting of each side. Kernel verbs and drivers that don't have DEVX flows shouldn't fail. Fixes: e39afe3d6dbd ("RDMA: Convert CQ allocations to be under core responsibility") Link: https://lore.kernel.org/r/20200907120921.476363-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA: Restore ability to fail on SRQ destroyLeon Romanovsky
In similar way to other IB objects, restore the ability to return error on SRQ destroy. Strictly speaking, this change is not necessary, and provided here to ensure a symmetrical interface like other destroy functions. Fixes: 68e326dea1db ("RDMA: Handle SRQ allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA/mlx5: Issue FW command to destroy SRQ on reentryLeon Romanovsky
The HW release can fail and leave the system in limbo state, where SRQ is removed from the table, but can't be destroyed later. In every reentry, the initial xa_erase_irq() check will fail. Rewrite the erase logic to keep index, but don't store the entry itself. By doing it, we can safely reinsert entry back in the case of destroy failure. Link: https://lore.kernel.org/r/20200907120921.476363-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA: Restore ability to fail on AH destroyLeon Romanovsky
Like any other IB verbs objects, AH are refcounted by ib_core. The release of those objects are controlled by ib_core with promise that AH destroy can't fail. Being SW object for now, this change makes dealloc_ah() to behave like any other destroy IB flows. Fixes: d345691471b4 ("RDMA: Handle AH allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA: Restore ability to fail on PD deallocateLeon Romanovsky
The IB verbs objects are counted by the kernel and ib_core ensures that deallocate PD will success so it will be called once all other objects that depends on PD will be released. This is achieved by managing various reference counters on such objects. The mlx5 driver didn't follow this standard flow when allowed DEVX objects that are not managed by ib_core to be interleaved with the ones under ib_core responsibility. In such interleaved scenarios deallocate command can fail and ib_core will leave uobject in internal DB and attempt to clean it later to free resources anyway. This change partially restores returned value from dealloc_pd() for all drivers, but keeping in mind that non-DEVX devices and kernel verbs paths shouldn't fail. Fixes: 21a428a019c9 ("RDMA: Handle PD allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03RDMA/mlx5: Fix potential race between destroy and CQE pollLeon Romanovsky
The SRQ can be destroyed right before mlx5_cmd_get_srq is called. In such case the latter will return NULL instead of expected SRQ. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Link: https://lore.kernel.org/r/20200830084010.102381-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31Merge tag 'v5.9-rc3' into rdma.git for-nextJason Gunthorpe
Required due to dependencies in following patches. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27IB/mlx5: Add DCT RoCE LAG supportMark Zhang
When DCT QPs work in RoCE LAG mode: 1. DCT creation is allowed only when it is supported 2. The "port" of a DCT QP is assigned in a round-robin way Link: https://lore.kernel.org/r/20200818115245.700581-3-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27IB/mlx5: Add tx_affinity support for DCI QPMark Zhang
DCI QP supports tx_affinity as well. Link: https://lore.kernel.org/r/20200818115245.700581-2-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>