path: root/drivers/infiniband
Age | Commit message | Author
2021-01-19IB/mlx5: Make function staticParav Pandit
mlx5_query_mad_ifc_smp_attr_node_info() is internal to mad.c. Hence, make it static. Link: https://lore.kernel.org/r/20210113121703.559778-5-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19IB/mlx5: Add mutex destroy call to cap_mask_mutex mutexParav Pandit
The mutex_destroy() call for the device's cap_mask_mutex is missing; add it to annotate destruction. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Link: https://lore.kernel.org/r/20210113121703.559778-4-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19IB/iser: Simplify prot_caps settingMax Gurtovoy
Reduce the number of instructions made for setting protection caps. No need to do bitwise OR with 0 since we can zero the return value in the beginning of the function. Link: https://lore.kernel.org/r/20210111145754.56727-5-mgurtovoy@nvidia.com Reviewed-by: Israel Rukshin <israelr@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
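The pattern described above (zero the return value once, then OR in only the bits that apply) can be sketched in userspace C. This is a minimal illustration, not the driver's code: every DEMO_* flag name and value below is a hypothetical stand-in.

```c
#include <stdint.h>

/* Hypothetical capability and protection flags, for illustration only. */
#define DEMO_SIG_CAP_T10DIF  (1u << 0)
#define DEMO_SIG_CAP_CRC     (1u << 1)
#define DEMO_PROT_DIF_TYPES  (1u << 0)
#define DEMO_PROT_GUARD_CRC  (1u << 1)

/* Zero the result once, then OR in only the bits that apply. */
static uint32_t demo_prot_caps(uint32_t sig_caps)
{
	uint32_t ret = 0;

	if (sig_caps & DEMO_SIG_CAP_T10DIF)
		ret |= DEMO_PROT_DIF_TYPES;
	if (sig_caps & DEMO_SIG_CAP_CRC)
		ret |= DEMO_PROT_GUARD_CRC;

	return ret;
}
```

With no capability bits set the function simply returns 0, so no "OR with 0" branch is needed.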
2021-01-19IB/iser: Enforce iser_max_sectors to be greater than 0Max Gurtovoy
A value of 0 will cause the driver to fail establishing a valid connection to the remote target. The following can be seen in the log in this case:
  iser: iser_connect: connecting to: 1.1.1.88:3260
  iser: iser_cma_handler: address resolved (0): status 0 conn 00000000090aa4de id 00000000167d3b5a
  iser: iser_cma_handler: route resolved (2): status 0 conn 00000000090aa4de id 00000000167d3b5a
  iser: iscsi_iser_ep_poll: iser conn 00000000090aa4de rc = 0
  iser: iser_create_ib_conn_res: setting conn 00000000090aa4de cma_id 00000000167d3b5a qp 00000000efa80660 max_send_wr 4619
  iser_cma_handler: established (9): status 0 conn 00000000090aa4de id 00000000167d3b5a
  iser: iser_connected_handler: remote qpn:1c7 my qpn:1c6
  iser: iser_connected_handler: conn 00000000090aa4de: negotiated remote invalidation
  iser: iscsi_iser_ep_poll: iser conn 00000000090aa4de rc = 1
  scsi host10: iSCSI Initiator over iSER
  mlx5_core 0000:07:00.0: mlx5_cmd_check:769:(pid 616473): CREATE_MKEY(0x200) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x3bf6f)
  iser: iser_create_fastreg_desc: Failed to allocate ib_fast_reg_mr err=-22
  iser: iser_alloc_rx_descriptors: failed allocating rx descriptors / data buffers
  iser: iscsi_iser_ep_disconnect: ep 00000000d2040785 iser conn 00000000090aa4de
  iser: iser_conn_terminate: iser_conn 00000000090aa4de state 3
  iser: iser_free_ib_conn_res: freeing conn 00000000090aa4de cma_id 00000000167d3b5a qp 00000000efa80660
  iser: iser_device_try_release: device 00000000dc871b1b refcount 0
Link: https://lore.kernel.org/r/20210111145754.56727-4-mgurtovoy@nvidia.com Reviewed-by: Israel Rukshin <israelr@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19IB/iser: Protect iscsi_max_lun module param using callbackMax Gurtovoy
Remove the check from the module_init function. Link: https://lore.kernel.org/r/20210111145754.56727-3-mgurtovoy@nvidia.com Reviewed-by: Israel Rukshin <israelr@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19IB/iser: Remove unneeded semicolonsMax Gurtovoy
No need to add semicolon after closing bracket. Link: https://lore.kernel.org/r/20210111145754.56727-2-mgurtovoy@nvidia.com Reviewed-by: Israel Rukshin <israelr@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19Merge branch 'devx_set_get' into rdma.git for-nextJason Gunthorpe
Leon Romanovsky says:
====================
Be more strict with DEVX get/set operations for the obj_id.
====================
Based on the mlx5-next branch at git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux due to dependencies.
* branch 'devx_set_get':
  RDMA/mlx5: Use strict get/set operations for obj_id
  RDMA/mlx5: Use the correct obj_id upon DEVX TIR creation
  net/mlx5: Expose ifc bits for query modify header
2021-01-19RDMA/mlx5: Use strict get/set operations for obj_idYishai Hadas
Use strict get/set operations for obj_id based on the specific object type. This prevents any mismatch between the general header and the legacy header commands. Link: https://lore.kernel.org/r/20201230130121.180350-4-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19RDMA/mlx5: Use the correct obj_id upon DEVX TIR creationYishai Hadas
Use the correct obj_id upon DEVX TIR creation by strictly taking the tirn 24 bits and not the general obj_id which is 32 bits. Fixes: 7efce3691d33 ("IB/mlx5: Add obj create and destroy functionality") Link: https://lore.kernel.org/r/20201230130121.180350-2-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
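Why the field width matters can be shown with a small userspace sketch. It assumes, for illustration only, that the TIR number sits in the low 24 bits of a 32-bit word whose upper 8 bits may carry unrelated data; the names are hypothetical and this is not the driver's accessor.

```c
#include <stdint.h>

/* Assumption for this sketch: the TIR number occupies the low 24 bits. */
#define DEMO_TIRN_MASK 0x00ffffffu

/* Take strictly the 24-bit tirn, ignoring whatever sits in the top byte. */
static uint32_t demo_tirn_from_word(uint32_t raw)
{
	return raw & DEMO_TIRN_MASK;
}
```

Reading the same word as a full 32-bit obj_id would include the top byte and could yield a different identifier.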
2021-01-18RDMA/cxgb4: Fix the reported max_recv_sge valueKamal Heib
The max_recv_sge value is wrongly reported when calling query_qp. This happens because of a typo when assigning the max_recv_sge value: the value of sq_max_sges was assigned instead of rq_max_sges. Fixes: 3e5c02c9ef9a ("iw_cxgb4: Support query_qp() verb") Link: https://lore.kernel.org/r/20210114191423.423529-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-18IB/isert: Simplify signature cap checkMax Gurtovoy
Use if/else clause instead of "condition ? val1 : val2" to make the code cleaner and simpler. Link: https://lore.kernel.org/r/20210110111903.486681-3-mgurtovoy@nvidia.com Reviewed-by: Israel Rukshin <israelr@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-18IB/isert: Remove unneeded semicolonMax Gurtovoy
No need to add semicolon after closing bracket. Link: https://lore.kernel.org/r/20210110111903.486681-2-mgurtovoy@nvidia.com Reviewed-by: Israel Rukshin <israelr@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-18IB/isert: Remove unneeded new linesMax Gurtovoy
The Linux convention is to have only 1 new line between functions. Link: https://lore.kernel.org/r/20210110111903.486681-1-mgurtovoy@nvidia.com Reviewed-by: Israel Rukshin <israelr@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-18RDMA/bnxt_re: Allow bigger MR creationSelvin Xavier
Allow users to create bigger MRs. Remove the check that prevented creating MRs with number of pages more than 512. Link: https://lore.kernel.org/r/1610012608-14528-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-18RDMA/bnxt_re: Code refactor while populating user MRsSelvin Xavier
Refactor code that populates MR page buffer list. Instead of allocating a pbl_tbl to hold the buffer list, pass the struct ib_umem directly to bnxt_qplib_alloc_init_hwq() as done for other user space memories. Fix the PBL level to handle the above mentioned change. Also, remove an unwanted flag from the input to bnxt_qplib_reg_mr() function. Link: https://lore.kernel.org/r/1610012608-14528-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-18RDMA/hns: Create CQ with selected CQN for bank load balanceYangyang Li
In order to improve performance by balancing the load between different banks of cache, the CQC cache is designed to choose one of 4 banks according to the lower 2 bits of the CQN. The hns driver needs to count the number of CQs on each bank and then assign the CQ being created to the bank with the minimum load first. Link: https://lore.kernel.org/r/1610008589-35770-1-git-send-email-liweihang@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
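The bank selection described above can be sketched as follows. This is a hedged userspace illustration: the per-bank counters, the helper names, and the way the CQN is composed from a per-bank id are assumptions, not the hns driver's implementation. Only the "4 banks, lower 2 bits of the CQN" rule comes from the commit message.

```c
#include <stdint.h>

#define DEMO_BANK_NUM 4u	/* four cache banks, per the text above */

/* Return the index of the bank that currently holds the fewest CQs. */
static unsigned int demo_least_loaded_bank(const uint32_t inuse[DEMO_BANK_NUM])
{
	unsigned int bank = 0;

	for (unsigned int i = 1; i < DEMO_BANK_NUM; i++)
		if (inuse[i] < inuse[bank])
			bank = i;

	return bank;
}

/* Build a CQN whose lower 2 bits select the given bank (hypothetical layout). */
static uint32_t demo_make_cqn(uint32_t id_in_bank, unsigned int bank)
{
	return (id_in_bank << 2) | (bank & 0x3u);
}
```

On a tie the lowest-numbered bank wins, which is an arbitrary choice made for this sketch.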
2021-01-18RDMA/nldev: Return an error message on failure to turn auto modePatrisious Haddad
A bounded counter can't be reconfigured to auto mode. A user who attempts it gets an error, but without any hint as to why. Update the nldev interface to return an error message through the extack mechanism. Link: https://lore.kernel.org/r/20201230130240.180737-1-leon@kernel.org Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs: Fix KASAN: stack-out-of-bounds bugJack Wang
When KASAN is enabled, we notice warning below: [ 483.436975] ================================================================== [ 483.437234] BUG: KASAN: stack-out-of-bounds in _mlx5_ib_post_send+0x188a/0x2560 [mlx5_ib] [ 483.437430] Read of size 4 at addr ffff88a195fd7d30 by task kworker/1:3/6954 [ 483.437731] CPU: 1 PID: 6954 Comm: kworker/1:3 Kdump: loaded Tainted: G O 5.4.82-pserver #5.4.82-1+feature+linux+5.4.y+dbg+20201210.1532+987e7a6~deb10 [ 483.437976] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 3.3 02/21/2020 [ 483.438168] Workqueue: rtrs_server_wq hb_work [rtrs_core] [ 483.438323] Call Trace: [ 483.438486] dump_stack+0x96/0xe0 [ 483.438646] ? _mlx5_ib_post_send+0x188a/0x2560 [mlx5_ib] [ 483.438802] print_address_description.constprop.6+0x1b/0x220 [ 483.438966] ? _mlx5_ib_post_send+0x188a/0x2560 [mlx5_ib] [ 483.439133] ? _mlx5_ib_post_send+0x188a/0x2560 [mlx5_ib] [ 483.439285] __kasan_report.cold.9+0x1a/0x32 [ 483.439444] ? _mlx5_ib_post_send+0x188a/0x2560 [mlx5_ib] [ 483.439597] kasan_report+0x10/0x20 [ 483.439752] _mlx5_ib_post_send+0x188a/0x2560 [mlx5_ib] [ 483.439910] ? update_sd_lb_stats+0xfb1/0xfc0 [ 483.440073] ? set_reg_wr+0x520/0x520 [mlx5_ib] [ 483.440222] ? update_group_capacity+0x340/0x340 [ 483.440377] ? find_busiest_group+0x314/0x870 [ 483.440526] ? update_sd_lb_stats+0xfc0/0xfc0 [ 483.440683] ? __bitmap_and+0x6f/0x100 [ 483.440832] ? __lock_acquire+0xa2/0x2150 [ 483.440979] ? __lock_acquire+0xa2/0x2150 [ 483.441128] ? __lock_acquire+0xa2/0x2150 [ 483.441279] ? debug_lockdep_rcu_enabled+0x23/0x60 [ 483.441430] ? lock_downgrade+0x390/0x390 [ 483.441582] ? __lock_acquire+0xa2/0x2150 [ 483.441729] ? __lock_acquire+0xa2/0x2150 [ 483.441876] ? newidle_balance+0x425/0x8f0 [ 483.442024] ? __lock_acquire+0xa2/0x2150 [ 483.442172] ? debug_lockdep_rcu_enabled+0x23/0x60 [ 483.442330] hb_work+0x15d/0x1d0 [rtrs_core] [ 483.442479] ? schedule_hb+0x50/0x50 [rtrs_core] [ 483.442627] ? lock_downgrade+0x390/0x390 [ 483.442781] ? 
process_one_work+0x40d/0xa50 [ 483.442931] process_one_work+0x4ee/0xa50 [ 483.443082] ? pwq_dec_nr_in_flight+0x110/0x110 [ 483.443231] ? do_raw_spin_lock+0x119/0x1d0 [ 483.443383] worker_thread+0x65/0x5c0 [ 483.443532] ? process_one_work+0xa50/0xa50 [ 483.451839] kthread+0x1e2/0x200 [ 483.451983] ? kthread_create_on_node+0xc0/0xc0 [ 483.452139] ret_from_fork+0x3a/0x50 The problem is that we use the wrong type when sending the wr: the hw driver expects a wr of type IB_WR_RDMA_WRITE_WITH_IMM to be an ib_rdma_wr and uses container_of to access its members. The fix is simple: use ib_rdma_wr instead of ib_send_wr. Fixes: c0894b3ea69d ("RDMA/rtrs: core: lib functions shared between client and server modules") Link: https://lore.kernel.org/r/20201217141915.56989-20-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
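The container_of hazard behind this report can be illustrated in userspace C. The struct and function names below are simplified, hypothetical stand-ins rather than the kernel definitions. The point is only that a callee which recovers the outer structure from an embedded base member reads past the object if the caller allocated just the base type.

```c
#include <stddef.h>

/* Simplified stand-ins for the work-request types (not the kernel structs). */
struct demo_send_wr {
	int opcode;
};

struct demo_rdma_wr {
	struct demo_send_wr wr;		/* embedded base request */
	unsigned long remote_addr;
	unsigned int rkey;
};

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The provider receives the base pointer and recovers the outer struct. */
static unsigned int demo_provider_read_rkey(struct demo_send_wr *wr)
{
	struct demo_rdma_wr *rwr = demo_container_of(wr, struct demo_rdma_wr, wr);

	return rwr->rkey;
}

/* Correct caller: place the full outer type on the stack, pass its base. */
static unsigned int demo_post(void)
{
	struct demo_rdma_wr rwr = {
		.wr = { .opcode = 1 },
		.remote_addr = 0,
		.rkey = 42,
	};

	return demo_provider_read_rkey(&rwr.wr);
}
```

Had demo_post() declared only a struct demo_send_wr on the stack, the read of rkey would land outside that object, which is the kind of access a stack-out-of-bounds check flags.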
2021-01-15RDMA/rtrs-srv: Init wr_cnt as 1Jack Wang
Fix up wr_avail accounting. If wr_cnt is 0, we SIGNAL the first wr, and on completion we add queue_depth back, which is wrong for tracking the available wr. Fix it by initializing wr_cnt to 1. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20201217141915.56989-19-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs-srv: Do not signal REG_MRJack Wang
We do not need to wait for REG_MR completion, so remove the SIGNAL flag. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20201217141915.56989-18-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs-clt: Use bitmask to check sess->flagsJack Wang
We may want to add new flags, so it's better to use bitmask to check flags. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20201217141915.56989-17-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
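The difference a bitmask test makes can be sketched as below. It assumes, as an illustration only, that the earlier check compared the whole flags word for equality; the flag names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FLAG_NEW_RKEY  (1u << 0)	/* hypothetical existing flag */
#define DEMO_FLAG_FUTURE    (1u << 1)	/* hypothetical flag added later */

/* Equality check: breaks as soon as any other flag is also set. */
static bool demo_has_new_rkey_eq(uint32_t flags)
{
	return flags == DEMO_FLAG_NEW_RKEY;
}

/* Bitmask check: keeps working when new flags are introduced. */
static bool demo_has_new_rkey_mask(uint32_t flags)
{
	return (flags & DEMO_FLAG_NEW_RKEY) != 0;
}
```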
2021-01-15RDMA/rtrs: Do not signal for heatbeatJack Wang
For HB, there is no need to generate signal for completion. Also remove a comment accordingly. Fixes: c0894b3ea69d ("RDMA/rtrs: core: lib functions shared between client and server modules") Link: https://lore.kernel.org/r/20201217141915.56989-16-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reported-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs-clt: Refactor the failure cases in alloc_cltGuoqing Jiang
Make all failure cases go to the common path to avoid duplicate code. Some issues also existed before: 1. clt needs to be freed to avoid a memory leak. 2. Return ERR_PTR(-ENOMEM) if kobject_create_and_add fails, because rtrs_clt_open checks the return value by calling "IS_ERR(clt)". Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20201217141915.56989-15-jinpu.wang@cloud.ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs-srv: Fix missing wr_cqeJack Wang
In a few places wr_cqe was not set, which could lead to a NULL pointer dereference or GPF in the error case. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20201217141915.56989-14-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs-clt: Rename __rtrs_clt_change_state to rtrs_clt_change_stateGuoqing Jiang
Let's rename it to rtrs_clt_change_state since the previous one is killed. Also update the comment to make it more clear. Link: https://lore.kernel.org/r/20201217141915.56989-13-jinpu.wang@cloud.ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs-clt: Kill rtrs_clt_change_stateGuoqing Jiang
It is just a wrapper around rtrs_clt_change_state_get_old, and we can reuse rtrs_clt_change_state_get_old by adding a check of whether 'old_state' is valid. Link: https://lore.kernel.org/r/20201217141915.56989-12-jinpu.wang@cloud.ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs-clt: Remove unnecessary 'goto out'Guoqing Jiang
This is not needed since the label is just after the place. Link: https://lore.kernel.org/r/20201217141915.56989-11-jinpu.wang@cloud.ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs-clt: Kill wait_for_inflight_permitsGuoqing Jiang
Let's wait for the inflight permits before freeing them. Link: https://lore.kernel.org/r/20201217141915.56989-10-jinpu.wang@cloud.ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs-clt: Consolidate rtrs_clt_destroy_sysfs_root_{folder,files}Guoqing Jiang
Since the two functions are called together, let's consolidate them in a new function rtrs_clt_destroy_sysfs_root. Link: https://lore.kernel.org/r/20201217141915.56989-9-jinpu.wang@cloud.ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs: Call kobject_put in the failure pathGuoqing Jiang
Per the comment on kobject_init_and_add, we need to free the memory by calling kobject_put. Fixes: 215378b838df ("RDMA/rtrs: client: sysfs interface functions") Fixes: 91b11610af8d ("RDMA/rtrs: server: sysfs interface functions") Link: https://lore.kernel.org/r/20201217141915.56989-8-jinpu.wang@cloud.ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs-srv: Jump to dereg_mr label if allocate iu failsGuoqing Jiang
The rtrs_iu_free is called in rtrs_iu_alloc if memory is limited, so we don't need to free the same iu again. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20201217141915.56989-7-jinpu.wang@cloud.ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs-clt: Set mininum limit when create QPJack Wang
Currently rtrs uses coarse (generally larger) numbers when creating a QP, which leads the hardware to create more resources that only waste memory with no benefit.
- SERVICE con: for max_send_wr/max_recv_wr, it's 2 times SERVICE_CON_QUEUE_DEPTH + 2
- IO con: for max_send_wr/max_recv_wr, it's sess->queue_depth * 3 + 1
Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20201217141915.56989-6-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs-srv: Use sysfs_remove_file_self for disconnectJack Wang
Remove self first to avoid a deadlock; we don't want to use close_work to remove the sess sysfs. Fixes: 91b11610af8d ("RDMA/rtrs: server: sysfs interface functions") Link: https://lore.kernel.org/r/20201217141915.56989-5-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Tested-by: Lutz Pogrell <lutz.pogrell@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs-srv: Release lock before call into close_sessJack Wang
In this error case, we don't need to hold the mutex to call close_sess. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20201217141915.56989-4-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Tested-by: Lutz Pogrell <lutz.pogrell@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs: Extend ibtrs_cq_qp_createJack Wang
rtrs does not have the same limit for both max_send_wr and max_recv_wr. To allow the client and server to set different values, expose them as separate parameters of rtrs_cq_qp_create. Also fix the type accordingly: u32 should be used instead of u16. Fixes: c0894b3ea69d ("RDMA/rtrs: core: lib functions shared between client and server modules") Link: https://lore.kernel.org/r/20201217141915.56989-2-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-14RDMA/cma: Fix error flow in default_roce_mode_storeNeta Ostrovsky
In default_roce_mode_store(), we took a reference to cma_dev, but didn't return it with cma_dev_put in the error flow. Fixes: 1c15b4f2a42f ("RDMA/core: Modify enum ib_gid_type and enum rdma_network_type") Link: https://lore.kernel.org/r/20210113130214.562108-1-leon@kernel.org Signed-off-by: Neta Ostrovsky <netao@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
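The shape of this kind of fix, dropping a reference on every exit path including the error one, can be sketched in userspace C. The types and helpers are hypothetical stand-ins for the cma_dev get/put pair; only the single-exit pattern is the point.

```c
#include <errno.h>

/* Hypothetical refcounted device, standing in for cma_dev. */
struct demo_dev {
	int refcount;
	int mode;
};

static void demo_dev_get(struct demo_dev *dev) { dev->refcount++; }
static void demo_dev_put(struct demo_dev *dev) { dev->refcount--; }

/* Store handler: the reference taken at entry is dropped on all paths. */
static int demo_mode_store(struct demo_dev *dev, int mode)
{
	int ret = 0;

	demo_dev_get(dev);

	if (mode < 0) {
		ret = -EINVAL;
		goto put;	/* error flow still reaches the put below */
	}

	dev->mode = mode;
put:
	demo_dev_put(dev);
	return ret;
}
```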
2021-01-14RDMA/mlx5: Fix wrong free of blue flame register on errorMark Bloch
If the allocation of the fast path blue flame register fails, the driver should free the regular blue flame register allocated a statement above, not the one that it just failed to allocate. Fixes: 16c1975f1032 ("IB/mlx5: Create profile infrastructure to add and remove stages") Link: https://lore.kernel.org/r/20210113121703.559778-6-leon@kernel.org Reported-by: Hans Petter Selasky <hanss@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-14IB/mlx5: Fix error unwinding when set_has_smi_cap failsParav Pandit
When set_has_smi_cap() fails, multiport master cleanup is missed. Fix it by doing the correct error unwinding goto. Fixes: a989ea01cb10 ("RDMA/mlx5: Move SMI caps logic") Link: https://lore.kernel.org/r/20210113121703.559778-3-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-14RDMA/umem: Avoid undefined behavior of rounddown_pow_of_two()Aharon Landau
rounddown_pow_of_two() is undefined when the input is 0. Therefore we need to avoid it in ib_umem_find_best_pgsz and return 0. Otherwise, it could result in not rejecting an invalid page size which eventually causes a kernel oops due to the logical inconsistency. Fixes: 3361c29e9279 ("RDMA/umem: Use simpler logic for ib_umem_find_best_pgsz()") Link: https://lore.kernel.org/r/20210113121703.559778-2-leon@kernel.org Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
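A guard of the kind described above can be sketched in userspace C. The helper below is a plain reimplementation written for this illustration, not the kernel's rounddown_pow_of_two(), and the function names are hypothetical.

```c
#include <stdint.h>

/* Largest power of two <= v. Only meaningful for v != 0. */
static uint64_t demo_rounddown_pow2(uint64_t v)
{
	uint64_t p = 1;

	while (v >>= 1)
		p <<= 1;

	return p;
}

/*
 * Guarded wrapper: when no candidate page size remains, report 0
 * ("no valid size") instead of passing 0 to the rounding helper.
 */
static uint64_t demo_best_pgsz(uint64_t candidate_mask)
{
	if (!candidate_mask)
		return 0;

	return demo_rounddown_pow2(candidate_mask);
}
```

A caller can then treat a return value of 0 as a rejection rather than proceeding with a meaningless page size.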
2021-01-12RDMA/rxe: Fix race in rxe_mcast.cBob Pearson
Fix a race in rxe_mcast.c that occurs when two QPs try at the same time to attach a multicast address. Both QPs look up the mgid address in a pool of multicast groups and, if they do not find it, create a new group elem. Fix this by locking the lookup/alloc/add key sequence and using the unlocked APIs added in this patch set. Link: https://lore.kernel.org/r/20201216231550.27224-8-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-12RDMA/rxe: Add unlocked versions of pool APIsBob Pearson
The existing pool APIs use the rw_lock pool_lock to protect critical sections that change the pool state. This does not correctly implement a typical sequence like the following

	elem = <lookup key in pool>
	if found use elem
	else
		elem = <alloc new elem in pool>
		<add key to elem>

Which is racy if multiple threads are attempting to perform this at the same time. We want the second thread to use the elem created by the first thread not create two equivalent elems. This patch adds new APIs that are the same as existing APIs but do not take the pool_lock. A caller can then take the lock and perform a sequence of pool operations and then release the lock. Link: https://lore.kernel.org/r/20201216231550.27224-7-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
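The lookup/alloc/add sequence performed under a single lock can be sketched in userspace C. This is an illustration under stated assumptions: a pthread mutex stands in for the kernel's pool_lock, the pool is a simple linked list, and all demo_* names are hypothetical rather than the rxe pool API.

```c
#include <pthread.h>
#include <stdlib.h>

struct demo_elem {
	int key;
	struct demo_elem *next;
};

struct demo_pool {
	pthread_mutex_t lock;
	struct demo_elem *head;
};

/* Unlocked helper: the caller must hold pool->lock. */
static struct demo_elem *demo_lookup_nolock(struct demo_pool *pool, int key)
{
	for (struct demo_elem *e = pool->head; e; e = e->next)
		if (e->key == key)
			return e;
	return NULL;
}

/* Unlocked helper: allocate an elem and add its key; caller holds the lock. */
static struct demo_elem *demo_alloc_add_nolock(struct demo_pool *pool, int key)
{
	struct demo_elem *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;
	e->key = key;
	e->next = pool->head;
	pool->head = e;
	return e;
}

/* The whole lookup-or-create sequence runs inside one critical section. */
static struct demo_elem *demo_get_or_create(struct demo_pool *pool, int key)
{
	struct demo_elem *e;

	pthread_mutex_lock(&pool->lock);
	e = demo_lookup_nolock(pool, key);
	if (!e)
		e = demo_alloc_add_nolock(pool, key);
	pthread_mutex_unlock(&pool->lock);

	return e;
}
```

Because the lock is held across both steps, a second caller asking for the same key finds the element the first caller created instead of creating a duplicate.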
2021-01-12RDMA/rxe: Make add/drop key/index APIs type safeBob Pearson
Replace 'void *' parameters with 'struct rxe_pool_entry *' and use a macro to allow: rxe_add_index, rxe_drop_index, rxe_add_key, rxe_drop_key and rxe_add_to_pool APIs to be type safe against changing the position of pelem in the objects. Link: https://lore.kernel.org/r/20201216231550.27224-6-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-12RDMA/rxe: Make pool lookup and alloc APIs type safeBob Pearson
The allocate, lookup index, lookup key and cleanup routines in rxe_pool.c currently are not type safe against relocating the pelem field in the objects. Planned changes to move allocation of objects into rdma-core make addressing this a requirement. Use the elem_offset field in rxe_type_info to make these APIs safe against moving the pelem field. Link: https://lore.kernel.org/r/20201216231550.27224-5-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-12RDMA/rxe: Add elem_offset field to rxe_type_infoBob Pearson
The rxe verbs objects each include an rdma-core object 'ib_xxx' and a rxe_pool_entry 'pelem' in addition to rxe specific data. Originally these all had pelem first and ib_xxx second. Currently about half have ib_xxx first and half have pelem first. Saving the offset of the pelem field in rxe_type info will enable making the rxe_pool APIs type safe as the pelem field continues to vary. Link: https://lore.kernel.org/r/20201216231550.27224-4-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
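How a stored offset makes pool helpers independent of field order can be sketched in userspace C. The structs below are hypothetical simplifications, not the rxe definitions. Only the idea of recording offsetof(pelem) per object type comes from the commit message.

```c
#include <stddef.h>

struct demo_pool_entry { int index; };
struct demo_ib_obj { int handle; };

/* Two object layouts: pool entry first, or pool entry second. */
struct demo_obj_a { struct demo_pool_entry pelem; struct demo_ib_obj ib; };
struct demo_obj_b { struct demo_ib_obj ib; struct demo_pool_entry pelem; };

/* Per-type info records where the pool entry lives inside the object. */
struct demo_type_info {
	size_t size;
	size_t elem_offset;
};

static const struct demo_type_info demo_info_a = {
	sizeof(struct demo_obj_a), offsetof(struct demo_obj_a, pelem)
};
static const struct demo_type_info demo_info_b = {
	sizeof(struct demo_obj_b), offsetof(struct demo_obj_b, pelem)
};

/* Object -> pool entry, without assuming the entry is the first member. */
static struct demo_pool_entry *demo_obj_to_elem(void *obj,
						const struct demo_type_info *info)
{
	return (struct demo_pool_entry *)((char *)obj + info->elem_offset);
}

/* Pool entry -> object, using the same recorded offset. */
static void *demo_elem_to_obj(struct demo_pool_entry *elem,
			      const struct demo_type_info *info)
{
	return (char *)elem - info->elem_offset;
}
```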
2021-01-12RDMA/rxe: Let pools support both keys and indicesBob Pearson
Allow both indices and keys to exist for objects in pools. Previously you were limited to one or the other. This is required for later implementing rxe memory windows. Link: https://lore.kernel.org/r/20201216231550.27224-3-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-12RDMA/rxe: Remove unneeded RXE_POOL_ATOMIC flagBob Pearson
Remove RXE_POOL_ATOMIC flag from rxe_type_info for AH objects. These objects are now allocated by rdma/core so there is no further reason for this flag. Link: https://lore.kernel.org/r/20201216231550.27224-2-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-12RDMA/rxe: Add check for supported QP typesXiao Yang
Currently rdma_rxe supports only five QP types; attempting to create any other type should return an error, but the type check was missing. Link: https://lore.kernel.org/r/20201216071755.149449-2-yangx.jy@cn.fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-08RDMA/siw: Fix handling of zero-sized Read and Receive Queues.Bernard Metzler
During connection setup, the application may choose to zero-size inbound and outbound READ queues, as well as the Receive queue. This patch fixes handling of zero-sized queues, but does not prevent them. Kamal Heib says in an initial error report: When running the blktests over siw the following shift-out-of-bounds is reported, this is happening because the passed IRD or ORD from the ulp could be zero which will lead to unexpected behavior when calling roundup_pow_of_two(), fix that by blocking zero values of ORD or IRD. UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' CPU: 20 PID: 3957 Comm: kworker/u64:13 Tainted: G S 5.10.0-rc6 #2 Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.1.5 04/11/2016 Workqueue: iw_cm_wq cm_work_handler [iw_cm] Call Trace: dump_stack+0x99/0xcb ubsan_epilogue+0x5/0x40 __ubsan_handle_shift_out_of_bounds.cold.11+0xb4/0xf3 ? down_write+0x183/0x3d0 siw_qp_modify.cold.8+0x2d/0x32 [siw] ? __local_bh_enable_ip+0xa5/0xf0 siw_accept+0x906/0x1b60 [siw] ? xa_load+0x147/0x1f0 ? siw_connect+0x17a0/0x17a0 [siw] ? lock_downgrade+0x700/0x700 ? siw_get_base_qp+0x1c2/0x340 [siw] ? _raw_spin_unlock_irqrestore+0x39/0x40 iw_cm_accept+0x1f4/0x430 [iw_cm] rdma_accept+0x3fa/0xb10 [rdma_cm] ? check_flush_dependency+0x410/0x410 ? cma_rep_recv+0x570/0x570 [rdma_cm] nvmet_rdma_queue_connect+0x1a62/0x2680 [nvmet_rdma] ? nvmet_rdma_alloc_cmds+0xce0/0xce0 [nvmet_rdma] ? lock_release+0x56e/0xcc0 ? lock_downgrade+0x700/0x700 ? lock_downgrade+0x700/0x700 ? __xa_alloc_cyclic+0xef/0x350 ? __xa_alloc+0x2d0/0x2d0 ? rdma_restrack_add+0xbe/0x2c0 [ib_core] ? __ww_mutex_die+0x190/0x190 cma_cm_event_handler+0xf2/0x500 [rdma_cm] iw_conn_req_handler+0x910/0xcb0 [rdma_cm] ? _raw_spin_unlock_irqrestore+0x39/0x40 ? trace_hardirqs_on+0x1c/0x150 ? cma_ib_handler+0x8a0/0x8a0 [rdma_cm] ? __kasan_kmalloc.constprop.7+0xc1/0xd0 cm_work_handler+0x121c/0x17a0 [iw_cm] ? iw_cm_reject+0x190/0x190 [iw_cm] ? 
trace_hardirqs_on+0x1c/0x150 process_one_work+0x8fb/0x16c0 ? pwq_dec_nr_in_flight+0x320/0x320 worker_thread+0x87/0xb40 ? __kthread_parkme+0xd1/0x1a0 ? process_one_work+0x16c0/0x16c0 kthread+0x35f/0x430 ? kthread_mod_delayed_work+0x180/0x180 ret_from_fork+0x22/0x30 Fixes: a531975279f3 ("rdma/siw: main include file") Fixes: f29dd55b0236 ("rdma/siw: queue pair methods") Fixes: 8b6a361b8c48 ("rdma/siw: receive path") Fixes: b9be6f18cf9e ("rdma/siw: transmit path") Fixes: 303ae1cdfdf7 ("rdma/siw: application interface") Link: https://lore.kernel.org/r/20210108125845.1803-1-bmt@zurich.ibm.com Reported-by: Kamal Heib <kamalheib1@gmail.com> Reported-by: Yi Zhang <yi.zhang@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
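The "shift exponent 64" in the report quoted above is what a round-up of the form 1UL << fls(n - 1) produces for n = 0, since n - 1 wraps to all ones. The userspace sketch below computes only the exponent, without performing the shift. It assumes that style of round-up and a 64-bit unsigned long, and the helper names are hypothetical.

```c
#include <limits.h>

/* 1-based position of the highest set bit; 0 when v == 0. */
static unsigned int demo_fls_long(unsigned long v)
{
	unsigned int n = 0;

	while (v) {
		n++;
		v >>= 1;
	}
	return n;
}

/* Exponent that a "1UL << fls(n - 1)" style round-up would shift by. */
static unsigned int demo_roundup_shift(unsigned long n)
{
	return demo_fls_long(n - 1);	/* n == 0 wraps to ULONG_MAX */
}

/* A queue size is safe to round up only if it is non-zero and the shift fits. */
static int demo_size_is_safe(unsigned long n)
{
	return n != 0 && demo_roundup_shift(n) < sizeof(unsigned long) * CHAR_BIT;
}
```

A zero-sized queue therefore needs to be handled before any rounding takes place, for example by skipping the allocation for that queue.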
2021-01-07RDMA: Use kzalloc for allocating only one thingZheng Yongjun
Use kzalloc rather than kcalloc(1,...)

The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
@@

- kcalloc(1,
+ kzalloc(
          ...)
// </smpl>

Link: https://lore.kernel.org/r/20201229135223.23815-1-zhengyongjun3@huawei.com Link: https://lore.kernel.org/r/20201229135232.23869-1-zhengyongjun3@huawei.com Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-07RDMA/ocrdma: Fix use after free in ocrdma_dealloc_ucontext_pd()Tom Rix
In ocrdma_dealloc_ucontext_pd() uctx->cntxt_pd is assigned to the variable pd and then after uctx->cntxt_pd is freed, the variable pd is passed to function _ocrdma_dealloc_pd() which dereferences pd directly or through its call to ocrdma_mbx_dealloc_pd(). Reorder the free using the variable pd. Cc: stable@vger.kernel.org Fixes: 21a428a019c9 ("RDMA: Handle PD allocations by IB/core") Link: https://lore.kernel.org/r/20201230024653.1516495-1-trix@redhat.com Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
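The reordering described above (use the object first, free it last) can be sketched in userspace C. The structures and the demo_hw_dealloc_pd() helper are hypothetical stand-ins, not the ocrdma code.

```c
#include <stdlib.h>

struct demo_pd { int id; };
struct demo_uctx { struct demo_pd *cntxt_pd; };

/* Records the last id handed to the (hypothetical) hardware release step. */
static int demo_last_released_id = -1;

/* Stand-in for the step that dereferences pd. */
static void demo_hw_dealloc_pd(struct demo_pd *pd)
{
	demo_last_released_id = pd->id;
}

static void demo_dealloc_ucontext_pd(struct demo_uctx *uctx)
{
	struct demo_pd *pd = uctx->cntxt_pd;

	if (!pd)
		return;

	demo_hw_dealloc_pd(pd);	/* every use of pd happens first */
	free(pd);		/* the memory is released last */
	uctx->cntxt_pd = NULL;
}
```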