summaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2024-09-18IB/mlx5: Rename 400G_8X speed to comply to naming conventionPatrisious Haddad
[ Upstream commit b28ad32442bec2f0d9cb660d7d698a1a53c13d08 ] Rename 400G_8X speed to comply to naming convention. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/ac98447cac8379a43fbdb36d56e5fb2b741a97ff.1695204156.git.leon@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Stable-dep-of: 80bf474242b2 ("net/mlx5e: Add missing link mode to ptys2ext_ethtool_map") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-08RDMA/efa: Properly handle unexpected AQ completionsMichael Margolin
[ Upstream commit 2d0e7ba468eae365f3c4bc9266679e1f8dd405f0 ] Do not try to handle admin command completion if it has an unexpected command id and print a relevant error message. Reviewed-by: Firas Jahjah <firasj@amazon.com> Reviewed-by: Yehuda Yitschak <yehuday@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://lore.kernel.org/r/20240513064630.6247-1-mrgolin@amazon.com Reviewed-by: Gal Pressman <gal.pressman@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29IB/hfi1: Fix potential deadlock on &irq_src_lock and &dd->uctxt_lockChengfeng Ye
[ Upstream commit 2f19c4b8395ccb6eb25ccafee883c8cfbe3fc193 ] handle_receive_interrupt_napi_sp() running inside interrupt handler could introduce inverse lock ordering between &dd->irq_src_lock and &dd->uctxt_lock, if read_mod_write() is preempted by the isr. [CPU0] | [CPU1] hfi1_ipoib_dev_open() | --> hfi1_netdev_enable_queues() | --> enable_queues(rx) | --> hfi1_rcvctrl() | --> set_intr_bits() | --> read_mod_write() | --> spin_lock(&dd->irq_src_lock) | | hfi1_poll() | --> poll_next() | --> spin_lock_irq(&dd->uctxt_lock) | | --> hfi1_rcvctrl() | --> set_intr_bits() | --> read_mod_write() | --> spin_lock(&dd->irq_src_lock) <interrupt> | --> handle_receive_interrupt_napi_sp() | --> set_all_fastpath() | --> hfi1_rcd_get_by_index() | --> spin_lock_irqsave(&dd->uctxt_lock) | This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. To prevent the potential deadlock, the patch use spin_lock_irqsave() on &dd->irq_src_lock inside read_mod_write() to prevent the possible deadlock scenario. Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Link: https://lore.kernel.org/r/20230926101116.2797-1-dg573847474@gmail.com Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29RDMA/rtrs: Fix the problem of variable not initialized fullyZhu Yanjun
[ Upstream commit c5930a1aa08aafe6ffe15b5d28fe875f88f6ac86 ] No functionality change. The variable which is not initialized fully will introduce potential risks. Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20230919020806.534183-1-yanjun.zhu@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03RDMA/iwcm: Fix a use-after-free related to destroying CM IDsBart Van Assche
commit aee2424246f9f1dadc33faa78990c1e2eb7826e4 upstream. iw_conn_req_handler() associates a new struct rdma_id_private (conn_id) with an existing struct iw_cm_id (cm_id) as follows: conn_id->cm_id.iw = cm_id; cm_id->context = conn_id; cm_id->cm_handler = cma_iw_handler; rdma_destroy_id() frees both the cm_id and the struct rdma_id_private. Make sure that cm_work_handler() does not trigger a use-after-free by only freeing of the struct rdma_id_private after all pending work has finished. Cc: stable@vger.kernel.org Fixes: 59c68ac31e15 ("iw_cm: free cm_id resources on the last deref") Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240605145117.397751-6-bvanassche@acm.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-03bnxt_re: Fix imm_data endiannessJack Wang
[ Upstream commit 95b087f87b780daafad1dbb2c84e81b729d5d33f ] When map a device between servers with MLX and BCM RoCE nics, RTRS server complain about unknown imm type, and can't map the device, After more debug, it seems bnxt_re wrongly handle the imm_data, this patch fixed the compat issue with MLX for us. In off list discussion, Selvin confirmed HW is working in little endian format and all data needs to be converted to LE while providing. This patch fix the endianness for imm_data Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20240710122102.37569-1-jinpu.wang@ionos.com Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03RDMA: Fix netdev tracker in ib_device_set_netdevDavid Ahern
[ Upstream commit 2043a14fb3de9d88440b21590f714306fcbbd55f ] If a netdev has already been assigned, ib_device_set_netdev needs to release the reference on the older netdev but it is mistakenly being called for the new netdev. Fix it and in the process use netdev_put to be symmetrical with the netdev_hold. Fixes: 09f530f0c6d6 ("RDMA: Add netdevice_tracker to ib_device_set_netdev()") Signed-off-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240710203310.19317-1-dsahern@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03RDMA/core: Remove NULL check before dev_{put, hold}Jules Irenge
[ Upstream commit 7a1c2abf9a2be7d969b25e8d65567933335ca88e ] The call netdev_{put, hold} of dev_{put, hold} will check NULL, so there is no need to check before using dev_{put, hold}, remove it to silence the warning: ./drivers/infiniband/core/nldev.c:375:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7047 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20231024003815.89742-1-yang.lee@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Stable-dep-of: 2043a14fb3de ("RDMA: Fix netdev tracker in ib_device_set_netdev") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03RDMA/hns: Fix insufficient extend DB for VFs.Chengchang Tang
[ Upstream commit 0b8e658f70ffd5dc7cda3872fd524d657d4796b7 ] VFs and its PF will share the memory of the extend DB. Currently, the number of extend DB allocated by driver is only enough for PF. This leads to a probability of DB loss and some other problems in scenarios where both PF and VFs use a large number of QPs. Fixes: 6b63597d3540 ("RDMA/hns: Add TSQ link table support") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240710133705.896445-8-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03RDMA/hns: Fix undifined behavior caused by invalid max_sgeChengchang Tang
[ Upstream commit 36397b907355e2fdb5a25a02a7921a937fd8ef4c ] If max_sge has been set to 0, roundup_pow_of_two() in set_srq_basic_param() may have undefined behavior. Fixes: 9dd052474a26 ("RDMA/hns: Allocate one more recv SGE for HIP08") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240710133705.896445-7-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03RDMA/hns: Fix shift-out-bounds when max_inline_data is 0Chengchang Tang
[ Upstream commit 24c6291346d98c7ece4f4bfeb5733bec1d6c7b4f ] A shift-out-bounds may occur, if the max_inline_data has not been set. The related log: UBSAN: shift-out-of-bounds in kernel/include/linux/log2.h:57:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' Call trace: dump_backtrace+0xb0/0x118 show_stack+0x20/0x38 dump_stack_lvl+0xbc/0x120 dump_stack+0x1c/0x28 __ubsan_handle_shift_out_of_bounds+0x104/0x240 set_ext_sge_param+0x40c/0x420 [hns_roce_hw_v2] hns_roce_create_qp+0xf48/0x1c40 [hns_roce_hw_v2] create_qp.part.0+0x294/0x3c0 ib_create_qp_kernel+0x7c/0x150 create_mad_qp+0x11c/0x1e0 ib_mad_init_device+0x834/0xc88 add_client_context+0x248/0x318 enable_device_and_get+0x158/0x280 ib_register_device+0x4ac/0x610 hns_roce_init+0x890/0xf98 [hns_roce_hw_v2] __hns_roce_hw_v2_init_instance+0x398/0x720 [hns_roce_hw_v2] hns_roce_hw_v2_init_instance+0x108/0x1e0 [hns_roce_hw_v2] hclge_init_roce_client_instance+0x1a0/0x358 [hclge] hclge_init_client_instance+0xa0/0x508 [hclge] hnae3_register_client+0x18c/0x210 [hnae3] hns_roce_hw_v2_init+0x28/0xff8 [hns_roce_hw_v2] do_one_initcall+0xe0/0x510 do_init_module+0x110/0x370 load_module+0x2c6c/0x2f20 init_module_from_file+0xe0/0x140 idempotent_init_module+0x24c/0x350 __arm64_sys_finit_module+0x88/0xf8 invoke_syscall+0x68/0x1a0 el0_svc_common.constprop.0+0x11c/0x150 do_el0_svc+0x38/0x50 el0_svc+0x50/0xa0 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x1a4/0x1a8 Fixes: 0c5e259b06a8 ("RDMA/hns: Fix incorrect sge nums calculation") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240710133705.896445-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03RDMA/hns: Fix missing pagesize and alignment check in FRMRChengchang Tang
[ Upstream commit d387d4b54eb84208bd4ca13572e106851d0a0819 ] The offset requires 128B alignment and the page size ranges from 4K to 128M. Fixes: 68a997c5d28c ("RDMA/hns: Add FRMR support for hip08") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240710133705.896445-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03RDMA/hns: Fix unmatch exception handling when init eq table failsJunxian Huang
[ Upstream commit 543fb987bd63ed27409b5dea3d3eec27b9c1eac9 ] The hw ctx should be destroyed when init eq table fails. Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240710133705.896445-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03RDMA/hns: Check atomic wr lengthJunxian Huang
[ Upstream commit 6afa2c0bfb8ef69f65715ae059e5bd5f9bbaf03b ] 8 bytes is the only supported length of atomic. Add this check in set_rc_wqe(). Besides, stop processing WQEs and return from set_rc_wqe() if there is any error. Fixes: 384f88185112 ("RDMA/hns: Add atomic support") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240710133705.896445-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03RDMA/device: Return error earlier if port in not validLeon Romanovsky
[ Upstream commit 917918f57a7b139c043e78c502876f2c286f4f0a ] There is no need to allocate port data if port provided is not valid. Fixes: c2261dd76b54 ("RDMA/device: Add ib_device_set_netdev() as an alternative to get_netdev") Link: https://lore.kernel.org/r/022047a8b16988fc88d4426da50bf60a4833311b.1719235449.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03RDMA/rxe: Don't set BTH_ACK_MASK for UC or UD QPsHonggang LI
[ Upstream commit 4adcaf969d77d3d3aa3871bbadc196258a38aec6 ] BTH_ACK_MASK bit is used to indicate that an acknowledge (for this packet) should be scheduled by the responder. Both UC and UD QPs are unacknowledged, so don't set BTH_ACK_MASK for UC or UD QPs. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Honggang LI <honggangli@163.com> Link: https://lore.kernel.org/r/20240624020348.494338-1-honggangli@163.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03RDMA/mlx4: Fix truncated output warning in alias_GUID.cLeon Romanovsky
[ Upstream commit 5953e0647cec703ef436ead37fed48943507b433 ] drivers/infiniband/hw/mlx4/alias_GUID.c: In function ‘mlx4_ib_init_alias_guid_service’: drivers/infiniband/hw/mlx4/alias_GUID.c:878:74: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 5 [-Werror=format-truncation=] 878 | snprintf(alias_wq_name, sizeof alias_wq_name, "alias_guid%d", i); | ^~ drivers/infiniband/hw/mlx4/alias_GUID.c:878:63: note: directive argument in the range [-2147483641, 2147483646] 878 | snprintf(alias_wq_name, sizeof alias_wq_name, "alias_guid%d", i); | ^~~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/alias_GUID.c:878:17: note: ‘snprintf’ output between 12 and 22 bytes into a destination of size 15 878 | snprintf(alias_wq_name, sizeof alias_wq_name, "alias_guid%d", i); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Fixes: a0c64a17aba8 ("mlx4: Add alias_guid mechanism") Link: https://lore.kernel.org/r/1951c9500109ca7e36dcd523f8a5f2d0d2a608d1.1718554641.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03RDMA/mlx4: Fix truncated output warning in mad.cLeon Romanovsky
[ Upstream commit 0d2e6992fc956e3308cd5376c18567def4cb3967 ] Increase size of the name array to avoid truncated output warning. drivers/infiniband/hw/mlx4/mad.c: In function ‘mlx4_ib_alloc_demux_ctx’: drivers/infiniband/hw/mlx4/mad.c:2197:47: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 4 [-Werror=format-truncation=] 2197 | snprintf(name, sizeof(name), "mlx4_ibt%d", port); | ^~ drivers/infiniband/hw/mlx4/mad.c:2197:38: note: directive argument in the range [-2147483645, 2147483647] 2197 | snprintf(name, sizeof(name), "mlx4_ibt%d", port); | ^~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:2197:9: note: ‘snprintf’ output between 10 and 20 bytes into a destination of size 12 2197 | snprintf(name, sizeof(name), "mlx4_ibt%d", port); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:2205:48: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 3 [-Werror=format-truncation=] 2205 | snprintf(name, sizeof(name), "mlx4_ibwi%d", port); | ^~ drivers/infiniband/hw/mlx4/mad.c:2205:38: note: directive argument in the range [-2147483645, 2147483647] 2205 | snprintf(name, sizeof(name), "mlx4_ibwi%d", port); | ^~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:2205:9: note: ‘snprintf’ output between 11 and 21 bytes into a destination of size 12 2205 | snprintf(name, sizeof(name), "mlx4_ibwi%d", port); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:2213:48: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 3 [-Werror=format-truncation=] 2213 | snprintf(name, sizeof(name), "mlx4_ibud%d", port); | ^~ drivers/infiniband/hw/mlx4/mad.c:2213:38: note: directive argument in the range [-2147483645, 2147483647] 2213 | snprintf(name, sizeof(name), "mlx4_ibud%d", port); | ^~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:2213:9: note: ‘snprintf’ output between 11 and 21 bytes into a destination of size 12 2213 | snprintf(name, sizeof(name), "mlx4_ibud%d", port); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors make[6]: *** [scripts/Makefile.build:244: drivers/infiniband/hw/mlx4/mad.o] Error 1 Fixes: fc06573dfaf8 ("IB/mlx4: Initialize SR-IOV IB support for slaves in master context") Link: https://lore.kernel.org/r/f3798b3ce9a410257d7e1ec7c9e285f1352e256a.1718554569.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03RDMA/cache: Release GID table even if leak is detectedLeon Romanovsky
[ Upstream commit a92fbeac7e94a420b55570c10fe1b90e64da4025 ] When the table is released, we nullify pointer to GID table, it means that in case GID entry leak is detected, we will leak table too. Delete code that prevents table destruction. Fixes: b150c3862d21 ("IB/core: Introduce GID entry reference counts") Link: https://lore.kernel.org/r/a62560af06ba82c88ef9194982bfa63d14768ff9.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03RDMA/mlx5: Set mkeys for dmabuf at PAGE_SIZEChiara Meiohas
[ Upstream commit a4e540119be565f47c305f295ed43f8e0bc3f5c3 ] Set the mkey for dmabuf at PAGE_SIZE to support any SGL after a move operation. ib_umem_find_best_pgsz returns 0 on error, so it is incorrect to check the returned page_size against PAGE_SIZE Fixes: 90da7dc8206a ("RDMA/mlx5: Support dma-buf based userspace memory region") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/1e2289b9133e89f273a4e68d459057d032cbc2ce.1718301631.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-11IB/core: Implement a limit on UMAD receive ListMichael Guralnik
[ Upstream commit ca0b44e20a6f3032224599f02e7c8fb49525c894 ] The existing behavior of ib_umad, which maintains received MAD packets in an unbounded list, poses a risk of uncontrolled growth. As user-space applications extract packets from this list, the rate of extraction may not match the rate of incoming packets, leading to potential list overflow. To address this, we introduce a limit to the size of the list. After considering typical scenarios, such as OpenSM processing, which can handle approximately 100k packets per second, and the 1-second retry timeout for most packets, we set the list size limit to 200k. Packets received beyond this limit are dropped, assuming they are likely timed out by the time they are handled by user-space. Notably, packets queued on the receive list due to reasons like timed-out sends are preserved even when the list is full. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/7197cb58a7d9e78399008f25036205ceab07fbd5.1713268818.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-05RDMA/restrack: Fix potential invalid address accessWenchao Hao
[ Upstream commit ca537a34775c103f7b14d7bbd976403f1d1525d8 ] struct rdma_restrack_entry's kern_name was set to KBUILD_MODNAME in ib_create_cq(), while if the module exited but forgot del this rdma_restrack_entry, it would cause a invalid address access in rdma_restrack_clean() when print the owner of this rdma_restrack_entry. These code is used to help find one forgotten PD release in one of the ULPs. But it is not needed anymore, so delete them. Signed-off-by: Wenchao Hao <haowenchao2@huawei.com> Link: https://lore.kernel.org/r/20240318092320.1215235-1-haowenchao2@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-27RDMA/mlx5: Follow rb_key.ats when creating new mkeysJason Gunthorpe
commit f637040c3339a2ed8c12d65ad03f9552386e2fe7 upstream. When a cache ent already exists but doesn't have any mkeys in it the cache will automatically create a new one based on the specification in the ent->rb_key. ent->ats was missed when creating the new key and so ma_translation_mode was not being set even though the ent requires it. Cc: stable@vger.kernel.org Fixes: 73d09b2fe833 ("RDMA/mlx5: Introduce mlx5r_cache_rb_key") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/7c5613458ecb89fbe5606b7aa4c8d990bdea5b9a.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27RDMA/mlx5: Remove extra unlock on error pathJason Gunthorpe
commit c1eb2512596fb3542357bb6c34c286f5e0374538 upstream. The below commit lifted the locking out of this function but left this error path unlock behind resulting in unbalanced locking. Remove the missed unlock too. Cc: stable@vger.kernel.org Fixes: 627122280c87 ("RDMA/mlx5: Add work to remove temporary entries from the cache") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/78090c210c750f47219b95248f9f782f34548bb1.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27RDMA/rxe: Fix data copy for IB_SEND_INLINEHonggang LI
commit 03fa18a992d5626fd7bf3557a52e826bf8b326b3 upstream. For RDMA Send and Write with IB_SEND_INLINE, the memory buffers specified in sge list will be placed inline in the Send Request. The data should be copied by CPU from the virtual addresses of corresponding sge list DMA addresses. Cc: stable@kernel.org Fixes: 8d7c7c0eeb74 ("RDMA: Add ib_virt_dma_to_page()") Signed-off-by: Honggang LI <honggangli@163.com> Link: https://lore.kernel.org/r/20240516095052.542767-1-honggangli@163.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27RDMA/mana_ib: Ignore optional access flags for MRsKonstantin Taranov
[ Upstream commit 82a5cc783d49b86afd2f60e297ecd85223c39f88 ] Ignore optional ib_access_flags when an MR is created. Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1717575368-14879-1-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-27RDMA/mlx5: Add check for srq max_sge attributePatrisious Haddad
[ Upstream commit 36ab7ada64caf08f10ee5a114d39964d1f91e81d ] max_sge attribute is passed by the user, and is inserted and used unchecked, so verify that the value doesn't exceed maximum allowed value before using it. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/277ccc29e8d57bfd53ddeb2ac633f2760cf8cdd0.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-27RDMA/mlx5: Fix unwind flow as part of mlx5_ib_stage_init_initYishai Hadas
[ Upstream commit 81497c148b7a2e4a4fbda93aee585439f7323e2e ] Fix unwind flow as part of mlx5_ib_stage_init_init to use the correct goto upon an error. Fixes: 758ce14aee82 ("RDMA/mlx5: Implement MACsec gid addition and deletion") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/aa40615116eda14ec9eca21d52017d632ea89188.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-27RDMA/rxe: Fix responder length checking for UD request packetsHonggang LI
[ Upstream commit f67ac0061c7614c1548963d3ef1ee1606efd8636 ] According to the IBA specification: If a UD request packet is detected with an invalid length, the request shall be an invalid request and it shall be silently dropped by the responder. The responder then waits for a new request packet. commit 689c5421bfe0 ("RDMA/rxe: Fix incorrect responder length checking") defers responder length check for UD QPs in function `copy_data`. But it introduces a regression issue for UD QPs. When the packet size is too large to fit in the receive buffer. `copy_data` will return error code -EINVAL. Then `send_data_in` will return RESPST_ERR_MALFORMED_WQE. UD QP will transfer into ERROR state. Fixes: 689c5421bfe0 ("RDMA/rxe: Fix incorrect responder length checking") Signed-off-by: Honggang LI <honggangli@163.com> Link: https://lore.kernel.org/r/20240523094617.141148-1-honggangli@163.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-27RDMA/bnxt_re: Fix the max msix vectors macroSelvin Xavier
[ Upstream commit 056620da899527c14cf36e5019a0decaf4cf0f79 ] bnxt_re no longer decide the number of MSI-x vectors used by itself. Its decided by bnxt_en now. So when bnxt_en changes this value, system crash is seen. Depend on the max value reported by bnxt_en instead of using the its own macros. Fixes: 303432211324 ("bnxt_en: Remove runtime interrupt vector allocation") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1716195418-11767-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/bnxt_re: Fix the sparse warningsSelvin Xavier
commit 82a8903a9f9f3ff31027b9a0b92f7505f981f09c upstream. Fix the following warnings reported drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:909:27: warning: invalid assignment: |= drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:909:27: left side has type restricted __le16 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:909:27: right side has type unsigned long ... drivers/infiniband/hw/bnxt_re/qplib_fp.c:1620:44: warning: invalid assignment: |= drivers/infiniband/hw/bnxt_re/qplib_fp.c:1620:44: left side has type restricted __le64 drivers/infiniband/hw/bnxt_re/qplib_fp.c:1620:44: right side has type unsigned long long Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312200537.HoNqPL5L-lkp@intel.com/ Fixes: 07f830ae4913 ("RDMA/bnxt_re: Adds MSN table capability for Gen P7 adapters") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1703046717-8914-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-12RDMA/cma: Fix kmemleak in rdma_core observed during blktests nvme/rdma use siwZhu Yanjun
[ Upstream commit 9c0731832d3b7420cbadba6a7f334363bc8dfb15 ] When running blktests nvme/rdma, the following kmemleak issue will appear. kmemleak: Kernel memory leak detector initialized (mempool available:36041) kmemleak: Automatic memory scanning thread started kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak) kmemleak: 8 new suspected memory leaks (see /sys/kernel/debug/kmemleak) kmemleak: 17 new suspected memory leaks (see /sys/kernel/debug/kmemleak) kmemleak: 4 new suspected memory leaks (see /sys/kernel/debug/kmemleak) unreferenced object 0xffff88855da53400 (size 192): comm "rdma", pid 10630, jiffies 4296575922 hex dump (first 32 bytes): 37 00 00 00 00 00 00 00 c0 ff ff ff 1f 00 00 00 7............... 10 34 a5 5d 85 88 ff ff 10 34 a5 5d 85 88 ff ff .4.].....4.].... backtrace (crc 47f66721): [<ffffffff911251bd>] kmalloc_trace+0x30d/0x3b0 [<ffffffffc2640ff7>] alloc_gid_entry+0x47/0x380 [ib_core] [<ffffffffc2642206>] add_modify_gid+0x166/0x930 [ib_core] [<ffffffffc2643468>] ib_cache_update.part.0+0x6d8/0x910 [ib_core] [<ffffffffc2644e1a>] ib_cache_setup_one+0x24a/0x350 [ib_core] [<ffffffffc263949e>] ib_register_device+0x9e/0x3a0 [ib_core] [<ffffffffc2a3d389>] 0xffffffffc2a3d389 [<ffffffffc2688cd8>] nldev_newlink+0x2b8/0x520 [ib_core] [<ffffffffc2645fe3>] rdma_nl_rcv_msg+0x2c3/0x520 [ib_core] [<ffffffffc264648c>] rdma_nl_rcv_skb.constprop.0.isra.0+0x23c/0x3a0 [ib_core] [<ffffffff9270e7b5>] netlink_unicast+0x445/0x710 [<ffffffff9270f1f1>] netlink_sendmsg+0x761/0xc40 [<ffffffff9249db29>] __sys_sendto+0x3a9/0x420 [<ffffffff9249dc8c>] __x64_sys_sendto+0xdc/0x1b0 [<ffffffff92db0ad3>] do_syscall_64+0x93/0x180 [<ffffffff92e00126>] entry_SYSCALL_64_after_hwframe+0x71/0x79 The root cause: rdma_put_gid_attr is not called when sgid_attr is set to ERR_PTR(-ENODEV). Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/all/19bf5745-1b3b-4b8a-81c2-20d945943aaf@linux.dev/T/ Fixes: f8ef1be816bf ("RDMA/cma: Avoid GID lookups on iWARP devices") Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20240510211247.31345-1-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/IPoIB: Fix format truncation compilation errorsLeon Romanovsky
[ Upstream commit 49ca2b2ef3d003402584c68ae7b3055ba72e750a ] Truncate the device name to store IPoIB VLAN name. [leonro@5b4e8fba4ddd kernel]$ make -s -j 20 allmodconfig [leonro@5b4e8fba4ddd kernel]$ make -s -j 20 W=1 drivers/infiniband/ulp/ipoib/ drivers/infiniband/ulp/ipoib/ipoib_vlan.c: In function ‘ipoib_vlan_add’: drivers/infiniband/ulp/ipoib/ipoib_vlan.c:187:52: error: ‘%04x’ directive output may be truncated writing 4 bytes into a region of size between 0 and 15 [-Werror=format-truncation=] 187 | snprintf(intf_name, sizeof(intf_name), "%s.%04x", | ^~~~ drivers/infiniband/ulp/ipoib/ipoib_vlan.c:187:48: note: directive argument in the range [0, 65535] 187 | snprintf(intf_name, sizeof(intf_name), "%s.%04x", | ^~~~~~~~~ drivers/infiniband/ulp/ipoib/ipoib_vlan.c:187:9: note: ‘snprintf’ output between 6 and 21 bytes into a destination of size 16 187 | snprintf(intf_name, sizeof(intf_name), "%s.%04x", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 188 | ppriv->dev->name, pkey); | ~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors make[6]: *** [scripts/Makefile.build:244: drivers/infiniband/ulp/ipoib/ipoib_vlan.o] Error 1 make[6]: *** Waiting for unfinished jobs.... Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") Link: https://lore.kernel.org/r/e9d3e1fef69df4c9beaf402cc3ac342bad680791.1715240029.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12bnxt_re: avoid shift undefined behavior in bnxt_qplib_alloc_init_hwqMichal Schmidt
[ Upstream commit 78cfd17142ef70599d6409cbd709d94b3da58659 ] Undefined behavior is triggered when bnxt_qplib_alloc_init_hwq is called with hwq_attr->aux_depth != 0 and hwq_attr->aux_stride == 0. In that case, "roundup_pow_of_two(hwq_attr->aux_stride)" gets called. roundup_pow_of_two is documented as undefined for 0. Fix it in the one caller that had this combination. The undefined behavior was detected by UBSAN: UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' CPU: 24 PID: 1075 Comm: (udev-worker) Not tainted 6.9.0-rc6+ #4 Hardware name: Abacus electric, s.r.o. - servis@abacus.cz Super Server/H12SSW-iN, BIOS 2.7 10/25/2023 Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 ubsan_epilogue+0x5/0x30 __ubsan_handle_shift_out_of_bounds.cold+0x61/0xec __roundup_pow_of_two+0x25/0x35 [bnxt_re] bnxt_qplib_alloc_init_hwq+0xa1/0x470 [bnxt_re] bnxt_qplib_create_qp+0x19e/0x840 [bnxt_re] bnxt_re_create_qp+0x9b1/0xcd0 [bnxt_re] ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? __kmalloc+0x1b6/0x4f0 ? create_qp.part.0+0x128/0x1c0 [ib_core] ? __pfx_bnxt_re_create_qp+0x10/0x10 [bnxt_re] create_qp.part.0+0x128/0x1c0 [ib_core] ib_create_qp_kernel+0x50/0xd0 [ib_core] create_mad_qp+0x8e/0xe0 [ib_core] ? __pfx_qp_event_handler+0x10/0x10 [ib_core] ib_mad_init_device+0x2be/0x680 [ib_core] add_client_context+0x10d/0x1a0 [ib_core] enable_device_and_get+0xe0/0x1d0 [ib_core] ib_register_device+0x53c/0x630 [ib_core] ? srso_alias_return_thunk+0x5/0xfbef5 bnxt_re_probe+0xbd8/0xe50 [bnxt_re] ? __pfx_bnxt_re_probe+0x10/0x10 [bnxt_re] auxiliary_bus_probe+0x49/0x80 ? driver_sysfs_add+0x57/0xc0 really_probe+0xde/0x340 ? pm_runtime_barrier+0x54/0x90 ? __pfx___driver_attach+0x10/0x10 __driver_probe_device+0x78/0x110 driver_probe_device+0x1f/0xa0 __driver_attach+0xba/0x1c0 bus_for_each_dev+0x8f/0xe0 bus_add_driver+0x146/0x220 driver_register+0x72/0xd0 __auxiliary_driver_register+0x6e/0xd0 ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re] bnxt_re_mod_init+0x3e/0xff0 [bnxt_re] ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re] do_one_initcall+0x5b/0x310 do_init_module+0x90/0x250 init_module_from_file+0x86/0xc0 idempotent_init_module+0x121/0x2b0 __x64_sys_finit_module+0x5e/0xb0 do_syscall_64+0x82/0x160 ? srso_alias_return_thunk+0x5/0xfbef5 ? syscall_exit_to_user_mode_prepare+0x149/0x170 ? srso_alias_return_thunk+0x5/0xfbef5 ? syscall_exit_to_user_mode+0x75/0x230 ? srso_alias_return_thunk+0x5/0xfbef5 ? do_syscall_64+0x8e/0x160 ? srso_alias_return_thunk+0x5/0xfbef5 ? __count_memcg_events+0x69/0x100 ? srso_alias_return_thunk+0x5/0xfbef5 ? count_memcg_events.constprop.0+0x1a/0x30 ? srso_alias_return_thunk+0x5/0xfbef5 ? handle_mm_fault+0x1f0/0x300 ? srso_alias_return_thunk+0x5/0xfbef5 ? do_user_addr_fault+0x34e/0x640 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f4e5132821d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 db 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffca9c906a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 0000563ec8a8f130 RCX: 00007f4e5132821d RDX: 0000000000000000 RSI: 00007f4e518fa07d RDI: 000000000000003b RBP: 00007ffca9c90760 R08: 00007f4e513f6b20 R09: 00007ffca9c906f0 R10: 0000563ec8a8faa0 R11: 0000000000000246 R12: 00007f4e518fa07d R13: 0000000000020000 R14: 0000563ec8409e90 R15: 0000563ec8a8fa60 </TASK> ---[ end trace ]--- Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Link: https://lore.kernel.org/r/20240507103929.30003-1-mschmidt@redhat.com Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/bnxt_re: Adds MSN table capability for Gen P7 adaptersSelvin Xavier
[ Upstream commit 07f830ae4913d0b986c8c0ff88a7d597948b9bd8 ] GenP7 HW expects an MSN table instead of PSN table. Check for the HW retransmission capability and populate the MSN table if HW retansmission is supported. Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1701946060-13931-7-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Stable-dep-of: 78cfd17142ef ("bnxt_re: avoid shift undefined behavior in bnxt_qplib_alloc_init_hwq") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/bnxt_re: Update the HW interface definitionsSelvin Xavier
[ Upstream commit 880a5dd1880a296575e92dec9816a7f35a7011d1 ] Adds HW interface definitions to support the new chip revision. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1701946060-13931-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Stable-dep-of: 78cfd17142ef ("bnxt_re: avoid shift undefined behavior in bnxt_qplib_alloc_init_hwq") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/bnxt_re: Remove roundup_pow_of_two depth for all hardware queue resourcesChandramohan Akula
[ Upstream commit 48f996d4adf15a0a0af8b8184d3ec6042a684ea4 ] Rounding up the queue depth to power of two is not a hardware requirement. In order to optimize the per connection memory usage, removing drivers implementation which round up to the queue depths to the power of 2. Implements a mask to maintain backward compatibility with older library. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1698069803-1787-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Stable-dep-of: 78cfd17142ef ("bnxt_re: avoid shift undefined behavior in bnxt_qplib_alloc_init_hwq") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/bnxt_re: Refactor the queue index updateChandramohan Akula
[ Upstream commit 3a4304d82695015d0703ee0c3331458d22e3ba7c ] The queue index wrap around logic is based on power of 2 size depth. All queues are created with power of 2 depth. This increases the memory usage by the driver. This change is required for the next patches that avoids the power of 2 depth requirement for each of the queues. Update the function that increments producer index and consumer index during wrap around. Also, changes the index handling across multiple functions. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1698069803-1787-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Stable-dep-of: 78cfd17142ef ("bnxt_re: avoid shift undefined behavior in bnxt_qplib_alloc_init_hwq") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12IB/mlx5: Use __iowrite64_copy() for write combining storesJason Gunthorpe
[ Upstream commit ef302283ddfceaba2657923af3f90fd58e6dff06 ] mlx5 has a built in self-test at driver startup to evaluate if the platform supports write combining to generate a 64 byte PCIe TLP or not. This has proven necessary because a lot of common scenarios end up with broken write combining (especially inside virtual machines) and there is other way to learn this information. This self test has been consistently failing on new ARM64 CPU designs (specifically with NVIDIA Grace's implementation of Neoverse V2). The C loop around writeq() generates some pretty terrible ARM64 assembly, but historically this has worked on a lot of existing ARM64 CPUs till now. We see it succeed about 1 time in 10,000 on the worst effected systems. The CPU architects speculate that the load instructions interspersed with the stores makes the WC buffers statistically flush too often and thus the generation of large TLPs becomes infrequent. This makes the boot up test unreliable in that it indicates no write-combining, however userspace would be fine since it uses a ST4 instruction. Further, S390 has similar issues where only the special zpci_memcpy_toio() will actually generate large TLPs, and the open coded loop does not trigger it at all. Fix both ARM64 and S390 by switching to __iowrite64_copy() which now provides architecture specific variants that have a high change of generating a large TLP with write combining. x86 continues to use a similar writeq loop in the generate __iowrite64_copy(). Fixes: 11f552e21755 ("IB/mlx5: Test write combining support") Link: https://lore.kernel.org/r/6-v3-1893cd8b9369+1925-mlx5_arm_wc_jgg@nvidia.com Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/rxe: Fix incorrect rxe_put in error pathBob Pearson
[ Upstream commit 8776618dbbd1b6f210b31509507e1aad461d6435 ] In rxe_send() a ref is taken on the qp to keep it alive until the kfree_skb() has a chance to call the skb destructor rxe_skb_tx_dtor() which drops the reference. If the packet has an incorrect protocol the error path just calls kfree_skb() which will call the destructor which will drop the ref. Currently the driver also calls rxe_put() which is incorrect. Additionally since the packets sent to rxe_send() are under the control of the driver and it only ever produces IPV4 or IPV6 packets the simplest fix is to remove all the code in this block. Link: https://lore.kernel.org/r/20240329145513.35381-12-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Fixes: 9eb7f8e44d13 ("IB/rxe: Move refcounting earlier in rxe_send()") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/rxe: Allow good work requests to be executedBob Pearson
[ Upstream commit b703374837a8f8422fa3f1edcf65505421a65a6a ] A previous commit incorrectly added an 'if(!err)' before scheduling the requester task in rxe_post_send_kernel(). But if there were send wrs successfully added to the send queue before a bad wr they might never get executed. This commit fixes this by scheduling the requester task if any wqes were successfully posted in rxe_post_send_kernel() in rxe_verbs.c. Link: https://lore.kernel.org/r/20240329145513.35381-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Fixes: 5bf944f24129 ("RDMA/rxe: Add error messages") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/rxe: Fix seg fault in rxe_comp_queue_pktBob Pearson
[ Upstream commit 2b23b6097303ed0ba5f4bc036a1c07b6027af5c6 ] In rxe_comp_queue_pkt() an incoming response packet skb is enqueued to the resp_pkts queue and then a decision is made whether to run the completer task inline or schedule it. Finally the skb is dereferenced to bump a 'hw' performance counter. This is wrong because if the completer task is already running in a separate thread it may have already processed the skb and freed it which can cause a seg fault. This has been observed infrequently in testing at high scale. This patch fixes this by changing the order of enqueuing the packet until after the counter is accessed. Link: https://lore.kernel.org/r/20240329145513.35381-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Fixes: 0b1e5b99a48b ("IB/rxe: Add port protocol stats") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/hns: Modify the print level of CQE errorChengchang Tang
[ Upstream commit 349e859952285ab9689779fb46de163f13f18f43 ] Too much print may lead to a panic in kernel. Change ibdev_err() to ibdev_err_ratelimited(), and change the printing level of cqe dump to debug level. Fixes: 7c044adca272 ("RDMA/hns: Simplify the cqe code of poll cq") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-11-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/hns: Use complete parentheses in macrosChengchang Tang
[ Upstream commit 4125269bb9b22e1d8cdf4412c81be8074dbc61ca ] Use complete parentheses to ensure that macro expansion does not produce unexpected results. Fixes: a25d13cbe816 ("RDMA/hns: Add the interfaces to support multi hop addressing for the contexts in hip08") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-10-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/hns: Fix GMV table pagesizeChengchang Tang
[ Upstream commit ee045493283403969591087bd405fa280103282a ] GMV's BA table only supports 4K pages. Currently, PAGESIZE is used to calculate gmv_bt_num, which will cause an abnormal number of gmv_bt_num in a 64K OS. Fixes: d6d91e46210f ("RDMA/hns: Add support for configuring GMV table") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-8-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/hns: Fix UAF for cq async eventChengchang Tang
[ Upstream commit a942ec2745ca864cd8512142100e4027dc306a42 ] The refcount of CQ is not protected by locks. When CQ asynchronous events and CQ destruction are concurrent, CQ may have been released, which will cause UAF. Use the xa_lock() to protect the CQ refcount. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/hns: Fix deadlock on SRQ async events.Chengchang Tang
[ Upstream commit b46494b6f9c19f141114a57729e198698f40af37 ] xa_lock for SRQ table may be required in AEQ. Use xa_store_irq()/ xa_erase_irq() to avoid deadlock. Fixes: 81fce6291d99 ("RDMA/hns: Add SRQ asynchronous event support") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/hns: Fix return value in hns_roce_map_mr_sgZhengchao Shao
[ Upstream commit 203b70fda63425a4eb29f03f9074859afe821a39 ] As described in the ib_map_mr_sg function comment, it returns the number of sg elements that were mapped to the memory region. However, hns_roce_map_mr_sg returns the number of pages required for mapping the DMA area. Fix it. Fixes: 9b2cf76c9f05 ("RDMA/hns: Optimize PBL buffer allocation process") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20240411033851.2884771-1-shaozhengchao@huawei.com Reviewed-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/mlx5: Adding remote atomic access flag to updatable flagsOr Har-Toov
[ Upstream commit 2ca7e93bc963d9ec2f5c24d117176851454967af ] Currently IB_ACCESS_REMOTE_ATOMIC is blocked from being updated via UMR although in some cases it should be possible. These cases are checked in mlx5r_umr_can_reconfig function. Fixes: ef3642c4f54d ("RDMA/mlx5: Fix error unwinds for rereg_mr") Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Link: https://lore.kernel.org/r/24dac73e2fa48cb806f33a932d97f3e402a5ea2c.1712140377.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12RDMA/mlx5: Uncacheable mkey has neither rb_key or cache_entOr Har-Toov
[ Upstream commit 0611a8e8b475fc5230b9a24d29c8397aaab20b63 ] As some mkeys can't be modified with UMR due to some UMR limitations, like the size of translation that can be updated, not all user mkeys can be cached. Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow") Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Link: https://lore.kernel.org/r/f2742dd934ed73b2d32c66afb8e91b823063880c.1712140377.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>