summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw
AgeCommit message (Collapse)Author
2025-03-09RDMA/mlx5: Check enabled UCAPs when creating ucontextChiara Meiohas
Verify that the enabled UCAPs are supported by the device before creating the ucontext. If supported, create the ucontext with the associated capabilities. Store the privileged ucontext UID on creation and remove it when destroying the privileged ucontext. This allows the command interface to recognize privileged commands through its UID. Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/8b180583a207cb30deb7a2967934079749cdcc44.1741261611.git.leon@kernel.org Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-09RDMA/mlx5: Create UCAP char devices for supported device capabilitiesChiara Meiohas
Create UCAP character devices when probing an IB device with supported firmware capabilities. If the RDMA_CTRL general object type is supported, check for specific UCTX capabilities: Create /dev/infiniband/mlx5_perm_ctrl_local for RDMA_UCAP_MLX5_CTRL_LOCAL Create /dev/infiniband/mlx5_perm_ctrl_other_vhca for RDMA_UCAP_MLX5_CTRL_OTHER_VHCA Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/30ed40e7a12a694cf4ee257459ed61b145b7837d.1741261611.git.leon@kernel.org Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-08RDMA/mana_ib: Use safer allocation function()Dan Carpenter
My static checker says this multiplication can overflow. I'm not an expert in this code but the call tree would be: ib_uverbs_handler_UVERBS_METHOD_QP_CREATE() <- reads cap from the user -> ib_create_qp_user() -> create_qp() -> mana_ib_create_qp() -> mana_ib_create_ud_qp() -> create_shadow_queue() It can't hurt to use safer interfaces. Fixes: c8017f5b4856 ("RDMA/mana_ib: UD/GSI work requests") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/58439ac0-1ee5-4f96-a595-7ab83b59139b@stanley.mountain Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-06RDMA/erdma: Prevent use-after-free in erdma_accept_newconn()Cheng Xu
After the erdma_cep_put(new_cep) being called, new_cep will be freed, and the following dereference will cause a UAF problem. Fix this issue. Fixes: 920d93eac8b9 ("RDMA/erdma: Add connection management (CM) support") Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-06RDMA/vmw_pvrdma: Remove unused pvrdma_modify_deviceDr. David Alan Gilbert
pvrdma_modify_device() was added in 2016 as part of commit 29c8d9eba550 ("IB: Add vmw_pvrdma driver") but accidentally it was never wired into the device_ops struct. After some discussion the best course seems to be just to remove it, see discussion at: https://lore.kernel.org/all/Z8TWF6coBUF3l_jk@gallifrey/ Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20250304215637.68559-1-linux@treblig.org Acked-by: Vishnu Dasa <vishnu.dasa@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-06RDMA/mlx5: Handle errors returned from mlx5r_ib_rate()Qasim Ijaz
In function create_ib_ah() the following line attempts to left shift the return value of mlx5r_ib_rate() by 4 and store it in the stat_rate_sl member of av: However the code overlooks the fact that mlx5r_ib_rate() may return -EINVAL if the rate passed to it is less than IB_RATE_2_5_GBPS or greater than IB_RATE_800_GBPS. Because of this, the code may invoke undefined behaviour when shifting a signed negative value when doing "-EINVAL << 4". To fix this check for errors before assigning stat_rate_sl and propagate any error value to the callers. Fixes: c534ffda781f ("RDMA/mlx5: Fix AH static rate parsing") Signed-off-by: Qasim Ijaz <qasdev00@gmail.com> Link: https://patch.msgid.link/20250304140246.205919-1-qasdev00@gmail.com Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-03RDMA/bnxt_re: Fix reporting maximum SRQs on P7 chipsPreethi G
Firmware reports support for additional SRQs in the max_srq_ext field. In CREQ_QUERY_FUNC response, if MAX_SRQ_EXTENDED flag is set, driver should derive the total number of max SRQs by the summation of "max_srq" and "max_srq_ext" fields. Fixes: b1b66ae094cd ("bnxt_en: Use FW defined resource limits for RoCE") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Preethi G <preethi.gurusiddalingeswaraswamy@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1741021178-2569-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-03RDMA/bnxt_re: Add missing paranthesis in map_qp_id_to_tbl_indxKashyap Desai
The modulo operation returns wrong result without the paranthesis and that resulted in wrong QP table indexing. Fixes: 84cf229f4001 ("RDMA/bnxt_re: Fix the qp table indexing") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1741021178-2569-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-03RDMA/bnxt_re: Fix allocation of QP tableKashyap Desai
Driver is creating QP table too early while probing before querying firmware capabilities. Driver currently is using a hard coded values of 64K as size while creating QP table. This resulted in a crash when firmwre supported QP count is more than 64K. To fix the issue, move the QP tabel creation after the firmware capabilities are queried. Use the firmware returned maximum value of QPs while creating the QP table. Fixes: b1b66ae094cd ("bnxt_en: Use FW defined resource limits for RoCE") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1741021178-2569-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-03-03RDMA/mlx5: Reorder capability check lastChristian Göttsche
capable() calls refer to enabled LSMs whether to permit or deny the request. This is relevant in connection with SELinux, where a capability check results in a policy decision and by default a denial message on insufficient permission is issued. It can lead to three undesired cases: 1. A denial message is generated, even in case the operation was an unprivileged one and thus the syscall succeeded, creating noise. 2. To avoid the noise from 1. the policy writer adds a rule to ignore those denial messages, hiding future syscalls, where the task performs an actual privileged operation, leading to hidden limited functionality of that task. 3. To avoid the noise from 1. the policy writer adds a rule to permit the task the requested capability, while it does not need it, violating the principle of least privilege. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Link: https://patch.msgid.link/20250302160657.127253-10-cgoettsche@seltendoof.de Reviewed-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc5). Conflicts: drivers/net/ethernet/cadence/macb_main.c fa52f15c745c ("net: cadence: macb: Synchronize stats calculations") 75696dd0fd72 ("net: cadence: macb: Convert to get_stats64") https://lore.kernel.org/20250224125848.68ee63e5@canb.auug.org.au Adjacent changes: drivers/net/ethernet/intel/ice/ice_sriov.c 79990cf5e7ad ("ice: Fix deinitializing VF in error path") a203163274a4 ("ice: simplify VF MSI-X managing") net/ipv4/tcp.c 18912c520674 ("tcp: devmem: don't write truncated dmabuf CMSGs to userspace") 297d389e9e5b ("net: prefix devmem specific helpers") net/mptcp/subflow.c 8668860b0ad3 ("mptcp: reset when MPTCP opts are dropped after join") c3349a22c200 ("mptcp: consolidate subflow cleanup") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24RDMA/hfi1: Remove unused one_qsfp_writeDr. David Alan Gilbert
The last use of one_qsfp_write() was removed in 2016's commit 145dd2b39958 ("IB/hfi1: Always turn on CDRs for low power QSFP modules") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20250223215543.153312-1-linux@treblig.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-23RDMA/bnxt_re: Fix the page details for the srq created by kernel consumersKashyap Desai
While using nvme target with use_srq on, below kernel panic is noticed. [ 549.698111] bnxt_en 0000:41:00.0 enp65s0np0: FEC autoneg off encoding: Clause 91 RS(544,514) [ 566.393619] Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI .. [ 566.393799] <TASK> [ 566.393807] ? __die_body+0x1a/0x60 [ 566.393823] ? die+0x38/0x60 [ 566.393835] ? do_trap+0xe4/0x110 [ 566.393847] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re] [ 566.393867] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re] [ 566.393881] ? do_error_trap+0x7c/0x120 [ 566.393890] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re] [ 566.393911] ? exc_divide_error+0x34/0x50 [ 566.393923] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re] [ 566.393939] ? asm_exc_divide_error+0x16/0x20 [ 566.393966] ? bnxt_qplib_alloc_init_hwq+0x1d4/0x580 [bnxt_re] [ 566.393997] bnxt_qplib_create_srq+0xc9/0x340 [bnxt_re] [ 566.394040] bnxt_re_create_srq+0x335/0x3b0 [bnxt_re] [ 566.394057] ? srso_return_thunk+0x5/0x5f [ 566.394068] ? __init_swait_queue_head+0x4a/0x60 [ 566.394090] ib_create_srq_user+0xa7/0x150 [ib_core] [ 566.394147] nvmet_rdma_queue_connect+0x7d0/0xbe0 [nvmet_rdma] [ 566.394174] ? lock_release+0x22c/0x3f0 [ 566.394187] ? srso_return_thunk+0x5/0x5f Page size and shift info is set only for the user space SRQs. Set page size and page shift for kernel space SRQs also. Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation") Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1740237621-29291-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-23RDMA/mana_ib: Ensure variable err is initializedKees Bakker
In the function mana_ib_gd_create_dma_region if there are no dma blocks to process the variable `err` remains uninitialized. Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Signed-off-by: Kees Bakker <kees@ijzerbout.nl> Link: https://patch.msgid.link/20250221195833.7516C16290A@bout3.ijzerbout.nl Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-23RDMA/mlx5: Fix bind QP error cleanup flowPatrisious Haddad
When there is a failure during bind QP, the cleanup flow destroys the counter regardless if it is the one that created it or not, which is problematic since if it isn't the one that created it, that counter could still be in use. Fix that by destroying the counter only if it was created during this call. Fixes: 45842fc627c7 ("IB/mlx5: Support statistic q counter configuration") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://patch.msgid.link/25dfefddb0ebefa668c32e06a94d84e3216257cf.1740033937.git.leon@kernel.org Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-20RDMA/mlx5: Fix AH static rate parsingPatrisious Haddad
Previously static rate wasn't translated according to our PRM but simply used the 4 lower bytes. Correctly translate static rate value passed in AH creation attribute according to our PRM expected values. In addition change 800GB mapping to zero, which is the PRM specified value. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Link: https://patch.msgid.link/18ef4cc5396caf80728341eb74738cd777596f60.1739187089.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-20RDMA/mlx5: Fix implicit ODP hang on parent deregistrationYishai Hadas
Fix the destroy_unused_implicit_child_mr() to prevent hanging during parent deregistration as of below [1]. Upon entering destroy_unused_implicit_child_mr(), the reference count for the implicit MR parent is incremented using: refcount_inc_not_zero(). A corresponding decrement must be performed if free_implicit_child_mr_work() is not called. The code has been updated to properly manage the reference count that was incremented. [1] INFO: task python3:2157 blocked for more than 120 seconds. Not tainted 6.12.0-rc7+ #1633 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:python3 state:D stack:0 pid:2157 tgid:2157 ppid:1685 flags:0x00000000 Call Trace: <TASK> __schedule+0x420/0xd30 schedule+0x47/0x130 __mlx5_ib_dereg_mr+0x379/0x5d0 [mlx5_ib] ? __pfx_autoremove_wake_function+0x10/0x10 ib_dereg_mr_user+0x5f/0x120 [ib_core] ? lock_release+0xc6/0x280 destroy_hw_idr_uobject+0x1d/0x60 [ib_uverbs] uverbs_destroy_uobject+0x58/0x1d0 [ib_uverbs] uobj_destroy+0x3f/0x70 [ib_uverbs] ib_uverbs_cmd_verbs+0x3e4/0xbb0 [ib_uverbs] ? __pfx_uverbs_destroy_def_handler+0x10/0x10 [ib_uverbs] ? lock_acquire+0xc1/0x2f0 ? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs] ? ib_uverbs_ioctl+0x116/0x170 [ib_uverbs] ? lock_release+0xc6/0x280 ib_uverbs_ioctl+0xe7/0x170 [ib_uverbs] ? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs] __x64_sys_ioctl+0x1b0/0xa70 ? kmem_cache_free+0x221/0x400 do_syscall_64+0x6b/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f20f21f017b RSP: 002b:00007ffcfc4a77c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffcfc4a78d8 RCX: 00007f20f21f017b RDX: 00007ffcfc4a78c0 RSI: 00000000c0181b01 RDI: 0000000000000003 RBP: 00007ffcfc4a78a0 R08: 000056147d125190 R09: 00007f20f1f14c60 R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffcfc4a7890 R13: 000000000000001c R14: 000056147d100fc0 R15: 00007f20e365c9d0 </TASK> Fixes: d3d930411ce3 ("RDMA/mlx5: Fix implicit ODP use after free") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Artemy Kovalyov <artemyko@nvidia.com> Link: https://patch.msgid.link/80f2fcd19952dfa7d9981d93fd6359b4471f8278.1739186929.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-18RDMA/mana_ib: Implement DMABUF MR supportKonstantin Taranov
Add support of dmabuf MRs to mana_ib. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1739454861-4456-1-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-18RDMA: Switch to use hrtimer_setup()Nam Cao
hrtimer_setup() takes the callback function pointer as argument and initializes the timer completely. Replace hrtimer_init() and the open coded initialization of hrtimer::function with the new setup mechanism. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://lore.kernel.org/all/37bd6895bb946f6d785ab5fe32f1a6f4b9e77c26.1738746904.git.namcao@linutronix.de
2025-02-14RDMA/irdma: Switch to using the crc32c libraryEric Biggers
Now that the crc32c() library function directly takes advantage of architecture-specific optimizations, it is unnecessary to go through the crypto API. Just use crc32c(). This is much simpler, and it improves performance due to eliminating the crypto API overhead. Note that for crc32c the equivalent of crypto_shash_digest() is cpu_to_le32(~crc32c(~0, ...)), considering that crypto_shash_digest() had before and inversions as well as a cpu_to_le32() built-in. This means that this driver is using u32 for fixed-endian types; this patch does not try to fix that but rather just keep the exact same behavior. Link: https://lore.kernel.org/r/20250207033643.59904-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://patch.msgid.link/20250207040816.69163-1-ebiggers@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-10RDMA/bnxt_re: Fix the statistics for Gen P7 VFSelvin Xavier
Gen P7 VF support the extended stats and is prevented by a VF check. Fix the check to issue the FW command for GenP7 VFs also. Fixes: 1801d87b3598 ("RDMA/bnxt_re: Support new 5760X P7 devices") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1738657285-23968-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-10RDMA/bnxt_re: Fix issue in the unload pathKalesh AP
The cited comment removed the netdev notifier register call from the driver. But, it did not remove the cleanup code from the unload path. As a result, driver unload is not clean and resulted in undesired behaviour. Fixes: d3b15fcc4201 ("RDMA/bnxt_re: Remove deliver net device event") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1738657285-23968-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-10RDMA/bnxt_re: Add sanity checks on rdev validityKalesh AP
There is a possibility that ulp_irq_stop and ulp_irq_start callbacks will be called when the device is in detached state. This can cause a crash due to NULL pointer dereference as the rdev is already freed. Fixes: cc5b9b48d447 ("RDMA/bnxt_re: Recover the device when FW error is detected") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1738657285-23968-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-10RDMA/bnxt_re: Fix an issue in bnxt_re_async_notifierKalesh AP
In the bnxt_re_async_notifier() callback, the way driver retrieves rdev pointer is wrong. The rdev pointer should be parsed from adev pointer as while registering with the L2 for ULP, driver uses the aux device pointer for the handle. Fixes: 7fea32784068 ("RDMA/bnxt_re: Add Async event handling support") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1738657285-23968-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-09RDMA/bnxt_re: Fix the condition check while programming congestion controlSelvin Xavier
Program the Congestion control values when the CC gen matches. Fix the condition check for the same. Fixes: 656dff55da19 ("RDMA/bnxt_re: Congestion control settings using debugfs hook") Reported-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reported-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1739022506-8937-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-09RDMA/bnxt_re: Fix buffer overflow in debugfs codeDan Carpenter
Add some bounds checking to prevent memory corruption in bnxt_re_cc_config_set(). This is debugfs code so the bug can only be triggered by root. Fixes: 656dff55da19 ("RDMA/bnxt_re: Congestion control settings using debugfs hook") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/a6b081ab-55fe-4d0c-8f69-c5e5a59e9141@stanley.mountain Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-09RDMA/hns: Fix mbox timing out by adding retry mechanismJunxian Huang
If a QP is modified to error state and a flush CQE process is triggered, the subsequent QP destruction mbox can still be successfully posted but will be blocked in HW until the flush CQE process finishes. This causes further mbox posting timeouts in driver. The blocking time is related to QP depth. Considering an extreme case where SQ depth and RQ depth are both 32K, the blocking time can reach about 135ms. This patch adds a retry mechanism for mbox posting. For each try, FW waits 15ms for HW to complete the previous mbox, otherwise return a timeout error code to driver. Counting other time consumption in FW, set 8 tries for mbox posting and a 5ms time gap before each retry to increase to a sufficient timeout limit. Fixes: 0425e3e6e0c7 ("RDMA/hns: Support flush cqe for hip08 in kernel space") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250208105930.522796-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-09RDMA/mana_ib: Fix error code in probe()Dan Carpenter
Return -ENOMEM if dma_pool_create() fails. Don't return success. Fixes: df91c470d9e5 ("RDMA/mana_ib: create/destroy AH") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/2bbe900e-18b3-46b5-a08c-42eb71886da6@stanley.mountain Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-06RDMA/mana_ib: Add port statistics supportShiraz Saleem
Implement alloc_hw_port_stats and get_hw_stats APIs to support querying MANA VF port level statistics from rdma stat tool. Example output from rdma stat tool: $rdma statistic show link mana_0/1 -p link mana_0/1 requester_timeout 45 requester_oos_nak 0 requester_rnr_nak 0 responder_rnr_nak 0 responder_oos 0 responder_dup_request 0 requester_implicit_nak 0 requester_readresp_psn_mismatch 0 nak_inv_req 0 nak_access_error 0 nak_opp_error 0 nak_inv_read 0 responder_local_len_error 0 requestor_local_prot_error 0 responder_rem_access_error 0 responder_local_qp_error 0 responder_malformed_wqe 0 general_hw_error 6 requester_rnr_nak_retries_exceeded 0 requester_retries_exceeded 5 total_fatal_error 6 received_cnps 0 num_qps_congested 0 rate_inc_events 0 num_qps_recovered 0 current_rate 100000 Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com> Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1738751527-15517-1-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-06RDMA/mana_ib: Allocate PAGE aligned doorbell indexKonstantin Taranov
Allocate a PAGE aligned doorbell index to ensure each process gets a separate PAGE sized doorbell area space remapped to it in mana_ib_mmap Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com> Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1738751405-15041-1-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-06RDMA/mana_ib: request error CQEs when supportedKonstantin Taranov
Request an adapter with error CQEs when it is supported. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1738751713-16169-3-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-06RDMA/mana_ib: Query feature_flags bitmask from FWShiraz Saleem
Extend the mana_ib_gd_query_adapter_caps function to retrieve and store the feature_flags from the firmware response. Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com> Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1738751713-16169-2-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-06RDMA/mlx5: Fix a WARN during dereg_mr for DM typeYishai Hadas
Memory regions (MR) of type DM (device memory) do not have an associated umem. In the __mlx5_ib_dereg_mr() -> mlx5_free_priv_descs() flow, the code incorrectly takes the wrong branch, attempting to call dma_unmap_single() on a DMA address that is not mapped. This results in a WARN [1], as shown below. The issue is resolved by properly accounting for the DM type and ensuring the correct branch is selected in mlx5_free_priv_descs(). [1] WARNING: CPU: 12 PID: 1346 at drivers/iommu/dma-iommu.c:1230 iommu_dma_unmap_page+0x79/0x90 Modules linked in: ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry ovelay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core fuse mlx5_core CPU: 12 UID: 0 PID: 1346 Comm: ibv_rc_pingpong Not tainted 6.12.0-rc7+ #1631 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:iommu_dma_unmap_page+0x79/0x90 Code: 2b 49 3b 29 72 26 49 3b 69 08 73 20 4d 89 f0 44 89 e9 4c 89 e2 48 89 ee 48 89 df 5b 5d 41 5c 41 5d 41 5e 41 5f e9 07 b8 88 ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 66 0f 1f 44 00 RSP: 0018:ffffc90001913a10 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810194b0a8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001 RBP: ffff88810194b0a8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f537abdd740(0000) GS:ffff88885fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f537aeb8000 CR3: 000000010c248001 CR4: 0000000000372eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x84/0x190 ? iommu_dma_unmap_page+0x79/0x90 ? report_bug+0xf8/0x1c0 ? handle_bug+0x55/0x90 ? exc_invalid_op+0x13/0x60 ? asm_exc_invalid_op+0x16/0x20 ? iommu_dma_unmap_page+0x79/0x90 dma_unmap_page_attrs+0xe6/0x290 mlx5_free_priv_descs+0xb0/0xe0 [mlx5_ib] __mlx5_ib_dereg_mr+0x37e/0x520 [mlx5_ib] ? _raw_spin_unlock_irq+0x24/0x40 ? wait_for_completion+0xfe/0x130 ? rdma_restrack_put+0x63/0xe0 [ib_core] ib_dereg_mr_user+0x5f/0x120 [ib_core] ? lock_release+0xc6/0x280 destroy_hw_idr_uobject+0x1d/0x60 [ib_uverbs] uverbs_destroy_uobject+0x58/0x1d0 [ib_uverbs] uobj_destroy+0x3f/0x70 [ib_uverbs] ib_uverbs_cmd_verbs+0x3e4/0xbb0 [ib_uverbs] ? __pfx_uverbs_destroy_def_handler+0x10/0x10 [ib_uverbs] ? lock_acquire+0xc1/0x2f0 ? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs] ? ib_uverbs_ioctl+0x116/0x170 [ib_uverbs] ? lock_release+0xc6/0x280 ib_uverbs_ioctl+0xe7/0x170 [ib_uverbs] ? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs] __x64_sys_ioctl+0x1b0/0xa70 do_syscall_64+0x6b/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f537adaf17b Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffff218f0b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffff218f1d8 RCX: 00007f537adaf17b RDX: 00007ffff218f1c0 RSI: 00000000c0181b01 RDI: 0000000000000003 RBP: 00007ffff218f1a0 R08: 00007f537aa8d010 R09: 0000561ee2e4f270 R10: 00007f537aace3a8 R11: 0000000000000246 R12: 00007ffff218f190 R13: 000000000000001c R14: 0000561ee2e4d7c0 R15: 00007ffff218f450 </TASK> Fixes: f18ec4223117 ("RDMA/mlx5: Use a union inside mlx5_ib_mr") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/2039c22cfc3df02378747ba4d623a558b53fc263.1738587076.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-06RDMA/mlx5: Fix a race for DMABUF MR which can lead to CQE with errorYishai Hadas
This patch addresses a potential race condition for a DMABUF MR that can result in a CQE with an error on the UMR QP. During the __mlx5_ib_dereg_mr() flow, the following sequence of calls occurs: mlx5_revoke_mr() mlx5r_umr_revoke_mr() mlx5r_umr_post_send_wait() At this point, the lkey is freed from the hardware's perspective. However, concurrently, mlx5_ib_dmabuf_invalidate_cb() might be triggered by another task attempting to invalidate the MR having that freed lkey. Since the lkey has already been freed, this can lead to a CQE error, causing the UMR QP to enter an error state. To resolve this race condition, the dma_resv_lock() which was hold as part of the mlx5_ib_dmabuf_invalidate_cb() is now also acquired as part of the mlx5_revoke_mr() scope. Upon a successful revoke, we set umem_dmabuf->private which points to that MR to NULL, preventing any further invalidation attempts on its lkey. Fixes: e6fb246ccafb ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Artemy Kovalyov <artemyko@mnvidia.com> Link: https://patch.msgid.link/70617067abbfaa0c816a2544c922e7f4346def58.1738587016.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-06IB/hfi1: Remove state transition log message and opa_lstate_name()Maher Sanalla
Remove the state transition log message from the hfi1 driver, as the IB core now logs the same information when handling a cache update event. While at it, replace the hfi1-specific opa_lstate_name() function with the ib_verbs equivalent function, ib_port_state_to_str(), for converting IB port state to a string. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Link: https://patch.msgid.link/64e48bef00630e33f4b8c830cb7d9a69aedc34dc.1738586601.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-05ice, irdma: move interrupts code to irdmaMichal Swiatkowski
Move responsibility of MSI-X requesting for RDMA feature from ice driver to irdma driver. It is done to allow simple fallback when there is not enough MSI-X available. Change amount of MSI-X used for control from 4 to 1, as it isn't needed to have more than one MSI-X for this purpose. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-04RDMA/bnxt_re: Congestion control settings using debugfs hookSelvin Xavier
Implements routines to set and get different settings of the congestion control. This will enable the users to modify the settings according to their network. Currently supporting only GEN 0 version of the parameters. Reading these files queries the firmware and report the values currently programmed. Writing to the files sends commands that update the congestion control settings Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1737301535-6599-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-04RDMA/mana_ib: indicate CM supportKonstantin Taranov
Set max_mad_size and IB_PORT_CM_SUP capability to enable connection manager. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1737394039-28772-14-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Shiraz Saleem <shirazsaleem@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-04RDMA/mana_ib: polling of CQs for GSI/UDKonstantin Taranov
Add polling for the kernel CQs. Process completion events for UD/GSI QPs. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1737394039-28772-13-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Shiraz Saleem <shirazsaleem@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-04RDMA/mana_ib: extend mana QP tableKonstantin Taranov
Enable mana QP table to store UD/GSI QPs. For send queues, set the most significant bit to one, as send and receive WQs can have the same ID in mana. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1737394039-28772-12-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Shiraz Saleem <shirazsaleem@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-04RDMA/mana_ib: implement req_notify_cqKonstantin Taranov
Arm a CQ when req_notify_cq is called. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1737394039-28772-11-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Shiraz Saleem <shirazsaleem@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-04RDMA/mana_ib: UD/GSI work requestsKonstantin Taranov
Implement post send and post recv for UD/GSI QPs. Add information about posted requests into shadow queues. Co-developed-by: Shiraz Saleem <shirazsaleem@microsoft.com> Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com> Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1737394039-28772-10-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-04RDMA/mana_ib: create/destroy AHKonstantin Taranov
Implement create and destroy AH for kernel. In mana_ib, AV is passed as an sge in WQE. Allocate DMA memory and write an AV there. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1737394039-28772-8-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Shiraz Saleem <shirazsaleem@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-04RDMA/mana_ib: UD/GSI QP creation for kernelKonstantin Taranov
Implement UD/GSI QPs for the kernel. Allow create/modify/destroy for such QPs. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1737394039-28772-7-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Shiraz Saleem <shirazsaleem@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-04RDMA/mana_ib: Create and destroy UD/GSI QPKonstantin Taranov
Implement HW requests to create and destroy UD/GSI QPs. An UD/GSI QP has send and receive queues. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1737394039-28772-6-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Shiraz Saleem <shirazsaleem@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-04RDMA/mana_ib: create kernel-level CQsKonstantin Taranov
Implement creation of CQs for the kernel. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1737394039-28772-5-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Shiraz Saleem <shirazsaleem@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-04RDMA/mana_ib: helpers to allocate kernel queuesKonstantin Taranov
Introduce helpers to allocate queues for kernel-level use. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1737394039-28772-4-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Shiraz Saleem <shirazsaleem@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-03RDMA/mana_ib: implement get_dma_mrKonstantin Taranov
Implement allocation of DMA-mapped memory regions. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1737394039-28772-3-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Shiraz Saleem <shirazsaleem@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-03RDMA/mana_ib: Allow registration of DMA-mapped memory in PDsKonstantin Taranov
Allow the HW to register DMA-mapped memory for kernel-level PDs. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1737394039-28772-2-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Shiraz Saleem <shirazsaleem@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-03IB/mlx5: Set and get correct qp_num for a DCT QPMark Zhang
When a DCT QP is created on an active lag, it's dctc.port is assigned in a round-robin way, which is from 1 to dev->lag_port. In this case when querying this QP, we may get qp_attr.port_num > 2. Fix this by setting qp->port when modifying a DCT QP, and read port_num from qp->port instead of dctc.port when querying it. Fixes: 7c4b1ab9f167 ("IB/mlx5: Add DCT RoCE LAG support") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Link: https://patch.msgid.link/94c76bf0adbea997f87ffa27674e0a7118ad92a9.1737290358.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>