summaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2024-11-05sysfs: treewide: constify attribute callback of bin_is_visible()Thomas Weißschuh
The is_bin_visible() callbacks should not modify the struct bin_attribute passed as argument. Enforce this by marking the argument as const. As there are not many callback implementers perform this change throughout the tree at once. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-5-71110628844c@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-04RDMA/nldev: Add IB device and net device rename eventsChiara Meiohas
Implement event sending for IB device rename and IB device port associated netdevice rename. In iproute2, rdma monitor displays the IB device name, port and the netdevice name when displaying event info. Since users can modiy these names, we track and notify on renaming events. Note: In order to receive netdevice rename events, drivers must use the ib_device_set_netdev() API when attaching net devices to IB devices. $ rdma monitor $ rmmod mlx5_ib [UNREGISTER] dev 1 rocep8s0f1 [UNREGISTER] dev 0 rocep8s0f0 $ modprobe mlx5_ib [REGISTER] dev 2 mlx5_0 [NETDEV_ATTACH] dev 2 mlx5_0 port 1 netdev 4 eth2 [REGISTER] dev 3 mlx5_1 [NETDEV_ATTACH] dev 3 mlx5_1 port 1 netdev 5 eth3 [RENAME] dev 2 rocep8s0f0 [RENAME] dev 3 rocep8s0f1 $ devlink dev eswitch set pci/0000:08:00.0 mode switchdev [UNREGISTER] dev 2 rocep8s0f0 [REGISTER] dev 4 mlx5_0 [NETDEV_ATTACH] dev 4 mlx5_0 port 30 netdev 4 eth2 [RENAME] dev 4 rdmap8s0f0 $ echo 4 > /sys/class/net/eth2/device/sriov_numvfs [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 2 netdev 7 eth4 [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 3 netdev 8 eth5 [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 4 netdev 9 eth6 [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 5 netdev 10 eth7 [REGISTER] dev 5 mlx5_0 [NETDEV_ATTACH] dev 5 mlx5_0 port 1 netdev 11 eth8 [REGISTER] dev 6 mlx5_1 [NETDEV_ATTACH] dev 6 mlx5_1 port 1 netdev 12 eth9 [RENAME] dev 5 rocep8s0f0v0 [RENAME] dev 6 rocep8s0f0v1 [REGISTER] dev 7 mlx5_0 [NETDEV_ATTACH] dev 7 mlx5_0 port 1 netdev 13 eth10 [RENAME] dev 7 rocep8s0f0v2 [REGISTER] dev 8 mlx5_0 [NETDEV_ATTACH] dev 8 mlx5_0 port 1 netdev 14 eth11 [RENAME] dev 8 rocep8s0f0v3 $ ip link set eth2 name myeth2 [NETDEV_RENAME] netdev 4 myeth2 $ ip link set eth1 name myeth1 ** no events received, because eth1 is not attached to an IB device ** Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/093c978ef2766fd3ab4ff8798eeb68f2f11582f6.1730367038.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/mlx5: Add implementation for ufile_hw_cleanup device operationPatrisious Haddad
Implement the device API for ufile_hw_cleanup operation, which iterates over the ufile uobjects lists, and attempts to destroy DevX QPs, by issuing up to 8 commands in parallel. This function is responsible only for cleaning the FW resources of the QP, and doesn't necessarily cleanup all of its resources. Hence the normal serialized cleanup flow is still executed after it in __uverbs_cleanup_ufile() to cleanup the remaining resources and handle the cleanup of SW objects. In order to avoid double cleanup for the FW resources, new DevX flag was added DEVX_OBJ_FLAGS_HW_FREED, which marks the object's FW resources as already freed. Since QP destruction is the most time-consuming operation in FW, parallelizing it reduces the cleanup time of applications that use DevX QPs. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://patch.msgid.link/2f82675d0412542cba1c47a6b86f589521ae41e1.1730373303.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/core: Move ib_uverbs_file struct to uverbs_types.hPatrisious Haddad
In light of the previous commit, make the ib_uverbs_file accessible to drivers by moving its definition to uverbs_types.h, to allow drivers to freely access the struct argument and create a personalized cleanup flow. For the same reason expose uverbs_try_lock_object function to allow driver to safely access the uverbs objects. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://patch.msgid.link/29b718e0dca35daa5f496320a39284fc1f5a1722.1730373303.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/core: Add device ufile cleanup operationPatrisious Haddad
Add a driver operation to allow preemptive cleanup of ufile HW resources before the standard ufile cleanup flow begins. Thus, expediting the final cleanup phase which leads to fast teardown overall. This allows the use of driver specific clean up procedures to make the cleanup process more efficient. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://patch.msgid.link/cabe00d75132b5732cb515944e3c500a01fb0b4a.1730373303.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/mlx5: Ensure active slave attachment to the bond IB deviceChiara Meiohas
Fix a race condition when creating a lag bond in active backup mode where after the bond creation the backup slave was attached to the IB device, instead of the active slave. This caused stale entries in the GID table, as the gid updating mechanism relies on ib_device_get_netdev(), which would return the backup slave. Send an MLX5_DRIVER_EVENT_ACTIVE_BACKUP_LAG_CHANGE_LOWERSTATE event when activating the lag, additionally to when modifying the lag. This ensures that eventually the active netdevice is stored in the bond IB device. When handling this event remove the GIDs of the previously attached netdevice in this port and rescan the GIDs of the newly attached netdevice. This ensures that eventually the active slave netdevice is correctly stored in the IB device port. While there might be a brief moment where the backup slave GIDs appear in the GID table, it will eventually stabilize with the correct GIDs (of the bond and the active slave). Fixes: 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/91fc2cb24f63add266a528c1c702668a80416d9f.1730381292.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/core: Implement RoCE GID port rescan and export delete functionChiara Meiohas
rdma_roce_rescan_port() scans all network devices in the system and adds the gids if relevant to the RoCE device port. When not in bonding mode it adds the GIDs of the netdevice in this port. When in bonding mode it adds the GIDs of both the port's netdevice and the bond master netdevice. Export roce_del_all_netdev_gids(), which removes all GIDs associated with a specific netdevice for a given port. Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/674d498da4637a1503ff1367e28bd09ff942fd5e.1730381292.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/mlx5: Call dev_put() after the blocking notifierChiara Meiohas
Move dev_put() call to occur directly after the blocking notifier, instead of within the event handler. Fixes: 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/342ff94b3dcbb07da1c7dab862a73933d604b717.1730381292.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/mlx5: Support querying per-plane IB PortCountersMark Zhang
On a SMI device, set requested plane_num when querying PPCNT register with the PortCounters Attribute group. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Link: https://patch.msgid.link/828d57444a0a41042556bb0a4394ecf2fcaed639.1730368052.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/mlx5: Support OOO RX WQE consumptionEdward Srouji
Support QP with out-of-order (OOO) capabilities enabled. This allows WRs on the receiver side of the QP to be consumed OOO, permitting the sender side to transmit messages without guaranteeing arrival order on the receiver side. When enabled, the completion ordering of WRs remains in-order, regardless of the Receive WRs consumption order. RDMA Read and RDMA Atomic operations on the responder side continue to be executed in-order, while the ordering of data placement for RDMA Write and Send operations is not guaranteed. Atomic operations larger than 8 bytes are currently not supported. Therefore, when this feature is enabled, the created QP restricts its atomic support to 8 bytes at most. In addition, when querying the device, a new flag is returned in response to indicate that the Kernel supports OOO QP. Signed-off-by: Edward Srouji <edwards@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/06ac609a5f358c8fb0a090d22c61a2f9329d82e6.1725362773.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/bnxt_re: Add debugfs hook in the driverKalesh AP
Adding support for a per device debugfs folder for exporting some of the device specific debug information. Added support to get QP info for now. The same folder can be used to export other debug features in future. Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730428483-17841-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/bnxt_re: Support raw data query for each resourcesKashyap Desai
Support interfaces to get the raw data for each of the resources. Use this interface to get some of the HW structures from active resources. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730428483-17841-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/bnxt_re: Add support for querying HW contextsKashyap Desai
Implements support for querying the hardware resource contexts. This raw data can be used for the debugging of the field issues. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730428483-17841-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/bnxt_re: Support driver specific data collection using rdma toolKashyap Desai
Allow users to dump driver specific resource details when queried through rdma tool. This supports the driver data for QP, CQ, MR and SRQ. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730428483-17841-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/rxe: Set queue pair cur_qp_state when being queriedLiu Jian
Same with commit e375b9c92985 ("RDMA/cxgb4: Set queue pair state when being queried"). The API for ib_query_qp requires the driver to set cur_qp_state on return, add the missing set. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Liu Jian <liujian56@huawei.com> Link: https://patch.msgid.link/20241031092019.2138467-1-liujian56@huawei.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-03RDMA/bnxt_re: Remove some dead codeChristophe JAILLET
If the probe succeeds, then auxiliary_get_drvdata() can't return a NULL pointer. So several NULL checks can be removed to simplify code. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://patch.msgid.link/f02eb630734ee530315dce9f60b078f631ae93d0.1730477345.git.christophe.jaillet@wanadoo.fr Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-03RDMA/bnxt_re: Fix some error handling paths in bnxt_re_probe()Christophe JAILLET
If bnxt_re_add_device() fails, 'en_info' still needs to be freed, as already done in the .remove() function. The commit in Fixes incorrectly removed this call, certainly because it was expecting the .remove() function was called anyway. But if the probe fails, the remove function is not called. There is no need to call bnxt_re_remove() as it was done before, kfree() is enough. Fixes: a5e099e0c464 ("RDMA/bnxt_re: Fix an error path in bnxt_re_add_device") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://patch.msgid.link/9e48ff955ae55fc39a9eb1eb590d374539eab5ba.1730477345.git.christophe.jaillet@wanadoo.fr Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-03deal with the last remaing boolean uses of fd_file()Al Viro
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-03fdget(), more trivial conversionsAl Viro
all failure exits prior to fdget() leave the scope, all matching fdput() are immediately followed by leaving the scope. [xfs_ioc_commit_range() chunk moved here as well] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-10-30RDMA/efa: Report link speed according to device attributesMichael Margolin
Set port link speed and width based on max bandwidth acquired from the device instead of using constant 100 Gbps. Use a default value in case the device didn't set the field. Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Firas Jahjah <firasj@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://patch.msgid.link/20241030093006.21352-1-mrgolin@amazon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30RDMA/bnxt_re: Check cqe flags to know imm_data vs inv_irkeyKashyap Desai
Invalidate rkey is cpu endian and immediate data is in big endian format. Both immediate data and invalidate the remote key returned by HW is in little endian format. While handling the commit in fixes tag, the difference between immediate data and invalidate rkey endianness was not considered. Without changes of this patch, Kernel ULP was failing while processing inv_rkey. dmesg log snippet - nvme nvme0: Bogus remote invalidation for rkey 0x2000019Fix in this patch Do endianness conversion based on completion queue entry flag. Also, the HW completions are already converted to host endianness in bnxt_qplib_cq_process_res_rc and bnxt_qplib_cq_process_res_ud and there is no need to convert it again in bnxt_re_poll_cq. Modified the union to hold the correct data type. Fixes: 95b087f87b78 ("bnxt_re: Fix imm_data endianness") Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730110014-20755-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30RDMA/rxe: Fix the qp flush warnings in reqZhu Yanjun
When the qp is in error state, the status of WQEs in the queue should be set to error. Or else the following will appear. [ 920.617269] WARNING: CPU: 1 PID: 21 at drivers/infiniband/sw/rxe/rxe_comp.c:756 rxe_completer+0x989/0xcc0 [rdma_rxe] [ 920.617744] Modules linked in: rnbd_client(O) rtrs_client(O) rtrs_core(O) rdma_ucm rdma_cm iw_cm ib_cm crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel ib_uverbs ib_core loop brd null_blk ipv6 [ 920.618516] CPU: 1 PID: 21 Comm: ksoftirqd/1 Tainted: G O 6.1.113-storage+ #65 [ 920.618986] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 920.619396] RIP: 0010:rxe_completer+0x989/0xcc0 [rdma_rxe] [ 920.619658] Code: 0f b6 84 24 3a 02 00 00 41 89 84 24 44 04 00 00 e9 2a f7 ff ff 39 ca bb 03 00 00 00 b8 0e 00 00 00 48 0f 45 d8 e9 15 f7 ff ff <0f> 0b e9 cb f8 ff ff 41 bf f5 ff ff ff e9 08 f8 ff ff 49 8d bc 24 [ 920.620482] RSP: 0018:ffff97b7c00bbc38 EFLAGS: 00010246 [ 920.620817] RAX: 0000000000000000 RBX: 000000000000000c RCX: 0000000000000008 [ 920.621183] RDX: ffff960dc396ebc0 RSI: 0000000000005400 RDI: ffff960dc4e2fbac [ 920.621548] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffffffac406450 [ 920.621884] R10: ffffffffac4060c0 R11: 0000000000000001 R12: ffff960dc4e2f800 [ 920.622254] R13: ffff960dc4e2f928 R14: ffff97b7c029c580 R15: 0000000000000000 [ 920.622609] FS: 0000000000000000(0000) GS:ffff960ef7d00000(0000) knlGS:0000000000000000 [ 920.622979] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 920.623245] CR2: 00007fa056965e90 CR3: 00000001107f1000 CR4: 00000000000006e0 [ 920.623680] Call Trace: [ 920.623815] <TASK> [ 920.623933] ? __warn+0x79/0xc0 [ 920.624116] ? rxe_completer+0x989/0xcc0 [rdma_rxe] [ 920.624356] ? report_bug+0xfb/0x150 [ 920.624594] ? handle_bug+0x3c/0x60 [ 920.624796] ? exc_invalid_op+0x14/0x70 [ 920.624976] ? asm_exc_invalid_op+0x16/0x20 [ 920.625203] ? rxe_completer+0x989/0xcc0 [rdma_rxe] [ 920.625474] ? rxe_completer+0x329/0xcc0 [rdma_rxe] [ 920.625749] rxe_do_task+0x80/0x110 [rdma_rxe] [ 920.626037] rxe_requester+0x625/0xde0 [rdma_rxe] [ 920.626310] ? rxe_cq_post+0xe2/0x180 [rdma_rxe] [ 920.626583] ? do_complete+0x18d/0x220 [rdma_rxe] [ 920.626812] ? rxe_completer+0x1a3/0xcc0 [rdma_rxe] [ 920.627050] rxe_do_task+0x80/0x110 [rdma_rxe] [ 920.627285] tasklet_action_common.constprop.0+0xa4/0x120 [ 920.627522] handle_softirqs+0xc2/0x250 [ 920.627728] ? sort_range+0x20/0x20 [ 920.627942] run_ksoftirqd+0x1f/0x30 [ 920.628158] smpboot_thread_fn+0xc7/0x1b0 [ 920.628334] kthread+0xd6/0x100 [ 920.628504] ? kthread_complete_and_exit+0x20/0x20 [ 920.628709] ret_from_fork+0x1f/0x30 [ 920.628892] </TASK> Fixes: ae720bdb703b ("RDMA/rxe: Generate error completion for error requester QP state") Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20241025152036.121417-1-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30RDMA/hns: Fix cpu stuck caused by printings during resetwenglianfa
During reset, cmd to destroy resources such as qp, cq, and mr may fail, and error logs will be printed. When a large number of resources are destroyed, there will be lots of printings, and it may lead to a cpu stuck. Delete some unnecessary printings and replace other printing functions in these paths with the ratelimited version. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Fixes: c7bcb13442e1 ("RDMA/hns: Add SRQ support for hip08 kernel mode") Fixes: 70f92521584f ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30RDMA/hns: Use dev_* printings in hem code instead of ibdev_*Junxian Huang
The hem code is executed before ib_dev is registered, so use dev_* printing instead of ibdev_* to avoid log like this: (null): set HEM address to HW failed! Fixes: 2f49de21f3e9 ("RDMA/hns: Optimize mhop get flow for multi-hop addressing") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30RDMA/hns: Modify debugfs nameYuyu Li
The sub-directory of hns_roce debugfs is named after the device's kernel name currently, but it will be inconvenient to use when the device is renamed. Modify the name to pci name as users can always easily find the correspondence between an RDMA device and its pci name. Fixes: eb7854d63db5 ("RDMA/hns: Support SW stats with debugfs") Signed-off-by: Yuyu Li <liyuyu6@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30RDMA/hns: Fix flush cqe error when racing with destroy qpwenglianfa
QP needs to be modified to IB_QPS_ERROR to trigger HW flush cqe. But when this process races with destroy qp, the destroy-qp process may modify the QP to IB_QPS_RESET first. In this case flush cqe will fail since it is invalid to modify qp from IB_QPS_RESET to IB_QPS_ERROR. Add lock and bit flag to make sure pending flush cqe work is completed first and no more new works will be added. Fixes: ffd541d45726 ("RDMA/hns: Add the workqueue framework for flush cqe handler") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-3-huangjunxian6@hisilicon.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30RDMA/hns: Fix an AEQE overflow error caused by untimely update of eq_db_ciwenglianfa
eq_db_ci is updated only after all AEQEs are processed in the AEQ interrupt handler, which is not timely enough and may result in AEQ overflow. Two optimization methods are proposed: 1. Set an upper limit for AEQE processing. 2. Move time-consuming operations such as printings to the bottom half of the interrupt. cmd events and flush_cqe events are still fully processed in the top half to ensure timely handling. Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-30RDMA: Use ethtool string helpersRosen Penev
Avoids having to manually increment the pointer. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241021011543.5922-1-rosenp@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-28RDMA/bnxt_re: Fix access flags for MR and QP modifyHongguang Gao
Access flag definition in MR and QP is different in FW. Currently both reg/bind MR and modify/query QP uses the same flags. Add a different function to map the QP access flags for newer adapters. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1729065346-1364-6-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-28RDMA/bnxt_re: Add support for modify_device hookKalesh AP
Adds support for modify_device in the driver for node desc changes. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1729065346-1364-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-28RDMA/bnxt_re: Add support for CQ rx coalescingChandramohan Akula
RoCE message rate performance is heavily degraded without the use of cq coalescing. With proper coalescing, message rates get better. Furthermore, coalescing significantly reduces contention on the PCIe Root Complex/Memory subsystems. Add the changes to configure CQ rx colascing parameters based on adapter revision when CQ is created. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1729065346-1364-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-28RDMA/bnxt_re: Add support for optimized modify QPKalesh AP
Modify QP improvements are for state transitions from INIT -> RTR and RTR -> RTS. In order to support the Modify QP Optimization feature, the driver is expected to check for the feature support in the CMDQ_QUERY_FUNC and register its support for this feature with the FW in CMDQ_INITIALIZE_FIRMWARE. Additionally, the driver is required to specify the new fields and attribute masks for the transitions as follows: 1. INIT -> RTR: - New fields: srq_used, type. - enable srq_used when RC QP is configured to use SRQ. - set the type based on the QP type. - Mandatory masks: - RC: CMDQ_MODIFY_QP_MODIFY_MASK_ACCESS, CMDQ_MODIFY_QP_MODIFY_MASK_PKEY - UD QP and QP1: CMDQ_MODIFY_QP_MODIFY_MASK_PKEY, CMDQ_MODIFY_QP_MODIFY_MASK_QKEY 2. RTR -> RTS: - New fields: type - set the type based on the QP type. - Mandatory masks: - RC: CMDQ_MODIFY_QP_MODIFY_MASK_ACCESS - UD QP and QP1: CMDQ_MODIFY_QP_MODIFY_MASK_QKEY Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Tushar Rane <tushar.rane@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1729065346-1364-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-28RDMA/ipoib: Use the networking stack default for txqueuelenGal Pressman
There is no need for a special txqueuelen value for IPoIB. This value represents the qdisc size which is not related to the SQ size, and the default value provided by the stack (DEFAULT_TX_QUEUE_LEN) is sufficient for typical use cases. Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/cc97764b5a8def4ea879b371549a5867fe75c756.1728555243.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-21RDMA/efa: Add option to set QP service level on createMichael Margolin
Using modify QP with AH attributes and IB_QP_AV flag set doesn't make much sense for connectionless QP types like SRD. Add SL parameter to EFA create QP user ABI and pass it to the device. Link: https://patch.msgid.link/r/20241015174242.3490-3-mrgolin@amazon.com Reviewed-by: Firas Jahjah <firasj@amazon.com> Reviewed-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-21RDMA/efa: Update device interfaceMichael Margolin
Update device interface header files. Link: https://patch.msgid.link/r/20241015174242.3490-2-mrgolin@amazon.com Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-21RDMA/bnxt_re: synchronize the qp-handle table arraySelvin Xavier
There is a race between the CREQ tasklet and destroy qp when accessing the qp-handle table. There is a chance of reading a valid qp-handle in the CREQ tasklet handler while the QP is already moving ahead with the destruction. Fixing this race by implementing a table-lock to synchronize the access. Fixes: f218d67ef004 ("RDMA/bnxt_re: Allow posting when QPs are in error") Fixes: 84cf229f4001 ("RDMA/bnxt_re: Fix the qp table indexing") Link: https://patch.msgid.link/r/1728912975-19346-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-21RDMA/bnxt_re: Fix the usage of control path spin locksSelvin Xavier
Control path completion processing always runs in tasklet context. To synchronize with the posting thread, there is no need to use the irq variant of spin lock. Use spin_lock_bh instead. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://patch.msgid.link/r/1728912975-19346-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-21RDMA/mlx5: Round max_rd_atomic/max_dest_rd_atomic up instead of downPatrisious Haddad
After the cited commit below max_dest_rd_atomic and max_rd_atomic values are being rounded down to the next power of 2. As opposed to the old behavior and mlx4 driver where they used to be rounded up instead. In order to stay consistent with older code and other drivers, revert to using fls round function which rounds up to the next power of 2. Fixes: f18e26af6aba ("RDMA/mlx5: Convert modify QP to use MLX5_SET macros") Link: https://patch.msgid.link/r/d85515d6ef21a2fa8ef4c8293dce9b58df8a6297.1728550179.git.leon@kernel.org Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-21RDMA/cxgb4: Dump vendor specific QP detailsLeon Romanovsky
Restore the missing functionality to dump vendor specific QP details, which was mistakenly removed in the commit mentioned in Fixes line. Fixes: 5cc34116ccec ("RDMA: Add dedicated QP resource tracker function") Link: https://patch.msgid.link/r/ed9844829135cfdcac7d64285688195a5cd43f82.1728323026.git.leonro@nvidia.com Reported-by: Dr. David Alan Gilbert <linux@treblig.org> Closes: https://lore.kernel.org/all/Zv_4qAxuC0dLmgXP@gallifrey Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/bnxt_re: Fix the GID table lengthKalesh AP
GID table length is reported by FW. The gid index which is passed to the driver during modify_qp/create_ah is restricted by the sgid_index field of struct ib_global_route. sgid_index is u8 and the max sgid possible is 256. Each GID entry in HW will have 2 GID entries in the kernel gid table. So we can support twice the gid table size reported by FW. Also, restrict the max GID to 256 also. Fixes: 847b97887ed4 ("RDMA/bnxt_re: Restrict the max_gids to 256") Link: https://patch.msgid.link/r/1728373302-19530-11-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pagesBhargava Chenna Marreddy
Avoid memory corruption while setting up Level-2 PBL pages for the non MR resources when num_pages > 256K. There will be a single PDE page address (contiguous pages in the case of > PAGE_SIZE), but, current logic assumes multiple pages, leading to invalid memory access after 256K PBL entries in the PDE. Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation") Link: https://patch.msgid.link/r/1728373302-19530-10-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/bnxt_re: Change the sequence of updating the CQ toggle valueChandramohan Akula
Currently the CQ toggle value in the shared page (read by the userlib) is updated as part of the cqn_handler. There is a potential race of application calling the CQ ARM doorbell immediately and using the old toggle value. Change the sequence of updating CQ toggle value to update in the bnxt_qplib_service_nq function immediately after reading the toggle value to be in sync with the HW updated value. Fixes: e275919d9669 ("RDMA/bnxt_re: Share a page to expose per CQ info with userspace") Link: https://patch.msgid.link/r/1728373302-19530-9-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/bnxt_re: Fix an error path in bnxt_re_add_deviceKalesh AP
In bnxt_re_add_device(), when register netdev notifier fails, driver is not unregistering the IB device in the error cleanup path. Also, removed the duplicate cleanup in error path of bnxt_re_probe. Fixes: 94a9dc6ac8f7 ("RDMA/bnxt_re: Group all operations under add_device and remove_device") Link: https://patch.msgid.link/r/1728373302-19530-8-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loopSelvin Xavier
Driver waits indefinitely for the fifo occupancy to go below a threshold as soon as the pacing interrupt is received. This can cause soft lockup on one of the processors, if the rate of DB is very high. Add a loop count for FPGA and exit the __wait_for_fifo_occupancy_below_th if the loop is taking more time. Pacing will be continuing until the occupancy is below the threshold. This is ensured by the checks in bnxt_re_pacing_timer_exp and further scheduling the work for pacing based on the fifo occupancy. Fixes: 2ad4e6303a6d ("RDMA/bnxt_re: Implement doorbell pacing algorithm") Link: https://patch.msgid.link/r/1728373302-19530-7-git-send-email-selvin.xavier@broadcom.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/bnxt_re: Fix a possible NULL pointer dereferenceKalesh AP
There is a possibility of a NULL pointer dereference in the failure path of bnxt_re_add_device(). To address that, moved the update of "rdev->adev" to bnxt_re_dev_add(). Fixes: dee3da3422d5 ("RDMA/bnxt_re: Change aux driver data to en_info to hold more information") Link: https://patch.msgid.link/r/1728373302-19530-6-git-send-email-selvin.xavier@broadcom.com Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/linux-rdma/CAH-L+nMCwymKGqf5pd8-FZNhxEkDD=kb6AoCaE6fAVi7b3e5Qw@mail.gmail.com/T/#t Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/bnxt_re: Return more meaningful errorKalesh AP
When the HWRM command fails, driver currently returns -EFAULT(Bad address). This does not look correct. Modified to return -EIO(I/O error). Fixes: cc1ec769b87c ("RDMA/bnxt_re: Fixing the Control path command and response handling") Fixes: 65288a22ddd8 ("RDMA/bnxt_re: use shadow qd while posting non blocking rcfw command") Link: https://patch.msgid.link/r/1728373302-19530-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/bnxt_re: Fix incorrect dereference of srq in async eventKashyap Desai
Currently driver is not getting correct srq. Dereference only if qplib has a valid srq. Fixes: b02fd3f79ec3 ("RDMA/bnxt_re: Report async events and errors") Link: https://patch.msgid.link/r/1728373302-19530-4-git-send-email-selvin.xavier@broadcom.com Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/bnxt_re: Fix out of bound checkKalesh AP
Driver exports pacing stats only on GenP5 and P7 adapters. But while parsing the pacing stats, driver has a check for "rdev->dbr_pacing". This caused a trace when KASAN is enabled. BUG: KASAN: slab-out-of-bounds in bnxt_re_get_hw_stats+0x2b6a/0x2e00 [bnxt_re] Write of size 8 at addr ffff8885942a6340 by task modprobe/4809 Fixes: 8b6573ff3420 ("bnxt_re: Update the debug counters for doorbell pacing") Link: https://patch.msgid.link/r/1728373302-19530-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/bnxt_re: Fix the max CQ WQEs for older adaptersAbhishek Mohapatra
Older adapters doesn't support the MAX CQ WQEs reported by older FW. So restrict the value reported to 1M always for older adapters. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://patch.msgid.link/r/1728373302-19530-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Abhishek Mohapatra<abhishek.mohapatra@broadcom.com> Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/srpt: Make slab cache names uniqueBart Van Assche
Since commit 4c39529663b9 ("slab: Warn on duplicate cache names when DEBUG_VM=y"), slab complains about duplicate cache names. Hence this patch. The approach is as follows: - Maintain an xarray with the slab size as index and a reference count and a kmem_cache pointer as contents. Use srpt-${slab_size} as kmem cache name. - Use 512-byte alignment for all slabs instead of only for some of the slabs. - Increment the reference count instead of calling kmem_cache_create(). - Decrement the reference count instead of calling kmem_cache_destroy(). Fixes: 5dabcd0456d7 ("RDMA/srpt: Add support for immediate data") Link: https://patch.msgid.link/r/20241009210048.4122518-1-bvanassche@acm.org Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Closes: https://lore.kernel.org/linux-block/xpe6bea7rakpyoyfvspvin2dsozjmjtjktpph7rep3h25tv7fb@ooz4cu5z6bq6/ Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>