summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-04-22RDMA/rxe: Don't schedule rxe_completer from rxe_requesterBob Pearson
Now that rxe_completer() is always called serially after rxe_requester() there is no reason to schedule rxe_completer() from rxe_requester(). Link: https://lore.kernel.org/r/20240329145513.35381-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22RDMA/rxe: Remove save/rollback_state in rxe_requesterBob Pearson
Now that req.task and comp.task are merged it is no longer necessary to call save_state() before calling rxe_xmit_pkt() and rollback_state() if rxe_xmit_pkt() fails. This was done originally to prevent races between rxe_completer() and rxe_requester() which now cannot happen. Link: https://lore.kernel.org/r/20240329145513.35381-8-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22RDMA/rxe: Merge request and complete tasksBob Pearson
Currently the rxe driver has three work queue tasks per qp. These are the req.task, comp.task and resp.task which call rxe_requester(), rxe_completer() and rxe_responder() respectively directly or on work queues. Each of these subroutines checks to see if there is work to be performed on the send queue or on the response packet queue or the request packet queue and will run until there is no work remaining or yield the cpu and reschedule itself until there is no work remaining. This commit combines the req.task and comp.task into a single send.task and renames the resp.task to the recv.task. The combined send.task calls rxe_requester() and rxe_completer() serially and continues until all work on both the send queue and the response packet queue are done. In various benchmarks the performance is either improved or left the same. At high scale there is a significant reduction in the load on the cpu. This is the first step in combining these two tasks. Once they are serialized cross rescheduling of req.task and comp.task can be more efficiently handled by just letting the send.task continue to run. This will be done in the next several patches. Link: https://lore.kernel.org/r/20240329145513.35381-7-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22RDMA/rxe: Remove redundant scheduling of rxe_completerBob Pearson
In rxe_post_send_kernel() if the qp is in the error state after posting the work requests the rxe_completer() task is scheduled. But, the only way to move the qp into the error state is to call rxe_qp_error() which also schedules the rxe_completer() task to drain the queues. Calling it a second time has no effect. This commit removes the redundant call. Link: https://lore.kernel.org/r/20240329145513.35381-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22RDMA/rxe: Allow good work requests to be executedBob Pearson
A previous commit incorrectly added an 'if(!err)' before scheduling the requester task in rxe_post_send_kernel(). But if there were send wrs successfully added to the send queue before a bad wr they might never get executed. This commit fixes this by scheduling the requester task if any wqes were successfully posted in rxe_post_send_kernel() in rxe_verbs.c. Link: https://lore.kernel.org/r/20240329145513.35381-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Fixes: 5bf944f24129 ("RDMA/rxe: Add error messages") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22RDMA/rxe: Fix seg fault in rxe_comp_queue_pktBob Pearson
In rxe_comp_queue_pkt() an incoming response packet skb is enqueued to the resp_pkts queue and then a decision is made whether to run the completer task inline or schedule it. Finally the skb is dereferenced to bump a 'hw' performance counter. This is wrong because if the completer task is already running in a separate thread it may have already processed the skb and freed it which can cause a seg fault. This has been observed infrequently in testing at high scale. This patch fixes this by changing the order of enqueuing the packet until after the counter is accessed. Link: https://lore.kernel.org/r/20240329145513.35381-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Fixes: 0b1e5b99a48b ("IB/rxe: Add port protocol stats") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-21IB/core: Implement a limit on UMAD receive ListMichael Guralnik
The existing behavior of ib_umad, which maintains received MAD packets in an unbounded list, poses a risk of uncontrolled growth. As user-space applications extract packets from this list, the rate of extraction may not match the rate of incoming packets, leading to potential list overflow. To address this, we introduce a limit to the size of the list. After considering typical scenarios, such as OpenSM processing, which can handle approximately 100k packets per second, and the 1-second retry timeout for most packets, we set the list size limit to 200k. Packets received beyond this limit are dropped, assuming they are likely timed out by the time they are handled by user-space. Notably, packets queued on the receive list due to reasons like timed-out sends are preserved even when the list is full. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/7197cb58a7d9e78399008f25036205ceab07fbd5.1713268818.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/hns: Modify the print level of CQE errorChengchang Tang
Too much print may lead to a panic in kernel. Change ibdev_err() to ibdev_err_ratelimited(), and change the printing level of cqe dump to debug level. Fixes: 7c044adca272 ("RDMA/hns: Simplify the cqe code of poll cq") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-11-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/hns: Use complete parentheses in macrosChengchang Tang
Use complete parentheses to ensure that macro expansion does not produce unexpected results. Fixes: a25d13cbe816 ("RDMA/hns: Add the interfaces to support multi hop addressing for the contexts in hip08") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-10-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/hns: Add mutex_destroy()wenglianfa
Add mutex_destroy(). Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-9-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/hns: Fix GMV table pagesizeChengchang Tang
GMV's BA table only supports 4K pages. Currently, PAGESIZE is used to calculate gmv_bt_num, which will cause an abnormal number of gmv_bt_num in a 64K OS. Fixes: d6d91e46210f ("RDMA/hns: Add support for configuring GMV table") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-8-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/hns: Fix mismatch exception rollbackwenglianfa
When dma_alloc_coherent() fails in hns_roce_alloc_hem(), just call kfree() to release hem instead of hns_roce_free_hem(). Fixes: c00743cbf2b8 ("RDMA/hns: Simplify 'struct hns_roce_hem' allocation") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-7-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/hns: Fix UAF for cq async eventChengchang Tang
The refcount of CQ is not protected by locks. When CQ asynchronous events and CQ destruction are concurrent, CQ may have been released, which will cause UAF. Use the xa_lock() to protect the CQ refcount. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/hns: Fix deadlock on SRQ async events.Chengchang Tang
xa_lock for SRQ table may be required in AEQ. Use xa_store_irq()/ xa_erase_irq() to avoid deadlock. Fixes: 81fce6291d99 ("RDMA/hns: Add SRQ asynchronous event support") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/hns: Add max_ah and cq moderation capacities in query_device()Chengchang Tang
Add max_ah and cq moderation capacities to hns_roce_query_device(). Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/hns: Remove unused parameters and variablesChengchang Tang
Remove unused parameters and variables. Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/hns: Use macro instead of magic numberYangyang Li
Use macro instead of magic number. Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/hns: Fix return value in hns_roce_map_mr_sgZhengchao Shao
As described in the ib_map_mr_sg function comment, it returns the number of sg elements that were mapped to the memory region. However, hns_roce_map_mr_sg returns the number of pages required for mapping the DMA area. Fix it. Fixes: 9b2cf76c9f05 ("RDMA/hns: Optimize PBL buffer allocation process") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20240411033851.2884771-1-shaozhengchao@huawei.com Reviewed-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/mana_ib: Configure mac address in RNICKonstantin Taranov
Set local mac address in RNIC, which is required by the HW. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1712738551-22075-7-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/mana_ib: Adding and deleting GIDsKonstantin Taranov
Implement add_gid and del_gid for RNIC. IPv4 and IPv6 addresses are supported. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1712738551-22075-6-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/mana_ib: Enable RoCE on port 1Konstantin Taranov
Set netdev and RoCEv2 flag to enable GID population on port 1. Use GIDs of the master netdev. As mc->ports[] stores slave devices, use a helper to get the master netdev. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1712738551-22075-5-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/mana_ib: Implement port parametersKonstantin Taranov
Implement port parameters for RNIC: 1) extend query_port() method 2) implement get_link_layer() 3) implement query_pkey() Only port 1 can store GIDs. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1712738551-22075-4-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/mana_ib: Create and destroy rnic adapterKonstantin Taranov
Add functions for RNIC creation and destruction. If creation fails, the ib_probe fails as well. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1712738551-22075-3-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/mana_ib: Add EQ creation for rnic adapterKonstantin Taranov
Create an error EQ for the RNIC adapter. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1712738551-22075-2-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-16RDMA/mana_ib: Use num_comp_vectors of ib_deviceKonstantin Taranov
Use num_comp_vectors of struct ib_device instead of max_num_queues from gdma_context. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1712911656-17352-1-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-11RDMA/rxe: Return the correct errnoZhu Yanjun
In the function __rxe_add_to_pool, the function xa_alloc_cyclic is called. The return value of the function xa_alloc_cyclic is as below: " Return: 0 if the allocation succeeded without wrapping. 1 if the allocation succeeded after wrapping, -ENOMEM if memory could not be allocated or -EBUSY if there are no free entries in @limit. " But now the function __rxe_add_to_pool only returns -EINVAL. All the returned error value should be returned to the caller. Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20240408142142.792413-1-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-11RDMA/mana_ib: remove useless return values from dbg printsKonstantin Taranov
Remove printing ret value on success as it was always 0. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1712672465-29960-1-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-11RDMA/mana_ib: Add flex array to struct mana_cfg_rx_steer_req_v2Leon Romanovsky
The "struct mana_cfg_rx_steer_req_v2" uses a dynamically sized set of trailing elements. Specifically, it uses a "mana_handle_t" array. So, use the preferred way in the kernel declaring a flexible array [1]. At the same time, prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). Also, avoid the open-coded arithmetic in the memory allocator functions [2] using the "struct_size" macro. Moreover, use the "offsetof" helper to get the indirect table offset instead of the "sizeof" operator and avoid the open-coded arithmetic in pointers using the new flex member. This new structure member also allow us to remove the "req_indir_tab" variable since it is no longer needed. Now, it is also possible to use the "flex_array_size" helper to compute the size of these trailing elements in the "memcpy" function. Specifically, the first commit adds the flex member and the patches 2 and 3 refactor the consumers of the "struct mana_cfg_rx_steer_req_v2". This code was detected with the help of Coccinelle, and audited and modified manually. The Coccinelle script used to detect this code pattern is the following: virtual report @rule1@ type t1; type t2; identifier i0; identifier i1; identifier i2; identifier ALLOC =~ "kmalloc|kzalloc|kmalloc_node|kzalloc_node|vmalloc|vzalloc|kvmalloc|kvzalloc"; position p1; @@ i0 = sizeof(t1) + sizeof(t2) * i1; ... i2 = ALLOC@p1(..., i0, ...); @script:python depends on report@ p1 << rule1.p1; @@ msg = "WARNING: verify allocation on line %s" % (p1[0].line) coccilib.report.print_report(p1[0],msg) [1] https://www.kernel.org/doc/html/next/process/deprecated.html#zero-length-and-one-element-arrays [2] https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Link: https://lore.kernel.org/all/AS8PR02MB72374BD1B23728F2E3C3B1A18B022@AS8PR02MB7237.eurprd02.prod.outlook.com Signed-off-by: Erick Archer <erick.archer@outlook.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> * mana-ib-flex: net: mana: Avoid open coded arithmetic RDMA/mana_ib: Prefer struct_size over open coded arithmetic net: mana: Add flex array to struct mana_cfg_rx_steer_req_v2
2024-04-11net: mana: Avoid open coded arithmeticErick Archer
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2]. As the "req" variable is a pointer to "struct mana_cfg_rx_steer_req_v2" and this structure ends in a flexible array: struct mana_cfg_rx_steer_req_v2 { [...] mana_handle_t indir_tab[] __counted_by(num_indir_entries); }; the preferred way in the kernel is to use the struct_size() helper to do the arithmetic instead of the calculation "size + size * count" in the kzalloc() function. Moreover, use the "offsetof" helper to get the indirect table offset instead of the "sizeof" operator and avoid the open-coded arithmetic in pointers using the new flex member. This new structure member also allow us to remove the "req_indir_tab" variable since it is no longer needed. Now, it is also possible to use the "flex_array_size" helper to compute the size of these trailing elements in the "memcpy" function. This way, the code is more readable and safer. This code was detected with the help of Coccinelle, and audited and modified manually. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1] Link: https://github.com/KSPP/linux/issues/160 [2] Signed-off-by: Erick Archer <erick.archer@outlook.com> Link: https://lore.kernel.org/r/AS8PR02MB7237A21355C86EC0DCC0D83B8B022@AS8PR02MB7237.eurprd02.prod.outlook.com Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-11RDMA/mana_ib: Prefer struct_size over open coded arithmeticErick Archer
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2]. As the "req" variable is a pointer to "struct mana_cfg_rx_steer_req_v2" and this structure ends in a flexible array: struct mana_cfg_rx_steer_req_v2 { [...] mana_handle_t indir_tab[] __counted_by(num_indir_entries); }; the preferred way in the kernel is to use the struct_size() helper to do the arithmetic instead of the calculation "size + size * count" in the kzalloc() function. Moreover, use the "offsetof" helper to get the indirect table offset instead of the "sizeof" operator and avoid the open-coded arithmetic in pointers using the new flex member. This new structure member also allow us to remove the "req_indir_tab" variable since it is no longer needed. This way, the code is more readable and safer. This code was detected with the help of Coccinelle, and audited and modified manually. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1] Link: https://github.com/KSPP/linux/issues/160 [2] Signed-off-by: Erick Archer <erick.archer@outlook.com> Link: https://lore.kernel.org/r/AS8PR02MB72375EB06EE1A84A67BE722E8B022@AS8PR02MB7237.eurprd02.prod.outlook.com Reviewed-by: Long Li <longli@microsoft.com> Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-11net: mana: Add flex array to struct mana_cfg_rx_steer_req_v2Erick Archer
The "struct mana_cfg_rx_steer_req_v2" uses a dynamically sized set of trailing elements. Specifically, it uses a "mana_handle_t" array. So, use the preferred way in the kernel declaring a flexible array [1]. At the same time, prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). This is a previous step to refactor the two consumers of this structure. drivers/infiniband/hw/mana/qp.c drivers/net/ethernet/microsoft/mana/mana_en.c The ultimate goal is to avoid the open-coded arithmetic in the memory allocator functions [2] using the "struct_size" macro. Link: https://www.kernel.org/doc/html/next/process/deprecated.html#zero-length-and-one-element-arrays [1] Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [2] Signed-off-by: Erick Archer <erick.archer@outlook.com> Link: https://lore.kernel.org/r/AS8PR02MB7237E2900247571C9CB84C678B022@AS8PR02MB7237.eurprd02.prod.outlook.com Reviewed-by: Long Li <longli@microsoft.com> Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-09RDMA/hns: Support DSCPJunxian Huang
Add support for DSCP configuration. For DSCP, get dscp-prio mapping via hns3 nic driver api .get_dscp_prio() and fill the SL (in WQE for UD or in QPC for RC) with the priority value. The prio-tc mapping is configured to HW by hns3 nic driver. HW will select a corresponding TC according to SL and the prio-tc mapping. Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240315093551.1650088-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-08RDMA/mlx5: Adding remote atomic access flag to updatable flagsOr Har-Toov
Currently IB_ACCESS_REMOTE_ATOMIC is blocked from being updated via UMR although in some cases it should be possible. These cases are checked in mlx5r_umr_can_reconfig function. Fixes: ef3642c4f54d ("RDMA/mlx5: Fix error unwinds for rereg_mr") Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Link: https://lore.kernel.org/r/24dac73e2fa48cb806f33a932d97f3e402a5ea2c.1712140377.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-08RDMA/mlx5: Change check for cacheable mkeysOr Har-Toov
umem can be NULL for user application mkeys in some cases. Therefore umem can't be used for checking if the mkey is cacheable and it is changed for checking a flag that indicates it. Also make sure that all mkeys which are not returned to the cache will be destroyed. Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow") Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Link: https://lore.kernel.org/r/2690bc5c6896bcb937f89af16a1ff0343a7ab3d0.1712140377.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-08RDMA/mlx5: Uncacheable mkey has neither rb_key or cache_entOr Har-Toov
As some mkeys can't be modified with UMR due to some UMR limitations, like the size of translation that can be updated, not all user mkeys can be cached. Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow") Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Link: https://lore.kernel.org/r/f2742dd934ed73b2d32c66afb8e91b823063880c.1712140377.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-02RDMA/mana_ib: Use struct mana_ib_queue for RAW QPsKonstantin Taranov
Use struct mana_ib_queue and its helpers for RAW QPs Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1711483688-24358-5-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-02RDMA/mana_ib: Use struct mana_ib_queue for WQsKonstantin Taranov
Use struct mana_ib_queue and its helpers for WQs Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1711483688-24358-4-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-02RDMA/mana_ib: Use struct mana_ib_queue for CQsKonstantin Taranov
Use struct mana_ib_queue and its helpers for CQs Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1711483688-24358-3-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-02RDMA/mana_ib: Introduce helpers to create and destroy mana queuesKonstantin Taranov
Intoduce helpers to work with mana ib queues (struct mana_ib_queue). A queue always consists of umem, gdma_region, and id. A queue can become a WQ or a CQ. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1711483688-24358-2-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-01RDMA/restrack: Fix potential invalid address accessWenchao Hao
struct rdma_restrack_entry's kern_name was set to KBUILD_MODNAME in ib_create_cq(), while if the module exited but forgot del this rdma_restrack_entry, it would cause a invalid address access in rdma_restrack_clean() when print the owner of this rdma_restrack_entry. These code is used to help find one forgotten PD release in one of the ULPs. But it is not needed anymore, so delete them. Signed-off-by: Wenchao Hao <haowenchao2@huawei.com> Link: https://lore.kernel.org/r/20240318092320.1215235-1-haowenchao2@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-01RDMA/erdma: Remove unnecessary __GFP_ZERO flagBoshi Yu
The dma_alloc_coherent() interface automatically zero the memory returned. Thus, we do not need to specify the __GFP_ZERO flag explicitly when we call dma_alloc_coherent(). Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://lore.kernel.org/r/20240311113821.22482-4-boshiyu@alibaba-inc.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-01RDMA/erdma: Unify the names related to doorbell recordsBoshi Yu
There exist two different names for the doorbell records: db_info and db_record. We use dbrec for cpu address of the doorbell record and dbrec_dma for dma address of the doorbell recordi uniformly. Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://lore.kernel.org/r/20240311113821.22482-3-boshiyu@alibaba-inc.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-04-01RDMA/erdma: Allocate doorbell records from dma poolBoshi Yu
Currently, the 8 byte doorbell record is allocated along with the queue buffer, which may result in waste of dma space when the queue buffer is page aligned. To address this issue, we introduce a dma pool named db_pool and allocate doorbell record from it. Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://lore.kernel.org/r/20240311113821.22482-2-boshiyu@alibaba-inc.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-03-31Linux 6.9-rc2v6.9-rc2Linus Torvalds
2024-03-31Merge tag 'kbuild-fixes-v6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Deduplicate Kconfig entries for CONFIG_CXL_PMU - Fix unselectable choice entry in MIPS Kconfig, and forbid this structure - Remove unused include/asm-generic/export.h - Fix a NULL pointer dereference bug in modpost - Enable -Woverride-init warning consistently with W=1 - Drop KCSAN flags from *.mod.c files * tag 'kbuild-fixes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: Fix typo HEIGTH to HEIGHT Documentation/llvm: Note s390 LLVM=1 support with LLVM 18.1.0 and newer kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries kbuild: make -Woverride-init warnings more consistent modpost: do not make find_tosym() return NULL export.h: remove include/asm-generic/export.h kconfig: do not reparent the menu inside a choice block MIPS: move unselectable FIT_IMAGE_FDT_EPM5 out of the "System type" choice cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/Kconfig
2024-03-31Merge tag 'edac_urgent_for_v6.9_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fixes from Borislav Petkov: - Fix more issues in the AMD FMPM driver * tag 'edac_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: RAS: Avoid build errors when CONFIG_DEBUG_FS=n RAS/AMD/FMPM: Safely handle saved records of various sizes RAS/AMD/FMPM: Avoid NULL ptr deref in get_saved_records()
2024-03-31Merge tag 'irq_urgent_for_v6.9_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Fix an unused function warning on irqchip/irq-armada-370-xp - Fix the IRQ sharing with pinctrl-amd and ACPI OSL * tag 'irq_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/armada-370-xp: Suppress unused-function warning genirq: Introduce IRQF_COND_ONESHOT and use it in pinctrl-amd
2024-03-31Merge tag 'perf_urgent_for_v6.9_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fixes from Borislav Petkov: - Define the correct set of default hw events on AMD Zen4 - Use the correct stalled cycles PMCs on AMD Zen2 and newer - Fix detection of the LBR freeze feature on AMD * tag 'perf_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/amd/core: Define a proper ref-cycles event for Zen 4 and later perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later perf/x86/amd/lbr: Use freeze based on availability x86/cpufeatures: Add new word for scattered features
2024-03-31Merge tag 'timers_urgent_for_v6.9_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timers update from Borislav Petkov: - Volunteer in Anna-Maria and Frederic as timers co-maintainers so that tglx can relax more :-P * tag 'timers_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Add co-maintainers for time[rs]
2024-03-31Merge tag 'objtool_urgent_for_v6.9_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Borislav Petkov: - Fix a format specifier build error in objtool during an x32 build * tag 'objtool_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix compile failure when using the x32 compiler