path: root/drivers/infiniband/hw
AgeCommit messageAuthor
2023-02-17IB/hfi1: Fix sdma.h tx->num_descs off-by-one errorsPatrick Kelsey
Fix three sources of error involving struct sdma_txreq.num_descs. When _extend_sdma_tx_descs() extends the descriptor array, it uses the value of tx->num_descs to determine how many existing entries from the tx's original, internal descriptor array to copy to the newly allocated one. As this value was incremented before the call, the copy loop will access one entry past the internal descriptor array, copying its contents into the corresponding slot in the new array. If the call to _extend_sdma_tx_descs() fails, _pad_sdma_tx_descs() then invokes __sdma_tx_clean() which uses the value of tx->num_descs to drive a loop that unmaps all descriptor entries in use. As this value was incremented before the call, the unmap loop will invoke sdma_unmap_desc() on a descriptor entry whose contents consist of whatever random data was copied into it by the first error above, leading to cascading further calls into the kernel and driver using arbitrary data. _sdma_close_tx() was using tx->num_descs instead of tx->num_descs - 1. Fix all of the above by: - Only increment .num_descs after .descp is extended. - Use .num_descs - 1 instead of .num_descs for last .descp entry. Fixes: f4d26d81ad7f ("staging/rdma/hfi1: Add coalescing support for SDMA TX descriptors") Link: https://lore.kernel.org/r/167656658879.2223096.10026561343022570690.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com> Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
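The ordering described in this fix can be sketched in user-space C. This is only an illustrative model, not the hfi1 code: the toy_* names, the two-entry internal array, and the doubling growth policy are all assumptions made for the example. It shows the two rules from the commit message: increment the count only after the array has been extended, and use count - 1 to reach the last entry.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_INTERNAL_DESCS 2 /* hypothetical size of the built-in array */

struct toy_desc { unsigned long qw[2]; };

struct toy_txreq {
    struct toy_desc internal[TOY_INTERNAL_DESCS];
    struct toy_desc *descp;   /* points at internal[] until extended */
    unsigned int num_descs;   /* entries currently in use */
    unsigned int desc_limit;  /* capacity of descp */
};

static void toy_init(struct toy_txreq *tx)
{
    memset(tx, 0, sizeof(*tx));
    tx->descp = tx->internal;
    tx->desc_limit = TOY_INTERNAL_DESCS;
}

/* Grow the array, copying only the num_descs entries actually in use. */
static int toy_extend(struct toy_txreq *tx, unsigned int new_limit)
{
    struct toy_desc *bigger = calloc(new_limit, sizeof(*bigger));

    if (!bigger)
        return -1;
    memcpy(bigger, tx->descp, tx->num_descs * sizeof(*bigger));
    if (tx->descp != tx->internal)
        free(tx->descp);
    tx->descp = bigger;
    tx->desc_limit = new_limit;
    return 0;
}

static int toy_add_desc(struct toy_txreq *tx, unsigned long a, unsigned long b)
{
    /* Extend first: num_descs still counts only valid entries here. */
    if (tx->num_descs == tx->desc_limit &&
        toy_extend(tx, tx->desc_limit * 2))
        return -1;
    tx->descp[tx->num_descs].qw[0] = a;
    tx->descp[tx->num_descs].qw[1] = b;
    tx->num_descs++; /* increment only after the slot exists and is filled */
    return 0;
}

/* The last entry in use lives at index num_descs - 1, not num_descs. */
static struct toy_desc *toy_last_desc(struct toy_txreq *tx)
{
    return tx->num_descs ? &tx->descp[tx->num_descs - 1] : NULL;
}
```

Because the count is bumped last, a failed extension leaves num_descs describing only entries that were really written, so a cleanup loop driven by it never touches an uninitialized slot.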
2023-02-17IB/hfi1: Fix math bugs in hfi1_can_pin_pages()Patrick Kelsey
Fix arithmetic and logic errors in hfi1_can_pin_pages() that would allow hfi1 to attempt pinning pages in cases where it should not because of resource limits or lack of required capability. Fixes: 2c97ce4f3c29 ("IB/hfi1: Add pin query function") Link: https://lore.kernel.org/r/167656658362.2223096.10954762619837718026.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com> Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17RDMA/irdma: Add support for dmabuf pin memory regionsZhu Yanjun
This is a followup to the EFA dmabuf[1]. Irdma driver currently does not support on-demand-paging(ODP). So it uses habanalabs as the dmabuf exporter, and irdma as the importer to allow for peer2peer access through libibverbs. In this commit, the function ib_umem_dmabuf_get_pinned() is used. This function is introduced in EFA dmabuf[1] which allows the driver to get a dmabuf umem which is pinned and does not require move_notify callback implementation. The returned umem is pinned and DMA mapped like standard cpu umems, and is released through ib_umem_release(). [1]https://lore.kernel.org/lkml/20211007114018.GD2688930@ziepe.ca/t/ Link: https://lore.kernel.org/r/20230217011425.498847-1-yanjun.zhu@intel.com Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17Merge mlx5-next into rdma.git for-nextJason Gunthorpe
Synchronize the shared mlx5 branch with net: - From Jiri: fix a deadlock in mlx5_ib's netdev notifier unregister. - From Mark and Patrisious: add IPsec RoCEv2 support. - From Or: Rely on firmware to get special mkeys * branch mlx5-next: RDMA/mlx5: Use query_special_contexts for mkeys net/mlx5e: Use query_special_contexts for mkeys net/mlx5: Change define name for 0x100 lkey value net/mlx5: Expose bits for querying special mkeys net/mlx5: Configure IPsec steering for egress RoCEv2 traffic net/mlx5: Configure IPsec steering for ingress RoCEv2 traffic net/mlx5: Add IPSec priorities in RDMA namespaces net/mlx5: Implement new destination type TABLE_TYPE net/mlx5: Introduce new destination type TABLE_TYPE RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister net/mlx5e: Propagate an internal event in case uplink netdev changes net/mlx5e: Fix trap event handling Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17RDMA/mlx5: Use query_special_contexts for mkeysOr Har-Toov
Use query_special_contexts to get the correct value of mkeys such as null_mkey, terminate_scatter_list_mkey and dump_fill_mkey, as FW will change them in certain configurations. Link: https://lore.kernel.org/r/000236f0a9487d48809f87bcc3620a3964b2d3d3.1673960981.git.leon@kernel.org Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17net/mlx5: Change define name for 0x100 lkey valueOr Har-Toov
Change define of 0x100 lkey value from MLX5_INVALID_LKEY to be MLX5_TERMINATE_SCATTER_LIST_LKEY as 0x100 is the value of terminate_scatter_list_mkey. Link: https://lore.kernel.org/r/3a116dc3fbae4cb6b76a63d27d418830b06ade0c.1673960981.git.leon@kernel.org Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Some of the devlink bits were tricky, but I think I got it right. Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-16RDMA/mana_ib: Fix a bug when the PF indicates more entries for registering memory on first packetLong Li
When registering memory in a large chunk that doesn't fit into a single PF message, the PF may return GDMA_STATUS_MORE_ENTRIES on the first message if there are more messages needed for registering more chunks. Fix the VF to make it process the correct return code. Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Link: https://lore.kernel.org/r/1676507522-21018-1-git-send-email-longli@linuxonhyperv.com Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
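One way to picture the return-code handling is a helper that decides which status the sender should expect for each message of a multi-message registration. The sketch below is an assumption-laden model, not the mana_ib code: the toy_status values and both helpers are hypothetical, and it assumes that every message except the last is answered with a "more entries" status.

```c
#include <stddef.h>

/* Hypothetical status codes, for illustration only. */
enum toy_status { TOY_OK = 0, TOY_MORE_ENTRIES = 1, TOY_ERROR = 2 };

/* Status the sender should expect for message idx out of total messages. */
static enum toy_status toy_expected_status(size_t idx, size_t total)
{
    /* Any message that is not the last one, including the first,
     * is expected to be answered with "more entries". */
    return (idx + 1 < total) ? TOY_MORE_ENTRIES : TOY_OK;
}

/* Returns 0 if the reply matches the expectation, -1 otherwise. */
static int toy_check_reply(enum toy_status got, size_t idx, size_t total)
{
    return got == toy_expected_status(idx, total) ? 0 : -1;
}
```

With this shape, a "more entries" reply to the first of several messages is accepted rather than treated as a failure, which is the behavior the commit message describes.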
2023-02-15iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry()Dan Carpenter
This condition needs to match the previous "if (epcp->state == LISTEN) {" exactly to avoid a NULL dereference of either "listen_ep" or "ep". The problem is that "epcp" has been re-assigned so just testing "if (epcp->state == LISTEN) {" a second time is not sufficient. Fixes: 116aeb887371 ("iw_cxgb4: provide detailed provider-specific CM_ID information") Signed-off-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/Y+usKuWIKr4dimZh@kili Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-15RDMA/mlx5: Use rdma_umem_for_each_dma_block()Jason Gunthorpe
Replace an open coding of rdma_umem_for_each_dma_block() with the proper function. Fixes: b3d47ebd4908 ("RDMA/mlx5: Use mlx5_umr_post_send_wait() to update MR pas") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-c13a5b88359b+556d0-mlx5_umem_block_jgg@nvidia.com Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-14net/mlx5: Lag, Add single RDMA device in multiport modeMark Bloch
In MultiPort E-Switch mode a single RDMA device is created. This device has multiple RDMA ports that represent the uplink ports that are connected to the E-Switch. Account for this when creating the RDMA device so it has an additional port for the non native uplink. As a side effect of this patch, use shared fdb in multiport eswitch mode. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-09mm: replace vma->vm_flags direct modifications with modifier callsSuren Baghdasaryan
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-08Merge tag 'mlx5-next-netdev-deadlock' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linuxJakub Kicinski
Saeed Mahameed says: ==================== mlx5-next-netdev-deadlock This series from Jiri solves a deadlock when removing a network namespace with mlx5 devlink instance being in it. The deadlock is between: 1) mlx5_ib->unregister_netdevice_notifier() AND 2) mlx5_core->devlink_reload->cleanup_net() To solve this, mlx5 netdev added/removed events are introduced to track the uplink netdev to be used for register_netdevice_notifier_dev_net() purposes. * tag 'mlx5-next-netdev-deadlock' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister net/mlx5e: Propagate an internal event in case uplink netdev changes net/mlx5e: Fix trap event handling net/mlx5: Introduce CQE error syndrome ==================== Link: https://lore.kernel.org/r/20230208005626.72930-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-08RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregisterJiri Pirko
When removing a network namespace with an mlx5 devlink instance in it, the following call chain is executed: cleanup_net() (takes down_read(&pernet_ops_rwsem)) -> devlink_pernet_pre_exit() -> devlink_reload() -> mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() -> mlx5_detach_device() -> del_adev() -> mlx5r_remove() -> __mlx5_ib_remove() -> mlx5_ib_roce_cleanup() -> mlx5_remove_netdev_notifier() -> unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)). This deadlocks. Resolve this by converting to register_netdevice_notifier_dev_net(), which does not take pernet_ops_rwsem and moves the notifier block around according to the netdev it takes as an argument. Use the previously introduced netdev added/removed events to track the uplink netdev to be used for register_netdevice_notifier_dev_net() purposes. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-08RDMA/irdma: Cap MSIX used to online CPUs + 1Mustafa Ismail
The irdma driver can use a maximum number of msix vectors equal to num_online_cpus() + 1 and the kernel warning stack below is shown if that number is exceeded. The kernel throws a warning as the driver tries to update the affinity hint with a CPU mask greater than the max CPU IDs. Fix this by capping the MSIX vectors to num_online_cpus() + 1. WARNING: CPU: 7 PID: 23655 at include/linux/cpumask.h:106 irdma_cfg_ceq_vector+0x34c/0x3f0 [irdma] RIP: 0010:irdma_cfg_ceq_vector+0x34c/0x3f0 [irdma] Call Trace: irdma_rt_init_hw+0xa62/0x1290 [irdma] ? irdma_alloc_local_mac_entry+0x1a0/0x1a0 [irdma] ? __is_kernel_percpu_address+0x63/0x310 ? rcu_read_lock_held_common+0xe/0xb0 ? irdma_lan_unregister_qset+0x280/0x280 [irdma] ? irdma_request_reset+0x80/0x80 [irdma] ? ice_get_qos_params+0x84/0x390 [ice] irdma_probe+0xa40/0xfc0 [irdma] ? rcu_read_lock_bh_held+0xd0/0xd0 ? irdma_remove+0x140/0x140 [irdma] ? rcu_read_lock_sched_held+0x62/0xe0 ? down_write+0x187/0x3d0 ? auxiliary_match_id+0xf0/0x1a0 ? irdma_remove+0x140/0x140 [irdma] auxiliary_bus_probe+0xa6/0x100 __driver_probe_device+0x4a4/0xd50 ? __device_attach_driver+0x2c0/0x2c0 driver_probe_device+0x4a/0x110 __driver_attach+0x1aa/0x350 bus_for_each_dev+0x11d/0x1b0 ? subsys_dev_iter_init+0xe0/0xe0 bus_add_driver+0x3b1/0x610 driver_register+0x18e/0x410 ? 0xffffffffc0b88000 irdma_init_module+0x50/0xaa [irdma] do_one_initcall+0x103/0x5f0 ? perf_trace_initcall_level+0x420/0x420 ? do_init_module+0x4e/0x700 ? __kasan_kmalloc+0x7d/0xa0 ? kmem_cache_alloc_trace+0x188/0x2b0 ? kasan_unpoison+0x21/0x50 do_init_module+0x1d1/0x700 load_module+0x3867/0x5260 ? layout_and_allocate+0x3990/0x3990 ? rcu_read_lock_held_common+0xe/0xb0 ? rcu_read_lock_sched_held+0x62/0xe0 ? rcu_read_lock_bh_held+0xd0/0xd0 ? __vmalloc_node_range+0x46b/0x890 ? lock_release+0x5c8/0xba0 ? alloc_vm_area+0x120/0x120 ? selinux_kernel_module_from_file+0x2a5/0x300 ? __inode_security_revalidate+0xf0/0xf0 ? __do_sys_init_module+0x1db/0x260 __do_sys_init_module+0x1db/0x260 ? load_module+0x5260/0x5260 ? do_syscall_64+0x22/0x450 do_syscall_64+0xa5/0x450 entry_SYSCALL_64_after_hwframe+0x66/0xdb Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Link: https://lore.kernel.org/r/20230207201938.1329-1-sindhu.devale@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
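The capping rule itself is a simple clamp. The helper below is a hypothetical user-space sketch, not the irdma change: toy_cap_msix and its parameters are invented for illustration, and the CPU count is passed in as a plain number where the driver would use num_online_cpus().

```c
/* Hypothetical helper: clamp a requested MSI-X vector count to
 * online_cpus + 1, the limit named in the commit message. */
static unsigned int toy_cap_msix(unsigned int requested,
                                 unsigned int online_cpus)
{
    unsigned int cap = online_cpus + 1;

    return requested < cap ? requested : cap;
}
```

For example, a request for 64 vectors on a machine with 8 online CPUs would be reduced to 9, while a request for 4 would be left unchanged.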
2023-02-07RDMA/mlx5: Check reg_create() create for errorsDan Carpenter
reg_create() can fail. Check its return value for errors before dereferencing it. Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow") Signed-off-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/Y+ERYy4wN0LsKsm+@kili Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-06Merge branch 'aux-bus-v11' of https://github.com/ajitkhaparde1/linuxJakub Kicinski
Ajit Khaparde says: ==================== bnxt: Add Auxiliary driver support Add auxiliary device driver for Broadcom devices. The bnxt_en driver will register and initialize an aux device if RDMA is enabled in the underlying device. The bnxt_re driver will then probe and initialize the RoCE interfaces with the infiniband stack. We got rid of the bnxt_en_ops which the bnxt_re driver used to communicate with bnxt_en. Similarly We have tried to clean up most of the bnxt_ulp_ops. In most of the cases we used the functions and entry points provided by the auxiliary bus driver framework. And now these are the minimal functions needed to support the functionality. We will try to work on getting rid of the remaining if we find any other viable option in future. * 'aux-bus-v11' of https://github.com/ajitkhaparde1/linux: bnxt_en: Remove runtime interrupt vector allocation RDMA/bnxt_re: Remove the sriov config callback bnxt_en: Remove struct bnxt access from RoCE driver bnxt_en: Use auxiliary bus calls over proprietary calls bnxt_en: Use direct API instead of indirection bnxt_en: Remove usage of ulp_id RDMA/bnxt_re: Use auxiliary driver interface bnxt_en: Add auxiliary driver support ==================== Link: https://lore.kernel.org/r/20230202033809.3989-1-ajit.khaparde@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-06RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish()Nikita Zhandarovich
If get_ep_from_tid() fails to look up a non-NULL value for ep, ep is dereferenced later regardless of whether it is NULL. This patch adds a simple sanity check to fix the issue. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 944661dd97f4 ("RDMA/iw_cxgb4: atomically lookup ep and get a reference") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://lore.kernel.org/r/20230202184850.29882-1-n.zhandarovich@fintech.ru Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-06RDMA/mlx5: Remove impossible check of mkey cache cleanup failureLeon Romanovsky
mlx5_mkey_cache_cleanup() can't fail and can be changed to be void. Link: https://lore.kernel.org/r/1acd9528995d083114e7dec2a2afc59436406583.1675328463.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-06RDMA/mlx5: Fix MR cache debugfs error in IB representors modeLeon Romanovsky
Block MR cache debugfs creation for IB representor flow as MR cache shouldn't be used at all in that mode. As part of this change, add missing debugfs cleanup in error path too. This change fixes the following debugfs errors: bond0: (slave enp8s0f1): Enslaving as a backup interface with an up link mlx5_core 0000:08:00.0: lag map: port 1:1 port 2:1 mlx5_core 0000:08:00.0: shared_fdb:1 mode:queue_affinity mlx5_core 0000:08:00.0: Operation mode is single FDB debugfs: Directory '2' with parent '/' already present! ... debugfs: Directory '22' with parent '/' already present! Fixes: 73d09b2fe833 ("RDMA/mlx5: Introduce mlx5r_cache_rb_key") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/482a78c54acbcfa1742a0e06a452546428900ffa.1675328463.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-06RDMA/mana_ib: Prevent array underflow in mana_ib_create_qp_raw()Dan Carpenter
The "port" comes from the user and if it is zero then the: ndev = mc->ports[port - 1]; assignment does an out of bounds read. I have changed the if statement to fix this and to mirror how it is done in mana_ib_create_qp_rss(). Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Signed-off-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/Y8/3Vn8qx00kE9Kk@kili Acked-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
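The underflow is easy to reproduce in miniature: with a 1-based port number, port - 1 wraps around when an unsigned port is 0. The following is a minimal sketch under assumed names (toy_ctx, TOY_NUM_PORTS and toy_port_lookup are hypothetical and are not the mana_ib structures); it checks both bounds before indexing.

```c
#include <stddef.h>

#define TOY_NUM_PORTS 2 /* hypothetical port count */

struct toy_ctx {
    int ports[TOY_NUM_PORTS];
};

/* Return the entry for a 1-based, user-supplied port number,
 * or NULL when the port is out of range. */
static int *toy_port_lookup(struct toy_ctx *mc, unsigned int port)
{
    /* Reject 0 as well as values above the port count: with port == 0,
     * port - 1 would wrap to UINT_MAX and index far out of bounds. */
    if (port < 1 || port > TOY_NUM_PORTS)
        return NULL;
    return &mc->ports[port - 1];
}
```

Checking only the upper bound is the pattern that lets a zero through, so the lower-bound test is the important half here.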
2023-02-02RDMA/cxgb4: add null-ptr-check after ip_dev_find()Nikita Zhandarovich
ip_dev_find() may return NULL and assign it to pdev which is dereferenced later. Fix this by checking the return value of ip_dev_find() for NULL similar to the way it is done with other instances of said function. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 1cab775c3e75 ("RDMA/cxgb4: Fix LE hash collision bug for passive open connection") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://lore.kernel.org/r/20230201172103.17261-1-n.zhandarovich@fintech.ru Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-01bnxt_en: Remove runtime interrupt vector allocationAjit Khaparde
Modified the bnxt_en code to create and pre-configure RDMA devices with the right MSI-X vector count for the ROCE driver to use. This is to align the ROCE driver to the auxiliary device model which will simply bind the driver without getting into PCI-related handling. All PCI-related logic will now be in the bnxt_en driver. Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01RDMA/bnxt_re: Remove the sriov config callbackAjit Khaparde
Remove the SRIOV config callback which the bnxt_en was calling to reconfigure the chip resources for a PF device when VFs are created. The code is now modified to provision the VF resources based on the total VF count instead of the actual VF count. This allows the SRIOV config callback to be removed from the list of ulp_ops. Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01bnxt_en: Remove struct bnxt access from RoCE driverHongguang Gao
Decouple RoCE driver from directly accessing L2's private bnxt structure. Move the fields needed by RoCE driver into bnxt_en_dev. They'll be passed to RoCE driver by bnxt_rdma_aux_device_add() function. Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01bnxt_en: Use auxiliary bus calls over proprietary callsAjit Khaparde
Wherever possible use the function ops provided by auxiliary bus instead of using proprietary ops. Defined bnxt_re_suspend and bnxt_re_resume calls which can be invoked by the bnxt_en driver instead of the ULP stop/start calls. Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01bnxt_en: Use direct API instead of indirectionAjit Khaparde
For a single ULP user there is no need for complicating function indirection calls. Remove all this complexity in favour of direct function calls exported by the bnxt_en driver. This allows to simplify the code greatly. Also remove unused ulp_async_notifier. Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01bnxt_en: Remove usage of ulp_idAjit Khaparde
Since the driver continues to use the single ULP model, the extra complexity and indirection is unnecessary. Remove the usage of ulp_id from the code. Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01RDMA/bnxt_re: Use auxiliary driver interfaceAjit Khaparde
Use the auxiliary driver interface to load and unload the ROCE driver. The driver does not need to register the interface using the netdev notifier anymore. Removed the bnxt_re_dev_list which is not needed. Currently probe, remove and shutdown ops have been implemented for the auxiliary device. Also remove excessive validation checks for rdev. Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-01-31IB/hfi1: Assign npages earlierDean Luick
Improve code clarity and enable earlier use of tidbuf->npages by moving its assignment to structure creation time. Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/167329104884.1472990.4639750192433251493.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-30RDMA/usnic: use iommu_map_atomic() under spin_lock()Yang Yingliang
usnic_uiom_map_sorted_intervals() is called under spin_lock(), iommu_map() might sleep, use iommu_map_atomic() to avoid potential sleep in atomic context. Fixes: e3cf00d0a87f ("IB/usnic: Add Cisco VIC low-level hardware driver") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20230129093757.637354-1-yangyingliang@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-29RDMA/irdma: Fix potential NULL-ptr-dereferenceNikita Zhandarovich
in_dev_get() can return NULL which will cause a failure once idev is dereferenced in in_dev_for_each_ifa_rtnl(). This patch adds a check for NULL value in idev beforehand. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://lore.kernel.org/r/20230126185230.62464-1-n.zhandarovich@fintech.ru Reviewed-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: drivers/net/ethernet/intel/ice/ice_main.c 418e53401e47 ("ice: move devlink port creation/deletion") 643ef23bd9dd ("ice: Introduce local var for readability") https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/ https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/ drivers/net/ethernet/engleder/tsnep_main.c 3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues") 25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock") https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/ net/netfilter/nf_conntrack_proto_sctp.c 13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"") a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths") f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets") https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/ https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-27RDMA/mlx5: Add work to remove temporary entries from the cacheMichael Guralnik
The non-cache mkeys are stored in the cache only to shorten restarting application time. Don't store them longer than needed. Configure cache entries that store non-cache MRs as temporary entries. If 30 seconds have passed and no user reclaimed the temporarily cached mkeys, an asynchronous work will destroy the mkeys entries. Link: https://lore.kernel.org/r/20230125222807.6921-7-michaelgur@nvidia.com Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-27RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flowMichael Guralnik
Currently, when dereging an MR, if the mkey doesn't belong to a cache entry, it will be destroyed. As a result, the restart of applications with many non-cached mkeys is not efficient since all the mkeys are destroyed and then recreated. This process takes a long time (for 100,000 MRs, it is ~20 seconds for dereg and ~28 seconds for re-reg). To shorten the restart runtime, insert all cacheable mkeys to the cache. If there is no fitting entry to the mkey properties, create a temporary entry that fits it. After a predetermined timeout, the cache entries will shrink to the initial high limit. The mkeys will still be in the cache when consuming them again after an application restart. Therefore, the registration will be much faster (for 100,000 MRs, it is ~4 seconds for dereg and ~5 seconds for re-reg). The temporary cache entries created to store the non-cache mkeys are not exposed through sysfs like the default cache entries. Link: https://lore.kernel.org/r/20230125222807.6921-6-michaelgur@nvidia.com Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-27RDMA/mlx5: Introduce mlx5r_cache_rb_keyMichael Guralnik
Switch from using the mkey order to using the new struct as the key to the RB tree of cache entries. The key is all the mkey properties that UMR operations can't modify. Use this key to define the cache entries and to search for and create cache mkeys. Link: https://lore.kernel.org/r/20230125222807.6921-5-michaelgur@nvidia.com Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
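Keying a search tree on a struct instead of a single integer mostly comes down to defining a total order over the struct's fields. The sketch below illustrates that idea in user-space C. It is not the mlx5 implementation: the toy_cache_key fields are assumed for illustration, and a sorted array with bsearch() stands in for the kernel RB-tree.

```c
#include <stdlib.h>

/* Hypothetical key: properties assumed fixed for the life of a cached
 * entry. The field names are illustrative only. */
struct toy_cache_key {
    unsigned int access_mode;
    unsigned int access_flags;
    unsigned int ndescs;
};

/* Total order over keys: compare field by field, most significant first. */
static int toy_key_cmp(const void *pa, const void *pb)
{
    const struct toy_cache_key *a = pa;
    const struct toy_cache_key *b = pb;

    if (a->access_mode != b->access_mode)
        return a->access_mode < b->access_mode ? -1 : 1;
    if (a->access_flags != b->access_flags)
        return a->access_flags < b->access_flags ? -1 : 1;
    if (a->ndescs != b->ndescs)
        return a->ndescs < b->ndescs ? -1 : 1;
    return 0;
}

/* Exact-match lookup in an array already sorted with toy_key_cmp(). */
static struct toy_cache_key *toy_key_find(struct toy_cache_key *keys,
                                          size_t n,
                                          const struct toy_cache_key *want)
{
    return bsearch(want, keys, n, sizeof(*keys), toy_key_cmp);
}
```

The same comparator could drive insertion and lookup in any ordered container; explicit comparisons are used instead of subtraction so that large unsigned values cannot produce a wrong sign.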
2023-01-27RDMA/mlx5: Change the cache structure to an RB-treeMichael Guralnik
Currently, the cache structure is a static linear array. Therefore, its size is limited to the number of entries in it and is not expandable. The entries are dedicated to mkeys of size 2^x and no access_flags. Mkeys with different properties are not cacheable. In this patch, we change the cache structure to an RB-tree. This will allow extending the cache to support more entries with different mkey properties. Link: https://lore.kernel.org/r/20230125222807.6921-4-michaelgur@nvidia.com Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-27RDMA/mlx5: Remove implicit ODP cache entryAharon Landau
Implicit ODP mkey doesn't have unique properties. It shares the same properties as the order 18 cache entry. There is no need to devote a special entry for that. Link: https://lore.kernel.org/r/20230125222807.6921-3-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-27RDMA/mlx5: Don't keep umrable 'page_shift' in cache entriesAharon Landau
mkc.log_page_size can be changed using UMR. Therefore, don't treat it as a cache entry property. Removing it from struct mlx5_cache_ent. All cache mkeys will be created with default PAGE_SHIFT, and updated with the needed page_shift using UMR when passing them to a user. Link: https://lore.kernel.org/r/20230125222807.6921-2-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26RDMA/irdma: Split CQ handler into irdma_reg_user_mr_type_cqZhu Yanjun
Split the source code related to CQ handling into a new function. Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20230116193502.66540-5-yanjun.zhu@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-26RDMA/irdma: Split QP handler into irdma_reg_user_mr_type_qpZhu Yanjun
Split the source code related to QP handling into a new function. Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20230116193502.66540-4-yanjun.zhu@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-26RDMA/irdma: Split mr alloc and free into new functionsZhu Yanjun
In the function irdma_reg_user_mr, the MR allocation and free code will be used by other functions. As such, the source code related to MR allocation and free is split into new functions. Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20230116193502.66540-3-yanjun.zhu@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-26RDMA/irdma: Split MEM handler into irdma_reg_user_mr_type_memZhu Yanjun
The source code related to IRDMA_MEMREG_TYPE_MEM is split into a new function, irdma_reg_user_mr_type_mem. Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20230116193502.66540-2-yanjun.zhu@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-25iommu: Add a gfp parameter to iommu_map()Jason Gunthorpe
The internal mechanisms support this, but instead of exposing the gfp to the caller it wraps it into iommu_map() and iommu_map_atomic(). Fix this instead of adding more variants for GFP_KERNEL_ACCOUNT. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/1-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-23net/sock: Introduce trace_sk_data_ready()Peilin Ye
As suggested by Cong, introduce a tracepoint for all ->sk_data_ready() callback implementations. For example: <...> iperf-609 [002] ..... 70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable iperf-609 [002] ..... 70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable <...> Suggested-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-22IB/hfi1: Restore allocated resources on failed copyoutDean Luick
Fix a resource leak if an error occurs. Fixes: f404ca4c7ea8 ("IB/hfi1: Refactor hfi_user_exp_rcv_setup() IOCTL") Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/167354736291.2132367.10894218740150168180.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-15RDMA/erdma: Replace zero-length arrays with flexible-array membersGustavo A. R. Silva
Zero-length arrays are deprecated[1] and we are moving towards adopting C99 flexible-array members instead. So, replace zero-length arrays, in a couple of structures, with flex-array members. This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy() and help us make progress towards globally enabling -fstrict-flex-arrays=3 [2]. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays [1] Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [2] Link: https://github.com/KSPP/linux/issues/78 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/Y7zCBqwC1LtabRJ9@work Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
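For readers unfamiliar with the distinction, the replacement amounts to declaring the trailing array as data[] rather than data[0]. The sketch below is a generic user-space illustration, not taken from the erdma structures: toy_msg and toy_msg_new are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

struct toy_msg {
    unsigned int len;
    unsigned char data[]; /* C99 flexible-array member (was: data[0]) */
};

/* Allocate the header plus len trailing bytes in a single block and
 * copy the payload in. The caller releases it with free(). */
static struct toy_msg *toy_msg_new(const void *src, unsigned int len)
{
    struct toy_msg *m = malloc(sizeof(*m) + len);

    if (!m)
        return NULL;
    m->len = len;
    memcpy(m->data, src, len);
    return m;
}
```

The allocation pattern is the same as with a zero-length array; the difference is that the compiler knows the member is a flexible array, which is what lets bounds-checking features reason about it.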
2023-01-15RDMA/mlx5: Print error syndrome in case of fatal QP errorsPatrisious Haddad
Print syndromes in case of fatal QP events. This is helpful for upper level debugging, as there may be no CQEs. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/edc794f622a33e4ee12d7f5d218d1a59aa7c6af5.1672821186.git.leonro@nvidia.com Reviewed-by: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-15RDMA/mlx: Calling qp event handler in workqueue contextMark Zhang
Move the call of qp event handler from atomic to workqueue context, so that the handler is able to block. This is needed by following patches. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/0cd17b8331e445f03942f4bb28d447f24ac5669d.1672821186.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-15Merge branch 'mlx5-next' into HEADLeon Romanovsky
Bring HW bits for mlx5 QP events series. Link: https://lore.kernel.org/all/cover.1672821186.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>