summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/mlx5/main.c
AgeCommit message (Collapse)Author
2023-02-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Quite a small cycle this time, even with the rc8. I suppose everyone went to sleep over xmas. - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw, mana - inline CQE support for hns - Have mlx5 display device error codes - Pinned DMABUF support for irdma - Continued rxe cleanups, particularly converting the MRs to use xarray - Improvements to what can be cached in the mlx5 mkey cache" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (61 commits) IB/mlx5: Extend debug control for CC parameters IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors IB/hfi1: Fix math bugs in hfi1_can_pin_pages() RDMA/irdma: Add support for dmabuf pin memory regions RDMA/mlx5: Use query_special_contexts for mkeys net/mlx5e: Use query_special_contexts for mkeys net/mlx5: Change define name for 0x100 lkey value net/mlx5: Expose bits for querying special mkeys RDMA/rxe: Fix missing memory barriers in rxe_queue.h RDMA/mana_ib: Fix a bug when the PF indicates more entries for registering memory on first packet RDMA/rxe: Remove rxe_alloc() RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by size Subject: RDMA/rxe: Handle zero length rdma iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry() RDMA/mlx5: Use rdma_umem_for_each_dma_block() RDMA/umem: Remove unused 'work' member from struct ib_umem RDMA/irdma: Cap MSIX used to online CPUs + 1 RDMA/mlx5: Check reg_create() create for errors RDMA/restrack: Correct spelling RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish() ...
2023-02-23Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-17Merge mlx5-next into rdma.git for-nextJason Gunthorpe
Synchronize the shared mlx5 branch with net: - From Jiri: fixe a deadlock in mlx5_ib's netdev notifier unregister. - From Mark and Patrisious: add IPsec RoCEv2 support. - From Or: Rely on firmware to get special mkeys * branch mlx5-next: RDMA/mlx5: Use query_special_contexts for mkeys net/mlx5e: Use query_special_contexts for mkeys net/mlx5: Change define name for 0x100 lkey value net/mlx5: Expose bits for querying special mkeys net/mlx5: Configure IPsec steering for egress RoCEv2 traffic net/mlx5: Configure IPsec steering for ingress RoCEv2 traffic net/mlx5: Add IPSec priorities in RDMA namespaces net/mlx5: Implement new destination type TABLE_TYPE net/mlx5: Introduce new destination type TABLE_TYPE RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister net/mlx5e: Propagate an internal event in case uplink netdev changes net/mlx5e: Fix trap event handling Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17RDMA/mlx5: Use query_special_contexts for mkeysOr Har-Toov
Use query_sepcial_contexts to get the correct value of mkeys such as null_mkey, terminate_scatter_list_mkey and dump_fill_mkey, as FW will change them in certain configurations. Link: https://lore.kernel.org/r/000236f0a9487d48809f87bcc3620a3964b2d3d3.1673960981.git.leon@kernel.org Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-09mm: replace vma->vm_flags direct modifications with modifier callsSuren Baghdasaryan
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-08RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregisterJiri Pirko
When removing a network namespace with mlx5 devlink instance being in it, following callchain is performed: cleanup_net (takes down_read(&pernet_ops_rwsem) devlink_pernet_pre_exit() devlink_reload() mlx5_devlink_reload_down() mlx5_unload_one_devl_locked() mlx5_detach_device() del_adev() mlx5r_remove() __mlx5_ib_remove() mlx5_ib_roce_cleanup() mlx5_remove_netdev_notifier() unregister_netdevice_notifier (takes down_write(&pernet_ops_rwsem) This deadlocks. Resolve this by converting to register_netdevice_notifier_dev_net() which does not take pernet_ops_rwsem and moves the notifier block around according to netdev it takes as arg. Use previously introduced netdev added/removed events to track uplink netdev to be used for register_netdevice_notifier_dev_net() purposes. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-06RDMA/mlx5: Remove impossible check of mkey cache cleanup failureLeon Romanovsky
mlx5_mkey_cache_cleanup() can't fail and can be changed to be void. Link: https://lore.kernel.org/r/1acd9528995d083114e7dec2a2afc59436406583.1675328463.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-01-15RDMA/mlx: Calling qp event handler in workqueue contextMark Zhang
Move the call of qp event handler from atomic to workqueue context, so that the handler is able to block. This is needed by following patches. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/0cd17b8331e445f03942f4bb28d447f24ac5669d.1672821186.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-06Merge tag 'v6.0' into rdma.git for-nextJason Gunthorpe
Trvial merge conflicts against rdma.git for-rc resolved matching linux-next: drivers/infiniband/hw/hns/hns_roce_hw_v2.c drivers/infiniband/hw/hns/hns_roce_main.c https://lore.kernel.org/r/20220929124005.105149-1-broonie@kernel.org Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-05RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profileMaher Sanalla
When the RDMA auxiliary driver probes, it sets its profile based on devlink driverinit value. The latter might not be in sync with FW yet (In case devlink reload is not performed), thus causing a mismatch between RDMA driver and FW. This results in the following FW syndrome when the RDMA driver tries to adjust RoCE state, which fails the probe: "0xC1F678 | modify_nic_vport_context: roce_en set on a vport that doesn't support roce" To prevent this, select the PF profile based on FW RoCE capability instead of relying on devlink driverinit value. To provide backward compatibility of the RoCE disable feature, on older FW's where roce_rw is not set (FW RoCE capability is read-only), keep the current behavior e.g., rely on devlink driverinit value. Fixes: fbfa97b4d79f ("net/mlx5: Disable roce at HCA level") Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Link: https://lore.kernel.org/r/cb34ce9a1df4a24c135cb804db87f7d2418bd6cc.1661763459.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-23IB/mlx5: Remove duplicate header inclusion related to ODPDaisuke Matsuda
rdma/ib_umem.h and rdma/ib_verbs.h are included by rdma/ib_umem_odp.h. This patch removes the redundant entries. Link: https://lore.kernel.org/r/20220823025131.862811-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-16RDMA/mlx5: Use the proper number of portsMark Bloch
The cited commit allowed the driver to operate over HCAs that have 4 physical ports. Use the number of ports of the RDMA device in the for loop instead of using the struct size. Fixes: 4cd14d44b11d ("net/mlx5: Support devices with more than 2 ports") Link: https://lore.kernel.org/r/a54a56c2ede16044a29d119209b35189c662ac72.1659944855.git.leonro@nvidia.com Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-16RDMA/mlx5: Don't compare mkey tags in DEVX indirect mkeyAharon Landau
According to the ib spec: If the CI supports the Base Memory Management Extensions defined in this specification, the L_Key format must consist of: 24 bit index in the most significant bits of the R_Key, and 8 bit key in the least significant bits of the R_Key Through a successful Allocate L_Key verb invocation, the CI must let the consumer own the key portion of the returned R_Key Therefore, when creating a mkey using DEVX, the consumer is allowed to change the key part. The kernel should compare only the index part of a R_Key to determine equality with another R_Key. Adding capability in order not to break backward compatibility. Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP") Link: https://lore.kernel.org/r/3d669aacea85a3a15c3b3b953b3eaba3f80ef9be.1659255945.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-27RDMA/mlx5: Rename the mkey cache variables and functionsAharon Landau
After replacing the MR cache with an Mkey cache, rename the variables and functions to fit the new meaning. Link: https://lore.kernel.org/r/20220726071911.122765-6-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Small collection of incremental improvement patches: - Minor code cleanup patches, comment improvements, etc from static tools - Clean the some of the kernel caps, reducing the historical stealth uAPI leftovers - Bug fixes and minor changes for rdmavt, hns, rxe, irdma - Remove unimplemented cruft from rxe - Reorganize UMR QP code in mlx5 to avoid going through the IB verbs layer - flush_workqueue(system_unbound_wq) removal - Ensure rxe waits for objects to be unused before allowing the core to free them - Several rc quality bug fixes for hfi1" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (67 commits) RDMA/rtrs-clt: Fix one kernel-doc comment RDMA/hfi1: Remove all traces of diagpkt support RDMA/hfi1: Consolidate software versions RDMA/hfi1: Remove pointless driver version RDMA/hfi1: Fix potential integer multiplication overflow errors RDMA/hfi1: Prevent panic when SDMA is disabled RDMA/hfi1: Prevent use of lock before it is initialized RDMA/rxe: Fix an error handling path in rxe_get_mcg() IB/core: Fix typo in comment RDMA/core: Fix typo in comment IB/hf1: Fix typo in comment IB/qib: Fix typo in comment IB/iser: Fix typo in comment RDMA/mlx4: Avoid flush_scheduled_work() usage IB/isert: Avoid flush_scheduled_work() usage RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr() RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq() RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx() RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx() RDMA/irdma: Add SW mechanism to generate completions on error ...
2022-05-09net/mlx5: Lag, expose number of lag portsMark Bloch
Downstream patches will add support for hardware lag with more than 2 ports. Add a way for users to query the number of lag ports. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-04-25RDMA/mlx5: Move init and cleanup of UMR to umr.cAharon Landau
The first patch in a series to split UMR logic to a dedicated file. As a start, move the init and cleanup of UMR resources to umr.c. Link: https://lore.kernel.org/r/849e632dd1945a2534712a320cc5779f2149ba96.1649747695.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-12Merge branch 'mlx5-next' of ↵Jason Gunthorpe
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Leon Romanovsky says: ==================== Mellanox shared branch that includes: * Removal of FPGA TLS code https://lore.kernel.org/all/cover.1649073691.git.leonro@nvidia.com Mellanox INNOVA TLS cards are EOL in May, 2018 [1]. As such, the code is unmaintained, untested and not in-use by any upstream/distro oriented customers. In order to reduce code complexity, drop the kernel code, clean build config options and delete useless kTLS vs. TLS separation. [1] https://network.nvidia.com/related-docs/eol/LCR-000286.pdf * Removal of FPGA IPsec code https://lore.kernel.org/all/cover.1649232994.git.leonro@nvidia.com Together with FPGA TLS, the IPsec went to EOL state in the November of 2019 [1]. Exactly like FPGA TLS, no active customers exist for this upstream code and all the complexity around that area can be deleted. [2] https://network.nvidia.com/related-docs/eol/LCR-000535.pdf * Fix to undefined behavior from Borislav https://lore.kernel.org/all/20220405151517.29753-11-bp@alien8.de ==================== * 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Remove not-implemented IPsec capabilities net/mlx5: Remove ipsec_ops function table net/mlx5: Reduce kconfig complexity while building crypto support net/mlx5: Move IPsec file to relevant directory net/mlx5: Remove not-needed IPsec config net/mlx5: Align flow steering allocation namespace to common style net/mlx5: Unify device IPsec capabilities check net/mlx5: Remove useless IPsec device checks net/mlx5: Remove ipsec vs. ipsec offload file separation RDMA/core: Delete IPsec flow action logic from the core RDMA/mlx5: Drop crypto flow steering API RDMA/mlx5: Delete never supported IPsec flow action net/mlx5: Remove FPGA ipsec specific statistics net/mlx5: Remove XFRM no_trailer flag net/mlx5: Remove not-used IDA field from IPsec struct net/mlx5: Delete metadata handling logic net/mlx5_fpga: Drop INNOVA IPsec support IB/mlx5: Fix undefined behavior due to shift overflowing the constant net/mlx5: Cleanup kTLS function names and their exposure net/mlx5: Remove tls vs. ktls separation as it is the same net/mlx5: Remove indirection in TLS build net/mlx5: Reliably return TLS device capabilities net/mlx5_fpga: Drop INNOVA TLS support Link: https://lore.kernel.org/r/20220409055303.1223644-1-leon@kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-09RDMA/mlx5: Drop crypto flow steering APILeon Romanovsky
The mlx5 flow steering crypto API was intended to be used in FPGA devices, which is not supported for years already. The removal of mlx5 crypto FPGA code together with inability to configure encryption keys makes the low steering API completely unusable. So delete the code, so any ESP flow steering requests will fail with not supported error, as it is happening now anyway as no device support this type of API. Link: https://lore.kernel.org/r/634a5face7734381463d809bfb89850f6998deac.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09RDMA/mlx5: Delete never supported IPsec flow actionLeon Romanovsky
The IPSEC_REQUIRED_METADATA capability bit is never set, and can be safely removed from the flow action flags. Link: https://lore.kernel.org/r/697cd60bd5c9b6a004c449c1a41c2798fac844ff.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-06RDMA: Split kernel-only global device caps from uverbs device capsJason Gunthorpe
Split out flags from ib_device::device_cap_flags that are only used internally to the kernel into kernel_cap_flags that is not part of the uapi. This limits the device_cap_flags to being the same bitmap that will be copied to userspace. This cleanly splits out the uverbs flags from the kernel flags to avoid confusion in the flags bitmap. Add some short comments describing which each of the kernel flags is connected to. Remove unused kernel flags. Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-03-09net/mlx5: Move debugfs entries to separate structMoshe Shemesh
Move the debugfs entry pointers under priv to their own struct. Add get function for device debugfs root. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22RDMA/irdma: Use auxiliary_device driver data helpersDavid E. Box
Use auxiliary_get_drvdata and auxiliary_set_drvdata helpers. Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com> Signed-off-by: David E. Box <david.e.box@linux.intel.com> Link: https://lore.kernel.org/r/20211221235852.323752-2-david.e.box@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-28IB/mlx5: Enable UAR to have DevX UIDMeir Lichtinger
UID field was added to alloc_uar and dealloc_uar PRM command, to specify DevX UID for UAR. This change enables firmware validating user access to its own UAR resources. For the kernel allocated UARs the UID will stay 0 as of today. Signed-off-by: Meir Lichtinger <meirl@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-09-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "This is quite a small cycle, no major series stands out. The HNS and rxe drivers saw the most activity this cycle, with rxe being broken for a good chunk of time. The significant deleted line count is due to a SPDX cleanup series. Summary: - Various cleanup and small features for rtrs - kmap_local_page() conversions - Driver updates and fixes for: efa, rxe, mlx5, hfi1, qed, hns - Cache the IB subnet prefix - Rework how CRC is calcuated in rxe - Clean reference counting in iwpm's netlink - Pull object allocation and lifecycle for user QPs to the uverbs core code - Several small hns features and continued general code cleanups - Fix the scatterlist confusion of orig_nents/nents introduced in an earlier patch creating the append operation" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (90 commits) RDMA/mlx5: Relax DCS QP creation checks RDMA/hns: Delete unnecessary blank lines. RDMA/hns: Encapsulate the qp db as a function RDMA/hns: Adjust the order in which irq are requested and enabled RDMA/hns: Remove RST2RST error prints for hw v1 RDMA/hns: Remove dqpn filling when modify qp from Init to Init RDMA/hns: Fix QP's resp incomplete assignment RDMA/hns: Fix query destination qpn RDMA/hfi1: Convert to SPDX identifier IB/rdmavt: Convert to SPDX identifier RDMA/hns: Bugfix for incorrect association between dip_idx and dgid RDMA/hns: Bugfix for the missing assignment for dip_idx RDMA/hns: Bugfix for data type of dip_idx RDMA/hns: Fix incorrect lsn field RDMA/irdma: Remove the repeated declaration RDMA/core/sa_query: Retry SA queries RDMA: Use the sg_table directly and remove the opencoded version from umem lib/scatterlist: Fix wrong update of orig_nents lib/scatterlist: Provide a dedicated function to support table append RDMA/hns: Delete unused hns bitmap interface ...
2021-08-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/wwan/mhi_wwan_mbim.c - drop the extra arg. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-22Merge branch 'mlx5-next' of ↵Jason Gunthorpe
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== This pulls mlx5-next branch which includes patches already reviewed on net-next and rdma mailing lists. 1) mlx5 single E-Switch FDB for lag 2) IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq 3) Add DCS caps & fields support We need this in net-next as multiple features are dependent on the single FDB feature. ==================== Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> * mellanox/mlx5-next: net/mlx5: Lag, Create shared FDB when in switchdev mode net/mlx5: E-Switch, add logic to enable shared FDB net/mlx5: Lag, move lag destruction to a workqueue net/mlx5: Lag, properly lock eswitch if needed net/mlx5: Add send to vport rules on paired device net/mlx5: E-Switch, Add event callback for representors net/mlx5e: Use shared mappings for restoring from metadata net/mlx5e: Add an option to create a shared mapping net/mlx5: E-Switch, set flow source for send to uplink rule RDMA/mlx5: Add shared FDB support {net, RDMA}/mlx5: Extend send to vport rules RDMA/mlx5: Fill port info based on the relevant eswitch net/mlx5: Lag, add initial logic for shared FDB net/mlx5: Return mdev from eswitch IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
2021-08-19RDMA/mlx5: Fix crash when unbind multiport slaveMaor Gottlieb
Fix the below crash when deleting a slave from the unaffiliated list twice. First time when the slave is bound to the master and the second when the slave is unloaded. Fix it by checking if slave is unaffiliated (doesn't have ib device) before removing from the list. RIP: 0010:mlx5r_mp_remove+0x4e/0xa0 [mlx5_ib] Call Trace: auxiliary_bus_remove+0x18/0x30 __device_release_driver+0x177/x220 device_release_driver+0x24/0x30 bus_remove_device+0xd8/0x140 device_del+0x18a/0x3e0 mlx5_rescan_drivers_locked+0xa9/0x210 [mlx5_core] mlx5_unregister_device+0x34/0x60 [mlx5_core] mlx5_uninit_one+0x32/0x100 [mlx5_core] remove_one+0x6e/0xe0 [mlx5_core] pci_device_remove+0x36/0xa0 __device_release_driver+0x177/0x220 device_driver_detach+0x3c/0xa0 unbind_store+0x113/0x130 kernfs_fop_write_iter+0x110/0x1a0 new_sync_write+0x116/0x1a0 vfs_write+0x1ba/0x260 ksys_write+0x5f/0xe0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 93f8244431ad ("RDMA/mlx5: Convert mlx5_ib to use auxiliary bus") Link: https://lore.kernel.org/r/17ec98989b0ba88f7adfbad68eb20bce8d567b44.1628587493.git.leonro@nvidia.com Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-05RDMA/mlx5: Add shared FDB supportMark Bloch
Shared FDB allows to create a single RDMA device that holds representors from both eswitches. As shared FDB is only active when both uplink representors are enslaved there is a single RDMA port that represents both uplinks. The number of ports is the number of vports on both eswitches minus one as we only need 1 port for both uplinks. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-03RDMA/mlx5: Drop in-driver verbs object creationsLeon Romanovsky
There is no real value in bypassing IB/core APIs for creating standard objects with standard types. The open-coded variant didn't have any restrack task management calls and caused to such objects to be not present when running rdmatoool. Link: https://lore.kernel.org/r/f745590e5fb7d56f90fdb25f64ee3983ba17e1e4.1627040189.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA: Globally allocate and release QP memoryLeon Romanovsky
Convert QP object to follow IB/core general allocation scheme. That change allows us to make sure that restrack properly kref the memory. Link: https://lore.kernel.org/r/48e767124758aeecc433360ddd85eaa6325b34d9.1627040189.git.leonro@nvidia.com Reviewed-by: Gal Pressman <galpress@amazon.com> #efa Tested-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> #rdma and core Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA/mlx5: Rework custom driver QP type creationLeon Romanovsky
Starting from commit 2b1f747071c5 ("RDMA/core: Allow drivers to disable restrack DB") the restrack is able to handle non-standard QP types either. That change allows us to rewrite custom QP calls to their IB/core counterparts, so we will use general QP creation flow even for the driver QP types. Link: https://lore.kernel.org/r/51682ab82298748941f38bd23ee3bf77ef1cab7b.1627040189.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA/mlx5: Delete device resource mutex that didn't protect anythingLeon Romanovsky
The dev->devr.mutex was intended to protect GSI QP pointer change in the struct mlx5_ib_port_resources when it is accessed from the pkey_change_work. However that pointer isn't changed during the runtime and once IB/core adds MAD, it stays stable. Link: https://lore.kernel.org/r/6e338c561033df20d92e1371fc6a7a0d93aad945.1627040189.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03RDMA/mlx5: Cancel pkey work before destroying device resourcesLeon Romanovsky
In the driver release flow, we are ensuring that notifier is disabled and no new works can be added to pkey_change_handler. It means that we can cancel that handler before destroying resources to make sure that our unwind routine is symmetrical to the allocation one. Link: https://lore.kernel.org/r/f2b1ea1bad952e4e7a48a6f731de9e0344986b29.1627040189.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-20RDMA/mlx5: Add DCS offload supportLior Nahmanson
DCS is an offload to SW load balancing of DC initiator work requests. A single DCI can be connected to only one target at the time and can't start new connection until the previous work request is completed. This limitation will cause to delay when the initiator process needs to transfer data to multiple targets at the same time. The SW solution is to use a process that handling and spreading the work request on many DCIs according to destinations. This feature is an offload to this process and coming to reduce the load from the CPU and improve the performance. Link: https://lore.kernel.org/r/491c2c2afdb5b07de7f03eab3f93cf0704549dbc.1624258894.git.leonro@nvidia.com Reviewed-by: Meir Lichtinger <meirl@nvidia.com> Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-29RDMA/mlx5: Don't access NULL-cleared mpi pointerLeon Romanovsky
The "dev->port[i].mp.mpi" is set to NULL during mlx5_ib_unbind_slave_port() execution, however that field is needed to add device to unaffiliated list. Such flow causes to the following kernel panic while unloading mlx5_ib module in multi-port mode, hence the device should be added to the list prior to unbind call. RPC: Unregistered rdma transport module. RPC: Unregistered rdma backchannel transport module. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP NOPTI CPU: 4 PID: 1904 Comm: modprobe Not tainted 5.13.0-rc7_for_upstream_min_debug_2021_06_24_12_08 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5_ib_cleanup_multiport_master+0x18b/0x2d0 [mlx5_ib] Code: 00 04 0f 85 c4 00 00 00 48 89 df e8 ef fa ff ff 48 8b 83 40 0d 00 00 48 8b 15 b9 e8 05 00 4a 8b 44 28 20 48 89 05 ad e8 05 00 <48> c7 00 d0 57 c5 a0 48 89 50 08 48 89 02 39 ab 88 0a 00 00 0f 86 RSP: 0018:ffff888116ee3df8 EFLAGS: 00010296 RAX: 0000000000000000 RBX: ffff8881154f6000 RCX: 0000000000000080 RDX: ffffffffa0c557d0 RSI: ffff88810b69d200 RDI: 000000000002d8a0 RBP: 0000000000000002 R08: ffff888110780408 R09: 0000000000000000 R10: ffff88812452e1c0 R11: fffffffffff7e028 R12: 0000000000000000 R13: 0000000000000080 R14: ffff888102c58000 R15: 0000000000000000 FS: 00007f884393a740(0000) GS:ffff8882f5a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001249f6004 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: mlx5_ib_stage_init_cleanup+0x16/0xd0 [mlx5_ib] __mlx5_ib_remove+0x33/0x90 [mlx5_ib] mlx5r_remove+0x22/0x30 [mlx5_ib] auxiliary_bus_remove+0x18/0x30 __device_release_driver+0x177/0x220 driver_detach+0xc4/0x100 bus_remove_driver+0x58/0xd0 auxiliary_driver_unregister+0x12/0x20 mlx5_ib_cleanup+0x13/0x897 [mlx5_ib] __x64_sys_delete_module+0x154/0x230 ? exit_to_user_mode_prepare+0x104/0x140 do_syscall_64+0x3f/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f8842e095c7 Code: 73 01 c3 48 8b 0d d9 48 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 48 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc68f6e758 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 00005638207929c0 RCX: 00007f8842e095c7 RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000563820792a28 RBP: 00005638207929c0 R08: 00007ffc68f6d701 R09: 0000000000000000 R10: 00007f8842e82880 R11: 0000000000000206 R12: 0000563820792a28 R13: 0000000000000001 R14: 0000563820792a28 R15: 00007ffc68f6fb40 Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter overlay rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_ipoib ib_cm ib_umad mlx5_ib(-) mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core mlx5_core ptp pps_core [last unloaded: rpcrdma] CR2: 0000000000000000 ---[ end trace a0bb7e20804e9e9b ]--- Fixes: 7ce6095e3bff ("RDMA/mlx5: Don't add slave port to unaffiliated list") Link: https://lore.kernel.org/r/899ac1b33a995be5ec0e16a4765c4e43c2b1ba5b.1624956444.git.leonro@nvidia.com Reviewed-by: Itay Aveksis <itayav@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22Merge branch 'mlx5_realtime_ts' into rdma.git for-nextJason Gunthorpe
Aharon Landau says: ==================== In case device supports only real-time timestamp, the kernel will fail to create QP despite rdma-core requested such timestamp type. It is because device returns free-running timestamp, and the conversion from free-running to real-time is performed in the user space. This series fixes it, by returning real-time timestamp. ==================== * mlx5_realtime_ts: RDMA/mlx5: Support real-time timestamp directly from the device RDMA/mlx5: Refactor get_ts_format functions to simplify code Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22Merge tag 'v5.13-rc7' into rdma.git for-nextJason Gunthorpe
Linux 5.13-rc7 Needed for dependencies in following patches. Merge conflict in rxe_cmop.c resolved by compining both patches. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22RDMA/mlx5: Support real-time timestamp directly from the deviceAharon Landau
Currently, if the user asks for a real-time timestamp, the device will return a free-running one, and the timestamp will be translated to real-time in the user-space. When the device supports only real-time timestamp and not free-running, the creation of the QP will fail even though the user needs supported the real-time one. To prevent this, we will return the real-time timestamp directly from the device. Link: https://lore.kernel.org/r/c6cfc8e6f038575c5c2de6505830f7e74e4de80d.1623829775.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA: Remove rdma_set_device_sysfs_group()Jason Gunthorpe
The driver's device group can be specified as part of the ops structure like the device's port group. No need for the complicated API. Link: https://lore.kernel.org/r/8964785a34fd3a29ff5b6693493f575b717e594d.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA/mlx5: Don't add slave port to unaffiliated listLeon Romanovsky
The mlx5_ib_bind_slave_port() doesn't remove multiport device from the unaffiliated list, but mlx5_ib_unbind_slave_port() did it. This unbalanced flow caused to the situation where mlx5_ib_unaffiliated_port_list was changed during iteration. Fixes: 32f69e4be269 ("{net, IB}/mlx5: Manage port association for multiport RoCE") Link: https://lore.kernel.org/r/2726e6603b1e6ecfe76aa5a12a063af72173bcf7.1622477058.git.leonro@nvidia.com Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-20RDMA/mlx5: Add SQD2RTS bit to the alloc ucontext responseSergey Gorenko
The new bit in the comp_mask is needed to mark that kernel supports SQD2RTS transition for the modify QP command. Link: https://lore.kernel.org/r/7ce705fedac1b2b8e3a2f4013e04244dc5946344.1620641808.git.leonro@nvidia.com Reviewed-by: Evgenii Kochetov <evgeniik@nvidia.com> Signed-off-by: Sergey Gorenko <sergeygo@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-17RDMA/mlx5: Recover from fatal event in dual port modeMaor Gottlieb
When there is fatal event on the slave port, the device is marked as not active. We need to mark it as active again when the slave is recovered to regain full functionality. Fixes: d69a24e03659 ("IB/mlx5: Move IB event processing onto a workqueue") Link: https://lore.kernel.org/r/8906754455bb23019ef223c725d2c0d38acfb80b.1620711734.git.leonro@nvidia.com Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-20IB/mlx5: Set right RoCE l3 type and roce version while deleting GIDParav Pandit
Currently when GID is deleted, it zero out all the fields of the RoCE address in the SET_ROCE_ADDRESS command for a specified index. roce_version = 0 means RoCEv1 in the SET_ROCE_ADDRESS command. This assumes that device has RoCEv1 always enabled which is not always correct. For example Subfunction does not support RoCEv1. Due to this assumption a previously added RoCEv2 GID is always deleted as RoCEv1 GID. This results in a below syndrome: mlx5_core.sf mlx5_core.sf.4: mlx5_cmd_check:777:(pid 4256): SET_ROCE_ADDRESS(0x761) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x12822d) Hence set the right RoCE version during GID deletion provided by the core. Link: https://lore.kernel.org/r/d3f54129c90ca329caf438dbe31875d8ad08d91a.1618753425.git.leonro@nvidia.com Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-13Merge branch 'mlx5_memic_ops' of ↵Jason Gunthorpe
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Maor Gottlieb says: ==================== This series from Maor extends MEMIC to support atomic operations from the host in addition to already supported regular read/write. ==================== * 'memic_ops': RDMA/mlx5: Expose UAPI to query DM RDMA/mlx5: Add support in MEMIC operations RDMA/mlx5: Add support to MODIFY_MEMIC command RDMA/mlx5: Re-organize the DM code RDMA/mlx5: Move all DM logic to separate file RDMA/uverbs: Make UVERBS_OBJECT_METHODS to consider line number net/mlx5: Add MEMIC operations related bits
2021-04-13RDMA/mlx5: Add support in MEMIC operationsMaor Gottlieb
MEMIC buffer, in addition to regular read and write operations, can support atomic operations from the host. Introduce and implement new UAPI to allocate address space for MEMIC operations such as atomic. This includes: 1. Expose new IOCTL for request mapping of MEMIC operation. 2. Hold the operations address in a list, so same operation to same DM will be allocated only once. 3. Manage refcount on the mlx5_ib_dm object, so it would be keep valid until all addresses were unmapped. Link: https://lore.kernel.org/r/20210411122924.60230-7-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-13RDMA/mlx5: Re-organize the DM codeMaor Gottlieb
1. Inline the checks from check_dm_type_support() into their respective allocation functions. 2. Fix use after free when driver fails to copy the MEMIC address to the user by moving the allocation code into their respective functions, hence we avoid the explicit call to free the DM in the error flow. 3. Split mlx5_ib_dm struct to memic and icm proper typesafety throughout. Fixes: dc2316eba73f ("IB/mlx5: Fix device memory flows") Link: https://lore.kernel.org/r/20210411122924.60230-5-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-13RDMA/mlx5: Move all DM logic to separate fileMaor Gottlieb
Move all device memory related code to a separate file. Link: https://lore.kernel.org/r/20210411122924.60230-4-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12Merge branch 'mlx5-next' of ↵Jason Gunthorpe
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== This pr contains changes from mlx5-next branch, already reviewed on netdev and rdma mailing lists, links below. 1) From Leon, Dynamically assign MSI-X vectors count Already Acked by Bjorn Helgaas. https://patchwork.kernel.org/project/netdevbpf/cover/20210314124256.70253-1-leon@kernel.org/ 2) Cleanup series: https://patchwork.kernel.org/project/netdevbpf/cover/20210311070915.321814-1-saeed@kernel.org/ From Mark, E-Switch cleanups and refactoring, and the addition of single FDB mode needed HW bits. From Mikhael, Remove unused struct field From Saeed, Cleanup W=1 prototype warning From Zheng, Esw related cleanup From Tariq, User order-0 page allocation for EQs ==================== * mlx5-next: net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks net/mlx5: Dynamically assign MSI-X vectors count net/mlx5: Add dynamic MSI-X capabilities bits PCI/IOV: Add sysfs MSI-X vector assignment interface net/mlx5: Use order-0 allocations for EQs net/mlx5: Add IFC bits needed for single FDB mode net/mlx5: E-Switch, Refactor send to vport to be more generic RDMA/mlx5: Use representor E-Switch when getting netdev and metadata net/mlx5: E-Switch, Add eswitch pointer to each representor net/mlx5: E-Switch, Add match on vhca id to default send rules net/mlx5: Remove unused mlx5_core_health member recover_work net/mlx5: simplify the return expression of mlx5_esw_offloads_pair() net/mlx5: Cleanup prototype warning Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-26RDMA/mlx5: Set ODP caps only if device profile support ODPShay Drory
Currently, ODP caps are set during the init stage of mlx5_ib_dev, regardless of whether the device profile supports ODP or not. There is no point in setting ODP caps if the device profile doesn't support ODP. Hence, move setting the ODP caps to the odp_init stage. Link: https://lore.kernel.org/r/20210318135259.681264-1-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>