summaryrefslogtreecommitdiff
path: root/drivers/infiniband/core
AgeCommit message (Collapse)Author
2020-12-10RDMA/core: Clean up cq pool mechanismJack Morgenstein
The CQ pool mechanism had two problems: 1. The CQ pool lists were uninitialized in the device registration error flow. As a result, all the list pointers remained NULL. This caused the kernel to crash (in procedure ib_cq_pool_destroy) when that error flow was taken (and unregister called). The stack trace snippet: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0×0000) ? not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI . . . RIP: 0010:ib_cq_pool_destroy+0x1b/0×70 [ib_core] . . . Call Trace: disable_device+0x9f/0×130 [ib_core] __ib_unregister_device+0x35/0×90 [ib_core] ib_register_device+0x529/0×610 [ib_core] __mlx5_ib_add+0x3a/0×70 [mlx5_ib] mlx5_add_device+0x87/0×1c0 [mlx5_core] mlx5_register_interface+0x74/0xc0 [mlx5_core] do_one_initcall+0x4b/0×1f4 do_init_module+0x5a/0×223 load_module+0x1938/0×1d40 2. At device unregister, when cleaning up the cq pool, the cq's in the pool lists were freed, but the cq entries were left in the list. The fix for the first issue is to initialize the cq pool lists when the ib_device structure is allocated for a new device (in procedure _ib_alloc_device). The fix for the second problem is to delete cq entries from the pool lists when cleaning up the cq pool. In addition, procedure ib_cq_pool_destroy() is renamed to the more appropriate name ib_cq_pool_cleanup(). Fixes: 4aa1615268a8 ("RDMA/core: Fix ordering of CQ pool destruction") Link: https://lore.kernel.org/r/20201208073545.9723-2-leon@kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-10RDMA/core: Update kernel documentation for ib_create_named_qp()Lukas Bulwahn
Commit 66f57b871efc ("RDMA/restrack: Support all QP types") extends ib_create_qp() to a named ib_create_named_qp(), which takes the caller's name as argument, but it did not add the new argument description to the function's kerneldoc. make htmldocs warns: ./drivers/infiniband/core/verbs.c:1206: warning: Function parameter or member 'caller' not described in 'ib_create_named_qp' Add a description for this new argument based on the description of the same argument in other related functions. Fixes: 66f57b871efc ("RDMA/restrack: Support all QP types") Link: https://lore.kernel.org/r/20201207173255.13355-1-lukas.bulwahn@gmail.com Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-09RDMA/cm: Fix an attempt to use non-valid pointer when cleaning timewaitLeon Romanovsky
If cm_create_timewait_info() fails, the timewait_info pointer will contain an error value and will be used in cm_remove_remote() later. general protection fault, probably for non-canonical address 0xdffffc0000000024: 0000 [#1] SMP KASAN PTI KASAN: null-ptr-deref in range [0×0000000000000120-0×0000000000000127] CPU: 2 PID: 12446 Comm: syz-executor.3 Not tainted 5.10.0-rc5-5d4c0742a60e #27 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:cm_remove_remote.isra.0+0x24/0×170 drivers/infiniband/core/cm.c:978 Code: 84 00 00 00 00 00 41 54 55 53 48 89 fb 48 8d ab 2d 01 00 00 e8 7d bf 4b fe 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <0f> b6 04 02 48 89 ea 83 e2 07 38 d0 7f 08 84 c0 0f 85 fc 00 00 00 RSP: 0018:ffff888013127918 EFLAGS: 00010006 RAX: dffffc0000000000 RBX: fffffffffffffff4 RCX: ffffc9000a18b000 RDX: 0000000000000024 RSI: ffffffff82edc573 RDI: fffffffffffffff4 RBP: 0000000000000121 R08: 0000000000000001 R09: ffffed1002624f1d R10: 0000000000000003 R11: ffffed1002624f1c R12: ffff888107760c70 R13: ffff888107760c40 R14: fffffffffffffff4 R15: ffff888107760c9c FS: 00007fe1ffcc1700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2ff21000 CR3: 000000010f504001 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: cm_destroy_id+0x189/0×15b0 drivers/infiniband/core/cm.c:1155 cma_connect_ib drivers/infiniband/core/cma.c:4029 [inline] rdma_connect_locked+0x1100/0×17c0 drivers/infiniband/core/cma.c:4107 rdma_connect+0x2a/0×40 drivers/infiniband/core/cma.c:4140 ucma_connect+0x277/0×340 drivers/infiniband/core/ucma.c:1069 ucma_write+0x236/0×2f0 drivers/infiniband/core/ucma.c:1724 vfs_write+0x220/0×830 fs/read_write.c:603 ksys_write+0x1df/0×240 fs/read_write.c:658 do_syscall_64+0x33/0×40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation") Link: https://lore.kernel.org/r/20201204064205.145795-1-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Reported-by: Amit Matityahu <mitm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07RDMA/core: Fix empty gid table for non IB/RoCE devicesGal Pressman
The query_gid_table ioctl skips non IB/RoCE ports, which as a result returns an empty gid table for devices such as EFA which have a GID table, but are not IB/RoCE. Fixes: c4b4d548fabc ("RDMA/core: Introduce new GID table query API") Link: https://lore.kernel.org/r/20201206153238.34878-1-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07IB: Fix kernel-doc markupsMauro Carvalho Chehab
Some functions have different names between their prototypes and the kernel-doc markup. Others need to be fixed, as kernel-doc markups should use this format: identifier - description Link: https://lore.kernel.org/r/78b98c41a5a0f4c0106433d305b143028a4168b0.1606823973.git.mchehab+huawei@kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07RDMA/uverbs: Allow drivers to create a new HW object during rereg_mrJason Gunthorpe
mlx5 has an ugly flow where it tries to allocate a new MR and replace the existing MR in the same memory during rereg. This is very complicated and buggy. Instead of trying to replace in-place inside the driver, provide support from uverbs to change the entire HW object assigned to a handle during rereg_mr. Since destroying a MR is allowed to fail (ie if a MW is pointing at it) and can't be detected in advance, the algorithm creates a completely new uobject to hold the new MR and swaps the IDR entries of the two objects. The old MR in the temporary IDR entry is destroyed, and if it fails rereg_mr succeeds and destruction is deferred to FD release. This complexity is why this cannot live in a driver safely. Link: https://lore.kernel.org/r/20201130075839.278575-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07RDMA/uverbs: Check ODP in ib_check_mr_access() as wellJason Gunthorpe
No reason only one caller checks this. This properly blocks ODP from the rereg flow if the device does not support ODP. Link: https://lore.kernel.org/r/20201130075839.278575-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07RDMA/uverbs: Tidy input validation of ib_uverbs_rereg_mr()Jason Gunthorpe
Unknown flags should be EOPNOTSUPP, only zero flags is EINVAL. Flags is actually the rereg action to perform. The checking of the start/hca_va/etc is also redundant and ib_umem_get() does these checks and returns proper error codes. Fixes: 7e6edb9b2e0b ("IB/core: Add user MR re-registration support") Link: https://lore.kernel.org/r/20201130075839.278575-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-27RDMA/restrack: Support all QP typesLeon Romanovsky
The latest changes in restrack name handling allowed to simplify the QP creation code to support all types of QPs. For example XRC QP are presented with rdmatool. $ ibv_xsrq_pingpong & $ rdma res show qp link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core] link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core] link ibp0s9/1 lqpn 7 type UD state RTS sq-psn 0 comm [mlx5_ib] link ibp0s9/1 lqpn 42 type XRC_TGT state INIT sq-psn 0 path-mig-state MIGRATED comm [ib_uverbs] link ibp0s9/1 lqpn 43 type XRC_INI state INIT sq-psn 0 path-mig-state MIGRATED pdn 197 pid 419 comm ibv_xsrq_pingpong Link: https://lore.kernel.org/r/20201117070148.1974114-4-leon@kernel.org Reviewed-by: Mark Zhang <markz@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-27RDMA/core: Allow drivers to disable restrack DBLeon Romanovsky
Driver QP types are special case with no IBTA restrictions. For example, EFA implemented creation of this QP type as regular one, while mlx5 separated create to two step: create and modify. That separation causes to the situation where DC QP (mlx5) is always added to the same xarray index zero. This change allows to drivers like mlx5 simply disable restrack DB tracking, but it doesn't disable kref on the memory. Fixes: 52e0a118a203 ("RDMA/restrack: Track driver QP types in resource tracker") Link: https://lore.kernel.org/r/20201117070148.1974114-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-27RDMA/core: Track device memory MRsLeon Romanovsky
Device memory (DM) are registered as MR during initialization flow, these MRs were not tracked by resource tracker and had res->valid set as a false. Update the code to manage them too. Before this change: $ ibv_rc_pingpong -j & $ rdma res show mr <-- shows nothing After this change: $ ibv_rc_pingpong -j & $ rdma res show mr dev ibp0s9 mrn 0 mrlen 4096 pdn 3 pid 734 comm ibv_rc_pingpong Fixes: be934cca9e98 ("IB/uverbs: Add device memory registration ioctl support") Link: https://lore.kernel.org/r/20201117070148.1974114-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-25RDMA/cma: Fix deadlock on &lock in rdma_cma_listen_on_all() error unwindJason Gunthorpe
rdma_detroy_id() cannot be called under &lock - we must instead keep the error'd ID around until &lock can be released, then destroy it. This is complicated by the usual way listen IDs are destroyed through cma_process_remove() which can run at any time and will asynchronously destroy the same ID. Remove the ID from visiblity of cma_process_remove() before going down the destroy path outside the locking. Fixes: c80a0c52d85c ("RDMA/cma: Add missing error handling of listen_id") Link: https://lore.kernel.org/r/20201118133756.GK244516@ziepe.ca Reported-by: syzbot+1bc48bf7f78253f664a9@syzkaller.appspotmail.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-19Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17RDMA/core: remove use of dma_virt_opsChristoph Hellwig
Use the ib_dma_* helpers to skip the DMA translation instead. This removes the last user if dma_virt_ops and keeps the weird layering violation inside the RDMA core instead of burderning the DMA mapping subsystems with it. This also means the software RDMA drivers now don't have to mess with DMA parameters that are not relevant to them at all, and that in the future we can use PCI P2P transfers even for software RDMA, as there is no first fake layer of DMA mapping that the P2P DMA support. Link: https://lore.kernel.org/r/20201106181941.1878556-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-17Merge branch 'for-rc' into rdma.gitJason Gunthorpe
From https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git The rc RDMA branch is needed due to dependencies on the next patches. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16RDMA/efa: Remove .create_ah callback assignmentGal Pressman
Drivers now expose two callbacks for address handle creation, one for uverbs and one for kverbs. EFA only supports uverbs so the .create_ah assignment can be removed. Fix the core code caller to check the proper function pointer. Link: https://lore.kernel.org/r/20201115103404.48829-3-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16RDMA/cma: Add missing error handling of listen_idLeon Romanovsky
Don't silently continue if rdma_listen() fails but destroy previously created CM_ID and return an error to the caller. Fixes: d02d1f5359e7 ("RDMA/cma: Fix deadlock destroying listen requests") Link: https://lore.kernel.org/r/20201104144008.3808124-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16RDMA/restrack: Store all special QPs in restrack DBLeon Romanovsky
Special QPs (SMI and GSI) have different rules in regards of their QP numbers. While all other QP numbers are unique per-device, the QP0 and QP1 are created per-port as requested by IBTA. In multiple port devices, the number of SMI and GSI QPs with be equal to the number ports. $ rdma dev 0: ibp0s9: node_type ca fw 4.4.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455 $ rdma link 0/1: ibp0s9/1: subnet_prefix fe80:0000:0000:0000 lid 13397 sm_lid 49151 lmc 0 state ACTIVE physical_state LINK_UP 0/2: ibp0s9/2: subnet_prefix fe80:0000:0000:0000 lid 13397 sm_lid 49151 lmc 0 state UNKNOWN physical_state UNKNOWN Before: $ rdma res show qp type SMI,GSI link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core] link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core] After: $ rdma res show qp type SMI,GSI link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core] link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core] link ibp0s9/2 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core] link ibp0s9/2 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core] Link: https://lore.kernel.org/r/20201104144008.3808124-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16RDMA/counter: Combine allocation and bind logicLeon Romanovsky
RDMA counters are allocated and bounded to QP immediately after that. Only after this two step process they are really usable. By combining the logic, we are ensuring that once counter is returned to the caller, it will have everything set. Link: https://lore.kernel.org/r/20201104144008.3808124-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16treewide: rename nla_strlcpy to nla_strscpy.Francis Laniel
Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new name of this function. Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12RDMA/umem: Use ib_dma_max_seg_size instead of dma_get_max_seg_sizeChristoph Hellwig
RDMA ULPs must not call DMA mapping APIs directly but instead use the ib_dma_* wrappers. Fixes: 0c16d9635e3a ("RDMA/umem: Move to allocate SG table from pages") Link: https://lore.kernel.org/r/20201106181941.1878556-3-hch@lst.de Reported-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12RDMA/core: Make FD destroy callback voidLeon Romanovsky
All FD object destroy implementations return 0, so declare this callback void. Link: https://lore.kernel.org/r/20201104144556.3809085-3-leon@kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12RDMA/core: Postpone uobject cleanup on failure till FD closeLeon Romanovsky
Remove the ib_is_destroyable_retryable() concept. The idea here was to allow the drivers to forcibly clean the HW object even if they otherwise didn't want to (eg because of usecnt). This was an attempt to clean up in a world where drivers were not allowed to fail HW object destruction. Now that we are going back to allowing HW objects to fail destroy this doesn't make sense. Instead if a uobject's HW object can't be destroyed it is left on the uobject list and it is up to uverbs_destroy_ufile_hw() to clean it. Multiple passes over the uobject list allow hidden dependencies to be resolved. If that fails the HW driver is broken, throw a WARN_ON and leak the HW object memory. All the other tricky failure paths (eg on creation error unwind) have already been updated to this new model. Link: https://lore.kernel.org/r/20201104144556.3809085-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12RDMA/cm: Make the local_id_table xarray non-irqJason Gunthorpe
The xarray is never mutated from an IRQ handler, only from work queues under a spinlock_irq. Thus there is no reason for it be an IRQ type xarray. This was copied over from the original IDR code, but the recent rework put the xarray inside another spinlock_irq which will unbalance the unlocking. Fixes: c206f8bad15d ("RDMA/cm: Make it clearer how concurrency works in cm_req_handler()") Link: https://lore.kernel.org/r/0-v1-808b6da3bd3f+1857-cm_xarray_no_irq_jgg@nvidia.com Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02IB/core: Add support for NDR link speedMeir Lichtinger
Add new IBTA speed NDR, supporting signaling rate of 100Gb. Link: https://lore.kernel.org/r/20201026133738.1340432-2-leon@kernel.org Signed-off-by: Meir Lichtinger <meirl@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02RDMA/mlx5: Use ib_umem_find_best_pgsz() for mkc'sJason Gunthorpe
Now that all the PAS arrays or UMR XLT's for mkcs are filled using rdma_for_each_block() we can use the common ib_umem_find_best_pgsz() algorithm. Link: https://lore.kernel.org/r/20201026132314.1336717-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-30RDMA: Convert various random sprintf sysfs _show uses to sysfs_emitJoe Perches
Manual changes for sysfs_emit as cocci scripts can't easily convert them. Link: https://lore.kernel.org/r/ecde7791467cddb570c6f6d2c908ffbab9145cac.1602122880.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-30RDMA: Manual changes for sysfs_emit and neateningJoe Perches
Make changes to use sysfs_emit in the RDMA code as cocci scripts can not be written to handle _all_ the possible variants of various sprintf family uses in sysfs show functions. While there, make the code more legible and update its style to be more like the typical kernel styles. Miscellanea: o Use intermediate pointers for dereferences o Add and use string lookup functions o return early when any intermediate call fails so normal return is at the bottom of the function o mlx4/mcg.c:sysfs_show_group: use scnprintf to format intermediate strings Link: https://lore.kernel.org/r/f5c9e4c9d8dafca1b7b70bd597ee7f8f219c31c8.1602122880.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28RDMA/core: Fix error return in _ib_modify_qp()Jing Xiangfeng
Fix to return error code PTR_ERR() from the error handling case instead of 0. Fixes: 51aab12631dd ("RDMA/core: Get xmit slave for LAG") Link: https://lore.kernel.org/r/20201016075845.129562-1-jingxiangfeng@huawei.com Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28RDMA: Add rdma_connect_locked()Jason Gunthorpe
There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED, either the handler triggers a completion and another thread does rdma_connect() or the handler directly calls rdma_connect(). In all cases rdma_connect() needs to hold the handler_mutex, but when handler's are invoked this is already held by the core code. This causes ULPs using the 2nd method to deadlock. Provide a rdma_connect_locked() and have all ULPs call it from their handlers. Link: https://lore.kernel.org/r/0-v2-53c22d5c1405+33-rdma_connect_locking_jgg@nvidia.com Reported-and-tested-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Fixes: 2a7cec538169 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state") Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Convert sysfs device * show functions to use sysfs_emit()Joe Perches
Done with cocci script: @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - sprintf(buf, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - strcpy(buf, chr); + sysfs_emit(buf, chr); ...> } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - sprintf(buf, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... - len += scnprintf(buf + len, PAGE_SIZE - len, + len += sysfs_emit_at(buf, len, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { ... - strcpy(buf, chr); - return strlen(buf); + return sysfs_emit(buf, chr); } Link: https://lore.kernel.org/r/7f406fa8e3aa2552c022bec680f621e38d1fe414.1602122879.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA/uverbs: Fix false error in query gid IOCTLGal Pressman
Some drivers (such as EFA) have a GID table, but aren't IB/RoCE devices. Remove the unnecessary rdma_ib_or_roce() check. This fixes rdma-core failures for EFA when it uses the new ioctl interface for querying the GID table. Fixes: 9f85cbe50aa0 ("RDMA/uverbs: Expose the new GID query API to user space") Link: https://lore.kernel.org/r/20201026082621.32463-1-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Remove AH from uverbs_cmd_maskJason Gunthorpe
Drivers that need a uverbs AH should instead set the create_user_ah() op similar to reg_user_mr(). MODIFY_AH and QUERY_AH cmds were never implemented so are just deleted. Link: https://lore.kernel.org/r/11-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA/core Remove uverbs_ex_cmd_maskJason Gunthorpe
No driver sets it, and the core code sets a maximum mask, simply remove it. Disabled operations are now handled either by having a NULL ops pointer, or by having the common driver callbacks check for unsupported extended attributes. Link: https://lore.kernel.org/r/9-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Check create_flags during create_qpJason Gunthorpe
Each driver should check that the QP attrs create_flags is supported. Unfortuantely when create_flags was added to the QP attrs the drivers were not updated. uverbs_ex_cmd_mask was used to block it - even though kernel drivers use these flags too. Check that flags is zero in all drivers that don't use it, remove IB_USER_VERBS_EX_CMD_CREATE_QP from uverbs_ex_cmd_mask. Fix the error code to be EOPNOTSUPP. Link: https://lore.kernel.org/r/8-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Check flags during create_cqJason Gunthorpe
Each driver should check that the CQ attrs is supported. Unfortuantely when flags was added to the CQ attrs the drivers were not updated, uverbs_ex_cmd_mask was used to block it. This was missed when create CQ was converted to ioctl, so non-zero flags could have been passed into drivers. Check that flags is zero in all drivers that don't use it, remove IB_USER_VERBS_EX_CMD_CREATE_CQ from uverbs_ex_cmd_mask. Fixes: 41b2a71fc848 ("IB/uverbs: Move ioctl path of create_cq and destroy_cq to a new file") Link: https://lore.kernel.org/r/7-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Check attr_mask during modify_qpJason Gunthorpe
Each driver should check that it can support the provided attr_mask during modify_qp. IB_USER_VERBS_EX_CMD_MODIFY_QP was being used to block modify_qp_ex because the driver didn't check RATE_LIMIT. Link: https://lore.kernel.org/r/6-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Check srq_type during create_srqJason Gunthorpe
uverbs was blocking srq_types the driver doesn't support based on the CREATE_XSRQ cmd_mask. Fix all drivers to check for supported srq_types during create_srq and move CREATE_XSRQ to the core code. Link: https://lore.kernel.org/r/5-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Move more uverbs_cmd_mask settings to the coreJason Gunthorpe
These functions all depend on the driver providing a specific op: - REREG_MR is rereg_user_mr(). bnxt_re set this without providing the op - ATTACH/DEATCH_MCAST is attach_mcast()/detach_mcast(). usnic set this without providing the op - OPEN_QP doesn't involve the driver but requires a XRCD. qedr provides xrcd but forgot to set it, usnic doesn't provide XRCD but set it anyhow. - OPEN/CLOSE_XRCD are the ops alloc_xrcd()/dealloc_xrcd() - CREATE_SRQ/DESTROY_SRQ are the ops create_srq()/destroy_srq() - QUERY/MODIFY_SRQ is op query_srq()/modify_srq(). hns sets this but sometimes supplies a NULL op. - RESIZE_CQ is op resize_cq(). bnxt_re sets this boes doesn't supply an op - ALLOC/DEALLOC_MW is alloc_mw()/dealloc_mw(). cxgb4 provided an (now deleted) implementation but no userspace All drivers were checked that no drivers provide the op without also setting uverbs_cmd_mask so this should have no functional change. Link: https://lore.kernel.org/r/4-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Remove elements in uverbs_cmd_mask that all drivers setJason Gunthorpe
This is a step toward eliminating uverbs_cmd_mask. Preset this list in the core code. Only the op reg_user_mr wasn't already being required from the drivers. Link: https://lore.kernel.org/r/3-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Remove uverbs_ex_cmd_mask values that are linked to functionsJason Gunthorpe
Since a while now the uverbs layer checks if the driver implements a function before allowing the ucmd to proceed. This largely obsoletes the cmd_mask stuff, but there is some tricky bits in drivers preventing it from being removed. Remove the easy elements of uverbs_ex_cmd_mask by pre-setting them in the core code. These are triggered soley based on the related ops function pointer. query_device_ex is not triggered based on an op, but all drivers already implement something compatible with the extension, so enable it globally too. Link: https://lore.kernel.org/r/2-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-17Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "A usual cycle for RDMA with a typical mix of driver and core subsystem updates: - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma, hns, usnic, qib, qedr, cxgb4, hns, bnxt_re - Various rtrs fixes and updates - Bug fix for mlx4 CM emulation for virtualization scenarios where MRA wasn't working right - Use tracepoints instead of pr_debug in the CM code - Scrub the locking in ucma and cma to close more syzkaller bugs - Use tasklet_setup in the subsystem - Revert the idea that 'destroy' operations are not allowed to fail at the driver level. This proved unworkable from a HW perspective. - Revise how the umem API works so drivers make fewer mistakes using it - XRC support for qedr - Convert uverbs objects RWQ and MW to new the allocation scheme - Large queue entry sizes for hns - Use hmm_range_fault() for mlx5 On Demand Paging - uverbs APIs to inspect the GID table instead of sysfs - Move some of the RDMA code for building large page SGLs into lib/scatterlist" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits) RDMA/ucma: Fix use after free in destroy id flow RDMA/rxe: Handle skb_clone() failure in rxe_recv.c RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI RDMA: Explicitly pass in the dma_device to ib_register_device lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values IB/mlx4: Convert rej_tmout radix-tree to XArray RDMA/rxe: Fix bug rejecting all multicast packets RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt() RDMA/rxe: Remove duplicate entries in struct rxe_mr IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS IB/rdmavt: Fix sizeof mismatch MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl. RDMA/bnxt_re: Use rdma_umem_for_each_dma_block() RDMA/umem: Move to allocate SG table from pages lib/scatterlist: Add support in dynamic allocation of SG table from pages tools/testing/scatterlist: Show errors in human readable form tools/testing/scatterlist: Rejuvenate bit-rotten test RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces RDMA/uverbs: Expose the new GID query API to user space ...
2020-10-16mm: remove the now-unnecessary mmget_still_valid() hackJann Horn
The preceding patches have ensured that core dumping properly takes the mmap_lock. Thanks to that, we can now remove mmget_still_valid() and all its users. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16RDMA/ucma: Fix use after free in destroy id flowMaor Gottlieb
ucma_free_ctx() should call to __destroy_id() on all the connection requests that have not been delivered to user space. Currently it calls on the context itself and cause to use after free. Fixes the trace: BUG: Unable to handle kernel data access on write at 0x5deadbeef0000108 Faulting instruction address: 0xc0080000002428f4 Oops: Kernel access of bad area, sig: 11 [#1] Call Trace: [c000000207f2b680] [c00800000024280c] .__destroy_id+0x28c/0x610 [rdma_ucm] (unreliable) [c000000207f2b750] [c0080000002429c4] .__destroy_id+0x444/0x610 [rdma_ucm] [c000000207f2b820] [c008000000242c24] .ucma_close+0x94/0xf0 [rdma_ucm] [c000000207f2b8c0] [c00000000046fbdc] .__fput+0xac/0x330 [c000000207f2b960] [c00000000015d48c] .task_work_run+0xbc/0x110 [c000000207f2b9f0] [c00000000012fb00] .do_exit+0x430/0xc50 [c000000207f2bae0] [c0000000001303ec] .do_group_exit+0x5c/0xd0 [c000000207f2bb70] [c000000000144a34] .get_signal+0x194/0xe30 [c000000207f2bc60] [c00000000001f6b4] .do_notify_resume+0x124/0x470 [c000000207f2bd60] [c000000000032484] .interrupt_exit_user_prepare+0x1b4/0x240 [c000000207f2be20] [c000000000010034] interrupt_return+0x14/0x1c0 Rename listen_ctx to conn_req_ctx as the poor name was the cause of this bug. Fixes: a1d33b70dbbc ("RDMA/ucma: Rework how new connections are passed through event delivery") Link: https://lore.kernel.org/r/20201012045600.418271-4-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-16RDMA: Explicitly pass in the dma_device to ib_register_deviceJason Gunthorpe
The code in setup_dma_device has become rather convoluted, move all of this to the drivers. Drives now pass in a DMA capable struct device which will be used to setup DMA, or drivers must fully configure the ibdev for DMA and pass in NULL. Other than setting the masks in rvt all drivers were doing this already anyhow. mthca, mlx4 and mlx5 were already setting up maximum DMA segment size for DMA based on their hardweare limits in: __mthca_init_one() dma_set_max_seg_size (1G) __mlx4_init_one() dma_set_max_seg_size (1G) mlx5_pci_init() set_dma_caps() dma_set_max_seg_size (2G) Other non software drivers (except usnic) were extended to UINT_MAX [1, 2] instead of 2G as was before. [1] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/ [2] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/ Link: https://lore.kernel.org/r/20201008082752.275846-1-leon@kernel.org Link: https://lore.kernel.org/r/6b2ed339933d066622d5715903870676d8cc523a.1602590106.git.mchehab+huawei@kernel.org Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-16Merge branch 'dynamic_sg' into rdma.git for-nextJason Gunthorpe
From Maor Gottlieb says: ==================== This series extends __sg_alloc_table_from_pages to allow chaining of new pages to an already initialized SG table. This allows for drivers to utilize the optimization of merging contiguous pages without a need to pre allocate all the pages and hold them in a very large temporary buffer prior to the call to SG table initialization. The last patch changes the Infiniband core to use the new API. It removes duplicate functionality from the code and benefits from the optimization of allocating dynamic SG table from pages. In huge pages system of 2MB page size, without this change, the SG table would contain x512 SG entries. ==================== * branch 'dynamic_sg': RDMA/umem: Move to allocate SG table from pages lib/scatterlist: Add support in dynamic allocation of SG table from pages tools/testing/scatterlist: Show errors in human readable form tools/testing/scatterlist: Rejuvenate bit-rotten test
2020-10-05RDMA/umem: Move to allocate SG table from pagesMaor Gottlieb
Remove the implementation of ib_umem_add_sg_table and instead call to __sg_alloc_table_from_pages which already has the logic to merge contiguous pages. Besides that it removes duplicated functionality, it reduces the memory consumption of the SG table significantly. Prior to this patch, the SG table was allocated in advance regardless consideration of contiguous pages. In huge pages system of 2MB page size, without this change, the SG table would contain x512 SG entries. E.g. for 100GB memory registration: Number of entries Size Before 26214400 600.0MB After 51200 1.2MB Link: https://lore.kernel.org/r/20201004154340.1080481-5-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Make sure SKB control block is in the proper state during IPSEC ESP-in-TCP encapsulation. From Sabrina Dubroca. 2) Various kinds of attributes were not being cloned properly when we build new xfrm_state objects from existing ones. Fix from Antony Antony. 3) Make sure to keep BTF sections, from Tony Ambardar. 4) TX DMA channels need proper locking in lantiq driver, from Hauke Mehrtens. 5) Honour route MTU during forwarding, always. From Maciej Żenczykowski. 6) Fix races in kTLS which can result in crashes, from Rohit Maheshwari. 7) Skip TCP DSACKs with rediculous sequence ranges, from Priyaranjan Jha. 8) Use correct address family in xfrm state lookups, from Herbert Xu. 9) A bridge FDB flush should not clear out user managed fdb entries with the ext_learn flag set, from Nikolay Aleksandrov. 10) Fix nested locking of netdev address lists, from Taehee Yoo. 11) Fix handling of 32-bit DATA_FIN values in mptcp, from Mat Martineau. 12) Fix r8169 data corruptions on RTL8402 chips, from Heiner Kallweit. 13) Don't free command entries in mlx5 while comp handler could still be running, from Eran Ben Elisha. 14) Error flow of request_irq() in mlx5 is busted, due to an off by one we try to free and IRQ never allocated. From Maor Gottlieb. 15) Fix leak when dumping netlink policies, from Johannes Berg. 16) Sendpage cannot be performed when a page is a slab page, or the page count is < 1. Some subsystems such as nvme were doing so. Create a "sendpage_ok()" helper and use it as needed, from Coly Li. 17) Don't leak request socket when using syncookes with mptcp, from Paolo Abeni. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits) net/core: check length before updating Ethertype in skb_mpls_{push,pop} net: mvneta: fix double free of txq->buf net_sched: check error pointer in tcf_dump_walker() net: team: fix memory leak in __team_options_register net: typhoon: Fix a typo Typoon --> Typhoon net: hinic: fix DEVLINK build errors net: stmmac: Modify configuration method of EEE timers tcp: fix syn cookied MPTCP request socket leak libceph: use sendpage_ok() in ceph_tcp_sendpage() scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map() drbd: code cleanup by using sendpage_ok() to check page for kernel_sendpage() tcp: use sendpage_ok() to detect misused .sendpage nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage() net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send net: introduce helper sendpage_ok() in include/linux/net.h net: usb: pegasus: Proper error handing when setting pegasus' MAC address net: core: document two new elements of struct net_device netlink: fix policy dump leak net/mlx5e: Fix race condition on nhe->n pointer in neigh update net/mlx5e: Fix VLAN create flow ...
2020-10-01RDMA/uverbs: Expose the new GID query API to user spaceAvihai Horon
Expose the query GID table and entry API to user space by adding two new methods and method handlers to the device object. This API provides a faster way to query a GID table using single call and will be used in libibverbs to improve current approach that requires multiple calls to open, close and read multiple sysfs files for a single GID table entry. Link: https://lore.kernel.org/r/20200923165015.2491894-5-leon@kernel.org Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01RDMA/core: Introduce new GID table query APIAvihai Horon
Introduce rdma_query_gid_table which enables querying all the GID tables of a given device and copying the attributes of all valid GID entries to a provided buffer. This API provides a faster way to query a GID table using single call and will be used in libibverbs to improve current approach that requires multiple calls to open, close and read multiple sysfs files for a single GID table entry. Link: https://lore.kernel.org/r/20200923165015.2491894-4-leon@kernel.org Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>