path: root/drivers/infiniband
Age | Commit message | Author
2022-11-15 | RDMA/siw: Set defined status for work completion with undefined status | Bernard Metzler
A malicious user may write undefined values into memory mapped completion queue elements status or opcode. Undefined status or opcode values will result in out-of-bounds access to an array mapping siw internal representation of opcode and status to RDMA core representation when reaping CQ elements. While siw detects those undefined values, it did not correctly set completion status to a defined value, thus defeating the whole purpose of the check. This bug leads to the following Smatch static checker warning: drivers/infiniband/sw/siw/siw_cq.c:96 siw_reap_cqe() error: buffer overflow 'map_cqe_status' 10 <= 21 Fixes: bdf1da5df9da ("RDMA/siw: Fix immediate work request flush to completion queue") Link: https://lore.kernel.org/r/20221115170747.1263298-1-bmt@zurich.ibm.com Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
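The check described in this commit message can be illustrated with a small user-space sketch. This is not the siw code: the enum values, the table contents and the name `map_status` are hypothetical and chosen only for illustration. The point is that a status value read from user-writable memory is replaced by a defined value before it is used as an array index.

```c
/* Hypothetical internal status codes; NUM_STATUS marks the end of the defined range. */
enum internal_status {
    ST_SUCCESS,
    ST_LOCAL_ERR,
    ST_REMOTE_ERR,
    ST_GENERAL_ERR,
    NUM_STATUS
};

/* Hypothetical "core" status codes that the internal ones map to. */
enum core_status {
    CORE_OK = 0,
    CORE_LOC_ERR = 10,
    CORE_REM_ERR = 20,
    CORE_GENERAL_ERR = 30
};

static const enum core_status status_map[NUM_STATUS] = {
    [ST_SUCCESS]     = CORE_OK,
    [ST_LOCAL_ERR]   = CORE_LOC_ERR,
    [ST_REMOTE_ERR]  = CORE_REM_ERR,
    [ST_GENERAL_ERR] = CORE_GENERAL_ERR,
};

/*
 * Map a status value that may have been overwritten by user space.
 * An undefined value is first replaced by a defined error status, so
 * the table lookup below can never read past the end of status_map.
 */
static enum core_status map_status(unsigned int raw_status)
{
    if (raw_status >= NUM_STATUS)
        raw_status = ST_GENERAL_ERR;

    return status_map[raw_status];
}
```

Detecting the bad value without overwriting it would leave the later lookup just as unsafe, which is the situation the commit message describes.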
2022-11-15 | RDMA/nldev: Return "-EAGAIN" if the cm_id isn't from expected port | Mark Zhang
When filling a cm_id entry, return "-EAGAIN" instead of 0 if the cm_id doesn't have the same port as requested; otherwise an incomplete entry may be returned, which causes "rdma res show cm_id" to return an error. For example, on a machine with two rdma devices and "rping -C 1 -v -s" running in the background, the "rdma" command fails:

  $ rdma -V
  rdma utility, iproute2-5.19.0
  $ rdma res show cm_id
  link mlx5_0/- cm-idn 0 state LISTEN ps TCP pid 28056 comm rping src-addr 0.0.0.0:7174
  error: Protocol not available

While with this fix it succeeds:

  $ rdma res show cm_id
  link mlx5_0/- cm-idn 0 state LISTEN ps TCP pid 26395 comm rping src-addr 0.0.0.0:7174
  link mlx5_1/- cm-idn 0 state LISTEN ps TCP pid 26395 comm rping src-addr 0.0.0.0:7174

Fixes: 00313983cda6 ("RDMA/nldev: provide detailed CM_ID information")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/a08e898cdac5e28428eb749a99d9d981571b8ea7.1667810736.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
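The difference between returning 0 and returning -EAGAIN can be sketched as follows. This is a simplified user-space model and not the nldev code: `struct cm_entry`, `fill_entry` and `dump_entries` are hypothetical names, and the assumption is that the dump loop treats -EAGAIN as "skip this entry" while any other error aborts the dump.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a tracked cm_id. */
struct cm_entry {
    int port;
    int id;
};

/*
 * Fill one output slot. Returns 0 when the slot was written and
 * -EAGAIN when the entry belongs to another port and must be skipped.
 * Returning 0 in the mismatch case would report success for a slot
 * that was never filled.
 */
static int fill_entry(const struct cm_entry *e, int wanted_port, int *out)
{
    if (e->port != wanted_port)
        return -EAGAIN;

    *out = e->id;
    return 0;
}

/* Dump loop: -EAGAIN entries are skipped, any other error aborts. */
static int dump_entries(const struct cm_entry *tbl, size_t n,
                        int wanted_port, int *out, size_t *count)
{
    *count = 0;
    for (size_t i = 0; i < n; i++) {
        int ret = fill_entry(&tbl[i], wanted_port, &out[*count]);

        if (ret == -EAGAIN)
            continue;
        if (ret)
            return ret;
        (*count)++;
    }
    return 0;
}
```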
2022-11-15 | RDMA/core: Make sure "ib_port" is valid when access sysfs node | Mark Zhang
The "ib_port" structure must be set before adding the sysfs kobject, and reset after removing it, otherwise it may crash when accessing the sysfs node: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050 Mem abort info: ESR = 0x96000006 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000e85f5ba5 [0000000000000050] pgd=0000000848fd9003, pud=000000085b387003, pmd=0000000000000000 Internal error: Oops: 96000006 [#2] PREEMPT SMP Modules linked in: ib_umad(O) mlx5_ib(O) nfnetlink_cttimeout(E) nfnetlink(E) act_gact(E) cls_flower(E) sch_ingress(E) openvswitch(E) nsh(E) nf_nat_ipv6(E) nf_nat_ipv4(E) nf_conncount(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) mst_pciconf(O) ipmi_devintf(E) ipmi_msghandler(E) ipmb_dev_int(OE) mlx5_core(O) mlxfw(O) mlxdevm(O) auxiliary(O) ib_uverbs(O) ib_core(O) mlx_compat(O) psample(E) sbsa_gwdt(E) uio_pdrv_genirq(E) uio(E) mlxbf_pmc(OE) mlxbf_gige(OE) mlxbf_tmfifo(OE) gpio_mlxbf2(OE) pwr_mlxbf(OE) mlx_trio(OE) i2c_mlxbf(OE) mlx_bootctl(OE) bluefield_edac(OE) knem(O) ip_tables(E) ipv6(E) crc_ccitt(E) [last unloaded: mst_pci] Process grep (pid: 3372, stack limit = 0x0000000022055c92) CPU: 5 PID: 3372 Comm: grep Tainted: G D OE 4.19.161-mlnx.47.gadcd9e3 #1 Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.9.2-15-ga2403ab Sep 8 2022 pstate: 40000005 (nZcv daif -PAN -UAO) pc : hw_stat_port_show+0x4c/0x80 [ib_core] lr : port_attr_show+0x40/0x58 [ib_core] sp : ffff000029f43b50 x29: ffff000029f43b50 x28: 0000000019375000 x27: ffff8007b821a540 x26: ffff000029f43e30 x25: 0000000000008000 x24: ffff000000eaa958 x23: 0000000000001000 x22: ffff8007a4ce3000 x21: ffff8007baff8000 x20: ffff8007b9066ac0 x19: ffff8007bae97578 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 
0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : ffff8007a4ce4000 x7 : 0000000000000000 x6 : 000000000000003f x5 : ffff000000e6a280 x4 : ffff8007a4ce3000 x3 : 0000000000000000 x2 : aaaaaaaaaaaaaaab x1 : ffff8007b9066a10 x0 : ffff8007baff8000 Call trace: hw_stat_port_show+0x4c/0x80 [ib_core] port_attr_show+0x40/0x58 [ib_core] sysfs_kf_seq_show+0x8c/0x150 kernfs_seq_show+0x44/0x50 seq_read+0x1b4/0x45c kernfs_fop_read+0x148/0x1d8 __vfs_read+0x58/0x180 vfs_read+0x94/0x154 ksys_read+0x68/0xd8 __arm64_sys_read+0x28/0x34 el0_svc_common+0x88/0x18c el0_svc_handler+0x78/0x94 el0_svc+0x8/0xe8 Code: f2955562 aa1603e4 aa1503e0 f9405683 (f9402861) Fixes: d8a5883814b9 ("RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/88867e705c42c1cd2011e45201c25eecdb9fef94.1667810736.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-15 | RDMA/restrack: Release MR restrack when delete | Mark Zhang
The MR restrack also needs to be released when the MR is deleted; otherwise it causes a memory leak, as the task struct won't be released. Fixes: 13ef5539def7 ("RDMA/restrack: Count references to the verbs objects") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/703db18e8d4ef628691fb93980a709be673e62e3.1667810736.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-14 | RDMA/mana: Remove redefinition of basic u64 type | Leon Romanovsky
gdma_obj_handle_t is no more than a redefinition of the basic u64 type. Remove such obfuscation. Link: https://lore.kernel.org/r/3c1e821279e6a165d058655d2343722d6650e776.1668160486.git.leonro@nvidia.com Acked-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-11 | RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter | Long Li
Add a RDMA VF driver for Microsoft Azure Network Adapter (MANA). Co-developed-by: Ajay Sharma <sharmaajay@microsoft.com> Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1667502990-2559-13-git-send-email-longli@linuxonhyperv.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-10 | RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mmap.c | Bob Pearson
Replace calls to pr_xxx() in rxe_mmap.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-17-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 | RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_icrc.c | Bob Pearson
Replace calls to pr_xxx() in rxe_icrc.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-16-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 | RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe.c | Bob Pearson
Replace calls to pr_xxx() in rxe.c with rxe_dbg_xxx(). Calls with a rxe device not yet in scope are left as is. Link: https://lore.kernel.org/r/20221103171013.20659-15-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 | RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_task.c | Bob Pearson
Replace calls to pr_xxx() in rxe_task.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-14-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 | RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_av.c | Bob Pearson
Replace calls to pr_xxx() in rxe_av.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-13-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 | RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_verbs.c | Bob Pearson
Replace calls to pr_xxx() in rxe_verbs.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-12-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 | RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_srq.c | Bob Pearson
Replace calls to pr_xxx() in rxe_srq.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-11-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 | RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_resp.c | Bob Pearson
Replace calls to pr_xxx() in rxe_resp.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-10-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 | RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_req.c | Bob Pearson
Replace calls to pr_xxx() in rxe_req.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 | RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_qp.c | Bob Pearson
Replace calls to pr_xxx() in rxe_qp.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-8-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 | RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_net.c | Bob Pearson
Replace (some) calls to pr_xxx() in rxe_net.c with rxe_dbg_xxx(). Calls with a rxe device not yet in scope are left as is. Link: https://lore.kernel.org/r/20221103171013.20659-7-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 | RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mw.c | Bob Pearson
Replace calls to pr_xxx() in rxe_mw.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 | RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mr.c | Bob Pearson
Replace calls to pr_xxx() in rxe_mr.c by rxe_dbg_mr(). Link: https://lore.kernel.org/r/20221103171013.20659-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 | RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_cq.c | Bob Pearson
Replace calls to pr_xxx() in rxe_cq.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 | RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_comp.c | Bob Pearson
Replace calls to pr_xxx() in rxe_comp.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 | RDMA/rxe: Add ibdev_dbg macros for rxe | Bob Pearson
Add macros borrowed from siw to call dynamic debug macro ibdev_dbg. Link: https://lore.kernel.org/r/20221103171013.20659-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 | IB/mad: Don't call to function that might sleep while in atomic context | Leonid Ravich
Tracepoints are not allowed to sleep, as such the following splat is generated due to call to ib_query_pkey() in atomic context. WARNING: CPU: 0 PID: 1888000 at kernel/trace/ring_buffer.c:2492 rb_commit+0xc1/0x220 CPU: 0 PID: 1888000 Comm: kworker/u9:0 Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.3.1.el8.x86_64 #1 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module_el8.3.0+555+a55c8938 04/01/2014 Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core] RIP: 0010:rb_commit+0xc1/0x220 RSP: 0000:ffffa8ac80f9bca0 EFLAGS: 00010202 RAX: ffff8951c7c01300 RBX: ffff8951c7c14a00 RCX: 0000000000000246 RDX: ffff8951c707c000 RSI: ffff8951c707c57c RDI: ffff8951c7c14a00 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: ffff8951c7c01300 R11: 0000000000000001 R12: 0000000000000246 R13: 0000000000000000 R14: ffffffff964c70c0 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8951fbc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f20e8f39010 CR3: 000000002ca10005 CR4: 0000000000170ef0 Call Trace: ring_buffer_unlock_commit+0x1d/0xa0 trace_buffer_unlock_commit_regs+0x3b/0x1b0 trace_event_buffer_commit+0x67/0x1d0 trace_event_raw_event_ib_mad_recv_done_handler+0x11c/0x160 [ib_core] ib_mad_recv_done+0x48b/0xc10 [ib_core] ? trace_event_raw_event_cq_poll+0x6f/0xb0 [ib_core] __ib_process_cq+0x91/0x1c0 [ib_core] ib_cq_poll_work+0x26/0x80 [ib_core] process_one_work+0x1a7/0x360 ? create_worker+0x1a0/0x1a0 worker_thread+0x30/0x390 ? create_worker+0x1a0/0x1a0 kthread+0x116/0x130 ? kthread_flush_work_fn+0x10/0x10 ret_from_fork+0x35/0x40 ---[ end trace 78ba8509d3830a16 ]--- Fixes: 821bf1de45a1 ("IB/MAD: Add recv path trace point") Signed-off-by: Leonid Ravich <lravich@gmail.com> Link: https://lore.kernel.org/r/Y2t5feomyznrVj7V@leonid-Inspiron-3421 Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-09 | RDMA/rxe: Implement packet length validation on responder | Daisuke Matsuda
The function check_length() is supposed to check the length of inbound packets on responder, but it actually has been a stub since the driver was born. Let it check the payload length and the DMA length. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Link: https://lore.kernel.org/r/20221107055338.357184-1-matsuda-daisuke@fujitsu.com Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
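A length check of this kind can be sketched in user-space C. The rules below are an assumption based on the usual InfiniBand segmentation convention (first and middle packets carry exactly one path MTU of payload, the last packet carries between 1 byte and one MTU, and a single message is limited to 2^31 bytes); the commit message does not spell them out, and every name here (`PKT_START`, `PKT_END`, `payload_len_ok`, `dma_len_ok`) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags describing a packet's position within a message. */
#define PKT_START 0x1u
#define PKT_END   0x2u

/* Assumed upper bound on the length of one message: 2^31 bytes. */
#define MAX_MSG_LEN (UINT64_C(1) << 31)

/* Validate the payload size of one inbound packet against the path MTU. */
static bool payload_len_ok(unsigned int mask, size_t payload, size_t mtu)
{
    bool first = (mask & PKT_START) != 0;
    bool last = (mask & PKT_END) != 0;

    if (first && last)          /* "only" packet: anything up to one MTU */
        return payload <= mtu;
    if (!last)                  /* first or middle packet: exactly one MTU */
        return payload == mtu;
    return payload > 0 && payload <= mtu;   /* last packet of several */
}

/* Validate the total DMA length requested by the message. */
static bool dma_len_ok(uint64_t dma_len)
{
    return dma_len <= MAX_MSG_LEN;
}
```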
2022-11-09 | driver core: class: make namespace and get_ownership take const * | Greg Kroah-Hartman
The callbacks in struct class namespace() and get_ownership() do not modify the struct device passed to them, so mark the pointer as constant and fix up all callbacks in the kernel to have the correct function signature. This helps make it more obvious what calls and callbacks do, and do not, modify structures passed to them. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20221001165426.2690912-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-09 | RDMA/siw: Fix immediate work request flush to completion queue | Bernard Metzler
Correctly set send queue element opcode during immediate work request flushing in post sendqueue operation, if the QP is in ERROR state. An undefined opcode value results in out-of-bounds access to an array for mapping the opcode between siw internal and RDMA core representation in work completion generation. It resulted in a KASAN BUG report of type 'global-out-of-bounds' during NFSoRDMA testing. This patch further fixes a potential case of a malicious user who may write undefined values for completion queue elements status or opcode, if the CQ is memory mapped to user land. It avoids the same out-of-bounds access to arrays for status and opcode mapping as described above. Fixes: 303ae1cdfdf7 ("rdma/siw: application interface") Fixes: b0fff7317bb4 ("rdma/siw: completion queue methods") Reported-by: Olga Kornievskaia <kolga@netapp.com> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20221107145057.895747-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-07 | RDMA/erdma: Implement atomic operations support | Cheng Xu
Add atomic operations support in post_send and poll_cq implementation. Also, rename 'laddr' and 'lkey' in struct erdma_sge to 'addr' and 'key', because this structure is used for both local and remote SGEs. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20221107021845.44598-4-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-07 | RDMA/erdma: Report atomic capacity when hardware supports atomic feature | Cheng Xu
Introduce a "capacity flags" field in the "query device" response, in a location where hardware originally put all zeros. Using this field, hardware can report the atomic feature if it supports it. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20221107021845.44598-3-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-07 | RDMA/erdma: Extend access right field of FRMR and REG MR to support atomic | Cheng Xu
To support atomic operations, IB_ACCESS_REMOTE_ATOMIC right should be passed to hardware for permission check. Since "access mode" field in FRMR SQE and RegMr command is never used by hw, we remove the "access mode" field, so that we can then have enough space to extend access fields. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20221107021845.44598-2-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-07 | RDMA/irdma: Report the correct link speed | Shiraz Saleem
The active link speed is currently hard-coded in irdma_query_port, due to which the port rate in ibstatus does not reflect the active link speed. Call ib_get_eth_speed in irdma_query_port to get the active link speed. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Reported-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20221104234957.1135-1-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-06 | RDMA/mlx5: Change debug log level for remote access error syndromes | Arumugam Kolappan
The mlx5 driver dumps the entire CQE buffer by default for a few syndromes. Some syndromes are expected due to the application behavior (e.g. MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR, MLX5_CQE_SYNDROME_REMOTE_OP_ERR and MLX5_CQE_SYNDROME_LOCAL_PROT_ERR). Hence, for these syndromes, the patch converts the log level from KERN_WARNING to KERN_DEBUG. The CQE buffer dump can still be obtained by switching to the KERN_DEBUG level as and when needed. Suggested-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Arumugam Kolappan <aru.kolappan@oracle.com> Link: https://lore.kernel.org/r/1667287664-19377-1-git-send-email-aru.kolappan@oracle.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-28 | RDMA/rxe: cleanup some error handling in rxe_verbs.c | Yunsheng Lin
Instead of 'goto and return', just return directly to simplify the error handling and avoid some unnecessary return value checks. Link: https://lore.kernel.org/r/20221028075053.3990467-1-xuhaoyue1@hisilicon.com Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 | RDMA/rxe: Remove the duplicate assignment of mr->map_shift | Xiao Yang
mr->map_shift is set to ilog2(RXE_BUF_PER_MAP) in both rxe_mr_init() and rxe_mr_alloc() so remove the duplicate one in rxe_mr_init(). Link: https://lore.kernel.org/r/1666855893-145-1-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
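For readers unfamiliar with the idiom, the sketch below shows what a shift derived from ilog2() of a power-of-two table size is typically used for: splitting a flat buffer index into a map number and a slot within that map. It is an illustration under assumptions, not the rxe code: `BUF_PER_MAP` is a hypothetical value, `ilog2_u32` is a user-space stand-in for the kernel's ilog2(), and the source does not state how the driver uses map_shift.

```c
/* Hypothetical number of buffers per map; must be a power of two. */
#define BUF_PER_MAP 256u

/* User-space stand-in for the kernel's ilog2(): floor(log2(v)) for v >= 1. */
static unsigned int ilog2_u32(unsigned int v)
{
    unsigned int shift = 0;

    while (v > 1) {
        v >>= 1;
        shift++;
    }
    return shift;
}

/* Split a flat buffer index into (map number, slot inside that map). */
static void split_index(unsigned int buf_index,
                        unsigned int *map, unsigned int *slot)
{
    unsigned int shift = ilog2_u32(BUF_PER_MAP);

    *map = buf_index >> shift;              /* same as buf_index / BUF_PER_MAP */
    *slot = buf_index & (BUF_PER_MAP - 1);  /* same as buf_index % BUF_PER_MAP */
}
```

Because the shift depends only on a compile-time constant, computing it in one place is enough, which is consistent with removing the duplicate assignment.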
2022-10-28 | RDMA/rxe: Make sure requested access is a subset of {mr,mw}->access | Li Zhijian
We should reject requests with access flags that are not registered by the MR/MW. For example, lookup_mr() should return NULL when requested access is 0x03 and mr->access is 0x01. Link: https://lore.kernel.org/r/20220927055337.22630-2-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
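The subset rule from this commit message can be written as a single bit test. The helper below is a minimal sketch with a hypothetical name (`access_allowed`); it only illustrates the rule and is not the driver's lookup code.

```c
#include <stdbool.h>

/*
 * A request is allowed only if every requested access bit is also set
 * in the registered access mask, i.e. the request is a subset of it.
 */
static bool access_allowed(unsigned int requested, unsigned int registered)
{
    return (requested & ~registered) == 0;
}
```

With the values from the example, `access_allowed(0x03, 0x01)` is false because bit 0x02 was requested but never registered. A test of the form `requested & registered` would wrongly accept that request, since the two masks share bit 0x01.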
2022-10-28 | RDMA/rxe: Rename task->state_lock to task->lock | Bob Pearson
Rename task->state_lock to task->lock. Link: https://lore.kernel.org/r/20221021200118.2163-7-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 | RDMA/rxe: Make rxe_do_task static | Bob Pearson
The subroutine rxe_do_task() is only called in rxe_task.c. This patch makes it static and renames it do_task(). Link: https://lore.kernel.org/r/20221021200118.2163-6-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 | RDMA/rxe: Split rxe_run_task() into two subroutines | Bob Pearson
Split rxe_run_task(task, sched) into rxe_run_task(task) and rxe_sched_task(task). Link: https://lore.kernel.org/r/20221021200118.2163-5-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
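The shape of this refactoring, replacing one entry point steered by a flag with two explicitly named entry points, can be sketched as below. The struct and function bodies are hypothetical placeholders that only count calls; they are not the rxe task code.

```c
/* Hypothetical task that just records how it was invoked. */
struct task {
    int ran_inline;   /* times the work was run in the caller's context */
    int scheduled;    /* times the work was deferred for later execution */
};

/* Before: one function, behaviour selected by a flag at every call site. */
static void run_task_old(struct task *t, int sched)
{
    if (sched)
        t->scheduled++;
    else
        t->ran_inline++;
}

/* After: the intent is visible in the name of the function being called. */
static void run_task(struct task *t)
{
    t->ran_inline++;
}

static void sched_task(struct task *t)
{
    t->scheduled++;
}
```

A call such as `run_task_old(t, 1)` says little at the call site, while `sched_task(t)` states directly that the work is deferred.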
2022-10-28 | RDMA/rxe: Removed unused name from rxe_task struct | Bob Pearson
The name field in struct rxe_task is never used. This patch removes it. Link: https://lore.kernel.org/r/20221021200118.2163-4-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 | RDMA/rxe: Remove init of task locks from rxe_qp.c | Bob Pearson
The calls to spin_lock_init() for the tasklet spinlocks in rxe_qp_init_misc() are redundant since they are initialized in rxe_init_task(). This patch removes them. Link: https://lore.kernel.org/r/20221021200118.2163-3-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 | RDMA/rxe: Remove redundant header files | Bob Pearson
Remove unneeded include files. Link: https://lore.kernel.org/r/20221021200118.2163-2-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 | RDMA/qedr: clean up work queue on failure in qedr_alloc_resources() | Dan Carpenter
Add a check for if create_singlethread_workqueue() fails and also destroy the work queue on failure paths. Fixes: e411e0587e0d ("RDMA/qedr: Add iWARP connection management functions") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/Y1gBkDucQhhWj5YM@kili Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 | RDMA/core: Fix null-ptr-deref in ib_core_cleanup() | Chen Zhongjin
KASAN reported a null-ptr-deref error:

  KASAN: null-ptr-deref in range [0x0000000000000118-0x000000000000011f]
  CPU: 1 PID: 379
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  RIP: 0010:destroy_workqueue+0x2f/0x740
  RSP: 0018:ffff888016137df8 EFLAGS: 00000202
  ...
  Call Trace:
   ib_core_cleanup+0xa/0xa1 [ib_core]
   __do_sys_delete_module.constprop.0+0x34f/0x5b0
   do_syscall_64+0x3a/0x90
   entry_SYSCALL_64_after_hwframe+0x63/0xcd
  RIP: 0033:0x7fa1a0d221b7
  ...

This is because the failure of roce_gid_mgmt_init() is ignored:

  ib_core_init()
    roce_gid_mgmt_init()
      gid_cache_wq = alloc_ordered_workqueue # fail
  ...
  ib_core_cleanup()
    roce_gid_mgmt_cleanup()
      destroy_workqueue(gid_cache_wq) # destroy an unallocated wq

Fix this by catching the failure of roce_gid_mgmt_init() in ib_core_init().

Fixes: 03db3a2d81e6 ("IB/core: Add RoCE GID table management")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Link: https://lore.kernel.org/r/20221025024146.109137-1-chenzhongjin@huawei.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
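The pattern behind this fix, propagating a sub-initializer's error so that the top-level init fails and the matching cleanup is never run on a half-initialized state, can be modelled in user-space C. All names ending in `_stub` and the `simulate_alloc_failure` switch are hypothetical; malloc() merely stands in for the workqueue allocation.

```c
#include <errno.h>
#include <stdlib.h>

static void *gid_cache_wq_stub;      /* stands in for the workqueue pointer */
static int simulate_alloc_failure;   /* test switch, not part of any real API */

/* Sub-initializer: reports allocation failure to its caller. */
static int gid_mgmt_init_stub(void)
{
    gid_cache_wq_stub = simulate_alloc_failure ? NULL : malloc(16);
    return gid_cache_wq_stub ? 0 : -ENOMEM;
}

/* Matching cleanup: only valid after a successful init. */
static void gid_mgmt_cleanup_stub(void)
{
    free(gid_cache_wq_stub);
    gid_cache_wq_stub = NULL;
}

/*
 * Top-level init: the return value of the sub-initializer is checked
 * and propagated. If it were ignored, this function would report
 * success and a later cleanup would operate on a resource that was
 * never allocated.
 */
static int core_init_stub(void)
{
    int ret = gid_mgmt_init_stub();

    if (ret)
        return ret;

    /* further initialization steps would follow here */
    return 0;
}
```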
2022-10-27 | RDMA/core: Fix order of nldev_exit call | Leon Romanovsky
Create symmetrical exit flow by calling to nldev_exit() after call to rdma_nl_unregister(RDMA_NL_LS). Fixes: 6c80b41abe22 ("RDMA/netlink: Add nldev initialization flows") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/64e676774a53a406f4cde265d5a4cfd6b8e97df9.1666683334.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-25 | RDMA/rxe: Fix mr leak in RESPST_ERR_RNR | Li Zhijian
rxe_recheck_mr() will increase mr's ref_cnt, so we should call rxe_put(mr) to drop mr's ref_cnt in RESPST_ERR_RNR to avoid below warning: WARNING: CPU: 0 PID: 4156 at drivers/infiniband/sw/rxe/rxe_pool.c:259 __rxe_cleanup+0x1df/0x240 [rdma_rxe] ... Call Trace: rxe_dereg_mr+0x4c/0x60 [rdma_rxe] ib_dereg_mr_user+0xa8/0x200 [ib_core] ib_mr_pool_destroy+0x77/0xb0 [ib_core] nvme_rdma_destroy_queue_ib+0x89/0x240 [nvme_rdma] nvme_rdma_free_queue+0x40/0x50 [nvme_rdma] nvme_rdma_teardown_io_queues.part.0+0xc3/0x120 [nvme_rdma] nvme_rdma_error_recovery_work+0x4d/0xf0 [nvme_rdma] process_one_work+0x582/0xa40 ? pwq_dec_nr_in_flight+0x100/0x100 ? rwlock_bug.part.0+0x60/0x60 worker_thread+0x2a9/0x700 ? process_one_work+0xa40/0xa40 kthread+0x168/0x1a0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 Link: https://lore.kernel.org/r/20221024052049.20577-1-lizhijian@fujitsu.com Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
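The reference-counting rule behind this fix is that every path leaving a function must drop the reference the function took, including early error exits. The sketch below models that with a plain counter; `struct obj`, `recheck`, `handle_request` and the result enum are hypothetical and stand in for the MR, the recheck helper and the responder state only by analogy.

```c
/* Hypothetical reference-counted object. */
struct obj {
    int refcnt;
};

static void obj_get(struct obj *o) { o->refcnt++; }
static void obj_put(struct obj *o) { o->refcnt--; }

/* Lookup helper that returns the object with an extra reference held. */
static struct obj *recheck(struct obj *o)
{
    obj_get(o);
    return o;
}

enum result { RES_DONE, RES_ERR_RNR };

static enum result handle_request(struct obj *o, int receiver_not_ready)
{
    struct obj *mr = recheck(o);

    if (receiver_not_ready) {
        /* The early-exit path must drop the reference as well. */
        obj_put(mr);
        return RES_ERR_RNR;
    }

    /* ... the object would be used here ... */
    obj_put(mr);
    return RES_DONE;
}
```

Without the `obj_put()` on the error path, each failing request would leave the count one higher, and the object could never be released when its owner finally deregisters it.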
2022-10-25 | RDMA/rxe: Remove unnecessary mr testing | Li Zhijian
Before the test, mr has already been passed to rxe_mr_copy(), where it could be dereferenced, so this check is not needed. The only way for mr to be NULL is to reach line 780 below with 'qp->resp.mr = NULL', which is not possible according to Bob's explanation [1].

  778         if (res->state == rdatm_res_state_new) {
  779                 if (!res->replay) {
  780                         mr = qp->resp.mr;
  781                         qp->resp.mr = NULL;
  782                 } else {

[1] https://lore.kernel.org/lkml/30ff25c4-ce66-eac4-eaa2-64c0db203a19@gmail.com/

Link: https://lore.kernel.org/r/1666582315-2-1-git-send-email-lizhijian@fujitsu.com
CC: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-25 | RDMA/rxe: Handle remote errors in the midst of a Read reply sequence | Daisuke Matsuda
Requesting nodes do not handle a reported error correctly if it is generated in the middle of multi-packet Read responses, and the node tries to resend the request endlessly. Let completer terminate the connection in that case. Link: https://lore.kernel.org/r/20221013014724.3786212-2-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-25 | RDMA/rxe: Make responder handle RDMA Read failures | Daisuke Matsuda
Currently, the responder can reply with packets carrying invalid payloads if it fails to copy messages to the packets. Add error handling in read_reply() to inform the requesting node of the failure. Link: https://lore.kernel.org/r/20221013014724.3786212-1-matsuda-daisuke@fujitsu.com Suggested-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-24 | RDMA/hns: Fix NULL pointer problem in free_mr_init() | Yixing Liu
The lock is taken in a concurrent scenario before it has been initialized, which results in a NULL pointer dereference. The mutex must be initialized with mutex_init() before the lock is used.

  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
  Call trace:
   __mutex_lock.constprop.0+0xd0/0x5c0
   __mutex_lock_slowpath+0x1c/0x2c
   mutex_lock+0x44/0x50
   free_mr_send_cmd_to_hw+0x7c/0x1c0 [hns_roce_hw_v2]
   hns_roce_v2_dereg_mr+0x30/0x40 [hns_roce_hw_v2]
   hns_roce_dereg_mr+0x4c/0x130 [hns_roce_hw_v2]
   ib_dereg_mr_user+0x54/0x124
   uverbs_free_mr+0x24/0x30
   destroy_hw_idr_uobject+0x38/0x74
   uverbs_destroy_uobject+0x48/0x1c4
   uobj_destroy+0x74/0xcc
   ib_uverbs_cmd_verbs+0x368/0xbb0
   ib_uverbs_ioctl+0xec/0x1a4
   __arm64_sys_ioctl+0xb4/0x100
   invoke_syscall+0x50/0x120
   el0_svc_common.constprop.0+0x58/0x190
   do_el0_svc+0x30/0x90
   el0_svc+0x2c/0xb4
   el0t_64_sync_handler+0x1a4/0x1b0
   el0t_64_sync+0x19c/0x1a0

Fixes: 70f92521584f ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT")
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Link: https://lore.kernel.org/r/20221024083814.1089722-3-xuhaoyue1@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-24 | RDMA/hns: Disable local invalidate operation | Yangyang Li
When function reset and local invalidate are mixed, HNS RoCEE may hang. Before introducing the cause of the problem, two hardware internal concepts need to be introduced:

1. Execution queue: The queue of hardware execution instructions. Function reset and local invalidate are queued for execution in this queue.
2. Local queue: A queue that stores local operation instructions. The instructions in the local queue will be sent to the execution queue for execution. The instructions in the local queue will not be removed until the execution is completed.

The reason for the problem is as follows:

1. There is a function reset instruction in the execution queue, which is currently being executed. A necessary condition for the successful execution of function reset is that the hardware pipeline empties the instructions that were not completed before.
2. A local invalidate instruction at the head of the local queue is sent to the execution queue. Now there are two instructions in the execution queue: the first is the function reset instruction, and the second is the local invalidate instruction, which will be executed in sequence.
3. The user has issued many local invalidate operations, causing the local queue to be filled up.
4. The user still has a new local operation command, which is queuing to enter the local queue. But the local queue is full and cannot receive new instructions, so this instruction is temporarily stored in the hardware pipeline.
5. The function reset has been waiting for the instructions ahead of it in the hardware pipeline stage to be drained. The hardware pipeline stage also caches a local invalidate instruction, so the function reset cannot be completed, and the instructions after it cannot be executed.

These factors together cause the execution logic deadlock of the hardware, and the consequence is that RoCEE will not have any response. Considering that the local operation command may potentially cause RoCEE to hang, this feature is no longer supported.
Fixes: e93df0108579 ("RDMA/hns: Support local invalidate for hip08 in kernel space") Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Link: https://lore.kernel.org/r/20221024083814.1089722-2-xuhaoyue1@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-24 | RDMA/rxe: Remove the member 'type' of struct rxe_mr | yangx.jy@fujitsu.com
The member 'type' is included in both struct rxe_mr and struct ib_mr so remove the duplicate one of struct rxe_mr. Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Link: https://lore.kernel.org/r/20221021134513.17730-1-yangx.jy@fujitsu.com Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>