summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw
AgeCommit message (Collapse)Author
2022-12-07RDMA/hns: Fix error code of CMDChengchang Tang
The error code is fixed to EIO when CMD fails to excute. This patch converts the error status reported by firmware to linux errno. Fixes: a04ff739f2a9 ("RDMA/hns: Add command queue support for hip08 RoCE driver") Link: https://lore.kernel.org/r/20221126102911.2921820-6-xuhaoyue1@hisilicon.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-07RDMA/hns: Fix page size cap from firmwareChengchang Tang
Add verification to make sure the roce page size cap is supported by the system page size. Fixes: ba6bb7e97421 ("RDMA/hns: Add interfaces to get pf capabilities from firmware") Link: https://lore.kernel.org/r/20221126102911.2921820-5-xuhaoyue1@hisilicon.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-07RDMA/hns: Fix PBL page MTR findChengchang Tang
Now, The address of the first two pages in the MR will be searched, which use to speed up the lookup of the pbl table for hardware. An exception will occur when there is only one page in this MR. This patch fix the number of page to search. Fixes: 9b2cf76c9f05 ("RDMA/hns: Optimize PBL buffer allocation process") Link: https://lore.kernel.org/r/20221126102911.2921820-4-xuhaoyue1@hisilicon.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-07RDMA/hns: Fix AH attr queried by query_qpChengchang Tang
The queried AH attr is invalid. This patch fix it. This problem is found by rdma-core test test_mr_rereg_pd ERROR: test_mr_rereg_pd (tests.test_mr.MRTest) Test that cover rereg MR's PD with this flow: ---------------------------------------------------------------------- Traceback (most recent call last): File "./tests/test_mr.py", line 157, in test_mr_rereg_pd self.restate_qps() File "./tests/test_mr.py", line 113, in restate_qps self.server.qp.to_rts(self.server_qp_attr) File "qp.pyx", line 1137, in pyverbs.qp.QP.to_rts File "qp.pyx", line 1123, in pyverbs.qp.QP.to_rtr pyverbs.pyverbs_error.PyverbsRDMAError: Failed to modify QP state to RTR. Errno: 22, Invalid argument Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Link: https://lore.kernel.org/r/20221126102911.2921820-3-xuhaoyue1@hisilicon.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-07RDMA/hns: Fix the gid problem caused by free mrYixing Liu
After the hns roce driver is loaded, if you modify the mac address of the network port, the following error will appear: __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fe22:abb5 error=-28 hns3 0000:7d:00.0 hns_0: attr path_mtu(1) invalid while modify qp The reason for the error is that the gid being occupied will cause the failure to modify the gid. The gid is occupied by the loopback QP used by free mr. When the mac address is modified, the gid will change. If there is a busy QP at this time, the gid will not be released and the modification will fail. The QP of free mr is created using the ib interface. The ib interface will add a reference count to the gid, resulting in this error scenario. Considering that free mr is solving a bug in HIP08, not an actual business, it is not necessary to use ib interfaces. Fixes: 70f92521584f ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Link: https://lore.kernel.org/r/20221126102911.2921820-2-xuhaoyue1@hisilicon.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-07RDMA/hfi1: use sysfs_emit() to instead of scnprintf()ye xingchen
Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Link: https://lore.kernel.org/r/202212071632188074249@zte.com.cn Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-04RDMA/mlx5: no need to kfree NULL pointerLi Zhijian
Goto label 'free' where it will kfree the 'in' is not needed though it's safe to kfree NULL. Return err code directly to simplify the code. 1973 free: 1974 kfree(in); 1975 return err; Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://lore.kernel.org/r/20221203033714.25870-1-lizhijian@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-04RDMA/hfi1: Fix error return code in parse_platform_config()Wang Yufen
In the previous iteration of the while loop, the "ret" may have been assigned a value of 0, so the error return code -EINVAL may have been incorrectly set to 0. To fix set valid return code before calling to goto. Fixes: 97167e813415 ("staging/rdma/hfi1: Tune for unknown channel if configuration file is absent") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/1669953638-11747-1-git-send-email-wangyufen@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-01IB/hfi1: Switch to netif_napi_add()Yang Yang
There is no need to use netif_napi_add_weight() when the weight argument is 64. See "net: drop the weight argument from netif_napi_add". Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Link: https://lore.kernel.org/r/202211301744378304494@zte.com.cn Reviewed-by: xu xin <xu.xin16@zte.com.cn> Reviewed-by: Zhang Yunkai <zhang.yunkai@zte.com.cn> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-30RDMA/hw/qib/qib_user_pages: remove FOLL_FORCE usageDavid Hildenbrand
FOLL_FORCE is really only for ptrace access. As we unpin the pinned pages using unpin_user_pages_dirty_lock(true), the assumption is that all these pages are writable. FOLL_FORCE in this case seems to be a legacy leftover. Let's just remove it. Link: https://lkml.kernel.org/r/20221116102659.70287-19-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30RDMA/usnic: remove FOLL_FORCE usageDavid Hildenbrand
GUP now supports reliable R/O long-term pinning in COW mappings, such that we break COW early. MAP_SHARED VMAs only use the shared zeropage so far in one corner case (DAXFS file with holes), which can be ignored because GUP does not support long-term pinning in fsdax (see check_vma_flags()). Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop using FOLL_FORCE, which is really only for ptrace access. Link: https://lkml.kernel.org/r/20221116102659.70287-12-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Christian Benvenuti <benve@cisco.com> Cc: Nelson Escobar <neescoba@cisco.com> Cc: Leon Romanovsky <leon@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-29net/mlx5: Generalize name of UMR alignment definitionTariq Toukan
Per the device spec, MLX5_UMR_MTT_ALIGNMENT is good not only for UMR MTT entries, but for all other entries as well, like KLMs and KSMs. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-29RDMA/mlx4: Remove NULL check before dev_{put, hold}zhang songyi
The call netdev_{put, hold} of dev_{put, hold} will check NULL, so there is no need to check before using dev_{put, hold}. Fix the following coccicheck warnings: /drivers/infiniband/hw/mlx4/main.c:1311:2-10: WARNING: WARNING NULL check before dev_{put, hold} functions is not needed. /drivers/infiniband/hw/mlx4/main.c:148:2-10: WARNING: WARNING NULL check before dev_{put, hold} functions is not needed. /drivers/infiniband/hw/mlx4/main.c:1959:3-11: WARNING: WARNING NULL check before dev_{put, hold} functions is not needed. /drivers/infiniband/hw/mlx4/main.c:1962:3-10: WARNING: WARNING NULL check before dev_{put, hold} functions is not needed. Signed-off-by: zhang songyi <zhang.songyi@zte.com.cn> Link: https://lore.kernel.org/r/202211291554079687539@zte.com.cn Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-24RDMA/erdma: Notify the latest PI to FW for reflushing when necessaryCheng Xu
Firmware is responsible for flushing WRs in HW, and it's a little difficult for firmware to get the latest PI of QPs, especially for RQs after QP state being changed to ERROR. So we introduce a new CMDQ command, by which driver can notify to latest PI to FW, and then FW can flush all posted WRs. Link: https://lore.kernel.org/r/20221116023107.82835-4-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-24RDMA/erdma: Implement the lifecycle of reflushing work for each QPCheng Xu
Each QP has a work for reflushing purpose. In the work, driver will report the latest pi to hardware. Link: https://lore.kernel.org/r/20221116023107.82835-3-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-24RDMA/erdma: Add a workqueue for WRs reflushingCheng Xu
ERDMA driver use a workqueue for asynchronous reflush command posting. Implement the lifecycle of this workqueue. Link: https://lore.kernel.org/r/20221116023107.82835-2-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-24RDMA/erdma: Fix a typo in annotationCheng Xu
A non-ASCII character was wrongly put in a comment, use the ACSII version. Fixes: bee85e0e31ec ("RDMA/erdma: Add main include file") Link: https://lore.kernel.org/r/20221124114933.77250-1-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-24driver core: make struct class.devnode() take a const *Greg Kroah-Hartman
The devnode() in struct class should not be modifying the device that is passed into it, so mark it as a const * and propagate the function signature changes out into all relevant subsystems that use this callback. Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jens Axboe <axboe@kernel.dk> Cc: Justin Sanders <justin@coraid.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Gaignard <benjamin.gaignard@collabora.com> Cc: Liam Mark <lmark@codeaurora.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Brian Starkey <Brian.Starkey@arm.com> Cc: John Stultz <jstultz@google.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Sean Young <sean@mess.org> Cc: Frank Haverkamp <haver@linux.ibm.com> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Xie Yongji <xieyongji@bytedance.com> Cc: Gautam Dawar <gautam.dawar@xilinx.com> Cc: Dan Carpenter <error27@gmail.com> Cc: Eli Cohen <elic@nvidia.com> Cc: Parav Pandit <parav@nvidia.com> Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Cc: alsa-devel@alsa-project.org Cc: dri-devel@lists.freedesktop.org Cc: kvm@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-block@vger.kernel.org Cc: linux-input@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: linux-usb@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Link: https://lore.kernel.org/r/20221123122523.1332370-2-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-22RDMA/hns: fix memory leak in hns_roce_alloc_mr()Zhengchao Shao
When hns_roce_mr_enable() failed in hns_roce_alloc_mr(), mr_key is not released. Compiled test only. Fixes: 9b2cf76c9f05 ("RDMA/hns: Optimize PBL buffer allocation process") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20221119070834.48502-1-shaozhengchao@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-22RDMA/irdma: Initialize net_type before checking itMustafa Ismail
The av->net_type is not initialized before it is checked in irdma_modify_qp_roce. This leads to an incorrect update to the ARP cache and QP context. RoCEv2 connections might fail as result. Set the net_type using rdma_gid_attr_network_type. Fixes: 80005c43d4c8 ("RDMA/irdma: Use net_type to check network type") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20221122004410.1471-1-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-22RDMA/hfi: Decrease PCI device reference count in error pathXiongfeng Wang
pci_get_device() will increase the reference count for the returned pci_dev, and also decrease the reference count for the input parameter *from* if it is not NULL. If we break out the loop in node_affinity_init() with 'dev' not NULL, we need to call pci_dev_put() to decrease the reference count. Add missing pci_dev_put() in error path. Fixes: c513de490f80 ("IB/hfi1: Invalid NUMA node information can cause a divide by zero") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Link: https://lore.kernel.org/r/20221117131546.113280-1-wangxiongfeng2@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-21RDMA/qib: don't pass bogus GFP_ flags to dma_alloc_coherentChristoph Hellwig
dma_alloc_coherent is an opaque allocator that only uses the GFP_ flags for allocation context control. Don't pass GFP_USER which doesn't make sense for a kernel DMA allocation or __GFP_COMP which makes no sense for an allocation that can't in any way be converted to a page pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-21RDMA/hfi1: don't pass bogus GFP_ flags to dma_alloc_coherentChristoph Hellwig
dma_alloc_coherent is an opaque allocator that only uses the GFP_ flags for allocation context control. Don't pass GFP_USER which doesn't make sense for a kernel DMA allocation or __GFP_COMP which makes no sense for an allocation that can't in any way be converted to a page pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Dean Luick <dean.luick@cornelisnetworks.com> Tested-by: Dean Luick <dean.luick@cornelisnetworks.com>
2022-11-18RDMA/hns: Fix incorrect sge nums calculationLuoyouming
The user usually configures the number of sge through the max_send_sge parameter when creating qp, and configures the maximum size of inline data that can be sent through max_inline_data. Inline uses sge to fill data to send. Expect the following: 1) When the sge space cannot hold inline data, the sge space needs to be expanded to accommodate all inline data 2) When the sge space is enough to accommodate inline data, the upper limit of inline data can be increased so that users can send larger inline data Currently case one is not implemented. When the inline data is larger than the sge space, an error of insufficient sge space occurs. This part of the code needs to be reimplemented according to the expected rules. The calculation method of sge num is modified to take the maximum value of max_send_sge and the sge for max_inline_data to solve this problem. Fixes: 05201e01be93 ("RDMA/hns: Refactor process of setting extended sge") Fixes: 30b707886aeb ("RDMA/hns: Support inline data in extented sge space for RC") Link: https://lore.kernel.org/r/20221108133847.2304539-3-xuhaoyue1@hisilicon.com Signed-off-by: Luoyouming <luoyouming@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18RDMA/hns: Fix ext_sge num error when post sendLuoyouming
In the HNS ROCE driver, The sge is divided into standard sge and extended sge. There are 2 standard sge in RC/XRC, and the UD standard sge is 0. In the scenario of RC SQ inline, if the data does not exceed 32bytes, the standard sge will be used. If it exceeds, only the extended sge will be used to fill the data. Currently, when filling the extended sge, max_gs is directly used as the number of the extended sge, which did not subtract the number of standard sge. There is a logical error. The new algorithm subtracts the number of standard sge from max_gs to get the actual number of extended sge. Fixes: 30b707886aeb ("RDMA/hns: Support inline data in extented sge space for RC") Link: https://lore.kernel.org/r/20221108133847.2304539-2-xuhaoyue1@hisilicon.com Signed-off-by: Luoyouming <luoyouming@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18IB/hfi1: Replace 1-element array with singletonKees Cook
Zero-length arrays are deprecated[1] and are being replaced with flexible array members in support of the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy(), correctly instrument array indexing with UBSAN_BOUNDS, and to globally enable -fstrict-flex-arrays=3. Replace zero-length array with flexible-array member "lvs" in struct opa_port_data_counters_msg and struct opa_port_error_counters64_msg. Additionally, the "port" member of several structs is defined as a single-element, but is only ever accessed at index 0. Replace it with a singleton so that flexible array usage is sane. This results in no differences in binary output. [1] https://github.com/KSPP/linux/issues/78 Link: https://lore.kernel.org/r/20221118215847.never.416-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18treewide: use get_random_u32_inclusive() when possibleJason A. Donenfeld
These cases were done with this Coccinelle: @@ expression H; expression L; @@ - (get_random_u32_below(H) + L) + get_random_u32_inclusive(L, H + L - 1) @@ expression H; expression L; expression E; @@ get_random_u32_inclusive(L, H - + E - - E ) @@ expression H; expression L; expression E; @@ get_random_u32_inclusive(L, H - - E - + E ) @@ expression H; expression L; expression E; expression F; @@ get_random_u32_inclusive(L, H - - E + F - + E ) @@ expression H; expression L; expression E; expression F; @@ get_random_u32_inclusive(L, H - + E + F - - E ) And then subsequently cleaned up by hand, with several automatic cases rejected if it didn't make sense contextually. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18treewide: use get_random_u32_below() instead of deprecated functionJason A. Donenfeld
This is a simple mechanical transformation done by: @@ expression E; @@ - prandom_u32_max + get_random_u32_below (E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-17RDMA/irdma: Do not request 2-level PBLEs for CQ allocMustafa Ismail
When allocating PBLE's for a large CQ, it is possible that a 2-level PBLE is returned which would cause the CQ allocation to fail since 1-level is assumed and checked for. Fix this by requesting a level one PBLE only. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20221115011701.1379-4-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17RDMA/irdma: Fix RQ completion opcodeMustafa Ismail
The opcode written by HW, in the RQ CQE, is the RoCEv2/iWARP protocol opcode from the received packet and not the SW opcode as currently assumed. Fix this by returning the raw operation type and queue type in the CQE to irdma_process_cqe and add 2 helpers set_ib_wc_op_sq set_ib_wc_op_rq to map IRDMA HW op types to IB op types. Note that for iWARP, only Write with Immediate is supported so the opcode can only be IB_WC_RECV_RDMA_WITH_IMM when there is immediate data present. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20221115011701.1379-3-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17RDMA/irdma: Fix inline for multiple SGE'sMustafa Ismail
Currently, inline send and inline write assume a single SGE and only copy data from the first one. Add support for multiple SGE's. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20221115011701.1379-2-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-14RDMA/mana: Remove redefinition of basic u64 typeLeon Romanovsky
gdma_obj_handle_t is no more than redefinition of basic u64 type. Remove such obfuscation. Link: https://lore.kernel.org/r/3c1e821279e6a165d058655d2343722d6650e776.1668160486.git.leonro@nvidia.com Acked-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-11RDMA/mana_ib: Add a driver for Microsoft Azure Network AdapterLong Li
Add a RDMA VF driver for Microsoft Azure Network Adapter (MANA). Co-developed-by: Ajay Sharma <sharmaajay@microsoft.com> Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1667502990-2559-13-git-send-email-longli@linuxonhyperv.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-07RDMA/erdma: Implement atomic operations supportCheng Xu
Add atomic operations support in post_send and poll_cq implementation. Also, rename 'laddr' and 'lkey' in struct erdma_sge to 'addr' and 'key', because this structure is used for both local and remote SGEs. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20221107021845.44598-4-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-07RDMA/erdma: Report atomic capacity when hardware supports atomic featureCheng Xu
Introduce "capacity flags" field at where hardware put all zeros originally in "query device" response. Using this field, hardware can report atomic feature if supports. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20221107021845.44598-3-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-07RDMA/erdma: Extend access right field of FRMR and REG MR to support atomicCheng Xu
To support atomic operations, IB_ACCESS_REMOTE_ATOMIC right should be passed to hardware for permission check. Since "access mode" field in FRMR SQE and RegMr command is never used by hw, we remove the "access mode" field, so that we can then have enough space to extend access fields. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20221107021845.44598-2-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-07RDMA/irdma: Report the correct link speedShiraz Saleem
The active link speed is currently hard-coded in irdma_query_port due to which the port rate in ibstatus does reflect the active link speed. Call ib_get_eth_speed in irdma_query_port to get the active link speed. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Reported-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20221104234957.1135-1-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-06RDMA/mlx5: Change debug log level for remote access error syndromesArumugam Kolappan
The mlx5 driver dumps the entire CQE buffer by default for few syndromes. Some syndromes are expected due to the application behavior [ex: MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR, MLX5_CQE_SYNDROME_REMOTE_OP_ERR and MLX5_CQE_SYNDROME_LOCAL_PROT_ERR]. Hence, for these syndromes, the patch converts the log level from KERN_WARNING to KERN_DEBUG. This enables the application to get the CQE buffer dump by changing to KERN_DEBUG level as and when needed. Suggested-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Arumugam Kolappan <aru.kolappan@oracle.com> Link: https://lore.kernel.org/r/1667287664-19377-1-git-send-email-aru.kolappan@oracle.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-28RDMA/qedr: clean up work queue on failure in qedr_alloc_resources()Dan Carpenter
Add a check for if create_singlethread_workqueue() fails and also destroy the work queue on failure paths. Fixes: e411e0587e0d ("RDMA/qedr: Add iWARP connection management functions") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/Y1gBkDucQhhWj5YM@kili Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-24RDMA/hns: Fix NULL pointer problem in free_mr_init()Yixing Liu
Lock grab occurs in a concurrent scenario, resulting in stepping on a NULL pointer. It should be init mutex_init() first before use the lock. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: __mutex_lock.constprop.0+0xd0/0x5c0 __mutex_lock_slowpath+0x1c/0x2c mutex_lock+0x44/0x50 free_mr_send_cmd_to_hw+0x7c/0x1c0 [hns_roce_hw_v2] hns_roce_v2_dereg_mr+0x30/0x40 [hns_roce_hw_v2] hns_roce_dereg_mr+0x4c/0x130 [hns_roce_hw_v2] ib_dereg_mr_user+0x54/0x124 uverbs_free_mr+0x24/0x30 destroy_hw_idr_uobject+0x38/0x74 uverbs_destroy_uobject+0x48/0x1c4 uobj_destroy+0x74/0xcc ib_uverbs_cmd_verbs+0x368/0xbb0 ib_uverbs_ioctl+0xec/0x1a4 __arm64_sys_ioctl+0xb4/0x100 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0x58/0x190 do_el0_svc+0x30/0x90 el0_svc+0x2c/0xb4 el0t_64_sync_handler+0x1a4/0x1b0 el0t_64_sync+0x19c/0x1a0 Fixes: 70f92521584f ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Link: https://lore.kernel.org/r/20221024083814.1089722-3-xuhaoyue1@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-24RDMA/hns: Disable local invalidate operationYangyang Li
When function reset and local invalidate are mixed, HNS RoCEE may hang. Before introducing the cause of the problem, two hardware internal concepts need to be introduced: 1. Execution queue: The queue of hardware execution instructions, function reset and local invalidate are queued for execution in this queue. 2.Local queue: A queue that stores local operation instructions. The instructions in the local queue will be sent to the execution queue for execution. The instructions in the local queue will not be removed until the execution is completed. The reason for the problem is as follows: 1. There is a function reset instruction in the execution queue, which is currently being executed. A necessary condition for the successful execution of function reset is: the hardware pipeline needs to empty the instructions that were not completed before; 2. A local invalidate instruction at the head of the local queue is sent to the execution queue. Now there are two instructions in the execution queue, the first is the function reset instruction, and the second is the local invalidate instruction, which will be executed in se quence; 3. The user has issued many local invalidate operations, causing the local queue to be filled up. 4. The user still has a new local operation command and is queuing to enter the local queue. But the local queue is full and cannot receive new instructions, this instruction is temporarily stored at the hardware pipeline. 5. The function reset has been waiting for the instruction before the hardware pipeline stage is drained. The hardware pipeline stage also caches a local invalidate instruction, so the function reset cannot be completed, and the instructions after it cannot be executed. These factors together cause the execution logic deadlock of the hardware, and the consequence is that RoCEE will not have any response. Considering that the local operation command may potentially cause RoCEE to hang, this feature is no longer supported. Fixes: e93df0108579 ("RDMA/hns: Support local invalidate for hip08 in kernel space") Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Link: https://lore.kernel.org/r/20221024083814.1089722-2-xuhaoyue1@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-24RDMA/qedr: fix repeated words in commentswangjianli
Delete the redundant word 'the'. Signed-off-by: wangjianli <wangjianli@cdjrlc.com> Link: https://lore.kernel.org/r/20221022055257.42905-1-wangjianli@cdjrlc.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-24RDMA/qib: fix repeated words in commentswangjianli
Delete the redundant word 'the'. Signed-off-by: wangjianli <wangjianli@cdjrlc.com> Link: https://lore.kernel.org/r/20221022055905.49176-1-wangjianli@cdjrlc.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-24RDMA/qib: Remove not-used variable freeze_cntColin Ian King
The variable freeze_cnt being incremented but it is never referenced, it is redundant and can be removed. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20221021173504.27546-1-colin.i.king@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-24RDMA/qib: Remove not-used variable nColin Ian King
The variable n being incremented but it is never referenced, it is redundant and can be removed. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20221021172611.26763-1-colin.i.king@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-24RDMA/efa: Add EFA 0xefa2 PCI IDMichael Margolin
Add support for 0xefa2 devices. Reviewed-by: Firas Jahjah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://lore.kernel.org/r/20221020151949.1768-1-mrgolin@amazon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-19IB/hfi1: Correctly move list in sc_disable()Dean Luick
Commit 13bac861952a ("IB/hfi1: Fix abba locking issue with sc_disable()") incorrectly tries to move a list from one list head to another. The result is a kernel crash. The crash is triggered when a link goes down and there are waiters for a send to complete. The following signature is seen: BUG: kernel NULL pointer dereference, address: 0000000000000030 [...] Call Trace: sc_disable+0x1ba/0x240 [hfi1] pio_freeze+0x3d/0x60 [hfi1] handle_freeze+0x27/0x1b0 [hfi1] process_one_work+0x1b0/0x380 ? process_one_work+0x380/0x380 worker_thread+0x30/0x360 ? process_one_work+0x380/0x380 kthread+0xd7/0x100 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 The fix is to use the correct call to move the list. Fixes: 13bac861952a ("IB/hfi1: Fix abba locking issue with sc_disable()") Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/166610327042.674422.6146908799669288976.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-11treewide: use get_random_u32() when possibleJason A. Donenfeld
The prandom_u32() function has been a deprecated inline wrapper around get_random_u32() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. The same also applies to get_random_int(), which is just a wrapper around get_random_u32(). This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Acked-by: Helge Deller <deller@gmx.de> # for parisc Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11treewide: use prandom_u32_max() when possible, part 1Jason A. Donenfeld
Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Not a big list of changes this cycle, mostly small things. The new MANA rdma driver should come next cycle along with a bunch of work on rxe. Summary: - Small bug fixes in mlx5, efa, rxe, hns, irdma, erdma, siw - rts tracing improvements - Code improvements: strlscpy conversion, unused parameter, spelling mistakes, unused variables, flex arrays - restrack device details report for hns - Simplify struct device initialization in SRP - Eliminate the never-used service_mask support in IB CM - Make rxe not print to the console for some kinds of network packets - Asymetric paths and router support in the CM through netlink messages - DMABUF importer support for mlx5devx umem's" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (84 commits) RDMA/rxe: Remove error/warning messages from packet receiver path RDMA/usnic: fix set-but-not-unused variable 'flags' warning IB/hfi1: Use skb_put_data() instead of skb_put/memcpy pair RDMA/hns: Unified Log Printing Style RDMA/hns: Replacing magic number with macros in apply_func_caps() RDMA/hns: Repacing 'dseg_len' by macros in fill_ext_sge_inl_data() RDMA/hns: Remove redundant 'max_srq_desc_sz' in caps RDMA/hns: Remove redundant 'num_mtt_segs' and 'max_extend_sg' RDMA/hns: Remove redundant 'phy_addr' in hns_roce_hem_list_find_mtt() RDMA/hns: Remove redundant 'use_lowmem' argument from hns_roce_init_hem_table() RDMA/hns: Remove redundant 'bt_level' for hem_list_alloc_item() RDMA/hns: Remove redundant 'attr_mask' in modify_qp_init_to_init() RDMA/hns: Remove unnecessary brackets when getting point RDMA/hns: Remove unnecessary braces for single statement blocks RDMA/hns: Cleanup for a spelling error of Asynchronous IB/rdmavt: Add __init/__exit annotations to module init/exit funcs RDMA/rxe: Remove redundant num_sge fields RDMA/mlx5: Enable ATS support for MRs and umems RDMA/mlx5: Add support for dmabuf to devx umem RDMA/core: Add UVERBS_ATTR_RAW_FD ...