summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/hfi1
AgeCommit message (Collapse)Author
2019-10-17IB/hfi1: Use a common pad buffer for 9B and 16B packetsMike Marciniszyn
There is no reason for a different pad buffer for the two packet types. Expand the current buffer allocation to allow for both packet types. Fixes: f8195f3b14a0 ("IB/hfi1: Eliminate allocation while atomic") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Kaike Wan <kaike.wan@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20191004204934.26838.13099.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-17IB/hfi1: Avoid excessive retry for TID RDMA READ requestKaike Wan
A TID RDMA READ request could be retried under one of the following conditions: - The RC retry timer expires; - A later TID RDMA READ RESP packet is received before the next expected one. For the latter, under normal conditions, the PSN in IB space is used for comparison. More specifically, the IB PSN in the incoming TID RDMA READ RESP packet is compared with the last IB PSN of a given TID RDMA READ request to determine if the request should be retried. This is similar to the retry logic for noraml RDMA READ request. However, if a TID RDMA READ RESP packet is lost due to congestion, header suppresion will be disabled and each incoming packet will raise an interrupt until the hardware flow is reloaded. Under this condition, each packet KDETH PSN will be checked by software against r_next_psn and a retry will be requested if the packet KDETH PSN is later than r_next_psn. Since each TID RDMA READ segment could have up to 64 packets and each TID RDMA READ request could have many segments, we could make far more retries under such conditions, and thus leading to RETRY_EXC_ERR status. This patch fixes the issue by removing the retry when the incoming packet KDETH PSN is later than r_next_psn. Instead, it resorts to RC timer and normal IB PSN comparison for any request retry. Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20191004204035.26542.41684.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-01RDMA/hfi1: Prevent memory leak in sdma_initNavid Emamdoost
In sdma_init if rhashtable_init fails the allocated memory for tmp_sdma_rht should be released. Fixes: 5a52a7acf7e2 ("IB/hfi1: NULL pointer dereference when freeing rhashtable") Link: https://lore.kernel.org/r/20190925144543.10141-1-navid.emamdoost@gmail.com Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-26IB/hfi1: remove unlikely() from IS_ERR*() conditionDenis Efremov
"unlikely(IS_ERR_OR_NULL(x))" is excessive. IS_ERR_OR_NULL() already uses unlikely() internally. Link: http://lkml.kernel.org/r/20190829165025.15750-8-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Joe Perches <joe@perches.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24mm/gup: add make_dirty arg to put_user_pages_dirty_lock()akpm@linux-foundation.org
[11~From: John Hubbard <jhubbard@nvidia.com> Subject: mm/gup: add make_dirty arg to put_user_pages_dirty_lock() Patch series "mm/gup: add make_dirty arg to put_user_pages_dirty_lock()", v3. There are about 50+ patches in my tree [2], and I'll be sending out the remaining ones in a few more groups: * The block/bio related changes (Jerome mostly wrote those, but I've had to move stuff around extensively, and add a little code) * mm/ changes * other subsystem patches * an RFC that shows the current state of the tracking patch set. That can only be applied after all call sites are converted, but it's good to get an early look at it. This is part a tree-wide conversion, as described in fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions"). This patch (of 3): Provide more capable variation of put_user_pages_dirty_lock(), and delete put_user_pages_dirty(). This is based on the following: 1. Lots of call sites become simpler if a bool is passed into put_user_page*(), instead of making the call site choose which put_user_page*() variant to call. 2. Christoph Hellwig's observation that set_page_dirty_lock() is usually correct, and set_page_dirty() is usually a bug, or at least questionable, within a put_user_page*() calling chain. This leads to the following API choices: * put_user_pages_dirty_lock(page, npages, make_dirty) * There is no put_user_pages_dirty(). You have to hand code that, in the rare case that it's required. [jhubbard@nvidia.com: remove unused variable in siw_free_plist()] Link: http://lkml.kernel.org/r/20190729074306.10368-1-jhubbard@nvidia.com Link: http://lkml.kernel.org/r/20190724044537.10458-2-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull RDMA subsystem updates from Jason Gunthorpe: "This cycle mainly saw lots of bug fixes and clean up code across the core code and several drivers, few new functional changes were made. - Many cleanup and bug fixes for hns - Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed, bnxt_re, efa - Share the query_port code between all the iWarp drivers - General rework and cleanup of the ODP MR umem code to fit better with the mmu notifier get/put scheme - Support rdma netlink in non init_net name spaces - mlx5 support for XRC devx and DC ODP" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits) RDMA: Fix double-free in srq creation error flow RDMA/efa: Fix incorrect error print IB/mlx5: Free mpi in mp_slave mode IB/mlx5: Use the original address for the page during free_pages RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp" RDMA/hns: Package operations of rq inline buffer into separate functions RDMA/hns: Optimize cmd init and mode selection for hip08 IB/hfi1: Define variables as unsigned long to fix KASAN warning IB/{rdmavt, hfi1, qib}: Add a counter for credit waits IB/hfi1: Add traces for TID RDMA READ RDMA/siw: Relax from kmap_atomic() use in TX path IB/iser: Support up to 16MB data transfer in a single command RDMA/siw: Fix page address mapping in TX path RDMA: Fix goto target to release the allocated memory RDMA/usnic: Avoid overly large buffers on stack RDMA/odp: Add missing cast for 32 bit RDMA/hns: Use devm_platform_ioremap_resource() to simplify code Documentation/infiniband: update name of some functions RDMA/cma: Fix false error message RDMA/hns: Fix wrong assignment of qp_access_flags ...
2019-09-13IB/hfi1: Define variables as unsigned long to fix KASAN warningIra Weiny
Define the working variables to be unsigned long to be compatible with for_each_set_bit and change types as needed. While we are at it remove unused variables from a couple of functions. This was found because of the following KASAN warning: ================================================================== BUG: KASAN: stack-out-of-bounds in find_first_bit+0x19/0x70 Read of size 8 at addr ffff888362d778d0 by task kworker/u308:2/1889 CPU: 21 PID: 1889 Comm: kworker/u308:2 Tainted: G W 5.3.0-rc2-mm1+ #2 Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.02.04.0003.102320141138 10/23/2014 Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core] Call Trace: dump_stack+0x9a/0xf0 ? find_first_bit+0x19/0x70 print_address_description+0x6c/0x332 ? find_first_bit+0x19/0x70 ? find_first_bit+0x19/0x70 __kasan_report.cold.6+0x1a/0x3b ? find_first_bit+0x19/0x70 kasan_report+0xe/0x12 find_first_bit+0x19/0x70 pma_get_opa_portstatus+0x5cc/0xa80 [hfi1] ? ret_from_fork+0x3a/0x50 ? pma_get_opa_port_ectrs+0x200/0x200 [hfi1] ? stack_trace_consume_entry+0x80/0x80 hfi1_process_mad+0x39b/0x26c0 [hfi1] ? __lock_acquire+0x65e/0x21b0 ? clear_linkup_counters+0xb0/0xb0 [hfi1] ? check_chain_key+0x1d7/0x2e0 ? lock_downgrade+0x3a0/0x3a0 ? match_held_lock+0x2e/0x250 ib_mad_recv_done+0x698/0x15e0 [ib_core] ? clear_linkup_counters+0xb0/0xb0 [hfi1] ? ib_mad_send_done+0xc80/0xc80 [ib_core] ? mark_held_locks+0x79/0xa0 ? _raw_spin_unlock_irqrestore+0x44/0x60 ? rvt_poll_cq+0x1e1/0x340 [rdmavt] __ib_process_cq+0x97/0x100 [ib_core] ib_cq_poll_work+0x31/0xb0 [ib_core] process_one_work+0x4ee/0xa00 ? pwq_dec_nr_in_flight+0x110/0x110 ? do_raw_spin_lock+0x113/0x1d0 worker_thread+0x57/0x5a0 ? process_one_work+0xa00/0xa00 kthread+0x1bb/0x1e0 ? kthread_create_on_node+0xc0/0xc0 ret_from_fork+0x3a/0x50 The buggy address belongs to the page: page:ffffea000d8b5dc0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x17ffffc0000000() raw: 0017ffffc0000000 0000000000000000 ffffea000d8b5dc8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected addr ffff888362d778d0 is located in stack of task kworker/u308:2/1889 at offset 32 in frame: pma_get_opa_portstatus+0x0/0xa80 [hfi1] this frame has 1 object: [32, 36) 'vl_select_mask' Memory state around the buggy address: ffff888362d77780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888362d77800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff888362d77880: 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 f2 f2 00 00 ^ ffff888362d77900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888362d77980: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 f2 f2 ================================================================== Cc: <stable@vger.kernel.org> Fixes: 7724105686e7 ("IB/hfi1: add driver files") Link: https://lore.kernel.org/r/20190911113053.126040.47327.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13IB/{rdmavt, hfi1, qib}: Add a counter for credit waitsKaike Wan
This patch adds a counter for credit waits to assist field debugging. Link: https://lore.kernel.org/r/20190911113047.126040.10857.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13IB/hfi1: Add traces for TID RDMA READKaike Wan
This patch adds traces to debug packet loss and retry for TID RDMA READ protocol. Link: https://lore.kernel.org/r/20190911113041.126040.64541.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13Merge tag 'v5.3-rc8' into rdma.git for-nextJason Gunthorpe
To resolve dependencies in following patches mlx5_ib.h conflict resolved by keeing both hunks Linux 5.3-rc8 Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Minor conflict in r8169, bug fix had two versions in net and net-next, take the net-next hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-21Merge branch 'odp_fixes' into rdma.git for-nextJason Gunthorpe
Jason Gunthorpe says: ==================== This is a collection of general cleanups for ODP to clarify some of the flows around umem creation and use of the interval tree. ==================== The branch is based on v5.3-rc5 due to dependencies * odp_fixes: RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr RDMA/mlx5: Use ib_umem_start instead of umem.address RDMA/core: Make invalidate_range a device operation RDMA/odp: Use kvcalloc for the dma_list and page_list RDMA/odp: Check for overflow when computing the umem_odp end RDMA/odp: Provide ib_umem_odp_release() to undo the allocs RDMA/odp: Split creating a umem_odp from ib_umem_get RDMA/odp: Make the three ways to create a umem_odp clear RMDA/odp: Consolidate umem_odp initialization RDMA/odp: Make it clearer when a umem is an implicit ODP umem RDMA/odp: Iterate over the whole rbtree directly RDMA/odp: Use the common interval tree library instead of generic RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-20infiniband: hfi1: fix memory leaksWenwen Wang
In fault_opcodes_write(), 'data' is allocated through kcalloc(). However, it is not deallocated in the following execution if an error occurs, leading to memory leaks. To fix this issue, introduce the 'free_data' label to free 'data' before returning the error. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/1566154486-3713-1-git-send-email-wenwen@cs.uga.edu Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20infiniband: hfi1: fix a memory leak bugWenwen Wang
In fault_opcodes_read(), 'data' is not deallocated if debugfs_file_get() fails, leading to a memory leak. To fix this bug, introduce the 'free_data' label to free 'data' before returning the error. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/1566156571-4335-1-git-send-email-wenwen@cs.uga.edu Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20IB/hfi1: Drop stale TID RDMA packets that cause TIDErrKaike Wan
In a congested fabric with adaptive routing enabled, traces show that packets could be delivered out of order. A stale TID RDMA data packet could lead to TidErr if the TID entries have been released by duplicate data packets generated from retries, and subsequently erroneously force the qp into error state in the current implementation. Since the payload has already been dropped by hardware, the packet can be simply dropped and it is no longer necessary to put the qp into error state. Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192058.105923.72324.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20IB/hfi1: Add additional checks when handling TID RDMA WRITE DATA packetKaike Wan
In a congested fabric with adaptive routing enabled, traces show that packets could be delivered out of order, which could cause incorrect processing of stale packets. For stale TID RDMA WRITE DATA packets that cause KDETH EFLAGS errors, this patch adds additional checks before processing the packets. Fixes: d72fe7d5008b ("IB/hfi1: Add a function to receive TID RDMA WRITE DATA packet") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192051.105923.69979.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20IB/hfi1: Add additional checks when handling TID RDMA READ RESP packetKaike Wan
In a congested fabric with adaptive routing enabled, traces show that packets could be delivered out of order, which could cause incorrect processing of stale packets. For stale TID RDMA READ RESP packets that cause KDETH EFLAGS errors, this patch adds additional checks before processing the packets. Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192045.105923.59813.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20IB/hfi1: Unsafe PSN checking for TID RDMA READ Resp packetKaike Wan
When processing a TID RDMA READ RESP packet that causes KDETH EFLAGS errors, the packet's IB PSN is checked against qp->s_last_psn and qp->s_psn without the protection of qp->s_lock, which is not safe. This patch fixes the issue by acquiring qp->s_lock first. Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192039.105923.7852.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20IB/hfi1: Drop stale TID RDMA packetsKaike Wan
In a congested fabric with adaptive routing enabled, traces show that the sender could receive stale TID RDMA NAK packets that contain newer KDETH PSNs and older Verbs PSNs. If not dropped, these packets could cause the incorrect rewinding of the software flows and the incorrect completion of TID RDMA WRITE requests, and eventually leading to memory corruption and kernel crash. The current code drops stale TID RDMA ACK/NAK packets solely based on KDETH PSNs, which may lead to erroneous processing. This patch fixes the issue by also checking the Verbs PSN. Addition checks are added before rewinding the TID RDMA WRITE DATA packets. Fixes: 9e93e967f7b4 ("IB/hfi1: Add a function to receive TID RDMA ACK packet") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192033.105923.44192.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Just minor overlapping changes in the conflicts here. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01IB/hfi1: Fix Spectre v1 vulnerabilityGustavo A. R. Silva
sl is controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. Fix this by sanitizing sl before using it to index ibp->sl_to_sc. Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/ Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Link: https://lore.kernel.org/r/20190731175428.GA16736@embeddedor Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-30net: Use skb_frag_off accessorsJonathan Lemon
Use accessor functions for skb fragment's page_offset instead of direct references, in preparation for bvec conversion. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29Merge branch 'wip/dl-for-rc' into wip/dl-for-nextDoug Ledford
The fix for IB port statistics initialization ("IB/core: Fix querying total rdma stats") is needed before we take a follow-on patch to for-next. Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-22net: Use skb accessors in network driversMatthew Wilcox (Oracle)
In preparation for unifying the skb_frag and bio_vec, use the fine accessors which already exist and use skb_frag_t instead of struct skb_frag_struct. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-22IB/hfi1: Remove unused defineMike Marciniszyn
The patch noted in Fixes missed deleting the define it obsoleted. Fixes: da9de5f8527f ("IB/hfi1: Close PSM sdma_progress sleep window") Link: https://lore.kernel.org/r/20190715164552.74174.99396.stgit@awfm-01.aw.intel.com Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-22IB/hfi1: Do not update hcrc for a KDETH packet during fault injectionKaike Wan
When a KDETH packet is subject to fault injection during transmission, HCRC is supposed to be omitted from the packet so that the hardware on the receiver side would drop the packet. When creating pbc, the PbcInsertHcrc field is set to be PBC_IHCRC_NONE if the KDETH packet is subject to fault injection, but overwritten with PBC_IHCRC_LKDETH when update_hcrc() is called later. This problem is fixed by not calling update_hcrc() when the packet is subject to fault injection. Fixes: 6b6cf9357f78 ("IB/hfi1: Set PbcInsertHcrc for TID RDMA packets") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20190715164546.74174.99296.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-22IB/hfi1: Drop all TID RDMA READ RESP packets after r_next_psnKaike Wan
When a TID sequence error occurs while receiving TID RDMA READ RESP packets, all packets after flow->flow_state.r_next_psn should be dropped, including those response packets for subsequent segments. The current implementation will drop the subsequent response packets for the segment to complete next, but may accept packets for subsequent segments and therefore mistakenly advance the r_next_psn fields for the corresponding software flows. This may result in failures to complete subsequent segments after the current segment is completed. The fix is to only use the flow pointed by req->clear_tail for checking KDETH PSN instead of finding a flow from the request's flow array. Fixes: b885d5be9ca1 ("IB/hfi1: Unify the software PSN check for TID RDMA READ/WRITE") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20190715164540.74174.54702.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-22IB/hfi1: Field not zero-ed when allocating TID flow memoryKaike Wan
The field flow->resync_npkts is added for TID RDMA WRITE request and zero-ed when a TID RDMA WRITE RESP packet is received by the requester. This field is used to rewind a request during retry in the function hfi1_tid_rdma_restart_req() shared by both TID RDMA WRITE and TID RDMA READ requests. Therefore, when a TID RDMA READ request is retried, this field may not be initialized at all, which causes the retry to start at an incorrect psn, leading to the drop of the retry request by the responder. This patch fixes the problem by zeroing out the field when the flow memory is allocated. Fixes: 838b6fd2d9ca ("IB/hfi1: TID RDMA RcvArray programming and TID allocation") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20190715164534.74174.6177.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-22IB/hfi1: Unreserve a flushed OPFN requestKaike Wan
When an OPFN request is flushed, the request is completed without unreserving itself from the send queue. Subsequently, when a new request is post sent, the following warning will be triggered: WARNING: CPU: 4 PID: 8130 at rdmavt/qp.c:1761 rvt_post_send+0x72a/0x880 [rdmavt] Call Trace: [<ffffffffbbb61e41>] dump_stack+0x19/0x1b [<ffffffffbb497688>] __warn+0xd8/0x100 [<ffffffffbb4977cd>] warn_slowpath_null+0x1d/0x20 [<ffffffffc01c941a>] rvt_post_send+0x72a/0x880 [rdmavt] [<ffffffffbb4dcabe>] ? account_entity_dequeue+0xae/0xd0 [<ffffffffbb61d645>] ? __kmalloc+0x55/0x230 [<ffffffffc04e1a4c>] ib_uverbs_post_send+0x37c/0x5d0 [ib_uverbs] [<ffffffffc04e5e36>] ? rdma_lookup_put_uobject+0x26/0x60 [ib_uverbs] [<ffffffffc04dbce6>] ib_uverbs_write+0x286/0x460 [ib_uverbs] [<ffffffffbb6f9457>] ? security_file_permission+0x27/0xa0 [<ffffffffbb641650>] vfs_write+0xc0/0x1f0 [<ffffffffbb64246f>] SyS_write+0x7f/0xf0 [<ffffffffbbb74ddb>] system_call_fastpath+0x22/0x27 This patch fixes the problem by moving rvt_qp_wqe_unreserve() into rvt_qp_complete_swqe() to simplify the code and make it less error-prone. Fixes: ca95f802ef51 ("IB/hfi1: Unreserve a reserved request when it is completed") Link: https://lore.kernel.org/r/20190715164528.74174.31364.stgit@awfm-01.aw.intel.com Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-22IB/hfi1: Check for error on call to alloc_rsm_map_tableJohn Fleck
The call to alloc_rsm_map_table does not check if the kmalloc fails. Check for a NULL on alloc, and bail if it fails. Fixes: 372cc85a13c9 ("IB/hfi1: Extract RSM map table init from QOS") Link: https://lore.kernel.org/r/20190715164521.74174.27047.stgit@awfm-01.aw.intel.com Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: John Fleck <john.fleck@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "A smaller cycle this time. Notably we see another new driver, 'Soft iWarp', and the deletion of an ancient unused driver for nes. - Revise and simplify the signature offload RDMA MR APIs - More progress on hoisting object allocation boiler plate code out of the drivers - Driver bug fixes and revisions for hns, hfi1, efa, cxgb4, qib, i40iw - Tree wide cleanups: struct_size, put_user_page, xarray, rst doc conversion - Removal of obsolete ib_ucm chardev and nes driver - netlink based discovery of chardevs and autoloading of the modules providing them - Move more of the rdamvt/hfi1 uapi to include/uapi/rdma - New driver 'siw' for software based iWarp running on top of netdev, much like rxe's software RoCE. - mlx5 feature to report events in their raw devx format to userspace - Expose per-object counters through rdma tool - Adaptive interrupt moderation for RDMA (DIM), sharing the DIM core from netdev" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (194 commits) RMDA/siw: Require a 64 bit arch RDMA/siw: Mark expected switch fall-throughs RDMA/core: Fix -Wunused-const-variable warnings rdma/siw: Remove set but not used variable 's' rdma/siw: Add missing dependencies on LIBCRC32C and DMA_VIRT_OPS RDMA/siw: Add missing rtnl_lock around access to ifa rdma/siw: Use proper enumerated type in map_cqe_status RDMA/siw: Remove unnecessary kthread create/destroy printouts IB/rdmavt: Fix variable shadowing issue in rvt_create_cq RDMA/core: Fix race when resolving IP address RDMA/core: Make rdma_counter.h compile stand alone IB/core: Work on the caller socket net namespace in nldev_newlink() RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMM RDMA/mlx5: Set RDMA DIM to be enabled by default RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlink RDMA/core: Provide RDMA DIM support for ULPs linux/dim: Implement RDMA adaptive moderation (DIM) IB/mlx5: Report correctly tag matching rendezvous capability docs: infiniband: add it to the driver-api bookset IB/mlx5: Implement VHCA tunnel mechanism in DEVX ...
2019-06-28IB/hfi1: No need to use try_module_get for debugfsDennis Dalessandro
The call in debugfs.c for try_module_get() is not needed. A reference to the module will be taken by the VFS layer as long as the owner field is set in the file ops struct. So set this as well as remove the call. Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/hfi1: Add missing INVALIDATE opcodes for traceMike Marciniszyn
This was missed in the original implementation of the memory management extensions. Fixes: 0db3dfa03c08 ("IB/hfi1: Work request processing for fast register mr and invalidate") Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/hfi1: Reduce excessive aspm inlinesMichael J. Ruhl
Uninline the aspm API since it increases code space for no reason. Move the aspm module param to the new aspm C file. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/{rdmavt, hfi1, qib}: Add helpers to hide SWQE WR detailsMichael J. Ruhl
Add some helper functions to hide struct rvt_swqe details. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/{rdmavt, hfi1, qib}: Remove AH refcount for UD QPsMichael J. Ruhl
Historically rdmavt destroy_ah() has returned an -EBUSY when the AH has a non-zero reference count. IBTA 11.2.2 notes no such return value or error case: Output Modifiers: - Verb results: - Operation completed successfully. - Invalid HCA handle. - Invalid address handle. ULPs never test for this error and this will leak memory. The reference count exists to allow for driver independent progress mechanisms to process UD SWQEs in parallel with post sends. The SWQE will hold a reference count until the UD SWQE completes and then drops the reference. Fix by removing need to reference count the AH. Add a UD specific allocation to each SWQE entry to cache the necessary information for independent progress. Copy the information during the post send processing. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/{hfi1, qib, rdmavt}: Put qp in error state when cq is fullKamenee Arumugam
When a completion queue is full, the associated queue pairs are not put into the error state. According to the IBTA specification, this is a violation. Quote from IBTA spec: C9-218: A Requester Class F error occurs when the CQ is inaccessible or full and an attempt is made to complete a WQE. The Affected QP shall be moved to the error state and affiliated asynchronous errors generated as described in 11.6.3.1 Affiliated Asynchronous Events on page 678. The current WQE and any subsequent WQEs are left in an unknown state. C11-37: The CI shall generate a CQ Error when a CQ overrun is detected. This condition will result in an Affiliated Asynchronous Error for any associated Work Queues when they attempt to use that CQ. Completions can no longer be added to the CQ. It is not guaranteed that completions present in the CQ at the time the error occurred can be retrieved. Possible causes include a CQ overrun or a CQ protection error. Put the qp in error state when cq is full. Implement a state called full to continue to put other associated QPs in error state. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/hfi1: Move rvt_cq_wc struct into uapi directoryKamenee Arumugam
The rvt_cq_wc struct elements are shared between rdmavt and the providers but not in uapi directory. As per the comment in https://marc.info/?l=linux-rdma&m=152296522708522&w=2 The hfi1 driver and the rdma core driver are not using shared structures in the uapi directory. In that case, move rvt_cq_wc struct into the rvt-abi.h header file and create a rvt_k_cq_w for the kernel completion queue. Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28Merge tag 'v5.2-rc6' into rdma.git for-nextJason Gunthorpe
For dependencies in next patches. Resolve conflicts: - Use uverbs_get_cleared_udata() with new cq allocation flow - Continue to delete nes despite SPDX conflict - Resolve list appends in mlx5_command_str() - Use u16 for vport_rule stuff - Resolve list appends in struct ib_client Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24Merge tag 'v5.2-rc6' into sched/core, to refresh the branchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-20IB/{rdmavt, qib, hfi1}: Convert to new completion APIMike Marciniszyn
Convert all completions to use the new completion routine that fixes a race between post send and completion where fields from a SWQE can be read after SWQE has been freed. This patch also addresses issues reported in https://marc.info/?l=linux-kernel&m=155656897409107&w=2. The reserved operation path has no need for any barrier. The barrier for the other path is addressed by the smp_load_acquire() barrier. Cc: Andrea Parri <andrea.parri@amarulasolutions.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18IB/hfi1: Spelling s/statisfied/satisfied/Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17IB/hfi1: Handle port down properly in pioMike Marciniszyn
The call to sc_buffer_alloc currently returns NULL (no buffer) or a buffer descriptor. There is a third case when the port is down. Currently that returns NULL and this prevents the caller from properly handling the sc_buffer_alloc() failure. A verbs code link test after the call is racy so the indication needs to come from the state check inside the allocation routine to be valid. Fix by encoding the ECOMM failure like SDMA. IS_ERR_OR_NULL() tests are added at all call sites. For verbs send, this needs to treat any error by returning a completion without any MMIO copy. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17IB/hfi1: Handle wakeup of orphaned QPs for pioMike Marciniszyn
Once a send context is taken down due to a link failure, any QPs waiting for pio credits will stay on the waitlist indefinitely. Fix by wakeing up all QPs linked to piowait list. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17IB/hfi1: Wakeup QPs orphaned on wait list after flushMike Marciniszyn
Once an SDMA engine is taken down due to a link failure, any waiting QPs that do not have outstanding descriptors in the ring will stay on the dmawait list as long as the port is down. Since there is no timer running, they will stay there for a long time. The fix is to wake up all iowaits linked to dmawait. The send engine will build and post packets that get flushed back. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17IB/hfi1: Use aborts to trigger RC throttlingMike Marciniszyn
SDMA and pio flushes will cause a lot of packets to be transmitted after a link has gone down, using a lot of CPU to retransmit packets. Fix for RC QPs by recognizing the flush status and: - Forcing a timer start - Putting the QP into a "send one" mode Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17IB/hfi1: Create inline to get extended headersMike Marciniszyn
This paves the way for another patch that reacts to a flush sdma completion for RC. Fixes: 81cd3891f021 ("IB/hfi1: Add support for 16B Management Packets") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17IB/hfi1: Silence txreq allocation warningsMike Marciniszyn
The following warning can happen when a memory shortage occurs during txreq allocation: [10220.939246] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) [10220.939246] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/2016 [10220.939247] cache: mnt_cache, object size: 384, buffer size: 384, default order: 2, min order: 0 [10220.939260] Workqueue: hfi0_0 _hfi1_do_send [hfi1] [10220.939261] node 0: slabs: 1026568, objs: 43115856, free: 0 [10220.939262] Call Trace: [10220.939262] node 1: slabs: 820872, objs: 34476624, free: 0 [10220.939263] dump_stack+0x5a/0x73 [10220.939265] warn_alloc+0x103/0x190 [10220.939267] ? wake_all_kswapds+0x54/0x8b [10220.939268] __alloc_pages_slowpath+0x86c/0xa2e [10220.939270] ? __alloc_pages_nodemask+0x2fe/0x320 [10220.939271] __alloc_pages_nodemask+0x2fe/0x320 [10220.939273] new_slab+0x475/0x550 [10220.939275] ___slab_alloc+0x36c/0x520 [10220.939287] ? hfi1_make_rc_req+0x90/0x18b0 [hfi1] [10220.939299] ? __get_txreq+0x54/0x160 [hfi1] [10220.939310] ? hfi1_make_rc_req+0x90/0x18b0 [hfi1] [10220.939312] __slab_alloc+0x40/0x61 [10220.939323] ? hfi1_make_rc_req+0x90/0x18b0 [hfi1] [10220.939325] kmem_cache_alloc+0x181/0x1b0 [10220.939336] hfi1_make_rc_req+0x90/0x18b0 [hfi1] [10220.939348] ? hfi1_verbs_send_dma+0x386/0xa10 [hfi1] [10220.939359] ? find_prev_entry+0xb0/0xb0 [hfi1] [10220.939371] hfi1_do_send+0x1d9/0x3f0 [hfi1] [10220.939372] process_one_work+0x171/0x380 [10220.939374] worker_thread+0x49/0x3f0 [10220.939375] kthread+0xf8/0x130 [10220.939377] ? max_active_store+0x80/0x80 [10220.939378] ? kthread_bind+0x10/0x10 [10220.939379] ret_from_fork+0x35/0x40 [10220.939381] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) The shortage is handled properly so the message isn't needed. Silence by adding the no warn option to the slab allocation. Fixes: 45842abbb292 ("staging/rdma/hfi1: move txreq header code") Cc: <stable@vger.kernel.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17IB/hfi1: Avoid hardlockup with flushlist_lockMike Marciniszyn
Heavy contention of the sde flushlist_lock can cause hard lockups at extreme scale when the flushing logic is under stress. Mitigate by replacing the item at a time copy to the local list with an O(1) list_splice_init() and using the high priority work queue to do the flushes. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Cc: <stable@vger.kernel.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17Merge tag 'v5.2-rc5' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>