path: root/drivers/infiniband/hw
Age  Commit message  Author
2019-02-14  IB/mlx5: Add support for 50Gbps per lane link modes  (Aya Levin)
The driver now supports the new link modes: 50Gbps per lane, for 50G/100G/200G. This patch reads the correct field (legacy vs. extended) based on a FW indication bit, and adds a translation function from the new link modes to IB width and speed. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
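For illustration, a minimal sketch of what such a translation function looks like; the extended-link-mode enum values below are hypothetical placeholders, not the actual mlx5 names (the IB_WIDTH_*/IB_SPEED_* constants are the standard verbs ones):

    /* Hypothetical extended link modes; mlx5's real enum names differ. */
    enum ext_link_mode {
        EXT_50G_1X,     /* 1 lane  x 50Gbps */
        EXT_100G_2X,    /* 2 lanes x 50Gbps */
        EXT_200G_4X,    /* 4 lanes x 50Gbps */
    };

    static void ext_mode_to_ib(enum ext_link_mode mode, u8 *width, u8 *speed)
    {
        switch (mode) {
        case EXT_50G_1X:
            *width = IB_WIDTH_1X;
            *speed = IB_SPEED_HDR;  /* HDR signaling: 50Gbps per lane */
            break;
        case EXT_100G_2X:
            *width = IB_WIDTH_2X;
            *speed = IB_SPEED_HDR;
            break;
        case EXT_200G_4X:
            *width = IB_WIDTH_4X;
            *speed = IB_SPEED_HDR;
            break;
        }
    }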
2019-02-14  net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register  (Aya Levin)
This patch exposes the new link modes (including 50Gbps per lane) and the ext_* fields that describe them in the Port Type and Speed register (PTYS). The access functions, the translation functions (speed <-> HW bits), and the max link speed function were modified accordingly. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14  net/mlx5: Refactor queries to speed fields in Port Type and Speed register  (Aya Levin)
This patch consolidates queries to the speed-related fields in the Port Type and Speed register (PTYS) into a single API. In addition, it refactors the functions that serve only the Ethernet driver: remove the protocol type as an input parameter, move code from the 'core' directory into the 'en' directory, and add an 'eth' prefix to the function names. The patch also encapsulates functions that are not used outside the Ethernet driver and removes redundant include files. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14  net/mlx5: E-Switch, Normalize the name of uplink vport number  (Bodong Wang)
The driver used to name the uplink vport FDB_UPLINK_VPORT, which is hard to reconcile with a consistent naming convention as other vports are introduced. Use MLX5_VPORT as the prefix for such vports and relocate the uplink vport definition to a public header file for the benefit of both the net and IB drivers. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14  IB/mlx5: Use unified register/load function for uplink and VF vports  (Bodong Wang)
The IB driver maintains separate registration and load function calls for uplink and VF vports. This is unnecessary, as they differ only in their profiles. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-13  RDMA/nes: Use for_each_sg_dma_page iterator for umem SGL  (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop needed to iterate over the pages of each SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver, as it is only relevant for ODP MRs; use the system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
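For context, a minimal sketch of the iterator pattern these conversions adopt; the ib_umem field names follow the layout of this era and are illustrative:

    #include <rdma/ib_umem.h>

    static void walk_umem_pages(struct ib_umem *umem)
    {
        struct sg_dma_page_iter sg_iter;
        dma_addr_t addr;

        /* Steps through the DMA-mapped SGL one system page at a time */
        for_each_sg_dma_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0) {
            addr = sg_page_iter_dma_address(&sg_iter);
            /* program 'addr' into the HW page table here */
        }
    }

The same pattern is applied driver by driver in the similar commits that follow.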
2019-02-13  RDMA: Fix allocation failure on pointer pd  (Colin Ian King)
The null check on an allocation failure of pd currently tests whether pd is non-null rather than null. Fix this by adding the missing ! operator. Fixes: 21a428a019c9 ("RDMA: Handle PD allocations by IB/core") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
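The shape of the fix, as a simplified sketch rather than the exact driver code:

    pd = kzalloc(sizeof(*pd), GFP_KERNEL);
    if (!pd)        /* was "if (pd)": the test was inverted */
        return -ENOMEM;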
2019-02-13  Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma into for-next  (Doug Ledford)
I had merged the hfi1-tid code into my local copy of for-next, but was waiting on 0day testing before pushing it (I pushed it to my wip branch). Having waited several days for 0day testing to show up, I'm finally just going to push it out. In the meantime, though, Jason pushed other stuff to for-next, so I needed to merge up the branches before pushing. Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-02-11  RDMA/bnxt_re: fix or'ing of data into an uninitialized struct member  (Colin Ian King)
The struct member comp_mask is never initialized, yet a bit pattern is bitwise-or'd into it, so the other bits in comp_mask may contain garbage from the stack. Fix this by turning the bitwise or into an assignment. Fixes: 95b86d1c91ad ("RDMA/bnxt_re: Update kernel user abi to pass chip context") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
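An illustrative before/after; the flag name matches the bnxt_re ABI, but the surrounding code is simplified:

    /* Before: or'ing into an uninitialized stack variable */
    resp.comp_mask |= BNXT_RE_UCNTX_CMASK_HAVE_CCTX;

    /* After: the first write is a plain assignment */
    resp.comp_mask = BNXT_RE_UCNTX_CMASK_HAVE_CCTX;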
2019-02-11  RDMA/mlx5: Fix memory leak in case we fail to add an IB device  (Mark Bloch)
Make sure the IB device is freed on failure. Fixes: b5ca15ad7e61 ("IB/mlx5: Add proper representors support") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11  IB/mlx5: Fix bad flow upon DEVX mkey creation  (Yishai Hadas)
Fix bad flow upon DEVX mkey creation to prevent deleting the indirect mkey from the radix tree in case there was a previous failure to insert it. Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11  RDMA/ocrdma: Use for_each_sg_dma_page iterator on umem SGL  (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop needed to iterate over the pages of each SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver, as it is only relevant for ODP MRs; use the system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11  RDMA/qedr: Use for_each_sg_dma_page iterator on umem SGL  (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop needed to iterate over the pages of each SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver, as it is only relevant for ODP MRs; use the system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11  RDMA/vmw_pvrdma: Use for_each_sg_dma_page iterator on umem SGL  (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop needed to iterate over the pages of each SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver, as it is only relevant for ODP MRs; use the system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11  RDMA/cxgb3: Use for_each_sg_dma_page iterator on umem SGL  (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop needed to iterate over the pages of each SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver, as it is only relevant for ODP MRs; use the system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11  RDMA/cxgb4: Use for_each_sg_dma_page iterator on umem SGL  (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop needed to iterate over the pages of each SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver, as it is only relevant for ODP MRs; use the system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11  RDMA/hns: Use for_each_sg_dma_page iterator on umem SGL  (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop needed to iterate over the pages of each SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver, as it is only relevant for ODP MRs; use the system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11  RDMA/i40iw: Use for_each_sg_dma_page iterator on umem SGL  (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop needed to iterate over the pages of each SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver, as it is only relevant for ODP MRs; use the system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11  RDMA/mthca: Use for_each_sg_dma_page iterator on umem SGL  (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop needed to iterate over the pages of each SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver, as it is only relevant for ODP MRs; use the system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11  RDMA/bnxt_re: Use for_each_sg_dma_page iterator on umem SGL  (Shiraz, Saleem)
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop needed to iterate over the pages of each SGE when the for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver, as it is only relevant for ODP MRs; use the system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-09  Merge branch 'wip/dl-for-next' into for-next  (Doug Ledford)
Due to concurrent work by myself and Jason, a normal fast-forward merge was not possible. This brings in a number of hfi1 changes, mainly the hfi1 TID RDMA support (roughly 10,000 LOC of change), which was reviewed and integrated over a period of days. Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-02-08  iw_cxgb4: fix srqidx leak during connection abort  (Raju Rangoju)
When an application aborts the connection by moving the QP from RTS to ERROR, iw_cxgb4's modify_rc_qp() RTS->ERROR logic sets *srqidxp to 0 via t4_set_wq_in_error(&qhp->wq, 0) and aborts the connection by calling c4iw_ep_disconnect(). c4iw_ep_disconnect() does the following: 1. sends up a close_complete_upcall(ep, -ECONNRESET) to libcxgb4; 2. sends an abort request CPL to hw. But since the close_complete_upcall() is sent before the ABORT_REQ is sent to hw, libcxgb4 fails to release the srqidx if the connection holds one, because the srqidx is passed up to libcxgb4 only after the corresponding ABORT_RPL is processed by the kernel in abort_rpl(). This patch handles the corner case by moving the call to close_complete_upcall() from c4iw_ep_disconnect() to abort_rpl(), so that libcxgb4 is notified about the -ECONNRESET only after abort_rpl() and can relinquish the srqidx properly. Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08  iw_cxgb4: complete the cached SRQ buffers  (Raju Rangoju)
If TP fetches an SRQ buffer but ends up not using it before the connection is aborted, it passes the index of that SRQ buffer to the host in the ABORT_REQ_RSS or ABORT_RPL CPL message. But if the srqidx field is zero in the received ABORT_RPL or ABORT_REQ_RSS CPL, we need to read the tcb.rq_start field to see if it really did have an RQE cached. This works around a case where HW does not include the srqidx in the ABORT_RPL/ABORT_REQ_RSS CPL. The final value of rq_start is the one present in the TCB with the TF_RX_PDU_OUT bit cleared, so we need to read the TCB and examine TF_RX_PDU_OUT (bit 49 of t_flags) to determine whether an rx PDU feedback event is pending. Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
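A hedged sketch of the test described above; only the bit index comes from the text, the helper and names are hypothetical:

    #define TF_RX_PDU_OUT   BIT_ULL(49)     /* bit 49 of the TCB t_flags */

    /* rq_start read from the TCB is only final once TF_RX_PDU_OUT clears */
    static bool rx_pdu_feedback_pending(u64 t_flags)
    {
        return !!(t_flags & TF_RX_PDU_OUT);
    }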
2019-02-08  RDMA: Handle PD allocations by IB/core  (Leon Romanovsky)
Allocating the PD in IB/core allows us to simplify drivers and their error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand in hand with the relevant updates in .dealloc_pd(). We use this opportunity to convert .dealloc_pd() so that it cannot fail, as was suggested a long time ago; failures do not happen in practice, as we have never seen the WARN_ON print. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
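A sketch of the driver-side contract after this change: the core allocates the ib_pd, and the driver only initializes its private part. The signature is approximated from this series; drv_*/to_drv_pd() are hypothetical driver helpers:

    static int drv_alloc_pd(struct ib_pd *ibpd, struct ib_ucontext *context,
                            struct ib_udata *udata)
    {
        struct drv_pd *pd = to_drv_pd(ibpd);    /* container_of() wrapper */
        int pdn = drv_hw_alloc_pdn();           /* hypothetical HW call */

        if (pdn < 0)
            return -ENOMEM;                     /* core frees the ib_pd */
        pd->pdn = pdn;
        return 0;
    }

    static void drv_dealloc_pd(struct ib_pd *ibpd)
    {
        /* void return: deallocation is no longer allowed to fail */
        drv_hw_free_pdn(to_drv_pd(ibpd)->pdn);
    }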
2019-02-08  IB/usnic: Fix locking when unregistering  (Parvi Kaustubhi)
Move the call to usnic_ib_device_remove after usnic_ib_ibdev_list_lock has been released. Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08  iw_cxgb4: use tos when finding ipv6 routes  (Steve Wise)
When IPv6 support was added, the correct tos was not passed to cxgb_find_route6(). This potentially results in the wrong route entry. Fixes: 830662f6f032 ("RDMA/cxgb4: Add support for active and passive open connection with IPv6 address") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08  iw_cxgb4: use tos when importing the endpoint  (Steve Wise)
import_ep() is passed the correct tos, but doesn't use it correctly. Fixes: ac8e4c69a021 ("cxgb4/iw_cxgb4: TOS support") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08  iw_cxgb4: use listening ep tos when accepting new connections  (Steve Wise)
If the parent listening endpoint has a service type set, then use that when setting up the connection. This allows server-side applications to mandate the tos for passive side connections via rdma_set_service_type() on the listening endpoints. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-07  RDMA/bnxt_re: Update kernel user abi to pass chip context  (Devesh Sharma)
The user-space verbs provider library needs the chip context, so the ABI is changed to add chip version details to the structure. Furthermore, the kernel driver's ucontext allocation code is changed to initialize the ABI structure with appropriate values. As suggested by the community, the new fields are appended at the bottom of the ABI structure, and the older fields are retained as they were in older versions. The ABI version is kept at 1, and a new field is added in the ucontext response structure to hold the component mask. The user-space library should check the pre-defined flags to figure out whether a certain feature is supported or not. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-07  RDMA/bnxt_re: Add extended psn structure for 57500 adapters  (Devesh Sharma)
The new 57500 series of adapters has a bigger psn search structure; the new structure is 16B. Change the control path memory allocation and fast path code to accommodate the new psn structure while maintaining backward compatibility. A few additional changes are listed below:
- For the 57500 chip, max-sge is limited to 6 for now.
- For the 57500 chip, max-receive-sge should be set to 6 for now.
- Add the driver/hardware interface structure for the new chip.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-07  RDMA/bnxt_re: Enable GSI QP support for 57500 series  (Devesh Sharma)
In the new 57500 series of adapters the GSI QP is a UD-type QP, unlike the previous generation where it was a raw Ethernet QP. Change the control and data path to support this. The significant differences:
- AH creation resolves the network type unconditionally.
- Add checks at the relevant places to distinguish from the raw Ethernet processing flow.
- bnxt_re_process_res_ud_wc reports completion with the GRH flag when the QP is GSI.
- Change length, cfa_meta and smac to match the new driver/hardware interface.
- Add the new driver/hardware interface.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-07  RDMA/bnxt_re: Skip backing store allocation for 57500 series  (Devesh Sharma)
The backing store that keeps the HW context data structures is allocated and initialized by the L2 driver. For the 57500 chip the RoCE driver does not need to allocate and initialize additional memory, so skip the duplicate allocation and initialization for 57500 adapters; the driver continues as before for older chips. This patch also takes care of aligning the stats context memory to a 128-byte boundary, a requirement for the 57500 series. Older chips do not care about the alignment, so the change is unconditional. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-07  RDMA/bnxt_re: Add 64bit doorbells for 57500 series  (Devesh Sharma)
The new chip series has a 64-bit doorbell for notification queues; thus, both control and data path event queues need new routines to write the 64-bit doorbell. This patch adds them. There is a new doorbell interface between the chip and the driver, so the chip-specific data structure definitions change. Additional significant changes:
- bnxt_re_net_ring_free/alloc takes a new argument.
- bnxt_qplib_enable_nq and enable_rcfw use the new doorbell offset for the new chip.
- DB mapping for NQ and CREQ now maps 8 bytes.
- DBR_DBR_* macros renamed to DBC_DBC_*.
- Store nq_db_offset in a 32-bit data type.
- Get rid of __iowrite64_copy; use writeq instead.
- Change the DB header initialization to a simpler scheme.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
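A minimal sketch of a 64-bit doorbell write with writeq(); the header layout here is illustrative, not the actual DBC_DBC_* encoding:

    #include <linux/io.h>

    static void ring_db64(void __iomem *db, u32 ring_id, u32 cons_idx)
    {
        /* Illustrative layout: ring id in the high word, index in the low */
        u64 val = ((u64)ring_id << 32) | cons_idx;

        writeq(val, db);        /* one 64-bit store instead of __iowrite64_copy */
    }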
2019-02-07  RDMA/bnxt_re: Add chip context to identify 57500 series  (Devesh Sharma)
Add setup and destroy routines for the chip context. The chip context is used frequently in the control and data path to branch the execution flow depending on the chip type. A chip-context structure pointer is added to the relevant data structures. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-07  IB/mlx5: Simplify WQE count power of two check  (Gal Pressman)
Use is_power_of_2() instead of hand-coding the check in the driver. While at it, fix the meaningless error print. Signed-off-by: Gal Pressman <galpress@amazon.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
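The pattern of the cleanup, sketched (the exact mlx5 message and context differ):

    #include <linux/log2.h>

    static int check_wqe_cnt(struct mlx5_ib_dev *dev, int wqe_cnt)
    {
        if (!is_power_of_2(wqe_cnt)) {
            mlx5_ib_dbg(dev, "wqe count %d is not a power of two\n",
                        wqe_cnt);
            return -EINVAL;
        }
        return 0;
    }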
2019-02-07  drivers/IB,usnic: reduce scope of mmap_sem  (Davidlohr Bueso)
usnic_uiom_get_pages() uses gup_longterm(), so we cannot get rid of mmap_sem altogether in the driver, but we can get rid of some of the complexity that mmap_sem brings when it protects only pinned_vm. We can drop the wq altogether, as we no longer need to defer work to unpin pages now that the counter is atomic. We also share the lock. Acked-by: Parvi Kaustubhi <pkaustub@cisco.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-07  drivers/IB,hfi1: do not use mmap_sem  (Davidlohr Bueso)
This driver already uses gup_fast(), so we can simply drop the mmap_sem protection around the pinned_vm counter. Note that the window between when hfi1_can_pin_pages() is called and when the counter is actually incremented remains the same, as mmap_sem was _only_ held while ->pinned_vm was touched. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-07  drivers/IB,qib: optimize mmap_sem usage  (Davidlohr Bueso)
The driver uses mmap_sem for both pinned_vm accounting and get_user_pages(). Because rdma drivers might want to use gup_longterm() in the future, we still need some sort of mmap_sem serialization (as opposed to removing it entirely by using gup_fast()). Now that pinned_vm is atomic, the writer lock can be converted to a reader lock. This also fixes a bug: __qib_get_user_pages() was not taking the current value of pinned_vm into account. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-07  mm: make mm->pinned_vm an atomic64 counter  (Davidlohr Bueso)
Taking a sleeping lock only to increment a variable is overkill, yet pretty much all users do this. Furthermore, some drivers (e.g. infiniband and scif) that need pinned semantics go to quite some trouble to delay (un)accounting of pinned pages via a workqueue when the lock cannot be acquired. By making the counter atomic we no longer need to hold the mmap_sem and can simplify some code around it for pinned_vm users. The counter is 64-bit so that we need not worry about overflow, e.g. from rdma input controlled from userspace. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Christoph Lameter <cl@linux.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
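The before/after pattern, as a sketch:

    #include <linux/atomic.h>
    #include <linux/mm_types.h>

    static void account_pinned(struct mm_struct *mm, long npages)
    {
        /*
         * Before this change:
         *      down_write(&mm->mmap_sem);
         *      mm->pinned_vm += npages;
         *      up_write(&mm->mmap_sem);
         */
        atomic64_add(npages, &mm->pinned_vm);   /* now lock-free */
    }

    static void unaccount_pinned(struct mm_struct *mm, long npages)
    {
        atomic64_sub(npages, &mm->pinned_vm);   /* no deferred workqueue needed */
    }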
2019-02-05  IB/hfi1: Prioritize the sending of ACK packets  (Kaike Wan)
ACK packets are generally associated with request completion and resource release, and therefore should be sent first. This patch optimizes the send engine with the following policies (see the weighting sketch below):
(1) QPs with the RVT_S_ACK_PENDING bit set in qp->s_flags or qpriv->s_flags have their priority incremented;
(2) QPs with an ACK or TID-ACK packet queued have their priority incremented;
(3) when a QP is queued on the wait list due to resource constraints, it is queued at the head if it has an ACK packet to send;
(4) when selecting QPs to run from the wait list, the one with the highest priority and starve_cnt is selected; each priority level is equivalent to a fixed number of starve counts (16).
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
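A hypothetical sketch of the weighting rule in policies (1), (2) and (4); the struct and field names are placeholders, except RVT_S_ACK_PENDING:

    #define QP_PRIO_WEIGHT  16      /* one priority level = 16 starve counts */

    struct waiting_qp {             /* hypothetical */
        u32 s_flags;
        bool ack_queued;
        u32 starve_cnt;
    };

    static u32 wait_list_weight(const struct waiting_qp *qp)
    {
        u32 prio = 0;

        if (qp->s_flags & RVT_S_ACK_PENDING)    /* policy (1) */
            prio++;
        if (qp->ack_queued)                     /* policy (2) */
            prio++;
        return prio * QP_PRIO_WEIGHT + qp->starve_cnt;  /* policy (4) */
    }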
2019-02-05  IB/hfi1: Add static trace for TID RDMA WRITE protocol  (Kaike Wan)
This patch makes the following changes to the static trace: 1. adds decoding of TID RDMA WRITE packets to the IB header trace; 2. adds trace events for the various stages of the TID RDMA WRITE protocol. These events provide fine-grained control for monitoring and debugging the hfi1 driver in the field. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-02-05  IB/hfi1: Enable TID RDMA WRITE protocol  (Kaike Wan)
This patch enables the TID RDMA WRITE protocol by converting a qualified RDMA WRITE request into a TID RDMA WRITE request internally (see the sketch below):
(1) the TID RDMA capability must be enabled;
(2) the request must start on a 4K page boundary;
(3) the request length must be a multiple of 4K and greater than or equal to 256K.
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
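A sketch of the qualification test enumerated above; the function and parameter names are hypothetical:

    #include <linux/sizes.h>

    static bool qualifies_for_tid_rdma_write(bool tid_rdma_capable,
                                             u64 vaddr, u64 len)
    {
        if (!tid_rdma_capable)          /* (1) capability enabled */
            return false;
        if (vaddr & (SZ_4K - 1))        /* (2) 4K page boundary */
            return false;
        if (len & (SZ_4K - 1))          /* (3) multiple of 4K ... */
            return false;
        return len >= SZ_256K;          /* ... and at least 256K */
    }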
2019-02-05  IB/hfi1: Add interlock between TID RDMA WRITE and other requests  (Kaike Wan)
This locking mechanism is designed to prevent various memory-corruption scenarios from occurring when requests are pipelined, especially when RDMA WRITE requests are interleaved with TID RDMA READ requests: 1. READ-AFTER-READ; 2. READ-AFTER-WRITE; 3. WRITE-AFTER-READ; 4. WRITE-AFTER-WRITE. When memory corruption is likely, a request will be held back until the previous requests have completed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-02-05  IB/hfi1: Add TID RDMA WRITE functionality into RDMA verbs  (Kaike Wan)
This patch integrates TID RDMA WRITE protocol into normal RDMA verbs framework. The TID RDMA WRITE protocol is an end-to-end protocol between the hfi1 drivers on two OPA nodes that converts a qualified RDMA WRITE request into a TID RDMA WRITE request to avoid data copying on the responder side. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-02-05  IB/hfi1: Add the dual leg code  (Kaike Wan)
The "Second Leg" of the TID RDMA WRITE protocol deals with the transfer of data and ack packets, which are in the KDETH PSN space, as opposed to the IB PSN space. Therefore, the Second Leg could be considered as a separate state machine. As such, it is handled by a different work queue item which is scheduled along with the normal IB state machine work item. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-02-05  IB/hfi1: Add the TID second leg ACK packet builder  (Kaike Wan)
This patch adds the TID packet builder for the responder side, which contains the state machine to build TID RDMA ACK packet for either TID RDMA WRITE DATA or TID RDMA RESYNC packets. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-02-05  IB/hfi1: Add the TID second leg send packet builder  (Kaike Wan)
To improve performance, the TID RDMA WRITE protocol is designed to use a second leg to send data and ack packets in the KDETH PSN space. This patch adds the packet builder for the requester side, which contains the state machine to build TID RDMA WRITE DATA and TID RDMA RESYNC packets. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-02-05  IB/hfi1: Resend the TID RDMA WRITE DATA packets  (Kaike Wan)
This patch adds the logic to resend TID RDMA WRITE DATA packets. The tracking indices will be reset properly so that the correct TID entries will be used. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-02-05  IB/hfi1: Add a function to receive TID RDMA RESYNC packet  (Kaike Wan)
This patch adds a function to receive TID RDMA RESYNC packet on the responder side. The QP's hardware flow will be updated and all allocated software flows will be updated accordingly in order to drop all stale packets. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-02-05  IB/hfi1: Add a function to build TID RDMA RESYNC packet  (Kaike Wan)
This patch adds a function to build TID RDMA RESYNC packet, which is sent by the requester to notify the responder that no TID RDMA ACK packet has been received for a given KDETH PSN. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>