summaryrefslogtreecommitdiff
path: root/drivers/infiniband/core/verbs.c
AgeCommit message (Collapse)Author
2017-05-01IB/core: Define 'ib' and 'roce' rdma_ah_attr typesDasaratharaman Chandramouli
rdma_ah_attr can now be either ib or roce allowing core components to use one type or the other and also to define attributes unique to a specific type. struct ib_ah is also initialized with the type when its first created. This ensures that calls such as modify_ah dont modify the type of the address handle attribute. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Use rdma_ah_attr accessor functionsDasaratharaman Chandramouli
Modify core and driver components to use accessor functions introduced to access individual fields of rdma_ah_attr Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Rename ib_destroy_ah to rdma_destroy_ahDasaratharaman Chandramouli
Rename ib_destroy_ah to rdma_destroy_ah so its in sync with the rename of the ib address handle attribute Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Rename ib_query_ah to rdma_query_ahDasaratharaman Chandramouli
Rename ib_query_ah to rdma_query_ah so its in sync with the rename of the ib address handle attribute Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Rename ib_modify_ah to rdma_modify_ahDasaratharaman Chandramouli
Rename ib_modify_ah to rdma_modify_ah so its in sync with the rename of the ib address handle attribute Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Rename ib_create_ah to rdma_create_ahDasaratharaman Chandramouli
Rename ib_create_ah to rdma_create_ah so its in sync with the rename of the ib address handle attribute Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Rename struct ib_ah_attr to rdma_ah_attrDasaratharaman Chandramouli
This patch simply renames struct ib_ah_attr to rdma_ah_attr as these fields specify attributes that are not necessarily specific to IB. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28IB/core: For multicast functions, verify that LIDs are multicast LIDsMichael J. Ruhl
The Infiniband spec defines "A multicast address is defined by a MGID and a MLID" (section 10.5). Currently the MLID value is not validated. Add check to verify that the MLID value is in the correct address range. Fixes: 0c33aeedb2cf ("[IB] Add checks to multicast attach and detach") Cc: stable@vger.kernel.org Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19IB/core: Add support for draining IB_POLL_DIRECT completion queuesBart Van Assche
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-27IB/core: Add inline function to validate portYuval Shaia
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13IB/core: Support rate limit for packet pacingBodong Wang
Add new member rate_limit to ib_qp_attr which holds the packet pacing rate in kbps, 0 means unlimited. IB_QP_RATE_LIMIT is added to ib_attr_mask and could be used by RAW QPs when changing QP state from RTR to RTS, RTS to RTS. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13IB/core: Let create_ah return extended response to userMoni Shoua
Add struct ib_udata to the signature of create_ah callback that is implemented by IB device drivers. This allows HW drivers to return extra data to the userspace library. This patch prepares the ground for mlx5 driver to resolve destination mac address for a given GID and return it to userspace. This patch was previously submitted by Knut Omang as a part of the patch set to support Oracle's Infiniband HCA (SIF). Signed-off-by: Knut Omang <knut.omang@oracle.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13IB/core: Change ib_resolve_eth_dmac to use it in create AHMoni Shoua
The function ib_resolve_eth_dmac() requires struct qp_attr * and qp_attr_mask as parameters while the function might be useful to resolve dmac for address handles. This patch changes the signature of the function so it can be used in the flow of creating an address handle. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-16IB/mlx4: Handle IPv4 header when demultiplexing MADMoni Shoua
When MAD arrives to the hypervisor, we need to identify which slave it should be sent by destination GID. When L3 protocol is IPv4 the GRH is replaced by an IPv4 header. This patch detects when IPv4 header needs to be parsed instead of GRH. Fixes: b6ffaeffaea4 ('mlx4: In RoCE allow guests to have multiple GIDS') Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-16IB/core: Save QP in ib_flow structureMark Bloch
When we create flow steering rule, we need to save the related QP in the ib_flow struct. this QP is used in destroy flow. Move the QP assignment from ib_uverbs_ex_create_flow into ib_create_flow, this would allow both kernel and userspace consumers to use it. This bug wasn't seen in the wild because there are no kernel consumers currently in the kernel. Fixes: 319a441d1361 ("IB/core: Add receive flow steering support") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/core: correctly handle rdma_rw_init_mrs() failureSteve Wise
Function ib_create_qp() was failing to return an error when rdma_rw_init_mrs() fails, causing a crash further down in ib_create_qp() when trying to dereferece the qp pointer which was actually a negative errno. The crash: crash> log|grep BUG [ 136.458121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 crash> bt PID: 3736 TASK: ffff8808543215c0 CPU: 2 COMMAND: "kworker/u64:2" #0 [ffff88084d323340] machine_kexec at ffffffff8105fbb0 #1 [ffff88084d3233b0] __crash_kexec at ffffffff81116758 #2 [ffff88084d323480] crash_kexec at ffffffff8111682d #3 [ffff88084d3234b0] oops_end at ffffffff81032bd6 #4 [ffff88084d3234e0] no_context at ffffffff8106e431 #5 [ffff88084d323530] __bad_area_nosemaphore at ffffffff8106e610 #6 [ffff88084d323590] bad_area_nosemaphore at ffffffff8106e6f4 #7 [ffff88084d3235a0] __do_page_fault at ffffffff8106ebdc #8 [ffff88084d323620] do_page_fault at ffffffff8106f057 #9 [ffff88084d323660] page_fault at ffffffff816e3148 [exception RIP: ib_create_qp+427] RIP: ffffffffa02554fb RSP: ffff88084d323718 RFLAGS: 00010246 RAX: 0000000000000004 RBX: fffffffffffffff4 RCX: 000000018020001f RDX: ffff880830997fc0 RSI: 0000000000000001 RDI: ffff88085f407200 RBP: ffff88084d323778 R8: 0000000000000001 R9: ffffea0020bae210 R10: ffffea0020bae218 R11: 0000000000000001 R12: ffff88084d3237c8 R13: 00000000fffffff4 R14: ffff880859fa5000 R15: ffff88082eb89800 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff88084d323780] rdma_create_qp at ffffffffa0782681 [rdma_cm] #11 [ffff88084d3237b0] nvmet_rdma_create_queue_ib at ffffffffa07c43f3 [nvmet_rdma] #12 [ffff88084d323860] nvmet_rdma_alloc_queue at ffffffffa07c5ba9 [nvmet_rdma] #13 [ffff88084d323900] nvmet_rdma_queue_connect at ffffffffa07c5c96 [nvmet_rdma] #14 [ffff88084d323980] nvmet_rdma_cm_handler at ffffffffa07c6450 [nvmet_rdma] #15 [ffff88084d3239b0] iw_conn_req_handler at ffffffffa0787480 [rdma_cm] #16 [ffff88084d323a60] cm_conn_req_handler at ffffffffa0775f06 [iw_cm] #17 [ffff88084d323ab0] process_event at ffffffffa0776019 [iw_cm] #18 [ffff88084d323af0] cm_work_handler at ffffffffa0776170 [iw_cm] #19 [ffff88084d323cb0] process_one_work at ffffffff810a1483 #20 [ffff88084d323d90] worker_thread at ffffffff810a211d #21 [ffff88084d323ec0] kthread at ffffffff810a6c5c #22 [ffff88084d323f50] ret_from_fork at ffffffff816e1ebf Fixes: 632bc3f65081 ("IB/core, RDMA RW API: Do not exceed QP SGE send limit") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Cc: stable@vger.kernel.org Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/core: Improve ib_map_mr_sg() documentationBart Van Assche
Document that ib_map_mr_sg() is able to map physically discontiguous sg-lists as a single MR. Change IB_MR_TYPE_SG_GAPS_REG into IB_MR_TYPE_SG_GAPS. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@rimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/core: remove ib_get_dma_mrChristoph Hellwig
We now only use it from ib_alloc_pd to create a local DMA lkey if the device doesn't provide one, or a global rkey if the ULP requests it. This patch removes ib_get_dma_mr and open codes the functionality in ib_alloc_pd so that we can simplify the code and prevent abuse of the functionality. As a side effect we can also simplify things by removing the valid access bit check, and the PD refcounting. In the future I hope to also remove the per-PD global MR entirely by shifting this work into the HW drivers, as one step towards avoiding the struct ib_mr overload for various different use cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/core: add support to create a unsafe global rkey to ib_create_pdChristoph Hellwig
Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or less unchecked, this moves the capability of creating a global rkey into the RDMA core, where it can be easily audited. It also prints a warning everytime this feature is used as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/core: rename pd->local_mr to pd->__internal_mrChristoph Hellwig
This has two reasons: a) to clearly mark that drivers don't have any business using it, and b) because we're going to use it for the (dangerous) global rkey soon, so that drivers don't create on themselves. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-04Merge tag 'for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull second round of rdma updates from Doug Ledford: "This can be split out into just two categories: - fixes to the RDMA R/W API in regards to SG list length limits (about 5 patches) - fixes/features for the Intel hfi1 driver (everything else) The hfi1 driver is still being brought to full feature support by Intel, and they have a lot of people working on it, so that amounts to almost the entirety of this pull request" * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits) IB/hfi1: Add cache evict LRU list IB/hfi1: Fix memory leak during unexpected shutdown IB/hfi1: Remove unneeded mm argument in remove function IB/hfi1: Consistently call ops->remove outside spinlock IB/hfi1: Use evict mmu rb operation IB/hfi1: Add evict operation to the mmu rb handler IB/hfi1: Fix TID caching actions IB/hfi1: Make the cache handler own its rb tree root IB/hfi1: Make use of mm consistent IB/hfi1: Fix user SDMA racy user request claim IB/hfi1: Fix error condition that needs to clean up IB/hfi1: Release node on insert failure IB/hfi1: Validate SDMA user iovector count IB/hfi1: Validate SDMA user request index IB/hfi1: Use the same capability state for all shared contexts IB/hfi1: Prevent null pointer dereference IB/hfi1: Rename TID mmu_rb_* functions IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister() IB/hfi1: Restructure hfi1_file_open IB/hfi1: Make iovec loop index easy to understand ...
2016-08-04Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull base rdma updates from Doug Ledford: "Round one of 4.8 code: while this is mostly normal, there is a new driver in here (the driver was hosted outside the kernel for several years and is actually a fairly mature and well coded driver). It amounts to 13,000 of the 16,000 lines of added code in here. Summary: - Updates/fixes for iw_cxgb4 driver - Updates/fixes for mlx5 driver - Add flow steering and RSS API - Add hardware stats to mlx4 and mlx5 drivers - Add firmware version API for RDMA driver use - Add the rxe driver (this is a software RoCE driver that makes any Ethernet device a RoCE device) - Fixes for i40iw driver - Support for send only multicast joins in the cma layer - Other minor fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits) Soft RoCE driver IB/core: Support for CMA multicast join flags IB/sa: Add cached attribute containing SM information to SA port IB/uverbs: Fix race between uverbs_close and remove_one IB/mthca: Clean up error unwind flow in mthca_reset() IB/mthca: NULL arg to pci_dev_put is OK IB/hfi1: NULL arg to sc_return_credits is OK IB/mlx4: Add diagnostic hardware counters net/mlx4: Query performance and diagnostics counters net/mlx4: Add diagnostic counters capability bit Use smaller 512 byte messages for portmapper messages IB/ipoib: Report SG feature regardless of HW UD CSUM capability IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct IB/hfi1: Disable by default IB/rdmavt: Disable by default IB/mlx5: Fix port counter ID association to QP offset IB/mlx5: Fix iteration overrun in GSI qps i40iw: Add NULL check for puda buffer i40iw: Change dup_ack_thresh to u8 i40iw: Remove unnecessary check for moving CQ head ...
2016-08-02IB/core, RDMA RW API: Do not exceed QP SGE send limitBart Van Assche
Compute the SGE limit for RDMA READ and WRITE requests in ib_create_qp(). Use that limit in the RDMA RW API implementation. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Parav Pandit <pandit.parav@gmail.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Laurence Oberman <loberman@redhat.com> Cc: <stable@vger.kernel.org> #v4.7+ Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Extend create QP to get indirection tableYishai Hadas
Extend create QP to get Receive Work Queue (WQ) indirection table. QP can be created with external Receive Work Queue indirection table, in that case it is ready to receive immediately. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Introduce Receive Work Queue indirection tableYishai Hadas
Introduce Receive Work Queue (WQ) indirection table. This object can be used to spread incoming traffic to different receive Work Queues. A Receive WQ indirection table points to variable size of WQs. This table is given to a QP in downstream patches. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimerg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Introduce Work Queue object and its verbsYishai Hadas
Introduce Work Queue object and its create/destroy/modify verbs. QP can be created without internal WQs "packaged" inside it, this QP can be configured to use "external" WQ object as its receive/send queue. WQ is a necessary component for RSS technology since RSS mechanism is supposed to distribute the traffic between multiple Receive Work Queues. WQ associated (many to one) with Completion Queue and it owns WQ properties (PD, WQ size, etc.). WQ has a type, this patch introduces the IB_WQT_RQ (i.e.receive queue), it may be extend to others such as IB_WQT_SQ. (send queue). WQ from type IB_WQT_RQ contains receive work requests. PD is an attribute of a work queue (i.e. send/receive queue), it's used by the hardware for security validation before scattering to a memory region which is pointed by the WQ. For that, an external WQ object needs a PD, letting the hardware makes that validation. When accessing a memory region that is pointed by the WQ its PD is used and not the QP's PD, this behavior is similar to a SRQ and a QP. WQ context is subject to a well-defined state transitions done by the modify_wq verb. When WQ is created its initial state becomes IB_WQS_RESET. >From IB_WQS_RESET it can be modified to itself or to IB_WQS_RDY. >From IB_WQS_RDY it can be modified to itself, to IB_WQS_RESET or to IB_WQS_ERR. >From IB_WQS_ERR it can be modified to IB_WQS_RESET. Note: transition to IB_WQS_ERR might occur implicitly in case there was some HW error. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Fix false search of the IB_SA_WELL_KNOWN_GUIDEli Cohen
When virtualziation is supported, VFs may send SA MADs to a GID formed by the concatenation of the subnet prefix with the IB_SA_WELL_KNOWN_GUID. When a response is required, the current code will search the local HCA's port for the received GID to figure out the GID index of the entry containing this GID. However, since this is not a real GID it will not be found and error will be printed. We change the logic to check if the destination GID is this special GID and avoid lookup in this case and use GID index 0. Fixes: a0c1b2a35087 ('IB/core: Support accessing SA in virtualized environment') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: Enhance ib_map_mr_sg()Bart Van Assche
The SRP initiator allows to set max_sectors to a value that exceeds the largest amount of data that can be mapped at once with an mlx4 HCA using fast registration and a page size of 4 KB. Hence modify ib_map_mr_sg() such that it can map partial sg-elements. If an sg-element has been mapped partially, let the caller know which fraction has been mapped by adjusting *sg_offset. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: add RW API support for signature MRsChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: generic RDMA READ/WRITE APIChristoph Hellwig
This supports both manual mapping of lots of SGEs, as well as using MRs from the QP's MR pool, for iWarp or other cases where it's more optimal. For now, MRs are only used for iWARP transports. The user of the RDMA-RW API must allocate the QP MR pool as well as size the SQ accordingly. Thanks to Steve Wise for testing, fixing and rewriting the iWarp support, and to Sagi Grimberg for ideas, reviews and fixes. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: add a need_inval flag to struct ib_mrSteve Wise
This is the first step toward moving MR invalidation decisions to the core. It will be needed by the upcoming RW API. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: add a simple MR poolChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: refactor ib_create_qpChristoph Hellwig
Split the XRC magic into a separate function, and return early on failure to make the initialization code readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: Add passing an offset into the SG to ib_map_mr_sgChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-26IB/core: Don't drain non-existent rq queue-pairSagi Grimberg
The drain_rq function expects a normal receive qp to drain. A qp can only have either a normal rq or an srq. If there is an srq, there is no rq to drain. Until the API supports draining SRQs, simply skip draining the rq when the qp has an srq attached. Fixes: 765d67748bcf ("IB: new common API for draining queues") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21IB/core: Add interfaces to control VF attributesEli Cohen
Following the practice exercised for network devices which allow the PF net device to configure attributes of its virtual functions, we introduce the following functions to be used by IPoIB which is the network driver implementation for IB devices. ib_set_vf_link_state - set the policy for a VF link. More below. ib_get_vf_config - read configuration information of a VF ib_get_vf_stats - read VF statistics ib_set_vf_guid - set the node or port GUID of a VF Also add an indication in the device cap flags that indicates that this IB devices is based on a virtual function. A VF shares the physical port with the PF and other VFs. When setting the link state we have three options: 1. Auto - in this mode, the virtual port follows the state of the physical port and becomes active only if the physical port's state is active. In all other cases it remains in a Down state. 2. Down - sets the state of the virtual port to Down 3. Up - causes the virtual port to transition into Initialize state if it was not already in this state. A virtualization aware subnet manager can then bring the state of the port into the Active state. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-16Merge branches 'mlx4', 'mlx5' and 'ocrdma' into k.o/for-4.6Doug Ledford
2016-03-04IB/core: Add arbitrary sg_list supportSagi Grimberg
Devices that are capable in registering SG lists with gaps can now expose it in the core to ULPs using a new device capability IB_DEVICE_SG_GAPS_REG (in a new field device_cap_flags_ex in the device attributes as we ran out of bits), and a new mr_type IB_MR_TYPE_SG_GAPS_REG which allocates a memory region which is capable of handling SG lists with gaps. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-29IB: new common API for draining queuesSteve Wise
Add provider-specific drain_sq/drain_rq functions for providers needing special drain logic. Add static functions __ib_drain_sq() and __ib_drain_rq() which post noop WRs to the SQ or RQ and block until their completions are processed. This ensures the applications completions for work requests posted prior to the drain work request have all been processed. Add API functions ib_drain_sq(), ib_drain_rq(), and ib_drain_qp(). For the drain logic to work, the caller must: ensure there is room in the CQ(s) and QP for the drain work request and completion. allocate the CQ using ib_alloc_cq() and the CQ poll context cannot be IB_POLL_DIRECT. ensure that there are no other contexts that are posting WRs concurrently. Otherwise the drain is not guaranteed. Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19IB/core: Use hop-limit from IP stack for RoCEMatan Barak
Previously, IPV6_DEFAULT_HOPLIMIT was used as the hop limit value for RoCE. Fixing that by taking ip4_dst_hoplimit and ip6_dst_hoplimit as hop limit values. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19IB/core: Rename rdma_addr_find_dmac_by_grhMatan Barak
rdma_addr_find_dmac_by_grh resolves dmac, vlan_id and if_index and downsteram patch will also add hop_limit as an output parameter, thus we rename it to rdma_addr_find_l2_eth_by_grh. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19IB/core: Remove set-but-not-used variable from ib_sg_to_pages()Bart Van Assche
Detected this by building the IB core with W=1. See also patch "IB core: Fix ib_sg_to_pages()" (commit 8f5ba10ed40a). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Leon Romanovsky <leon.romanovsky@mellanox.com> Acked-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23IB: remove the write-only usecnt field from struct ib_mrChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23IB: remove in-kernel support for memory windowsChristoph Hellwig
Remove the unused ib_allow_mw and ib_bind_mw functions, remove the unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw into the uverbs module. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23IB: remove ib_query_mrChristoph Hellwig
This functionality has no users and was only supported by the staged out EHCA driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23IB/core: Validate route when we init ahMatan Barak
In order to make sure API users don't try to use SGIDs which don't conform to the routing table, validate the route before searching the RoCE GID table. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23IB/core: Add rdma_network_type to wcSomnath Kotur
Providers should tell IB core the wc's network type. This is used in order to search for the proper GID in the GID table. When using HCAs that can't provide this info, IB core tries to deep examine the packet and extract the GID type by itself. We choose sgid_index and type from all the matching entries in RDMA-CM based on hint from the IP stack and we set hop_limit for the IP packet based on above hint from IP stack. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Somnath Kotur <Somnath.Kotur@Avagotech.Com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23IB/core: Add gid_type to gid attributeMatan Barak
In order to support multiple GID types, we need to store the gid_type with each GID. This is also aligned with the RoCE v2 annex "RoCEv2 PORT GID table entries shall have a "GID type" attribute that denotes the L3 Address type". The currently supported GID is IB_GID_TYPE_IB which is also RoCE v1 GID type. This implies that gid_type should be added to roce_gid_table meta-data. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-22IB/core: Avoid calling ib_query_deviceOr Gerlitz
Use the cached copy of the attributes present on the device, except for the case of a query originating from user-space, where we have to invoke the driver query_device entry, so they can fill in their udata. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07IB core: Fix ib_sg_to_pages()Bart Van Assche
On 12/03/2015 01:18 AM, Christoph Hellwig wrote: > The patch looks good to me, but while we touch this area, how about > throwing in a few cosmetic fixes as well? How about the patch below ? In that version of the ib_sg_to_pages() fix these concerns have been addressed and additionally to more bugs have been fixed. ------------ [PATCH] IB core: Fix ib_sg_to_pages() Fix the code for detecting gaps. A gap occurs not only if the second or later scatterlist element is not aligned but also if any scatterlist element other than the last does not end at a page boundary. In the code for coalescing contiguous elements, ensure that mr->length is correct and that last_page_addr is up-to-date. Ensure that this function returns a negative error code instead of zero if the first set_page() call fails. Fixes: commit 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API") Reported-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>