summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-04-21IB/ipoib: Update broadcast object if PKey value was changed in index 0Feras Daoud
Update the broadcast address in the priv->broadcast object when the Pkey value changes in index 0, otherwise the multicast GID value will keep the previous value of the PKey, and will not be updated. This leads to interface state down because the interface will keep the old PKey value. For example, in SR-IOV environment, if the PF changes the value of PKey index 0 for one of the VFs, then the VF receives PKey change event that triggers heavy flush. This flush calls update_parent_pkey that update the broadcast object and its relevant members. If in this case the multicast GID will not be updated, the interface state will be down. Fixes: c2904141696e ("IPoIB: Fix pkey change flow for virtualization environments") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21IB/rxe: Cache dst in QP instead of getting it for each sendyonatanc
In RC QP there is no need to resolve the outgoing interface for each packet, as this does not change during QP life cycle. Instead cache the interface on the socket and use that one. This improves performance by 12% by sparing redundant calls to rxe_find_route. ib_send_bw -d rxe0 -x 1 -n 9000 -e -s $((1024 * 1024 )) -l 100 ---------------------------------------------------------------------------------------- | | bytes | iterations | BW peak[MB/sec] | BW average[MB/sec] | MsgRate[Mpps] | ---------------------------------------------------------------------------------------- | before | 1048576 | 9000 | inf | 551.21 | 0.000551 | | after | 1048576 | 9000 | inf | 615.54 | 0.000616 | ---------------------------------------------------------------------------------------- Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21IB/rxe: Offload CRC calculation when possibleyonatanc
Use CPU ability to perform CRC calculations, by replacing direct calls to crc32_le() with crypto_shash_updata(). The overall performance gain measured with ib_send_bw tool is 10% and it was tested on "Intel CPU ES-2660 v2 @ 2.20Ghz" CPU. ib_send_bw -d rxe0 -x 1 -n 9000 -e -s $((1024 * 1024 )) -l 100 --------------------------------------------------------------------------------------------- | | bytes | iterations | BW peak[MB/sec] | BW average[MB/sec] | MsgRate[Mpps] | --------------------------------------------------------------------------------------------- | crc32_le | 1048576 | 9000 | inf | 497.60 | 0.000498 | | CRC offload | 1048576 | 9000 | inf | 546.70 | 0.000547 | --------------------------------------------------------------------------------------------- Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21IB/rxe: Do not export module's private functionParav Pandit
Function rxe_rcv is used internally in RXE and don't need to be exported. This patch removes such export declaration. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21IB/rxe: Avoid accessing timers for non RC QPsParav Pandit
This patch avoids RNR NAK timer and retransmit timer initialization and cleanup for non RC QPs (such as UD QP, GSI QP). Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21IB/rxe: Add port protocol statsYonatan Cohen
Expose new counters using the get_hw_stats callback. We expose the following counters: +---------------------+----------------------------------------+ | Name | Description | |---------------------+----------------------------------------| |sent_pkts | number of sent pkts | |---------------------+----------------------------------------| |rcvd_pkts | number of received packets | |---------------------+----------------------------------------| |out_of_sequence | number of errors due to packet | | | transport sequence number | |---------------------+----------------------------------------| |duplicate_request | number of received duplicated packets. | | | A request that previously executed is | | | named duplicated. | |---------------------+----------------------------------------| |rcvd_rnr_err | number of received RNR by completer | |---------------------+----------------------------------------| |send_rnr_err | number of sent RNR by responder | |---------------------+----------------------------------------| |rcvd_seq_err | number of out of sequence packets | | | received | |---------------------+----------------------------------------| |ack_deffered | number of deferred handling of ack | | | packets. | |---------------------+----------------------------------------| |retry_exceeded_err | number of times retry exceeded | |---------------------+----------------------------------------| |completer_retry_err | number of times completer decided to | | | retry | |---------------------+----------------------------------------| |send_err | number of failed send packet | +---------------------+----------------------------------------+ Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20cxgb4: Convert PDBG to pr_debug the secondDoug Ledford
A couple spots were missed in the original patch to implement this change. Add those spots. Fixes: a9a42886d0b3 (cxgb4: Convert PDBG to pr_debug) Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/hns: Use kcalloc() in hns_roce_buddy_init()Markus Elfring
* Multiplications for the size determination of memory allocations indicated that array data structures should be processed. Thus use the corresponding function "kcalloc". This issue was detected by using the Coccinelle software. * Replace the specification of data types by pointer dereferences to make the corresponding size determinations a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/hns: Use kmalloc_array() in hns_roce_cmd_use_events()Markus Elfring
* A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "kmalloc_array". This issue was detected by using the Coccinelle software. * Replace the specification of a data structure by a pointer dereference to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/hfi1: Coding style improvement (make sizeof use safer)Markus Elfring
Replace the specification of a data structure by a reference to the desired member as the parameter for the operator "sizeof" to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/hfi1: Remove intermediate var in hfi1_user_sdma_alloc_queues()Markus Elfring
* Pass a product for a call of the function "vmalloc_user" without storing it in an intermediate variable. * Delete the local variable "memsize" which became unnecessary with this refactoring. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/hfi1: Use kcalloc() in hfi1_user_sdma_alloc_queues()Markus Elfring
* Multiplications for the size determination of memory allocations indicated that array data structures should be processed. Thus reuse the corresponding function "kcalloc". This issue was detected by using the Coccinelle software. * Replace the specification of a data type by a pointer dereference to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/hfi1: Use kcalloc() in hfi1_user_exp_rcv_init()Markus Elfring
* A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus reuse the corresponding function "kcalloc". This issue was detected by using the Coccinelle software. * Replace the specification of a data type by a pointer dereference to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20cxgb4: Convert PDBG to pr_debugJoe Perches
Use a more typical logging style. Miscellanea: o Obsolete the c4iw_debug module parameter o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20cxgb4: Use more common logging styleJoe Perches
Convert printks to pr_<level> Miscellanea: o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20cxgb3: Convert PDBG to pr_debugJoe Perches
Using the normal mechanism, not an indirected one, is clearer. Miscellanea: o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20cxgb3: Use more common logging styleJoe Perches
Convert printks to pr_<level> Miscellanea: o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/IPoIB: Support acceleration options callbacksErez Shitrit
IPoIB driver now uses the new set of callback functions. If the hardware provider supports the new ipoib_options implementation, the driver uses the callbacks in its data path flows, otherwise it uses the driver default implementation for all data flows in its code. The default implementation wasn't change and it is exactly as it was before introduction of acceleration support. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/IPoIB: Use defined function for netdev_priv functionErez Shitrit
Make ipoib_priv point to netdev_priv where the code calls netdev_priv. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/IPoIB: Rename qpn to be dqpn in ipoib_send and post_send functionsErez Shitrit
Change of function parameter name from qpn to be dqpn. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/IPoIB: Separate control from HW operation on ipoib_open/stop ndoErez Shitrit
This patch is preparing the netdev part at the IPoIB driver to be able to use the ipoib_options. It deals with the two flows from the .ndo: ipoib_open and ipoib_stop. The code is rearranged as follows: * All operations which deal with the hardware resources, (for example change QP state, post-receive etc.) are performed in one place. * All operations that are control oriented (like restart multicast task, start the reap_ah etc.) are performed in separate place. The functions that deal with the hardware resources now located at __ipoib_ib_dev_open for the ipoib_open flow and __ipoib_ib_dev_stop for ipoib_stop. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/IPoIB: Separate control and data related initializationsErez Shitrit
This patch prepares init and teardown flows so we can call them through ipoib_options function pointers. It arranges that area of code as the following: * All operations which deal with the resource allocation/deletion are performed in one place. * All operations that are control oriented, meaning that they are not connected to a specific hardware, are performed in a separate place. The operations for allocation of hardware resources are now in the function ipoib_dev_init_default, and the deletion of all the resources are in ipoib_dev_uninit_default The only exception is the creation of the PD object, which is used both for resource allocation (create QP etc.) and for control flows like creating AH. It also does: * Move creation of rx_ring and tx_ring to be in the resources allocation area. * Move the function ipoib_ib_dev_open that does the open device to the control area instead of the dev_init which creates resources. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/IPoIB: Introduce RDMA netdev interface and IPoIB structsNiranjana Vishwanathapura
Add RDMA netdev interface to ib device structure allowing RDMA netdev devices to be allocated by ib clients. The idea is to allow to providers to optimize IPoIB data path. New struct that includes functions and data member is exposed. It exposes set of callback functions for handling data path flows in IPoIB driver. Each provider can support these set of functions in order to optimize its specific data path, and let IPoIB to leverage its data path. There is an assumption, that providers should give the full set of functions and not only part of them, in order to work properly. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/hfi1: VNIC SDMA supportVishwanathapura, Niranjana
HFI1 VNIC SDMA support enables transmission of VNIC packets over SDMA. Map VNIC queues to SDMA engines and support halting and wakeup of the VNIC queues. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/hfi1: Virtual Network Interface Controller (VNIC) HW supportVishwanathapura, Niranjana
HFI1 HW specific support for VNIC functionality. Dynamically allocate a set of contexts for VNIC when the first vnic port is instantiated. Allocate VNIC contexts from user contexts pool and return them back to the same pool while freeing up. Set aside enough MSI-X interrupts for VNIC contexts and assign them when the contexts are allocated. On the receive side, use an RSM rule to spread TCP/UDP streams among VNIC contexts. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/hfi1: OPA_VNIC RDMA netdev supportVishwanathapura, Niranjana
Add support to create and free OPA_VNIC rdma netdev devices. Implement netstack interface functionality including xmit_skb, receive side NAPI etc. Also implement rdma netdev control functions. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/opa-vnic: VNIC Ethernet Management Agent (VEMA) functionVishwanathapura, Niranjana
OPA VEMA function interfaces with the Infiniband MAD stack to exchange the management information packets with the Ethernet Manager (EM). It interfaces with the OPA VNIC netdev function to SET/GET the management information. The information exchanged with the EM includes class port details, encapsulation configuration, various counters, unicast and multicast MAC list and the MAC table. It also supports sending traps to the EM. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/opa-vnic: VNIC Ethernet Management Agent (VEMA) interfaceVishwanathapura, Niranjana
OPA VNIC EMA interface functions are the management interfaces to the OPA VNIC netdev. Add support to add and remove VNIC ports. Implement the required GET/SET management interface functions and processing of new management information. Add support to send trap notifications upon various events like interface status change, unicast/multicast mac list update and mac address change. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/opa-vnic: VNIC MAC table supportVishwanathapura, Niranjana
OPA VNIC MAC table contains the MAC address to DLID mappings provided by the Ethernet manager. During transmission, the MAC table provides the MAC address to DLID translation. Implement MAC table using simple hash list. Also provide support to update/query the MAC table by Ethernet manager. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/opa-vnic: VNIC statistics supportVishwanathapura, Niranjana
OPA VNIC driver statistics support maintains various counters including standard netdev counters and the Ethernet manager defined counters. Add the Ethtool hook to read the counters. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/opa-vnic: VNIC Ethernet Management (EM) structure definitionsVishwanathapura, Niranjana
Define VNIC EM MAD structures and the associated macros. These structures are used for information exchange between VNIC EM agent (EMA) on the host and the Ethernet manager. These include the virtual ethernet switch (vesw) port information, vesw port mac table, summay and error counters, vesw port interface mac lists and the EMA trap. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/opa-vnic: Virtual Network Interface Controller (VNIC) netdevVishwanathapura, Niranjana
OPA VNIC netdev function supports Ethernet functionality over Omni-Path fabric by encapsulating Ethernet packets inside Omni-Path packet header. It allocates a rdma netdev device and interfaces with the network stack to provide standard Ethernet network interfaces. It overrides HFI1 device's netdev operations where it is required. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com> Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/opa-vnic: Virtual Network Interface Controller (VNIC) interfaceVishwanathapura, Niranjana
Define OPA VNIC interface between hardware independent VNIC functionality and the hardware dependent VNIC functionality. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/opa-vnic: RDMA NETDEV interfaceVishwanathapura, Niranjana
Add rdma netdev interface to ib device structure allowing rdma netdev devices to be allocated by ib clients. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/opa-vnic: Virtual Network Interface Controller (VNIC) documentationVishwanathapura, Niranjana
Add OPA VNIC design document explaining the VNIC architecture and the driver design. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20Merge branch 'k.o/for-4.12' into k.o/for-4.12-rdma-netdeviceDoug Ledford
2017-04-20IB/core: Rename uverbs event file structureMatan Barak
Previously, ib_uverbs_event_file was suffixed by _file as it contained the actual file information. Since it's now only used as base struct for ib_uverbs_async_event_file and ib_uverbs_completion_event_file, we change its name to ib_uverbs_event_queue. This represents its logical role better. Fixes: 1e7710f3f656 ('IB/core: Change completion channel to use the reworked objects schema') Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/core: Don't use is_async in event files to infer events sizeMatan Barak
Previously, we inferred the events size in ib_uverbs_event_read by using the is_async flag. Instead of that, we pass the event size directly. Fixes: 1e7710f3f656 ('IB/core: Change completion channel to use the reworked objects schema') Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/core: A small refactor in destroy WQ handlerMatan Barak
Instead of having uverbs_uobject_put both in the error flow and the good flow, we unite them. Fixes: fd3c7904db6e ('IB/core: Change idr objects to use the new schema') Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/core: Nullify ib_uobject during allocationMatan Barak
Currently, we initialize all fields of ib_uobject straight after allocation. Therefore, a kmalloc was sufficient. Since ib_uobject could be embedded in a type specific structure, we nullify it to spare programmer errors. Fixes: 3832125624b7 ('IB/core: Add support for idr types') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/core: Don't pass the lock state to _rdma_remove_commit_uobjectMatan Barak
The only scenario where this function was called while the lock is already taken is in the context cleanup scenario. Thus, in order not to pass the lock state to this function, we just call the remove logic straight from the cleanup context function. Fixes: 3832125624b7 ('IB/core: Add support for idr types') Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/core: Rename write flag to exclusive in rdma_coreMatan Barak
We rename the "write" flags to "exclusive", as it's used for both WRITE and DESTROY actions. Fixes: 3832125624b7 ('IB/core: Add support for idr types') Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-17Merge branch 'mlx5-RDMA-netdevice'David S. Miller
Saeed Mahameed says: ==================== Mellanox, mlx5 RDMA net device support This series provides the lower level mlx5 support of RDMA netdevice creation API [1] suggested and introduced by Intel's HFI OPA VNIC netdevice driver [2], to enable IPoIB mlx5 RDMA netdevice creation. mlx5 IPoIB RDMA netdev will serve as an acceleration netdevice for the current IPoIB ULP generic netdevice, providing: - mlx5 RSS support. - mlx5 HW RX,TX offloads (checksum, TSO, LRO, etc ..). - Full mlx5 HW features transparent to the ULP itself. The idea here is to reuse and benefit from the already implemented mlx5e netdevice management and channels API for both etherent and RDMA netdevices, since both IPoIB and Ethernet netdevices share same common mlx5 HW resources (with some small exceptions) and share most of the control/data path logic, it is more natural to have them share the same code. The differences between IPoIB and Ethernet netdevices can be summarized to: Steering: In mlx5, IPoIB traffic is sent and received from an underlay special QP, and in Ethernet the traffic is handled by vports and vport steering is managed by e-switch or FW. For IPoIB traffic to get steered correctly the only thing we need to do is to create RSS HW contexts for RX and TX HW contexts for TX (similar to mlx5e) with the underlay QP attached to them (underlay QP will be 0 in case of Ethernet). RX,TX: Since IPoIB traffic is different, slightly modified RX and TX handlers are required, still we do some code reuse in data path via common helper functions. All of the other generic netdevice and mlx5 aspects will be shared between mlx5 Ethernet and IPoIB netdevices, e.g. - Channels creation and handling (RQs,SQs,CQs, NAPI, interrupt moderation, etc..) - Offloads, checksum, GRO, LRO, TSO, and more. - netdevice logic and non Ethernet specific ndos (open/close, etc..) In order to achieve what we want: In patchet 1 to 3, Erez added the supported for underlay QP in mlx5_ifc and refactored the mlx5 steering code to accept the underlay QP as a parameter for creating steering objects and enabled flow steering for IB link. Then we are going to use the mlx5e netdevice profile, which is already used to separate between NIC and VF representors netdevices, to create new type of IPoIB netdevice profile. For that, one small refactoring is required to make mlx5e netdevice profile management more genetic and agnostic to link type which is done in patch #4. In patch #5, we introduce ipoib.c to host all of mlx5 IPoIB (mlx5i) specific logic and a skeleton for the IPoIB mlx5 netdevice profile, and we will start filling it in next patches, using mlx5e already existing APIs. Patch #6 and #7, Implement init/cleanup RX mlx5i netdev profile handlers to create mlx5 RSS resources, same as mlx5e but without vlan and L2 steering tables. Patch #8, Implement init/cleanup TX mlx5i netdev profile handlers, to create TX resources same as mlx5e but with one TC (tc = 0) support. Patch #9, Implement mlx5i open/close ndos, where we reuese the mlx5e channels API, to start/stop TX/RX channels. Patch #10, Create the underlay QP and attach it to mlx5i RSS and TX HW contexts. Patch #11 and #12, Break down the mlx5e xmit flow into smaller helper function and implement the mlx5i IPoIB xmit routine. Patch #13 and #14, Have an RX handler per netdevice profile. We already do this before this series in a non clean way to separate between NIC netdev and VF representor RX handlers, in patch 13 we make the RX handler generic and bound to a profile and in patch 14 we implement the IPoIB RX handlers. Patch #15, Small cleanup to avoid e-switch with IPoIB netdev. In order to enable mlx5 IPoIB, a merge between the IPoIB RDMA netdev offolad support [3] - which was alread submitted to the rdma mailing list - and this series is required plus an extra small patch [4] which will connect between both sides and actually enables the offload. Once both patch-sets are merged into linux we will have to submit the extra small patch [4], to enable the feature. Thanks, Saeed. [1] https://patchwork.kernel.org/patch/9676637/ [2] https://lwn.net/Articles/715453/ https://patchwork.kernel.org/patch/9587815/ [3] https://patchwork.kernel.org/patch/9672069/ [4] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/commit/?id=0141db6a686e32294dee015b7d07706162ba48d8 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17hw/mlx5: Add New bit to check over QP creationErez Shitrit
Add check for bit IB_QP_CREATE_NETIF_QP while creating QP. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net/mlx5e: E-switch vport manager is valid for ethernet onlySaeed Mahameed
Currently the driver support only ethernet eswitch, and we want to protect downstream IPoIB netdev from trying to access it in IB link. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net/mlx5e: IPoIB, RX handlerSaeed Mahameed
Implement IPoIB RX SKB handler. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net/mlx5e: RX handlers per netdev profileSaeed Mahameed
In order to have different RX handler per profile, fix and refactor the current code to take the rx handler directly from the netdevice profile rather than computing it on runtime as it was done with the switchdev mode representor rx handler. This will also remove the current wrong assumption in mlx5e_alloc_rq code that mlx5e_priv->ppriv is of the type vport_rep. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net/mlx5e: IPoIB, Xmit flowSaeed Mahameed
Implement mlx5e's IPoIB SKB transmit using the helper functions provided by mlx5e ethernet tx flow, the only difference in the code between mlx5e_xmit and mlx5i_xmit is that IPoIB has some extra fields to fill (UD datagram segment) in the TX descriptor (WQE) and it doesn't need to have any vlan handling. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net/mlx5e: Xmit flow break downSaeed Mahameed
Break current mlx5e xmit flow into smaller blocks (helper functions) in order to reuse them for IPoIB SKB transmission. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net/mlx5e: IPoIB, Underlay QPSaeed Mahameed
Create IPoIB underlay QP needed by the IPoIB netdevice profile for RSS and TX HW context to perform on IPoIB traffic. Reset the underlay QP on dev_uninit ndo to stop IPoIB traffic going through this QP when the ULP IPoIB decides to cleanup. Implement attach/detach mcast RDMA netdev callbacks for later RDMA netdev use. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>