summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)Author
2018-05-25net/mlx5: Use order-0 allocations for all WQ typesTariq Toukan
Complete the transition of all WQ types to use fragmented order-0 coherent memory instead of high-order allocations. CQ-WQ already uses order-0. Here we do the same for cyclic and linked-list WQs. This allows the driver to load cleanly on systems with a highly fragmented coherent memory. Performance tests: ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Packet rate of 64B packets, single transmit ring, size 8K. No degradation is sensed. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25net/mlx5i: Use compilation flag in IPOIB headerTariq Toukan
If CONFIG_MLX5_CORE_IPOIB is not set, compile-out the IPOIB related headers. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25net/mlx5e: TX, Use actual WQE size for SQ edge fillTariq Toukan
We fill SQ edge with NOPs to avoid WQEs wrap. Here, instead of doing that in advance for the maximum possible WQE size, we do it on-demand using the actual WQE size. We re-order some parts in mlx5e_sq_xmit to finish the calculation of WQE size (ds_cnt) before doing any writes to the WQE buffer. When SQ work queue is fragmented (introduced in an downstream patch), dealing with WQE wraps becomes more frequent. This change would drastically reduce the overhead in this case. Performance tests: ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Packet rate of 64B packets, single transmit ring, size 8K. Before: 14.9 Mpps After: 15.8 Mpps Improvement of 6%. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25net/mlx5e: Use WQ API functions instead of direct fields accessTariq Toukan
Use the WQ API to get the WQ size, and to map a counter into a WQ entry index. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25net/mlx5e: Split offloaded eswitch TC rules for port mirroringChris Mi
If a TC rule needs to be split for mirroring, create two HW rules, in the first level and the second level flow tables accordingly. In the first level flow table, forward the packet to the mirror port and forward the packet to the second level flow table for further processing, eg. encap, vlan push or header re-write. Currently the matching is repeated in both stages. While here, simplify the setup of the vhca id valid indicator also in the existing code. Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25net/mlx5e: Parse mirroring action for offloaded TC eswitch flowsChris Mi
Currently, we only support the mirred redirect TC sub-action. In order to support flow based vport mirroring, add support to parse the mirred mirror sub-action. For mirroring, user-space will typically set the action order such that the mirror port (mirror VF) sees packets as the original port (VF under mirroring) sent them or as it will receive them. In the general case, it means that packets are potentially sent to the mirror port before or after some actions were applied on them. To properly do that, we should follow on the exact action order as set for the flow and make sure this will also be the case when we program the HW offload. We introduce a counter for the output ports (attr->out_count), which we increase when parsing each mirred redirect/mirror sub-action and when dealing with encap. We introduce a counter (attr->mirror_count) telling us if split is needed. If no split is needed and mirroring is just multicasting to vport, the mirror count is zero, all the actions of the TC flow should apply on that single HW flow. If split is needed, the mirror count tells where to do the split, all non-mirred tc actions should apply only after the split. The mirror count is set while parsing the following actions encap/decap, header re-write, vlan push/pop. Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25net/mlx5: E-switch, Create a second level FDB flow tableChris Mi
If firmware supports the forward action with a destination list that includes a flow table, create a second level FDB flow table. This is going to be used for flow based mirroring under the switchdev offloads mode. Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25net/mlx5: E-Switch, Reorganize and rename fdb flow tablesChris Mi
We have several fdb flow tables for each of the legacy and switchdev modes. In the switchdev mode, there are fast path and slow path flow tables. Towards adding more flow tables in upcoming patches, reorganize and rename the various existing ones to reflect their functionality. Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25Merge tag 'mlx5e-updates-2018-05-19' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5e-updates-2018-05-19 This series contains updates for mlx5e netdevice driver with one subject, DSCP to priority mapping, in the first patch Huy adds the needed API in dcbnl, the second patch adds the needed mlx5 core capability bits for the feature, and all other patches are mlx5e (netdev) only changes to add support for the feature. From: Huy Nguyen Dscp to priority mapping for Ethernet packet: These patches enable differentiated services code point (dscp) to priority mapping for Ethernet packet. Once this feature is enabled, the packet is routed to the corresponding priority based on its dscp. User can combine this feature with priority flow control (pfc) feature to have priority flow control based on the dscp. Firmware interface: Mellanox firmware provides two control knobs for this feature: QPTS register allow changing the trust state between dscp and pcp mode. The default is pcp mode. Once in dscp mode, firmware will route the packet based on its dscp value if the dscp field exists. QPDPM register allow mapping a specific dscp (0 to 63) to a specific priority (0 to 7). By default, all the dscps are mapped to priority zero. Software interface: This feature is controlled via application priority TLV. IEEE specification P802.1Qcd/D2.1 defines priority selector id 5 for application priority TLV. This APP TLV selector defines DSCP to priority map. This APP TLV can be sent by the switch or can be set locally using software such as lldptool. In mlx5 drivers, we add the support for net dcb's getapp and setapp call back. Mlx5 driver only handles the selector id 5 application entry (dscp application priority application entry). If user sends multiple dscp to priority APP TLV entries on the same dscp, the last sent one will take effect. All the previous sent will be deleted. This attribute combined with pfc attribute allows advanced user to fine tune the qos setting for specific priority queue. For example, user can give dedicated buffer for one or more priorities or user can give large buffer to certain priorities. The dcb buffer configuration will be controlled by lldptool. >> lldptool -T -i eth2 -V BUFFER prio 0,2,5,7,1,2,3,6 maps priorities 0,1,2,3,4,5,6,7 to receive buffer 0,2,5,7,1,2,3,6 >> lldptool -T -i eth2 -V BUFFER size 87296,87296,0,87296,0,0,0,0 sets receive buffer size for buffer 0,1,2,3,4,5,6,7 respectively After discussion on mailing list with Jakub, Jiri, Ido and John, we agreed to choose dcbnl over devlink interface since this feature is intended to set port attributes which are governed by the netdev instance of that port, where devlink API is more suitable for global ASIC configurations. The firmware trust state (in QPTS register) is changed based on the number of dscp to priority application entries. When the first dscp to priority application entry is added by the user, the trust state is changed to dscp. When the last dscp to priority application entry is deleted by the user, the trust state is changed to pcp. When the port is in DSCP trust state, the transmit queue is selected based on the dscp of the skb. When the port is in DSCP trust state and vport inline mode is not NONE, firmware requires mlx5 driver to copy the IP header to the wqe ethernet segment inline header if the skb has it. This is done by changing the transmit queue sq's min inline mode to L3. Note that the min inline mode of sqs that belong to other features such as xdpsq, icosq are not modified. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-258139too: Remove unnecessary netif_napi_del()Bo Chen
The call to free_netdev() in __rtl8139_cleanup_dev() clears the network device napi list, and explicit calls to netif_napi_del() are unnecessary. Signed-off-by: Bo Chen <chenbo@pdx.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25ibmvnic: Fix partial success login retriesThomas Falcon
In its current state, the driver will handle backing device login in a loop for a certain number of retries while the device returns a partial success, indicating that the driver may need to try again using a smaller number of resources. The variable it checks to continue retrying may change over the course of operations, resulting in reallocation of resources but exits without sending the login attempt. Guard against this by introducing a boolean variable that will retain the state indicating that the driver needs to reattempt login with backing device firmware. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25qed*: Support drop action classificationManish Chopra
With this patch, User can configure for the supported flows to be dropped. Added a stat "gft_filter_drop" as well to be populated in ethtool for the dropped flows. For example - ethtool -N p5p1 flow-type udp4 dst-port 8000 action -1 ethtool -N p5p1 flow-type tcp4 scr-ip 192.168.8.1 action -1 Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25qede: Support flow classification to the VFs.Manish Chopra
With the supported classification modes [4 tuples based, udp port based, src-ip based], flows can be classified to the VFs as well. With this patch, flows can be re-directed to the requested VF provided in "action" field of command. Please note that driver doesn't really care about the queue bits in "action" field for the VFs. Since queue will be still chosen by FW using RSS hash. [I.e., the classification would be done according to vport-only] For examples - ethtool -N p5p1 flow-type udp4 dst-port 8000 action 0x100000000 ethtool -N p5p1 flow-type tcp4 src-ip 192.16.6.10 action 0x200000000 ethtool -U p5p1 flow-type tcp4 src-ip 192.168.40.100 dst-ip \ 192.168.40.200 src-port 6660 dst-port 5550 \ action 0x100000000 Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25qed*: Support other classification modes.Manish Chopra
Currently, driver supports flow classification to PF receive queues based on TCP/UDP 4 tuples [src_ip, dst_ip, src_port, dst_port] only. This patch enables to configure different flow profiles [For example - only UDP dest port or src_ip based] on the adapter so that classification can be done according to just those fields as well. Although, at a time just one type of flow configuration is supported due to limited number of flow profiles available on the device. For example - ethtool -N enp7s0f0 flow-type udp4 dst-port 45762 action 2 ethtool -N enp7s0f0 flow-type tcp4 src-ip 192.16.4.10 action 1 ethtool -N enp7s0f0 flow-type udp6 dst-port 45762 action 3 Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25qede: Validate unsupported configurationsManish Chopra
Validate and prevent some of the configurations for unsupported [by firmware] inputs [for example - mac ext, vlans, masks/prefix, tos/tclass] via ethtool -N/-U. Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25qede: Refactor ethtool rx classification flow.Manish Chopra
This patch simplifies the ethtool rx flow configuration [via ethtool -U/-N] flow code base by dividing it logically into various APIs based on given protocols. It also separates various validations and calculations done along the flow in their own APIs. Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25cxgb4/cxgb4vf: Notify link changes to OS-dependent codeArjun Vynipadath
We have a confusion of two different abstractions in the Common Code: Physical Link (Port) and Logical Network Interface (Virtual Interface), and we haven't been properly managing the state of the intersection of those two abstractions. On the one hand we have the Physical state of the Link -- up or down -- and on the other we have the logical state of the VI, enabled or not. {ethN} refers to both the Physical and Logical State. In this case, ifconfig only affects/interrogates the Logical State of a VI, and ethtool only deals with the Physical State. And these are different. So, just because we disable the VI, we don't really want to change the Physical Link Up/Down state. Thus, the previous hack to set "lc->link_ok = 0" when we disable a VI is completely incorrect. Where we get into trouble is where the Physical Link State and the Logical VI State cross swords. And that happens in t4_handle_get_port_info() where we need to manage/safe the Physical Link State, but we also need to know when the Logical VI State has changed and pass that back up to the OS-dependent Driver routine t4_os_link_changed() which is concerned about the Logical Interface. So we enable a VI and that causes Firmware to send us a new Port Information message, but if none of the Physical Link State particulars have changed, we don't call t4_os_link_changed(). This fix uses the existing OS Contract APIs for the Common Code to inform the OS-dependent portion of the Host Driver when the "Link" (really Logical Network Interface) is "up" or "down". A new API t4_enable_pi_params() is added which calls t4_enable_vi_params() and, if that is successful, then calls back to the OS Contract API t4_os_link_changed() notifying the OS-dependent layer of the potential Link State change. Original Work by : Casey Leedom <leedom@chelsio.com> Signed-off-by: Santosh Rastapur <santosh@chelsio.com> Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25cxgb4: clean up init_oneGanesh Goudar
clean up init_one and use chip_ver consistently throughout init_one() for chip version. Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25cxgb4/cxgb4vf: link management changes for new SFPGanesh Goudar
newer SFPs like SFP28 and QSFP28 Transceiver Modules present several new possibilities which we haven't faced before. Fix the assumptions in the code reflecting the more limited capabilities of previous Transceiver Module systems Original work by Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25net: fec: remove stale commentYueHaibing
This comment is outdated as fec_ptp_ioctl has been replaced by fec_ptp_set/fec_ptp_get since commit 1d5244d0e43b ("fec: Implement the SIOCGHWTSTAMP ioctl") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25sfc: stop the TX queue before pushing new buffersMartin Habets
efx_enqueue_skb() can push new buffers for the xmit_more functionality. We must stops the TX queue before this or else the TX queue does not get restarted and we get a netdev watchdog. In the error handling we may now need to unwind more than 1 packet, and we may need to push the new buffers onto the partner queue. v2: In the error leg also push this queue if xmit_more is set Fixes: e9117e5099ea ("sfc: Firmware-Assisted TSO version 2") Reported-by: Jarod Wilson <jarod@redhat.com> Tested-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Martin Habets <mhabets@solarflare.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25mlx4_core: allocate ICM memory in page size chunksQing Huang
When a system is under memory presure (high usage with fragments), the original 256KB ICM chunk allocations will likely trigger kernel memory management to enter slow path doing memory compact/migration ops in order to complete high order memory allocations. When that happens, user processes calling uverb APIs may get stuck for more than 120s easily even though there are a lot of free pages in smaller chunks available in the system. Syslog: ... Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task oracle_205573_e:205573 blocked for more than 120 seconds. ... With 4KB ICM chunk size on x86_64 arch, the above issue is fixed. However in order to support smaller ICM chunk size, we need to fix another issue in large size kcalloc allocations. E.g. Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt entry). So we need a 16MB allocation for a table->icm pointer array to hold 2M pointers which can easily cause kcalloc to fail. The solution is to use kvzalloc to replace kcalloc which will fall back to vmalloc automatically if kmalloc fails. Signed-off-by: Qing Huang <qing.huang@oracle.com> Acked-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24nfp: flower: compute link aggregation actionJohn Hurley
If the egress device of an offloaded rule is a LAG port, then encode the output port to the NFP with a LAG identifier and the offloaded group ID. A prelag action is also offloaded which must be the first action of the series (although may appear after other pre-actions - e.g. tunnels). This causes the FW to check that it has the necessary information to output to the requested LAG port. If it does not, the packet is sent to the kernel before any other actions are applied to it. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24nfp: flower: implement host cmsg handler for LAGJohn Hurley
Adds the control message handler to synchronize offloaded group config with that of the kernel. Such messages are sent from fw to driver and feature the following 3 flags: - Data: an attached cmsg could not be processed - store for retransmission - Xon: FW can accept new messages - retransmit any stored cmsgs - Sync: full sync requested so retransmit all kernel LAG group info Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24nfp: flower: monitor and offload LAG groupsJohn Hurley
Monitor LAG events via the NETDEV_CHANGEUPPER/NETDEV_CHANGELOWERSTATE notifiers to maintain a list of offloadable groups. Sync these groups with HW via a delayed workqueue to prevent excessive re-configuration. When the workqueue is triggered it may generate multiple control messages for different groups. These messages are linked via a batch ID and flags to indicate a new batch and the end of a batch. Update private data in each repr to track their LAG lower state flags. The state of a repr is used to determine the active netdevs that can be offloaded. For example, in active-backup mode, we only offload the netdev currently active. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24nfp: flower: add per repr private data for LAG offloadJohn Hurley
Add a bitmap to each flower repr to track its state if it is enslaved by a bond. This LAG state may be different to the port state - for example, the port may be up but LAG state may be down due to the selection in an active/backup bond. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24nfp: flower: check for/turn on LAG support in firmwareJohn Hurley
Check if the fw contains the _abi_flower_balance_sync_enable symbol. If it does then write a 1 to this indicating that the driver is willing to receive NIC to kernel LAG related control messages. If the write is successful, update the list of extra features supported by the fw and add a stub to accept LAG cmsgs. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24nfp: nfpcore: add rtsym writing functionJohn Hurley
Add an rtsym API function that combines the lookup of a symbol and the writing of a value to it. Values can be written as unsigned 32 or 64 bits. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24nfp: add ndo_set_mac_address for representorsJohn Hurley
Adding a netdev to a bond requires that its mac address can be modified. The default eth_mac_addr is sufficient to satisfy this requirement. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24enic: set DMA mask to 47 bitGovindarajulu Varadarajan
In commit 624dbf55a359b ("driver/net: enic: Try DMA 64 first, then failover to DMA") DMA mask was changed from 40 bits to 64 bits. Hardware actually supports only 47 bits. Fixes: 624dbf55a359b ("driver/net: enic: Try DMA 64 first, then failover to DMA") Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2018-05-24 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Björn Töpel cleans up AF_XDP (removes rebind, explicit cache alignment from uapi, etc). 2) David Ahern adds mtu checks to bpf_ipv{4,6}_fib_lookup() helpers. 3) Jesper Dangaard Brouer adds bulking support to ndo_xdp_xmit. 4) Jiong Wang adds support for indirect and arithmetic shifts to NFP 5) Martin KaFai Lau cleans up BTF uapi and makes the btf_header extensible. 6) Mathieu Xhonneux adds an End.BPF action to seg6local with BPF helpers allowing to edit/grow/shrink a SRH and apply on a packet generic SRv6 actions. 7) Sandipan Das adds support for bpf2bpf function calls in ppc64 JIT. 8) Yonghong Song adds BPF_TASK_FD_QUERY command for introspection of tracing events. 9) other misc fixes from Gustavo A. R. Silva, Sirio Balmelli, John Fastabend, and Magnus Karlsson ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24ibmvnic: Introduce hard reset recoveryThomas Falcon
Introduce a recovery hard reset to handle reset failure as a result of change of device context following a transport event, such as a backing device failover or partition migration. These operations reset the device context to its initial state. If this occurs during a reset, any initialization commands are likely to fail with an invalid state error as backing device firmware requests reinitialization. When this happens, make one more attempt by performing a hard reset, which frees any resources currently allocated and performs device initialization. If a transport event occurs during a device reset, a flag is set which will trigger a new hard reset following the completionof the current reset event. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24ibmvnic: Set resetting state at earliest possible pointThomas Falcon
Set device resetting state at the earliest possible point: as soon as a reset is successfully scheduled. The reset state is toggled off when all resets have been processed to completion. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24ibmvnic: Create separate initialization routine for resetsThomas Falcon
Instead of having one initialization routine for all cases, create a separate, simpler function for standard initialization, such as during device probe. Use the original initialization function to handle device reset scenarios. The goal of this patch is to avoid having a single, cluttered init function to handle all possible scenarios. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24ibmvnic: Handle error case when setting link stateThomas Falcon
If setting the link state is not successful, print a warning with the resulting return code and return it to be handled by the caller. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24ibmvnic: Return error code if init interrupted by transport eventThomas Falcon
If device init is interrupted by a failover, set the init return code so that it can be checked and handled appropriately by the init routine. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24ibmvnic: Check CRQ command return codesThomas Falcon
Check whether CRQ command is successful before awaiting a response from the management partition. If the command was not successful, the driver may hang waiting for a response that will never come. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24ibmvnic: Introduce active CRQ stateThomas Falcon
Introduce an "active" state for a IBM vNIC Command-Response Queue. A CRQ is considered active once it has initialized or linked with its partner by sending an initialization request and getting a successful response back from the management partition. Until this has happened, do not allow CRQ commands to be sent other than the initialization request. This change will avoid a protocol error in case of a device transport event occurring during a initialization. When the driver receives a transport event notification indicating that the backing hardware has changed and needs reinitialization, any further commands other than the initialization handshake with the VIOS management partition will result in an invalid state error. Instead of sending a command that will be returned with an error, print a warning and return an error that will be handled by the caller. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24ibmvnic: Mark NAPI flag as disabled when releasedThomas Falcon
Set adapter NAPI state as disabled if they are removed. This will allow them to be enabled again if reallocated in case of a hard reset. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24cxgb4: Check for kvzalloc allocation failureYueHaibing
t4_prep_fw doesn't check for card_fw pointer before store the read data, which could lead to a NULL pointer dereference if kvzalloc failed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24xdp: change ndo_xdp_xmit API to support bulkingJesper Dangaard Brouer
This patch change the API for ndo_xdp_xmit to support bulking xdp_frames. When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown. Most of the slowdown is caused by DMA API indirect function calls, but also the net_device->ndo_xdp_xmit() call. Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed performance improved: for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps With frames avail as a bulk inside the driver ndo_xdp_xmit call, further optimizations are possible, like bulk DMA-mapping for TX. Testing without CONFIG_RETPOLINE show the same performance for physical NIC drivers. The virtual NIC driver tun sees a huge performance boost, as it can avoid doing per frame producer locking, but instead amortize the locking cost over the bulk. V2: Fix compile errors reported by kbuild test robot <lkp@intel.com> V4: Isolated ndo, driver changes and callers. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24net/mlx5: IPSec, Fix a race between concurrent sandbox QP commandsYossi Kuperman
Sandbox QP Commands are retired in the order they are sent. Outstanding commands are stored in a linked-list in the order they appear. Once a response is received and the callback gets called, we pull the first element off the pending list, assuming they correspond. Sending a message and adding it to the pending list is not done atomically, hence there is an opportunity for a race between concurrent requests. Bind both send and add under a critical section. Fixes: bebb23e6cb02 ("net/mlx5: Accel, Add IPSec acceleration interface") Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Adi Nissim <adin@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24net/mlx5e: When RXFCS is set, add FCS data into checksum calculationEran Ben Elisha
When RXFCS feature is enabled, the HW do not strip the FCS data, however it is not present in the checksum calculated by the HW. Fix that by manually calculating the FCS checksum and adding it to the SKB checksum field. Add helper function to find the FCS data for all SKB forms (linear, one fragment or more). Fixes: 102722fc6832 ("net/mlx5e: Add support for RXFCS feature flag") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24net/mlx5e: Receive buffer support for DCBXHuy Nguyen
Add dcbnl's set/get buffer configuration callback that allows user to set/get buffer size configuration and priority to buffer mapping. By default, firmware controls receive buffer configuration and priority of buffer mapping based on the changes in pfc settings. When set buffer call back is triggered, the buffer configuration changes to manual mode. The manual mode means mlx5 driver will adjust the buffer configuration accordingly based on the changes in pfc settings. ConnectX buffer stride is 128 Bytes. If the buffer size is not multiple of 128, the buffer size will be rounded down to the nearest multiple of 128. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24net/mlx5e: Receive buffer configurationHuy Nguyen
Add APIs for buffer configuration based on the changes in pfc configuration, cable len, buffer size configuration, and priority to buffer mapping. Note that the xoff fomula is as below xoff = ((301+2.16 * len [m]) * speed [Gbps] + 2.72 MTU [B] xoff_threshold = buffer_size - xoff xon_threshold = xoff_threshold - MTU Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24net/mlx5: PPTB and PBMC register firmware command supportHuy Nguyen
Add firmware command interface to read and write PPTB and PBMC registers. PPTB register enables mappings priority to a specific receive buffer. PBMC registers enables changing the receive buffer's configuration such as buffer size, xon/xoff thresholds, buffer's lossy property and buffer's shared property. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24net/mlx5e: Move port speed code from en_ethtool.c to en/port.cHuy Nguyen
Move four below functions from en_ethtool.c to en/port.c. These functions are used by both en_ethtool.c and en_main.c. Future code can use these functions without ethtool link mode dependency. u32 mlx5e_port_ptys2speed(u32 eth_proto_oper); int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed); int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed); u32 mlx5e_port_speed2linkmodes(u32 speed); Delete the speed field from table mlx5e_build_ptys2ethtool_map. This table only keeps the mapping between the mlx5e link mode and ethtool link mode. Add new table mlx5e_link_speed for translation from mlx5e link mode to actual speed. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24Merge tag 'mlx5-updates-2018-05-17' of ↵Jason Gunthorpe
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into for-next mlx5-updates-2018-05-17 mlx5 core dirver updates for both net-next and rdma-next branches. From Christophe JAILLET, first three patche to use kvfree where needed. From: Or Gerlitz <ogerlitz@mellanox.com> Next six patches from Roi and Co adds support for merged sriov e-switch which comes to serve cases where both PFs, VFs set on them and both uplinks are to be used in single v-switch SW model. When merged e-switch is supported, the per-port e-switch is logically merged into one e-switch that spans both physical ports and all the VFs. This model allows to offload TC eswitch rules between VFs belonging to different PFs (and hence have different eswitch affinity), it also sets the some of the foundations needed for uplink LAG support. * tag 'mlx5-updates-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5e: Explicitly set source e-switch in offloaded TC rules net/mlx5: Add source e-switch owner net/mlx5e: Explicitly set destination e-switch in FDB rules net/mlx5: Add destination e-switch owner net/mlx5: Properly handle a vport destination when setting FTE net/mlx5: Add merged e-switch cap IB/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()' net/mlx5: Eswitch, Use 'kvfree()' for memory allocated by 'kvzalloc()' net/mlx5: Vport, Use 'kvfree()' for memory allocated by 'kvzalloc()'
2018-05-23amd-xgbe: Improve SFP 100Mbps auto-negotiationTom Lendacky
After changing speed to 100Mbps as a result of auto-negotiation (AN), some 10/100/1000Mbps SFPs indicate a successful link (no faults or loss of signal), but cannot successfully transmit or receive data. These SFPs required an extra auto-negotiation (AN) after the speed change in order to operate properly. Add a quirk for these SFPs so that if the outcome of the AN actually results in changing to a new speed, re-initiate AN at that new speed. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23amd-xgbe: Update the BelFuse quirk to support SGMIITom Lendacky
Instead of using a quirk to make the BelFuse 1GBT-SFP06 part look like a 1000baseX part, program the SFP PHY to support SGMII and 10/100/1000 baseT. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>