path: root/drivers/net/ethernet
Age    Commit message    Author
2021-04-16  net: enetc: fix buffer leaks with XDP_TX enqueue rejections  (Vladimir Oltean)
If the TX ring is congested, enetc_xdp_tx() returns false for the current XDP frame (represented as an array of software BDs). This array of software TX BDs is constructed in enetc_rx_swbd_to_xdp_tx_swbd from software BDs freshly cleaned from the RX ring. The issue is that we scrub the RX software BDs too soon, more precisely before we know that we can enqueue the TX BDs successfully into the TX ring. If we can't enqueue them (and enetc_xdp_tx returns false), we call enetc_xdp_drop, which attempts to recycle the buffers held by the RX software BDs. But because we scrubbed those RX BDs already, two things happen:

(a) we leak their memory
(b) we populate the RX software BD ring with an all-zero rx_swbd structure, which makes the buffer refill path allocate more memory:

    enetc_refill_rx_ring
    -> if (unlikely(!rx_swbd->page))
       -> enetc_new_page

That is a recipe for fast OOM.

Fixes: 7ed2bc80074e ("net: enetc: add support for XDP_TX")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
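The corrected ordering, as a minimal sketch (enetc_rx_swbd_to_xdp_tx_swbd, enetc_xdp_tx and enetc_xdp_drop are named above; enetc_xdp_tx_scrub and the stat field names are illustrative stand-ins, not the exact driver code):

    n = enetc_rx_swbd_to_xdp_tx_swbd(xdp_tx_arr, rx_ring, orig_i, i);

    if (!enetc_xdp_tx(tx_ring, xdp_tx_arr, n)) {
        /* TX ring congested: the RX swbds are still intact, so
         * enetc_xdp_drop() can recycle their buffers
         */
        enetc_xdp_drop(rx_ring, orig_i, i);
        tx_ring->stats.xdp_tx_drops++;          /* illustrative stat */
    } else {
        /* Enqueue succeeded: only now scrub the RX swbds */
        enetc_xdp_tx_scrub(rx_ring, orig_i, i); /* hypothetical helper */
        tx_ring->stats.xdp_tx += n;             /* illustrative stat */
    }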
2021-04-16  net: enetc: handle the invalid XDP action the same way as XDP_DROP  (Vladimir Oltean)
When the XDP program returns an invalid action, we should free the RX buffer. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
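The resulting dispatch is the usual XDP verdict switch; a sketch (enetc_xdp_drop is named in this series, the rest is the generic pattern rather than the exact enetc hunk):

    switch (xdp_act) {
    default:
        bpf_warn_invalid_xdp_action(xdp_act);
        fallthrough;
    case XDP_ABORTED:
        trace_xdp_exception(rx_ring->ndev, prog, xdp_act);
        fallthrough;
    case XDP_DROP:
        enetc_xdp_drop(rx_ring, orig_i, i); /* recycle the RX buffer */
        break;
    case XDP_PASS:
    case XDP_TX:
    case XDP_REDIRECT:
        /* normal handling elided */
        break;
    }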
2021-04-16  net: enetc: use dedicated TX rings for XDP  (Vladimir Oltean)
It is possible for one CPU to perform TX hashing (see netdev_pick_tx) between the 8 ENETC TX rings, and the TX hashing to select TX queue 1. At the same time, it is possible for the other CPU to already use TX ring 1 for XDP (either XDP_TX or XDP_REDIRECT). Since there is no mutual exclusion between XDP and the network stack, we run into an issue because the ENETC TX procedure is not reentrant.

The obvious approach would be to just make XDP take the lock of the network stack's TX queue corresponding to the ring it's about to enqueue in. For XDP_REDIRECT, this is quite straightforward: a lock at the beginning and end of enetc_xdp_xmit() should do the trick.

But for XDP_TX, it's a bit more complicated. For one, we do TX batching all by ourselves for frames with the XDP_TX verdict. This is something we would like to keep the way it is, for performance reasons. But batching means that the network stack's lock should be kept from the first enqueued XDP_TX frame and until we ring the doorbell. That is mostly fine, except for cases when in the same NAPI loop we have mixed XDP_TX and XDP_REDIRECT frames. So if enetc_xdp_xmit() gets called while we are holding the lock from the RX NAPI, then bam, deadlock.

The naive answer could be 'just flush the XDP_TX frames first, then release the network stack's TX queue lock, then call xdp_do_flush_map()'. But even xdp_do_redirect() is capable of flushing the batched XDP_REDIRECT frames, so unless we unlock/relock the TX queue around xdp_do_redirect(), there simply isn't any clean way to protect XDP_TX from concurrent network stack .ndo_start_xmit() on another CPU.

So we need to take a different approach, and that is to reserve two rings for the sole use of XDP. We leave TX rings 0..ndev->real_num_tx_queues-1 to be handled by the network stack, and we pick the XDP rings from the end of the priv->tx_ring array. We make an effort to keep the mapping done by enetc_alloc_msix(), which decides which CPU handles the TX completions of which TX ring in its NAPI poll. So the XDP TX ring of CPU 0 is handled by TX ring 6, and the XDP TX ring of CPU 1 is handled by TX ring 7.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16  net: enetc: increase TX ring size  (Vladimir Oltean)
Now that commit d6a2829e82cf ("net: enetc: increase RX ring default size") has increased the RX ring size, it is quite easy to congest the TX rings when the traffic is predominantly XDP_TX, as the RX ring is quite a bit larger than the TX one. Since we bit the bullet and did the expensive thing already (larger RX rings consume more memory pages), it seems quite foolish to keep the TX rings small. So make them equally sized with RX.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16  net: enetc: remove unneeded xdp_do_flush_map()  (Vladimir Oltean)
xdp_do_redirect already contains:

    -> dev_map_enqueue
       -> __xdp_enqueue
          -> bq_enqueue
             -> bq_xmit_all // if we have more than 16 frames

So the logic from enetc will never be hit, because ENETC_DEFAULT_TX_WORK is 128.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
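For context, the driver-side convention this relies on (a generic sketch, not the enetc code): redirect per frame, flush once per NAPI cycle; the devmap layer batches internally and flushes itself above 16 queued frames:

    /* Per frame with an XDP_REDIRECT verdict, inside the poll loop: */
    err = xdp_do_redirect(ndev, &xdp_buff, prog);
    if (err)
        enetc_xdp_drop(rx_ring, orig_i, i); /* recycle on failure */

    /* Once, after the loop, before re-enabling interrupts: */
    xdp_do_flush();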
2021-04-16  net: enetc: stop XDP NAPI processing when build_skb() fails  (Vladimir Oltean)
When the code path below fails:

    enetc_clean_rx_ring_xdp // XDP_PASS
    -> enetc_build_skb
       -> enetc_map_rx_buff_to_skb
          -> build_skb

enetc_clean_rx_ring_xdp will 'break', but that 'break' instruction isn't strong enough to actually break the NAPI poll loop, just the switch/case statement for XDP actions. So we increment rx_frm_cnt and go on to the next frames, minding our own business. Instead, let's do what the skb NAPI poll function does and break the loop now, waiting for the memory pressure to go away. Otherwise the next calls to build_skb() are likely to fail too.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
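The pitfall is plain C scoping of break; a sketch of the corrected loop (function arguments elided, names illustrative):

    while (likely(rx_frm_cnt < work_limit)) {
        /* ... fetch the next RX BD, run the XDP program ... */
        switch (xdp_act) {
        case XDP_PASS:
            skb = enetc_build_skb(/* ... */);
            if (unlikely(!skb))
                /* a 'break' here would only leave the switch;
                 * jump out to actually stop the NAPI poll and
                 * let the memory pressure subside
                 */
                goto out;
            napi_gro_receive(napi, skb);
            break;
        /* ... other verdicts ... */
        }
        rx_frm_cnt++;
    }
    out:
        return rx_frm_cnt;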
2021-04-16  net: enetc: recycle buffers for frames with RX errors  (Vladimir Oltean)
When receiving a frame with errors, currently we do nothing with it (we don't construct an skb or an xdp_buff); we just exit the NAPI poll loop. Let's put the buffer back into the RX ring (similar to XDP_DROP).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16  net: enetc: rename the buffer reuse helpers  (Vladimir Oltean)
enetc_put_xdp_buff has nothing to do with XDP; frankly, it is just a helper to populate the recycle end of the shadow RX BD ring (next_to_alloc) with a given buffer. On the other hand, enetc_put_rx_buff plays more tricks than its name would suggest.

So let's rename enetc_put_rx_buff into enetc_flip_rx_buff, to reflect the half-page buffer reuse tricks that it employs, and enetc_put_xdp_buff into enetc_put_rx_buff, which suggests a more garden-variety operation.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16  net: enetc: remove redundant clearing of skb/xdp_frame pointer in TX conf path  (Vladimir Oltean)
Later in enetc_clean_tx_ring we have:

    /* Scrub the swbd here so we don't have to do that
     * when we reuse it during xmit
     */
    memset(tx_swbd, 0, sizeof(*tx_swbd));

So these assignments are unnecessary.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16  Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue  (David S. Miller)
Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2021-04-16

This series contains updates to the igb and igc drivers.

Ederson adjusts Tx buffer distributions in Qav mode to improve TSN-aware traffic for igb. He also enables PPS support and auxiliary PHC functions for igc.

Grzegorz checks that the MTA register was properly written, and retries if not, for igb.

Sasha adds reporting of EEE low power idle counters to ethtool and fixes a return value being overwritten through looping, for igc.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16  mlx5: implement ethtool standard stats  (Jakub Kicinski)
Add support for PHY/MAC/Ctrl/RMON stats. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
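These land in the new ethtool standard-stats callbacks; a sketch of the driver side under assumed helper names (ppcnt_read and its counter IDs are hypothetical; the callback names and the capitalized uAPI fields come from the ethtool series):

    static void mlx5e_get_eth_mac_stats(struct net_device *netdev,
                                        struct ethtool_eth_mac_stats *mac_stats)
    {
        struct mlx5e_priv *priv = netdev_priv(netdev);

        /* Filled from the device's PPCNT counter group in reality */
        mac_stats->FramesTransmittedOK = ppcnt_read(priv, PPCNT_TX_OK);
        mac_stats->FramesReceivedOK    = ppcnt_read(priv, PPCNT_RX_OK);
        mac_stats->FrameCheckSequenceErrors =
            ppcnt_read(priv, PPCNT_FCS_ERR);
    }

    /* Hooked up alongside .get_eth_phy_stats, .get_eth_ctrl_stats and
     * .get_rmon_stats in the device's ethtool_ops.
     */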
2021-04-16  bnxt: implement ethtool standard stats  (Jakub Kicinski)
Most of the names seem to strongly correlate with names from the standard and the RFC. Whether ..+good_frames are indeed Frames..OK is what I'm least sure of.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16  mlxsw: implement ethtool standard stats  (Jakub Kicinski)
mlxsw has nicely grouped stats; add support for the standard uAPI. I'm guessing the register access part. Compile tested only.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16  Merge tag 'mlx5-updates-2021-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux  (David S. Miller)
Saeed Mahameed says:

====================
mlx5-updates-2021-04-16

This patchset introduces updates to the mlx5e netdev driver.

1) Tariq refactors TLS offloads and adds resiliency against RX resync failures
2) Maxim reduces code duplication by unifying the channels reset flow, regardless of whether the channels are closed or open
3) Aya enhances the TX/RX health reporters' diagnostics to expose the internal clock time-stamping format
4) Moshe adds support for ethtool extended link state, to show the reason for link down
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16  gianfar: Drop GFAR_MQ_POLLING support  (Claudiu Manoil)
Gianfar used to enable all 8 Rx queues (DMA rings) per ethernet device, even though the controller can only support 2 interrupt lines at most. This meant that multiple Rx queues would have to be grouped per NAPI poll routine, and the CPU would have to split the budget and service them in a round robin manner. The overhead of this scheme proved to outweigh the potential benefits. The alternative was to introduce the "Single Queue" polling mode, supporting one Rx queue per NAPI, which became the default packet processing option and helped improve the performance of the driver.

MQ_POLLING also relies on undocumented device tree properties to specify how to map the 8 Rx and Tx queues to a given interrupt line (aka "interrupt group"). Using module parameters to enable this mode wasn't an option either. Long story short, MQ_POLLING became obsolete; now it is just dead code, and no one has asked for it so far.

For the Tx queues, multi-queue support (more than 1 Tx queue per CPU) could be revisited by adding tc MQPRIO support, but again, one has to consider that there are only 2 interrupt lines, so the NAPI poll routine would have to service multiple Tx rings.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16  net: mvpp2: Add parsing support for different IPv4 IHL values  (Stefan Chulski)
Add parser entries for different IPv4 IHL values. Each entry will set the L4 header offset according to the IPv4 IHL field. The L3 header offset will be set during the parsing of the IPv4 protocol. Because of the missing parser support for IP header lengths > 20, RX IPv4 checksum HW offload fails and skb->ip_summed is set to CHECKSUM_NONE (checksum done by the network stack). This patch adds RX IPv4 checksum HW offload capability for frames with an IP header length > 20.

v1 --> v2: Improve commit message.

Suggested-by: Dana Vardi <danat@marvell.com>
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
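For reference, IHL counts 32-bit words (5 through 15), so the L4 offset each parser entry must program follows directly from it; plain kernel C, independent of the mvpp2 parser specifics:

    #include <linux/ip.h>

    /* IHL of 5 means a 20-byte header (no options), up to 15 (60 bytes) */
    static unsigned int ipv4_l4_offset(const struct iphdr *iph,
                                       unsigned int l3_offset)
    {
        return l3_offset + iph->ihl * 4;
    }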
2021-04-16  net: ethernet: mediatek: ppe: fix busy wait loop  (Ilya Lipnitskiy)
The intention is for the loop to timeout if the body does not succeed. The current logic calls time_is_before_jiffies(timeout) which is false until after the timeout, so the loop body never executes. Fix by using readl_poll_timeout as a more standard and less error-prone solution. Fixes: ba37b7caf1ed ("net: ethernet: mtk_eth_soc: add support for initializing the PPE") Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Cc: Felix Fietkau <nbd@nbd.name> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
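A sketch of the two patterns (REG_STATUS and BUSY_BIT are hypothetical names; readl_poll_timeout comes from <linux/iopoll.h>):

    /* Broken: time_is_before_jiffies(t) is false until the deadline has
     * already passed, so the body is skipped and -ETIMEDOUT is returned
     * immediately (time_is_after_jiffies() was the intended test).
     */
    unsigned long timeout = jiffies + HZ;

    while (time_is_before_jiffies(timeout)) {
        if (!(readl(ppe->base + REG_STATUS) & BUSY_BIT))
            return 0;
    }
    return -ETIMEDOUT;

    /* Fixed: readl_poll_timeout() polls every 20us and gives up after
     * one second, with no hand-rolled jiffies math to get wrong.
     */
    u32 val;

    return readl_poll_timeout(ppe->base + REG_STATUS, val,
                              !(val & BUSY_BIT), 20, USEC_PER_SEC);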
2021-04-16  atl1c: move tx cleanup processing out of interrupt  (Gatis Peisenieks)
Tx queue cleanup happens in the interrupt handler on the same core as rx queue processing. Both can take a considerable amount of processing in high packet-per-second scenarios.

Sending big amounts of packets can stall the rx processing, which is unfair, and can also lead to an out-of-memory condition, since __dev_kfree_skb_irq queues the skbs for later kfree in a softirq, which is not allowed to happen under heavy load in an interrupt handler.

This puts tx cleanup in its own napi and enables threaded napi, to allow the rx/tx queue processing to happen on different cores.

The ability to sustain equal amounts of tx/rx traffic increased: from 280Kpps to 1130Kpps on a Threadripper 3960X with an upcoming Mikrotik 10/25G NIC, and from 520Kpps to 850Kpps on an Intel i3-3320 with a Mikrotik RB44Ge adapter.

Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
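The shape of the change, as a sketch (the tx_napi field, poll function and ISR_TX_EVENT flag are assumed names; netif_napi_add and dev_set_threaded are the kernel APIs of this era):

    /* Register a separate NAPI context for TX completion cleanup */
    netif_napi_add(adapter->netdev, &adapter->tx_napi,
                   atl1c_clean_tx_poll, 64);

    /* Threaded NAPI lets the rx and tx polls run on different cores */
    dev_set_threaded(adapter->netdev, true);

    /* In the IRQ handler, defer TX cleanup instead of doing it inline */
    if (status & ISR_TX_EVENT)
        napi_schedule(&adapter->tx_napi);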
2021-04-16  net: bridge: switchdev: include local flag in FDB notifications  (Vladimir Oltean)
As explained in bugfix commit 6ab4c3117aec ("net: bridge: don't notify switchdev for local FDB addresses") as well as in this discussion: https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/ the switchdev notifiers for FDB entries managed to have a zero-day bug, which was that drivers would not know what to do with local FDB entries, because they were not told that they are local. The bug fix was to simply not notify them of those addresses. Let us now add the 'is_local' bit to bridge FDB entries, and make all drivers ignore these entries by their own choice. Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
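On the driver side, the opt-out then becomes an explicit check on the new bit; a sketch of an FDB notifier handler under assumed driver naming (the foo_ prefix is hypothetical):

    static int foo_switchdev_fdb_event(struct notifier_block *nb,
                                       unsigned long event, void *ptr)
    {
        struct switchdev_notifier_info *info = ptr;
        struct switchdev_notifier_fdb_info *fdb_info;

        switch (event) {
        case SWITCHDEV_FDB_ADD_TO_DEVICE:
            fdb_info = container_of(info,
                                    struct switchdev_notifier_fdb_info,
                                    info);
            /* Local (host-terminated) entries are notified too now;
             * a driver that cannot offload them simply skips them.
             */
            if (fdb_info->is_local)
                break;
            /* ... program the address into hardware ... */
            break;
        }
        return NOTIFY_DONE;
    }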
2021-04-16  igc: Expose LPI counters  (Sasha Neftin)
Expose EEE Tx and Rx low power idle counters via ethtool. An EEE Tx or Rx LPI event occurs when the transmitter or the receiver enters the EEE (IEEE 802.3az) LPI state.

ethtool --statistics <iface>

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-16  igc: Fix overwrites return value  (Sasha Neftin)
drivers/net/ethernet/intel/igc/igc_i225.c:235 igc_write_nvm_srwr()
warn: loop overwrites return value 'ret_val'

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
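The warning describes a common pattern where an error captured in one iteration is clobbered by the next; the usual fix (a generic sketch with a hypothetical write_word helper, not the exact igc hunk) is to stop on the first failure:

    int i;
    s32 ret_val = 0;

    for (i = 0; i < words; i++) {
        ret_val = write_word(hw, offset + i, data[i]);
        if (ret_val)
            break;  /* stop so a later success can't clobber the error */
    }
    return ret_val;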
2021-04-16  igc: enable auxiliary PHC functions for the i225  (Ederson de Souza)
The i225 device offers a number of special PTP Hardware Clock features on the Software Defined Pins (SDPs) - much like i210, which is used as inspiration for this patch. It enables two possible functions, namely time stamping external events and periodic output signals. The assignment of PHC functions to the four SDPs can be freely chosen by the user.

For external event time stamping, when the SDP (configured as input by the user) level changes, an interrupt is generated and the kernel Precision Time Protocol (PTP) subsystem is informed.

For the periodic output signals, the i225 is configured to generate them (so the SDP level will change periodically) and the driver also has to keep updating the time of the next level change. However, this work is not necessary for some frequencies, as the i225 takes care of them (namely, anything with a half-cycle of 500ms, 250ms, 125ms or < 70ms).

While the i225 allows up to four timers to be used to source the time used on the external events or output signals, this patch uses only one of those timers. The main reason is to keep it simple, as it's not clear how these extra timers would be exposed to users. Note that currently a NIC can expose a single PTP device.

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
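Both functions are exposed to user space through the PHC ->enable callback; a sketch of its dispatch (the two igc_ptp_*_cfg helpers are hypothetical, the ptp_clock_request types are the standard kernel API):

    static int igc_ptp_feature_enable(struct ptp_clock_info *ptp,
                                      struct ptp_clock_request *rq, int on)
    {
        switch (rq->type) {
        case PTP_CLK_REQ_EXTTS:
            /* SDP as input: time stamp level changes */
            return igc_ptp_extts_cfg(ptp, &rq->extts, on);
        case PTP_CLK_REQ_PEROUT:
            /* SDP as output: program the time of the next edge */
            return igc_ptp_perout_cfg(ptp, &rq->perout, on);
        default:
            return -EOPNOTSUPP;
        }
    }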
2021-04-16  igc: Enable internal i225 PPS  (Ederson de Souza)
The i225 device can produce one interrupt on the full second, much like i210, from which this patch takes inspiration. This patch sets up the full second interruption on the i225 and, upon receiving it, sends a PPS event to the PTP (Precision Time Protocol) kernel subsystem.

The PTP subsystem exposes the PPS events via ioctl and sysfs, and one can use the `testptp` tool (tools/testing/selftests/ptp) to check that the events are being generated.

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
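Signaling the event to the PTP core is then a short path in the interrupt handler; a sketch (the adapter field access is simplified; ptp_clock_event() and PTP_CLOCK_PPS are the relevant kernel API):

    /* On the target-time (full second) interrupt: */
    struct ptp_clock_event event = {
        .type = PTP_CLOCK_PPS,
    };

    ptp_clock_event(adapter->ptp_clock, &event);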
2021-04-16  igb: Add double-check MTA_REGISTER for i210 and i211  (Grzegorz Siwik)
Add a new function which checks whether an MTA register is filled correctly and, if not, writes to the same register again. There is a possibility that i210 and i211 do not accept MTA register settings, especially when you add and remove many multicast addresses in a short time. Without this patch there is a possibility that multicast settings will not always be set correctly in hardware.

Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
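The shape of such a write-verify-retry, as a sketch (the retry bound and function name are mine; array_wr32/array_rd32/wrfl are igb's register accessor macros, used here under the assumption that a local hw is in scope as they expect):

    #define MTA_RETRIES 3   /* hypothetical bound */

    static void igb_write_mta_checked(struct e1000_hw *hw, u32 index, u32 value)
    {
        int i;

        for (i = 0; i < MTA_RETRIES; i++) {
            array_wr32(E1000_MTA, index, value);
            wrfl();                                 /* flush the write */
            if (array_rd32(E1000_MTA, index) == value)
                return;                             /* took effect */
        }
    }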
2021-04-16  net/mlx5: Enhance diagnostics info for TX/RX reporters  (Aya Levin)
Add ts_format to the 'Common Config' section of the TX/RX devlink reporters' diagnostics info. Possible values for ts_format are 'RT' and 'FRC', which stand for Real Time and Free Running Counters respectively.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16  net/mlx5: Add helper to initialize 1PPS  (Aya Levin)
Wrap 1PPS initialization in a helper for a cleaner init flow. Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16  net/mlx5e: Add ethtool extended link state  (Moshe Tal)
In case the interface was set up but cannot establish the link, ethtool will print more information to help the user troubleshoot the state. For example, no link due to a missing cable:

    $ ethtool eth1
    ...
    Link detected: no (No cable)

Besides the general extended state, drivers can pass additional information about the link state using the sub-state field. For example:

    $ ethtool eth1
    ...
    Link detected: no (Autoneg, No partner detected)

The extended state is available only for specific cases; in other cases ethtool will print only "Link detected: no" as before.

Signed-off-by: Moshe Tal <moshet@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
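On the driver side this flows through the ->get_link_ext_state ethtool op; a sketch under the assumption of a device-specific status query (query_link_down_reason and its return values are hypothetical; the ETHTOOL_LINK_EXT_* constants are the uAPI):

    static int mlx5e_get_link_ext_state(struct net_device *dev,
                                        struct ethtool_link_ext_state_info *info)
    {
        switch (query_link_down_reason(dev)) {
        case LINK_DOWN_NO_PARTNER:
            info->link_ext_state = ETHTOOL_LINK_EXT_STATE_AUTONEG;
            info->autoneg =
                ETHTOOL_LINK_EXT_SUBSTATE_AN_NO_PARTNER_DETECTED;
            return 0;
        case LINK_DOWN_NO_CABLE:
            info->link_ext_state = ETHTOOL_LINK_EXT_STATE_NO_CABLE;
            return 0;
        default:
            return -ENODATA;    /* ethtool prints plain "no" */
        }
    }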
2021-04-16  net/mlx5: Allocate FC bulk structs with kvzalloc() instead of kzalloc()  (Maor Dickman)
The bulk size is larger than 16K so use kvzalloc(). The bulk bitmask upper size limit is 16K so use kvcalloc(). Signed-off-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
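kvzalloc() and kvcalloc() fall back from kmalloc to vmalloc for large sizes, so multi-page allocations don't fail on fragmented physical memory; a minimal sketch (bulk, fcs and bulk_len are illustrative names):

    #include <linux/mm.h>
    #include <linux/overflow.h>

    /* Large, physically non-contiguous allocation is fine here */
    bulk = kvzalloc(struct_size(bulk, fcs, bulk_len), GFP_KERNEL);
    if (!bulk)
        return -ENOMEM;

    bitmask = kvcalloc(BITS_TO_LONGS(bulk_len), sizeof(unsigned long),
                       GFP_KERNEL);

    /* Both must later be freed with kvfree(), not kfree() */
    kvfree(bitmask);
    kvfree(bulk);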
2021-04-16  net/mlx5e: Cleanup safe switch channels API by passing params  (Maxim Mikityanskiy)
mlx5e_safe_switch_channels accepts new_chs as a parameter and opens new channels in place, then copies them to priv->channels. It requires all the callers to allocate space for this temporary storage of the new channels. This commit cleans up the API by replacing new_chs with new_params, a meaningful subset of new_chs to be filled by the caller. The temporary space for the new channels is allocated inside mlx5e_safe_switch_params (a new name for mlx5e_safe_switch_channels). An extra copy of params is made, but since it's control flow, it's not critical.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
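Under that description, a caller reduces to roughly the following sketch (the exact mlx5e_safe_switch_params signature and the params field touched are assumptions based on this text, not verified against the tree):

    struct mlx5e_params new_params;
    int err;

    /* Copy the active params and tweak only what changes */
    new_params = priv->channels.params;
    new_params.log_rq_mtu_frames = log_rq_size;     /* example knob */

    /* Temporary channel storage is now allocated inside the helper */
    err = mlx5e_safe_switch_params(priv, &new_params,
                                   preactivate_fn, context, reset);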
2021-04-16  net/mlx5e: Refactor on-the-fly configuration changes  (Maxim Mikityanskiy)
This commit extends mlx5e_safe_switch_channels() to support on-the-fly configuration changes, when the channels are open, but don't need to be recreated. Such flows exist when a parameter being changed doesn't affect how the queues are created, or when the queues can be modified while remaining active. Before this commit, such flows were handled as special cases on the caller site. This commit adds this functionality to mlx5e_safe_switch_channels(), allowing the caller to pass a boolean indicating whether it's required to recreate the channels or it's allowed to skip it. The logic of switching channel parameters is now completely encapsulated into mlx5e_safe_switch_channels(). Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16  net/mlx5e: Use mlx5e_safe_switch_channels when channels are closed  (Maxim Mikityanskiy)
This commit uses new functionality of mlx5e_safe_switch_channels introduced by the previous commit to reduce the amount of repeating similar code all over the driver. It's very common in mlx5e to call mlx5e_safe_switch_channels when the channels are open, but assign parameters and run hardware commands manually when the channels are closed. After the previous commit it's no longer needed to do such manual things every time, so this commit removes unneeded code and relies on the new functionality of mlx5e_safe_switch_channels. Some of the places are refactored and simplified, where more complex flows are used to change configuration on the fly, without recreating the channels (the logic is rewritten in a more robust way, with a reset required by default and a list of exceptions). Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16  net/mlx5e: Allow mlx5e_safe_switch_channels to work with channels closed  (Maxim Mikityanskiy)
mlx5e_safe_switch_channels is used to modify channel parameters and/or hardware configuration in a safe way, so that if anything goes wrong, everything reverts to the old configuration and remains in a consistent state. However, this function only works when the channels are open. When the caller needs to modify some parameters, first it has to check that the channels are open, otherwise it has to assign parameters directly, and such boilerplate repeats in many different places. This commit prepares for the refactoring of such places by allowing mlx5e_safe_switch_channels to work when the channels are closed. In this case it will assign the new parameters and run the preactivate hook. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16  net/mlx5e: kTLS, Add resiliency to RX resync failures  (Tariq Toukan)
When the TLS logic finds a tcp seq match for a kTLS RX resync request, it calls the driver callback function mlx5e_ktls_resync() to handle it and communicate it to the device. Errors might occur during mlx5e_ktls_resync(); however, they are not reported to the stack, and there is no error handling in the stack for these errors.

In this patch, the driver takes responsibility for error handling, adding queue and retry mechanisms to these resyncs. We maintain a linked list of resync matches, and try posting them to the async ICOSQ in the NAPI context.

The only possible failure that demands driver handling is the ICOSQ being full. By relying on the NAPI mechanism, we make sure that the entries in the list will be handled when ICOSQ completions arrive and make some room available.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
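The mechanism, as a generic sketch with hypothetical type and helper names (a locked list plus retry-from-NAPI; the real driver's ICOSQ and locking details are elided):

    /* Queue a resync match that could not be posted immediately */
    static void ktls_rx_resync_queue(struct resync_list *rl,
                                     struct resync_entry *entry)
    {
        spin_lock_bh(&rl->lock);
        list_add_tail(&entry->node, &rl->head);
        spin_unlock_bh(&rl->lock);
    }

    /* Called from NAPI, when ICOSQ completions have freed some room */
    static void ktls_rx_resync_retry(struct resync_list *rl)
    {
        struct resync_entry *entry, *tmp;

        spin_lock_bh(&rl->lock);
        list_for_each_entry_safe(entry, tmp, &rl->head, node) {
            if (!icosq_has_room(rl->icosq))     /* hypothetical */
                break;                          /* retry next NAPI cycle */
            list_del(&entry->node);
            post_resync_wqe(rl->icosq, entry);  /* hypothetical */
        }
        spin_unlock_bh(&rl->lock);
    }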
2021-04-16  net/mlx5e: TX, Inline function mlx5e_tls_handle_tx_wqe()  (Tariq Toukan)
When TLS is supported, WQE ctrl segment of every transmitted packet is updated with the (possibly empty, for non-TLS packets) TISN field. Take this one-liner function into the header file and inline it, to save the overhead of a function call per packet. While here, remove unused function parameter. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16  net/mlx5e: TX, Inline TLS skb check  (Tariq Toukan)
When TLS is supported and enabled, every transmitted packet is tested to identify if TLS offload is required. Take the early-return condition into an inline function, to save the overhead of a function call for non-TLS packets. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16  net/mlx5e: Cleanup unused function parameter  (Tariq Toukan)
Socket parameter is not used in accel_rule_init(), remove it. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16  net/mlx5e: Remove non-essential TLS SQ state bit  (Tariq Toukan)
Maintaining an SQ state bit to indicate TLS support has no real need, a simple and fast test [1] for the SKB is almost equally good. [1] !skb->sk || !tls_is_sk_tx_device_offloaded(skb->sk) Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
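The test from [1], wrapped as the kind of inline helper this series favors (the helper name is mine, not the driver's; tls_is_sk_tx_device_offloaded comes from <net/tls.h>):

    #include <net/tls.h>

    /* True only when the skb belongs to a TLS TX device-offloaded socket */
    static inline bool skb_is_tls_device_offloaded(const struct sk_buff *skb)
    {
        return skb->sk && tls_is_sk_tx_device_offloaded(skb->sk);
    }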
2021-04-16  igb: Redistribute memory for transmit packet buffers when in Qav mode  (Ederson de Souza)
i210 has a total of 24KB of transmit packet buffer. When in Qav mode, this buffer is divided into four pieces, one for each Tx queue. Currently, 8KB are given to each of the two SR queues and 4KB are given to each of the two SP queues.

However, it was noticed that such distribution can make best effort traffic (which would usually go to the SP queues when Qav is enabled, as the SR queues would be used by ETF or CBS qdiscs for TSN-aware traffic) perform poorly. Using iperf3 to measure, one could see the performance of best effort traffic drop by nearly a third (from 935Mbps to 578Mbps), with no TSN traffic competing.

This patch redistributes the 24KB equally to each queue: 6KB each. In tests, there was no notable performance reduction of best effort traffic when there was no TSN traffic competing.

Below, more details about the data collected:

All experiments were run using the following qdisc setup:

    qdisc taprio 100: root refcnt 9 tc 4 map 3 3 3 2 3 0 0 3 3 3 3 3 3 3 3 3
     queues offset 0 count 1 offset 1 count 1 offset 2 count 1 offset 3 count 1
     clockid TAI base-time 0 cycle-time 10000000 cycle-time-extension 0
     index 0 cmd S gatemask 0xf interval 10000000

    qdisc etf 8045: parent 100:1 clockid TAI delta 1000000 offload on
     deadline_mode off skip_sock_check off

TSN traffic, when enabled, had these characteristics:
    Packet size: 1500 bytes
    Transmission interval: 125us

----------------------------------
Without this patch:
----------------------------------
- TCP data:
  - No TSN traffic:
    [ ID] Interval          Transfer     Bitrate        Retr
    [  5] 0.00-20.00 sec    1.35 GBytes  578 Mbits/sec  0
  - With TSN traffic:
    [ ID] Interval          Transfer     Bitrate        Retr
    [  5] 0.00-20.00 sec    1.07 GBytes  460 Mbits/sec  1

- TCP data limiting iperf3 buffer size to 4K:
  - No TSN traffic:
    [ ID] Interval          Transfer     Bitrate        Retr
    [  5] 0.00-20.00 sec    1.35 GBytes  579 Mbits/sec  0
  - With TSN traffic:
    [ ID] Interval          Transfer     Bitrate        Retr
    [  5] 0.00-20.00 sec    1.08 GBytes  462 Mbits/sec  0

- TCP data limiting iperf3 buffer size to 192 bytes (smallest size without serious performance degradation):
  - No TSN traffic:
    [ ID] Interval          Transfer     Bitrate        Retr
    [  5] 0.00-20.00 sec    1.34 GBytes  577 Mbits/sec  0
  - With TSN traffic:
    [ ID] Interval          Transfer     Bitrate        Retr
    [  5] 0.00-20.00 sec    1.07 GBytes  461 Mbits/sec  1

- UDP data at 1000Mbit/sec:
  - No TSN traffic:
    [ ID] Interval          Transfer     Bitrate        Jitter    Lost/Total Datagrams
    [  5] 0.00-20.00 sec    1.36 GBytes  586 Mbits/sec  0.000 ms  0/1011407 (0%)
  - With TSN traffic:
    [ ID] Interval          Transfer     Bitrate        Jitter    Lost/Total Datagrams
    [  5] 0.00-20.00 sec    1.05 GBytes  451 Mbits/sec  0.000 ms  0/778672 (0%)

----------------------------------
With this patch:
----------------------------------
- TCP data:
  - No TSN traffic:
    [ ID] Interval          Transfer     Bitrate        Retr
    [  5] 0.00-20.00 sec    2.17 GBytes  932 Mbits/sec  0
  - With TSN traffic:
    [ ID] Interval          Transfer     Bitrate        Retr
    [  5] 0.00-20.00 sec    1.50 GBytes  646 Mbits/sec  1

- TCP data limiting iperf3 buffer size to 4K:
  - No TSN traffic:
    [ ID] Interval          Transfer     Bitrate        Retr
    [  5] 0.00-20.00 sec    2.17 GBytes  931 Mbits/sec  0
  - With TSN traffic:
    [ ID] Interval          Transfer     Bitrate        Retr
    [  5] 0.00-20.00 sec    1.50 GBytes  645 Mbits/sec  0

- TCP data limiting iperf3 buffer size to 192 bytes (smallest size without serious performance degradation):
  - No TSN traffic:
    [ ID] Interval          Transfer     Bitrate        Retr
    [  5] 0.00-20.00 sec    2.17 GBytes  932 Mbits/sec  1
  - With TSN traffic:
    [ ID] Interval          Transfer     Bitrate        Retr
    [  5] 0.00-20.00 sec    1.50 GBytes  645 Mbits/sec  0

- UDP data at 1000Mbit/sec:
  - No TSN traffic:
    [ ID] Interval          Transfer     Bitrate        Jitter    Lost/Total Datagrams
    [  5] 0.00-20.00 sec    2.23 GBytes  956 Mbits/sec  0.000 ms  0/1650226 (0%)
  - With TSN traffic:
    [ ID] Interval          Transfer     Bitrate        Jitter    Lost/Total Datagrams
    [  5] 0.00-20.00 sec    1.51 GBytes  649 Mbits/sec  0.000 ms  0/1120264 (0%)

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-15  mlx5: implement ethtool::get_fec_stats  (Jakub Kicinski)
Report corrected bits. v2: catch reg access errors (Saeed) Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15  sfc: ef10: implement ethtool::get_fec_stats  (Jakub Kicinski)
Report what appear to be the standard block counts:
 - 30.5.1.1.17 aFECCorrectedBlocks
 - 30.5.1.1.18 aFECUncorrectableBlocks

Don't report the per-lane symbol counts; if those really count symbols, they are not what the standard calls for (even if symbols seem like the most useful thing to count). Fingers crossed that fec_corrected_errors is not in symbols.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
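The uAPI side these implementations fill in, as a sketch with a hypothetical firmware-counter read (read_fw_stat and its IDs are assumed; struct ethtool_fec_stats and its .total fields are from the new ethtool API):

    static void ef10_get_fec_stats(struct net_device *net_dev,
                                   struct ethtool_fec_stats *fec_stats)
    {
        fec_stats->corrected_blocks.total =
            read_fw_stat(FEC_CORR_BLOCKS);
        fec_stats->uncorrectable_blocks.total =
            read_fw_stat(FEC_UNCORR_BLOCKS);
    }

    /* hooked up via  .get_fec_stats = ef10_get_fec_stats,  in ethtool_ops */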
2021-04-15  bnxt: implement ethtool::get_fec_stats  (Jakub Kicinski)
Report corrected bits. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15  ch_ktls: do not send snd_una update to TCB in middle  (Vinay Kumar Yadav)
The snd_una update should not be done when the same skb is being sent out. chcr_short_record_handler() sends it again even though the SND_UNA update was already sent for the skb in chcr_ktls_xmit(), which causes a mismatch in the un-acked TCP sequence number and later causes a problem in sending out the complete record.

Fixes: 429765a149f1 ("chcr: handle partial end part of a record")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15  ch_ktls: tcb close causes tls connection failure  (Vinay Kumar Yadav)
HW doesn't need the TCB to be marked closed. This TCB state change sometimes causes a problem for a new connection which gets the same tid.

Fixes: 34aba2c45024 ("cxgb4/chcr : Register to tls add and del callback")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15  ch_ktls: fix device connection close  (Vinay Kumar Yadav)
When the sge queue is full and chcr_ktls_xmit_wr_complete() returns failure, the skb is not freed if it is not the last tls record in this skb. This causes the refcount to never be released, so tls_dev_del() never gets called on this connection.

Fixes: 5a4b9fe7fece ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15  ch_ktls: Fix kernel panic  (Vinay Kumar Yadav)
Taking the page refcount is not ideal and sometimes causes a kernel panic. It's better to take the tx_ctx lock for the complete skb transmit, to avoid page cleanup if an ACK is received in the middle.

Fixes: 5a4b9fe7fece ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15  enetc: convert to schedule_work()  (Yangbo Lu)
Convert system_wq queue_work() to schedule_work() which is a wrapper around it, since the former is a rare construct. Fixes: 7294380c5211 ("enetc: support PTP Sync packet one-step timestamping") Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
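For reference, schedule_work() is exactly this wrapper in <linux/workqueue.h>, so the conversion changes spelling, not behavior:

    static inline bool schedule_work(struct work_struct *work)
    {
        return queue_work(system_wq, work);
    }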
2021-04-15  net: hns3: VF not request link status when PF support push link status feature  (Guangbin Huang)
To reduce unnecessary mailbox command processing, VFs stop sending the link status request command in the periodic service task when the PF supports actively pushing its link status to VFs.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15  net: hns3: PF add support for pushing link status to VFs  (Guangbin Huang)
Previously, a VF updates its link status every second by sending a query command to the PF in the periodic service task. If the link status of the PF changes, the VF may need up to one second to update its own link status.

To reduce the delay of link status between the PF and VFs, the PF now actively pushes its link status to the VFs when its link status is updated. And to let the VF know that the PF supports this new feature, the link status changed mailbox command adds one bit to indicate it.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15  Merge tag 'mlx5-fixes-2021-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux  (David S. Miller)
Saeed Mahameed says:

====================
mlx5 fixes 2021-04-14

This series provides 3 small fixes to the mlx5 driver. Please pull and let me know if there is any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15  i40e: fix the panic when running bpf in xdpdrv mode  (Jason Xing)
Fix this panic by adding more rules to calculate the value of @rss_size_max, which is used in allocating the queues when bpf is loaded; without them, queue allocation can fail and then trigger the NULL pointer dereference of vsi->rx_rings.

Prior to this fix, the machine doesn't care about how many cpus are online, and allocates 256 queues even on a machine with only 32 cpus online. Once the load of bpf begins, the log reads "failed to get tracking for 256 queues for VSI 0 err -12" and then "setup of MAIN VSI failed".

Thus, I attach the key information of the crash log here.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e]
    Call Trace:
    [2160294.717292] ? i40e_reconfig_rss_queues+0x170/0x170 [i40e]
    [2160294.717666] dev_xdp_install+0x4f/0x70
    [2160294.718036] dev_change_xdp_fd+0x11f/0x230
    [2160294.718380] ? dev_disable_lro+0xe0/0xe0
    [2160294.718705] do_setlink+0xac7/0xe70
    [2160294.719035] ? __nla_parse+0xed/0x120
    [2160294.719365] rtnl_newlink+0x73b/0x860

Fixes: 41c445ff0f48 ("i40e: main driver core")
Co-developed-by: Shujin Li <lishujin@kuaishou.com>
Signed-off-by: Shujin Li <lishujin@kuaishou.com>
Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
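The kind of rule being added, as a hedged sketch (the field name matches the description, but the exact hunk is an assumption): cap the RSS queue count by the number of online CPUs so queue tracking cannot overshoot:

    /* Don't ask for more RSS queues than there are online CPUs */
    pf->rss_size_max = min_t(int, pf->rss_size_max, num_online_cpus());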