Age | Commit message | Author
2021-02-09 | net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx() | Yufeng Mo
The tqp_index is received from the VF. Using it directly may cause an out-of-bounds access, so check tqp_index before using it in hclge_get_ring_chain_from_mbx(). Fixes: 84e095d64ed9 ("net: hns3: Change PF to add ring-vect binding & resetQ to mailbox") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
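The kind of check described above can be sketched in standalone C. This is only an illustration of validating an untrusted index before using it, not the hns3 driver code: `struct demo_vport`, `num_tqps` as used here, `DEMO_MAX_TQPS`, and `demo_get_tqp()` are hypothetical names, and the sketch assumes `num_tqps` never exceeds `DEMO_MAX_TQPS`.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_TQPS 16

/* Hypothetical per-VF state; the real driver structures differ. */
struct demo_vport {
	uint16_t num_tqps;        /* queues actually allocated to this VF */
	int tqps[DEMO_MAX_TQPS];  /* stand-in for the per-queue objects */
};

/*
 * Return a pointer to the queue selected by an index that arrived in a
 * mailbox message from the VF, or NULL if the index is out of range.
 */
int *demo_get_tqp(struct demo_vport *vport, uint16_t tqp_index)
{
	if (tqp_index >= vport->num_tqps)
		return NULL;  /* reject untrusted, out-of-bounds input */

	return &vport->tqps[tqp_index];
}
```

A caller would treat a NULL return as an invalid request and fail the mailbox command instead of touching the array.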
2021-02-09 | net: hns3: add a check for queue_id in hclge_reset_vf_queue() | Yufeng Mo
The queue_id is received from the VF. Using it directly may cause an out-of-bounds access, so check queue_id before using it in hclge_reset_vf_queue(). Fixes: 1a426f8b40fc ("net: hns3: fix the VF queue reset flow error") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 | net: dsa: felix: implement port flushing on .phylink_mac_link_down | Vladimir Oltean
There are several issues which may be seen when the link goes down while forwarding traffic, all of which can be attributed to the fact that the port flushing procedure from the reference manual was not closely followed. With flow control enabled on both the ingress port and the egress port, it may happen when a link goes down that Ethernet packets are in flight. In flow control mode, frames are held back and not dropped. When there is enough traffic in flight (example: iperf3 TCP), then the ingress port might enter congestion and never exit that state. This is a problem, because it is the egress port's link that went down, and that has caused the inability of the ingress port to send packets to any other port. This is solved by flushing the egress port's queues when it goes down. There is also a problem when performing stream splitting for IEEE 802.1CB traffic (not yet upstream, but a sort of multicast, basically). There, if one port from the destination ports mask goes down, splitting the stream towards the other destinations will no longer be performed. This can be traced down to this line: ocelot_port_writel(ocelot_port, 0, DEV_MAC_ENA_CFG); which should have been instead, as per the reference manual: ocelot_port_rmwl(ocelot_port, 0, DEV_MAC_ENA_CFG_RX_ENA, DEV_MAC_ENA_CFG); Basically only DEV_MAC_ENA_CFG_RX_ENA should be disabled, but not DEV_MAC_ENA_CFG_TX_ENA - I don't have further insight into why that is the case, but apparently multicasting to several ports will cause issues if at least one of them doesn't have DEV_MAC_ENA_CFG_TX_ENA set. I am not sure what the state of the Ocelot VSC7514 driver is, but probably not as bad as Felix/Seville, since VSC7514 uses phylib and has the following in ocelot_adjust_link: if (!phydev->link) return; therefore the port is not really put down when the link is lost, unlike the DSA drivers which use .phylink_mac_link_down for that. 
Nonetheless, I put ocelot_port_flush() in the common ocelot.c because it needs to access some registers from drivers/net/ethernet/mscc/ocelot_rew.h which are not exported in include/soc/mscc/ and a bugfix patch should probably not move headers around. Fixes: bdeced75b13f ("net: dsa: felix: Add PCS operations for PHYLINK") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
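The difference between the plain write and the read-modify-write mentioned in this message can be sketched as follows. This is a minimal model that uses an ordinary variable in place of a hardware register. `demo_writel()`, `demo_rmwl()`, and the `DEMO_MAC_ENA_*` bit positions are hypothetical and do not reflect the real ocelot accessors or register layout.

```c
#include <stdint.h>

/* Illustrative bit positions only; the real layout lives in the driver headers. */
#define DEMO_MAC_ENA_RX (1u << 4)
#define DEMO_MAC_ENA_TX (1u << 0)

/* Plain write: replaces the whole register value. */
void demo_writel(uint32_t *reg, uint32_t val)
{
	*reg = val;
}

/* Read-modify-write: only the bits selected by mask are changed. */
void demo_rmwl(uint32_t *reg, uint32_t val, uint32_t mask)
{
	uint32_t cur = *reg;

	*reg = (cur & ~mask) | (val & mask);
}
```

With both enable bits set, `demo_rmwl(&reg, 0, DEMO_MAC_ENA_RX)` clears only the RX bit and leaves TX enabled, whereas `demo_writel(&reg, 0)` clears both.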
2021-02-09 | net: broadcom: bcm4908enet: add BCM4908 controller driver | Rafał Miłecki
The BCM4908 SoC family uses an Ethernet controller that includes UniMAC but uses a different DMA engine (than other controllers) and requires different programming. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 | dt-bindings: net: document BCM4908 Ethernet controller | Rafał Miłecki
BCM4908 is a family of SoCs with an integrated Ethernet controller. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next | David S. Miller
Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2021-02-09 1) Support TSO on xfrm interfaces. From Eyal Birger. 2) Variable calculation simplifications in esp4/esp6. From Jiapeng Chong / Jiapeng Zhong. 3) Fix a return code in xfrm_do_migrate. From Zheng Yongjun. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 | Documentation: networking: ip-sysctl: Document src_valid_mark sysctl | Jay Vosburgh
Provide documentation for src_valid_mark sysctl, which was added in commit 28f6aeea3f12 ("net: restore ip source validation"). Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 | net: phy: broadcom: remove BCM5482 1000Base-BX support | Michael Walle
It is not used anywhere in the kernel. It also seems to be lacking the proper fiber advertise flags. Remove it. Signed-off-by: Michael Walle <michael@walle.cc> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 | net: phy: drop explicit genphy_read_status() op | Michael Walle
genphy_read_status() is already the default for the .read_status() op. Drop the unnecessary references. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 | i40e: Log error for oversized MTU on device | Eryk Rybak
When attempting to link an XDP program with an MTU larger than supported, the user is not informed why XDP linking fails. Add a proper error message: "MTU too large to enable XDP". Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Eryk Rybak <eryk.roch.rybak@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08 | i40e: consolidate handling of XDP program actions | Cristian Dumitrescu
Consolidate the actions performed on the packet based on the XDP program result into a separate function that is easier to read and maintain. Simplify the i40e_construct_skb_zc function, so that the input xdp buffer is always freed, regardless of whether the output skb is successfully created or not. Simplify the behavior of the i40e_clean_rx_irq_zc function, so that the current packet descriptor is dropped when function i40e_construct_skb_zc returns an error as opposed to re-processing the same descriptor on the next invocation. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08 | i40e: remove the redundant buffer info updates | Cristian Dumitrescu
For performance reasons, remove the redundant buffer info updates (*bi = NULL). The buffers ready to be cleaned can easily be tracked based on the ring next-to-clean variable, which is consistently updated. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08 | i40e: remove unnecessary cleaned_count updates | Cristian Dumitrescu
For performance reasons, remove the redundant updates of the cleaned_count variable, as its value can be computed based on the ring next-to-clean variable, which is consistently updated. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08 | i40e: remove unnecessary memory writes of the next to clean pointer | Cristian Dumitrescu
For performance reasons, avoid writing the ring next-to-clean pointer value back to memory on every update, as it is not really necessary. Instead, simply read it at initialization into a local copy, update the local copy as necessary and write the local copy back to memory after the last update. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
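The pattern described here (read once, update a local copy, write back once) can be sketched in standalone C. This is a simplified model rather than the i40e code: `struct demo_ring`, `DEMO_RING_SIZE`, the `done[]` array, and `demo_clean_ring()` are hypothetical.

```c
#include <stdint.h>

#define DEMO_RING_SIZE 8u  /* power of two, so wrap-around is a simple mask */

struct demo_ring {
	uint16_t next_to_clean;         /* ring state kept in memory */
	uint16_t done[DEMO_RING_SIZE];  /* nonzero if the descriptor is completed */
};

/* Clean up to budget completed descriptors; returns how many were cleaned. */
unsigned int demo_clean_ring(struct demo_ring *ring, unsigned int budget)
{
	uint16_t ntc = ring->next_to_clean;  /* read once into a local copy */
	unsigned int cleaned = 0;

	while (cleaned < budget && ring->done[ntc]) {
		ring->done[ntc] = 0;
		/* Only the local copy is advanced inside the loop. */
		ntc = (uint16_t)((ntc + 1u) & (DEMO_RING_SIZE - 1u));
		cleaned++;
	}

	ring->next_to_clean = ntc;  /* single write-back after the last update */
	return cleaned;
}
```

The loop touches `ring->next_to_clean` exactly twice per call regardless of how many descriptors are processed.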
2021-02-08 | Merge branch 'route-offload-failure' | David S. Miller
net: Add support for route offload failure notifications Ido Schimmel says: ==================== This is a complementary series to the one merged in commit 389cb1ecc86e ("Merge branch 'add-notifications-when-route-hardware-flags-change'"). The previous series added RTM_NEWROUTE notifications to user space whenever a route was successfully installed in hardware or when its state in hardware changed. This allows routing daemons to delay advertisement of routes until they are installed in hardware. However, if route installation failed, a routing daemon will wait indefinitely for a notification that will never come. The aim of this series is to provide a failure notification via a new flag (RTM_F_OFFLOAD_FAILED) in the RTM_NEWROUTE message. Upon such a notification a routing daemon may decide to withdraw the route from the FIB. Series overview: Patch #1 adds the new RTM_F_OFFLOAD_FAILED flag Patches #2-#3 and #4-#5 add failure notifications to IPv4 and IPv6, respectively Patches #6-#8 teach netdevsim to fail route installation via a new knob in debugfs Patch #9 extends mlxsw to mark routes with the new flag Patch #10 adds test cases for the new notification over netdevsim ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 | selftests: netdevsim: Test route offload failure notifications | Amit Cohen
Add cases to verify that when debugfs variable "fail_route_offload" is set, notification with "rt_offload_failed" flag is received. Extend the existing cases to verify that when sysctl "fib_notify_on_flag_change" is set to 2, the kernel emits notifications only for failed route installation. $ ./fib_notifications.sh TEST: IPv4 route addition [ OK ] TEST: IPv4 route deletion [ OK ] TEST: IPv4 route replacement [ OK ] TEST: IPv4 route offload failed [ OK ] TEST: IPv6 route addition [ OK ] TEST: IPv6 route deletion [ OK ] TEST: IPv6 route replacement [ OK ] TEST: IPv6 route offload failed [ OK ] Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 | mlxsw: spectrum_router: Set offload_failed flag | Amit Cohen
When FIB_EVENT_ENTRY_{REPLACE, APPEND} are triggered and route insertion fails, FIB abort is triggered. After aborting, set the appropriate hardware flag to make the kernel emit RTM_NEWROUTE notification with RTM_F_OFFLOAD_FAILED flag. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 | netdevsim: fib: Add debugfs to debug route offload failure | Amit Cohen
Add "fail_route_offload" flag to disallow offloading routes. It is needed to test "offload failed" notifications. Create the flag as part of nsim_fib_create() under fib directory and set it to false by default. When FIB_EVENT_ENTRY_{REPLACE, APPEND} are triggered and "fail_route_offload" value is true, set the appropriate hardware flag to make the kernel emit RTM_NEWROUTE notification with RTM_F_OFFLOAD_FAILED flag. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 | netdevsim: dev: Initialize FIB module after debugfs | Ido Schimmel
Initialize the dummy FIB offload module after debugfs, so that the FIB module could create its own directory there. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 | netdevsim: fib: Do not warn if route was not found for several events | Amit Cohen
The next patch will add the ability to fail route offload controlled by debugfs variable called "fail_route_offload". If we vetoed the addition, we might get a delete or append notification for a route we do not have. Therefore, do not warn if route was not found. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 | IPv6: Extend 'fib_notify_on_flag_change' sysctl | Amit Cohen
Add the value '2' to 'fib_notify_on_flag_change' to allow sending notifications only for failed route installation. A separate value is added for such notifications because there are fewer of them, so they do not impact performance, and some users will find them more important. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 | IPv6: Add "offload failed" indication to routes | Amit Cohen
After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware. The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware. To avoid such cases, previous patch set added the ability to emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed, this behavior is controlled by sysctl. With the above mentioned behavior, it is possible to know from user-space if the route was offloaded, but if the offload fails there is no indication to user-space. Following a failure, a routing daemon will wait indefinitely for a notification that will never come. This patch adds an "offload_failed" indication to IPv6 routes, so that users will have better visibility into the offload process. 'struct fib6_info' is extended with new field that indicates if route offload failed. Note that the new field is added using unused bit and therefore there is no need to increase struct size. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 | IPv4: Extend 'fib_notify_on_flag_change' sysctl | Amit Cohen
Add the value '2' to 'fib_notify_on_flag_change' to allow sending notifications only for failed route installation. A separate value is added for such notifications because there are fewer of them, so they do not impact performance, and some users will find them more important. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
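The effect of the sysctl values can be sketched as a small decision function. This is a simplified model, not the kernel implementation: `struct demo_hw_flags` and `demo_should_notify()` are hypothetical, and the meaning of values 0 (never notify) and 1 (notify on any hardware flag change) is assumed from the earlier patch set rather than stated in this message.

```c
#include <stdbool.h>

/* Hypothetical snapshot of a route's hardware flags. */
struct demo_hw_flags {
	bool offload;
	bool trap;
	bool offload_failed;
};

/*
 * Decide whether a flag update should produce a notification.
 * mode models the sysctl value:
 *   0 - never notify (assumed)
 *   1 - notify on any hardware flag change (assumed)
 *   2 - notify only when the offload_failed flag changed
 */
bool demo_should_notify(int mode, struct demo_hw_flags old, struct demo_hw_flags cur)
{
	bool any_change = old.offload != cur.offload ||
			  old.trap != cur.trap ||
			  old.offload_failed != cur.offload_failed;

	if (mode == 0 || !any_change)
		return false;
	if (mode == 2)
		return old.offload_failed != cur.offload_failed;
	return true;
}
```

In mode 2, a route that merely becomes offloaded produces no message, which is why this setting generates fewer notifications.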
2021-02-08 | IPv4: Add "offload failed" indication to routes | Amit Cohen
After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware. The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware. To avoid such cases, previous patch set added the ability to emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed, this behavior is controlled by sysctl. With the above mentioned behavior, it is possible to know from user-space if the route was offloaded, but if the offload fails there is no indication to user-space. Following a failure, a routing daemon will wait indefinitely for a notification that will never come. This patch adds an "offload_failed" indication to IPv4 routes, so that users will have better visibility into the offload process. 'struct fib_alias', and 'struct fib_rt_info' are extended with new field that indicates if route offload failed. Note that the new field is added using unused bit and therefore there is no need to increase structs size. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
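The remark about using an unused bit can be illustrated with bit-fields. The two structures below are hypothetical and are not the kernel's 'struct fib_alias' or 'struct fib_rt_info'. They only show that carving a new one-bit flag out of existing padding bits leaves the structure size unchanged.

```c
/* Before: two flags in use, the remaining bits reserved. */
struct demo_flags_before {
	unsigned int offload:1,
		     trap:1,
		     unused:30;
};

/* After: the new flag takes one of the previously unused bits. */
struct demo_flags_after {
	unsigned int offload:1,
		     trap:1,
		     offload_failed:1,
		     unused:29;
};

/* Compile-time check (C11) that the layout did not grow. */
_Static_assert(sizeof(struct demo_flags_before) == sizeof(struct demo_flags_after),
	       "adding offload_failed must not change the struct size");
```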
2021-02-08 | rtnetlink: Add RTM_F_OFFLOAD_FAILED flag | Amit Cohen
The flag indicates to user space that route offload failed. Previous patch set added the ability to emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed, but if the offload fails there is no indication to user-space. The flag will be used in subsequent patches by netdevsim and mlxsw to indicate to user space that route offload failed, so that users will have better visibility into the offload process. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
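How a routing daemon might react to these flags can be sketched as below. The policy is hypothetical and follows the series description (delay advertisement until the route is in hardware, consider withdrawing it on failure). The `DEMO_F_*` bit values are placeholders chosen for illustration; real code must use the RTM_F_* definitions from <linux/rtnetlink.h>.

```c
#include <stdint.h>

/* Placeholder bit values for illustration only, not the uapi values. */
#define DEMO_F_OFFLOAD        (1u << 0)
#define DEMO_F_TRAP           (1u << 1)
#define DEMO_F_OFFLOAD_FAILED (1u << 2)

enum demo_action {
	DEMO_WAIT,       /* no hardware state reported yet */
	DEMO_ADVERTISE,  /* route is in hardware (offloaded or trapping) */
	DEMO_WITHDRAW,   /* hardware installation failed */
};

/* Map the flags of a received route notification to a daemon action. */
enum demo_action demo_route_action(uint32_t rtm_flags)
{
	if (rtm_flags & DEMO_F_OFFLOAD_FAILED)
		return DEMO_WITHDRAW;
	if (rtm_flags & (DEMO_F_OFFLOAD | DEMO_F_TRAP))
		return DEMO_ADVERTISE;
	return DEMO_WAIT;
}
```

Without the failure flag, the first branch could not exist and a failed route would stay in the waiting state indefinitely.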
2021-02-08 | Documentation: ice: update documentation | Tony Nguyen
The ice documentation has not been updated since the initial commits of the driver. Update the documentation with features and information that are now available. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08 | ice: Improve MSI-X fallback logic | Tony Nguyen
Currently if the driver is unable to get all the MSI-X vectors it wants, it falls back to the minimum configuration which equates to a single Tx/Rx traffic queue pair. Instead of using the minimum configuration, if given more vectors than the minimum, utilize those vectors for additional traffic queues after accounting for other interrupts. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
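The improved fallback can be sketched as simple arithmetic. This is an assumption-laden model, not the ice driver logic: `demo_queue_pairs()` and its parameters are hypothetical, and it assumes one vector per Tx/Rx queue pair plus a fixed number of vectors for other interrupts.

```c
/*
 * Given the number of MSI-X vectors actually granted, return how many
 * Tx/Rx queue pairs can be set up. Returns 0 if even the minimum
 * configuration (other interrupts plus one queue pair) does not fit.
 */
unsigned int demo_queue_pairs(unsigned int granted,
			      unsigned int other_irqs,
			      unsigned int wanted_qps)
{
	unsigned int avail;

	if (granted < other_irqs + 1)
		return 0;

	/* Use whatever is left after the other interrupts are accounted for. */
	avail = granted - other_irqs;
	return avail < wanted_qps ? avail : wanted_qps;
}
```

Under the old behavior described above, any shortfall would have resulted in a single queue pair. Here a grant of 6 vectors with 2 reserved for other interrupts still yields 4 queue pairs.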
2021-02-08 | ice: Fix trivial error message | Mitch Williams
This message indicates an error on close, not open. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08 | ice: remove unnecessary casts | Bruce Allan
Casting a void * rvalue in an assignment is unnecessary in C; remove the casts. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
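A small standalone example of the rule this cleanup relies on: in C, a void * converts implicitly to any object pointer type on assignment. `struct demo_ctx` and both functions are hypothetical and unrelated to the ice driver.

```c
#include <stdlib.h>

struct demo_ctx {
	int id;
};

struct demo_ctx *demo_ctx_alloc(int id)
{
	/* No cast needed: calloc() returns void *, which converts implicitly. */
	struct demo_ctx *ctx = calloc(1, sizeof(*ctx));

	if (ctx)
		ctx->id = id;
	return ctx;
}

int demo_ctx_id(void *opaque)
{
	/* Equivalent to, but shorter than, (struct demo_ctx *)opaque. */
	struct demo_ctx *ctx = opaque;

	return ctx->id;
}
```

This holds for C only; C++ requires an explicit cast in both places.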
2021-02-08 | ice: Refactor DCB related variables out of the ice_port_info struct | Chinh T Cao
Refactor the DCB related variables out of the ice_port_info struct. The goal is to make the ice_port_info struct cleaner. Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com> Co-developed-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08 | ice: fix writeback enable logic | Jesse Brandeburg
The writeback enable logic was incorrectly implemented (due to misunderstanding what the side effects of the implementation would be during polling). Fix this logic issue, while implementing a new feature allowing the user to control the writeback frequency using the knobs for controlling interrupt throttling that we already have. Basically if you leave adaptive interrupts enabled, the writeback frequency will be varied even if busy_polling or if napi-poll is in use. If the interrupt rates are set to a fixed value by ethtool -C and adaptive is off, the driver will allow the user-set interrupt rate to guide how frequently the hardware will complete descriptors to the driver. Effectively the user will get a control over the hardware efficiency, allowing the choice between immediate interrupts or delayed up to a maximum of the interrupt rate, even when interrupts are disabled during polling. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Co-developed-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08 | ice: Use PSM clock frequency to calculate RL profiles | Ben Shelton
The core clock frequency is currently hardcoded at 446 MHz for the RL profile calculations. This causes issues since not all devices use that clock frequency. Read the GLGEN_CLKSTAT_SRC register to determine which PSM clock frequency is selected. This ensures that the rate limiter profile calculations will be correct. Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08 | ice: create scheduler aggregator node config and move VSIs | Kiran Patil
Create scheduler aggregator nodes and move VSIs into their respective scheduler nodes. Max children per aggregator node is 64. There are two types of aggregator node(s) created. 1. dedicated node for PF and _CTRL VSIs 2. dedicated node(s) for VFs. As part of reset and rebuild, aggregator nodes are recreated and VSIs are moved to their respective aggregator nodes. Having related VSIs in their respective trees avoids starvation between PF and VF w.r.t. Tx bandwidth. Co-developed-by: Tarun Singh <tarun.k.singh@intel.com> Signed-off-by: Tarun Singh <tarun.k.singh@intel.com> Co-developed-by: Victor Raj <victor.raj@intel.com> Signed-off-by: Victor Raj <victor.raj@intel.com> Co-developed-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
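The 64-children limit implies that more than one VF aggregator node may be needed. The arithmetic can be sketched as follows, under the assumptions that each VF VSI occupies one child slot and that nodes are filled in order. Both functions and the macro are hypothetical and are not taken from the ice driver.

```c
#define DEMO_MAX_CHILDREN_PER_AGG 64u

/* Number of VF aggregator nodes needed for the given number of VF VSIs. */
unsigned int demo_vf_agg_nodes_needed(unsigned int num_vf_vsis)
{
	return (num_vf_vsis + DEMO_MAX_CHILDREN_PER_AGG - 1u) /
	       DEMO_MAX_CHILDREN_PER_AGG;
}

/* Zero-based aggregator node that a VF VSI lands in (zero-based VSI index). */
unsigned int demo_vf_agg_index(unsigned int vf_vsi_idx)
{
	return vf_vsi_idx / DEMO_MAX_CHILDREN_PER_AGG;
}
```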
2021-02-08 | ice: Add initial support framework for LAG | Dave Ertman
Add the framework and initial implementation for receiving and processing netdev bonding events. This is only the software support and the implementation of the HW offload for bonding support will be coming at a later time. There are some architectural gaps that need to be closed before that happens. Because this is a software only solution that supports in kernel bonding, SR-IOV is not supported with this implementation. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08 | ice: Remove xsk_buff_pool from VSI structure | Michal Swiatkowski
Current implementation of netdev already contains xsk_buff_pools. We no longer have to contain these structures in ice_vsi. Refactor the code to operate on netdev-provided xsk_buff_pools. Move scheduling napi on each queue to a separate function to simplify setup function. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08 | ice: implement new LLDP filter command | Dave Ertman
There is an issue with some NVMs where an already existent LLDP filter is blocking the creation of a filter to allow LLDP packets to be redirected to the default VSI for the interface. This is blocking all LLDP functionality based in the kernel when the FW LLDP agent is disabled (e.g. software based DCBx). Implement the new AQ command to allow adding VSI destinations to existent filters on NVM versions that support the new command. The new lldp_fltr_ctrl AQ command supports Rx filters only, so the code flow for adding filters to disable Tx of control frames will remain intact. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08 | ice: log message when trusted VF goes in/out of promisc mode | Brett Creeley
Currently there is no message printed on the host when a VF goes in and out of promiscuous mode. This is causing confusion because, based on i40e, such a message is the expected behavior. Fix this. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08 | Merge branch 'bridge-mrp' | David S. Miller
Horatiu Vultur says: ==================== bridge: mrp: Fix br_mrp_port_switchdev_set_state Based on the discussion here[1], there was a problem with the function br_mrp_port_switchdev_set_state. The problem was that it was called both with BR_STATE* and BR_MRP_PORT_STATE* types. This patch series fixes this issue and removes SWITCHDEV_ATTR_ID_MRP_PORT_STAT because it is no longer used. [1] https://www.spinics.net/lists/netdev/msg714816.html ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 | switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STAT | Horatiu Vultur
Now that MRP started to use also SWITCHDEV_ATTR_ID_PORT_STP_STATE to notify HW, then SWITCHDEV_ATTR_ID_MRP_PORT_STAT is not used anywhere else, therefore we can remove it. Fixes: c284b545900830 ("switchdev: mrp: Extend switchdev API to offload MRP") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 | bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state | Horatiu Vultur
The function br_mrp_port_switchdev_set_state was called both with MRP port state and STP port state, which is an issue because they don't match exactly. Therefore, update the function to be used only with STP port state and use the id SWITCHDEV_ATTR_ID_PORT_STP_STATE. The choice of using STP over MRP is that the drivers already implement SWITCHDEV_ATTR_ID_PORT_STP_STATE and already in SW we update the port STP state. Fixes: 9a9f26e8f7ea30 ("bridge: mrp: Connect MRP API with the switchdev API") Fixes: fadd409136f0f2 ("bridge: switchdev: mrp: Implement MRP API for switchdev") Fixes: 2f1a11ae11d222 ("bridge: mrp: Add MRP interface.") Reported-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 | net: watchdog: hold device global xmit lock during tx disable | Edwin Peer
Prevent netif_tx_disable() running concurrently with dev_watchdog() by taking the device global xmit lock. Otherwise, the recommended: netif_carrier_off(dev); netif_tx_disable(dev); driver shutdown sequence can happen after the watchdog has already checked carrier, resulting in possible false alarms. This is because netif_tx_lock() only sets the frozen bit without maintaining the locks on the individual queues. Fixes: c3f26a269c24 ("netdev: Fix lockdep warnings in multiqueue configurations.") Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08Merge tag 'mlx5-updates-2021-02-04' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux mlx5-updates-2021-02-04 Vlad Buslov says: ================= Implement support for VF tunneling Abstract Currently, mlx5 only supports configuration with tunnel endpoint IP address on uplink representor. Remove implicit and explicit assumptions of tunnel always being terminated on uplink and implement necessary infrastructure for configuring tunnels on VF representors and updating rules on such tunnels according to routing changes. SW TC model From TC perspective VF tunnel configuration requires two rules in both directions: TX rules 1. Rule that redirects packets from UL to VF rep that has the tunnel endpoint IP address: $ tc -s filter show dev enp8s0f0 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac 16:c9:a0:2d:69:2c src_mac 0c:42:a1:58:ab:e4 eth_type ipv4 ip_flags nofrag in_hw in_hw_count 1 action order 1: mirred (Egress Redirect to device enp8s0f0_0) stolen index 3 ref 1 bind 1 installed 377 sec used 0 sec Action statistics: Sent 114096 bytes 952 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 114096 bytes 952 pkt backlog 0b 0p requeues 0 cookie 878fa48d8c423fc08c3b6ca599b50a97 no_percpu used_hw_stats delayed 2. 
Rule that decapsulates the tunneled flow and redirects to destination VF representor: $ tc -s filter show dev vxlan_sys_4789 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac ca:2e:a7:3f:f5:0f src_mac 0a:40:bd:30:89:99 eth_type ipv4 enc_dst_ip 7.7.7.5 enc_src_ip 7.7.7.1 enc_key_id 98 enc_dst_port 4789 enc_tos 0 ip_flags nofrag in_hw in_hw_count 1 action order 1: tunnel_key unset pipe index 2 ref 1 bind 1 installed 434 sec used 434 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen index 4 ref 1 bind 1 installed 434 sec used 0 sec Action statistics: Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 129936 bytes 1082 pkt backlog 0b 0p requeues 0 cookie ac17cf398c4c69e4a5b2f7aabd1b88ff no_percpu used_hw_stats delayed RX rules 1. Rule that encapsulates the tunneled flow and redirects packets from source VF rep to tunnel device: $ tc -s filter show dev enp8s0f0_1 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac 0a:40:bd:30:89:99 src_mac ca:2e:a7:3f:f5:0f eth_type ipv4 ip_tos 0/0x3 ip_flags nofrag in_hw in_hw_count 1 action order 1: tunnel_key set src_ip 7.7.7.5 dst_ip 7.7.7.1 key_id 98 dst_port 4789 nocsum ttl 64 pipe index 1 ref 1 bind 1 installed 411 sec used 411 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 no_percpu used_hw_stats delayed action order 2: mirred (Egress Redirect to device vxlan_sys_4789) stolen index 1 ref 1 bind 1 installed 411 sec used 0 sec Action statistics: Sent 5615833 bytes 4028 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 5615833 bytes 4028 pkt backlog 0b 0p requeues 0 cookie bb406d45d343bf7ade9690ae80c7cba4 no_percpu used_hw_stats 
delayed 2. Rule that redirects from tunnel device to UL rep: $ tc -s filter show dev vxlan_sys_4789 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac ca:2e:a7:3f:f5:0f src_mac 0a:40:bd:30:89:99 eth_type ipv4 enc_dst_ip 7.7.7.5 enc_src_ip 7.7.7.1 enc_key_id 98 enc_dst_port 4789 enc_tos 0 ip_flags nofrag in_hw in_hw_count 1 action order 1: tunnel_key unset pipe index 2 ref 1 bind 1 installed 434 sec used 434 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen index 4 ref 1 bind 1 installed 434 sec used 0 sec Action statistics: Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 129936 bytes 1082 pkt backlog 0b 0p requeues 0 cookie ac17cf398c4c69e4a5b2f7aabd1b88ff no_percpu used_hw_stats delayed HW offloads model For hardware offload the goal is to mach packet on both rules without exposing it to software on tunnel endpoint VF. In order to achieve this for tx, TC implementation marks encap rules with tunnel endpoint on mlx5 VF of same eswitch with MLX5_ESW_DEST_CHAIN_WITH_SRC_PORT_CHANGE flag and adds header modification rule to overwrite packet source port to the value of tunnel VF. Eswitch code is modified to recirculate such packets after source port value is changed, which allows second tx rules to match. For rx path indirect table infrastructure is used to allow fully processing VF tunnel traffic in hardware. To implement such pipeline driver needs to program the hardware after matching on UL rule to overwrite source vport from UL to tunnel VF and recirculate the packet to the root table to allow matching on the rule installed on tunnel VF. For this, indirect table matches all encapsulated traffic by tunnel parameters and all other IP traffic is sent to tunnel VF by the miss rule. 
Such a configuration will cause the packet to appear on the VF representor instead of the VF itself if the packet has been matched by the indirect table rule based on tunnel parameters but missed on the second rule (after recirculation). Handle such a case by marking packets processed by the indirect table with the special 0xFFF value in reg_c1 and extending the slow table with an additional flow group that matches on reg_c0 (source port value set by indirect tables) and reg_c1 (special 0xFFF mark). When creating offloads fdb tables, install one rule per VF vport to match on recirculated miss packets and redirect them to the appropriate VF vport. Routing events In order to support routing changes and migration of the tunnel device between different endpoint VFs, implement routing infrastructure and update it with FIB events. A routing entry table is introduced to mlx5 TC. Every rx and tx VF tunnel rule is attached to a routing entry, which is shared by the rules of the same tunnel. On a FIB event, work is scheduled to delete/recreate all rules of the affected tunnel. Note: only the vxlan tunnel type is supported by this series. =================
2021-02-08cxgb4: remove unused vpd_cap_addrHeiner Kallweit
It is likely that this is a leftover from T3 driver heritage. cxgb4 uses the PCI core VPD access code that handles detection of VPD capabilities. Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09netfilter: nftables: relax check for stateful expressions in set definitionPablo Neira Ayuso
Restore the original behaviour where users are allowed to add an element with any stateful expression if the set definition specifies no stateful expressions. Make sure upper maximum number of stateful expressions of NFT_SET_EXPR_MAX is not reached. Fixes: 8cfd9b0f8515 ("netfilter: nftables: generalize set expressions support") Fixes: 48b0ae046ee9 ("netfilter: nftables: netlink support for several set element expressions") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
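The relaxed check described above can be modelled with a small C predicate. This is only a sketch: elem_expr_count_ok() and its parameters are hypothetical names, the limit is passed in rather than taken from NFT_SET_EXPR_MAX, and the branch requiring an exact count match when the set does declare expressions is an assumption that the commit message does not spell out.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the element check.
 *   set_exprs:  number of stateful expressions in the set definition
 *   elem_exprs: number of stateful expressions on the new element
 *   max_exprs:  upper limit (NFT_SET_EXPR_MAX in the kernel)
 */
static bool elem_expr_count_ok(unsigned int set_exprs,
                               unsigned int elem_exprs,
                               unsigned int max_exprs)
{
    /* Never accept more expressions than the upper limit. */
    if (elem_exprs > max_exprs)
        return false;

    /* Set definition has no stateful expressions: anything goes. */
    if (set_exprs == 0)
        return true;

    /* Assumption: otherwise the element has to match the definition. */
    return elem_exprs == set_exprs;
}
```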
2021-02-08net: bridge: use switchdev for port flags set through sysfs tooVladimir Oltean
Looking through patchwork I don't see that there was any consensus to use switchdev notifiers only in case of netlink provided port flags but not sysfs (as a sort of deprecation, punishment or anything like that), so we should probably keep the user interface consistent in terms of functionality. http://patchwork.ozlabs.org/project/netdev/patch/20170605092043.3523-3-jiri@resnulli.us/ http://patchwork.ozlabs.org/project/netdev/patch/20170608064428.4785-3-jiri@resnulli.us/ Fixes: 3922285d96e7 ("net: bridge: Add support for offloading port attributes") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08selftests: tc-testing: u32: Add tests covering sample optionPhil Sutter
The kernel's key folding basically consists of shifting away the least significant zero bits of the mask and masking the resulting value with (divisor - 1). Test that u32's 'sample' option behaves identically. Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
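The folding described above can be sketched in plain C. This is a minimal illustration, not the kernel's code: fold_key() and trailing_zero_bits() are hypothetical names, key and mask are assumed to be in host byte order, and divisor is assumed to be a power of two so that (divisor - 1) works as a bit mask.

```c
#include <stdint.h>

/* Count the least significant zero bits of mask (0 for an empty mask). */
static unsigned int trailing_zero_bits(uint32_t mask)
{
    unsigned int shift = 0;

    if (mask == 0)
        return 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        shift++;
    }
    return shift;
}

/*
 * Hypothetical helper: fold a key into a hash bucket index.
 * Assumes host byte order and a power-of-two divisor.
 */
static uint32_t fold_key(uint32_t key, uint32_t mask, uint32_t divisor)
{
    /* Keep the masked bits and shift away the mask's trailing zeros. */
    uint32_t folded = (key & mask) >> trailing_zero_bits(mask);

    /* Limit the result to the number of buckets. */
    return folded & (divisor - 1);
}
```

For instance, with mask 0x0000ff00 the second-lowest byte of the key selects the bucket, and a divisor of 16 keeps only the low four bits of that byte.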
2021-02-08rxrpc: use udp tunnel APIs instead of open code in rxrpc_open_socketXin Long
rxrpc_open_socket() currently uses sock_create_kern() and kernel_bind() to create a udp tunnel socket, and other kernel APIs to set it up. This code can be replaced with the udp tunnel APIs udp_sock_create() and setup_udp_tunnel_sock(), which simplifies rxrpc_open_socket(). Note that with this patch, the udp tunnel socket will always bind to a random port if transport is not provided by users, which is suggested by David Howells, thanks! Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08net-sysfs: Add rtnl locking for getting Tx queue traffic classAlexander Duyck
In order to access the subordinate dev for a device we should be holding the rtnl_lock when outside of the transmit path. The existing code was not doing that for the sysfs dump function and as a result we were open to a possible race. To resolve that, take the rtnl lock prior to accessing the sb_dev field of the Tx queue and release it after we have retrieved the tc for the queue. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09netfilter: conntrack: skip identical origin tuple in same zone onlyFlorian Westphal
The origin skip check needs to re-test the zone. Else, we might skip a colliding tuple in the reply direction. This only occurs when using 'directional zones' where origin tuples reside in different zones but the reply tuples share the same zone. This causes the new conntrack entry to be dropped at confirmation time because NAT clash resolution was elided. Fixes: 4e35c1cb9460240 ("netfilter: nf_nat: skip nat clash resolution for same-origin entries") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
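The corrected skip condition can be illustrated with a simplified model. The types and names below (flow_tuple, flow_entry, may_skip_clash_check) are hypothetical stand-ins rather than the kernel's conntrack structures; the sketch only shows that an entry is skipped when both the origin tuple and the origin zone match.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a conntrack tuple. */
struct flow_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

/* Hypothetical entry: origin tuple plus the zone it lives in. */
struct flow_entry {
    struct flow_tuple origin;
    uint16_t origin_zone;
};

static bool tuple_equal(const struct flow_tuple *a,
                        const struct flow_tuple *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto == b->proto;
}

/*
 * Skip the existing entry only if the origin tuples are identical
 * AND both entries are in the same origin zone. Comparing tuples
 * alone would also skip entries from a different zone whose reply
 * tuples may still collide.
 */
static bool may_skip_clash_check(const struct flow_entry *new_entry,
                                 const struct flow_entry *existing)
{
    return tuple_equal(&new_entry->origin, &existing->origin) &&
           new_entry->origin_zone == existing->origin_zone;
}
```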
2021-02-08nfc: st-nci: Remove unnecessary variablewengjianfeng
The variable r is initialized to 0 at the beginning of the function and is never reassigned before the function returns it. Therefore, we do not need to define the variable r; just return 0 directly at the end of the function. Signed-off-by: wengjianfeng <wengjianfeng@yulong.com> Signed-off-by: David S. Miller <davem@davemloft.net>