summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-03-17Merge branch 'vxlan-MDB-support'David S. Miller
Ido Schimmel says: ==================== vxlan: Add MDB support tl;dr ===== This patchset implements MDB support in the VXLAN driver, allowing it to selectively forward IP multicast traffic to VTEPs with interested receivers instead of flooding it to all the VTEPs as BUM. The motivating use case is intra and inter subnet multicast forwarding using EVPN [1][2], which means that MDB entries are only installed by the user space control plane and no snooping is implemented, thereby avoiding a lot of unnecessary complexity in the kernel. Background ========== Both the bridge and VXLAN drivers have an FDB that allows them to forward Ethernet frames based on their destination MAC addresses and VLAN/VNI. These FDBs are managed using the same PF_BRIDGE/RTM_*NEIGH netlink messages and bridge(8) utility. However, only the bridge driver has an MDB that allows it to selectively forward IP multicast packets to bridge ports with interested receivers behind them, based on (S, G) and (*, G) MDB entries. When these packets reach the VXLAN driver they are flooded using the "all-zeros" FDB entry (00:00:00:00:00:00). The entry either includes the list of all the VTEPs in the tenant domain (when ingress replication is used) or the multicast address of the BUM tunnel (when P2MP tunnels are used), to which all the VTEPs join. Networks that make heavy use of multicast in the overlay can benefit from a solution that allows them to selectively forward IP multicast traffic only to VTEPs with interested receivers. Such a solution is described in the next section. Motivation ========== RFC 7432 [3] defines a "MAC/IP Advertisement route" (type 2) [4] that allows VTEPs in the EVPN network to advertise and learn reachability information for unicast MAC addresses. Traffic destined to a unicast MAC address can therefore be selectively forwarded to a single VTEP behind which the MAC is located. The same is not true for IP multicast traffic. Such traffic is simply flooded as BUM to all VTEPs in the broadcast domain (BD) / subnet, regardless if a VTEP has interested receivers for the multicast stream or not. This is especially problematic for overlay networks that make heavy use of multicast. The issue is addressed by RFC 9251 [1] that defines a "Selective Multicast Ethernet Tag Route" (type 6) [5] which allows VTEPs in the EVPN network to advertise multicast streams that they are interested in. This is done by having each VTEP suppress IGMP/MLD packets from being transmitted to the NVE network and instead communicate the information over BGP to other VTEPs. The draft in [2] further extends RFC 9251 with procedures to allow efficient forwarding of IP multicast traffic not only in a given subnet, but also between different subnets in a tenant domain. The required changes in the bridge driver to support the above were already merged in merge commit 8150f0cfb24f ("Merge branch 'bridge-mcast-extensions-for-evpn'"). However, full support entails MDB support in the VXLAN driver so that it will be able to selectively forward IP multicast traffic only to VTEPs with interested receivers. The implementation of this MDB is described in the next section. Implementation ============== The user interface is extended to allow user space to specify the destination VTEP(s) and related parameters. Example usage: # bridge mdb add dev vxlan0 port vxlan0 grp 239.1.1.1 permanent dst 198.51.100.1 # bridge mdb add dev vxlan0 port vxlan0 grp 239.1.1.1 permanent dst 192.0.2.1 $ bridge -d -s mdb show dev vxlan0 port vxlan0 grp 239.1.1.1 permanent filter_mode exclude proto static dst 192.0.2.1 0.00 dev vxlan0 port vxlan0 grp 239.1.1.1 permanent filter_mode exclude proto static dst 198.51.100.1 0.00 Since the MDB is fully managed by user space and since snooping is not implemented, only permanent entries can be installed and temporary entries are rejected by the kernel. The netlink interface is extended with a few new attributes in the RTM_NEWMDB / RTM_DELMDB request messages: [ struct nlmsghdr ] [ struct br_port_msg ] [ MDBA_SET_ENTRY ] struct br_mdb_entry [ MDBA_SET_ENTRY_ATTRS ] [ MDBE_ATTR_SOURCE ] struct in_addr / struct in6_addr [ MDBE_ATTR_SRC_LIST ] [ MDBE_SRC_LIST_ENTRY ] [ MDBE_SRCATTR_ADDRESS ] struct in_addr / struct in6_addr [ ...] [ MDBE_ATTR_GROUP_MODE ] u8 [ MDBE_ATTR_RTPORT ] u8 [ MDBE_ATTR_DST ] // new struct in_addr / struct in6_addr [ MDBE_ATTR_DST_PORT ] // new u16 [ MDBE_ATTR_VNI ] // new u32 [ MDBE_ATTR_IFINDEX ] // new s32 [ MDBE_ATTR_SRC_VNI ] // new u32 RTM_NEWMDB / RTM_DELMDB responses and notifications are extended with corresponding attributes. One MDB entry that can be installed in the VXLAN MDB, but not in the bridge MDB is the catchall entry (0.0.0.0 / ::). It is used to transmit unregistered multicast traffic that is not link-local and is especially useful when inter-subnet multicast forwarding is required. See patch #12 for a detailed explanation and motivation. It is similar to the "all-zeros" FDB entry that can be installed in the VXLAN FDB, but not the bridge FDB. "added_by_star_ex" entries -------------------------- The bridge driver automatically installs (S, G) MDB port group entries marked as "added_by_star_ex" whenever it detects that an (S, G) entry can prevent traffic from being forwarded via a port associated with an EXCLUDE (*, G) entry. The bridge will add the port to the port group of the (S, G) entry, thereby creating a new port group entry. The complexity associated with these entries is not trivial, but it needs to reside in the bridge driver because it automatically installs MDB entries in response to snooped IGMP / MLD packets. The same in not true for the VXLAN MDB which is entirely managed by user space who is fully capable of forming the correct replication lists on its own. In addition, the complexity associated with the "added_by_star_ex" entries in the VXLAN driver is higher compared to the bridge: Whenever a remote VTEP is added to the catchall entry, it needs to be added to all the existing MDB entries, as such a remote requested all the multicast traffic to be forwarded to it. Similarly, whenever an (*, G) or (S, G) entry is added, all the remotes associated with the catchall entry need to be added to it. Given the above, this patchset does not implement support for such entries. One argument against this decision can be that in the future someone might want to populate the VXLAN MDB in response to decapsulated IGMP / MLD packets and not according to EVPN routes. Regardless of my doubts regarding this possibility, it can be implemented using a new VXLAN device knob that will also enable the "added_by_star_ex" functionality. Testing ======= Tested using existing VXLAN and MDB selftests under "net/" and "net/forwarding/". Added a dedicated selftest in the last patch. Patchset overview ================= Patches #1-#3 are small preparations in the bridge driver. I plan to submit them separately together with an MDB dump test case. Patches #4-#6 are additional preparations centered around the extraction of the MDB netlink handlers from the bridge driver to the common rtnetlink code. This allows reusing the existing MDB netlink messages for the configuration of the VXLAN MDB. Patches #7-#9 include more small preparations in the common rtnetlink code and the VXLAN driver. Patch #10 implements the MDB control path in the VXLAN driver, which will allow user space to create, delete, replace and dump MDB entries. Patches #11-#12 implement the MDB data path in the VXLAN driver, allowing it to selectively forward IP multicast traffic according to the matched MDB entry. Patch #13 finally enables MDB support in the VXLAN driver. iproute2 patches can be found here [6]. Note that in order to fully support the specifications in [1] and [2], additional functionality is required from the data path. However, it can be achieved using existing kernel interfaces which is why it is not described here. Changelog ========= Since v1 [7]: Patch #9: Use htons() in 'case' instead of ntohs() in 'switch'. Since RFC [8]: Patch #3: Use NL_ASSERT_DUMP_CTX_FITS(). Patch #3: memset the entire context when moving to the next device. Patch #3: Reset sequence counters when moving to the next device. Patch #3: Use NL_SET_ERR_MSG_ATTR() in rtnl_validate_mdb_entry(). Patch #7: Remove restrictions regarding mixing of multicast and unicast remote destination IPs in an MDB entry. While such configuration does not make sense to me, it is no forbidden by the VXLAN FDB code and does not crash the kernel. Patch #7: Fix check regarding all-zeros MDB entry and source. Patch #11: New patch. [1] https://datatracker.ietf.org/doc/html/rfc9251 [2] https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast [3] https://datatracker.ietf.org/doc/html/rfc7432 [4] https://datatracker.ietf.org/doc/html/rfc7432#section-7.2 [5] https://datatracker.ietf.org/doc/html/rfc9251#section-9.1 [6] https://github.com/idosch/iproute2/commits/submit/mdb_vxlan_rfc_v1 [7] https://lore.kernel.org/netdev/20230313145349.3557231-1-idosch@nvidia.com/ [8] https://lore.kernel.org/netdev/20230204170801.3897900-1-idosch@nvidia.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17selftests: net: Add VXLAN MDB testIdo Schimmel
Add test cases for VXLAN MDB, testing the control and data paths. Two different sets of namespaces (i.e., ns{1,2}_v4 and ns{1,2}_v6) are used in order to test VXLAN MDB with both IPv4 and IPv6 underlays, respectively. Example truncated output: # ./test_vxlan_mdb.sh [...] Tests passed: 620 Tests failed: 0 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17vxlan: Enable MDB supportIdo Schimmel
Now that the VXLAN MDB control and data paths are in place we can expose the VXLAN MDB functionality to user space. Set the VXLAN MDB net device operations to the appropriate functions, thereby allowing the rtnetlink code to reach the VXLAN driver. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17vxlan: Add MDB data path supportIdo Schimmel
Integrate MDB support into the Tx path of the VXLAN driver, allowing it to selectively forward IP multicast traffic according to the matched MDB entry. If MDB entries are configured (i.e., 'VXLAN_F_MDB' is set) and the packet is an IP multicast packet, perform up to three different lookups according to the following priority: 1. For an (S, G) entry, using {Source VNI, Source IP, Destination IP}. 2. For a (*, G) entry, using {Source VNI, Destination IP}. 3. For the catchall MDB entry (0.0.0.0 or ::), using the source VNI. The catchall MDB entry is similar to the catchall FDB entry (00:00:00:00:00:00) that is currently used to transmit BUM (broadcast, unknown unicast and multicast) traffic. However, unlike the catchall FDB entry, this entry is only used to transmit unregistered IP multicast traffic that is not link-local. Therefore, when configured, the catchall FDB entry will only transmit BULL (broadcast, unknown unicast, link-local multicast) traffic. The catchall MDB entry is useful in deployments where inter-subnet multicast forwarding is used and not all the VTEPs in a tenant domain are members in all the broadcast domains. In such deployments it is advantageous to transmit BULL (broadcast, unknown unicast and link-local multicast) and unregistered IP multicast traffic on different tunnels. If the same tunnel was used, a VTEP only interested in IP multicast traffic would also pull all the BULL traffic and drop it as it is not a member in the originating broadcast domain [1]. If the packet did not match an MDB entry (or if the packet is not an IP multicast packet), return it to the Tx path, allowing it to be forwarded according to the FDB. If the packet did match an MDB entry, forward it to the associated remote VTEPs. However, if the entry is a (*, G) entry and the associated remote is in INCLUDE mode, then skip over it as the source IP is not in its source list (otherwise the packet would have matched on an (S, G) entry). Similarly, if the associated remote is marked as BLOCKED (can only be set on (S, G) entries), then skip over it as well as the remote is in EXCLUDE mode and the source IP is in its source list. [1] https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast#section-2.6 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17vxlan: mdb: Add an internal flag to indicate MDB usageIdo Schimmel
Add an internal flag to indicate whether MDB entries are configured or not. Set the flag after installing the first MDB entry and clear it before deleting the last one. The flag will be consulted by the data path which will only perform an MDB lookup if the flag is set, thereby keeping the MDB overhead to a minimum when the MDB is not used. Another option would have been to use a static key, but it is global and not per-device, unlike the current approach. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17vxlan: mdb: Add MDB control path supportIdo Schimmel
Implement MDB control path support, enabling the creation, deletion, replacement and dumping of MDB entries in a similar fashion to the bridge driver. Unlike the bridge driver, each entry stores a list of remote VTEPs to which matched packets need to be replicated to and not a list of bridge ports. The motivating use case is the installation of MDB entries by a user space control plane in response to received EVPN routes. As such, only allow permanent MDB entries to be installed and do not implement snooping functionality, avoiding a lot of unnecessary complexity. Since entries can only be modified by user space under RTNL, use RTNL as the write lock. Use RCU to ensure that MDB entries and remotes are not freed while being accessed from the data path during transmission. In terms of uAPI, reuse the existing MDB netlink interface, but add a few new attributes to request and response messages: * IP address of the destination VXLAN tunnel endpoint where the multicast receivers reside. * UDP destination port number to use to connect to the remote VXLAN tunnel endpoint. * VXLAN VNI Network Identifier to use to connect to the remote VXLAN tunnel endpoint. Required when Ingress Replication (IR) is used and the remote VTEP is not a member of originating broadcast domain (VLAN/VNI) [1]. * Source VNI Network Identifier the MDB entry belongs to. Used only when the VXLAN device is in external mode. * Interface index of the outgoing interface to reach the remote VXLAN tunnel endpoint. This is required when the underlay destination IP is multicast (P2MP), as the multicast routing tables are not consulted. All the new attributes are added under the 'MDBA_SET_ENTRY_ATTRS' nest which is strictly validated by the bridge driver, thereby automatically rejecting the new attributes. [1] https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast#section-3.2.2 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17vxlan: Expose vxlan_xmit_one()Ido Schimmel
Given a packet and a remote destination, the function will take care of encapsulating the packet and transmitting it to the destination. Expose it so that it could be used in subsequent patches by the MDB code to transmit a packet to the remote destination(s) stored in the MDB entry. It will allow us to keep the MDB code self-contained, not exposing its data structures to the rest of the VXLAN driver. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17vxlan: Move address helpers to private headersIdo Schimmel
Move the helpers out of the core C file to the private header so that they could be used by the upcoming MDB code. While at it, constify the second argument of vxlan_nla_get_addr(). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17rtnetlink: bridge: mcast: Relax group address validation in common codeIdo Schimmel
In the upcoming VXLAN MDB implementation, the 0.0.0.0 and :: MDB entries will act as catchall entries for unregistered IP multicast traffic in a similar fashion to the 00:00:00:00:00:00 VXLAN FDB entry that is used to transmit BUM traffic. In deployments where inter-subnet multicast forwarding is used, not all the VTEPs in a tenant domain are members in all the broadcast domains. It is therefore advantageous to transmit BULL (broadcast, unknown unicast and link-local multicast) and unregistered IP multicast traffic on different tunnels. If the same tunnel was used, a VTEP only interested in IP multicast traffic would also pull all the BULL traffic and drop it as it is not a member in the originating broadcast domain [1]. Prepare for this change by allowing the 0.0.0.0 group address in the common rtnetlink MDB code and forbid it in the bridge driver. A similar change is not needed for IPv6 because the common code only validates that the group address is not the all-nodes address. [1] https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast#section-2.6 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17rtnetlink: bridge: mcast: Move MDB handlers out of bridge driverIdo Schimmel
Currently, the bridge driver registers handlers for MDB netlink messages, making it impossible for other drivers to implement MDB support. As a preparation for VXLAN MDB support, move the MDB handlers out of the bridge driver to the core rtnetlink code. The rtnetlink code will call into individual drivers by invoking their previously added MDB net device operations. Note that while the diffstat is large, the change is mechanical. It moves code out of the bridge driver to rtnetlink code. Also note that a similar change was made in 2012 with commit 77162022ab26 ("net: add generic PF_BRIDGE:RTM_ FDB hooks") that moved FDB handlers out of the bridge driver to the core rtnetlink code. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17bridge: mcast: Implement MDB net device operationsIdo Schimmel
Implement the previously added MDB net device operations in the bridge driver so that they could be invoked by core rtnetlink code in the next patch. The operations are identical to the existing br_mdb_{dump,add,del} functions. The '_new' suffix will be removed in the next patch. The functions are re-implemented in this patch to make the conversion in the next patch easier to review. Add dummy implementations when 'CONFIG_BRIDGE_IGMP_SNOOPING' is disabled, so that an error will be returned to user space when it is trying to add or delete an MDB entry. This is consistent with existing behavior where the bridge driver does not even register rtnetlink handlers for RTM_{NEW,DEL,GET}MDB messages when this Kconfig option is disabled. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17net: Add MDB net device operationsIdo Schimmel
Add MDB net device operations that will be invoked by rtnetlink code in response to received RTM_{NEW,DEL,GET}MDB messages. Subsequent patches will implement these operations in the bridge and VXLAN drivers. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17Merge branch 'J784S4-CPSW9G-bindings'David S. Miller
Siddharth Vadapalli says: ==================== Add J784S4 CPSW9G NET Bindings This series cleans up the bindings by reordering the compatibles, followed by adding the bindings for CPSW9G instance of CPSW Ethernet Switch on TI's J784S4 SoC. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17dt-bindings: net: ti: k3-am654-cpsw-nuss: Add J784S4 CPSW9G supportSiddharth Vadapalli
Update bindings for TI K3 J784S4 SoC which contains 9 ports (8 external ports) CPSW9G module and add compatible for it. Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17dt-bindings: net: ti: k3-am654-cpsw-nuss: Fix compatible orderSiddharth Vadapalli
Reorder compatibles to follow alphanumeric order. Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17net: mana: Add new MANA VF performance counters for easier troubleshootingShradha Gupta
Extended performance counter stats in 'ethtool -S <interface>' output for MANA VF to facilitate troubleshooting. Tested-on: Ubuntu22 Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17Merge branch 'bonding-fixes'David S. Miller
Nikolay Aleksandrov says: ==================== bonding: properly restore flags when bond changes ether type A bug was reported by syzbot[1] that causes a warning and a myriad of other potential issues if a bond, that is also a slave, fails to enslave a non-eth device. While fixing that bug I found that we have the same issues when such enslave passes and after that the bond changes back to ARPHRD_ETHER (again due to ether_setup). This set fixes all issues by extracting the ether_setup() sequence in a helper which does the right thing about bond flags when it needs to change back to ARPHRD_ETHER. It also adds selftests for these cases. Patch 01 adds the new bond_ether_setup helper and fixes the issues when a bond device changes its ether type due to successful enslave. Patch 02 fixes the issues when it changes its ether type due to an unsuccessful enslave. Note we need two patches because the bugs were introduced by different commits. Patch 03 adds the new selftests. Due to the comment adjustment and squash, could you please review patch 01 again? I've kept the other acks since there were no code changes. v3: squash the helper patch and the first fix, adjust the comment above it to be explicit about the bond device, no code changes v2: new set, all patches are new due to new approach of fixing these bugs [1] https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17selftests: bonding: add tests for ether type changesNikolay Aleksandrov
Add new network selftests for the bonding device which exercise the ether type changing call paths. They also test for the recent syzbot bug[1] which causes a warning and results in wrong device flags (IFF_SLAVE missing). The test adds three bond devices and a nlmon device, enslaves one of the bond devices to the other and then uses the nlmon device for successful and unsuccesful enslaves both of which change the bond ether type. Thus we can test for both MASTER and SLAVE flags at the same time. If the flags are properly restored we get: TEST: Change ether type of an enslaved bond device with unsuccessful enslave [ OK ] TEST: Change ether type of an enslaved bond device with successful enslave [ OK ] [1] https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17bonding: restore bond's IFF_SLAVE flag if a non-eth dev enslave failsNikolay Aleksandrov
syzbot reported a warning[1] where the bond device itself is a slave and we try to enslave a non-ethernet device as the first slave which fails but then in the error path when ether_setup() restores the bond device it also clears all flags. In my previous fix[2] I restored the IFF_MASTER flag, but I didn't consider the case that the bond device itself might also be a slave with IFF_SLAVE set, so we need to restore that flag as well. Use the bond_ether_setup helper which does the right thing and restores the bond's flags properly. Steps to reproduce using a nlmon dev: $ ip l add nlmon0 type nlmon $ ip l add bond1 type bond $ ip l add bond2 type bond $ ip l set bond1 master bond2 $ ip l set dev nlmon0 master bond1 $ ip -d l sh dev bond1 22: bond1: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noqueue master bond2 state DOWN mode DEFAULT group default qlen 1000 (now bond1's IFF_SLAVE flag is gone and we'll hit a warning[3] if we try to delete it) [1] https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef [2] commit 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure") [3] example warning: [ 27.008664] bond1: (slave nlmon0): The slave device specified does not support setting the MAC address [ 27.008692] bond1: (slave nlmon0): Error -95 calling set_mac_address [ 32.464639] bond1 (unregistering): Released all slaves [ 32.464685] ------------[ cut here ]------------ [ 32.464686] WARNING: CPU: 1 PID: 2004 at net/core/dev.c:10829 unregister_netdevice_many+0x72a/0x780 [ 32.464694] Modules linked in: br_netfilter bridge bonding virtio_net [ 32.464699] CPU: 1 PID: 2004 Comm: ip Kdump: loaded Not tainted 5.18.0-rc3+ #47 [ 32.464703] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014 [ 32.464704] RIP: 0010:unregister_netdevice_many+0x72a/0x780 [ 32.464707] Code: 99 fd ff ff ba 90 1a 00 00 48 c7 c6 f4 02 66 96 48 c7 c7 20 4d 35 96 c6 05 fa c7 2b 02 01 e8 be 6f 4a 00 0f 0b e9 73 fd ff ff <0f> 0b e9 5f fd ff ff 80 3d e3 c7 2b 02 00 0f 85 3b fd ff ff ba 59 [ 32.464710] RSP: 0018:ffffa006422d7820 EFLAGS: 00010206 [ 32.464712] RAX: ffff8f6e077140a0 RBX: ffffa006422d7888 RCX: 0000000000000000 [ 32.464714] RDX: ffff8f6e12edbe58 RSI: 0000000000000296 RDI: ffffffff96d4a520 [ 32.464716] RBP: ffff8f6e07714000 R08: ffffffff96d63600 R09: ffffa006422d7728 [ 32.464717] R10: 0000000000000ec0 R11: ffffffff9698c988 R12: ffff8f6e12edb140 [ 32.464719] R13: dead000000000122 R14: dead000000000100 R15: ffff8f6e12edb140 [ 32.464723] FS: 00007f297c2f1740(0000) GS:ffff8f6e5d900000(0000) knlGS:0000000000000000 [ 32.464725] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 32.464726] CR2: 00007f297bf1c800 CR3: 00000000115e8000 CR4: 0000000000350ee0 [ 32.464730] Call Trace: [ 32.464763] <TASK> [ 32.464767] rtnl_dellink+0x13e/0x380 [ 32.464776] ? cred_has_capability.isra.0+0x68/0x100 [ 32.464780] ? __rtnl_unlock+0x33/0x60 [ 32.464783] ? bpf_lsm_capset+0x10/0x10 [ 32.464786] ? security_capable+0x36/0x50 [ 32.464790] rtnetlink_rcv_msg+0x14e/0x3b0 [ 32.464792] ? _copy_to_iter+0xb1/0x790 [ 32.464796] ? post_alloc_hook+0xa0/0x160 [ 32.464799] ? rtnl_calcit.isra.0+0x110/0x110 [ 32.464802] netlink_rcv_skb+0x50/0xf0 [ 32.464806] netlink_unicast+0x216/0x340 [ 32.464809] netlink_sendmsg+0x23f/0x480 [ 32.464812] sock_sendmsg+0x5e/0x60 [ 32.464815] ____sys_sendmsg+0x22c/0x270 [ 32.464818] ? import_iovec+0x17/0x20 [ 32.464821] ? sendmsg_copy_msghdr+0x59/0x90 [ 32.464823] ? do_set_pte+0xa0/0xe0 [ 32.464828] ___sys_sendmsg+0x81/0xc0 [ 32.464832] ? mod_objcg_state+0xc6/0x300 [ 32.464835] ? refill_obj_stock+0xa9/0x160 [ 32.464838] ? memcg_slab_free_hook+0x1a5/0x1f0 [ 32.464842] __sys_sendmsg+0x49/0x80 [ 32.464847] do_syscall_64+0x3b/0x90 [ 32.464851] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 32.464865] RIP: 0033:0x7f297bf2e5e7 [ 32.464868] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 32.464869] RSP: 002b:00007ffd96c824c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 32.464872] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f297bf2e5e7 [ 32.464874] RDX: 0000000000000000 RSI: 00007ffd96c82540 RDI: 0000000000000003 [ 32.464875] RBP: 00000000640f19de R08: 0000000000000001 R09: 000000000000007c [ 32.464876] R10: 00007f297bffabe0 R11: 0000000000000246 R12: 0000000000000001 [ 32.464877] R13: 00007ffd96c82d20 R14: 00007ffd96c82610 R15: 000055bfe38a7020 [ 32.464881] </TASK> [ 32.464882] ---[ end trace 0000000000000000 ]--- Fixes: 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure") Reported-by: syzbot+9dfc3f3348729cc82277@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17bonding: restore IFF_MASTER/SLAVE flags on bond enslave ether type changeNikolay Aleksandrov
Add bond_ether_setup helper which is used to fix ether_setup() calls in the bonding driver. It takes care of both IFF_MASTER and IFF_SLAVE flags, the former is always restored and the latter only if it was set. If the bond enslaves non-ARPHRD_ETHER device (changes its type), then releases it and enslaves ARPHRD_ETHER device (changes back) then we use ether_setup() to restore the bond device type but it also resets its flags and removes IFF_MASTER and IFF_SLAVE[1]. Use the bond_ether_setup helper to restore both after such transition. [1] reproduce (nlmon is non-ARPHRD_ETHER): $ ip l add nlmon0 type nlmon $ ip l add bond2 type bond mode active-backup $ ip l set nlmon0 master bond2 $ ip l set nlmon0 nomaster $ ip l add bond1 type bond (we use bond1 as ARPHRD_ETHER device to restore bond2's mode) $ ip l set bond1 master bond2 $ ip l sh dev bond2 37: bond2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether be:d7:c5:40:5b:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500 (notice bond2's IFF_MASTER is missing) Fixes: e36b9d16c6a6 ("bonding: clean muticast addresses when device changes type") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17net: wangxun: Implement the ndo change mtu interfaceMengyuan Lou
Add ngbe and txgbe ndo_change_mtu support. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17Merge branch 'net-renesas-rswitch-fixes'David S. Miller
Yoshihiro Shimoda says: ==================== net: renesas: rswitch: Fix rx and timestamp I got reports locally about issues on the rswitch driver. So, fix the issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17net: renesas: rswitch: Fix GWTSDIE register handlingYoshihiro Shimoda
Since the GWCA has the TX timestamp feature, this driver should not disable it if one of ports is opened. So, fix it. Reported-by: Phong Hoang <phong.hoang.wz@renesas.com> Fixes: 33f5d733b589 ("net: renesas: rswitch: Improve TX timestamp accuracy") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17net: renesas: rswitch: Fix the output value of quote from rswitch_rx()Yoshihiro Shimoda
If the RX descriptor doesn't have any data, the output value of quote from rswitch_rx() will be increased unexpectedily. So, fix it. Reported-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17ethernet: sun: add check for the mdesc_grab()Liang He
In vnet_port_probe() and vsw_port_probe(), we should check the return value of mdesc_grab() as it may return NULL which can caused NPD bugs. Fixes: 5d01fa0c6bd8 ("ldmvsw: Add ldmvsw.c driver code") Fixes: 43fdf27470b2 ("[SPARC64]: Abstract out mdesc accesses for better MD update handling.") Signed-off-by: Liang He <windhl@126.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17net: dsa: realtek: rtl8365mb: add change_mtuLuiz Angelo Daros de Luca
The rtl8365mb was using a fixed MTU size of 1536, which was probably inspired by the rtl8366rb's initial frame size. However, unlike that family, the rtl8365mb family can specify the max frame size in bytes, rather than in fixed steps. DSA calls change_mtu for the CPU port once the max MTU value among the ports changes. As the max frame size is defined globally, the switch is configured only when the call affects the CPU port. The available specifications do not directly define the max supported frame size, but it mentions a 16k limit. This driver will use the 0x3FFF limit as it is used in the vendor API code. However, the switch sets the max frame size to 16368 bytes (0x3FF0) after it resets. change_mtu uses MTU size, or ethernet payload size, while the switch works with frame size. The frame size is calculated considering the ethernet header (14 bytes), a possible 802.1Q tag (4 bytes), the payload size (MTU), and the Ethernet FCS (4 bytes). The CPU tag (8 bytes) is consumed before the switch enforces the limit. During setup, the driver will use the default 1500-byte MTU of DSA to set the maximum frame size. The current sum will be VLAN_ETH_HLEN+1500+ETH_FCS_LEN, which results in 1522 bytes. Although it is lower than the previous initial value of 1536 bytes, the driver will increase the frame size for a larger MTU. However, if something requires more space without increasing the MTU, such as QinQ, we would need to add the extra length to the rtl8365mb_port_change_mtu() formula. MTU was tested up to 2018 (with 802.1Q) as that is as far as mt7620 (where rtl8367s is stacked) can go. The register was manually manipulated byte-by-byte to ensure the MTU to frame size conversion was correct. For frames without 802.1Q tag, the frame size limit will be 4 bytes over the required size. There is a jumbo register, enabled by default at 6k frame size. However, the jumbo settings do not seem to limit nor expand the maximum tested MTU (2018), even when jumbo is disabled. More tests are needed with a device that can handle larger frames. Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17drm/ttm: drop extra ttm_bo_put in ttm_bo_cleanup_refsChristian König
That was accidentially left over when we switched to the delayed delete worker. Suggested-by: Matthew Auld <matthew.william.auld@gmail.com> Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 9bff18d13473 ("drm/ttm: use per BO cleanup workers") Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230316072647.406707-1-christian.koenig@amd.com
2023-03-17Merge tag 'amd-drm-fixes-6.3-2023-03-15' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.3-2023-03-15: amdgpu: - SMU 13 update - RDNA2 suspend/resume fix when overclocking is enabled - SRIOV VCN fixes - HDCP suspend/resume fix - Fix drm polling splat regression - Fix dirty rectangle tracking for PSR - Fix vangogh regression on certain BIOSes - Misc display fixes - Suspend/resume IOMMU regression fix amdkfd: - Fix BO offset for multi-VMA page migration - Fix a possible double free - Fix potential use after free - Fix process cleanup on module exit Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230315224400.7558-1-alexander.deucher@amd.com
2023-03-16Merge branch 'net-ipa-minor-bug-fixes'Jakub Kicinski
Alex Elder says: ==================== net: ipa: minor bug fixes The four patches in this series fix some errors, though none of them cause any compile or runtime problems. The first changes the files included by "drivers/net/ipa/reg.h" to ensure everything it requires is included with the file. It also stops unnecessarily including another file. The prerequisites are apparently satisfied other ways, currently. The second adds two struct declarations to "gsi_reg.h", to ensure they're declared before they're used later in the file. Again, it seems these declarations are currently resolved wherever this file is included. The third removes register definitions that were added for IPA v5.0 that are not needed. And the last updates some validity checks for IPA v5.0 registers. No IPA v5.0 platforms are yet supported, so the issues resolved here were never harmful. Versions 2 and 3 of this series change the "Fixes" tags in patches so they supply legitimate commit hashes. ==================== Link: https://lore.kernel.org/r/20230316145136.1795469-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16net: ipa: fix some register validity checksAlex Elder
A recent commit defined HW_PARAM_4 as a GSI register ID but did not add it to gsi_reg_id_valid() to indicate it's valid (for IPA v5.0+). Add version checks for the HW_PARAM_2 and INTER_EE IRQ GSI registers there as well. IPA v5.0 supports up to 8 source and destination resource groups. Update the validity check (and the comments where the register IDs are defined) to reflect that. Similarly update comments and validity checks for the hash/cache-related registers. Note that this patch fixes an omission and constrains things further, but these don't technically represent bugs. Fixes: f651334e1ef5 ("net: ipa: add HW_PARAM_4 GSI register") Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16net: ipa: kill FILT_ROUT_CACHE_CFG IPA registerAlex Elder
A recent commit defined a few IPA registers used for IPA v5.0+. One of those was a mistake. Although the filter and router caches get *flushed* using a single register, they use distinct registers (ENDP_FILTER_CACHE_CFG and ENDP_ROUTER_CACHE_CFG) for configuration. And although there *exists* a FILT_ROUT_CACHE_CFG register, it is not needed in upstream code. So get rid of definitions related to FILT_ROUT_CACHE_CFG, because they are not needed. Fixes: 8ba59716d16a ("net: ipa: define IPA v5.0+ registers") Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16net: ipa: add two missing declarationsAlex Elder
When gsi_reg_init() got added, its declaration was added to "gsi_reg.h" without declaring the two struct pointer types it uses. Add these struct declarations to "gsi_reg.h". Fixes: 3c506add35c7 ("net: ipa: introduce gsi_reg_init()") Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16net: ipa: reg: include <linux/bug.h>Alex Elder
When "reg.h" got created, it included calls to WARN() and WARN_ON(). Those macros are defined via <linux/bug.h>. In addition, it uses is_power_of_2(), which is defined in <linux/log2.h>. Include those files so IPA "reg.h" has access to all definitions it requires. Meanwhile, <linux/bits.h> is included but nothing defined therein is required directly in "reg.h", so get rid of that. Fixes: 81772e444dbe ("net: ipa: start generalizing "ipa_reg"") Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16net: xdp: don't call notifiers during driver initJakub Kicinski
Drivers will commonly perform feature setting during init, if they use the xdp_set_features_flag() helper they'll likely run into an ASSERT_RTNL() inside call_netdevice_notifiers_info(). Don't call the notifier until the device is actually registered. Nothing should be tracking the device until its registered and after its unregistration has started. Fixes: 4d5ab0ad964d ("net/mlx5e: take into account device reconfiguration for xdp_features flag") Link: https://lore.kernel.org/r/20230316220234.598091-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16Merge branch 'net-sched-fix-parsing-of-tca_ext_warn_msg-for-tc-action'Jakub Kicinski
Hangbin Liu says: ==================== net/sched: fix parsing of TCA_EXT_WARN_MSG for tc action In my previous commit 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message") I didn't notice the tc action use different enum with filter. So we can't use TCA_EXT_WARN_MSG directly for tc action. Let's rever the previous fix 923b2e30dc9c ("net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy") and add a new TCA_ROOT_EXT_WARN_MSG for tc action specifically. Here is the tdc test result: 1..1119 ok 1 d959 - Add cBPF action with valid bytecode ok 2 f84a - Add cBPF action with invalid bytecode ok 3 e939 - Add eBPF action with valid object-file ok 4 282d - Add eBPF action with invalid object-file ok 5 d819 - Replace cBPF bytecode and action control ok 6 6ae3 - Delete cBPF action ok 7 3e0d - List cBPF actions ok 8 55ce - Flush BPF actions ok 9 ccc3 - Add cBPF action with duplicate index ok 10 89c7 - Add cBPF action with invalid index [...] ok 1115 2348 - Show TBF class ok 1116 84a0 - Create TEQL with default setting ok 1117 7734 - Create TEQL with multiple device ok 1118 34a9 - Delete TEQL with valid handle ok 1119 6289 - Show TEQL stats ==================== Link: https://lore.kernel.org/r/20230316033753.2320557-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16net/sched: act_api: add specific EXT_WARN_MSG for tc actionHangbin Liu
In my previous commit 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message") I didn't notice the tc action use different enum with filter. So we can't use TCA_EXT_WARN_MSG directly for tc action. Let's add a TCA_ROOT_EXT_WARN_MSG for tc action specifically and put this param before going to the TCA_ACT_TAB nest. Fixes: 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16Revert "net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy"Hangbin Liu
This reverts commit 923b2e30dc9cd05931da0f64e2e23d040865c035. This is not a correct fix as TCA_EXT_WARN_MSG is not a hierarchy to TCA_ACT_TAB. I didn't notice the TC actions use different enum when adding TCA_EXT_WARN_MSG. To fix the difference I will add a new WARN enum in TCA_ROOT_MAX as Jamal suggested. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16net: dsa: microchip: fix RGMII delay configuration on KSZ8765/KSZ8794/KSZ8795Marek Vasut
The blamed commit has replaced a ksz_write8() call to address REG_PORT_5_CTRL_6 (0x56) with a ksz_set_xmii() -> ksz_pwrite8() call to regs[P_XMII_CTRL_1], which is also defined as 0x56 for ksz8795_regs[]. The trouble is that, when compared to ksz_write8(), ksz_pwrite8() also adjusts the register offset with the port base address. So in reality, ksz_pwrite8(offset=0x56) accesses register 0x56 + 0x50 = 0xa6, which in this switch appears to be unmapped, and the RGMII delay configuration on the CPU port does nothing. So if the switch wasn't fine with the RGMII delay configuration done through pin strapping and relied on Linux to apply a different one in order to pass traffic, this is now broken. Using the offset translation logic imposed by ksz_pwrite8(), the correct value for regs[P_XMII_CTRL_1] should have been 0x6 on ksz8795_regs[], in order to really end up accessing register 0x56. Static code analysis shows that, despite there being multiple other accesses to regs[P_XMII_CTRL_1] in this driver, the only code path that is applicable to ksz8795_regs[] and ksz8_dev_ops is ksz_set_xmii(). Therefore, the problem is isolated to RGMII delays. In its current form, ksz8795_regs[] contains the same value for P_XMII_CTRL_0 and for P_XMII_CTRL_1, and this raises valid suspicions that writes made by the driver to regs[P_XMII_CTRL_0] might overwrite writes made to regs[P_XMII_CTRL_1] or vice versa. Again, static analysis shows that the only accesses to P_XMII_CTRL_0 from the driver are made from code paths which are not reachable with ksz8_dev_ops. So the accesses made by ksz_set_xmii() are safe for this switch family. [ vladimiroltean: rewrote commit message ] Fixes: c476bede4b0f ("net: dsa: microchip: ksz8795: use common xmii function") Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230315231916.2998480-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16Merge branch 'ynl-another-license-adjustment'Jakub Kicinski
Jakub Kicinski says: ==================== ynl: another license adjustment Hopefully the last adjustment to the licensing of the specs. I'm still the author so should be fine to do this. ==================== Link: https://lore.kernel.org/r/20230315230351.478320-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16ynl: make the tooling check the licenseJakub Kicinski
The (only recently documented) expectation is that all specs are under a certain license, but we don't actually enforce it. What's worse we then go ahead and assume the license was right, outputting the expected license into generated files. Fixes: 37d9df224d1e ("ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause") Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16ynl: broaden the license even moreJakub Kicinski
I relicensed Netlink spec code to GPL-2.0 OR BSD-3-Clause but we still put a slightly different license on the uAPI header than the rest of the code. Use the Linux-syscall-note on all the specs and all generated code. It's moot for kernel code, but should not hurt. This way the licenses match everywhere. Cc: Chuck Lever <chuck.lever@oracle.com> Fixes: 37d9df224d1e ("ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause") Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16tools: ynl: make definitions optional againJakub Kicinski
definitions are optional, commit in question breaks cli for ethtool. Fixes: 6517a60b0307 ("tools: ynl: move the enum classes to shared code") Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16Merge tag 'mlx5-fixes-2023-03-15' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2023-03-15 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2023-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: TC, Remove error message log print net/mlx5e: TC, fix cloned flow attribute net/mlx5e: TC, fix missing error code net/sched: TC, fix raw counter initialization net/mlx5e: Lower maximum allowed MTU in XSK to match XDP prerequisites net/mlx5: Set BREAK_FW_WAIT flag first when removing driver net/mlx5e: kTLS, Fix missing error unwind on unsupported cipher type net/mlx5e: Fix cleanup null-ptr deref on encap lock net/mlx5: E-switch, Fix missing set of split_count when forward to ovs internal port net/mlx5: E-switch, Fix wrong usage of source port rewrite in split rules net/mlx5: Disable eswitch before waiting for VF pages net/mlx5: Fix setting ec_function bit in MANAGE_PAGES net/mlx5e: Don't cache tunnel offloads capability net/mlx5e: Fix macsec ASO context alignment ==================== Link: https://lore.kernel.org/r/20230315225847.360083-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16hsr: ratelimit only when errors are printedMatthieu Baerts
Recently, when automatically merging -net and net-next in MPTCP devel tree, our CI reported [1] a conflict in hsr, the same as the one reported by Stephen in netdev [2]. When looking at the conflict, I noticed it is in fact the v1 [3] that has been applied in -net and the v2 [4] in net-next. Maybe the v1 was applied by accident. As mentioned by Jakub Kicinski [5], the new condition makes more sense before the net_ratelimit(), not to update net_ratelimit's state which is unnecessary if we're not going to print either way. Here, this modification applies the v2 but in -net. Link: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/4423171069 [1] Link: https://lore.kernel.org/netdev/20230315100914.53fc1760@canb.auug.org.au/ [2] Link: https://lore.kernel.org/netdev/20230307133229.127442-1-koverskeid@gmail.com/ [3] Link: https://lore.kernel.org/netdev/20230309092302.179586-1-koverskeid@gmail.com/ [4] Link: https://lore.kernel.org/netdev/20230308232001.2fb62013@kernel.org/ [5] Fixes: 28e8cabe80f3 ("net: hsr: Don't log netdev_err message on unknown prp dst node") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Link: https://lore.kernel.org/r/20230315-net-20230315-hsr_framereg-ratelimit-v1-1-61d2ef176d11@tessares.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16qed/qed_mng_tlv: correctly zero out ->min instead of ->hourDaniil Tatianin
This fixes an issue where ->hour would erroneously get zeroed out instead of ->min because of a bad copy paste. Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool. Fixes: f240b6882211 ("qed: Add support for processing fcoe tlv request.") Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru> Link: https://lore.kernel.org/r/20230315194618.579286-1-d-tatianin@yandex-team.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-17Merge tag 'drm-intel-fixes-2023-03-15' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v6.3-rc3: - Fix hwmon PL1 power limit enabling - Fix audio ELD handling for DP MST - Fix PSR io and wake line calculations - Fix DG2 HDMI modes with 267.30 and 319.89 MHz pixel clocks - Fix SSEU subslice out-of-bounds access - Fix misuse of non-idle barriers as fence trackers Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87r0tq5nyn.fsf@intel.com
2023-03-17Merge tag 'drm-misc-fixes-2023-03-16' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: * fix info leak in edid * build fix for accel/ * ref-counting fix for fbdev deferred I/O * driver fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230316143347.GA9246@linux-uq9g
2023-03-16selftests: net: devlink_port_split.py: skip test if no suitable device availablePo-Hsu Lin
The `devlink -j port show` command output may not contain the "flavour" key, an example from Ubuntu 22.10 s390x LPAR(5.19.0-37-generic), with mlx4 driver and iproute2-5.15.0: {"port":{"pci/0001:00:00.0/1":{"type":"eth","netdev":"ens301"}, "pci/0001:00:00.0/2":{"type":"eth","netdev":"ens301d1"}, "pci/0002:00:00.0/1":{"type":"eth","netdev":"ens317"}, "pci/0002:00:00.0/2":{"type":"eth","netdev":"ens317d1"}}} This will cause a KeyError exception. Create a validate_devlink_output() to check for this "flavour" from devlink command output to avoid this KeyError exception. Also let it handle the check for `devlink -j dev show` output in main(). Apart from this, if the test was not started because the max lanes of the designated device is 0. The script will still return 0 and thus causing a false-negative test result. Use a found_max_lanes flag to determine if these tests were skipped due to this reason and return KSFT_SKIP to make it more clear. Link: https://bugs.launchpad.net/bugs/1937133 Fixes: f3348a82e727 ("selftests: net: Add port split test") Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Link: https://lore.kernel.org/r/20230315165353.229590-1-po-hsu.lin@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16i825xx: sni_82596: use eth_hw_addr_set()Thomas Bogendoerfer
netdev->dev_addr is now const, we can't write to it directly. Copy scrambled mac address octects into an array then eth_hw_addr_set(). Fixes: adeef3e32146 ("net: constify netdev->dev_addr") Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://lore.kernel.org/r/20230315134117.79511-1-tsbogend@alpha.franken.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16net/iucv: Fix size of interrupt dataAlexandra Winter
iucv_irq_data needs to be 4 bytes larger. These bytes are not used by the iucv module, but written by the z/VM hypervisor in case a CPU is deconfigured. Reported as: BUG dma-kmalloc-64 (Not tainted): kmalloc Redzone overwritten ----------------------------------------------------------------------------- 0x0000000000400564-0x0000000000400567 @offset=1380. First byte 0x80 instead of 0xcc Allocated in iucv_cpu_prepare+0x44/0xd0 age=167839 cpu=2 pid=1 __kmem_cache_alloc_node+0x166/0x450 kmalloc_node_trace+0x3a/0x70 iucv_cpu_prepare+0x44/0xd0 cpuhp_invoke_callback+0x156/0x2f0 cpuhp_issue_call+0xf0/0x298 __cpuhp_setup_state_cpuslocked+0x136/0x338 __cpuhp_setup_state+0xf4/0x288 iucv_init+0xf4/0x280 do_one_initcall+0x78/0x390 do_initcalls+0x11a/0x140 kernel_init_freeable+0x25e/0x2a0 kernel_init+0x2e/0x170 __ret_from_fork+0x3c/0x58 ret_from_fork+0xa/0x40 Freed in iucv_init+0x92/0x280 age=167839 cpu=2 pid=1 __kmem_cache_free+0x308/0x358 iucv_init+0x92/0x280 do_one_initcall+0x78/0x390 do_initcalls+0x11a/0x140 kernel_init_freeable+0x25e/0x2a0 kernel_init+0x2e/0x170 __ret_from_fork+0x3c/0x58 ret_from_fork+0xa/0x40 Slab 0x0000037200010000 objects=32 used=30 fp=0x0000000000400640 flags=0x1ffff00000010200(slab|head|node=0|zone=0| Object 0x0000000000400540 @offset=1344 fp=0x0000000000000000 Redzone 0000000000400500: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................ Redzone 0000000000400510: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................ Redzone 0000000000400520: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................ Redzone 0000000000400530: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................ Object 0000000000400540: 00 01 00 03 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object 0000000000400550: f3 86 81 f2 f4 82 f8 82 f0 f0 f0 f0 f0 f0 f0 f2 ................ Object 0000000000400560: 00 00 00 00 80 00 00 00 cc cc cc cc cc cc cc cc ................ Object 0000000000400570: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................ Redzone 0000000000400580: cc cc cc cc cc cc cc cc ........ Padding 00000000004005d4: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ Padding 00000000004005e4: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ Padding 00000000004005f4: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZ CPU: 6 PID: 121030 Comm: 116-pai-crypto. Not tainted 6.3.0-20230221.rc0.git4.99b8246b2d71.300.fc37.s390x+debug #1 Hardware name: IBM 3931 A01 704 (z/VM 7.3.0) Call Trace: [<000000032aa034ec>] dump_stack_lvl+0xac/0x100 [<0000000329f5a6cc>] check_bytes_and_report+0x104/0x140 [<0000000329f5aa78>] check_object+0x370/0x3c0 [<0000000329f5ede6>] free_debug_processing+0x15e/0x348 [<0000000329f5f06a>] free_to_partial_list+0x9a/0x2f0 [<0000000329f5f4a4>] __slab_free+0x1e4/0x3a8 [<0000000329f61768>] __kmem_cache_free+0x308/0x358 [<000000032a91465c>] iucv_cpu_dead+0x6c/0x88 [<0000000329c2fc66>] cpuhp_invoke_callback+0x156/0x2f0 [<000000032aa062da>] _cpu_down.constprop.0+0x22a/0x5e0 [<0000000329c3243e>] cpu_device_down+0x4e/0x78 [<000000032a61dee0>] device_offline+0xc8/0x118 [<000000032a61e048>] online_store+0x60/0xe0 [<000000032a08b6b0>] kernfs_fop_write_iter+0x150/0x1e8 [<0000000329fab65c>] vfs_write+0x174/0x360 [<0000000329fab9fc>] ksys_write+0x74/0x100 [<000000032aa03a5a>] __do_syscall+0x1da/0x208 [<000000032aa177b2>] system_call+0x82/0xb0 INFO: lockdep is turned off. FIX dma-kmalloc-64: Restoring kmalloc Redzone 0x0000000000400564-0x0000000000400567=0xcc FIX dma-kmalloc-64: Object at 0x0000000000400540 not freed Fixes: 2356f4cb1911 ("[S390]: Rewrite of the IUCV base code, part 2") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://lore.kernel.org/r/20230315131435.4113889-1-wintera@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>