Age | Commit message | Author
2023-10-20 | igb: Fix an end of loop test | Dan Carpenter
When we exit a list_for_each_entry() without hitting a break statement, the list iterator isn't NULL, it just points to an offset off the list_head. In that situation, it wouldn't be too surprising for entry->free to be true and we end up corrupting memory. The way to test for these is to just set a flag. Fixes: c1fec890458a ("ethernet/intel: Use list_for_each_entry() helper") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
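The flag-based end-of-loop test described above can be sketched in userspace C. This is only an illustration: `struct entry`, `for_each_entry_sketch` and `find_free_entry` are hypothetical names, and the list helpers are simplified stand-ins for the kernel ones, not the igb code itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list head. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as the kernel iterator, with the type passed explicitly. */
#define for_each_entry_sketch(pos, head, type, member)                 \
	for (pos = container_of((head)->next, type, member);           \
	     &pos->member != (head);                                   \
	     pos = container_of(pos->member.next, type, member))

struct entry {			/* hypothetical element type */
	int id;
	bool free;
	struct list_head node;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Return the first free entry, or NULL when the loop ran to completion. */
static struct entry *find_free_entry(struct list_head *head)
{
	struct entry *e;
	bool found = false;

	for_each_entry_sketch(e, head, struct entry, node) {
		if (e->free) {
			found = true;
			break;
		}
	}
	/*
	 * After a full pass, e is computed from the list head itself, so it
	 * is neither NULL nor a real entry. Only the flag is trustworthy.
	 */
	return found ? e : NULL;
}
```

Testing `found` instead of `e` avoids dereferencing the bogus pointer that the iterator leaves behind when no element matches.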
2023-10-20 | ice: cleanup ice_find_netlist_node | Jacob Keller
The ice_find_netlist_node function was introduced in commit 8a3a565ff210 ("ice: add admin commands to access cgu configuration"). Variations of this function were reviewed concurrently on both intel-wired-lan[1][2], and netdev [3][4] [1]: https://lore.kernel.org/intel-wired-lan/20230913204943.1051233-7-vadim.fedorenko@linux.dev/ [2]: https://lore.kernel.org/intel-wired-lan/20230817000058.2433236-5-jacob.e.keller@intel.com/ [3]: https://lore.kernel.org/netdev/20230918212814.435688-1-anthony.l.nguyen@intel.com/ [4]: https://lore.kernel.org/netdev/20230913204943.1051233-7-vadim.fedorenko@linux.dev/ The variant I posted had a few changes due to review feedback which were never incorporated into the DPLL series: * Replace the references to ancient and long removed ICE_SUCCESS and ICE_ERR_DOES_NOT_EXIST status codes in the function comment. * Return -ENOENT instead of -ENOTBLK, as a more common way to indicate that an entry doesn't exist. * Avoid the use of memset() and use simple static initialization for the cmd variable. * Use FIELD_PREP to assign the node_type_ctx. * Remove an unnecessary local variable to keep track of rec_node_handle, just pass the node_handle pointer directly into ice_aq_get_netlist_node. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | ice: make ice_get_pf_c827_idx static | Jacob Keller
The ice_get_pf_c827_idx function is only called inside of ice_ptp_hw.c, so there is no reason to export it. Mark it static and remove the declaration from ice_ptp_hw.h Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | ice: manage VFs MSI-X using resource tracking | Michal Swiatkowski
Track MSI-X for VFs using bitmap, by setting and clearing bitmap during allocation and freeing. Try to linearize irqs usage for VFs, by freeing them and allocating once again. Do it only for VFs that aren't currently running. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | ice: set MSI-X vector count on VF | Michal Swiatkowski
Implement ops needed to set MSI-X vector count on VF. sriov_get_vf_total_msix() should return total number of MSI-X that can be used by the VFs. Return the value set by devlink resources API (pf->req_msix.vf). sriov_set_msix_vec_count() will set number of MSI-X on particular VF. Disable VF register mapping, rebuild VSI with new MSI-X and queues values and enable new VF register mapping. For best performance set number of queues equal to number of MSI-X. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | ice: add bitmap to track VF MSI-X usage | Michal Swiatkowski
Create a bitmap to track MSI-X usage for VFs. The bitmap has the size of total MSI-X amount on device, because at init time the amount of MSI-X used by VFs isn't known. The bitmap is used in follow up patchset to provide a contiguous block of MSI-X indexes for each created VF. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
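As a rough illustration of tracking vectors in a bitmap and handing out one block of adjacent indexes per VF, here is a minimal userspace sketch. The 64-vector limit, `struct msix_map` and the `msix_*_sketch` helpers are assumptions made for the example; the driver uses the kernel bitmap API and its own sizes.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOTAL_MSIX 64	/* hypothetical device-wide vector count */

struct msix_map { uint64_t bits; };	/* one bit per MSI-X vector */

/* True when every vector in [start, start + count) is unused. */
static bool msix_range_is_free(const struct msix_map *m, int start, int count)
{
	for (int i = start; i < start + count; i++)
		if (m->bits & (UINT64_C(1) << i))
			return false;
	return true;
}

/* First-fit search; returns the first index of the block, or -1. */
static int msix_alloc_sketch(struct msix_map *m, int count)
{
	if (count <= 0 || count > TOTAL_MSIX)
		return -1;
	for (int start = 0; start + count <= TOTAL_MSIX; start++) {
		if (!msix_range_is_free(m, start, count))
			continue;
		for (int i = start; i < start + count; i++)
			m->bits |= UINT64_C(1) << i;	/* mark as used */
		return start;
	}
	return -1;
}

/* Clear the bits of a previously allocated block. */
static void msix_free_sketch(struct msix_map *m, int start, int count)
{
	for (int i = start; i < start + count; i++)
		m->bits &= ~(UINT64_C(1) << i);
}
```

Freeing a block and allocating again lets later requests reuse the low indexes, which is the kind of linearization the series aims for on VFs that are not running.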
2023-10-20 | ice: implement num_msix field per VF | Michal Swiatkowski
Store the amount of MSI-X per VF instead of storing it in pf struct. It is used to calculate number of q_vectors (and queues) for VF VSI. This is necessary because with follow up changes the number of MSI-X can be different between VFs. Use it instead of using pf->vf_msix value in all cases. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | ice: store VF's pci_dev ptr in ice_vf | Przemek Kitszel
Extend struct ice_vf by vfdev. Calculation of vfdev falls more nicely into ice_create_vf_entries(). Caching of vfdev enables simplification of ice_restore_all_vfs_msi_state(). Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | ice: add drop rule matching on not active lport | Michal Swiatkowski
Inactive LAG port should not receive any packets, as it can cause adding invalid FDBs (bridge offload). Add a drop rule matching on inactive lport in LAG. Reviewed-by: Simon Horman <horms@kernel.org> Co-developed-by: Marcin Szycik <marcin.szycik@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | ice: remove unused ice_flow_entry fields | Przemek Kitszel
Remove ::entry and ::entry_sz fields of &ice_flow_entry, as they were never set. Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | ethtool: untangle the linkmode and ethtool headers | Jakub Kicinski
Commit 26c5334d344d ("ethtool: Add forced speed to supported link modes maps") added a dependency between ethtool.h and linkmode.h. The dependency in the opposite direction already exists so the new code was inserted in an awkward place. The reason for ethtool.h to include linkmode.h, is that ethtool_forced_speed_maps_init() is a static inline helper. That's not really necessary. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | net: fix IPSTATS_MIB_OUTPKGS increment in OutForwDatagrams. | Heng Guo
Reproduce environment:
A network of 3 Linux VMs is connected as below:
VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3
VM1: eth0 ip: 192.168.122.207 MTU 1500
VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500
VM3: eth0 ip: 192.168.123.240 MTU 1500

Reproduce:
VM1 sends 1400 bytes of UDP data to VM3 using scapy with flags=0.
scapy command:
send(IP(dst="192.168.123.240",flags=0)/UDP()/str('0'*1400),count=1, inter=1.000000)

Result:
Before IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
Ip: 1 64 11 0 3 4 0 0 4 7 0 0 0 0 0 0 0 0 0
......
----------------------------------------------------------------------
After IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
Ip: 1 64 12 0 3 5 0 0 4 8 0 0 0 0 0 0 0 0 0
......
----------------------------------------------------------------------
"ForwDatagrams" increases from 4 to 5 and "OutRequests" also increases from 7 to 8.

Issue description and patch:
IPSTATS_MIB_OUTPKTS("OutRequests") is counted with IPSTATS_MIB_OUTOCTETS ("OutOctets") in ip_finish_output2(). According to RFC 4293, "OutOctets" is counted together with "OutTransmits", not with "OutRequests". "OutRequests" does not include any datagrams counted in "ForwDatagrams".

ipSystemStatsOutOctets OBJECT-TYPE
DESCRIPTION
"The total number of octets in IP datagrams delivered to the lower layers for transmission. Octets from datagrams counted in ipIfStatsOutTransmits MUST be counted here.

ipSystemStatsOutRequests OBJECT-TYPE
DESCRIPTION
"The total number of IP datagrams that local IP user-protocols (including ICMP) supplied to IP in requests for transmission. Note that this counter does not include any datagrams counted in ipSystemStatsOutForwDatagrams.

So this patch defines IPSTATS_MIB_OUTPKTS as "OutTransmits" and adds IPSTATS_MIB_OUTREQUESTS for "OutRequests". It adds the IPSTATS_MIB_OUTREQUESTS counter in __ip_local_out() for ipv4 and the IPSTATS_MIB_OUT counter in ip6_finish_output2() for ipv6.

Test result with patch:
Before IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates OutTransmits
Ip: 1 64 9 0 5 1 0 0 3 3 0 0 0 0 0 0 0 0 0 4
......
root@qemux86-64:~# cat /proc/net/netstat
......
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts ReasmOverlaps
IpExt: 0 0 0 0 0 0 2976 1896 0 0 0 0 0 9 0 0 0 0
----------------------------------------------------------------------
After IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates OutTransmits
Ip: 1 64 10 0 5 2 0 0 3 3 0 0 0 0 0 0 0 0 0 5
......
root@qemux86-64:~# cat /proc/net/netstat
......
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts ReasmOverlaps
IpExt: 0 0 0 0 0 0 4404 3324 0 0 0 0 0 10 0 0 0 0
----------------------------------------------------------------------
"ForwDatagrams" increases from 1 to 2 and "OutRequests" stays at 3. "OutTransmits" increases from 4 to 5 and "OutOctets" increases by 1428.

Signed-off-by: Heng Guo <heng.guo@windriver.com>
Reviewed-by: Kun Song <Kun.Song@windriver.com>
Reviewed-by: Filip Pudak <filip.pudak@windriver.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
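The counter split that this commit describes can be modeled with a few lines of C. This is a toy model under stated assumptions, not the kernel code: `struct ip_mib_sketch` and the three helper functions are hypothetical, and only the accounting rules come from the commit message (forwarded datagrams count toward "ForwDatagrams", "OutTransmits" and "OutOctets", but not "OutRequests").

```c
/* Toy model of the four counters discussed in the commit message. */
struct ip_mib_sketch {
	unsigned long out_requests;	/* "OutRequests"   */
	unsigned long forw_datagrams;	/* "ForwDatagrams" */
	unsigned long out_transmits;	/* "OutTransmits"  */
	unsigned long out_octets;	/* "OutOctets"     */
};

/* Common transmit path: every datagram handed to the lower layer. */
static void finish_output_sketch(struct ip_mib_sketch *mib, unsigned long len)
{
	mib->out_transmits++;
	mib->out_octets += len;
}

/* Locally generated datagram: counted as a request, then transmitted. */
static void local_out_sketch(struct ip_mib_sketch *mib, unsigned long len)
{
	mib->out_requests++;
	finish_output_sketch(mib, len);
}

/* Forwarded datagram: must not touch out_requests. */
static void forward_sketch(struct ip_mib_sketch *mib, unsigned long len)
{
	mib->forw_datagrams++;
	finish_output_sketch(mib, len);
}
```

Forwarding one 1428-byte datagram through this model raises `forw_datagrams` and `out_transmits` by one and `out_octets` by 1428 while leaving `out_requests` unchanged, which matches the "with patch" numbers quoted above.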
2023-10-20 | Merge branch 'ksz886x-forced-link-modes' | David S. Miller
Oleksij Rempel says: ==================== fix forced link mode for KSZ886X switches changes v3: - squash patch 1 and 2 - use genphy_config_aneg() instead of genphy_setup_forced() changes v2: - address kernel test robot warning - change comment explaining clearing of KSZ886X_CTRL_FORCE_LINK bit - s/PHY we create/PHY will create/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | net: phy: micrel: Fix forced link mode for KSZ886X switches | Oleksij Rempel
Address a link speed detection issue in KSZ886X PHY driver when in forced link mode. Previously, link partners like "ASIX AX88772B" with KSZ8873 could fall back to 10Mbit instead of configured 100Mbit. The issue arises as KSZ886X PHY continues sending Fast Link Pulses (FLPs) even with autonegotiation off, misleading link partners in autoneg mode, leading to incorrect link speed detection. Now, when autonegotiation is disabled, the driver sets the link state forcefully using KSZ886X_CTRL_FORCE_LINK bit. This action, beyond just disabling autonegotiation, makes the PHY state more reliably detected by link partners using parallel detection, thus fixing the link speed misconfiguration. With autonegotiation enabled, link state is not forced, allowing proper autonegotiation process participation. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Divya Koppera <divya.koppera@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | net: dsa: microchip: ksz8: Enable MIIM PHY Control reg access | Oleksij Rempel
Provide access to MIIM PHY Control register (Reg. 31) through ksz8_r_phy_ctrl() and ksz8_w_phy_ctrl() functions. Necessary for upcoming micrel.c patch to address forced link mode configuration. Closes: https://lore.kernel.org/oe-kbuild-all/202310112224.iYgvjBUy-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | Merge branch 'mlxsw-lag-table-allocation' | David S. Miller
Petr Machata says: ==================== mlxsw: Move allocation of LAG table to the driver PGT is an in-HW table that maps addresses to sets of ports. Then when some HW process needs a set of ports as an argument, instead of embedding the actual set in the dynamic configuration, what gets configured is the address referencing the set. The HW then works with the appropriate PGT entry. Within the PGT is placed a LAG table. That is a contiguous block of PGT memory where each entry describes which ports are members of the corresponding LAG port. The PGT is split to two parts: one managed by the FW, and one managed by the driver. Historically, the FW part included also the LAG table, referred to as FW LAG mode. Giving the responsibility for placement of the LAG table to the driver, referred to as SW LAG mode, makes the whole system more flexible. The FW currently supports both FW and SW LAG modes. To shed complexity, the FW should in the future only support SW LAG mode. Hence this patchset, where support for placement of LAG is added to mlxsw. There are FW versions out there that do not support SW LAG mode, and on Spectrum-1 in particular, there is no plan to support it at all. mlxsw will therefore have to support both modes of operation. Another aspect is that at least on Spectrum-1, there are FW versions out there that claim to support driver-placed LAG table, but then reject or ignore configurations enabling the same. The driver thus has to have a say in whether an attempt to configure SW LAG mode should even be done. The feature is therefore expressed in terms of "does the driver prefer SW LAG mode?", and "what LAG mode the PCI module managed to configure the FW with". This is unlike current flood mode configuration, where the driver can give a strict value, and that's what gets configured. But it gives a chance to the driver to determine whether LAG mode should be enabled at all. The "does the driver prefer SW LAG mode?" 
bit is expressed as a boolean lag_mode_prefer_sw. The reason for this is largely another feature that will be introduced in a follow-up patchset: support for CFF flood mode. The driver currently requires that the FW be configured with what is called controlled flood mode. But on capable systems, CFF would be preferred. So there are two values in flight: the preferred flood mode, and the fallback. This could be expressed with an array of flood modes ordered by preference, but that looks like overkill in comparison. This flag/value model is then reused for LAG mode as well, except the fallback value is absent and implied to be FW, because there are no other values to choose from. The patchset progresses as follows: - Patches #1 to #5 adjust reg.h and cmd.h with new register fields, constants and remarks. - Patches #6 and #7 add the ability to request SW LAG mode and to query the LAG mode that was actually negotiated. This is where the abovementioned lag_mode_prefer_sw flag is added. - Patches #8 and #9 generalize PGT allocations to make it possible to allocate the LAG table, which is done in patch #10. - In patch #11, toggle lag_mode_prefer_sw on Spectrum-2 and above, which makes the newly-added code live. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
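The "preference plus implied fallback" model from the cover letter can be sketched as a small decision function. Everything here is illustrative: the enum, both structs and `negotiate_lag_mode_sketch` are hypothetical names, the value 1 for SW mode is an assumption, and the real negotiation happens between the PCI module and the firmware.

```c
#include <stdbool.h>

/* 0 is the FW-managed default per the series; 1 for SW is an assumption. */
enum lag_mode_sketch { LAG_MODE_FW_SKETCH = 0, LAG_MODE_SW_SKETCH = 1 };

struct fw_caps_sketch {
	bool lag_mode_support;		/* models QUERY_FW.lag_mode_support */
};

struct profile_sketch {
	bool lag_mode_prefer_sw;	/* the driver's stated preference */
};

/*
 * SW mode is configured only when the driver prefers it and the FW
 * advertises support; in every other case the result is FW mode.
 */
static enum lag_mode_sketch
negotiate_lag_mode_sketch(const struct profile_sketch *profile,
			  const struct fw_caps_sketch *caps)
{
	if (profile->lag_mode_prefer_sw && caps->lag_mode_support)
		return LAG_MODE_SW_SKETCH;
	return LAG_MODE_FW_SKETCH;
}
```

Keeping the preference as a boolean lets a platform such as Spectrum-1 leave it unset, so no attempt to configure the field is made even when the firmware claims support.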
2023-10-20 | mlxsw: spectrum: Set SW LAG mode on Spectrum>1 | Petr Machata
On Spectrum-2, Spectrum-3 and Spectrum-4 machines, request SW responsibility for placement of the LAG table. On Spectrum-1, some FW versions claim to support lag_mode field despite quietly ignoring any settings made to that field. Thus refrain from attempting to configure lag_mode on those systems at all. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | mlxsw: spectrum: Allocate LAG table when in SW LAG mode | Petr Machata
In this patch, if the LAG mode is SW, allocate the LAG table and configure SGCR to indicate where it was allocated. We use the default "DDD" (for dynamic data duplication) layout of the LAG table. In the DDD mode, the membership information for each LAG is copied in 8 PGT entries. This is done for performance reasons. The LAG table then needs to be allocated on an address aligned to 8. Deal with this by moving the LAG init ahead so that the LAG table is allocated at address 0. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | mlxsw: spectrum_pgt: Generalize PGT allocation | Petr Machata
PGT blocks are allocated through the function mlxsw_sp_pgt_mid_alloc_range(). The interface assumes that the caller knows which piece of PGT exactly they want to get. That was fine while the FID code was the only client allocating blocks of PGT. However for SW-allocated LAG table, there will be an additional client: mlxsw_sp_lag_init(). The interface should therefore be changed to not require particular coordinates, but to take just the requested size, allocate the block wherever, and give back the PGT address. In this patch, change the interface accordingly. Initialize FID family's pgt_base from the result of the PGT allocation (note that mlxsw makes a copy of the family structure, so what gets initialized is not actually the global structure). Drop the now-unnecessary pgt_base initializations and the corresponding defines. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
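The interface change described above, from "caller names the coordinates" to "caller names a size and gets an address back", can be illustrated with a deliberately simple bump allocator. `struct pgt_sketch`, `PGT_SIZE_SKETCH` and `pgt_alloc_range_sketch` are hypothetical and do not reflect how mlxsw tracks PGT entries internally.

```c
#include <stdint.h>

#define PGT_SIZE_SKETCH 1024	/* hypothetical number of PGT entries */

struct pgt_sketch {
	uint16_t next_free;	/* first entry that has not been handed out */
};

/*
 * Size-based interface: the caller states how many entries it needs and
 * receives the base address through p_base. Returns 0 on success and -1
 * when the request is empty or does not fit.
 */
static int pgt_alloc_range_sketch(struct pgt_sketch *pgt, uint16_t count,
				  uint16_t *p_base)
{
	if (count == 0 || count > PGT_SIZE_SKETCH - pgt->next_free)
		return -1;
	*p_base = pgt->next_free;
	pgt->next_free += count;
	return 0;
}
```

With an allocator of this shape, whichever client allocates first receives base address 0. That is consistent with the series moving LAG init ahead of other clients so that the LAG table lands on an address aligned to 8.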
2023-10-20 | mlxsw: spectrum_fid: Allocate PGT for the whole FID family in one go | Petr Machata
PGT blocks are allocated through the function mlxsw_sp_pgt_mid_alloc_range(). The interface assumes that the caller knows which piece of PGT exactly they want to get. That was fine while the FID code was the only client allocating blocks of PGT. However for SW-allocated LAG table, there will be an additional client: mlxsw_sp_lag_init(). The interface should therefore be changed to not require particular coordinates, but to take just the requested size, allocate the block wherever, and give back the PGT address. The current FID mode has one place where PGT address can be stored: the FID family's pgt_base. The allocation scheme should therefore be changed from allocating a block per FID flood table, to allocating a block per FID family. Do just that in this patch. The per-family allocation is going to be useful for another related feature as well: the CFF mode. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | mlxsw: pci: Permit toggling LAG mode | Petr Machata
Add to struct mlxsw_config_profile a field lag_mode_prefer_sw for the driver to indicate that SW LAG mode should be configured if possible. Add to the PCI module code to set lag_mode as appropriate. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | mlxsw: core, pci: Add plumbing related to LAG mode | Petr Machata
lag_mode describes where the responsibility for LAG table placement lies: SW or FW. The bus module determines whether LAG is supported, can configure it if it is, and knows what (if any) configuration has been applied. Therefore add a bus callback to determine the configured LAG mode. Also add to core an API to query it. The LAG mode is for now kept at the default value of 0 for FW-managed. The code to actually toggle it will be added later. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | mlxsw: cmd: Add QUERY_FW.lag_mode_support | Petr Machata
Add QUERY_FW.lag_mode_support, which determines whether CONFIG_PROFILE.lag_mode is available. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | mlxsw: cmd: Add CONFIG_PROFILE.{set_, }lag_mode | Petr Machata
Add CONFIG_PROFILE.lag_mode, which serves for moving responsibility for placement of the LAG table from FW to SW. Whether lag_mode should be configured is determined by CONFIG_PROFILE.set_lag_mode, which is added as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | mlxsw: cmd: Fix omissions in CONFIG_PROFILE field names in comments | Petr Machata
A number of CONFIG_PROFILE fields' comments refer to a field named like cmd_mbox_config_* instead of cmd_mbox_config_profile_*. Correct these omissions. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | mlxsw: reg: Add SGCR.lag_lookup_pgt_base | Petr Machata
Add SGCR.lag_lookup_pgt_base, which is used for configuring the base address of the LAG table within the PGT table for cases when the driver is responsible for the table placement. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | mlxsw: reg: Drop SGCR.llb | Petr Machata
SGCR, Switch General Configuration Register, has not been used since commit b0d80c013b04 ("mlxsw: Remove Mellanox SwitchX-2 ASIC support"). We will need the register again shortly, so instead of dropping it and reintroducing again, just drop the sole unused field. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | Merge branch 'netlink-auto-integers' | David S. Miller
Jakub Kicinski says: ==================== netlink: add variable-length / auto integers Add netlink support for "common" / variable-length / auto integers which are carried at the message level as either 4B or 8B depending on the exact value. This saves space and will hopefully decrease the number of instances where we realize that we needed more bits after uAPI is set in stone. It also loosens the alignment requirements, avoiding the need for padding. This mini-series is a fuller version of the previous RFC: https://lore.kernel.org/netdev/20121204.130914.1457976839967676240.davem@davemloft.net/ No user included here. I have tested (and will use) it in the upcoming page pool API but the assumption is that it will be widely applicable. So sending without a user. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | netlink: specs: add support for auto-sized scalars | Jakub Kicinski
Support uint / sint types in specs and YNL. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | netlink: add variable-length / auto integers | Jakub Kicinski
We currently push everyone to use padding to align 64b values in netlink. Un-padded nla_put_u64() doesn't even exist any more.

The story behind this possibly starts with this thread:
https://lore.kernel.org/netdev/20121204.130914.1457976839967676240.davem@davemloft.net/
where DaveM was concerned about the alignment of a structure containing 64b stats. If user space tries to access such struct directly:

	struct some_stats *stats = nla_data(attr);
	printf("A: %llu", stats->a);

lack of alignment may become problematic for some architectures. These days we most often put every single member in a separate attribute, meaning that the code above would use a helper like nla_get_u64(), which can deal with alignment internally. Even for arches which don't have good unaligned access - access aligned to 4B should be pretty efficient. Kernel and well known libraries deal with unaligned input already.

Padded 64b is quite space-inefficient (64b + pad means at worst 16B per attr vs 32b which takes 8B). It is also more typing:

	if (nla_put_u64_pad(rsp, NETDEV_A_SOMETHING_SOMETHING,
			    value, NETDEV_A_SOMETHING_PAD))

Create a new attribute type which will use 32 bits at netlink level if value is small enough (probably most of the time?), and (4B-aligned) 64 bits otherwise. Kernel API is just:

	if (nla_put_uint(rsp, NETDEV_A_SOMETHING_SOMETHING, value))

Calling this new type "just" sint / uint with no specific size will hopefully also make people more comfortable with using it. Currently telling people "don't use u8, you may need the bits, and netlink will round up to 4B, anyway" is the #1 comment we give to newcomers.

In terms of netlink layout it looks like this:

         0       4       8       12      16
32b:     [nlattr][ u32  ]
64b:     [  pad ][nlattr][     u64      ]
uint(32) [nlattr][ u32  ]
uint(64) [nlattr][     u64      ]

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
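The size selection behind the new attribute type can be sketched in userspace as follows. This is an illustration of the idea only: `put_uint_sketch` and `get_uint_sketch` are hypothetical helpers that handle just the payload bytes, with no nlattr header, and they are not the kernel's nla_put_uint() implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Write value into buf using 4 bytes when it fits in 32 bits and
 * 8 bytes otherwise. Returns the payload length that was written.
 */
static size_t put_uint_sketch(uint8_t *buf, uint64_t value)
{
	if (value <= UINT32_MAX) {
		uint32_t v32 = (uint32_t)value;

		memcpy(buf, &v32, sizeof(v32));
		return sizeof(v32);
	}
	memcpy(buf, &value, sizeof(value));
	return sizeof(value);
}

/*
 * Read a value back based on the payload length. memcpy() keeps the
 * access safe even when buf is only 4-byte aligned.
 * Returns 0 on success, -1 for an unexpected length.
 */
static int get_uint_sketch(const uint8_t *buf, size_t len, uint64_t *out)
{
	if (len == sizeof(uint32_t)) {
		uint32_t v32;

		memcpy(&v32, buf, sizeof(v32));
		*out = v32;
		return 0;
	}
	if (len == sizeof(uint64_t)) {
		memcpy(out, buf, sizeof(*out));
		return 0;
	}
	return -1;
}
```

The reader decides between the two widths from the attribute length alone, so no separate type tag or padding attribute is needed.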
2023-10-20 | tools: ynl-gen: make the mnl_type() method public | Jakub Kicinski
uint/sint support will add more logic to mnl_type(), deduplicate it and make it more accessible. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | Merge branch 'devlink-errors-fmsg' | David S. Miller
Przemek Kitszel says: ==================== devlink: retain error in struct devlink_fmsg Extend devlink fmsg to retain error (patch 1), so drivers could omit error checks after devlink_fmsg_*() (patches 2-10), and finally enforce future uses to follow this practice by changing them to return void (patch 11) Note that it was compile tested only. bloat-o-meter for whole series: add/remove: 8/18 grow/shrink: 23/40 up/down: 2017/-5833 (-3816) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | devlink: convert most of devlink_fmsg_*() to return void | Przemek Kitszel
Since struct devlink_fmsg retains error by now (see 1st patch of this series), there is no longer a need to keep returning it in each call. This is a separate commit to allow per-driver conversion to stop using those return values. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | staging: qlge: devlink health: use retained error fmsg API | Przemek Kitszel
Drop unneeded error checking. devlink_fmsg_*() family of functions is now retaining errors, so there is no need to check for them after each call. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | qed: devlink health: use retained error fmsg API | Przemek Kitszel
Drop unneeded error checking. devlink_fmsg_*() family of functions is now retaining errors, so there is no need to check for them after each call. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | net/mlx5: devlink health: use retained error fmsg API | Przemek Kitszel
Drop unneeded error checking. devlink_fmsg_*() family of functions is now retaining errors, so there is no need to check for them after each call. Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | mlxsw: core: devlink health: use retained error fmsg API | Przemek Kitszel
Drop unneeded error checking. devlink_fmsg_*() family of functions is now retaining errors, so there is no need to check for them after each call. Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | octeontx2-af: devlink health: use retained error fmsg API | Przemek Kitszel
Drop unneeded error checking. devlink_fmsg_*() family of functions is now retaining errors, so there is no need to check for them after each call. Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 | hinic: devlink health: use retained error fmsg API | Przemek Kitszel
Drop unneeded error checking. devlink_fmsg_*() family of functions is now retaining errors, so there is no need to check for them after each call. Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20  bnxt_en: devlink health: use retained error fmsg API  Przemek Kitszel
Drop unneeded error checking. devlink_fmsg_*() family of functions is now retaining errors, so there is no need to check for them after each call. Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20  pds_core: devlink health: use retained error fmsg API  Przemek Kitszel
Drop unneeded error checking. devlink_fmsg_*() family of functions is now retaining errors, so there is no need to check for them after each call. Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20  netdevsim: devlink health: use retained error fmsg API  Przemek Kitszel
Drop unneeded error checking. devlink_fmsg_*() family of functions is now retaining errors, so there is no need to check for them after each call. Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20  devlink: retain error in struct devlink_fmsg  Przemek Kitszel
Retain the error value in struct devlink_fmsg, to relieve drivers from checking it after each call. Note that fmsg is an in-memory builder/buffer for a formatted message, so it is not the case that a half-baked message has been sent somewhere.

The following scheme can be found in multiple drivers:

	err = devlink_fmsg_obj_nest_start(fmsg);
	if (err)
		return err;
	err = devlink_fmsg_string_pair_put(fmsg, "src", src);
	if (err)
		return err;
	err = devlink_fmsg_something(fmsg, foo, bar);
	if (err)
		return err;
	// and so on...
	err = devlink_fmsg_obj_nest_end(fmsg);

With the error-retaining API that translates to:

	devlink_fmsg_obj_nest_start(fmsg);
	devlink_fmsg_string_pair_put(fmsg, "src", src);
	devlink_fmsg_something(fmsg, foo, bar);
	// and so on...
	devlink_fmsg_obj_nest_end(fmsg);

This means we check for an error only when it is time to send. The possible error scenarios are developer error (API misuse) and memory exhaustion; both are good candidates for choosing readability over the fastest possible exit.

Note that this patch keeps returning errors, to allow per-driver conversion to the new API, but checking them is no longer needed at this point.

This commit itself is an illustration of the benefits for the dev-user; more of it will be in separate commits of the series.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
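The error-retaining idea described in this commit can be sketched in plain userspace C. This is a minimal illustration only, not the devlink implementation: the struct and function names (fmsg_sketch, fmsg_sketch_put, fmsg_sketch_finish), the fixed 32-byte buffer, and the choice of -EMSGSIZE are all assumptions made for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for a formatted-message builder. */
struct fmsg_sketch {
	int err;      /* first error seen, 0 if none */
	size_t len;   /* bytes used in buf, not counting the NUL */
	char buf[32];
};

static void fmsg_sketch_init(struct fmsg_sketch *m)
{
	m->err = 0;
	m->len = 0;
	m->buf[0] = '\0';
}

/* Append a string; once an error is recorded, further calls do nothing. */
static void fmsg_sketch_put(struct fmsg_sketch *m, const char *s)
{
	size_t n;

	if (m->err)
		return;
	n = strlen(s);
	if (n >= sizeof(m->buf) - m->len) {
		/* Assumed error code for this sketch. */
		m->err = -EMSGSIZE;
		return;
	}
	memcpy(m->buf + m->len, s, n + 1);
	m->len += n;
}

/* The single place where the caller looks at the retained error. */
static int fmsg_sketch_finish(const struct fmsg_sketch *m)
{
	return m->err;
}
```

A caller would chain several fmsg_sketch_put() calls without inspecting anything in between and look at fmsg_sketch_finish() once at the end, mirroring the "check only when it is time to send" flow above.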
2023-10-19  Merge branch 'tools-ynl-gen-support-full-range-of-min-max-checks'  Jakub Kicinski
Jakub Kicinski says: ==================== tools: ynl-gen: support full range of min/max checks YNL code gen currently supports only very simple range checks within the range of s16. Add support for full range of u64 / s64 which is good to have, and will be even more important with uint / sint. ==================== Link: https://lore.kernel.org/r/20231018163917.2514503-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19  tools: ynl-gen: support limit names  Jakub Kicinski
Support the use of symbolic names like s8-min or u32-max in checks to make writing specs less painful. Link: https://lore.kernel.org/r/20231018163917.2514503-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
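As a rough illustration of what a symbolic limit name stands for, the sketch below maps a few names to numeric limits in C. Only s8-min and u32-max appear in the commit message; the other names are assumed to follow the same pattern, and the table and lookup function are hypothetical (the real generator is not written this way).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table entry: a limit name and its numeric value. */
struct limit_sketch {
	const char *name;
	bool is_signed;
	int64_t s;    /* valid when is_signed is true */
	uint64_t u;   /* valid when is_signed is false */
};

/* Only a few names are listed; the naming pattern is an assumption. */
static const struct limit_sketch limits_sketch[] = {
	{ "s8-min",  true,  INT8_MIN,  0 },
	{ "s8-max",  true,  INT8_MAX,  0 },
	{ "s16-min", true,  INT16_MIN, 0 },
	{ "s16-max", true,  INT16_MAX, 0 },
	{ "u8-max",  false, 0, UINT8_MAX },
	{ "u32-max", false, 0, UINT32_MAX },
};

/* Return the matching entry, or NULL if the name is unknown. */
static const struct limit_sketch *limit_sketch_find(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(limits_sketch) / sizeof(limits_sketch[0]); i++)
		if (strcmp(limits_sketch[i].name, name) == 0)
			return &limits_sketch[i];
	return NULL;
}
```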
2023-10-19  tools: ynl-gen: support full range of min/max checks for integer values  Jakub Kicinski
Extend the support to the full range of min/max checks. None of the existing YNL families required complex integer validation. The support is less than trivial: because we try to keep struct nla_policy tiny, the min/max members it holds in place are s16, meaning we can only express checks within the range of s16. For larger ranges we need to define a structure and link it in the policy. Link: https://lore.kernel.org/r/20231018163917.2514503-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
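The trade-off described here (small limits stored in place, larger ones in a linked structure) can be sketched in userspace C as follows. All type and function names are hypothetical stand-ins, not the kernel's struct nla_policy, and the sketch covers signed values only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical out-of-line range, for limits that do not fit in 16 bits. */
struct range_sketch {
	int64_t min;
	int64_t max;
};

/* Hypothetical compact policy entry: inline s16 limits or a pointer. */
struct policy_sketch {
	bool use_ptr;
	union {
		struct {
			int16_t min;
			int16_t max;
		} small;
		const struct range_sketch *range;
	} u;
};

static bool fits_s16(int64_t min, int64_t max)
{
	return min >= INT16_MIN && max <= INT16_MAX;
}

/* Pick the in-place form when possible, otherwise link the structure. */
static struct policy_sketch policy_sketch_make(const struct range_sketch *r)
{
	struct policy_sketch p = { 0 };

	if (fits_s16(r->min, r->max)) {
		p.u.small.min = (int16_t)r->min;
		p.u.small.max = (int16_t)r->max;
	} else {
		p.use_ptr = true;
		p.u.range = r;
	}
	return p;
}

/* Validate a value against whichever representation the entry uses. */
static bool policy_sketch_check(const struct policy_sketch *p, int64_t v)
{
	if (p->use_ptr)
		return v >= p->u.range->min && v <= p->u.range->max;
	return v >= p->u.small.min && v <= p->u.small.max;
}
```

The union keeps the entry small in the common case, at the cost of a separately defined range object whenever a limit falls outside s16.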
2023-10-19  tools: ynl-gen: track attribute use  Jakub Kicinski
For range validation we'll need to know if any individual attribute is used on input (i.e. whether we will generate a policy for it). Track this information. Link: https://lore.kernel.org/r/20231018163917.2514503-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19  ptp: prevent string overflow  Dan Carpenter
The ida_alloc_max() function can return up to INT_MAX so this buffer is not large enough. Also use snprintf() for extra safety. Fixes: 403376ddb422 ("ptp: add debugfs interface to see applied channel masks") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/d4b1a995-a0cb-4125-aa1d-5fd5044aba1d@moroto.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
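The class of bug fixed here can be illustrated with a small userspace sketch: size the buffer for the largest value the ID can take, and use snprintf() so that a too-small buffer is detected rather than overrun. The format string "0x%08x", the function name, and the buffer sizes are assumptions for the example, not taken from the patch; a 32-bit int is assumed.

```c
#include <limits.h>
#include <stdio.h>

/*
 * "0x" plus 8 hex digits plus the terminating NUL covers every
 * non-negative int up to INT_MAX (assuming a 32-bit int).
 */
#define NAME_LEN_SKETCH 11

/* Format an ID; return 0 on success, -1 if the output did not fit. */
static int format_id_sketch(char *buf, size_t size, int id)
{
	int n = snprintf(buf, size, "0x%08x", (unsigned int)id);

	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

snprintf() returns the length the full output would have had, so comparing that value against the buffer size is what reveals truncation.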
2023-10-19  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  Jakub Kicinski
Cross-merge networking fixes after downstream PR.

net/mac80211/key.c
  02e0e426a2fb ("wifi: mac80211: fix error path key leak")
  2a8b665e6bcc ("wifi: mac80211: remove key_mtx")
  7d6904bf26b9 ("Merge wireless into wireless-next")
https://lore.kernel.org/all/20231012113648.46eea5ec@canb.auug.org.au/

Adjacent changes:

drivers/net/ethernet/ti/Kconfig
  a602ee3176a8 ("net: ethernet: ti: Fix mixed module-builtin object")
  98bdeae9502b ("net: cpmac: remove driver to prepare for platform removal")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19  Merge tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  Linus Torvalds
Pull networking fixes from Jakub Kicinski:

"Including fixes from bluetooth, netfilter, WiFi.

Feels like an up-tick in regression fixes, mostly for older releases. The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as fairly clear-cut user reported regressions. The mlx5 DMA bug was causing strife for s390x folks. The fixes themselves are not particularly scary, tho. No open investigations / outstanding reports at the time of writing.

Current release - regressions:

 - eth: mlx5: perform DMA operations in the right locations, make devices usable on s390x, again
 - sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve, previous fix of rejecting invalid config broke some scripts
 - rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock
 - revert "ethtool: Fix mod state of verbose no_mask bitset", needs more work

Current release - new code bugs:

 - tcp: fix listen() warning with v4-mapped-v6 address

Previous releases - regressions:

 - tcp: allow tcp_disconnect() again when threads are waiting, it was denied to plug a constant source of bugs but turns out .NET depends on it
 - eth: mlx5: fix double-free if buffer refill fails under OOM
 - revert "net: wwan: iosm: enable runtime pm support for 7560", it's causing regressions and the WWAN team at Intel disappeared
 - tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb, fix single-stream perf regression on some devices

Previous releases - always broken:

 - Bluetooth:
    - fix issues in legacy BR/EDR PIN code pairing
    - correctly bounds check and pad HCI_MON_NEW_INDEX name
 - netfilter:
    - more fixes / follow ups for the large "commit protocol" rework, which went in as a fix to 6.5
    - fix null-derefs on netlink attrs which user may not pass in
 - tcp: fix excessive TLP and RACK timeouts from HZ rounding (bless Debian for keeping HZ=250 alive)
 - net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent letting frankenstein UDP super-frames from getting into the stack
 - net: fix interface altnames when ifc moves to a new namespace
 - eth: qed: fix the size of the RX buffers
 - mptcp: avoid sending RST when closing the initial subflow"

* tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
  Revert "ethtool: Fix mod state of verbose no_mask bitset"
  selftests: mptcp: join: no RST when rm subflow/addr
  mptcp: avoid sending RST when closing the initial subflow
  mptcp: more conservative check for zero probes
  tcp: check mptcp-level constraints for backlog coalescing
  selftests: mptcp: join: correctly check for no RST
  net: ti: icssg-prueth: Fix r30 CMDs bitmasks
  selftests: net: add very basic test for netdev names and namespaces
  net: move altnames together with the netdevice
  net: avoid UAF on deleted altname
  net: check for altname conflicts when changing netdev's netns
  net: fix ifname in netlink ntf during netns move
  net: ethernet: ti: Fix mixed module-builtin object
  net: phy: bcm7xxx: Add missing 16nm EPHY statistics
  ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr
  tcp_bpf: properly release resources on error paths
  net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve
  net: mdio-mux: fix C45 access returning -EIO after API change
  tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb
  octeon_ep: update BQL sent bytes before ringing doorbell
  ...