summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-25af_unix: Explicitly include headers for non-pointer struct fields.Kuniyuki Iwashima
include/net/af_unix.h indirectly includes some definitions for structs. Let's include such headers explicitly. linux/atomic.h : scm_stat.nr_fds linux/net.h : unix_sock.peer_wq linux/path.h : unix_sock.path linux/spinlock.h : unix_sock.lock linux/wait.h : unix_sock.peer_wake uapi/linux/un.h : unix_address.name[] linux/socket.h is removed as the structs there are not used directly, and linux/un.h is clarified with uapi as un.h only exists under include/uapi. While at it, duplicate headers are removed from .c files. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250318034934.86708-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25af_unix: Move internal definitions to net/unix/.Kuniyuki Iwashima
net/af_unix.h is included by core and some LSMs, but most definitions need not be. Let's move struct unix_{vertex,edge} to net/unix/garbage.c and other definitions to net/unix/af_unix.h. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250318034934.86708-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25af_unix: Sort headers.Kuniyuki Iwashima
This is a prep patch to make the following changes cleaner. No functional change intended. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250318034934.86708-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25Merge branch 'support-tcp_rto_min_us-and-tcp_delack_max_us-for-set-getsockopt'Jakub Kicinski
Jason Xing says: ==================== support TCP_RTO_MIN_US and TCP_DELACK_MAX_US for set/getsockopt Add set/getsockopt supports for TCP_RTO_MIN_US and TCP_DELACK_MAX_US. ==================== Link: https://patch.msgid.link/20250317120314.41404-1-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25tcp: support TCP_DELACK_MAX_US for set/getsockopt useJason Xing
Support adjusting/reading delayed ack max for socket level by using set/getsockopt(). This option aligns with TCP_BPF_DELACK_MAX usage. Considering that bpf option was implemented before this patch, so we need to use a standalone new option for pure tcp set/getsockopt() use. Add WRITE_ONCE/READ_ONCE() to prevent data-race if setsockopt() happens to write one value to icsk_delack_max while icsk_delack_max is being read. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250317120314.41404-3-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25tcp: support TCP_RTO_MIN_US for set/getsockopt useJason Xing
Support adjusting/reading RTO MIN for socket level by using set/getsockopt(). This new option has the same effect as TCP_BPF_RTO_MIN, which means it doesn't affect RTAX_RTO_MIN usage (by using ip route...). Considering that bpf option was implemented before this patch, so we need to use a standalone new option for pure tcp set/getsockopt() use. When the socket is created, its icsk_rto_min is set to the default value that is controlled by sysctl_tcp_rto_min_us. Then if application calls setsockopt() with TCP_RTO_MIN_US flag to pass a valid value, then icsk_rto_min will be overridden in jiffies unit. This patch adds WRITE_ONCE/READ_ONCE to avoid data-race around icsk_rto_min. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250317120314.41404-2-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24Merge branch ↵Jakub Kicinski
'mlxsw-add-vxlan-to-the-same-hardware-domain-as-physical-bridge-ports' Petr Machata says: ==================== mlxsw: Add VXLAN to the same hardware domain as physical bridge ports Amit Cohen writes: Packets which are trapped to CPU for forwarding in software data path are handled according to driver marking of skb->offload_{,l3}_fwd_mark. Packets which are marked as L2-forwarded in hardware, will not be flooded by the bridge to bridge ports which are in the same hardware domain as the ingress port. Currently, mlxsw does not add VXLAN bridge ports to the same hardware domain as physical bridge ports despite the fact that the device is able to forward packets to and from VXLAN tunnels in hardware. In some scenarios this can result in remote VTEPs receiving duplicate packets. To solve such packets duplication, add VXLAN bridge ports to the same hardware domain as other bridge ports. One complication is ARP suppression which requires the local VTEP to avoid flooding ARP packets to remote VTEPs if the local VTEP is able to reply on behalf of remote hosts. This is currently implemented by having the device flood ARP packets in hardware and trapping them during VXLAN encapsulation, but marking them with skb->offload_fwd_mark=1 so that the bridge will not re-flood them to physical bridge ports. The above scheme will break when VXLAN bridge ports are added to the same hardware domain as physical bridge ports as ARP packets that cannot be suppressed by the bridge will not be able to egress the VXLAN bridge ports due to hardware domain filtering. This is solved by trapping ARP packets when they enter the device and not marking them as being forwarded in hardware. Patch set overview: Patch #1 sets hardware to trap ARP packets at layer 2 Patches #2-#4 are preparations for setting hardwarwe domain of VXLAN Patch #5 sets hardware domain of VXLAN Patch #6 extends VXLAN flood test to verify that this set solves the packets duplication ==================== Link: https://patch.msgid.link/cover.1742224300.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24selftests: vxlan_bridge: Test flood with unresolved FDB entryAmit Cohen
Extend flood test to configure FDB entry with unresolved destination IP, check that packets are not sent twice. Without the previous patch which handles such scenario in mlxsw, the tests fail: $ TESTS='test_flood' ./vxlan_bridge_1d.sh Running tests with UDP port 4789 TEST: VXLAN: flood [ OK ] TEST: VXLAN: flood, unresolved FDB entry [FAIL] vx2 ns2: Expected to capture 10 packets, got 20. $ TESTS='test_flood' ./vxlan_bridge_1q.sh INFO: Running tests with UDP port 4789 TEST: VXLAN: flood vlan 10 [ OK ] TEST: VXLAN: flood vlan 20 [ OK ] TEST: VXLAN: flood vlan 10, unresolved FDB entry [FAIL] vx10 ns2: Expected to capture 10 packets, got 20. TEST: VXLAN: flood vlan 20, unresolved FDB entry [FAIL] vx20 ns2: Expected to capture 10 packets, got 20. With the previous patch, the tests pass. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/7bc96e317531f3bf06319fb2ea447bd8666f29fa.1742224300.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24mlxsw: Add VXLAN bridge ports to same hardware domain as physical bridge portsAmit Cohen
When hardware floods packets to bridge ports, but flooding to VXLAN bridge port fails during encapsulation to one of the remote VTEPs, the packets are trapped to CPU. In such case, the packets are marked with skb->offload_fwd_mark, which means that packet was L2-forwarded in hardware. Software data path repeats flooding, but packets which are marked with skb->offload_fwd_mark will not be flooded by the bridge to bridge ports which are in the same hardware domain as the ingress port. Currently, mlxsw does not add VXLAN bridge ports to the same hardware domain as physical bridge ports despite the fact that the device is able to forward packets to and from VXLAN tunnels in hardware. In some scenarios (as mentioned above) this can result in remote VTEPs receiving duplicate packets. The packets are first flooded by hardware and after an encapsulation failure, they are flooded again to all remote VTEPs by software. Solve this by adding VXLAN bridge ports to the same hardware domain as physical bridge ports, so then nbp_switchdev_allowed_egress() will return false also for VXLAN, and packets will not be sent twice from VXLAN device. switchdev_bridge_port_offload() should get vxlan_dev not as const, so some changes are required. Call switchdev API from mlxsw_sp_bridge_vxlan_{join,leave}() which handle offload configurations. Reported-by: Vladimir Oltean <olteanv@gmail.com> Closes: https://lore.kernel.org/all/20250210152246.4ajumdchwhvbarik@skbuf/ Reported-by: Vladyslav Mykhaliuk <vmykhaliuk@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/7279056843140fae3a72c2d204c7886b79d03899.1742224300.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24mlxsw: spectrum_switchdev: Move mlxsw_sp_bridge_vxlan_join()Amit Cohen
Next patch will call __mlxsw_sp_bridge_vxlan_leave() from mlxsw_sp_bridge_vxlan_join() as part of error flow, move the function to be able to call the second one. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/64750a0965536530482318578bada30fac372b8a.1742224300.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24mlxsw: spectrum_switchdev: Add an internal API for VXLAN leaveAmit Cohen
There is asymmetry in how the VXLAN join and leave functions are used. The join function (mlxsw_sp_bridge_vxlan_join()) is only called in response to netdev events (e.g., VXLAN device joining a bridge), but the leave function is also called in response to switchdev events (e.g., VLAN configuration on top of the VXLAN device) in order to invalidate VNI to FID mappings. This asymmetry will cause problems when the functions will be later extended to mark VXLAN bridge ports as offloaded or not. Therefore, create an internal function (__mlxsw_sp_bridge_vxlan_leave()) that is used to invalidate VNI to FID mappings and call it from mlxsw_sp_bridge_vxlan_leave() which will only be invoked in response to netdev events, like mlxsw_sp_bridge_vxlan_join(). No functional changes intended. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/f3a32bd2d87a0b7ac4d2bb98a427dc6d95a01cd0.1742224300.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24mlxsw: spectrum: Call mlxsw_sp_bridge_vxlan_{join, leave}() for VLAN-aware ↵Amit Cohen
bridge mlxsw_sp_bridge_vxlan_{join,leave}() are not called when a VXLAN device joins or leaves a VLAN-aware bridge. As mentioned in the comment - when the bridge is VLAN-aware, the VNI of the VXLAN device needs to be mapped to a VLAN, but at this point no VLANs are configured on the VxLAN device. This means that we can call the APIs, but there is no point to do that, as they do not configure anything in such cases. Next patch will extend mlxsw_sp_bridge_vxlan_{join,leave}() to set hardware domain for VXLAN, this should be done also when a VXLAN device joins or leaves a VLAN-aware bridge. Call the APIs, which for now do not do anything in these flows. Align the call to mlxsw_sp_bridge_vxlan_leave() to be called like mlxsw_sp_bridge_vxlan_join(), only in case that the VXLAN device is up, so move the check to be done before calling mlxsw_sp_bridge_vxlan_{join,leave}(). This does not change the existing behavior, as there is a similar check inside mlxsw_sp_bridge_vxlan_leave(). Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/994c1ea93520f9ea55d1011cd47dc2180d526484.1742224300.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24mlxsw: Trap ARP packets at layer 2 instead of layer 3Amit Cohen
Next patch will set the same hardware domain for all bridge ports, including VXLAN, to prevent packets from being forwarded by software when they were already forwarded by hardware. ARP packets are not flooded by hardware to VXLAN, so software should handle such flooding. When hardware domain of VXLAN device will be changed, ARP packets which are trapped and marked with offload_fwd_mark will not be flooded to VXLAN also in software, which will break VXLAN traffic. To prevent such breaking, trap ARP packets at layer 2 and don't mark them as L2-forwarded in hardware, then flooding ARP packets will be done only in software, and VXLAN will send ARP packets. Remove NVE_ENCAP_ARP which is no longer needed, as now ARP packets are trapped when they enter the device. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/b2a2cc607a1f4cb96c10bd3b0b0244ba3117fd2e.1742224300.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: introduce per netns packet chainsPaolo Abeni
Currently network taps unbound to any interface are linked in the global ptype_all list, affecting the performance in all the network namespaces. Add per netns ptypes chains, so that in the mentioned case only the netns owning the packet socket(s) is affected. While at that drop the global ptype_all list: no in kernel user registers a tap on "any" type without specifying either the target device or the target namespace (and IMHO doing that would not make any sense). Note that this adds a conditional in the fast path (to check for per netns ptype_specific list) and increases the dataset size by a cacheline (owing the per netns lists). Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumaze@google.com> Link: https://patch.msgid.link/ae405f98875ee87f8150c460ad162de7e466f8a7.1742494826.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24tty: caif: removed unused function debugfs_tx()Simon Horman
Remove debugfs_tx() which was added when the caif driver was added in commit 9b27105b4a44 ("net-caif-driver: add CAIF serial driver (ldisc)") but it has never been used. Flagged by LLVM 19.1.7 W=1 builds. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250320-caif-debugfs-tx-v1-1-be5654770088@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: ethernet: Drop unused of_gpio.hPeng Fan
of_gpio.h is deprecated. Since there is no of_gpio_x API, drop unused of_gpio.h. While at here, drop gpio.h and gpio/consumer.h if no user in driver. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250320031542.3960381-1-peng.fan@oss.nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: phy: fixed_phy: transition to the faux device interfaceSudeep Holla
The net fixed phy driver does not require the creation of a platform device. Originally, this approach was chosen for simplicity when the driver was first implemented. With the introduction of the lightweight faux device interface, we now have a more appropriate alternative. Migrate the device to utilize the faux bus, given that the platform device it previously created was not a real one anyway. This will get rid of the fake platform device. Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://patch.msgid.link/20250319135209.2734594-1-sudeep.holla@arm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24Merge branch 'mlx5-cleanups-2025-03-19'Jakub Kicinski
Tariq Toukan says: ==================== mlx5 cleanups 2025-03-19 This series contains small cleanups to the mlx5 core and Eth drivers. ==================== Link: https://patch.msgid.link/1742412199-159596-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net/mlx5e: Always select CONFIG_PAGE_POOL_STATSTariq Toukan
Always set PAGE_POOL_STATS in mlx5 Eth driver. Cleanup the corresponding #ifdefs. Page pool stats are essential to monitor and analyze RX performance. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/1742412199-159596-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net/mlx5e: Use right API to free bitmap memoryMark Zhang
Use bitmap_free() to free memory allocated with bitmap_zalloc_node(). This fixes memtrack error: mtl rsc inconsistency: memtrack_free: .../drivers/net/ethernet/mellanox/mlx5/core/en_main.c::466: kfree for unknown address=0xFFFF0000CA3619E8, device=0x0 Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1742412199-159596-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net/mlx5: Remove NULL check before dev_{put, hold}Gal Pressman
Fix coccinelle warnings: WARNING: NULL check before dev_{put, hold} functions is not needed. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1742412199-159596-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: phylink: Remove unused function pointer from phylink structureAlexander Duyck
From what I can tell the .get_fixed_state pointer in the phylink structure hasn't been used since commit 5c05c1dbb177 ("net: phylink, dsa: eliminate phylink_fixed_state_cb()") . Since I can't find any users for it we might as well just drop the pointer. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/174240634772.1745174.5690351737682751849.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24netpoll: Eliminate redundant assignmentBreno Leitao
The assignment of zero to udph->check is unnecessary as it is immediately overwritten in the subsequent line. Remove the redundant assignment. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250319-netpoll_nit-v1-1-a7faac5cbd92@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: stmmac: Call xpcs_config_eee_mult_fact() only when xpcs is presentMaxime Chevallier
Some dwmac variants such as dwmac_socfpga don't use xpcs but lynx_pcs. Don't call xpcs_config_eee_mult_fact() in this case, as this causes a crash at init : Unable to handle kernel NULL pointer dereference at virtual address 00000039 when write [...] Call trace: xpcs_config_eee_mult_fact from stmmac_pcs_setup+0x40/0x10c stmmac_pcs_setup from stmmac_dvr_probe+0xc0c/0x1244 stmmac_dvr_probe from socfpga_dwmac_probe+0x130/0x1bc socfpga_dwmac_probe from platform_probe+0x5c/0xb0 Fixes: 060fb27060e8 ("net: stmmac: call xpcs_config_eee_mult_fact()") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250321103502.1303539-1-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24selftests: drv-net: rss_ctx: Don't assume indirection table is presentGal Pressman
The test_rss_context_dump() test assumes the indirection table is always supported, which is not true for all drivers, e.g., virtio_net when VIRTIO_NET_F_RSS is disabled. Skip the check if 'indir' is not present. Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250318112426.386651-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24docs/kcm: Fix typo "BFP"Ryohei Kinugawa
'BFP' should be 'BPF'. Signed-off-by: Ryohei Kinugawa <ryohei.kinugawa@gmail.com> Link: https://patch.msgid.link/20250318095154.4187952-1-ryohei.kinugawa@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24Merge branch 'r8169-enable-more-devices-aspm-support'Jakub Kicinski
ChunHao Lin says: ==================== r8169: enable more devices ASPM support This series of patches will enable more devices ASPM support. It also fix a RTL8126 cannot enter L1 substate issue when ASPM is enabled. ==================== Link: https://patch.msgid.link/20250318083721.4127-1-hau@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24r8169: disable RTL8126 ZRX-DC timeoutChunHao Lin
Disable it due to it dose not meet ZRX-DC specification. If it is enabled, device will exit L1 substate every 100ms. Disable it for saving more power in L1 substate. Signed-off-by: ChunHao Lin <hau@realtek.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20250318083721.4127-3-hau@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24r8169: enable RTL8168H/RTL8168EP/RTL8168FP ASPM supportChunHao Lin
This patch will enable RTL8168H/RTL8168EP/RTL8168FP ASPM support on the platforms that have tested with ASPM enabled. Signed-off-by: ChunHao Lin <hau@realtek.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20250318083721.4127-2-hau@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24docs: networking: strparser: Fix a typoWangYuli
The context indicates that 'than' is the correct word instead of 'then', as a comparison is being performed. Given that 'then' is also a valid English word, checkpatch.pl wouldn't have picked up on this spelling error. This typo was caught by AI during code review. Suggested-by: Wentao Guan <guanwentao@uniontech.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Reviewed-by: Yanteng Si <si.yanteng@linux.dev> Link: https://patch.msgid.link/A43BEA49ED5CC6E5+20250318074656.644391-1-wangyuli@uniontech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24docs: fix the path of example code and example commands for device memory TCPYui Washizu
This updates the old path and fixes the description of unavailable options. Signed-off-by: Yui Washizu <yui.washidu@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250318061251.775191-1-yui.washidu@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24tcp/dccp: Remove inet_connection_sock_af_ops.addr2sockaddr().Kuniyuki Iwashima
inet_connection_sock_af_ops.addr2sockaddr() hasn't been used at all in the git era. $ git grep addr2sockaddr $(git rev-list HEAD | tail -n 1) Let's remove it. Note that there was a 4 bytes hole after sockaddr_len and now it's 6 bytes, so the binary layout is not changed. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250318060112.3729-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24selftest: net: update proc_net_pktgen (add more imix_weights test cases)Peter Seiderer
Add more imix_weights test cases (for incomplete input). Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250317090401.1240704-2-ps.report@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: pktgen: add strict buffer parsing index checkPeter Seiderer
Add strict buffer parsing index check to avoid the following Smatch warning: net/core/pktgen.c:877 get_imix_entries() warn: check that incremented offset 'i' is capped Checking the buffer index i after every get_user/i++ step and returning with error code immediately avoids the current indirect (but correct) error handling. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/netdev/36cf3ee2-38b1-47e5-a42a-363efeb0ace3@stanley.mountain/ Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250317090401.1240704-1-ps.report@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24MAINTAINERS: adjust the file entry in INTEL PMC CORE DRIVERLukas Bulwahn
Commit 7e2f7e25f6ff ("arch: x86: add IPC mailbox accessor function and add SoC register access") adds a new file entry referring to the non-existent file linux/platform_data/x86/intel_pmc_ipc.h in section INTEL PMC CORE DRIVER rather than referring to the file include/linux/platform_data/x86/intel_pmc_ipc.h added with this commit. Note that it was missing 'include' in the beginning. Adjust the file reference to the intended file. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/20250317092717.322862-1-lukas.bulwahn@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24tcp: move icsk_clean_acked to a better locationEric Dumazet
As a followup of my presentation in Zagreb for netdev 0x19: icsk_clean_acked is only used by TCP when/if CONFIG_TLS_DEVICE is enabled from tcp_ack(). Rename it to tcp_clean_acked, move it to tcp_sock structure in the tcp_sock_read_rx for better cache locality in TCP fast path. Define this field only when CONFIG_TLS_DEVICE is enabled saving 8 bytes on configs not using it. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250317085313.2023214-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: phy: dp83822: fix transmit amplitude if CONFIG_OF_MDIO not definedDimitri Fedrau
When CONFIG_OF_MDIO is not defined the index for selecting the transmit amplitude voltage for 100BASE-TX is set to 0, but it should be -1, if there is no need to modify the transmit amplitude voltage. Move initialization of the index from dp83822_of_init to dp8382x_probe. Fixes: 4f3735e82d8a ("net: phy: dp83822: Add support for changing the transmit amplitude voltage") Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com> Link: https://patch.msgid.link/20250317-dp83822-fix-transceiver-mdio-v2-1-fb09454099a4@liebherr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: ena: resolve WARN_ON when freeing IRQsDavid Arinzon
When IRQs are freed, a WARN_ON is triggered as the affinity notifier is not released. This results in the below stack trace: [ 484.544586] ? __warn+0x84/0x130 [ 484.544843] ? free_irq+0x5c/0x70 [ 484.545105] ? report_bug+0x18a/0x1a0 [ 484.545390] ? handle_bug+0x53/0x90 [ 484.545664] ? exc_invalid_op+0x14/0x70 [ 484.545959] ? asm_exc_invalid_op+0x16/0x20 [ 484.546279] ? free_irq+0x5c/0x70 [ 484.546545] ? free_irq+0x10/0x70 [ 484.546807] ena_free_io_irq+0x5f/0x70 [ena] [ 484.547138] ena_down+0x250/0x3e0 [ena] [ 484.547435] ena_destroy_device+0x118/0x150 [ena] [ 484.547796] __ena_shutoff+0x5a/0xe0 [ena] [ 484.548110] pci_device_remove+0x3b/0xb0 [ 484.548412] device_release_driver_internal+0x193/0x200 [ 484.548804] driver_detach+0x44/0x90 [ 484.549084] bus_remove_driver+0x69/0xf0 [ 484.549386] pci_unregister_driver+0x2a/0xb0 [ 484.549717] ena_cleanup+0xc/0x130 [ena] [ 484.550021] __do_sys_delete_module.constprop.0+0x176/0x310 [ 484.550438] ? syscall_trace_enter+0xfb/0x1c0 [ 484.550782] do_syscall_64+0x5b/0x170 [ 484.551067] entry_SYSCALL_64_after_hwframe+0x76/0x7e Adding a call to `netif_napi_set_irq` with -1 as the IRQ index, which frees the notifier. Fixes: de340d8206bf ("net: ena: use napi's aRFS rmap notifers") Signed-off-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250317071147.1105-1-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: openvswitch: fix kernel-doc warnings in internal headersIlya Maximets
Some field descriptions were missing, some were not very accurate. Not touching the uAPI header or .c files for now. Formatting of those comments isn't great in general, but at least they are not missing anything now. Before: $ ./scripts/kernel-doc -none -Wall net/openvswitch/*.h 2>&1 | wc -l 16 After: $ ./scripts/kernel-doc -none -Wall net/openvswitch/*.h 2>&1 | wc -l 0 Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20250320224431.252489-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: phy: phy_interface_t: Fix RGMII_TXID code commentIhor Matushchak
Fix copy-paste error in the code comment for Interface Mode definitions. The code refers to Internal TX delay, not Internal RX delay. It was likely copied from the line above this one. Signed-off-by: Ihor Matushchak <ihor.matushchak@foobox.net> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250316071551.9794-1-ihor.matushchak@foobox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-21net: phy: realtek: disable PHY-mode EEERussell King (Oracle)
Realtek RTL8211F has a "PHY-mode" EEE support which interferes with an IEEE 802.3 compliant implementation. This mode defaults to enabled, and results in the MAC receive path not seeing the link transition to LPI state. Fix this by disabling PHY-mode EEE. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1ttnHW-00785s-Uq@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21net: phy: fix genphy_c45_eee_is_active() for disabled EEERussell King (Oracle)
Commit 809265fe96fe ("net: phy: c45: remove local advertisement parameter from genphy_c45_eee_is_active") stopped reading the local advertisement from the PHY earlier in this development cycle, which broke "ethtool --set-eee ethX eee off". When ethtool is used to set EEE off, genphy_c45_eee_is_active() indicates that EEE was active if the link partner reported an advertisement, which causes phylib to set phydev->enable_tx_lpi on link up, despite our local advertisement in hardware being empty. However, phydev->advertising_eee is preserved while EEE is turned off, which leads to genphy_c45_eee_is_active() incorrectly reporting that EEE is active. Fix it by checking phydev->eee_cfg.eee_enabled, and if clear, immediately indicate that EEE is not active. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1ttmWN-0077Mb-Q6@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21Merge branch 'mlx5e-support-recovery-counter-in-reset'Paolo Abeni
Tariq Toukan says: ==================== mlx5e: Support recovery counter in reset This series by Yael adds a recovery counter in ethtool, for any recovery type during port reset cycle. Series starts with some cleanup and refactoring patches. New counter is added and exposed to ethtool stats in patch #4. ==================== Link: https://patch.msgid.link/1742112876-2890-1-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21net/mlx5e: Expose port reset cycle recovery counter via ethtoolYael Chemla
Display recovery event of PPCNT recovery counters group. Counts (per link) the number of total successful recovery events of any recovery types during port reset cycle. Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/1742112876-2890-5-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21net/mlx5e: Get counter group size by FW capabilityYael Chemla
Retrieve the number of fields supported by each PPCNT counter group based on the FW capability for this group. Signed-off-by: Yael Chemla <ychemla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1742112876-2890-4-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21net/mlx5e: Access PHY layer counter group as other counter groupsYael Chemla
Adjust the way physical layer counters group is accessed to match the generic method used for accessing other PPCNT counter groups. Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/1742112876-2890-3-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21net/mlx5e: Ensure each counter group uses its PCAM bitYael Chemla
The code was incorrectly relying on PCAM bit of ppcnt_statistical_group for accessing per_lane_error_counters. If ppcnt_statistical_group PCAM bit was not set, we would not read per_lane_error_counters, even when its PCAM bit is set. Given the existing device capabilities, it seems to cause no harm, so this change primarily serves as cleanup. Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1742112876-2890-2-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21mptcp: sockopt: fix getting freebind & transparentMatthieu Baerts (NGI0)
When adding a socket option support in MPTCP, both the get and set parts are supposed to be implemented. IP(V6)_FREEBIND and IP(V6)_TRANSPARENT support for the setsockopt part has been added a while ago, but it looks like the get part got forgotten. It should have been present as a way to verify a setting has been set as expected, and not to act differently from TCP or any other socket types. Everything was in place to expose it, just the last step was missing. Only new code is added to cover these specific getsockopt(), that seems safe. Fixes: c9406a23c116 ("mptcp: sockopt: add SOL_IP freebind & transparent options") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-3-122dbb249db3@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21mptcp: sockopt: fix getting IPV6_V6ONLYMatthieu Baerts (NGI0)
When adding a socket option support in MPTCP, both the get and set parts are supposed to be implemented. IPV6_V6ONLY support for the setsockopt part has been added a while ago, but it looks like the get part got forgotten. It should have been present as a way to verify a setting has been set as expected, and not to act differently from TCP or any other socket types. Not supporting this getsockopt(IPV6_V6ONLY) blocks some apps which want to check the default value, before doing extra actions. On Linux, the default value is 0, but this can be changed with the net.ipv6.bindv6only sysctl knob. On Windows, it is set to 1 by default. So supporting the get part, like for all other socket options, is important. Everything was in place to expose it, just the last step was missing. Only new code is added to cover this specific getsockopt(), that seems safe. Fixes: c9b95a135987 ("mptcp: support IPV6_V6ONLY setsockopt") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/550 Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-2-122dbb249db3@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21Merge branch 'netconsole-add-support-for-userdata-release'Paolo Abeni
Breno Leitao says: ==================== netconsole: Add support for userdata release I am submitting a series of patches that introduce a new feature for the netconsole subsystem, specifically the addition of the 'release' field to the sysdata structure. This feature allows the kernel release/version to be appended to the userdata dictionary in every message sent, enhancing the information available for debugging and monitoring purposes. This complements the already supported release prepend feature, which was added some time ago. The release prepend appends the release information at the message header, which is not ideal for two reasons: 1) It is difficult to determine if a message includes this information, making it hard and resource-intensive to parse. 2) When a message is fragmented, the release information is appended to every message fragment, consuming valuable space in the packet. The "release prepend" feature was created before the concept of userdata and sysdata. Now that this format has proven successful, we are implementing the release feature as part of this enhanced structure. This patch series aims to improve the netconsole subsystem by providing a more efficient and user-friendly way to include kernel release information in messages. I believe these changes will significantly aid in system analysis and troubleshooting. Suggested-by: Manu Bretelle <chantr4@gmail.com> Signed-off-by: Breno Leitao <leitao@debian.org> ==================== Link: https://patch.msgid.link/20250314-netcons_release-v1-0-07979c4b86af@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>