summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-12-18Merge tag 'hyperv-fixes-signed-20241217' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Various fixes to Hyper-V tools in the kernel tree (Dexuan Cui, Olaf Hering, Vitaly Kuznetsov) - Fix a bug in the Hyper-V TSC page based sched_clock() (Naman Jain) - Two bug fixes in the Hyper-V utility functions (Michael Kelley) - Convert open-coded timeouts to secs_to_jiffies() in Hyper-V drivers (Easwar Hariharan) * tag 'hyperv-fixes-signed-20241217' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: tools/hv: reduce resource usage in hv_kvp_daemon tools/hv: add a .gitignore file tools/hv: reduce resouce usage in hv_get_dns_info helper hv/hv_kvp_daemon: Pass NIC name to hv_get_dns_info as well Drivers: hv: util: Avoid accessing a ringbuffer not initialized yet Drivers: hv: util: Don't force error code to ENODEV in util_probe() tools/hv: terminate fcopy daemon if read from uio fails drivers: hv: Convert open-coded timeouts to secs_to_jiffies() tools: hv: change permissions of NetworkManager configuration file x86/hyperv: Fix hv tsc page based sched_clock for hibernation tools: hv: Fix a complier warning in the fcopy uio daemon
2024-12-18x86/static-call: fix 32-bit buildJuergen Gross
In 32-bit x86 builds CONFIG_STATIC_CALL_INLINE isn't set, leading to static_call_initialized not being available. Define it as "0" in that case. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 0ef8047b737d ("x86/static-call: provide a way to do very early static-call updates") Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-18net: Remove bouncing hippi listDr. David Alan Gilbert
linux-hippi is bouncing with: <linux-hippi@sunsite.dk>: Sorry, no mailbox here by that name. (#5.1.1) Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-18net: dsa: qca8k: Fix inconsistent use of jiffies vs millisecondsAndrew Lunn
wait_for_complete_timeout() expects a timeout in jiffies. With the driver, some call sites converted QCA8K_ETHERNET_TIMEOUT to jiffies, others did not. Make the code consistent by changes the #define to include a call to msecs_to_jiffies, and remove all other calls to msecs_to_jiffies. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Tested-by: from Christian would be very welcome. Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-18pwm: stm32: Fix complementary output in round_waveform_tohw()Fabrice Gasnier
When the timer supports complementary output, the CCxNE bit must be set additionally to the CCxE bit. So to not overwrite the latter use |= instead of = to set the former. Fixes: deaba9cff809 ("pwm: stm32: Implementation of the waveform callbacks") Signed-off-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com> Link: https://lore.kernel.org/r/20241217150021.2030213-1-fabrice.gasnier@foss.st.com [ukleinek: Slightly improve commit log] Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
2024-12-18Merge patch series "can: m_can: set init flag earlier in probe"Marc Kleine-Budde
This series fixes problems in the m_can_pci driver found on the Intel Elkhart Lake processor. Link: https://patch.msgid.link/e247f331cb72829fcbdfda74f31a59cbad1a6006.1728288535.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-12-18can: m_can: fix missed interrupts with m_can_pciMatthias Schiffer
The interrupt line of PCI devices is interpreted as edge-triggered, however the interrupt signal of the m_can controller integrated in Intel Elkhart Lake CPUs appears to be generated level-triggered. Consider the following sequence of events: - IR register is read, interrupt X is set - A new interrupt Y is triggered in the m_can controller - IR register is written to acknowledge interrupt X. Y remains set in IR As at no point in this sequence no interrupt flag is set in IR, the m_can interrupt line will never become deasserted, and no edge will ever be observed to trigger another run of the ISR. This was observed to result in the TX queue of the EHL m_can to get stuck under high load, because frames were queued to the hardware in m_can_start_xmit(), but m_can_finish_tx() was never run to account for their successful transmission. On an Elkhart Lake based board with the two CAN interfaces connected to each other, the following script can reproduce the issue: ip link set can0 up type can bitrate 1000000 ip link set can1 up type can bitrate 1000000 cangen can0 -g 2 -I 000 -L 8 & cangen can0 -g 2 -I 001 -L 8 & cangen can0 -g 2 -I 002 -L 8 & cangen can0 -g 2 -I 003 -L 8 & cangen can0 -g 2 -I 004 -L 8 & cangen can0 -g 2 -I 005 -L 8 & cangen can0 -g 2 -I 006 -L 8 & cangen can0 -g 2 -I 007 -L 8 & cangen can1 -g 2 -I 100 -L 8 & cangen can1 -g 2 -I 101 -L 8 & cangen can1 -g 2 -I 102 -L 8 & cangen can1 -g 2 -I 103 -L 8 & cangen can1 -g 2 -I 104 -L 8 & cangen can1 -g 2 -I 105 -L 8 & cangen can1 -g 2 -I 106 -L 8 & cangen can1 -g 2 -I 107 -L 8 & stress-ng --matrix 0 & To fix the issue, repeatedly read and acknowledge interrupts at the start of the ISR until no interrupt flags are set, so the next incoming interrupt will also result in an edge on the interrupt line. While we have received a report that even with this patch, the TX queue can become stuck under certain (currently unknown) circumstances on the Elkhart Lake, this patch completely fixes the issue with the above reproducer, and it is unclear whether the remaining issue has a similar cause at all. Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake") Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com> Link: https://patch.msgid.link/fdf0439c51bcb3a46c21e9fb21c7f1d06363be84.1728288535.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-12-18can: m_can: set init flag earlier in probeMatthias Schiffer
While an m_can controller usually already has the init flag from a hardware reset, no such reset happens on the integrated m_can_pci of the Intel Elkhart Lake. If the CAN controller is found in an active state, m_can_dev_setup() would fail because m_can_niso_supported() calls m_can_cccr_update_bits(), which refuses to modify any other configuration bits when CCCR_INIT is not set. To avoid this issue, set CCCR_INIT before attempting to modify any other configuration flags. Fixes: cd5a46ce6fa6 ("can: m_can: don't enable transceiver when probing") Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com> Link: https://patch.msgid.link/e247f331cb72829fcbdfda74f31a59cbad1a6006.1728288535.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-12-17Merge branch 'support-some-features-for-the-hibmcge-driver'Jakub Kicinski
Jijie Shao says: ==================== Support some features for the HIBMCGE driver In this patch series, The HIBMCGE driver implements some functions such as dump register, unicast MAC address filtering, debugfs and reset. ==================== Link: https://patch.msgid.link/20241216040532.1566229-1-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: hibmcge: Add nway_reset supported in this moduleJijie Shao
Add nway_reset supported in this module Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241216040532.1566229-8-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: hibmcge: Add reset supported in this moduleJijie Shao
Sometimes, if the port doesn't work, we can try to fix it by resetting it. This patch supports reset triggered by ethtool or FLR of PCIe, For example: ethtool --reset eth0 dedicated echo 1 > /sys/bus/pci/devices/0000\:83\:00.1/reset We hope that the reset can be performed only when the port is down, and the port cannot be up during the reset. Therefore, the entire reset process is protected by the rtnl lock. After the reset is complete, the hardware registers are restored to their default values. Therefore, some rebuild operations are required to rewrite the user configuration to the registers. Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241216040532.1566229-7-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: hibmcge: Add pauseparam supported in this moduleJijie Shao
The MAC can automatically send or respond to pause frames. This patch supports the function of enabling pause frames by using ethtool. Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241216040532.1566229-6-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: hibmcge: Add register dump supported in this moduleJijie Shao
The dump register is an effective way to analyze problems. To ensure code flexibility, each register contains the type, offset, and value information. The ethtool does the pretty print based on these information. The driver can dynamically add or delete registers that need to be dumped in the future because information such as type and offset is contained. ethtool always can do pretty print. With the ethtool of a specific version, the following effects are achieved: [root@localhost sjj]# ./ethtool -d enp131s0f1 [SPEC] VALID [0x0000]: 0x00000001 [SPEC] EVENT_REQ [0x0004]: 0x00000000 [SPEC] MAC_ID [0x0008]: 0x00000002 [SPEC] PHY_ADDR [0x000c]: 0x00000002 [SPEC] MAC_ADDR_L [0x0010]: 0x00000808 [SPEC] MAC_ADDR_H [0x0014]: 0x08080802 [SPEC] UC_MAX_NUM [0x0018]: 0x00000004 [SPEC] MAX_MTU [0x0028]: 0x00000fc2 [SPEC] MIN_MTU [0x002c]: 0x00000100 [SPEC] TX_FIFO_NUM [0x0030]: 0x00000040 [SPEC] RX_FIFO_NUM [0x0034]: 0x0000007f [SPEC] VLAN_LAYERS [0x0038]: 0x00000002 [MDIO] COMMAND_REG [0x0000]: 0x0000185f [MDIO] ADDR_REG [0x0004]: 0x00000000 [MDIO] WDATA_REG [0x0008]: 0x0000a000 [MDIO] RDATA_REG [0x000c]: 0x00000000 [MDIO] STA_REG [0x0010]: 0x00000000 [GMAC] DUPLEX_TYPE [0x0008]: 0x00000001 [GMAC] FD_FC_TYPE [0x000c]: 0x00008808 [GMAC] FC_TX_TIMER [0x001c]: 0x000000ff [GMAC] FD_FC_ADDR_LOW [0x0020]: 0xc2000001 [GMAC] FD_FC_ADDR_HIGH [0x0024]: 0x00000180 [GMAC] MAX_FRM_SIZE [0x003c]: 0x000005f6 [GMAC] PORT_MODE [0x0040]: 0x00000002 [GMAC] PORT_EN [0x0044]: 0x00000006 ... Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241216040532.1566229-5-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: hibmcge: Add unicast frame filter supported in this moduleJijie Shao
MAC supports filtering unmatched unicast packets according to the MAC address table. This patch adds the support for unicast frame filtering. To support automatic restoration of MAC entries after reset, the driver saves a copy of MAC entries in the driver. Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://patch.msgid.link/20241216040532.1566229-4-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: hibmcge: Add irq_info file to debugfsJijie Shao
the driver requested three interrupts: "tx", "rx", "err". The err interrupt is a summary interrupt. We distinguish different errors based on the status register and mask. With "cat /proc/interrupts | grep hibmcge", we can't distinguish the detailed cause of the error, so we added this file to debugfs. the following effects are achieved: [root@localhost sjj]# cat /sys/kernel/debug/hibmcge/0000\:83\:00.1/irq_info RX : enabled: true , logged: false, count: 0 TX : enabled: true , logged: false, count: 0 MAC_MII_FIFO_ERR : enabled: false, logged: true , count: 0 MAC_PCS_RX_FIFO_ERR : enabled: false, logged: true , count: 0 MAC_PCS_TX_FIFO_ERR : enabled: false, logged: true , count: 0 MAC_APP_RX_FIFO_ERR : enabled: false, logged: true , count: 0 MAC_APP_TX_FIFO_ERR : enabled: false, logged: true , count: 0 SRAM_PARITY_ERR : enabled: true , logged: true , count: 0 TX_AHB_ERR : enabled: true , logged: true , count: 0 RX_BUF_AVL : enabled: true , logged: false, count: 0 REL_BUF_ERR : enabled: true , logged: true , count: 0 TXCFG_AVL : enabled: true , logged: false, count: 0 TX_DROP : enabled: true , logged: false, count: 0 RX_DROP : enabled: true , logged: false, count: 0 RX_AHB_ERR : enabled: true , logged: true , count: 0 MAC_FIFO_ERR : enabled: true , logged: false, count: 0 RBREQ_ERR : enabled: true , logged: false, count: 0 WE_ERR : enabled: true , logged: false, count: 0 The irq framework of hibmcge driver also includes tx/rx interrupts. Therefore, TX and RX are not moved separately form this file. Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241216040532.1566229-3-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: hibmcge: Add debugfs supported in this moduleJijie Shao
This patch initializes debugfs and creates root directory for each device. The tx_ring and rx_ring debugfs files are implemented together. Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241216040532.1566229-2-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17Merge branch 'lan78xx-preparations-for-phylink'Jakub Kicinski
Oleksij Rempel says: ==================== lan78xx: Preparations for PHYlink This patch set is a third part of the preparatory work for migrating the lan78xx USB Ethernet driver to the PHYlink framework. During extensive testing, I observed that resetting the USB adapter can lead to various read/write errors. While the errors themselves are acceptable, they generate excessive log messages, resulting in significant log spam. This set improves error handling to reduce logging noise by addressing errors directly and returning early when necessary. ==================== Link: https://patch.msgid.link/20241216120941.1690908-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: usb: lan78xx: Improve error handling in WoL operationsOleksij Rempel
Enhance error handling in Wake-on-LAN (WoL) operations: - Log a warning in `lan78xx_get_wol` if `lan78xx_read_reg` fails. - Check and handle errors from `device_set_wakeup_enable` and `phy_ethtool_set_wol` in `lan78xx_set_wol`. - Ensure proper cleanup with a unified error handling path. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241216120941.1690908-7-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: usb: lan78xx: remove PHY register access from ethtool get_regsOleksij Rempel
Remove PHY register handling from `lan78xx_get_regs` and `lan78xx_get_regs_len`. Since the controller can have different PHYs attached, the first 32 registers are not universally relevant or the most interesting. Simplify the implementation to focus on MAC and device registers. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20241216120941.1690908-6-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: usb: lan78xx: rename phy_mutex to mdiobus_mutexOleksij Rempel
Rename `phy_mutex` to `mdiobus_mutex` for clarity, as the mutex protects MDIO bus access rather than PHY-specific operations. Update all references to ensure consistency. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20241216120941.1690908-5-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: usb: lan78xx: Use action-specific label in lan78xx_mac_resetOleksij Rempel
Rename the generic `done` label to the action-specific `exit_unlock` label in `lan78xx_mac_reset`. This improves clarity by indicating the specific cleanup action (mutex unlock) and aligns with best practices for error handling and cleanup labels. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Link: https://patch.msgid.link/20241216120941.1690908-4-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: usb: lan78xx: Use ETIMEDOUT instead of ETIME in lan78xx_stop_hwOleksij Rempel
Update lan78xx_stop_hw to return -ETIMEDOUT instead of -ETIME when a timeout occurs. While -ETIME indicates a general timer expiration, -ETIMEDOUT is more commonly used for signaling operation timeouts and provides better consistency with standard error handling in the driver. The -ETIME checks in tx_complete() and rx_complete() are unrelated to this error handling change. In these functions, the error values are derived from urb->status, which reflects USB transfer errors. The error value from lan78xx_stop_hw will be exposed in the following cases: - usb_driver::suspend - net_device_ops::ndo_stop (potentially, though currently the return value is not used). Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Link: https://patch.msgid.link/20241216120941.1690908-3-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: usb: lan78xx: Add error handling to lan78xx_get_regsOleksij Rempel
Update `lan78xx_get_regs` to handle errors during register and PHY reads. Log warnings for failed reads and exit the function early if an error occurs. Drop all previously logged registers to signal inconsistent readings to the user space. This ensures that invalid data is not returned to users. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20241216120941.1690908-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17niu: Use page->private instead of page->indexMatthew Wilcox (Oracle)
We are close to removing page->index. Use page->private instead, which is least likely to be removed. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://patch.msgid.link/20241216155124.3114-1-willy@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17mlxsw: Switch to napi_gro_receive()Ido Schimmel
Benefit from the recent conversion of the driver to NAPI and enable GRO support through the use of napi_gro_receive(). Pass the NAPI pointer from the bus driver (mlxsw_pci) to the switch driver (mlxsw_spectrum) through the skb control block where various packet metadata is already encoded. The main motivation is to improve forwarding performance through the use of GRO fraglist [1]. In my testing, when the forwarding data path is simple (routing between two ports) there is not much difference in forwarding performance between GRO disabled and GRO enabled with fraglist. The improvement becomes more noticeable as the data path becomes more complex since it is traversed less times with GRO enabled. For example, with 10 ingress and 10 egress flower filters with different priorities on the two ports between which routing is performed, there is an improvement of about 140% in forwarded bandwidth. [1] https://lore.kernel.org/netdev/20200125102645.4782-1-steffen.klassert@secunet.com/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/21258fe55f608ccf1ee2783a5a4534220af28903.1734354812.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17Merge branch 'inetpeer-reduce-false-sharing-and-atomic-operations'Jakub Kicinski
Eric Dumazet says: ==================== inetpeer: reduce false sharing and atomic operations After commit 8c2bd38b95f7 ("icmp: change the order of rate limits"), there is a risk that a host receiving packets from an unique source targeting closed ports is using a common inet_peer structure from many cpus. All these cpus have to acquire/release a refcount and update the inet_peer timestamp (p->dtime) Switch to pure RCU to avoid changing the refcount, and update p->dtime only once per jiffy. Tested: DUT : 128 cores, 32 hw rx queues. receiving 8,400,000 UDP packets per second, targeting closed ports. Before the series: - napi poll can not keep up, NIC drops 1,200,000 packets per second. - We use 20 % of cpu cycles After this series: - All packets are received (no more hw drops) - We use 12 % of cpu cycles. v1: https://lore.kernel.org/20241213130212.1783302-1-edumazet@google.com ==================== Link: https://patch.msgid.link/20241215175629.1248773-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17inetpeer: do not get a refcount in inet_getpeer()Eric Dumazet
All inet_getpeer() callers except ip4_frag_init() don't need to acquire a permanent refcount on the inetpeer. They can switch to full RCU protection. Move the refcount_inc_not_zero() into ip4_frag_init(), so that all the other callers no longer have to perform a pair of expensive atomic operations on a possibly contended cache line. inet_putpeer() no longer needs to be exported. After this patch, my DUT can receive 8,400,000 UDP packets per second targeting closed ports, using 50% less cpu cycles than before. Also change two calls to l3mdev_master_ifindex() by l3mdev_master_ifindex_rcu() (Ido ideas) Fixes: 8c2bd38b95f7 ("icmp: change the order of rate limits") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241215175629.1248773-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17inetpeer: update inetpeer timestamp in inet_getpeer()Eric Dumazet
inet_putpeer() will be removed in the following patch, because we will no longer use refcounts. Update inetpeer timestamp (p->dtime) at lookup time. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241215175629.1248773-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17inetpeer: remove create argument of inet_getpeer()Eric Dumazet
All callers of inet_getpeer() want to create an inetpeer. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241215175629.1248773-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17inetpeer: remove create argument of inet_getpeer_v[46]()Eric Dumazet
All callers of inet_getpeer_v4() and inet_getpeer_v6() want to create an inetpeer. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241215175629.1248773-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17Merge branch 'net-constify-struct-bin_attribute'Jakub Kicinski
Thomas Weißschuh says: ==================== net: constify 'struct bin_attribute' The sysfs core now allows instances of 'struct bin_attribute' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. ==================== Link: https://patch.msgid.link/20241216-sysfs-const-bin_attr-net-v1-0-ec460b91f274@weissschuh.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17netxen_nic: constify 'struct bin_attribute'Thomas Weißschuh
The sysfs core now allows instances of 'struct bin_attribute' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241216-sysfs-const-bin_attr-net-v1-4-ec460b91f274@weissschuh.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: phy: ks8995: constify 'struct bin_attribute'Thomas Weißschuh
The sysfs core now allows instances of 'struct bin_attribute' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241216-sysfs-const-bin_attr-net-v1-2-ec460b91f274@weissschuh.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: bridge: constify 'struct bin_attribute'Thomas Weißschuh
The sysfs core now allows instances of 'struct bin_attribute' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20241216-sysfs-const-bin_attr-net-v1-1-ec460b91f274@weissschuh.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17rtnetlink: Try the outer netns attribute in rtnl_get_peer_net().Kuniyuki Iwashima
Xiao Liang reported that the cited commit changed netns handling in newlink() of netkit, veth, and vxcan. Before the patch, if we don't find a netns attribute in the peer device attributes, we tried to find another netns attribute in the outer netlink attributes by passing it to rtnl_link_get_net(). Let's restore the original behaviour. Fixes: 48327566769a ("rtnetlink: fix double call of rtnl_link_get_net_ifla()") Reported-by: Xiao Liang <shaw.leon@gmail.com> Closes: https://lore.kernel.org/netdev/CABAhCORBVVU8P6AHcEkENMj+gD2d3ce9t=A_o48E0yOQp8_wUQ@mail.gmail.com/#t Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Tested-by: Xiao Liang <shaw.leon@gmail.com> Link: https://patch.msgid.link/20241216110432.51488-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: netdevsim: fix nsim_pp_hold_write()Eric Dumazet
nsim_pp_hold_write() has two problems: 1) It may return with rtnl held, as found by syzbot. 2) Its return value does not propagate an error if any. Fixes: 1580cbcbfe77 ("net: netdevsim: add some fake page pool use") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241216083703.1859921-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17net: page_pool: rename page_pool_is_last_ref()Jakub Kicinski
page_pool_is_last_ref() releases a reference while the name, to me at least, suggests it just checks if the refcount is 1. The semantics of the function are the same as those of atomic_dec_and_test() and refcount_dec_and_test(), so just use the _and_test() suffix. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/20241215212938.99210-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17hexagon: Disable constant extender optimization for LLVM prior to 19.1.0Nathan Chancellor
The Hexagon-specific constant extender optimization in LLVM may crash on Linux kernel code [1], such as fs/bcache/btree_io.c after commit 32ed4a620c54 ("bcachefs: Btree path tracepoints") in 6.12: clang: llvm/lib/Target/Hexagon/HexagonConstExtenders.cpp:745: bool (anonymous namespace)::HexagonConstExtenders::ExtRoot::operator<(const HCE::ExtRoot &) const: Assertion `ThisB->getParent() == OtherB->getParent()' failed. Stack dump: 0. Program arguments: clang --target=hexagon-linux-musl ... fs/bcachefs/btree_io.c 1. <eof> parser at end of file 2. Code generation 3. Running pass 'Function Pass Manager' on module 'fs/bcachefs/btree_io.c'. 4. Running pass 'Hexagon constant-extender optimization' on function '@__btree_node_lock_nopath' Without assertions enabled, there is just a hang during compilation. This has been resolved in LLVM main (20.0.0) [2] and backported to LLVM 19.1.0 but the kernel supports LLVM 13.0.1 and newer, so disable the constant expander optimization using the '-mllvm' option when using a toolchain that is not fixed. Cc: stable@vger.kernel.org Link: https://github.com/llvm/llvm-project/issues/99714 [1] Link: https://github.com/llvm/llvm-project/commit/68df06a0b2998765cb0a41353fcf0919bbf57ddb [2] Link: https://github.com/llvm/llvm-project/commit/2ab8d93061581edad3501561722ebd5632d73892 [3] Reviewed-by: Brian Cain <bcain@quicinc.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-17idpf: trigger SW interrupt when exiting wb_on_itr modeJoshua Hay
There is a race condition between exiting wb_on_itr and completion write backs. For example, we are in wb_on_itr mode and a Tx completion is generated by HW, ready to be written back, as we are re-enabling interrupts: HW SW | | | | idpf_tx_splitq_clean_all | | napi_complete_done | | | tx_completion_wb | idpf_vport_intr_update_itr_ena_irq That tx_completion_wb happens before the vector is fully re-enabled. Continuing with this example, it is a UDP stream and the tx_completion_wb is the last one in the flow (there are no rx packets). Because the HW generated the completion before the interrupt is fully enabled, the HW will not fire the interrupt once the timer expires and the write back will not happen. NAPI poll won't be called. We have indicated we're back in interrupt mode but nothing else will trigger the interrupt. Therefore, the completion goes unprocessed, triggering a Tx timeout. To mitigate this, fire a SW triggered interrupt upon exiting wb_on_itr. This interrupt will catch the rogue completion and avoid the timeout. Add logic to set the appropriate bits in the vector's dyn_ctl register. Fixes: 9c4a27da0ecc ("idpf: enable WB_ON_ITR") Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-12-17idpf: add support for SW triggered interruptsJoshua Hay
SW triggered interrupts are guaranteed to fire after their timer expires, unlike Tx and Rx interrupts which will only fire after the timer expires _and_ a descriptor write back is available to be processed by the driver. Add the necessary fields, defines, and initializations to enable a SW triggered interrupt in the vector's dyn_ctl register. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-12-17ice: Add MDD logging via devlink healthBen Shelton
Add a devlink health reporter for MDD events. The 'dump' handler will return the information captured in each call to ice_handle_mdd_event(). A device reset (CORER/PFR) will put the reporter back in healthy state. Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com> Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Co-developed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-12-17ice: add Tx hang devlink health reporterPrzemek Kitszel
Add Tx hang devlink health reporter, see struct ice_tx_hang_event to see what exactly is reported. For now dump descriptors with little metadata and skb diagnostic information. Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-12-17btrfs: tree-checker: reject inline extent items with 0 ref countQu Wenruo
[BUG] There is a bug report in the mailing list where btrfs_run_delayed_refs() failed to drop the ref count for logical 25870311358464 num_bytes 2113536. The involved leaf dump looks like this: item 166 key (25870311358464 168 2113536) itemoff 10091 itemsize 50 extent refs 1 gen 84178 flags 1 ref#0: shared data backref parent 32399126528000 count 0 <<< ref#1: shared data backref parent 31808973717504 count 1 Notice the count number is 0. [CAUSE] There is no concrete evidence yet, but considering 0 -> 1 is also a single bit flipped, it's possible that hardware memory bitflip is involved, causing the on-disk extent tree to be corrupted. [FIX] To prevent us reading such corrupted extent item, or writing such damaged extent item back to disk, enhance the handling of BTRFS_EXTENT_DATA_REF_KEY and BTRFS_SHARED_DATA_REF_KEY keys for both inlined and key items, to detect such 0 ref count and reject them. CC: stable@vger.kernel.org # 5.4+ Link: https://lore.kernel.org/linux-btrfs/7c69dd49-c346-4806-86e7-e6f863a66f48@app.fastmail.com/ Reported-by: Frankie Fisher <frankie@terrorise.me.uk> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-12-17btrfs: split bios to the fs sector size boundaryChristoph Hellwig
Btrfs like other file systems can't really deal with I/O not aligned to it's internal block size (which strangely is called sector size in btrfs, for historical reasons), but the block layer split helper doesn't even know about that. Round down the split boundary so that all I/Os are aligned. Fixes: d5e4377d5051 ("btrfs: split zone append bios in btrfs_submit_bio") CC: stable@vger.kernel.org # 6.12 Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-12-17btrfs: use bio_is_zone_append() in the completion handlerChristoph Hellwig
Otherwise it won't catch bios turned into regular writes by the block level zone write plugging. The additional test it adds is for emulated zone append. Fixes: 9b1ce7f0c6f8 ("block: Implement zone append emulation") CC: stable@vger.kernel.org # 6.12 Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-12-17btrfs: fix improper generation check in snapshot deleteJosef Bacik
We have been using the following check if (generation <= root->root_key.offset) to make decisions about whether or not to visit a node during snapshot delete. This is because for normal subvolumes this is set to 0, and for snapshots it's set to the creation generation. The idea being that if the generation of the node is less than or equal to our creation generation then we don't need to visit that node, because it doesn't belong to us, we can simply drop our reference and move on. However reloc roots don't have their generation stored in root->root_key.offset, instead that is the objectid of their corresponding fs root. This means we can incorrectly not walk into nodes that need to be dropped when deleting a reloc root. There are a variety of consequences to making the wrong choice in two distinct areas. visit_node_for_delete() 1. False positive. We think we are newer than the block when we really aren't. We don't visit the node and drop our reference to the node and carry on. This would result in leaked space. 2. False negative. We do decide to walk down into a block that we should have just dropped our reference to. However this means that the child node will have refs > 1, so we will switch to UPDATE_BACKREF, and then the subsequent walk_down_proc() will notice that btrfs_header_owner(node) != root->root_key.objectid and it'll break out of the loop, and then walk_up_proc() will drop our reference, so this appears to be ok. do_walk_down() 1. False positive. We are in UPDATE_BACKREF and incorrectly decide that we are done and don't need to update the backref for our lower nodes. This is another case that simply won't happen with relocation, as we only have to do UPDATE_BACKREF if the node below us was shared and didn't have FULL_BACKREF set, and since we don't own that node because we're a reloc root we actually won't end up in this case. 2. False negative. Again this is tricky because as described above, we simply wouldn't be here from relocation, because we don't own any of the nodes because we never set btrfs_header_owner() to the reloc root objectid, and we always use FULL_BACKREF, we never actually need to set FULL_BACKREF on any children. Having spent a lot of time stressing relocation/snapshot delete recently I've not seen this pop in practice. But this is objectively incorrect, so fix this to get the correct starting generation based on the root we're dropping to keep me from thinking there's a problem here. CC: stable@vger.kernel.org Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-12-17ice: rename devlink_port.[ch] to port.[ch]Przemek Kitszel
Drop "devlink_" prefix from files that sit in devlink/. I'm going to add more files there, and repeating "devlink" does not feel good. This is also the scheme used in most other places, most notably the devlink core files are named like that. devlink.[ch] stays as is. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-12-17devlink: add devlink_fmsg_dump_skb() functionMateusz Polchlopek
Add devlink_fmsg_dump_skb() function that adds some diagnostic information about skb (like length, pkt type, MAC, etc) to devlink fmsg mechanism using bunch of devlink_fmsg_put() function calls. Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-12-17devlink: add devlink_fmsg_put() macroPrzemek Kitszel
Add devlink_fmsg_put() that dispatches based on the type of the value to put, example: bool -> devlink_fmsg_bool_pair_put(). Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-12-17checkpatch: don't complain on _Generic() usePrzemek Kitszel
Improve CamelCase recognition logic to avoid reporting on _Generic() use. Other C keywords, such as _Bool, are intentionally omitted, as those should be rather avoided in new source code. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>