summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-02-18net: phy: marvell-88q2xxx: align definesDimitri Fedrau
Align some defines. Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18Merge branch 'vxlan-join-leave-mc-group-when-reconfigured'Paolo Abeni
Petr Machata says: ==================== vxlan: Join / leave MC group when reconfigured When a vxlan netdevice is brought up, if its default remote is a multicast address, the device joins the indicated group. Therefore when the multicast remote address changes, the device should leave the current group and subscribe to the new one. Similarly when the interface used for endpoint communication is changed in a situation when multicast remote is configured. This is currently not done. Both vxlan_igmp_join() and vxlan_igmp_leave() can however fail. So it is possible that with such fix, the netdevice will end up in an inconsistent situation where the old group is not joined anymore, but joining the new group fails. Should we join the new group first, and leave the old one second, we might end up in the opposite situation, where both groups are joined. Undoing any of this during rollback is going to be similarly problematic. One solution would be to just forbid the change when the netdevice is up. However in vnifilter mode, changing the group address is allowed, and these problems are simply ignored (see vxlan_vni_update_group()): # ip link add name br up type bridge vlan_filtering 1 # ip link add vx1 up master br type vxlan external vnifilter local 192.0.2.1 dev lo dstport 4789 # bridge vni add dev vx1 vni 200 group 224.0.0.1 # tcpdump -i lo & # bridge vni add dev vx1 vni 200 group 224.0.0.2 18:55:46.523438 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s) 18:55:46.943447 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s) # bridge vni dev vni group/remote vx1 200 224.0.0.2 Having two different modes of operation for conceptually the same interface is silly, so in this patchset, just do what the vnifilter code does and deal with the errors by crossing fingers real hard. ==================== Link: https://patch.msgid.link/cover.1739548836.git.petrm@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18selftests: test_vxlan_fdb_changelink: Add a test for MC remote changePetr Machata
Changes to MC remote need to be reflected in actual group memberships. Add a test to verify that it is the case. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18selftests: test_vxlan_fdb_changelink: Convert to lib.shPetr Machata
Instead of inlining equivalents, use lib.sh-provided primitives. Use defer to manage vx lifetime. This will make it easier to extend the test in the next patch. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18selftests: forwarding: lib: Move require_command to net, generalizePetr Machata
This helper could be useful to more than just forwarding tests. Move it upstairs and port over to log_test_skip(). Split the function into two parts: the bit that actually checks and reports skip, which is in a new function check_command(). And a bit that exits the test script if the check fails. This allows users consistent checking behavior while giving an option to bail out from a single test without bailing out of the whole script. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18vxlan: Join / leave MC group after remote changesPetr Machata
When a vxlan netdevice is brought up, if its default remote is a multicast address, the device joins the indicated group. Therefore when the multicast remote address changes, the device should leave the current group and subscribe to the new one. Similarly when the interface used for endpoint communication is changed in a situation when multicast remote is configured. This is currently not done. Both vxlan_igmp_join() and vxlan_igmp_leave() can however fail. So it is possible that with such fix, the netdevice will end up in an inconsistent situation where the old group is not joined anymore, but joining the new group fails. Should we join the new group first, and leave the old one second, we might end up in the opposite situation, where both groups are joined. Undoing any of this during rollback is going to be similarly problematic. One solution would be to just forbid the change when the netdevice is up. However in vnifilter mode, changing the group address is allowed, and these problems are simply ignored (see vxlan_vni_update_group()): # ip link add name br up type bridge vlan_filtering 1 # ip link add vx1 up master br type vxlan external vnifilter local 192.0.2.1 dev lo dstport 4789 # bridge vni add dev vx1 vni 200 group 224.0.0.1 # tcpdump -i lo & # bridge vni add dev vx1 vni 200 group 224.0.0.2 18:55:46.523438 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s) 18:55:46.943447 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s) # bridge vni dev vni group/remote vx1 200 224.0.0.2 Having two different modes of operation for conceptually the same interface is silly, so in this patch, just do what the vnifilter code does and deal with the errors by crossing fingers real hard. The vnifilter code leaves old before joining new, and in case of join / leave failures does not roll back the configuration changes that have already been applied, but bails out of joining if it could not leave. Do the same here: leave before join, apply changes unconditionally and do not attempt to join if we couldn't leave. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18vxlan: Drop 'changelink' parameter from vxlan_dev_configure()Petr Machata
vxlan_dev_configure() only has a single caller that passes false for the changelink parameter. Drop the parameter and inline the sole value. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18page_pool: avoid infinite loop to schedule delayed workerJason Xing
We noticed the kworker in page_pool_release_retry() was waken up repeatedly and infinitely in production because of the buggy driver causing the inflight less than 0 and warning us in page_pool_inflight()[1]. Since the inflight value goes negative, it means we should not expect the whole page_pool to get back to work normally. This patch mitigates the adverse effect by not rescheduling the kworker when detecting the inflight negative in page_pool_release_retry(). [1] [Mon Feb 10 20:36:11 2025] ------------[ cut here ]------------ [Mon Feb 10 20:36:11 2025] Negative(-51446) inflight packet-pages ... [Mon Feb 10 20:36:11 2025] Call Trace: [Mon Feb 10 20:36:11 2025] page_pool_release_retry+0x23/0x70 [Mon Feb 10 20:36:11 2025] process_one_work+0x1b1/0x370 [Mon Feb 10 20:36:11 2025] worker_thread+0x37/0x3a0 [Mon Feb 10 20:36:11 2025] kthread+0x11a/0x140 [Mon Feb 10 20:36:11 2025] ? process_one_work+0x370/0x370 [Mon Feb 10 20:36:11 2025] ? __kthread_cancel_work+0x40/0x40 [Mon Feb 10 20:36:11 2025] ret_from_fork+0x35/0x40 [Mon Feb 10 20:36:11 2025] ---[ end trace ebffe800f33e7e34 ]--- Note: before this patch, the above calltrace would flood the dmesg due to repeated reschedule of release_dw kworker. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250214064250.85987-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18Merge branch 'add-af_xdp-support-for-cn10k'Paolo Abeni
Suman Ghosh says: ==================== Add af_xdp support for cn10k This patchset includes changes to support AF_XDP for cn10k chipsets. Both non-zero copy and zero copy will be supported after these changes. Also, the RSS will be reconfigured once a particular receive queue is added/removed to/from AF_XDP support. Patch #1: octeontx2-pf: use xdp_return_frame() to free xdp buffers Patch #2: octeontx2-pf: Add AF_XDP non-zero copy support Patch #3: octeontx2-pf: AF_XDP zero copy receive support Patch #4: octeontx2-pf: Reconfigure RSS table after enabling AF_XDP zerocopy on rx queue Patch #5: octeontx2-pf: Prepare for AF_XDP transmit Patch #6: octeontx2-pf: AF_XDP zero copy transmit support ==================== Link: https://patch.msgid.link/20250213053141.2833254-1-sumang@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18octeontx2-pf: AF_XDP zero copy transmit supportSuman Ghosh
This patch implements below changes, 1. To avoid concurrency with normal traffic uses XDP queues. 2. Since there are chances that XDP and AF_XDP can fall under same queue uses separate flags to handle dma buffers. Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18octeontx2-pf: Prepare for AF_XDPSuman Ghosh
Implement necessary APIs required for AF_XDP transmit. Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18octeontx2-pf: Reconfigure RSS table after enabling AF_XDP zerocopy on rx queueSuman Ghosh
RSS table needs to be reconfigured once a rx queue is enabled or disabled for AF_XDP zerocopy support. After enabling UMEM on a rx queue, that queue should not be part of RSS queue selection algorithm. Similarly the queue should be considered again after UMEM is disabled. Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18octeontx2-pf: AF_XDP zero copy receive supportSuman Ghosh
This patch adds support to AF_XDP zero copy for CN10K. This patch specifically adds receive side support. In this approach once a xdp program with zero copy support on a specific rx queue is enabled, then that receive quse is disabled/detached from the existing kernel queue and re-assigned to the umem memory. Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18octeontx2-pf: Add AF_XDP non-zero copy supportSuman Ghosh
Set xdp rx ring memory type as MEM_TYPE_PAGE_POOL for af-xdp to work. This is needed since xdp_return_frame internally will use page pools. Fixes: 06059a1a9a4a ("octeontx2-pf: Add XDP support to netdev PF") Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18octeontx2-pf: use xdp_return_frame() to free xdp buffersSuman Ghosh
xdp_return_frames() will help to free the xdp frames and their associated pages back to page pool. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-17Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice, iavf: Add support for Rx timestamping Mateusz Polchlopek says: Initially, during VF creation it registers the PTP clock in the system and negotiates with PF it's capabilities. In the meantime the PF enables the Flexible Descriptor for VF. Only this type of descriptor allows to receive Rx timestamps. Enabling virtual clock would be possible, though it would probably perform poorly due to the lack of direct time access. Enable timestamping should be done using userspace tools, e.g. hwstamp_ctl -i $VF -r 14 In order to report the timestamps to userspace, the VF extends timestamp to 40b. To support this feature the flexible descriptors and PTP part in iavf driver have been introduced. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: iavf: add support for Rx timestamps to hotpath iavf: handle set and get timestamps ops iavf: Implement checking DD desc field iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors iavf: define Rx descriptors as qwords libeth: move idpf_rx_csum_decoded and idpf_rx_extracted iavf: periodically cache PHC time iavf: add support for indirect access to PHC time iavf: add initial framework for registering PTP clock iavf: negotiate PTP capabilities iavf: add support for negotiating flexible RXDID format virtchnl: add enumeration for the rxdid format ice: support Rx timestamp on flex descriptor virtchnl: add support for enabling PTP on iAVF ==================== Link: https://patch.msgid.link/20250214192739.1175740-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17eth: fbnic: support TCP segmentation offloadJakub Kicinski
Add TSO support to the driver. Device can handle unencapsulated or IPv6-in-IPv6 packets. Any other tunnel stacks are handled with GSO partial. Validate that the packet can be offloaded in ndo_features_check. Main thing we need to check for is that the header geometry can be expressed in the decriptor fields (offsets aren't too large). Report number of TSO super-packets via the qstat API. Link: https://patch.msgid.link/20250216174109.2808351-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17netdev: clarify GSO vs csum in qstatsJakub Kicinski
Could be just me, but I had to pause and double check that the Tx csum counter in qstat should include GSO'd packets. GSO pretty much implies csum so one could possibly interpret the csum counter as pure csum offload. But the counters are based on virtio: [tx_needs_csum] The number of packets which require checksum calculation by the device. [rx_needs_csum] The number of packets with VIRTIO_NET_HDR_F_NEEDS_CSUM. and VIRTIO_NET_HDR_F_NEEDS_CSUM gets set on GSO packets virtio sends. Clarify this in the spec to avoid any confusion. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250214224601.2271201-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17net: move stale comment about ntuple validationJakub Kicinski
Gal points out that the comment now belongs further down, since the original if condition was split into two in commit de7f7582dff2 ("net: ethtool: prevent flow steering to RSS contexts which don't exist") Link: https://lore.kernel.org/de4a2a8a-1eb9-4fa8-af87-7526e58218e9@nvidia.com Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250214224340.2268691-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17Merge branch 'netdev-genl-add-an-xsk-attribute-to-queues'Jakub Kicinski
Joe Damato says: ==================== netdev-genl: Add an xsk attribute to queues This is an attempt to followup on something Jakub asked me about [1], adding an xsk attribute to queues and more clearly documenting which queues are linked to NAPIs... After the RFC [2], Jakub suggested creating an empty nest for queues which have a pool, so I've adjusted this version to work that way. The nest can be extended in the future to express attributes about XSK as needed. Queues which are not used for AF_XDP do not have the xsk attribute present. I've run the included test on: - my mlx5 machine (via NETIF=) - without setting NETIF And the test seems to pass in both cases. [1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/ [2]: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/ ==================== Link: https://patch.msgid.link/20250214211255.14194-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17selftests: drv-net: Test queue xsk attributeJoe Damato
Test that queues which are used for AF_XDP have the xsk nest attribute. The attribute is currently empty, but its existence means the AF_XDP is being used for the queue. Enable CONFIG_XDP_SOCKETS for selftests/drivers/net tests, as well. Signed-off-by: Joe Damato <jdamato@fastly.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250214211255.14194-4-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17netdev-genl: Add an XSK attribute to queuesJoe Damato
Expose a new per-queue nest attribute, xsk, which will be present for queues that are being used for AF_XDP. If the queue is not being used for AF_XDP, the nest will not be present. In the future, this attribute can be extended to include more data about XSK as it is needed. Signed-off-by: Joe Damato <jdamato@fastly.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250214211255.14194-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17netlink: Add nla_put_empty_nest helperJoe Damato
Creating empty nests is helpful when the exact attributes to be exposed in the future are not known. Encapsulate the logic in a helper. Signed-off-by: Joe Damato <jdamato@fastly.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250214211255.14194-2-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17selftests: net: add support for testing SO_RCVMARK and SO_RCVPRIORITYAnna Emese Nyiri
Introduce tests to verify the correct functionality of the SO_RCVMARK and SO_RCVPRIORITY socket options. Suggested-by: Jakub Kicinski <kuba@kernel.org> Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250214205828.48503-1-annaemesenyiri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17net: use napi_id_valid helperStefano Jordhani
In commit 6597e8d35851 ("netdev-genl: Elide napi_id when not present"), napi_id_valid function was added. Use the helper to refactor open-coded checks in the source. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Stefano Jordhani <sjordhani@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> # for iouring Link: https://patch.msgid.link/20250214181801.931-1-sjordhani@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17Merge branch ↵Jakub Kicinski
'net-phy-dp83822-add-support-for-changing-the-transmit-amplitude-voltage' Dimitri Fedrau via says: ==================== net: phy: dp83822: Add support for changing the transmit amplitude voltage Add support for changing the transmit amplitude voltage in 100BASE-TX mode. Add support for configuration via DT. v4: https://lore.kernel.org/20250211-dp83822-tx-swing-v4-0-1e8ebd71ad54@liebherr.com v3: https://lore.kernel.org/20250204-dp83822-tx-swing-v3-0-9798e96500d9@liebherr.com v2: https://lore.kernel.org/20250120-dp83822-tx-swing-v2-0-07c99dc42627@liebherr.com v1: https://lore.kernel.org/20250113-dp83822-tx-swing-v1-0-7ed5a9d80010@liebherr.com ==================== Link: https://patch.msgid.link/20250214-dp83822-tx-swing-v5-0-02ca72620599@liebherr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17net: phy: dp83822: Add support for changing the transmit amplitude voltageDimitri Fedrau
Add support for changing the transmit amplitude voltage in 100BASE-TX mode. Modifying it can be necessary to compensate losses on the PCB and connector, so the voltages measured on the RJ45 pins are conforming. Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250214-dp83822-tx-swing-v5-3-02ca72620599@liebherr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17net: phy: Add helper for getting tx amplitude gainDimitri Fedrau
Add helper which returns the tx amplitude gain defined in device tree. Modifying it can be necessary to compensate losses on the PCB and connector, so the voltages measured on the RJ45 pins are conforming. Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250214-dp83822-tx-swing-v5-2-02ca72620599@liebherr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17dt-bindings: net: ethernet-phy: add property tx-amplitude-100base-tx-percentDimitri Fedrau
Add property tx-amplitude-100base-tx-percent in the device tree bindings for configuring the tx amplitude of 100BASE-TX PHYs. Modifying it can be necessary to compensate losses on the PCB and connector, so the voltages measured on the RJ45 pins are conforming. Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250214-dp83822-tx-swing-v5-1-02ca72620599@liebherr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17selftests: net: fix grammar in reuseaddr_ports_exhausted.c log messagePranav Tyagi
This patch fixes a grammatical error in a test log message in reuseaddr_ports_exhausted.c for better clarity. Signed-off-by: Pranav Tyagi <pranav.tyagi03@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250213152612.4434-1-pranav.tyagi03@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17Merge branch 'mlx5-add-sensor-name-in-temperature-message'Jakub Kicinski
Tariq Toukan says: ==================== mlx5: Add sensor name in temperature message This small series from Shahar adds the sensors names to the temperature event messages, in addition to the existing bitmap indicators. This improves human readability. Series starts with simple refactoring and modifications. The top patch adds the sensors names. ==================== Link: https://patch.msgid.link/20250213094641.226501-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17net/mlx5: Add sensor name to temperature event messageShahar Shitrit
Previously, a temperature event message included a bitmap indicating which sensors detect high temperatures. To enhance clarity, we modify the message format to explicitly list the names of the overheating sensors, alongside the sensors bitmap. If HWMON is not configured, the event message remains unchanged. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250213094641.226501-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17net/mlx5: Modify LSB bitmask in temperature event to include only the first bitShahar Shitrit
In the sensor_count field of the MTEWE register, bits 1-62 are supported only for unmanaged switches, not for NICs, and bit 63 is reserved for internal use. To prevent confusing output that may include set bits that are not relevant to NIC sensors, we update the bitmask to retain only the first bit, which corresponds to the sensor ASIC. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Link: https://patch.msgid.link/20250213094641.226501-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17net/mlx5: Prefix temperature event bitmap with '0x' for clarityShahar Shitrit
Prepend '0x' to the sensor bitmap in the warning message to clearly indicate that the bitmap is in hexadecimal format. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Link: https://patch.msgid.link/20250213094641.226501-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17net/mlx5: Apply rate-limiting to high temperature warningShahar Shitrit
Wrap the high temperature warning in a temperature event with a call to net_ratelimit() to prevent flooding the kernel log with repeated warning messages when temperature exceeds the threshold multiple times within a short duration. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Link: https://patch.msgid.link/20250213094641.226501-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17Merge branch 'net-phy-mediatek-add-token-ring-helper-functions'Jakub Kicinski
Sky Huang says: ==================== net: phy: mediatek: Add token-ring helper functions This patchset add token-ring helper functions and moves some macros from mtk-ge.c into mtk-phy-lib.c. v2: https://lore.kernel.org/20250116012159.3816135-2-SkyLake.Huang@mediatek.com ==================== Link: https://patch.msgid.link/20250213080553.921434-1-SkyLake.Huang@mediatek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17net: phy: mediatek: Move some macros to phy-lib for later useSky Huang
Move some macros to phy-lib because MediaTek's 2.5G built-in ethernet PHY will also use them. Signed-off-by: Sky Huang <skylake.huang@mediatek.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250213080553.921434-6-SkyLake.Huang@mediatek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17net: phy: mediatek: Add token ring clear bit operation supportSky Huang
Similar to __mtk_tr_set_bits() support. Previously in mtk-ge-soc.c, we clear some register bits via token ring, which were also implemented in three __phy_write(). Now we can do the same thing via __mtk_tr_clr_bits() helper. Signed-off-by: Sky Huang <skylake.huang@mediatek.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250213080553.921434-5-SkyLake.Huang@mediatek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17net: phy: mediatek: Add token ring set bit operation supportSky Huang
Previously in mtk-ge-soc.c, we set some register bits via token ring, which were implemented in three __phy_write(). Now we can do the same thing via __mtk_tr_set_bits() helper. Signed-off-by: Sky Huang <skylake.huang@mediatek.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250213080553.921434-4-SkyLake.Huang@mediatek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17net: phy: mediatek: Add token ring access helper functions in mtk-phy-libSky Huang
This patch adds TR(token ring) manipulations and adds correct macro names for those magic numbers. TR is a way to access proprietary registers on page 52b5. Use these helper functions so we can see which fields we're going to modify/set/clear. TR functions with __* prefix mean that the operations inside aren't wrapped by page select/restore functions. This patch doesn't really change registers' settings but just enhances readability and maintainability. Signed-off-by: Sky Huang <skylake.huang@mediatek.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250213080553.921434-3-SkyLake.Huang@mediatek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17net: phy: mediatek: Change to more meaningful macrosSky Huang
Replace magic number with more meaningful macros in mtk-ge.c. Also, move some common macros into mtk-phy-lib.c. Signed-off-by: Sky Huang <skylake.huang@mediatek.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250213080553.921434-2-SkyLake.Huang@mediatek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17net: xpcs: rearrange register definitionsRussell King (Oracle)
Place register number definitions immediately above their field definitions and order by register number. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1tjblS-00448F-8v@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-15ndisc: ndisc_send_redirect() cleanupEric Dumazet
ndisc_send_redirect() is always called under rcu_read_lock(). It can use dev_net_rcu() and avoid one redundant rcu_read_lock()/rcu_read_unlock() pair. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250214140705.2105890-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14Merge branch 'bnxt_en-add-npar-1-2-and-tph-support'Jakub Kicinski
Michael Chan says: ==================== bnxt_en: Add NPAR 1.2 and TPH support The first patch adds NPAR 1.2 support. Patches 2 to 11 add TPH (TLP Processing Hints) support. These TPH driver patches are new revisions originally posted as part of the TPH PCI patch series. Additional driver refactoring has been done so that we can free and allocate RX completion ring and the TX rings if the channel is a combined channel. We also add napi_disable() and napi_enable() during queue_stop() and queue_start() respectively, and reset for error handling in queue_start(). v4: https://lore.kernel.org/20250208202916.1391614-1-michael.chan@broadcom.com v3: https://lore.kernel.org/20250204004609.1107078-1-michael.chan@broadcom.com v2: https://lore.kernel.org/20250116192343.34535-1-michael.chan@broadcom.com v1: https://lore.kernel.org/20250113063927.4017173-1-michael.chan@broadcom.com Discussion about adding napi_disable()/napi_enable(): https://lore.kernel.org/netdev/5336d624-8d8b-40a6-b732-b020e4a119a2@davidwei.uk/#t Previous driver series fixing rtnl_lock and empty release function: https://lore.kernel.org/netdev/20241115200412.1340286-1-wei.huang2@amd.com/ v5 of the PCI series using netdev_rx_queue_restart(): https://lore.kernel.org/netdev/20240916205103.3882081-5-wei.huang2@amd.com/ v1 of the PCI series using open/close: https://lore.kernel.org/netdev/20240509162741.1937586-9-wei.huang2@amd.com/ ==================== Link: https://patch.msgid.link/20250213011240.1640031-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Add TPH support in BNXT driverManoj Panicker
Add TPH support to the Broadcom BNXT device driver. This allows the driver to utilize TPH functions for retrieving and configuring Steering Tags when changing interrupt affinity. With compatible NIC firmware, network traffic will be tagged correctly with Steering Tags, resulting in significant memory bandwidth savings and other advantages as demonstrated by real network benchmarks on TPH-capable platforms. Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Co-developed-by: Wei Huang <wei.huang2@amd.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-12-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Extend queue stop/start for TX ringsSomnath Kotur
In order to use queue_stop/queue_start to support the new Steering Tags, we need to free the TX ring and TX completion ring if it is a combined channel with TX/RX sharing the same NAPI. Otherwise TX completions will not have the updated Steering Tag. If TPH is not enabled, we just stop the TX ring without freeing the TX/TX cmpl rings. With that we can now add napi_disable() and napi_enable() during queue_stop()/ queue_start(). This will guarantee that NAPI will stop processing the completion entries in case there are additional pending entries in the completion rings after queue_stop(). There could be some NQEs sitting unprocessed while NAPI is disabled thereby leaving the NQ unarmed. Explicitly re-arm the NQ after napi_enable() in queue start so that NAPI will resume properly. Error handling in bnxt_queue_start() requires a reset. If a TX ring cannot be allocated or initialized properly, it will cause TX timeout. The reset will also free any partially allocated rings. We don't expect to hit this error path because re-allocating previously reserved and allocated rings with the same parameters should never fail. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-11-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Refactor TX ring free logicMichael Chan
Add a new bnxt_hwrm_tx_ring_free() function to handle freeing a HW transmit ring. The new function will also be used in the next patch to free the TX ring in queue_stop. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-10-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Reallocate RX completion ring for TPH supportSomnath Kotur
In order to program the correct Steering Tag during an IRQ affinity change, we need to free/re-allocate the RX completion ring during queue_restart. If TPH is enabled, call FW to free the Rx completion ring and clear the ring entries in queue_stop(). Re-allocate it in queue_start() if TPH is enabled. Note that TPH mode is not enabled in this patch and will be enabled later in the patch series. While modifying bnxt_queue_start(), remove the unnecessary zeroing of rxr->rx_next_cons. It gets overwritten by the clone in bnxt_queue_start(). Remove the rx_reset counter increment since restart is not reset. Add comment to clarify that the ring allocations in queue_start should never fail. Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-9-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG ringsMichael Chan
Newer firmware can use the NQ ring ID associated with each RX/RX AGG ring to enable PCIe Steering Tags on P5_PLUS chips. When allocating RX/RX AGG rings, pass along NQ ring ID for the firmware to use. This information helps optimize DMA writes by directing them to the cache closer to the CPU consuming the data, potentially improving the processing speed. This change is backward-compatible with older firmware, which will simply disregard the information. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-8-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Refactor RX/RX AGG ring parameters setup for P5_PLUSMichael Chan
There is some common code for setting up RX and RX AGG ring allocation parameters for P5_PLUS chips. Refactor the logic into a new function. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>