summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-07-01net: ethtool: Fix the panic caused by dev being null when dumping coalesceHeng Qi
syzbot reported a general protection fault caused by a null pointer dereference in coalesce_fill_reply(). The issue occurs when req_base->dev is null, leading to an invalid memory access. This panic occurs if dumping coalesce when no device name is specified. Fixes: f750dfe825b9 ("ethtool: provide customized dim profile management") Reported-by: syzbot+e77327e34cdc8c36b7d3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e77327e34cdc8c36b7d3 Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue into main Tony nguyen says: ==================== Intel Wired LAN Driver Updates 2024-06-28 (MAINTAINERS, ice) This series contains updates to MAINTAINERS file and ice driver. Jesse replaces himself with Przemek in the maintainers file. Karthik Sundaravel adds support for VF get/set MAC address via devlink. Eric checks for errors from ice_vsi_rebuild() during queue reconfiguration. Paul adjusts FW API version check for E830 devices. Piotr adds differentiation of unload type when shutting down AdminQ. Przemek changes ice_adapter initialization to occur once per physical card. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01Merge branch 'bnxt_en-ptp' into mainDavid S. Miller
Michael Chan says: ==================== bnxt_en: PTP updates for net-next The first 5 patches implement the PTP feature on the new BCM5760X chips. The main new hardware feature is the new TX timestamp completion which enables the driver to retrieve the TX timestamp in NAPI without deferring to the PTP worker. The last 5 patches increase the number of TX PTP packets in-flight from 1 to 4 on the older BCM5750X chips. On these older chips, we need to call firmware in the PTP worker to retrieve the timestamp. We use an arry to keep track of the in-flight TX PTP packets. v2: Patch #2: Fix the unwind of txr->is_ts_pkt when bnxt_start_xmit() aborts. Patch #4: Set the SKBTX_IN_PROGRESS flag for timestamp packets. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Remove atomic operations on ptp->tx_availPavan Chebbi
Now that we require the spinlock to protect ptp->txts_prod, change ptp->tx_avail to non-atomic and protect it under the same spinlock. Add a new helper function bnxt_ptp_get_txts_prod() to decrement ptp->tx_avail under spinlock and return the producer. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Increase the max total outstanding PTP TX packets to 4Pavan Chebbi
Start accepting up to 4 TX TS requests on BCM5750X (P5) chips. These PTP TX packets will be queued in the ptp->txts_req[] array waiting for the TX timestamp to complete. The entries in the array will be managed by a producer and consumer index. The producer index is updated under spinlock since multiple TX rings can try to send PTP packets at the same time. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Let bnxt_stamp_tx_skb() return error codePavan Chebbi
Change the function bnxt_stamp_tx_skb() to return 0 for suceess or -EAGAIN if the timestamp is still pending in firmware. The calling PTP aux worker will reschedule based on the return code. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Remove an impossible condition check for PTP TX pending SKBPavan Chebbi
In the current 5750X PTP code paths, there is always at most one TX SKB requested for timestamp and we won't accept another one until we have retrieved the timestamp or it has timed out. Remove the unnecessary check in bnxt_get_tx_ts_p5() for a pending SKB and change the function to void. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Refactor all PTP TX timestamp fields into a structPavan Chebbi
On the older 5750X (P5) chips, we currently support only 1 TX PTP packet in-flight waiting for the timestamp. Refactor the datastructures to prepare to support up to 4 TX PTP packets. Combine all fields required for PTP TX timestamp query into one structure. An array of this structure will be added in follow-on patches to support multiple outstanding TX timestamps. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Add BCM5760X specific PHC registers mappingPavan Chebbi
BCM5760X firmware will advertise direct 64-bit PHC registers access for the driver from BAR0. Make the necessary changes in handling HWRM_PORT_MAC_PTP_QCFG's response and PHC register mapping for 5760X chips. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Add TX timestamp completion logicMichael Chan
The new BCM5760X chips will return the timestamp of TX packets in a new completion. Add logic in __bnxt_poll_work() to handle this completion type to retrieve the timestamp. This feature eliminates the limit on the number of in-flight PTP TX packets. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Allow some TX packets to be unprocessed in NAPIMichael Chan
The driver's current logic will always free all the TX SKBs up to txr->tx_hw_cons within NAPI. In the next patches, we'll be adding logic to handle TX timestamp completion and we may need to hold some remaining TX SKBs if we don't have the timestamp completions yet. Modify __bnxt_poll_work_done() to clear each event bit separately to allow bnapi->tx_int() to decide whether to clear BNXT_TX_CMP_EVENT or not. bnapi->tx_int() will not clear BNXT_TX_CMP_EVENT if some TX SKBs are held waiting for TX timestamps. Note that legacy chips will never hold any SKBs this way. The SKB is always deferred to the PTP worker slow path to retrieve the timestamp from firmware. On the new P7 chips, the timestamp is returned by the hardware directly and we can retrieve it directly from NAPI. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Add is_ts_pkt field to struct bnxt_sw_tx_bdMichael Chan
Remove the unused is_gso field and add the is_ts_pkt field to struct bnxt_sw_tx_bd. This field will mark the TX BD that has requested HW TX timestamp. The field needs to be cleared if the timestamp packet is later aborted. This field will be useful when processing the new TX timestamp completion from the hardware in the next patches. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Add new TX timestamp completion definitionsMichael Chan
The new BCM5760X chips will generate this new TX timestamp completion when a TX packet's timestamp has been taken right before transmission. The driver logic to retrieve the timestamp will be added in the next few patches. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01octeontx2-af: Sync NIX and NPA contexts from NDC to LLC/DRAMNithin Dabilpuram
Octeontx2 hardware uses Near Data Cache(NDC) block to cache contexts in it so that access to LLC/DRAM can be avoided. It is recommended in HRM to sync the NDC contents before releasing/resetting LF resources. Hence implement NDC_SYNC mailbox and sync contexts during driver teardown. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01net: tn40xx: add initial ethtool_ops supportFUJITA Tomonori
Call phylink_ethtool_ksettings_get() for get_link_ksettings method and ethtool_op_get_link() for get_link method. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01Merge tag 'nf-next-24-06-28' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next into main Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: Patch #1 to #11 to shrink memory consumption for transaction objects: struct nft_trans_chain { /* size: 120 (-32), cachelines: 2, members: 10 */ struct nft_trans_elem { /* size: 72 (-40), cachelines: 2, members: 4 */ struct nft_trans_flowtable { /* size: 80 (-48), cachelines: 2, members: 5 */ struct nft_trans_obj { /* size: 72 (-40), cachelines: 2, members: 4 */ struct nft_trans_rule { /* size: 80 (-32), cachelines: 2, members: 6 */ struct nft_trans_set { /* size: 96 (-24), cachelines: 2, members: 8 */ struct nft_trans_table { /* size: 56 (-40), cachelines: 1, members: 2 */ struct nft_trans_elem can now be allocated from kmalloc-96 instead of kmalloc-128 slab. Series from Florian Westphal. For the record, I have mangled patch #1 to add nft_trans_container_*() and use if for every transaction object. I have also added BUILD_BUG_ON to ensure struct nft_trans always comes at the beginning of the container transaction object. And few minor cleanups, any new bugs are of my own. Patch #12 simplify check for SCTP GSO in IPVS, from Ismael Luceno. Patch #13 nf_conncount key length remains in the u32 bound, from Yunjian Wang. Patch #14 removes unnecessary check for CTA_TIMEOUT_L3PROTO when setting default conntrack timeouts via nfnetlink_cttimeout API, from Lin Ma. Patch #15 updates NFT_SECMARK_CTX_MAXLEN to 4096, SELinux could use larger secctx names than the existing 256 bytes length. Patch #16 adds a selftest to exercise nfnetlink_queue listeners leaving nfnetlink_queue, from Florian Westphal. Patch #17 increases hitcount from 255 to 65535 in xt_recent, from Phil Sutter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01Merge branch 'tcp_metrics-netlink-specs' into mainDavid S. Miller
Jakub Kicinski says: ==================== tcp_metrics: add netlink protocol spec in YAML Add a netlink protocol spec for the tcp_metrics generic netlink family. First patch adjusts the uAPI header guards to make it easier to build tools/ with non-system headers. v1: https://lore.kernel.org/all/20240626201133.2572487-1-kuba@kernel.org ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01tcp_metrics: add netlink protocol spec in YAMLJakub Kicinski
Add a protocol spec for tcp_metrics, so that it's accessible via YNL. Useful at the very least for testing fixes. In this episode of "10,000 ways to complicate netlink" the metric nest has defines which are off by 1. iproute2 does: struct rtattr *m[TCP_METRIC_MAX + 1 + 1]; parse_rtattr_nested(m, TCP_METRIC_MAX + 1, a); for (i = 0; i < TCP_METRIC_MAX + 1; i++) { // ... attr = m[i + 1]; This is too weird to support in YNL, add a new set of defines with _correct_ values to the official kernel header. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01tcp_metrics: add UAPI to the header guardJakub Kicinski
tcp_metrics' header lacks the customary _UAPI in the header guard. This makes YNL build rules work less seamlessly. We can easily fix that on YNL side, but this could also be problematic if we ever needed to create a kernel-only tcp_metrics.h. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01net: phy: realtek: Add support for PHY LEDs on RTL8211FMarek Vasut
Realtek RTL8211F Ethernet PHY supports 3 LED pins which are used to indicate link status and activity. Add minimal LED controller driver supporting the most common uses with the 'netdev' trigger. Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28Merge branch 'ethtool-track-custom-rss-contexts-in-the-core'Jakub Kicinski
Edward Cree says: ==================== ethtool: track custom RSS contexts in the core Make the core responsible for tracking the set of custom RSS contexts, their IDs, indirection tables, hash keys, and hash functions; this lets us get rid of duplicative code in drivers, and will allow us to support netlink dumps later. This series only moves the sfc EF10 & EF100 driver over to the new API; other drivers (mvpp2, octeontx2, mlx5, sfc/siena, bnxt_en) can be converted afterwards and the legacy API removed. ==================== Link: https://patch.msgid.link/cover.1719502239.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28sfc: remove get_rxfh_context dead codeEdward Cree
The core now always satisfies 'ethtool -x context nonzero' from its own tracking, so our lookup code for that case is never called. Remove it. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/b426fcc416dedc8f203e52eebef6891eccebe4c1.1719502240.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28net: ethtool: use the tracking array for get_rxfh on custom RSS contextsEdward Cree
On 'ethtool -x' with rss_context != 0, instead of calling the driver to read the RSS settings for the context, just get the settings from the rss_ctx xarray, and return them to the user with no driver involvement. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/2d0190fa29638f307ea720f882ebd41f6f867694.1719502240.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28sfc: use new rxfh_context APIEdward Cree
The core is now responsible for allocating IDs and a memory region for us to store our state (struct efx_rss_context_priv), so we no longer need efx_alloc_rss_context_entry() and friends. Since the contexts are now maintained by the core, use the core's lock (net_dev->ethtool->rss_lock), rather than our own mutex (efx->rss_lock), to serialise access against changes; and remove the now-unused efx->rss_lock from struct efx_nic. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/150274740ea8cc137fef5502541ce573d32fb319.1719502240.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28net: ethtool: add a mutex protecting RSS contextsEdward Cree
While this is not needed to serialise the ethtool entry points (which are all under RTNL), drivers may have cause to asynchronously access dev->ethtool->rss_ctx; taking dev->ethtool->rss_lock allows them to do this safely without needing to take the RTNL. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/7f9c15eb7525bf87af62c275dde3a8570ee8bf0a.1719502240.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28net: ethtool: add an extack parameter to new rxfh_context APIsEdward Cree
Currently passed as NULL, but will allow drivers to report back errors when ethnl support for these ops is added. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/6e0012347d175fdd1280363d7bfa76a2f2777e17.1719502240.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28net: ethtool: let the core choose RSS context IDsEdward Cree
Add a new API to create/modify/remove RSS contexts, that passes in the newly-chosen context ID (not as a pointer) rather than leaving the driver to choose it on create. Also pass in the ctx, allowing drivers to easily use its private data area to store their hardware-specific state. Keep the existing .set_rxfh API for now as a fallback, but deprecate it for custom contexts (rss_context != 0). Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/45f1fe61df2163c091ec394c9f52000c8b16cc3b.1719502240.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28net: ethtool: record custom RSS contexts in the XArrayEdward Cree
Since drivers are still choosing the context IDs, we have to force the XArray to use the ID they've chosen rather than picking one ourselves, and handle the case where they give us an ID that's already in use. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/801f5faa4cec87c65b2c6e27fb220c944bce593a.1719502240.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28net: ethtool: attach an XArray of custom RSS contexts to a netdeviceEdward Cree
Each context stores the RXFH settings (indir, key, and hfunc) as well as optionally some driver private data. Delete any still-existing contexts at netdev unregister time. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/cbd1c402cec38f2e03124f2ab65b4ae4e08bd90d.1719502240.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28net: move ethtool-related netdev state into its own structEdward Cree
net_dev->ethtool is a pointer to new struct ethtool_netdev_state, which currently contains only the wol_enabled field. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/293a562278371de7534ed1eb17531838ca090633.1719502239.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28Merge branch 'selftests-drv-net-add-ability-to-schedule-cleanup-with-defer'Jakub Kicinski
Jakub Kicinski says: ==================== selftests: drv-net: add ability to schedule cleanup with defer() Introduce a defer / cleanup mechanism for driver selftests. More detailed info in the second patch. ==================== Link: https://patch.msgid.link/20240627185502.3069139-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28selftests: drv-net: rss_ctx: convert to defer()Jakub Kicinski
Use just added defer(). Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20240627185502.3069139-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28selftests: drv-net: add ability to schedule cleanup with defer()Jakub Kicinski
This implements what I was describing in [1]. When writing a test author can schedule cleanup / undo actions right after the creation completes, eg: cmd("touch /tmp/file") defer(cmd, "rm /tmp/file") defer() takes the function name as first argument, and the rest are arguments for that function. defer()red functions are called in inverse order after test exits. It's also possible to capture them and execute earlier (in which case they get automatically de-queued). undo = defer(cmd, "rm /tmp/file") # ... some unsafe code ... undo.exec() As a nice safety all exceptions from defer()ed calls are captured, printed, and ignored (they do make the test fail, however). This addresses the common problem of exceptions in cleanup paths often being unhandled, leading to potential leaks. There is a global action queue, flushed by ksft_run(). We could support function level defers too, I guess, but there's no immediate need.. Link: https://lore.kernel.org/all/877cedb2ki.fsf@nvidia.com/ # [1] Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20240627185502.3069139-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28selftests: net: ksft: avoid continue when handling resultsJakub Kicinski
Exception handlers print the result and use continue to skip the non-exception result printing. This makes inserting common post-test code hard. Refactor to avoid the continues and have only one ktap_result() call. Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20240627185502.3069139-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28enic: add ethtool get_channel supportJon Kohler
Add .get_channel to enic_ethtool_ops to enable basic ethtool -l support to get the current channel configuration. Note that the driver does not support dynamically changing queue configuration, so .set_channel is intentionally unused. Instead, users should use Cisco's hardware management tools (UCSM/IMC) to modify virtual interface card configuration out of band. Signed-off-by: Jon Kohler <jon@nutanix.com> Link: https://patch.msgid.link/20240627202013.2398217-1-jon@nutanix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28Merge branch ↵Jakub Kicinski
'lift-udp_segment-restriction-for-egress-via-device-w-o-csum-offload' Jakub Sitnicki says: ==================== Lift UDP_SEGMENT restriction for egress via device w/o csum offload This is a follow-up to an earlier question [1] if we can make UDP GSO work with any egress device, even those with no checksum offload capability. That's the default setup for TUN/TAP. Because there is a change in behavior - sendmsg() does no longer return EIO error - I'm submitting through net-next tree, rather than net, as per Willem's advice. [1] https://lore.kernel.org/netdev/87jzqsld6q.fsf@cloudflare.com/ v1: https://lore.kernel.org/r/20240622-linux-udpgso-v1-0-d2344157ab2a@cloudflare.com ==================== Link: https://patch.msgid.link/20240626-linux-udpgso-v2-0-422dfcbd6b48@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28selftests/net: Add test coverage for UDP GSO software fallbackJakub Sitnicki
Extend the existing test to exercise UDP GSO egress through devices with various offload capabilities, including lack of checksum offload, which is the default case for TUN/TAP devices. Test against a dummy device because it is simpler to set up then TUN/TAP. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240626-linux-udpgso-v2-2-422dfcbd6b48@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28udp: Allow GSO transmit from devices with no checksum offloadJakub Sitnicki
Today sending a UDP GSO packet from a TUN device results in an EIO error: import fcntl, os, struct from socket import * TUNSETIFF = 0x400454CA IFF_TUN = 0x0001 IFF_NO_PI = 0x1000 UDP_SEGMENT = 103 tun_fd = os.open("/dev/net/tun", os.O_RDWR) ifr = struct.pack("16sH", b"tun0", IFF_TUN | IFF_NO_PI) fcntl.ioctl(tun_fd, TUNSETIFF, ifr) os.system("ip addr add 192.0.2.1/24 dev tun0") os.system("ip link set dev tun0 up") s = socket(AF_INET, SOCK_DGRAM) s.setsockopt(SOL_UDP, UDP_SEGMENT, 1200) s.sendto(b"x" * 3000, ("192.0.2.2", 9)) # EIO This is due to a check in the udp stack if the egress device offers checksum offload. While TUN/TAP devices, by default, don't advertise this capability because it requires support from the TUN/TAP reader. However, the GSO stack has a software fallback for checksum calculation, which we can use. This way we don't force UDP_SEGMENT users to handle the EIO error and implement a segmentation fallback. Lift the restriction so that UDP_SEGMENT can be used with any egress device. We also need to adjust the UDP GSO code to match the GSO stack expectation about ip_summed field, as set in commit 8d63bee643f1 ("net: avoid skb_warn_bad_offload false positives on UFO"). Otherwise we will hit the bad offload check. Users should, however, expect a potential performance impact when batch-sending packets with UDP_SEGMENT without checksum offload on the egress device. In such case the packet payload is read twice: first during the sendmsg syscall when copying data from user memory, and then in the GSO stack for checksum computation. This double memory read can be less efficient than a regular sendmsg where the checksum is calculated during the initial data copy from user memory. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240626-linux-udpgso-v2-1-422dfcbd6b48@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-28ice: do not init struct ice_adapter more times than neededPrzemek Kitszel
Allocate and initialize struct ice_adapter object only once per physical card instead of once per port. This is not a big deal by now, but we want to extend this struct more and more in the near future. Our plans include PTP stuff and a devlink instance representing whole-device/physical card. Transactions requiring to be sleep-able (like those doing user (here ice) memory allocation) must be performed with an additional (on top of xarray) mutex. Adding it here removes need to xa_lock() manually. Since this commit is a reimplementation of ice_adapter_get(), a rather new scoped_guard() wrapper for locking is used to simplify the logic. It's worth to mention that xa_insert() use gives us both slot reservation and checks if it is already filled, what simplifies code a tiny bit. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-28ice: Distinguish driver reset and removal for AQ shutdownPiotr Gardocki
Admin queue command for shutdown AQ contains a flag to indicate driver unload. However, the flag is always set in the driver, even for resets. It can cause the firmware to consider driver as unloaded once the PF reset is triggered on all ports of device, which could lead to unexpected results. Add an additional function parameter to functions that shutdown AQ, indicating whether the driver is actually unloading. Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-28ice: Allow different FW API versions based on MAC typePaul Greenwalt
Allow the driver to be compatible with different FW API versions based on the device's MAC type. Currently, E810 is only compatible with one FW API version. Now the driver can be compatible with different FW API versions for both E810 and E830. For example, E810 FW API version is 1.5.0 and E830 is 1.7.0. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-28ice: Check all ice_vsi_rebuild() errors in functionEric Joyner
Check the return value from ice_vsi_rebuild() and prevent the usage of incorrectly configured VSI. Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Eric Joyner <eric.joyner@intel.com> Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-28ice: Add get/set hw address for VFs using devlink commandsKarthik Sundaravel
Changing the MAC address of the VFs is currently unsupported via devlink. Add the function handlers to set and get the HW address for the VFs. Signed-off-by: Karthik Sundaravel <ksundara@redhat.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-28MAINTAINERS: update Intel Ethernet maintainersJesse Brandeburg
Since Jesse has moved to a new role, replace him with a new maintainer to work with Tony on representing Intel networking drivers in the kernel. Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-28netfilter: xt_recent: Lift restrictions on max hitcount valuePhil Sutter
Support tracking of up to 65535 packets per table entry instead of just 255 to better facilitate longer term tracking or higher throughput scenarios. Note how this aligns sizes of struct recent_entry's 'nstamps' and 'index' fields when 'nstamps' was larger before. This is unnecessary as the value of 'nstamps' grows along with that of 'index' after being initialized to 1 (see recent_entry_update()). Its value will thus never exceed that of 'index' and therefore does not need to provide space for larger values. Requested-by: Fabio <pedretti.fabio@gmail.com> Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1745 Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-28selftests: netfilter: nft_queue.sh: add test for disappearing listenerFlorian Westphal
If userspace program exits while the queue its subscribed to has packets those need to be discarded. commit dc21c6cc3d69 ("netfilter: nfnetlink_queue: acquire rcu_read_lock() in instance_destroy_rcu()") fixed a (harmless) rcu splat that could be triggered in this case. Add a test case to cover this. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-28Merge branch 'net-selftests-mirroring-cleanup' into mainDavid S. Miller
Petr Machata says: ==================== selftest: Clean-up and stabilize mirroring tests The mirroring selftests work by sending ICMP traffic between two hosts. Along the way, this traffic is mirrored to a gretap netdevice, and counter taps are then installed strategically along the path of the mirrored traffic to verify the mirroring took place. The problem with this is that besides mirroring the primary traffic, any other service traffic is mirrored as well. At the same time, because the tests need to work in HW-offloaded scenarios, the ability of the device to do arbitrary packet inspection should not be taken for granted. Most tests therefore simply use matchall, one uses flower to match on IP address. As a result, the selftests are noisy. mirror_test() accommodated this noisiness by giving the counters an allowance of several packets. But that only works up to a point, and on busy systems won't be always enough. In this patch set, clean up and stabilize the mirroring selftests. The original intention was to port the tests over to UDP, but the logic of ICMP ends up being so entangled in the mirroring selftests that the changes feel overly invasive. Instead, ICMP is kept, but where possible, we match on ICMP message type, thus filtering out hits by other ICMP messages. Where this is not practical (where the counter tap is put on a device that carries encapsulated packets), switch the counter condition to _at least_ X observed packets. This is less robust, but barely so -- probably the only scenario that this would not catch is something like erroneous packet duplication, which would hopefully get caught by the numerous other tests in this extensive suite. - Patches #1 to #3 clean up parameters at various helpers. - Patches #4 to #6 stabilize the mirroring selftests as described above. - Mirroring tests currently allow testing SW datapath even on HW netdevices by trapping traffic to the SW datapath. This complicates the tests a bit without a good reason: to test SW datapath, just run the selftests on the veth topology. Thus in patch #7, drop support for this dual SW/HW testing. - At this point, some cleanups were either made possible by the previous patches, or were always possible. In patches #8 to #11, realize these cleanups. - In patch #12, fix mlxsw mirror_gre selftest to respect setting TESTS. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28selftests: mlxsw: mirror_gre: Obey TESTSPetr Machata
This test is unusual in that overriding TESTS does not change the tests to be run. Split the individual tests into several functions and invoke them through tests_run() as appropriate. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28selftests: libs: Drop unused functionsPetr Machata
Nothing calls these. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28selftests: libs: Drop slow_path_trap_install()/_uninstall()Petr Machata
These functions are not used anymore. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>