summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/idpf
AgeCommit message (Collapse)Author
10 daysidpf: access ->pp through netmem_desc instead of pageByungchul Park
To eliminate the use of struct page in page pool, the page pool users should use netmem descriptor and APIs instead. Make idpf access ->pp through netmem_desc instead of page. Signed-off-by: Byungchul Park <byungchul@sk.com> Link: https://patch.msgid.link/20250721021835.63939-10-byungchul@sk.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-18idpf: preserve coalescing settings across resetsAhmed Zaki
The IRQ coalescing config currently reside only inside struct idpf_q_vector. However, all idpf_q_vector structs are de-allocated and re-allocated during resets. This leads to user-set coalesce configuration to be lost. Add new fields to struct idpf_vport_user_config_data to save the user settings and re-apply them after reset. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18idpf: add cross timestampingMilena Olech
Add cross timestamp support through virtchnl mailbox messages and directly, through PCIe BAR registers. Cross timestamping assumes that both system time and device clock time values are cached simultaneously, what is triggered by HW. Feature is enabled for both ARM and x86 archs. Signed-off-by: Milena Olech <milena.olech@intel.com> Reviewed-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18idpf: add flow steering supportAhmed Zaki
Use the new virtchnl2 OP codes to communicate with the Control Plane to add flow steering filters. We add the basic functionality for add/delete with TCP/UDP IPv4 only. Support for other OP codes and protocols will be added later. Standard 'ethtool -N|--config-ntuple' should be used, for example: # ethtool -N ens801f0d1 flow-type tcp4 src-ip 10.0.0.1 action 6 to route all IPv4/TCP traffic from IP 10.0.0.1 to queue 6. Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18virtchnl2: add flow steering supportSudheer Mogilappagari
Add opcodes and corresponding message structure to add and delete flow steering rules. Flow steering enables configuration of rules to take an action or subset of actions based on a match criteria. Actions could be redirect to queue, redirect to queue group, drop packet or mark. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Co-developed-by: Dinesh Kumar <dinesh.kumar@intel.com> Signed-off-by: Dinesh Kumar <dinesh.kumar@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18virtchnl2: rename enum virtchnl2_cap_rssAhmed Zaki
The "enum virtchnl2_cap_rss" will be used for negotiating flow steering capabilities. Instead of adding a new enum, rename virtchnl2_cap_rss to virtchnl2_flow_types. Also rename the enum's constants. Flow steering will use this enum in the next patches. Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-17Merge branch 'for-next' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux Tony Nguyen says: ==================== Add RDMA support for Intel IPU E2000 in idpf Tatyana Nikolova says: This idpf patch series is the second part of the staged submission for introducing RDMA RoCEv2 support for the IPU E2000 line of products, referred to as GEN3. To support RDMA GEN3 devices, the idpf driver uses common definitions of the IIDC interface and implements specific device functionality in iidc_rdma_idpf.h. The IPU model can host one or more logical network endpoints called vPorts per PCI function that are flexibly associated with a physical port or an internal communication port. Other features as it pertains to GEN3 devices include: * MMIO learning * RDMA capability negotiation * RDMA vectors discovery between idpf and control plane These patches are split from the submission "Add RDMA support for Intel IPU E2000 (GEN3)" [1]. The patches have been tested on a range of hosts and platforms with a variety of general RDMA applications which include standalone verbs (rping, perftest, etc.), storage and HPC applications. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> [1] https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/ This idpf patch series is the second part of the staged submission for introducing RDMA RoCEv2 support for the IPU E2000 line of products, referred to as GEN3. To support RDMA GEN3 devices, the idpf driver uses common definitions of the IIDC interface and implements specific device functionality in iidc_rdma_idpf.h. The IPU model can host one or more logical network endpoints called vPorts per PCI function that are flexibly associated with a physical port or an internal communication port. Other features as it pertains to GEN3 devices include: * MMIO learning * RDMA capability negotiation * RDMA vectors discovery between idpf and control plane These patches are split from the submission "Add RDMA support for Intel IPU E2000 (GEN3)" [1]. The patches have been tested on a range of hosts and platforms with a variety of general RDMA applications which include standalone verbs (rping, perftest, etc.), storage and HPC applications. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> [1] https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/ IWL reviews: v3: https://lore.kernel.org/all/20250708210554.1662-1-tatyana.e.nikolova@intel.com/ v2: https://lore.kernel.org/all/20250612220002.1120-1-tatyana.e.nikolova@intel.com/ v1 (split from previous series): https://lore.kernel.org/all/20250523170435.668-1-tatyana.e.nikolova@intel.com/ v3: https://lore.kernel.org/all/20250207194931.1569-1-tatyana.e.nikolova@intel.com/ RFC v2: https://lore.kernel.org/all/20240824031924.421-1-tatyana.e.nikolova@intel.com/ RFC: https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/ * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux: idpf: implement get LAN MMIO memory regions idpf: implement IDC vport aux driver MTU change handler idpf: implement remaining IDC RDMA core callbacks and handlers idpf: implement RDMA vport auxiliary dev create, init, and destroy idpf: implement core RDMA auxiliary dev create, init, and destroy idpf: use reserved RDMA vectors from control plane ==================== Link: https://patch.msgid.link/20250714181002.2865694-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-14idpf: implement get LAN MMIO memory regionsJoshua Hay
The RDMA driver needs to map its own MMIO regions for the sake of performance, meaning the IDPF needs to avoid mapping portions of the BAR space. However, to be HW agnostic, the IDPF cannot assume where these are and must avoid mapping hard coded regions as much as possible. The IDPF maps the bare minimum to load and communicate with the control plane, i.e., the mailbox registers and the reset state registers. Because of how and when mailbox register offsets are initialized, it is easier to adjust the existing defines to be relative to the mailbox region starting address. Use a specific mailbox register write function that uses these relative offsets. The reset state register addresses are calculated the same way as for other registers, described below. The IDPF then calls a new virtchnl op to fetch a list of MMIO regions that it should map. The addresses for the registers in these regions are calculated by determining what region the register resides in, adjusting the offset to be relative to that region, and then adding the register's offset to that region's mapped address. If the new virtchnl op is not supported, the IDPF will fallback to mapping the whole bar. However, it will still map them as separate regions outside the mailbox and reset state registers. This way we can use the same logic in both cases to access the MMIO space. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-14idpf: implement IDC vport aux driver MTU change handlerJoshua Hay
The only event an RDMA vport aux driver cares about right now is an MTU change on its underlying vport. Implement and plumb the handler to signal the pre MTU change event and post MTU change events to the RDMA vport aux driver. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-14idpf: implement remaining IDC RDMA core callbacks and handlersJoshua Hay
Implement the idpf_idc_request_reset and idpf_idc_rdma_vc_send_sync callbacks for the rdma core auxiliary driver to issue reset events to the idpf and send (synchronous) virtchnl messages to the control plane respectively. Implement and plumb the reset handler for the opposite flow as well, i.e. when the idpf is resetiing and needs to notify the rdma core auxiliary driver. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-14idpf: implement RDMA vport auxiliary dev create, init, and destroyJoshua Hay
Implement the functions to create, initialize, and destroy an RDMA vport auxiliary device. The vport aux dev creation is dependent on the core aux device to call idpf_idc_vport_dev_ctrl to signal that it is ready for vport aux devices. Implement that core callback to either create and initialize the vport aux dev or deinitialize. RDMA vport aux dev creation is also dependent on the control plane to tell us the vport is RDMA enabled. Add a flag in the create vport message to signal individual vport RDMA capabilities. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-14idpf: implement core RDMA auxiliary dev create, init, and destroyJoshua Hay
Add the initial idpf_idc.c file with the functions to kick off the IDC initialization, create and initialize a core RDMA auxiliary device, and destroy said device. The RDMA core has a dependency on the vports being created by the control plane before it can be initialized. Therefore, once all the vports are up after a hard reset (either during driver load a function level reset), the core RDMA device info will be created. It is populated with the function type (as distinguished by the IDC initialization function pointer), the core idc_ops function points (just stubs for now), the reserved RDMA MSIX table, and various other info the core RDMA auxiliary driver will need. It is then plugged on to the bus. During a function level reset or driver unload, the device will be unplugged from the bus and destroyed. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-14idpf: use reserved RDMA vectors from control planeJoshua Hay
Fetch the number of reserved RDMA vectors from the control plane. Adjust the number of reserved LAN vectors if necessary. Adjust the minimum number of vectors the OS should reserve to include RDMA; and fail if the OS cannot reserve enough vectors for the minimum number of LAN and RDMA vectors required. Create a separate msix table for the reserved RDMA vectors, which will just get handed off to the RDMA core device to do with what it will. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Cross-merge networking fixes after downstream PR (net-6.16-rc5). No conflicts. No adjacent changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-01idpf: convert control queue mutex to a spinlockAhmed Zaki
With VIRTCHNL2_CAP_MACFILTER enabled, the following warning is generated on module load: [ 324.701677] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578 [ 324.701684] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1582, name: NetworkManager [ 324.701689] preempt_count: 201, expected: 0 [ 324.701693] RCU nest depth: 0, expected: 0 [ 324.701697] 2 locks held by NetworkManager/1582: [ 324.701702] #0: ffffffff9f7be770 (rtnl_mutex){....}-{3:3}, at: rtnl_newlink+0x791/0x21e0 [ 324.701730] #1: ff1100216c380368 (_xmit_ETHER){....}-{2:2}, at: __dev_open+0x3f0/0x870 [ 324.701749] Preemption disabled at: [ 324.701752] [<ffffffff9cd23b9d>] __dev_open+0x3dd/0x870 [ 324.701765] CPU: 30 UID: 0 PID: 1582 Comm: NetworkManager Not tainted 6.15.0-rc5+ #2 PREEMPT(voluntary) [ 324.701771] Hardware name: Intel Corporation M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 11/14/2022 [ 324.701774] Call Trace: [ 324.701777] <TASK> [ 324.701779] dump_stack_lvl+0x5d/0x80 [ 324.701788] ? __dev_open+0x3dd/0x870 [ 324.701793] __might_resched.cold+0x1ef/0x23d <..> [ 324.701818] __mutex_lock+0x113/0x1b80 <..> [ 324.701917] idpf_ctlq_clean_sq+0xad/0x4b0 [idpf] [ 324.701935] ? kasan_save_track+0x14/0x30 [ 324.701941] idpf_mb_clean+0x143/0x380 [idpf] <..> [ 324.701991] idpf_send_mb_msg+0x111/0x720 [idpf] [ 324.702009] idpf_vc_xn_exec+0x4cc/0x990 [idpf] [ 324.702021] ? rcu_is_watching+0x12/0xc0 [ 324.702035] idpf_add_del_mac_filters+0x3ed/0xb50 [idpf] <..> [ 324.702122] __hw_addr_sync_dev+0x1cf/0x300 [ 324.702126] ? find_held_lock+0x32/0x90 [ 324.702134] idpf_set_rx_mode+0x317/0x390 [idpf] [ 324.702152] __dev_open+0x3f8/0x870 [ 324.702159] ? __pfx___dev_open+0x10/0x10 [ 324.702174] __dev_change_flags+0x443/0x650 <..> [ 324.702208] netif_change_flags+0x80/0x160 [ 324.702218] do_setlink.isra.0+0x16a0/0x3960 <..> [ 324.702349] rtnl_newlink+0x12fd/0x21e0 The sequence is as follows: rtnl_newlink()-> __dev_change_flags()-> __dev_open()-> dev_set_rx_mode() - > # disables BH and grabs "dev->addr_list_lock" idpf_set_rx_mode() -> # proceed only if VIRTCHNL2_CAP_MACFILTER is ON __dev_uc_sync() -> idpf_add_mac_filter -> idpf_add_del_mac_filters -> idpf_send_mb_msg() -> idpf_mb_clean() -> idpf_ctlq_clean_sq() # mutex_lock(cq_lock) Fix by converting cq_lock to a spinlock. All operations under the new lock are safe except freeing the DMA memory, which may use vunmap(). Fix by requesting a contiguous physical memory for the DMA mapping. Fixes: a251eee62133 ("idpf: add SRIOV support and other ndo_ops") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-01idpf: return 0 size for RSS key if not supportedMichal Swiatkowski
Returning -EOPNOTSUPP from function returning u32 is leading to cast and invalid size value as a result. -EOPNOTSUPP as a size probably will lead to allocation fail. Command: ethtool -x eth0 It is visible on all devices that don't have RSS caps set. [ 136.615917] Call Trace: [ 136.615921] <TASK> [ 136.615927] ? __warn+0x89/0x130 [ 136.615942] ? __alloc_frozen_pages_noprof+0x322/0x330 [ 136.615953] ? report_bug+0x164/0x190 [ 136.615968] ? handle_bug+0x58/0x90 [ 136.615979] ? exc_invalid_op+0x17/0x70 [ 136.615987] ? asm_exc_invalid_op+0x1a/0x20 [ 136.616001] ? rss_prepare_get.constprop.0+0xb9/0x170 [ 136.616016] ? __alloc_frozen_pages_noprof+0x322/0x330 [ 136.616028] __alloc_pages_noprof+0xe/0x20 [ 136.616038] ___kmalloc_large_node+0x80/0x110 [ 136.616072] __kmalloc_large_node_noprof+0x1d/0xa0 [ 136.616081] __kmalloc_noprof+0x32c/0x4c0 [ 136.616098] ? rss_prepare_get.constprop.0+0xb9/0x170 [ 136.616105] rss_prepare_get.constprop.0+0xb9/0x170 [ 136.616114] ethnl_default_doit+0x107/0x3d0 [ 136.616131] genl_family_rcv_msg_doit+0x100/0x160 [ 136.616147] genl_rcv_msg+0x1b8/0x2c0 [ 136.616156] ? __pfx_ethnl_default_doit+0x10/0x10 [ 136.616168] ? __pfx_genl_rcv_msg+0x10/0x10 [ 136.616176] netlink_rcv_skb+0x58/0x110 [ 136.616186] genl_rcv+0x28/0x40 [ 136.616195] netlink_unicast+0x19b/0x290 [ 136.616206] netlink_sendmsg+0x222/0x490 [ 136.616215] __sys_sendto+0x1fd/0x210 [ 136.616233] __x64_sys_sendto+0x24/0x30 [ 136.616242] do_syscall_64+0x82/0x160 [ 136.616252] ? __sys_recvmsg+0x83/0xe0 [ 136.616265] ? syscall_exit_to_user_mode+0x10/0x210 [ 136.616275] ? do_syscall_64+0x8e/0x160 [ 136.616282] ? __count_memcg_events+0xa1/0x130 [ 136.616295] ? count_memcg_events.constprop.0+0x1a/0x30 [ 136.616306] ? handle_mm_fault+0xae/0x2d0 [ 136.616319] ? do_user_addr_fault+0x379/0x670 [ 136.616328] ? clear_bhb_loop+0x45/0xa0 [ 136.616340] ? clear_bhb_loop+0x45/0xa0 [ 136.616349] ? clear_bhb_loop+0x45/0xa0 [ 136.616359] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 136.616369] RIP: 0033:0x7fd30ba7b047 [ 136.616376] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 80 3d bd d5 0c 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 71 c3 55 48 83 ec 30 44 89 4c 24 2c 4c 89 44 [ 136.616381] RSP: 002b:00007ffde1796d68 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 136.616388] RAX: ffffffffffffffda RBX: 000055d7bd89f2a0 RCX: 00007fd30ba7b047 [ 136.616392] RDX: 0000000000000028 RSI: 000055d7bd89f3b0 RDI: 0000000000000003 [ 136.616396] RBP: 00007ffde1796e10 R08: 00007fd30bb4e200 R09: 000000000000000c [ 136.616399] R10: 0000000000000000 R11: 0000000000000202 R12: 000055d7bd89f340 [ 136.616403] R13: 000055d7bd89f3b0 R14: 000055d78943f200 R15: 0000000000000000 Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks") Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16libeth: convert to netmemAlexander Lobakin
Back when the libeth Rx core was initially written, devmem was a draft and netmem_ref didn't exist in the mainline. Now that it's here, make libeth MP-agnostic before introducing any new code or any new library users. When it's known that the created PP/FQ is for header buffers, use faster "unsafe" underscored netmem <--> virt accessors as netmem_is_net_iov() is always false in that case, but consumes some cycles (bit test + true branch). Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-30idpf: avoid mailbox timeout delays during resetEmil Tantilov
Mailbox operations are not possible while the driver is in reset. Operations that require MBX exchange with the control plane will result in long delays if executed while a reset is in progress: ethtool -L <inf> combined 8& echo 1 > /sys/class/net/<inf>/device/reset idpf 0000:83:00.0: HW reset detected idpf 0000:83:00.0: Device HW Reset initiated idpf 0000:83:00.0: Transaction timed-out (op:504 cookie:be00 vc_op:504 salt:be timeout:2000ms) idpf 0000:83:00.0: Transaction timed-out (op:508 cookie:bf00 vc_op:508 salt:bf timeout:2000ms) idpf 0000:83:00.0: Transaction timed-out (op:512 cookie:c000 vc_op:512 salt:c0 timeout:2000ms) idpf 0000:83:00.0: Transaction timed-out (op:510 cookie:c100 vc_op:510 salt:c1 timeout:2000ms) idpf 0000:83:00.0: Transaction timed-out (op:509 cookie:c200 vc_op:509 salt:c2 timeout:60000ms) idpf 0000:83:00.0: Transaction timed-out (op:509 cookie:c300 vc_op:509 salt:c3 timeout:60000ms) idpf 0000:83:00.0: Transaction timed-out (op:505 cookie:c400 vc_op:505 salt:c4 timeout:60000ms) idpf 0000:83:00.0: Failed to configure queues for vport 0, -62 Disable mailbox communication in case of a reset, unless it's done during a driver load, where the virtchnl operations are needed to configure the device. Fixes: 8077c727561aa ("idpf: add controlq init and reset checks") Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-30idpf: fix a race in txq wakeupBrian Vazquez
Add a helper function to correctly handle the lockless synchronization when the sender needs to block. The paradigm is if (no_resources()) { stop_queue(); barrier(); if (!no_resources()) restart_queue(); } netif_subqueue_maybe_stop already handles the paradigm correctly, but the code split the check for resources in three parts, the first one (descriptors) followed the protocol, but the other two (completions and tx_buf) were only doing the first part and so race prone. Luckily netif_subqueue_maybe_stop macro already allows you to use a function to evaluate the start/stop conditions so the fix only requires the right helper function to evaluate all the conditions at once. The patch removes idpf_tx_maybe_stop_common since it's no longer needed and instead adjusts separately the conditions for singleq and splitq. Note that idpf_tx_buf_hw_update doesn't need to check for resources since that will be covered in idpf_tx_splitq_frame. To reproduce: Reduce the threshold for pending completions to increase the chances of hitting this pause by changing your kernel: drivers/net/ethernet/intel/idpf/idpf_txrx.h -#define IDPF_TX_COMPLQ_OVERFLOW_THRESH(txcq) ((txcq)->desc_count >> 1) +#define IDPF_TX_COMPLQ_OVERFLOW_THRESH(txcq) ((txcq)->desc_count >> 4) Use pktgen to force the host to push small pkts very aggressively: ./pktgen_sample02_multiqueue.sh -i eth1 -s 100 -6 -d $IP -m $MAC \ -p 10000-10000 -t 16 -n 0 -v -x -c 64 Fixes: 6818c4d5b3c2 ("idpf: add splitq start_xmit") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Josh Hay <joshua.a.hay@intel.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Luigi Rizzo <lrizzo@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.15-rc8). Conflicts: 80f2ab46c2ee ("irdma: free iwdev->rf after removing MSI-X") 4bcc063939a5 ("ice, irdma: fix an off by one in error handling code") c24a65b6a27c ("iidc/ice/irdma: Update IDC to support multiple consumers") https://lore.kernel.org/20250513130630.280ee6c5@canb.auug.org.au No extra adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21idpf: fix idpf_vport_splitq_napi_poll()Eric Dumazet
idpf_vport_splitq_napi_poll() can incorrectly return @budget after napi_complete_done() has been called. This violates NAPI rules, because after napi_complete_done(), current thread lost napi ownership. Move the test against POLL_MODE before the napi_complete_done(). Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support") Reported-by: Peter Newman <peternewman@google.com> Closes: https://lore.kernel.org/netdev/20250520121908.1805732-1-edumazet@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Joshua Hay <joshua.a.hay@intel.com> Cc: Alan Brady <alan.brady@intel.com> Cc: Madhu Chittim <madhu.chittim@intel.com> Cc: Phani Burra <phani.r.burra@intel.com> Cc: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Link: https://patch.msgid.link/20250520124030.1983936-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-19idpf: fix null-ptr-deref in idpf_features_checkPavan Kumar Linga
idpf_features_check is used to validate the TX packet. skb header length is compared with the hardware supported value received from the device control plane. The value is stored in the adapter structure and to access it, vport pointer is used. During reset all the vports are released and the vport pointer that the netdev private structure points to is NULL. To avoid null-ptr-deref, store the max header length value in netdev private structure. This also helps to cache the value and avoid accessing adapter pointer in hot path. BUG: kernel NULL pointer dereference, address: 0000000000000068 ... RIP: 0010:idpf_features_check+0x6d/0xe0 [idpf] Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x154/0x520 ? exc_page_fault+0x76/0x190 ? asm_exc_page_fault+0x26/0x30 ? idpf_features_check+0x6d/0xe0 [idpf] netif_skb_features+0x88/0x310 validate_xmit_skb+0x2a/0x2b0 validate_xmit_skb_list+0x4c/0x70 sch_direct_xmit+0x19d/0x3a0 __dev_queue_xmit+0xb74/0xe70 ... Fixes: a251eee62133 ("idpf: add SRIOV support and other ndo_ops") Reviewed-by: Madhu Chititm <madhu.chittim@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add support for Rx timestampingMilena Olech
Add Rx timestamp function when the Rx timestamp value is read directly from the Rx descriptor. In order to extend the Rx timestamp value to 64 bit in hot path, the PHC time is cached in the receive groups. Add supported Rx timestamp modes. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: YiFei Zhu <zhuyifei@google.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add Tx timestamp flowsMilena Olech
Add functions to request Tx timestamp for the PTP packets, read the Tx timestamp when the completion tag for that packet is being received, extend the Tx timestamp value and set the supported timestamping modes. Tx timestamp is requested for the PTP packets by setting a TSYN bit and index value in the Tx context descriptor. The driver assumption is that the Tx timestamp value is ready to be read when the completion tag is received. Then the driver schedules delayed work and the Tx timestamp value read is requested through virtchnl message. At the end, the Tx timestamp value is extended to 64-bit and provided back to the skb. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Co-developed-by: Josh Hay <joshua.a.hay@intel.com> Signed-off-by: Josh Hay <joshua.a.hay@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add Tx timestamp capabilities negotiationMilena Olech
Tx timestamp capabilities are negotiated for the uplink Vport. Driver receives information about the number of available Tx timestamp latches, the size of Tx timestamp value and the set of indexes used for Tx timestamping. Add function to get the Tx timestamp capabilities and parse the uplink vport flag. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Co-developed-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add PTP clock configurationMilena Olech
PTP clock configuration operations - set time, adjust time and adjust frequency are required to control the clock and maintain synchronization process. Extend get PTP capabilities function to request for the clock adjustments and add functions to enable these actions using dedicated virtchnl messages. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add mailbox access to read PTP clock timeMilena Olech
When the access to read PTP clock is specified as mailbox, the driver needs to send virtchnl message to perform PTP actions. Message is sent using idpf_mbq_opc_send_msg_to_peer_drv mailbox opcode, with the parameters received during PTP capabilities negotiation. Add functions to recognize PTP messages, move them to dedicated secondary mailbox, read the PTP clock time and cross timestamp using mailbox messages. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: negotiate PTP capabilities and get PTP clockMilena Olech
PTP capabilities are negotiated using virtchnl command. Add get capabilities function, direct access to read the PTP clock. Set initial PTP capabilities exposed to the stack. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Willem de Bruijn <willemb@google.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: move virtchnl structures to the header fileMilena Olech
Move virtchnl structures to the header file to expose them for the PTP virtchnl file. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16virtchnl: add PTP virtchnl definitionsMilena Olech
PTP capabilities are negotiated using virtchnl commands. There are two available modes of the PTP support: direct and mailbox. When the direct access to PTP resources is negotiated, virtchnl messages returns a set of registers that allow read/write directly. When the mailbox access to PTP resources is negotiated, virtchnl messages are used to access PTP clock and to read the timestamp values. Virtchnl API covers both modes and exposes a set of PTP capabilities. Using virtchnl API, the driver recognizes also HW abilities - maximum adjustment of the clock and the basic increment value. Additionally, API allows to configure the secondary mailbox, dedicated exclusively for PTP purposes. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add initial PTP supportMilena Olech
PTP feature is supported if the VIRTCHNL2_CAP_PTP is negotiated during the capabilities recognition. Initial PTP support includes PTP initialization and registration of the clock. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: change the method for mailbox workqueue allocationMilena Olech
Since workqueues are created per CPU, the works scheduled to this workqueues are run on the CPU they were assigned. It may result in overloaded CPU that is not able to handle virtchnl messages in relatively short time. Allocating workqueue with WQ_UNBOUND and WQ_HIGHPRI flags allows scheduler to queue virtchl messages on less loaded CPUs, what eliminates delays. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-01Merge branch '1GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-04-29 (igb, igc, ixgbe, idpf) For igb: Kurt Kanzenbach adds linking of IRQs and queues to NAPI instances and adds persistent NAPI config. Lastly, he removes undesired IRQs that occur while busy polling. For igc: Kurt Kanzenbach switches the Tx mode for MQPRIO offload to harmonize the current implementation with TAPRIO. For ixgbe: Jedrzej adds separate ethtool ops for E610 devices to account for device differences. Slawomir adds devlink region support for E610 devices. For idpf: Mateusz assigns and utilizes the ptype field out of libeth_rqe_info. Michal removes unreachable code. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: idpf: remove unreachable code from setting mailbox idpf: assign extracted ptype to struct libeth_rqe_info field ixgbe: devlink: add devlink region support for E610 ixgbe: add E610 .set_phys_id() callback implementation ixgbe: apply different rules for setting FC on E610 ixgbe: add support for ACPI WOL for E610 ixgbe: create E610 specific ethtool_ops structure igc: Change Tx mode for MQPRIO offloading igc: Limit netdev_tc calls to MQPRIO igb: Get rid of spurious interrupts igb: Add support for persistent NAPI config igb: Link queues to NAPI instances igb: Link IRQs to NAPI instances ==================== Link: https://patch.msgid.link/20250429234651.3982025-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29idpf: remove unreachable code from setting mailboxMichal Swiatkowski
Remove code that isn't reached. There is no need to check for adapter->req_vec_chunks, because if it isn't set idpf_set_mb_vec_id() won't be called. Only one path when idpf_set_mb_vec_id() is called: idpf_intr_req() -> idpf_send_alloc_vectors_msg() -> adapter->req_vec_chunk is allocated here, otherwise an error is returned and idpf_intr_req() exits with an error. The idpf_set_mb_vec_id() becomes one-liner and it is called only once. Remove it and set mailbox vector index directly. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29idpf: assign extracted ptype to struct libeth_rqe_info fieldMateusz Polchlopek
Assign the ptype extracted from qword to the ptype field of struct libeth_rqe_info. Remove the now excess ptype param of idpf_rx_singleq_extract_fields(), idpf_rx_singleq_extract_base_fields() and idpf_rx_singleq_extract_flex_fields(). Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29idpf: protect shutdown from resetLarysa Zaremba
Before the referenced commit, the shutdown just called idpf_remove(), this way IDPF_REMOVE_IN_PROG was protecting us from the serv_task rescheduling reset. Without this flag set the shutdown process is vulnerable to HW reset or any other triggering conditions (such as default mailbox being destroyed). When one of conditions checked in idpf_service_task becomes true, vc_event_task can be rescheduled during shutdown, this leads to accessing freed memory e.g. idpf_req_rel_vector_indexes() trying to read vport->q_vector_idxs. This in turn causes the system to become defunct during e.g. systemctl kexec. Considering using IDPF_REMOVE_IN_PROG would lead to more heavy shutdown process, instead just cancel the serv_task before cancelling adapter->serv_task before cancelling adapter->vc_event_task to ensure that reset will not be scheduled while we are doing a shutdown. Fixes: 4c9106f4906a ("idpf: fix adapter NULL pointer dereference on reboot") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29idpf: fix potential memory leak on kcalloc() failureMichal Swiatkowski
In case of failing on rss_data->rss_key allocation the function is freeing vport without freeing earlier allocated q_vector_idxs. Fix it. Move from freeing in error branch to goto scheme. Fixes: d4d558718266 ("idpf: initialize interrupts and enable vport") Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Suggested-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-28idpf: fix offloads support for encapsulated packetsMadhu Chittim
Split offloads into csum, tso and other offloads so that tunneled packets do not by default have all the offloads enabled. Stateless offloads for encapsulated packets are not yet supported in firmware/software but in the driver we were setting the features same as non encapsulated features. Fixed naming to clarify CSUM bits are being checked for Tx. Inherit netdev features to VLAN interfaces as well. Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration") Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Tested-by: Zachary Goldstein <zachmgoldstein@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250425222636.3188441-4-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02idpf: fix adapter NULL pointer dereference on rebootEmil Tantilov
With SRIOV enabled, idpf ends up calling into idpf_remove() twice. First via idpf_shutdown() and then again when idpf_remove() calls into sriov_disable(), because the VF devices use the idpf driver, hence the same remove routine. When that happens, it is possible for the adapter to be NULL from the first call to idpf_remove(), leading to a NULL pointer dereference. echo 1 > /sys/class/net/<netif>/device/sriov_numvfs reboot BUG: kernel NULL pointer dereference, address: 0000000000000020 ... RIP: 0010:idpf_remove+0x22/0x1f0 [idpf] ... ? idpf_remove+0x22/0x1f0 [idpf] ? idpf_remove+0x1e4/0x1f0 [idpf] pci_device_remove+0x3f/0xb0 device_release_driver_internal+0x19f/0x200 pci_stop_bus_device+0x6d/0x90 pci_stop_and_remove_bus_device+0x12/0x20 pci_iov_remove_virtfn+0xbe/0x120 sriov_disable+0x34/0xe0 idpf_sriov_configure+0x58/0x140 [idpf] idpf_remove+0x1b9/0x1f0 [idpf] idpf_shutdown+0x12/0x30 [idpf] pci_device_shutdown+0x35/0x60 device_shutdown+0x156/0x200 ... Replace the direct idpf_remove() call in idpf_shutdown() with idpf_vc_core_deinit() and idpf_deinit_dflt_mbx(), which perform the bulk of the cleanup, such as stopping the init task, freeing IRQs, destroying the vports and freeing the mailbox. This avoids the calls to sriov_disable() in addition to a small netdev cleanup, and destroying workqueues, which don't seem to be required on shutdown. Reported-by: Yuying Ma <yuma@redhat.com> Fixes: e850efed5e15 ("idpf: add module register and probe functionality") Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Merge in late fixes to prepare for the 6.15 net-next PR. No conflicts, adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.c 919f9f497dbc ("eth: bnxt: fix out-of-range access of vnic_info array") fe96d717d38e ("bnxt_en: Extend queue stop/start for TX rings") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-18idpf: check error for register_netdev() on initEmil Tantilov
Current init logic ignores the error code from register_netdev(), which will cause WARN_ON() on attempt to unregister it, if there was one, and there is no info for the user that the creation of the netdev failed. WARNING: CPU: 89 PID: 6902 at net/core/dev.c:11512 unregister_netdevice_many_notify+0x211/0x1a10 ... [ 3707.563641] unregister_netdev+0x1c/0x30 [ 3707.563656] idpf_vport_dealloc+0x5cf/0xce0 [idpf] [ 3707.563684] idpf_deinit_task+0xef/0x160 [idpf] [ 3707.563712] idpf_vc_core_deinit+0x84/0x320 [idpf] [ 3707.563739] idpf_remove+0xbf/0x780 [idpf] [ 3707.563769] pci_device_remove+0xab/0x1e0 [ 3707.563786] device_release_driver_internal+0x371/0x530 [ 3707.563803] driver_detach+0xbf/0x180 [ 3707.563816] bus_remove_driver+0x11b/0x2a0 [ 3707.563829] pci_unregister_driver+0x2a/0x250 Introduce an error check and log the vport number and error code. On removal make sure to check VPORT_REG_NETDEV flag prior to calling unregister and free on the netdev. Add local variables for idx, vport_config and netdev for readability. Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration") Suggested-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc5). Conflicts: drivers/net/ethernet/cadence/macb_main.c fa52f15c745c ("net: cadence: macb: Synchronize stats calculations") 75696dd0fd72 ("net: cadence: macb: Convert to get_stats64") https://lore.kernel.org/20250224125848.68ee63e5@canb.auug.org.au Adjacent changes: drivers/net/ethernet/intel/ice/ice_sriov.c 79990cf5e7ad ("ice: Fix deinitializing VF in error path") a203163274a4 ("ice: simplify VF MSI-X managing") net/ipv4/tcp.c 18912c520674 ("tcp: devmem: don't write truncated dmabuf CMSGs to userspace") 297d389e9e5b ("net: prefix devmem specific helpers") net/mptcp/subflow.c 8668860b0ad3 ("mptcp: reset when MPTCP opts are dropped after join") c3349a22c200 ("mptcp: consolidate subflow cleanup") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27idpf: fix checksums set in idpf_rx_rsc()Eric Dumazet
idpf_rx_rsc() uses skb_transport_offset(skb) while the transport header is not set yet. This triggers the following warning for CONFIG_DEBUG_NET=y builds. DEBUG_NET_WARN_ON_ONCE(!skb_transport_header_was_set(skb)) [ 69.261620] WARNING: CPU: 7 PID: 0 at ./include/linux/skbuff.h:3020 idpf_vport_splitq_napi_poll (include/linux/skbuff.h:3020) idpf [ 69.261629] Modules linked in: vfat fat dummy bridge intel_uncore_frequency_tpmi intel_uncore_frequency_common intel_vsec_tpmi idpf intel_vsec cdc_ncm cdc_eem cdc_ether usbnet mii xhci_pci xhci_hcd ehci_pci ehci_hcd libeth [ 69.261644] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Tainted: G S W 6.14.0-smp-DEV #1697 [ 69.261648] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN [ 69.261650] RIP: 0010:idpf_vport_splitq_napi_poll (include/linux/skbuff.h:3020) idpf [ 69.261677] ? __warn (kernel/panic.c:242 kernel/panic.c:748) [ 69.261682] ? idpf_vport_splitq_napi_poll (include/linux/skbuff.h:3020) idpf [ 69.261687] ? report_bug (lib/bug.c:?) [ 69.261690] ? handle_bug (arch/x86/kernel/traps.c:285) [ 69.261694] ? exc_invalid_op (arch/x86/kernel/traps.c:309) [ 69.261697] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621) [ 69.261700] ? __pfx_idpf_vport_splitq_napi_poll (drivers/net/ethernet/intel/idpf/idpf_txrx.c:4011) idpf [ 69.261704] ? idpf_vport_splitq_napi_poll (include/linux/skbuff.h:3020) idpf [ 69.261708] ? idpf_vport_splitq_napi_poll (drivers/net/ethernet/intel/idpf/idpf_txrx.c:3072) idpf [ 69.261712] __napi_poll (net/core/dev.c:7194) [ 69.261716] net_rx_action (net/core/dev.c:7265) [ 69.261718] ? __qdisc_run (net/sched/sch_generic.c:293) [ 69.261721] ? sched_clock (arch/x86/include/asm/preempt.h:84 arch/x86/kernel/tsc.c:288) [ 69.261726] handle_softirqs (kernel/softirq.c:561) Fixes: 3a8845af66edb ("idpf: add RX splitq napi poll support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alan Brady <alan.brady@intel.com> Cc: Joshua Hay <joshua.a.hay@intel.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20250226221253.1927782-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26idpf: use napi's irq affinityAhmed Zaki
Delete the driver CPU affinity info and use the core's napi config instead. Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Link: https://patch.msgid.link/20250224232228.990783-6-ahmed.zaki@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14libeth: move idpf_rx_csum_decoded and idpf_rx_extractedMateusz Polchlopek
Structs idpf_rx_csum_decoded and idpf_rx_extracted are used both in idpf and iavf Intel drivers. Change the prefix from idpf_* to libeth_* and move mentioned structs to libeth's rx.h header file. Adjust usage in idpf driver. Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-11idpf: call set_real_num_queues in idpf_openJoshua Hay
On initial driver load, alloc_etherdev_mqs is called with whatever max queue values are provided by the control plane. However, if the driver is loaded on a system where num_online_cpus() returns less than the max queues, the netdev will think there are more queues than are actually available. Only num_online_cpus() will be allocated, but skb_get_queue_mapping(skb) could possibly return an index beyond the range of allocated queues. Consequently, the packet is silently dropped and it appears as if TX is broken. Set the real number of queues during open so the netdev knows how many queues will be allocated. Fixes: 1c325aac10a8 ("idpf: configure resources for TX queues") Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-11idpf: record rx queue in skb for RSC packetsSridhar Samudrala
Move the call to skb_record_rx_queue in idpf_rx_process_skb_fields() so that RX queue is recorded for RSC packets too. Fixes: 90912f9f4f2d ("idpf: convert header split mode to libeth + napi_build_skb()") Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-11idpf: fix handling rsc packet with a single segmentSridhar Samudrala
Handle rsc packet with a single segment same as a multi segment rsc packet so that CHECKSUM_PARTIAL is set in the skb->ip_summed field. The current code is passing CHECKSUM_NONE resulting in TCP GRO layer doing checksum in SW and hiding the issue. This will fail when using dmabufs as payload buffers as skb frag would be unreadable. Fixes: 3a8845af66ed ("idpf: add RX splitq napi poll support") Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-24idpf: add more info during virtchnl transaction timeout/salt mismatchManoj Vishwanathan
Add more information related to the transaction like cookie, vc_op, salt when transaction times out and include similar information when transaction salt does not match. Info output for transaction timeout: ------------------- (op:5015 cookie:45fe vc_op:5015 salt:45 timeout:60000ms) ------------------- before it was: ------------------- (op 5015, 60000ms) ------------------- Signed-off-by: Manoj Vishwanathan <manojvishy@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-24idpf: convert workqueues to unboundMarco Leogrande
When a workqueue is created with `WQ_UNBOUND`, its work items are served by special worker-pools, whose host workers are not bound to any specific CPU. In the default configuration (i.e. when `queue_delayed_work` and friends do not specify which CPU to run the work item on), `WQ_UNBOUND` allows the work item to be executed on any CPU in the same node of the CPU it was enqueued on. While this solution potentially sacrifices locality, it avoids contention with other processes that might dominate the CPU time of the processor the work item was scheduled on. This is not just a theoretical problem: in a particular scenario misconfigured process was hogging most of the time from CPU0, leaving less than 0.5% of its CPU time to the kworker. The IDPF workqueues that were using the kworker on CPU0 suffered large completion delays as a result, causing performance degradation, timeouts and eventual system crash. Tested: * I have also run a manual test to gauge the performance improvement. The test consists of an antagonist process (`./stress --cpu 2`) consuming as much of CPU 0 as possible. This process is run under `taskset 01` to bind it to CPU0, and its priority is changed with `chrt -pQ 9900 10000 ${pid}` and `renice -n -20 ${pid}` after start. Then, the IDPF driver is forced to prefer CPU0 by editing all calls to `queue_delayed_work`, `mod_delayed_work`, etc... to use CPU 0. Finally, `ktraces` for the workqueue events are collected. Without the current patch, the antagonist process can force arbitrary delays between `workqueue_queue_work` and `workqueue_execute_start`, that in my tests were as high as `30ms`. With the current patch applied, the workqueue can be migrated to another unloaded CPU in the same node, and, keeping everything else equal, the maximum delay I could see was `6us`. Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration") Signed-off-by: Marco Leogrande <leogrande@google.com> Signed-off-by: Manoj Vishwanathan <manojvishy@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>