summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/ice
AgeCommit message (Collapse)Author
8 daysMerge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== libie: commonize adminq structure Michal Swiatkowski says: It is a prework to allow reusing some specific Intel code (eq. fwlog). Move common *_aq_desc structure to libie header and changing it in ice, ixgbe, i40e and iavf. Only generic adminq commands can be easily moved to common header, as rest is slightly different. Format remains the same. It will be better to correctly move it when it will be needed to commonize other part of the code. Move *_aq_str() to new libie module (libie_adminq) and use it across drivers. The functions are exactly the same in each driver. Some more adminq helpers/functions can be moved to libie_adminq when needed. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: i40e: use libie_aq_str iavf: use libie_aq_str ice: use libie_aq_str libie: add adminq helper for converting err to str iavf: use libie adminq descriptors i40e: use libie adminq descriptors ixgbe: use libie adminq descriptors ice, libie: move generic adminq descriptors to lib ==================== Link: https://patch.msgid.link/20250724182826.3758850-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysnet: Fix typosBjorn Helgaas
Fix typos in comments and error messages. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250723201528.2908218-1-helgaas@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.16-rc8). Conflicts: drivers/net/ethernet/microsoft/mana/gdma_main.c 9669ddda18fb ("net: mana: Fix warnings for missing export.h header inclusion") 755391121038 ("net: mana: Allocate MSI-X vectors dynamically") https://lore.kernel.org/20250711130752.23023d98@canb.auug.org.au Adjacent changes: drivers/net/ethernet/ti/icssg/icssg_prueth.h 6e86fb73de0f ("net: ti: icssg-prueth: Fix buffer allocation for ICSSG") ffe8a4909176 ("net: ti: icssg-prueth: Read firmware-names from device tree") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysice: use libie_aq_strMichal Swiatkowski
Simple: s/ice_aq_str/libie_aq_str Add libie_aminq module in ice Kconfig. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
9 daysi40e: use libie adminq descriptorsMichal Swiatkowski
Use libie_aq_desc instead of i40e_aq_desc. Do needed changes to allow clean build. Get version descriptor is a little less detailed on i40e. To not mess up with shifting or union inside libie desc use get version descriptor from i40e. Move additional caps for i40e to libie. Fix RCT in declaration that is using libie_aq_desc; Use libie_aq_raw() wherever it can be used. The libie aq error is extended, cover it in ice driver just to clean build. In next patches the libie code for that will be used in each of intel driver. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
9 daysice, libie: move generic adminq descriptors to libMichal Swiatkowski
The descriptor structure is the same in ice, ixgbe and i40e. Move it to common libie header to use it across different driver. Leave device specific adminq commands in separate folders. This lead to a change that need to be done in filling/getting descriptor: - previous: struct specific_desc *cmd; cmd = &desc.params.specific_desc; - now: struct specific_desc *cmd; cmd = libie_aq_raw(&desc); Do this changes across the driver to allow clean build. The casting only have to be done in case of specific descriptors, for generic one union can still be used. Changes beside code moving: - change ICE_ prefix to LIBIE_ prefix (ice_ and libie_ too) - remove shift variables not otherwise needed (in libie_aq_flags) - fill/get descriptor data based on desc.params.raw whenever the descriptor isn't defined in libie - move defines from the libie_aq_sth structure outside - add libie_aq_raw helper and use it instead of explicit casting Reviewed by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
12 daysice: Fix a null pointer dereference in ice_copy_and_init_pkg()Haoxiang Li
Add check for the return value of devm_kmemdup() to prevent potential null pointer dereference. Fixes: c76488109616 ("ice: Implement Dynamic Device Personalization (DDP) download") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18ice: breakout common LAG code into helpersDave Ertman
In the VF handling code, parts of the code for lag can be broken out into helper functions to reduce code duplication. Break this code out into helper functions Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18ice: convert ice_add_prof() to bitmapJesse Brandeburg
Previously the ice_add_prof() took an array of u8 and looped over it with for_each_set_bit(), examining each 8 bit value as a bitmap. This was just hard to understand and unnecessary, and was triggering undefined behavior sanitizers with unaligned accesses within bitmap fields (on our internal tools/builds). Since the @ptype being passed in was already declared as a bitmap, refactor this to use native types with the advantage of simplifying the code to use a single loop. Co-developed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> CC: Jesse Brandeburg <jbrandeburg@cloudflare.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18ice: add E835 device IDsDawid Osuchowski
E835 is an enhanced version of the E830. It continues to use the same set of commands, registers and interfaces as other devices in the 800 Series. Following device IDs are added: - 0x1248: Intel(R) Ethernet Controller E835-CC for backplane - 0x1249: Intel(R) Ethernet Controller E835-CC for QSFP - 0x124A: Intel(R) Ethernet Controller E835-CC for SFP - 0x1261: Intel(R) Ethernet Controller E835-C for backplane - 0x1262: Intel(R) Ethernet Controller E835-C for QSFP - 0x1263: Intel(R) Ethernet Controller E835-C for SFP - 0x1265: Intel(R) Ethernet Controller E835-L for backplane - 0x1266: Intel(R) Ethernet Controller E835-L for QSFP - 0x1267: Intel(R) Ethernet Controller E835-L for SFP Reviewed-by: Konrad Knitter <konrad.knitter@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18ice: add 40G speed to Admin Command GET PORT OPTIONAleksandr Loktionov
Introduce the ICE_AQC_PORT_OPT_MAX_LANE_40G constant and update the code to process this new option in both the devlink and the Admin Queue Command GET PORT OPTION (opcode 0x06EA) message, similar to existing constants like ICE_AQC_PORT_OPT_MAX_LANE_50G, ICE_AQC_PORT_OPT_MAX_LANE_100G, and so on. This feature allows the driver to correctly report configuration options for 2x40G on E823 and other cards in the future via devlink. Example command: devlink port split pci/0000:01:00.0/0 count 2 Example dmesg: ice 0000:01:00.0: Available port split options and max port speeds (Gbps): ice 0000:01:00.0: Status Split Quad 0 Quad 1 ice 0000:01:00.0: count L0 L1 L2 L3 L4 L5 L6 L7 ice 0000:01:00.0: 2 40 - - - 40 - - - ice 0000:01:00.0: 2 50 - 50 - - - - - ice 0000:01:00.0: 4 25 25 25 25 - - - - ice 0000:01:00.0: 4 25 25 - - 25 25 - - ice 0000:01:00.0: Active 8 10 10 10 10 10 10 10 10 ice 0000:01:00.0: 1 100 - - - - - - - Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.16-rc7). Conflicts: Documentation/netlink/specs/ovpn.yaml 880d43ca9aa4 ("netlink: specs: clean up spaces in brackets") af52020fc599 ("ovpn: reject unexpected netlink attributes") drivers/net/phy/phy_device.c a44312d58e78 ("net: phy: Don't register LEDs for genphy") f0f2b992d818 ("net: phy: Don't register LEDs for genphy") https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org drivers/net/wireless/intel/iwlwifi/fw/regulatory.c drivers/net/wireless/intel/iwlwifi/mld/regulatory.c 5fde0fcbd760 ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap") ea045a0de3b9 ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware") net/ipv6/mcast.c ae3264a25a46 ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()") a8594c956cc9 ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()") https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15ice: check correct pointer in fwlog debugfsMichal Swiatkowski
pf->ice_debugfs_pf_fwlog should be checked for an error here. Fixes: 96a9a9341cda ("ice: configure FW logging") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-15ice: add NULL check in eswitch lag checkDave Ertman
The function ice_lag_is_switchdev_running() is being called from outside of the LAG event handler code. This results in the lag->upper_netdev being NULL sometimes. To avoid a NULL-pointer dereference, there needs to be a check before it is dereferenced. Fixes: 776fe19953b0 ("ice: block default rule setting on LAG interface") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10ice: introduce ice_get_vf_by_dev() wrapperJacob Keller
The ice_get_vf_by_id() function is used to obtain a reference to a VF structure based on its ID. The ice_sriov_set_msix_vec_count() function needs to get a VF reference starting from the VF PCI device, and uses pci_iov_vf_id() to get the VF ID. This pattern is currently uncommon in the ice driver. However, the live migration module will introduce many more such locations. Add a helper wrapper ice_get_vf_by_dev() which takes the VF PCI device and calls ice_get_vf_by_id() using pci_iov_vf_id() to get the VF ID. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10ice: avoid rebuilding if MSI-X vector count is unchangedJacob Keller
Commit 05c16687e0cc ("ice: set MSI-X vector count on VF") added support to change the vector count for VFs as part of ice_sriov_set_msix_vec_count(). This function modifies and rebuilds the target VF with the requested number of MSI-X vectors. Future support for live migration will add a call to ice_sriov_set_msix_vec_count() to ensure that a migrated VF has the proper MSI-X vector count. In most cases, this request will be to set the MSI-X vector count to its current value. In that case, no work is necessary. Rather than requiring the caller to check this, update the function to check and exit early if the vector count is already at the requested value. This avoids an unnecessary VF rebuild. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10ice: use pci_iov_vf_id() to get VF IDJacob Keller
The ice_sriov_set_msix_vec_count() obtains the VF device ID in a strange way by iterating over the possible VF IDs and calling pci_iov_virtfn_devfn to calculate the device and function combos and compare them to the pdev->devfn. This is unnecessary. The pci_iov_vf_id() helper already exists which does the reverse calculation of pci_iov_virtfn_devfn(), which is much simpler and avoids the loop construction. Use this instead. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10ice: expose VF functions used by live migrationJacob Keller
The live migration process will require configuring the target VF with the data provided from the source host. A few helper functions in ice_sriov.c and ice_virtchnl.c will be needed for this process, but are currently static. Expose these functions in their respective headers so that the live migration module can use them during the migration process. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10ice: move ice_vsi_update_l2tsel to ice_lib.cJacob Keller
A future change is going to need to call ice_vsi_update_l2tsel from a new context outside of ice_virtchnl.c Since this function deals with a generic VSI, move it into ice_lib.c to enable calling it from other places in the ice driver. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10ice: save RSS hash configuration for migrationJacob Keller
The VF can program the RSS hash configuration over virtchnl. It does this by sending a u64 bitmask which represents the current hash configuration. It is not trivial to reverse the hardware configuration back to this hash set for migration. Instead, save the value to the ice_vf structure when its modified by the VF. The rss_hashcfg value is an 8-byte field. Make room for it in ice_vf by re-arranging some of the existing fields. There is a 4-byte gap after the first_vector_idx, and a 4-byte gap between max_tx_rate and vf_states. Move first_vector_idx into the later 4-byte gap, creating an 8 byte area where rss_hashcfg can be placed. Also move the num_msix field near min_tx_rate, filling 2 bytes of a 3 byte hole. The end result of these changes enables placing the rss_hashcfg field into the structure while also saving 8 bytes in size. It looks like there are a handful of more possible cleanups to reduce the size even further, but those have been left as a future cleanup. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10ice: add functions to get and set Tx queue contextJacob Keller
The live migration driver will need to save and restore the Tx queue context state from the hardware registers. This state contains both static fields which do not change during Tx traffic as well as dynamic fields which may change during Tx traffic. Unlike the Rx context, the Tx queue context is accessed indirectly from GLCOMM_QTX_CNTX_CTL and GLCOMM_QTX_CNTX_DATA registers. These registers are shared by multiple PFs on the same PCIe card. Multiple PFs cannot safely access the registers simultaneously, and there is no hardware semaphore or logic to control access. To handle this, introduce the txq_ctx_lock to the ice_adapter structure. This is similar to the ptp_gltsyn_time_lock. All PFs on the same adapter share this structure, and use it to serialize access to the registers to prevent error. Add a new functions to get and set the Tx queue context through the GLCOMM_QTX_CNTX_CTL interface. The hardware context values are stored in the registers using the same packed format as the Admin Queue buffer. The hardware buffer is 40 bytes wide, as it contains an additional 18 bytes of internal state not sent with the Admin Queue buffer. For this reason, a separate typedef and packing function must be used. We can share the same packed fields definitions because we never need to unpack the internal state. This is preferred, as it ensures the internal state is zero'd when writing into HW, and avoids issues with reading by u32 registers into a buffer of 22 bytes in length. Thanks to the typedefs, misuse of the API with the wrong size buffer can easily be caught at compile time. Note reading this data from hardware is essential because the current Tx queue context may be different from the context as initially programmed by the driver during VF initialization. When migrating a VF we must ensure the target VF has identical context as the source VF did. Co-developed-by: Yahui Cao <yahui.cao@intel.com> Signed-off-by: Yahui Cao <yahui.cao@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-10ice: add support for reading and unpacking Rx queue contextJacob Keller
In order to support live migration, the ice driver will need to read certain data from the Rx queue context. This is stored in the hardware in a packed format. Since we use <linux/packing.h> for the mapping between the packed hardware format and the unpacked structure, it is trivial to enable unpacking support via the unpack_fields() function. Add the ice_unpack_rxq_ctx() function based on the unpack_fields() API. Re-use the same field definitions from the packing implementation. Add ice_copy_rxq_ctx_from_hw() to copy the Rx queue context data from the hardware registers. Use these to implement ice_read_rxq_ctx() which will return the Rx queue context to the caller in its unpacked ice_rlan_ctx struct. This will enable the migration logic access to the relevant data about the Rx device queues. It can easily be copied to the target system as part of the migration payload, where it will be used to configure the Rx queues. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-08eth: ice: drop the dead code related to rss_contextsJakub Kicinski
ICE appears to have some odd form of rss_context use plumbed in for .get_rxfh. The .set_rxfh side does not support creating contexts, however, so this must be dead code. For at least a year now (since commit 7964e7884643 ("net: ethtool: use the tracking array for get_rxfh on custom RSS contexts")) we have not been calling .get_rxfh with a non-zero rss_context. We just get the info from the RSS XArray under dev->ethtool. Remove what must be dead code in the driver, clear the support flags. Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250707184115.2285277-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-03ice: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()Vladimir Oltean
New timestamping API was introduced in commit 66f7223039c0 ("net: add NDOs for configuring hardware timestamping") from kernel v6.6. It is time to convert the Intel ice driver to the new API, so that timestamping configuration can be removed from the ndo_eth_ioctl() path completely. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Milena Olech <milena.olech@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-27ice: add ref-sync dpll pinsArkadiusz Kubalewski
Implement reference sync input pin get/set callbacks, allow user space control over dpll pin pairs capable of reference sync support. Reviewed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://patch.msgid.link/20250626135219.1769350-4-arkadiusz.kubalewski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26ice: default to TIME_REF instead of TXCO on E825-CJacob Keller
The driver currently defaults to the internal oscillator as the clock source for E825-C hardware. While this clock source is labeled TCXO, indicating a temperature compensated oscillator, this is only true for some board designs. Many board designs have a less capable oscillator. The E825-C hardware may also have its clock source set to the TIME_REF pin. This pin is connected to the DPLL and is often a more stable clock source. The choice of the internal oscillator is not suitable for all systems, especially those which want to enable SyncE support. There is currently no interface available for users to configure the clock source. Other variants of the E82x board have the clock source configured in the NVM, but E825-C lacks this capability, so different board designs cannot select a different default clock via firmware. In most setups, the TIME_REF is a suitable default clock source. Additionally, we now fall back to the internal oscillator automatically if the TIME_REF clock source cannot be locked. Change the default clock source for E825-C to TIME_REF. Note that the driver logs a dev_dbg message upon configuring the TSPLL which includes the clock source and frequency. This can be enabled to confirm which clock source is in use. Longterm a proper interface to dynamically introspect and change the clock source will be designed (perhaps some extension of the DPLL subsystem?) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-26ice: move TSPLL init calls to ice_ptp.cKarol Kolacinski
Initialize TSPLL after initializing PHC in ice_ptp.c instead of calling for each product in PHC init in ice_ptp_hw.c. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-26ice: fall back to TCXO on TSPLL lock failKarol Kolacinski
TSPLL can fail when trying to lock to TIME_REF as a clock source, e.g. when the external clock source is not stable or connected to the board. To continue operation after failure, try to lock again to internal TCXO and inform user about this. Reviewed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-26ice: wait before enabling TSPLLKarol Kolacinski
To ensure proper operation, wait for 10 to 20 microseconds before enabling TSPLL. Adjust wait time after enabling TSPLL from 1-5 ms to 1-2 ms. Those values are empirical and tested on multiple HW configurations. Reviewed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-26ice: add multiple TSPLL helpersKarol Kolacinski
Add helpers for checking TSPLL params, disabling sticky bits, configuring TSPLL and getting default clock frequency to simplify the code flows. Reviewed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-26ice: use bitfields instead of unions for CGU regsKarol Kolacinski
Switch from unions with bitfield structs to definitions with bitfield masks. This is necessary, because some registers have different field definitions or even use a different register for the same fields based on HW type. Remove unused register fields. Reviewed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-26ice: read TSPLL registers again before reporting statusJacob Keller
After programming the TSPLL, re-read the registers before reporting status. This ensures the debug log message will show what was actually programmed, rather than relying on a cached value. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-26ice: clear time_sync_en field for E825-C during reprogrammingJacob Keller
When programming the Clock Generation Unit for E285-C hardware, we need to clear the time_sync_en bit of the DWORD 9 before we set the frequency. Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-21Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: Separate TSPLL from PTP and clean up [part] Jake Keller says: Separate TSPLL related functions and definitions from all PTP-related files and clean up the code by implementing multiple helpers. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: add TSPLL log config helper ice: use designated initializers for TSPLL consts ice: remove ice_tspll_params_e825 definitions ice: fix E825-C TSPLL register definitions ice: rename TSPLL and CGU functions and definitions ice: move TSPLL functions to a separate file ==================== Link: https://patch.msgid.link/20250618174231.3100231-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.16-rc3). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18udp_tunnel: remove rtnl_lock dependencyStanislav Fomichev
Drivers that are using ops lock and don't depend on RTNL lock still need to manage it because udp_tunnel's RTNL dependency. Introduce new udp_tunnel_nic_lock and use it instead of rtnl_lock. Drop non-UDP_TUNNEL_NIC_INFO_MAY_SLEEP mode from udp_tunnel infra (udp_tunnel_nic_device_sync_work needs to grab udp_tunnel_nic_lock mutex and might sleep). Cover more places in v4: - netlink - udp_tunnel_notify_add_rx_port (ndo_open) - triggers udp_tunnel_nic_device_sync_work - udp_tunnel_notify_del_rx_port (ndo_stop) - triggers udp_tunnel_nic_device_sync_work - udp_tunnel_get_rx_info (__netdev_update_features) - triggers NETDEV_UDP_TUNNEL_PUSH_INFO - udp_tunnel_drop_rx_info (__netdev_update_features) - triggers NETDEV_UDP_TUNNEL_DROP_INFO - udp_tunnel_nic_reset_ntf (ndo_open) - notifiers - udp_tunnel_nic_netdevice_event, depending on the event: - triggers NETDEV_UDP_TUNNEL_PUSH_INFO - triggers NETDEV_UDP_TUNNEL_DROP_INFO - ethnl_tunnel_info_reply_size - udp_tunnel_nic_set_port_priv (two intel drivers) Cc: Michael Chan <michael.chan@broadcom.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-4-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18ice: add TSPLL log config helperKarol Kolacinski
Add a helper function to print new/current TSPLL config. This helps avoid unnecessary casts from u8 to enums. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-18ice: use designated initializers for TSPLL constsKarol Kolacinski
Instead of multiple comments, use designated initializers for TSPLL consts. Adjust ice_tspll_params_e82x fields sizes. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-18ice: remove ice_tspll_params_e825 definitionsKarol Kolacinski
Remove ice_tspll_params_e825 definitions as according to EDS (Electrical Design Specification) doc, E825 devices support only 156.25 MHz TSPLL frequency for both TCXO and TIME_REF clock source. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-18ice: fix E825-C TSPLL register definitionsJacob Keller
The E825-C hardware has a slightly different register layout for register 19 of the Clock Generation Unit and TSPLL. The fbdiv_intgr value can be 10 bits wide. Additionally, most of the fields that were in register 24 are made available in register 23 instead. The programming logic already has a corrected definition for register 23, but it incorrectly still used the 8-bit definition of fbdiv_intgr. This results in truncating some of the values of fbdiv_intgr, including the value used for the 156.25MHz signal. The driver only used register 24 to obtain the enable status, which we should read from register 23. This results in an incorrect output for the log messages, but does not change any functionality besides disabled-by-default dynamic debug messages. Fix the register definitions, and adjust the code to properly reflect the enable/disable status in the log messages. Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-18ice: rename TSPLL and CGU functions and definitionsKarol Kolacinski
Rename TSPLL and CGU functions, definitions etc. to match the file name and have consistent naming scheme. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-18ice: move TSPLL functions to a separate fileKarol Kolacinski
Collect TSPLL related functions and definitions and move them to a separate file to have all TSPLL functionality in one place. Move CGU related functions and definitions to ice_common.* Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-17ice: fix eswitch code memory leak in reset scenarioGrzegorz Nitka
Add simple eswitch mode checker in attaching VF procedure and allocate required port representor memory structures only in switchdev mode. The reset flows triggers VF (if present) detach/attach procedure. It might involve VF port representor(s) re-creation if the device is configured is switchdev mode (not legacy one). The memory was blindly allocated in current implementation, regardless of the mode and not freed if in legacy mode. Kmemeleak trace: unreferenced object (percpu) 0x7e3bce5b888458 (size 40): comm "bash", pid 1784, jiffies 4295743894 hex dump (first 32 bytes on cpu 45): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 0): pcpu_alloc_noprof+0x4c4/0x7c0 ice_repr_create+0x66/0x130 [ice] ice_repr_create_vf+0x22/0x70 [ice] ice_eswitch_attach_vf+0x1b/0xa0 [ice] ice_reset_all_vfs+0x1dd/0x2f0 [ice] ice_pci_err_resume+0x3b/0xb0 [ice] pci_reset_function+0x8f/0x120 reset_store+0x56/0xa0 kernfs_fop_write_iter+0x120/0x1b0 vfs_write+0x31c/0x430 ksys_write+0x61/0xd0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e Testing hints (ethX is PF netdev): - create at least one VF echo 1 > /sys/class/net/ethX/device/sriov_numvfs - trigger the reset echo 1 > /sys/class/net/ethX/device/reset Fixes: 415db8399d06 ("ice: make representor code generic") Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-17net: ice: Perform accurate aRFS flow matchKrishna Kumar
This patch fixes an issue seen in a large-scale deployment under heavy incoming pkts where the aRFS flow wrongly matches a flow and reprograms the NIC with wrong settings. That mis-steering causes RX-path latency spikes and noisy neighbor effects when many connections collide on the same hash (some of our production servers have 20-30K connections). set_rps_cpu() calls ndo_rx_flow_steer() with flow_id that is calculated by hashing the skb sized by the per rx-queue table size. This results in multiple connections (even across different rx-queues) getting the same hash value. The driver steer function modifies the wrong flow to use this rx-queue, e.g.: Flow#1 is first added: Flow#1: <ip1, port1, ip2, port2>, Hash 'h', q#10 Later when a new flow needs to be added: Flow#2: <ip3, port3, ip4, port4>, Hash 'h', q#20 The driver finds the hash 'h' from Flow#1 and updates it to use q#20. This results in both flows getting un-optimized - packets for Flow#1 goes to q#20, and then reprogrammed back to q#10 later and so on; and Flow #2 programming is never done as Flow#1 is matched first for all misses. Many flows may wrongly share the same hash and reprogram rules of the original flow each with their own q#. Tested on two 144-core servers with 16K netperf sessions for 180s. Netperf clients are pinned to cores 0-71 sequentially (so that wrong packets on q#s 72-143 can be measured). IRQs are set 1:1 for queues -> CPUs, enable XPS, enable aRFS (global value is 144 * rps_flow_cnt). Test notes about results from ice_rx_flow_steer(): --------------------------------------------------- 1. "Skip:" counter increments here: if (fltr_info->q_index == rxq_idx || arfs_entry->fltr_state != ICE_ARFS_ACTIVE) goto out; 2. "Add:" counter increments here: ret = arfs_entry->fltr_info.fltr_id; INIT_HLIST_NODE(&arfs_entry->list_entry); 3. "Update:" counter increments here: /* update the queue to forward to on an already existing flow */ Runtime comparison: original code vs with the patch for different rps_flow_cnt values. +-------------------------------+--------------+--------------+ | rps_flow_cnt | 512 | 2048 | +-------------------------------+--------------+--------------+ | Ratio of Pkts on Good:Bad q's | 214 vs 822K | 1.1M vs 980K | | Avoid wrong aRFS programming | 0 vs 310K | 0 vs 30K | | CPU User | 216 vs 183 | 216 vs 206 | | CPU System | 1441 vs 1171 | 1447 vs 1320 | | CPU Softirq | 1245 vs 920 | 1238 vs 961 | | CPU Total | 29 vs 22.7 | 29 vs 24.9 | | aRFS Update | 533K vs 59 | 521K vs 32 | | aRFS Skip | 82M vs 77M | 7.2M vs 4.5M | +-------------------------------+--------------+--------------+ A separate TCP_STREAM and TCP_RR with 1,4,8,16,64,128,256,512 connections showed no performance degradation. Some points on the patch/aRFS behavior: 1. Enabling full tuple matching ensures flows are always correctly matched, even with smaller hash sizes. 2. 5-6% drop in CPU utilization as the packets arrive at the correct CPUs and fewer calls to driver for programming on misses. 3. Larger hash tables reduces mis-steering due to more unique flow hashes, but still has clashes. However, with larger per-device rps_flow_cnt, old flows take more time to expire and new aRFS flows cannot be added if h/w limits are reached (rps_may_expire_flow() succeeds when 10*rps_flow_cnt pkts have been processed by this cpu that are not part of the flow). Fixes: 28bf26724fdb0 ("ice: Implement aRFS") Signed-off-by: Krishna Kumar <krikku@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-16eth: ice: migrate to new RXFH callbacksJakub Kicinski
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). I'm deleting all the boilerplate kdoc from the affected functions. It is somewhere between pointless and incorrect, just a burden for people refactoring the code. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250614180907.4167714-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-13ice: add phase offset monitor for all PPS dpll inputsArkadiusz Kubalewski
Implement a new admin command and helper function to handle and obtain CGU measurements for input pins. Add new callback operations to control the dpll device-level feature "phase offset monitor," allowing it to be enabled or disabled. If the feature is enabled, provide users with measured phase offsets and notifications. Initialize PPS DPLL with new callback operations if the feature is supported by the firmware. Reviewed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250612152835.1703397-4-arkadiusz.kubalewski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.16-rc2). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12Merge tag 'net-6.16-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth and wireless. Current release - regressions: - af_unix: allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD Current release - new code bugs: - eth: airoha: correct enable mask for RX queues 16-31 - veth: prevent NULL pointer dereference in veth_xdp_rcv when peer disappears under traffic - ipv6: move fib6_config_validate() to ip6_route_add(), prevent invalid routes Previous releases - regressions: - phy: phy_caps: don't skip better duplex match on non-exact match - dsa: b53: fix untagged traffic sent via cpu tagged with VID 0 - Revert "wifi: mwifiex: Fix HT40 bandwidth issue.", it caused transient packet loss, exact reason not fully understood, yet Previous releases - always broken: - net: clear the dst when BPF is changing skb protocol (IPv4 <> IPv6) - sched: sfq: fix a potential crash on gso_skb handling - Bluetooth: intel: improve rx buffer posting to avoid causing issues in the firmware - eth: intel: i40e: make reset handling robust against multiple requests - eth: mlx5: ensure FW pages are always allocated on the local NUMA node, even when device is configure to 'serve' another node - wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850, prevent kernel crashes - wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request() for 3 sec if fw_stats_done is not set" * tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits) selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context net: ethtool: Don't check if RSS context exists in case of context 0 af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD. ipv6: Move fib6_config_validate() to ip6_route_add(). net: drv: netdevsim: don't napi_complete() from netpoll net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get() veth: prevent NULL pointer dereference in veth_xdp_rcv net_sched: remove qdisc_tree_flush_backlog() net_sched: ets: fix a race in ets_qdisc_change() net_sched: tbf: fix a race in tbf_change() net_sched: red: fix a race in __red_change() net_sched: prio: fix a race in prio_tune() net_sched: sch_sfq: reject invalid perturb period net: phy: phy_caps: Don't skip better duplex macth on non-exact match MAINTAINERS: Update Kuniyuki Iwashima's email address. selftests: net: add test case for NAT46 looping back dst net: clear the dst when changing skb protocol net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper net/mlx5e: Fix leak of Geneve TLV option object net/mlx5: HWS, make sure the uplink is the last destination ...
2025-06-10ice/ptp: fix crosstimestamp reportingAnton Nadezhdin
Set use_nsecs=true as timestamp is reported in ns. Lack of this result in smaller timestamp error window which cause error during phc2sys execution on E825 NICs: phc2sys[1768.256]: ioctl PTP_SYS_OFFSET_PRECISE: Invalid argument This problem was introduced in the cited commit which omitted setting use_nsecs to true when converting the ice driver to use convert_base_to_cs(). Testing hints (ethX is PF netdev): phc2sys -s ethX -c CLOCK_REALTIME -O 37 -m phc2sys[1769.256]: CLOCK_REALTIME phc offset -5 s0 freq -0 delay 0 Fixes: d4bea547ebb57 ("ice/ptp: Remove convert_art_to_tsc()") Signed-off-by: Anton Nadezhdin <anton.nadezhdin@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-09ice: add a separate Rx handler for flow director commandsMichal Kubiak
The "ice" driver implementation uses the control VSI to handle the flow director configuration for PFs and VFs. Unfortunately, although a separate VSI type was created to handle flow director queues, the Rx queue handler was shared between the flow director and a standard NAPI Rx handler. Such a design approach was not very flexible. First, it mixed hotpath and slowpath code, blocking their further optimization. It also created a huge overkill for the flow director command processing, which is descriptor-based only, so there is no need to allocate Rx data buffers. For the above reasons, implement a separate Rx handler for the control VSI. Also, remove from the NAPI handler the code dedicated to configuring the flow director rules on VFs. Do not allocate Rx data buffers to the flow director queues because their processing is descriptor-based only. Finally, allow Rx data queues to be allocated only for VSIs that have netdev assigned to them. This handler splitting approach is the first step in converting the driver to use the Page Pool (which can only be used for data queues). Test hints: 1. Create a VF for any PF managed by the ice driver. 2. In a loop, add and delete flow director rules for the VF, e.g.: for i in {1..128}; do q=$(( i % 16 )) ethtool -N ens802f0v0 flow-type tcp4 dst-port "$i" action "$q" done for i in {0..127}; do ethtool -N ens802f0v0 delete "$i" done Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Suggested-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>