summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/ice/ice_main.c
AgeCommit message (Collapse)Author
2025-03-07ice: add E830 HW VF mailbox message limit supportPaul Greenwalt
[ Upstream commit 59f4d59b25aec39a015c0949f4ec235c7a839c44 ] E830 adds hardware support to prevent the VF from overflowing the PF mailbox with VIRTCHNL messages. E830 will use the hardware feature (ICE_F_MBX_LIMIT) instead of the software solution ice_is_malicious_vf(). To prevent a VF from overflowing the PF, the PF sets the number of messages per VF that can be in the PF's mailbox queue (ICE_MBX_OVERFLOW_WATERMARK). When the PF processes a message from a VF, the PF decrements the per VF message count using the E830_MBX_VF_DEC_TRIG register. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Stable-dep-of: 79990cf5e7ad ("ice: Fix deinitializing VF in error path") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-07ice: Add E830 device IDs, MAC type and registersPaul Greenwalt
[ Upstream commit ba1124f58afd37d9ff155d4ab7c9f209346aaed9 ] E830 is the 200G NIC family which uses the ice driver. Add specific E830 registers. Embed macros to use proper register based on (hw)->mac_type & name those macros to [ORIGINAL]_BY_MAC(hw). Registers only available on one of the macs will need to be explicitly referred to as E800_NAME instead of just NAME. PTP is not yet supported. Co-developed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Co-developed-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Co-developed-by: Scott Taylor <scott.w.taylor@intel.com> Signed-off-by: Scott Taylor <scott.w.taylor@intel.com> Co-developed-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231025214157.1222758-2-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 79990cf5e7ad ("ice: Fix deinitializing VF in error path") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17ice: Flush FDB entries before resetWojciech Drewek
[ Upstream commit fbcb968a98ac0b71f5a2bda2751d7a32d201f90d ] Triggering the reset while in switchdev mode causes errors[1]. Rules are already removed by this time because switch content is flushed in case of the reset. This means that rules were deleted from HW but SW still thinks they exist so when we get SWITCHDEV_FDB_DEL_TO_DEVICE notification we try to delete not existing rule. We can avoid these errors by clearing the rules early in the reset flow before they are removed from HW. Switchdev API will get notified that the rule was removed so we won't get SWITCHDEV_FDB_DEL_TO_DEVICE notification. Remove unnecessary ice_clear_sw_switch_recipes. [1] ice 0000:01:00.0: Failed to delete FDB forward rule, err: -2 ice 0000:01:00.0: Failed to delete FDB guard rule, err: -2 Fixes: 7c945a1a8e5f ("ice: Switchdev FDB events support") Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17ice: Fix netif_is_ice() in Safe ModeMarcin Szycik
[ Upstream commit 8e60dbcbaaa177dacef55a61501790e201bf8c88 ] netif_is_ice() works by checking the pointer to netdev ops. However, it only checks for the default ice_netdev_ops, not ice_netdev_safe_mode_ops, so in Safe Mode it always returns false, which is unintuitive. While it doesn't look like netif_is_ice() is currently being called anywhere in Safe Mode, this could change and potentially lead to unexpected behaviour. Fixes: df006dd4b1dc ("ice: Add initial support framework for LAG") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12ice: do not bring the VSI up, if it was down before the XDP setupLarysa Zaremba
[ Upstream commit 04c7e14e5b0b6227e7b00d7a96ca2f2426ab9171 ] After XDP configuration is completed, we bring the interface up unconditionally, regardless of its state before the call to .ndo_bpf(). Preserve the information whether the interface had to be brought down and later bring it up only in such case. Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12ice: protect XDP configuration with a mutexLarysa Zaremba
[ Upstream commit 2504b8405768a57a71e660dbfd5abd59f679a03f ] The main threat to data consistency in ice_xdp() is a possible asynchronous PF reset. It can be triggered by a user or by TX timeout handler. XDP setup and PF reset code access the same resources in the following sections: * ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked * ice_vsi_rebuild() for the PF VSI - not protected * ice_vsi_open() - already rtnl-locked With an unfortunate timing, such accesses can result in a crash such as the one below: [ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14 [ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18 [Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms [ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001 [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14 [ +0.394718] ice 0000:b1:00.0: PTP reset successful [ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098 [ +0.000045] #PF: supervisor read access in kernel mode [ +0.000023] #PF: error_code(0x0000) - not-present page [ +0.000023] PGD 0 P4D 0 [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI [ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1 [ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021 [ +0.000036] Workqueue: ice ice_service_task [ice] [ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice] [...] [ +0.000013] Call Trace: [ +0.000016] <TASK> [ +0.000014] ? __die+0x1f/0x70 [ +0.000029] ? page_fault_oops+0x171/0x4f0 [ +0.000029] ? schedule+0x3b/0xd0 [ +0.000027] ? exc_page_fault+0x7b/0x180 [ +0.000022] ? asm_exc_page_fault+0x22/0x30 [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice] [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice] [ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice] [ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice] [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice] [ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice] [ +0.000145] ice_rebuild+0x18c/0x840 [ice] [ +0.000145] ? delay_tsc+0x4a/0xc0 [ +0.000022] ? delay_tsc+0x92/0xc0 [ +0.000020] ice_do_reset+0x140/0x180 [ice] [ +0.000886] ice_service_task+0x404/0x1030 [ice] [ +0.000824] process_one_work+0x171/0x340 [ +0.000685] worker_thread+0x277/0x3a0 [ +0.000675] ? preempt_count_add+0x6a/0xa0 [ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50 [ +0.000679] ? __pfx_worker_thread+0x10/0x10 [ +0.000653] kthread+0xf0/0x120 [ +0.000635] ? __pfx_kthread+0x10/0x10 [ +0.000616] ret_from_fork+0x2d/0x50 [ +0.000612] ? __pfx_kthread+0x10/0x10 [ +0.000604] ret_from_fork_asm+0x1b/0x30 [ +0.000604] </TASK> The previous way of handling this through returning -EBUSY is not viable, particularly when destroying AF_XDP socket, because the kernel proceeds with removal anyway. There is plenty of code between those calls and there is no need to create a large critical section that covers all of them, same as there is no need to protect ice_vsi_rebuild() with rtnl_lock(). Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp(). Leaving unprotected sections in between would result in two states that have to be considered: 1. when the VSI is closed, but not yet rebuild 2. when VSI is already rebuild, but not yet open The latter case is actually already handled through !netif_running() case, we just need to adjust flag checking a little. The former one is not as trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of hardware interaction happens, this can make adding/deleting rings exit with an error. Luckily, VSI rebuild is pending and can apply new configuration for us in a managed fashion. Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to indicate that ice_xdp() can just hot-swap the program. Also, as ice_vsi_rebuild() flow is touched in this patch, make it more consistent by deconfiguring VSI when coalesce allocation fails. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12ice: Add netif_device_attach/detach into PF reset flowDawid Osuchowski
[ Upstream commit d11a67634227f9f9da51938af085fb41a733848f ] Ethtool callbacks can be executed while reset is in progress and try to access deleted resources, e.g. getting coalesce settings can result in a NULL pointer dereference seen below. Reproduction steps: Once the driver is fully initialized, trigger reset: # echo 1 > /sys/class/net/<interface>/device/reset when reset is in progress try to get coalesce settings using ethtool: # ethtool -c <interface> BUG: kernel NULL pointer dereference, address: 0000000000000020 PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 11 PID: 19713 Comm: ethtool Tainted: G S 6.10.0-rc7+ #7 RIP: 0010:ice_get_q_coalesce+0x2e/0xa0 [ice] RSP: 0018:ffffbab1e9bcf6a8 EFLAGS: 00010206 RAX: 000000000000000c RBX: ffff94512305b028 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9451c3f2e588 RDI: ffff9451c3f2e588 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: ffff9451c3f2e580 R11: 000000000000001f R12: ffff945121fa9000 R13: ffffbab1e9bcf760 R14: 0000000000000013 R15: ffffffff9e65dd40 FS: 00007faee5fbe740(0000) GS:ffff94546fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000020 CR3: 0000000106c2e005 CR4: 00000000001706f0 Call Trace: <TASK> ice_get_coalesce+0x17/0x30 [ice] coalesce_prepare_data+0x61/0x80 ethnl_default_doit+0xde/0x340 genl_family_rcv_msg_doit+0xf2/0x150 genl_rcv_msg+0x1b3/0x2c0 netlink_rcv_skb+0x5b/0x110 genl_rcv+0x28/0x40 netlink_unicast+0x19c/0x290 netlink_sendmsg+0x222/0x490 __sys_sendto+0x1df/0x1f0 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x82/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7faee60d8e27 Calling netif_device_detach() before reset makes the net core not call the driver when ethtool command is issued, the attempt to execute an ethtool command during reset will result in the following message: netlink error: No such device instead of NULL pointer dereference. Once reset is done and ice_rebuild() is executing, the netif_device_attach() is called to allow for ethtool operations to occur again in a safe manner. Fixes: fcea6f3da546 ("ice: Add stats and ethtool support") Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com> Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12ice: Check all ice_vsi_rebuild() errors in functionEric Joyner
[ Upstream commit d47bf9a495cf424fad674321d943123dc12b926d ] Check the return value from ice_vsi_rebuild() and prevent the usage of incorrectly configured VSI. Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Eric Joyner <eric.joyner@intel.com> Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14ice: Fix reset handlerGrzegorz Nitka
[ Upstream commit 25a7123579ecac9a89a7e5b8d8a580bee4b68acd ] Synchronize OICR IRQ when preparing for reset to avoid potential race conditions between the reset procedure and OICR Fixes: 4aad5335969f ("ice: add individual interrupt allocation") Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-05ice: Rebuild TC queues on VSI queue reconfigurationJan Sokolowski
[ Upstream commit f4b91c1d17c676b8ad4c6bd674da874f3f7d5701 ] TC queues needs to be correctly updated when the number of queues on a VSI is reconfigured, so netdev's queue and TC settings will be dynamically adjusted and could accurately represent the underlying hardware state after changes to the VSI queue counts. Fixes: 0754d65bd4be ("ice: Add infrastructure for mqprio support via ndo_setup_tc") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-27ice: avoid IRQ collision to fix init failure on ACPI S3 resumeEn-Wei Wu
[ Upstream commit bc69ad74867dba1377abe14356c94a946d9837a3 ] A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes that irdma would break and report hardware initialization failed after suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5). The problem is caused due to the collision between the irq numbers requested in irdma and the irq numbers requested in other drivers after suspend/resume. The irq numbers used by irdma are derived from ice's ice_pf->msix_entries which stores mappings between MSI-X index and Linux interrupt number. It's supposed to be cleaned up when suspend and rebuilt in resume but it's not, causing irdma using the old irq numbers stored in the old ice_pf->msix_entries to request_irq() when resume. And eventually collide with other drivers. This patch fixes this problem. On suspend, we call ice_deinit_rdma() to clean up the ice_pf->msix_entries (and free the MSI-X vectors used by irdma if we've dynamically allocated them). On resume, we call ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the MSI-X vectors if we would like to dynamically allocate them). Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") Tested-by: Cyrus Lien <cyrus.lien@canonical.com> Signed-off-by: En-Wei Wu <en-wei.wu@canonical.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-21ice: add flag to distinguish reset from .ndo_bpf in XDP rings configLarysa Zaremba
[ Upstream commit 744d197162c2070a6045a71e2666ed93a57cc65d ] Commit 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") has placed ice_vsi_free_q_vectors() after ice_destroy_xdp_rings() in the rebuild process. The behaviour of the XDP rings config functions is context-dependent, so the change of order has led to ice_destroy_xdp_rings() doing additional work and removing XDP prog, when it was supposed to be preserved. Also, dependency on the PF state reset flags creates an additional, fortunately less common problem: * PFR is requested e.g. by tx_timeout handler * .ndo_bpf() is asked to delete the program, calls ice_destroy_xdp_rings(), but reset flag is set, so rings are destroyed without deleting the program * ice_vsi_rebuild tries to delete non-existent XDP rings, because the program is still on the VSI * system crashes With a similar race, when requested to attach a program, ice_prepare_xdp_rings() can actually skip setting the program in the VSI and nevertheless report success. Instead of reverting to the old order of function calls, add an enum argument to both ice_prepare_xdp_rings() and ice_destroy_xdp_rings() in order to distinguish between calls from rebuild and .ndo_bpf(). Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-4-e3563aa89b0c@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26ice: fix stats being updated by way too large valuesPrzemek Kitszel
[ Upstream commit 257310e998700e60382fbd3f4fd275fdbd9b2aaf ] Simplify stats accumulation logic to fix the case where we don't take previous stat value into account, we should always respect it. Main netdev stats of our PF (Tx/Rx packets/bytes) were reported orders of magnitude too big during OpenStack reconfiguration events, possibly other reconfiguration cases too. The regression was reported to be between 6.1 and 6.2, so I was almost certain that on of the two "preserve stats over reset" commits were the culprit. While reading the code, it was found that in some cases we will increase the stats by arbitrarily large number (thanks to ignoring "-prev" part of condition, after zeroing it). Note that this fixes also the case where we were around limits of u64, but that was not the regression reported. Full disclosure: I remember suggesting this particular piece of code to Ben a few years ago, so blame on me. Fixes: 2fd5e433cd26 ("ice: Accumulate HW and Netdev statistics over reset") Reported-by: Nebojsa Stevanovic <nebojsa.stevanovic@gcore.com> Link: https://lore.kernel.org/intel-wired-lan/VI1PR02MB439744DEDAA7B59B9A2833FE912EA@VI1PR02MB4397.eurprd02.prod.outlook.com Reported-by: Christian Rohmann <christian.rohmann@inovex.de> Link: https://lore.kernel.org/intel-wired-lan/f38a6ca4-af05-48b1-a3e6-17ef2054e525@inovex.de Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-15net: ice: Fix potential NULL pointer dereference in ice_bridge_setlink()Rand Deeb
[ Upstream commit 06e456a05d669ca30b224b8ed962421770c1496c ] The function ice_bridge_setlink() may encounter a NULL pointer dereference if nlmsg_find_attr() returns NULL and br_spec is dereferenced subsequently in nla_for_each_nested(). To address this issue, add a check to ensure that br_spec is not NULL before proceeding with the nested attribute iteration. Fixes: b1edc14a3fbf ("ice: Implement ice_bridge_getlink and ice_bridge_setlink") Signed-off-by: Rand Deeb <rand.sec96@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-10ice: Shut down VSI with "link-down-on-close" enabledNgai-Mint Kwan
[ Upstream commit 6d05ff55ef4f4954d28551236239f297bd52ea48 ] Disabling netdev with ethtool private flag "link-down-on-close" enabled can cause NULL pointer dereference bug. Shut down VSI regardless of "link-down-on-close" state. Fixes: 8ac7132704f3 ("ice: Fix interface being down after reset with link-down-on-close flag on") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-10ice: Fix link_down_on_close messageKatarzyna Wieczerzycka
[ Upstream commit 6a8d8bb55e7001de2d50920381cc858f3a3e9fb7 ] The driver should not report an error message when for a medialess port the link_down_on_close flag is enabled and the physical link cannot be set down. Fixes: 8ac7132704f3 ("ice: Fix interface being down after reset with link-down-on-close flag on") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Katarzyna Wieczerzycka <katarzyna.wieczerzycka@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-13ice: Fix safe mode when DDP is missingMateusz Pacuszka
One thing is broken in the safe mode, that is ice_deinit_features() is being executed even that ice_init_features() was not causing stack trace during pci_unregister_driver(). Add check on the top of the function. Fixes: 5b246e533d01 ("ice: split probe into smaller functions") Signed-off-by: Mateusz Pacuszka <mateuszx.pacuszka@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Link: https://lore.kernel.org/r/20231011233334.336092-4-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13ice: reset first in crash dump kernelsJesse Brandeburg
When the system boots into the crash dump kernel after a panic, the ice networking device may still have pending transactions that can cause errors or machine checks when the device is re-enabled. This can prevent the crash dump kernel from loading the driver or collecting the crash data. To avoid this issue, perform a function level reset (FLR) on the ice device via PCIe config space before enabling it on the crash kernel. This will clear any outstanding transactions and stop all queues and interrupts. Restore the config space after the FLR, otherwise it was found in testing that the driver wouldn't load successfully. The following sequence causes the original issue: - Load the ice driver with modprobe ice - Enable SR-IOV with 2 VFs: echo 2 > /sys/class/net/eth0/device/sriov_num_vfs - Trigger a crash with echo c > /proc/sysrq-trigger - Load the ice driver again (or let it load automatically) with modprobe ice - The system crashes again during pcim_enable_device() Fixes: 837f08fdecbe ("ice: Add basic driver framework for Intel(R) E800 Series") Reported-by: Vishal Agrawal <vagrawal@redhat.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Link: https://lore.kernel.org/r/20231011233334.336092-3-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-08-17 (ice) This series contains updates to ice driver only. Jan removes unused functions and refactors code to make, possible, functions static. Jake rearranges some functions to be logically grouped. Marcin removes an unnecessary call to disable VLAN stripping. Yang Yingliang utilizes list_for_each_entry() helper for a couple list traversals. Przemek removes some parameters from ice_aq_alloc_free_res() which were always the same and reworks ice_aq_wait_for_event() to reduce chance of race. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: split ice_aq_wait_for_event() func into two ice: embed &ice_rq_event_info event into struct ice_aq_task ice: ice_aq_check_events: fix off-by-one check when filling buffer ice: drop two params from ice_aq_alloc_free_res() ice: use list_for_each_entry() helper ice: Remove redundant VSI configuration in eswitch setup ice: move E810T functions to before device agnostic ones ice: refactor ice_vsi_is_vlan_pruning_ena ice: refactor ice_ptp_hw to make functions static ice: refactor ice_sched to make functions static ice: Utilize assign_bit() helper ice: refactor ice_vf_lib to make functions static ice: refactor ice_lib to make functions static ice: refactor ice_ddp to make functions static ice: remove unused methods ==================== Link: https://lore.kernel.org/r/20230817212239.2601543-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/sfc/tc.c fa165e194997 ("sfc: don't unregister flow_indr if it was never registered") 3bf969e88ada ("sfc: add MAE table machinery for conntrack table") https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-17ice: split ice_aq_wait_for_event() func into twoPrzemek Kitszel
Mitigate race between registering on wait list and receiving AQ Response from FW. ice_aq_prep_for_event() should be called before sending AQ command, ice_aq_wait_for_event() should be called after sending AQ command, to wait for AQ Response. Please note, that this was found by reading the code, an actual race has not yet materialized. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-17ice: embed &ice_rq_event_info event into struct ice_aq_taskPrzemek Kitszel
Expose struct ice_aq_task to callers, what takes burden of memory ownership out from AQ-wait family of functions, and reduces need for heap-based allocations. Embed struct ice_rq_event_info event into struct ice_aq_task (instead of it being a ptr) to remove some more code from the callers. Subsequent commit will improve more based on this one. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-17ice: ice_aq_check_events: fix off-by-one check when filling bufferPrzemek Kitszel
Allow task's event buffer to be filled also in the case that it's size is exactly the size of the message. Fixes: d69ea414c9b4 ("ice: implement device flash update via devlink") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-17ice: Block switchdev mode when ADQ is active and vice versaMarcin Szycik
ADQ and switchdev are not supported simultaneously. Enabling both at the same time can result in nullptr dereference. To prevent this, check if ADQ is active when changing devlink mode to switchdev mode, and check if switchdev is active when enabling ADQ. Fixes: fbc7b27af0f9 ("ice: enable ndo_setup_tc support for mqprio_qdisc") Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230816193405.1307580-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-07ice: add FW load waitJan Sokolowski
As some cards load FW from external sources, we have to wait to be sure that FW is ready before setting link up. Add check and wait for FW readiness Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/dsa/port.c 9945c1fb03a3 ("net: dsa: fix older DSA drivers using phylink") a88dd7538461 ("net: dsa: remove legacy_pre_march2020 detection") https://lore.kernel.org/all/20230731102254.2c9868ca@canb.auug.org.au/ net/xdp/xsk.c 3c5b4d69c358 ("net: annotate data-races around sk->sk_mark") b7f72a30e9ac ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path") https://lore.kernel.org/all/20230731102631.39988412@canb.auug.org.au/ drivers/net/ethernet/broadcom/bnxt/bnxt.c 37b61cda9c16 ("bnxt: don't handle XDP in netpoll") 2b56b3d99241 ("eth: bnxt: handle invalid Tx completions more gracefully") https://lore.kernel.org/all/20230801101708.1dc7faac@canb.auug.org.au/ Adjacent changes: drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c 62da08331f1a ("net/mlx5e: Set proper IPsec source port in L4 selector") fbd517549c32 ("net/mlx5e: Add function to get IPsec offload namespace") drivers/net/ethernet/sfc/selftest.c 55c1528f9b97 ("sfc: fix field-spanning memcpy in selftest") ae9d445cd41f ("sfc: Miscellaneous comment removals") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31ice: Fix RDMA VSI removal during queue rebuildRafal Rogalski
During qdisc create/delete, it is necessary to rebuild the queue of VSIs. An error occurred because the VSIs created by RDMA were still active. Added check if RDMA is active. If yes, it disallows qdisc changes and writes a message in the system logs. Fixes: 348048e724a0 ("ice: Implement iidc operations") Signed-off-by: Rafal Rogalski <rafalx.rogalski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230728171243.2446101-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27ice: update reset path for SRIOV LAG supportDave Ertman
Add code to rebuild the LAG resources when rebuilding the state of the interface after a reset. Also added in a function for building per-queue information into the buffer used to configure VF queues for LAG fail-over. This improves code reuse. Due to differences in timing per interface for recovering from a reset, add in the ability to retry on non-local dependencies where needed. Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-27ice: Add driver support for firmware changes for LAGDave Ertman
Add the defines, fields, and detection code for FW support of LAG for SRIOV. Also exposes some previously static functions to allow access in the lag code. Clean up code that is unused or not needed for LAG support. Also add an ordered workqueue for processing LAG events. Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-24ice: Implement basic eswitch bridge setupWojciech Drewek
With this patch, ice driver is able to track if the port representors or uplink port were added to the linux bridge in switchdev mode. Listen for NETDEV_CHANGEUPPER events in order to detect this. ice_esw_br data structure reflects the linux bridge and stores all the ports of the bridge (ice_esw_br_port) in xarray, it's created when the first port is added to the bridge and freed once the last port is removed. Note that only one bridge is supported per eswitch. Bridge port (ice_esw_br_port) can be either a VF port representor port or uplink port (ice_esw_br_port_type). In both cases bridge port holds a reference to the VSI, VF's VSI in case of the PR and uplink VSI in case of the uplink. VSI's index is used as an index to the xarray in which ports are stored. Add a check which prevents configuring switchdev mode if uplink is already added to any bridge. This is needed because we need to listen for NETDEV_CHANGEUPPER events to record if the uplink was added to the bridge. Netdevice notifier is registered after eswitch mode is changed to switchdev. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-24ice: Prohibit rx mode change in switchdev modeWojciech Drewek
Don't allow to change promisc mode in switchdev mode. When switchdev is configured, PF netdev is set to be a default VSI. This is needed for the slow-path to work correctly. All the unmatched packets will be directed to PF netdev. It is possible that this setting might be overwritten by ndo_set_rx_mode. Prevent this by checking if switchdev is enabled in ice_set_rx_mode. Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-19Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2023-07-19 We've added 45 non-merge commits during the last 3 day(s) which contain a total of 71 files changed, 7808 insertions(+), 592 deletions(-). The main changes are: 1) multi-buffer support in AF_XDP, from Maciej Fijalkowski, Magnus Karlsson, Tirthendu Sarkar. 2) BPF link support for tc BPF programs, from Daniel Borkmann. 3) Enable bpf_map_sum_elem_count kfunc for all program types, from Anton Protopopov. 4) Add 'owner' field to bpf_rb_node to fix races in shared ownership, Dave Marchevsky. 5) Prevent potential skb_header_pointer() misuse, from Alexei Starovoitov. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits) bpf, net: Introduce skb_pointer_if_linear(). bpf: sync tools/ uapi header with selftests/bpf: Add mprog API tests for BPF tcx links selftests/bpf: Add mprog API tests for BPF tcx opts bpftool: Extend net dump with tcx progs libbpf: Add helper macro to clear opts structs libbpf: Add link-based API for tcx libbpf: Add opts-based attach/detach/query API for tcx bpf: Add fd-based tcx multi-prog infra with link support bpf: Add generic attach/detach/query API for multi-progs selftests/xsk: reset NIC settings to default after running test suite selftests/xsk: add test for too many frags selftests/xsk: add metadata copy test for multi-buff selftests/xsk: add invalid descriptor test for multi-buffer selftests/xsk: add unaligned mode test for multi-buffer selftests/xsk: add basic multi-buffer test selftests/xsk: transmit and receive multi-buffer packets xsk: add multi-buffer documentation i40e: xsk: add TX multi-buffer support ice: xsk: Tx multi-buffer support ... ==================== Link: https://lore.kernel.org/r/20230719175424.75717-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-19ice: xsk: Tx multi-buffer supportMaciej Fijalkowski
Most of this patch is about actually supporting XDP_TX action. Pure Tx ZC support is only about looking at XDP_PKT_CONTD presence at options field and based on that generating EOP bit on Tx HW descriptor. This is that simple due to the implementation on xsk_tx_peek_release_desc_batch() where we are making sure that last produced descriptor is an EOP one. Overwrite xdp_zc_max_segs with a value that defines max scatter-gatter count on Tx side that HW can handle. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230719132421.584801-16-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-18ice: remove unnecessary (void*) conversionsWu Yunchuan
No need cast (void*) to (struct ice_ring_container *). Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com> Link: https://lore.kernel.org/r/20230717031154.54740-1-yunchuan@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-14ice: prevent NULL pointer deref during reloadMichal Swiatkowski
Calling ethtool during reload can lead to call trace, because VSI isn't configured for some time, but netdev is alive. To fix it add rtnl lock for VSI deconfig and config. Set ::num_q_vectors to 0 after freeing and add a check for ::tx/rx_rings in ring related ethtool ops. Add proper unroll of filters in ice_start_eth(). Reproduction: $watch -n 0.1 -d 'ethtool -g enp24s0f0np0' $devlink dev reload pci/0000:18:00.0 action driver_reinit Call trace before fix: [66303.926205] BUG: kernel NULL pointer dereference, address: 0000000000000000 [66303.926259] #PF: supervisor read access in kernel mode [66303.926286] #PF: error_code(0x0000) - not-present page [66303.926311] PGD 0 P4D 0 [66303.926332] Oops: 0000 [#1] PREEMPT SMP PTI [66303.926358] CPU: 4 PID: 933821 Comm: ethtool Kdump: loaded Tainted: G OE 6.4.0-rc5+ #1 [66303.926400] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.00.01.0014.070920180847 07/09/2018 [66303.926446] RIP: 0010:ice_get_ringparam+0x22/0x50 [ice] [66303.926649] Code: 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 87 c0 09 00 00 c7 46 04 e0 1f 00 00 c7 46 10 e0 1f 00 00 48 8b 50 20 <48> 8b 12 0f b7 52 3a 89 56 14 48 8b 40 28 48 8b 00 0f b7 40 58 48 [66303.926722] RSP: 0018:ffffad40472f39c8 EFLAGS: 00010246 [66303.926749] RAX: ffff98a8ada05828 RBX: ffff98a8c46dd060 RCX: ffffad40472f3b48 [66303.926781] RDX: 0000000000000000 RSI: ffff98a8c46dd068 RDI: ffff98a8b23c4000 [66303.926811] RBP: ffffad40472f3b48 R08: 00000000000337b0 R09: 0000000000000000 [66303.926843] R10: 0000000000000001 R11: 0000000000000100 R12: ffff98a8b23c4000 [66303.926874] R13: ffff98a8c46dd060 R14: 000000000000000f R15: ffffad40472f3a50 [66303.926906] FS: 00007f6397966740(0000) GS:ffff98b390900000(0000) knlGS:0000000000000000 [66303.926941] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [66303.926967] CR2: 0000000000000000 CR3: 000000011ac20002 CR4: 00000000007706e0 [66303.926999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [66303.927029] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [66303.927060] PKRU: 55555554 [66303.927075] Call Trace: [66303.927094] <TASK> [66303.927111] ? __die+0x23/0x70 [66303.927140] ? page_fault_oops+0x171/0x4e0 [66303.927176] ? exc_page_fault+0x7f/0x180 [66303.927209] ? asm_exc_page_fault+0x26/0x30 [66303.927244] ? ice_get_ringparam+0x22/0x50 [ice] [66303.927433] rings_prepare_data+0x62/0x80 [66303.927469] ethnl_default_doit+0xe2/0x350 [66303.927501] genl_family_rcv_msg_doit.isra.0+0xe3/0x140 [66303.927538] genl_rcv_msg+0x1b1/0x2c0 [66303.927561] ? __pfx_ethnl_default_doit+0x10/0x10 [66303.927590] ? __pfx_genl_rcv_msg+0x10/0x10 [66303.927615] netlink_rcv_skb+0x58/0x110 [66303.927644] genl_rcv+0x28/0x40 [66303.927665] netlink_unicast+0x19e/0x290 [66303.927691] netlink_sendmsg+0x254/0x4d0 [66303.927717] sock_sendmsg+0x93/0xa0 [66303.927743] __sys_sendto+0x126/0x170 [66303.927780] __x64_sys_sendto+0x24/0x30 [66303.928593] do_syscall_64+0x5d/0x90 [66303.929370] ? __count_memcg_events+0x60/0xa0 [66303.930146] ? count_memcg_events.constprop.0+0x1a/0x30 [66303.930920] ? handle_mm_fault+0x9e/0x350 [66303.931688] ? do_user_addr_fault+0x258/0x740 [66303.932452] ? exc_page_fault+0x7f/0x180 [66303.933193] entry_SYSCALL_64_after_hwframe+0x72/0xdc Fixes: 5b246e533d01 ("ice: split probe into smaller functions") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-05ice: Fix tx queue rate limit when TCs are configuredSridhar Samudrala
Configuring tx_maxrate via sysfs interface /sys/class/net/eth0/queues/tx-1/tx_maxrate was not working when TCs are configured because always main VSI was being used. Fix by using correct VSI in ice_set_tx_maxrate when TCs are configured. Fixes: 1ddef455f4a8 ("ice: Add NDO callback to set the maximum per-queue bitrate") Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-05ice: Fix max_rate check while configuring TX rate limitsSridhar Samudrala
Remove incorrect check in ice_validate_mqprio_opt() that limits filter configuration when sum of max_rates of all TCs exceeds the link speed. The max rate of each TC is unrelated to value used by other TCs and is valid as long as it is less than link speed. Fixes: fbc7b27af0f9 ("ice: enable ndo_setup_tc support for mqprio_qdisc") Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-22ice: use ice_down_up() where applicableMaciej Fijalkowski
ice_change_mtu() is currently using a separate ice_down() and ice_up() calls to reflect changed MTU. ice_down_up() serves this purpose, so do the refactoring here. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-22ice: allow hot-swapping XDP programsMaciej Fijalkowski
Currently ice driver's .ndo_bpf callback brings interface down and up independently of XDP resources' presence. This is only needed when either these resources have to be configured or removed. It means that if one is switching XDP programs on-the-fly with running traffic, packets will be dropped. To avoid this, compare early on ice_xdp_setup_prog() state of incoming bpf_prog pointer vs the bpf_prog pointer that is already assigned to VSI. Do the swap in case VSI has bpf_prog and incoming one are non-NULL. Lastly, while at it, put old bpf_prog *after* the update of Rx ring's bpf_prog pointer. In theory previous code could expose us to a state where Rx ring's bpf_prog would still be referring to old_prog that got released with earlier bpf_prog_put(). Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-15ice: remove unnecessary check for old MAC == new MACPiotr Gardocki
The check has been moved to core. The ndo_set_mac_address callback is not being called with new MAC address equal to the old one anymore. Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: include/linux/mlx5/driver.h 617f5db1a626 ("RDMA/mlx5: Fix affinity assignment") dc13180824b7 ("net/mlx5: Enable devlink port for embedded cpu VF vports") https://lore.kernel.org/all/20230613125939.595e50b8@canb.auug.org.au/ tools/testing/selftests/net/mptcp/mptcp_join.sh 47867f0a7e83 ("selftests: mptcp: join: skip check if MIB counter not supported") 425ba803124b ("selftests: mptcp: join: support RM_ADDR for used endpoints or not") 45b1a1227a7a ("mptcp: introduces more address related mibs") 0639fa230a21 ("selftests: mptcp: add explicit check for new mibs") https://lore.kernel.org/netdev/20230609-upstream-net-20230610-mptcp-selftests-support-old-kernels-part-3-v1-0-2896fe2ee8a3@tessares.net/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-14ice: Fix ice module unloadJakub Buchocki
Clearing the interrupt scheme before PFR reset, during the removal routine, could cause the hardware errors and possibly lead to system reboot, as the PF reset can cause the interrupt to be generated. Place the call for PFR reset inside ice_deinit_dev(), wait until reset and all pending transactions are done, then call ice_clear_interrupt_scheme(). This introduces a PFR reset to multiple error paths. Additionally, remove the call for the reset from ice_load() - it will be a part of ice_unload() now. Error example: [ 75.229328] ice 0000:ca:00.1: Failed to read Tx Scheduler Tree - User Selection data from flash [ 77.571315] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1 [ 77.571418] {1}[Hardware Error]: event severity: recoverable [ 77.571459] {1}[Hardware Error]: Error 0, type: recoverable [ 77.571500] {1}[Hardware Error]: section_type: PCIe error [ 77.571540] {1}[Hardware Error]: port_type: 4, root port [ 77.571580] {1}[Hardware Error]: version: 3.0 [ 77.571615] {1}[Hardware Error]: command: 0x0547, status: 0x4010 [ 77.571661] {1}[Hardware Error]: device_id: 0000:c9:02.0 [ 77.571703] {1}[Hardware Error]: slot: 25 [ 77.571736] {1}[Hardware Error]: secondary_bus: 0xca [ 77.571773] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x347a [ 77.571821] {1}[Hardware Error]: class_code: 060400 [ 77.571858] {1}[Hardware Error]: bridge: secondary_status: 0x2800, control: 0x0013 [ 77.572490] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020 [ 77.572870] pcieport 0000:c9:02.0: [21] ACSViol (First) [ 77.573222] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID [ 77.573554] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010 [ 77.691273] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1 [ 77.691738] {2}[Hardware Error]: event severity: recoverable [ 77.691971] {2}[Hardware Error]: Error 0, type: recoverable [ 77.692192] {2}[Hardware Error]: section_type: PCIe error [ 77.692403] {2}[Hardware Error]: port_type: 4, root port [ 77.692616] {2}[Hardware Error]: version: 3.0 [ 77.692825] {2}[Hardware Error]: command: 0x0547, status: 0x4010 [ 77.693032] {2}[Hardware Error]: device_id: 0000:c9:02.0 [ 77.693238] {2}[Hardware Error]: slot: 25 [ 77.693440] {2}[Hardware Error]: secondary_bus: 0xca [ 77.693641] {2}[Hardware Error]: vendor_id: 0x8086, device_id: 0x347a [ 77.693853] {2}[Hardware Error]: class_code: 060400 [ 77.694054] {2}[Hardware Error]: bridge: secondary_status: 0x0800, control: 0x0013 [ 77.719115] pci 0000:ca:00.1: AER: can't recover (no error_detected callback) [ 77.719140] pcieport 0000:c9:02.0: AER: device recovery failed [ 77.719216] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020 [ 77.719390] pcieport 0000:c9:02.0: [21] ACSViol (First) [ 77.719557] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID [ 77.719723] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010 Fixes: 5b246e533d01 ("ice: split probe into smaller functions") Signed-off-by: Jakub Buchocki <jakubx.buchocki@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230612171421.21570-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08ice: do not re-enable miscellaneous interrupt until thread_fn completesJacob Keller
The ice driver uses threaded IRQ for managing Tx timestamps via the devm_request_threaded_irq() interface. The ice_misc_intr() handler function is responsible for processing the hard interrupt context, and can wake the ice_misc_intr_thread_fn() by returning IRQ_WAKE_THREAD. The request_threaded_irq() function comment says: @handler is still called in hard interrupt context and has to check whether the interrupt originates from the device. If yes, it needs to disable the interrupt on the device and return IRQ_WAKE_THREAD which will wake up the handler thread and run the @thread_fn. We currently re-enable the Other Interrupt Cause Register (OCIR) at the end of ice_misc_intr(). In practice, this seems to be ok, but it can make communicating between the handler function and the thread function difficult. This is because the interrupt can trigger again while the thread function is still processing. Move the OICR update to the end of the thread function, leaving the other interrupt cause disabled in hardware until we complete one pass of the thread function. This prevents the miscellaneous interrupt from firing until after we finish the thread function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-08ice: trigger PFINT_OICR_TSYN_TX interrupt instead of pollingJacob Keller
In ice_misc_intr_thread_fn(), if we do not complete all Tx timestamp work, the thread function will poll continuously forever. For E822 hardware, this wastes time as the return value from ice_ptp_process_ts() is accurate and always reports correctly that the PHY actually has new timestamp data. In addition, if we receive enough timestamps with the right pacing, we may never exit this polling. Should this occur, other tasks handled by the ice_misc_intr_thread_fn() will never be processed. Fix this by instead writing to PFINT_OICR, causing an emulated interrupt to be triggered immediately. This does take slightly more processing than just re-checking the timestamps. However, it allows all of the other interrupt causes a chance to be processed first in the hard IRQ function. Note that the OICR interrupt is configured to be throttled to no more than once every 124 microseconds. This gives an effective interrupt rate of ~8000 interrupts per second. This should thus not cause a significant increase in overall CPU usage when compared to sleeping. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-08ice: Fix XDP memory leak when NIC is brought up and downKamil Maziarz
Fix the buffer leak that occurs while switching the port up and down with traffic and XDP by checking for an active XDP program and freeing all empty TX buffers. Fixes: efc2214b6047 ("ice: Add support for XDP") Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-08ice: introduce ICE_TX_TSTAMP_WORK enumerationJacob Keller
The ice_ptp_process_ts() function and its various helper functions return a boolean value indicating whether any work is remaining. This use of a boolean has grown confusing as we have multiple helpers that pass status between each other. Readers must be aware of what "true" and "false" mean, and it is very easy to get their meaning inverted. The names of the functions are not standard "yes/no" questions, which is the best practice for boolean returns. Replace this use of an enumeration with a custom type, enum ice_tx_tstamp_work. This enumeration clearly indicates whether all work is done, or if more work is pending. To aid in readability, factor the actual list iteration and processing out into ice_ptp_process_tx_tstamp(), making it void. Then call this in ice_ptp_tx_tstamp() ensuring that we always check the Tracker list at the end when determining the appropriate return value. Now the return value is an explicit name instead of the true or false value. This is easier to follow and makes reading the resulting callers much simpler. In addition, this paves the way for future work to allow E822 hardware to process timestamps for all functions using a single interrupt on the clock owning PF. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-08ice: always return IRQ_WAKE_THREAD in ice_misc_intr()Karol Kolacinski
Refactor the ice_misc_intr() function to always return IRQ_WAKE_THREAD, and schedule the service task during the soft IRQ thread function instead of at the end of the hard IRQ handler. Remove the duplicate call to ice_service_task_schedule() that happened when we got a PCI exception. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-08ice: handle extts in the miscellaneous interrupt threadKarol Kolacinski
The ice_ptp_extts_work() and ice_ptp_periodic_work() functions are both scheduled on the same kthread worker, pf.ptp.kworker. The ice_ptp_periodic_work() function sends to the firmware to interact with the PHY, and must block to wait for responses. This can cause delay in responding to the PFINT_OICR_TSYN_EVNT interrupt cause, ultimately resulting in disruption to processing an input signal of the frequency is high enough. In our testing, even 100 Hz signals get disrupted. Fix this by instead processing the signal inside the miscellaneous interrupt thread prior to handling Tx timestamps. Use atomic bits in a new pf->misc_thread bitmap in order to safely communicate which tasks require processing within the ice_misc_intr_thread_fn(). This ensures the communication of desired tasks from the ice_misc_intr() are correctly processed without racing even in the event that the interrupt triggers again before the thread function exits. Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins") Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-05-16ice: add dynamic interrupt allocationPiotr Raczynski
Currently driver can only allocate interrupt vectors during init phase by calling pci_alloc_irq_vectors. Change that and make use of new pci_msix_alloc_irq_at/pci_msix_free_irq API and enable to allocate and free more interrupts after MSIX has been enabled. Since not all platforms supports dynamic allocation, check it with pci_msix_can_alloc_dyn. Extend the tracker to keep track how many interrupts are allocated initially so when all such vectors are already used, additional interrupts are automatically allocated dynamically. Remember each interrupt allocation method to then free appropriately. Since some features may require interrupts allocated dynamically add appropriate VSI flag and take it into account when allocating new interrupt. Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>