linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2023-06-21	bnxt_en: Link representors to PCI device	Ivan Vecera
	Link VF representors to parent PCI device to benefit from systemd defined naming scheme. Without this change the representor is visible as ethN. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20230620144855.288443-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	net: stmmac: dwmac-qcom-ethqos: add support for emac4 on sa8775p platforms	Bartosz Golaszewski
	sa8775p uses EMAC version 4, add the relevant defines, rename the has_emac3 switch to has_emac_ge_3 (has emac greater-or-equal than 3) and add the new compatible. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	net: stmmac: add new switch to struct plat_stmmacenet_data	Bartosz Golaszewski
	On some platforms, the PCS can be integrated in the MAC so the driver will not see any PCS link activity. Add a switch that allows the platform drivers to let the core code know. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	net: stmmac: dwmac-qcom-ethqos: add support for SGMII	Bartosz Golaszewski
	On sa8775p the MAC is connected to the external PHY over SGMII so add support for it to the driver. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	net: stmmac: dwmac-qcom-ethqos: prepare the driver for more PHY modes	Bartosz Golaszewski
	In preparation for supporting SGMII, let's make the code a bit more generic. Add a new callback for MAC configuration so that we can assign a different variant of it in the future. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	net: stmmac: dwmac-qcom-ethqos: add support for the phyaux clock	Bartosz Golaszewski
	On sa8775p, the EMAC revision is 4 and we use SGMII instead of RGMII. There's no "rgmii" clock but there's a fourth clock under a different name: "phyaux". Add a new field to the chip data struct that specifies the link clock name. Default to "rgmii" for backward compatibility. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	net: stmmac: dwmac-qcom-ethqos: add support for the optional serdes phy	Bartosz Golaszewski
	On sa8775p platforms, there's a SGMII SerDes PHY between the MAC and external PHY that we need to enable and configure. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	net: stmmac: dwmac-qcom-ethqos: remove stray space	Bartosz Golaszewski
	There's an unnecessary space in the rgmii_updatel() function, remove it. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	net: stmmac: dwmac-qcom-ethqos: add a newline between headers	Bartosz Golaszewski
	Typically we use a newline between global and local headers so add it here as well. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	net: stmmac: dwmac-qcom-ethqos: add missing include	Bartosz Golaszewski
	device_get_phy_mode() is declared in linux/property.h but this header is not included. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	net: stmmac: dwmac-qcom-ethqos: use a helper variable for &pdev->dev	Bartosz Golaszewski
	Shrink code and avoid line breaks by using a helper variable for &pdev->dev. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	net: stmmac: dwmac-qcom-ethqos: tweak the order of local variables	Bartosz Golaszewski
	Make sure we follow the reverse-xmas tree convention. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	net: stmmac: dwmac-qcom-ethqos: rename a label in probe()	Bartosz Golaszewski
	The err_mem label's name is unclear. It actually should be reached on any error after stmmac_probe_config_dt() succeeds. Name it after the cleanup action that needs to be called before exiting. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	net: stmmac: dwmac-qcom-ethqos: shrink clock code with devres	Bartosz Golaszewski
	We can use a devm action to completely drop the remove callback and use stmmac_pltfr_remove() directly for remove. We can also drop one of the goto labels. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	sfc: fix uninitialized variable use	Arnd Bergmann
	The new efx_bind_neigh() function contains a broken code path when IPV6 is disabled: drivers/net/ethernet/sfc/tc_encap_actions.c:144:7: error: variable 'n' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (encap->type & EFX_ENCAP_FLAG_IPV6) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/sfc/tc_encap_actions.c:184:8: note: uninitialized use occurs here if (!n) { ^ drivers/net/ethernet/sfc/tc_encap_actions.c:144:3: note: remove the 'if' if its condition is always false if (encap->type & EFX_ENCAP_FLAG_IPV6) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/sfc/tc_encap_actions.c:141:22: note: initialize the variable 'n' to silence this warning struct neighbour *n; ^ = NULL Change it to use the existing error handling path here. Fixes: 7e5e7d800011a ("sfc: neighbour lookup for TC encap action offload") Suggested-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20230619091215.2731541-2-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	sfc: add CONFIG_INET dependency for TC offload	Arnd Bergmann
	The driver now fails to link when CONFIG_INET is disabled, so add an explicit Kconfig dependency: ld.lld: error: undefined symbol: ip_route_output_flow >>> referenced by tc_encap_actions.c >>> drivers/net/ethernet/sfc/tc_encap_actions.o:(efx_tc_flower_create_encap_md) in archive vmlinux.a ld.lld: error: undefined symbol: ip_send_check >>> referenced by tc_encap_actions.c >>> drivers/net/ethernet/sfc/tc_encap_actions.o:(efx_gen_encap_header) in archive vmlinux.a >>> referenced by tc_encap_actions.c >>> drivers/net/ethernet/sfc/tc_encap_actions.o:(efx_gen_encap_header) in archive vmlinux.a ld.lld: error: undefined symbol: arp_tbl >>> referenced by tc_encap_actions.c >>> drivers/net/ethernet/sfc/tc_encap_actions.o:(efx_tc_netevent_event) in archive vmlinux.a >>> referenced by tc_encap_actions.c >>> drivers/net/ethernet/sfc/tc_encap_actions.o:(efx_tc_netevent_event) in archive vmlinux.a Fixes: a1e82162af0b8 ("sfc: generate encap headers for TC offload") Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Closes: https://lore.kernel.org/oe-kbuild-all/202306151656.yttECVTP-lkp@intel.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20230619091215.2731541-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	octeontx2-pf: TC flower offload support for rxqueue mapping	Ratheesh Kannoth
	TC rule support to offload rx queue mapping rules. Eg: tc filter add dev eth2 ingress protocol ip flower \ dst_ip 192.168.8.100 \ action skbedit queue_mapping 4 skip_sw action mirred ingress redirect dev eth5 Packets destined to 192.168.8.100 will be forwarded to rx queue 4 of eth5 interface. tc filter add dev eth2 ingress protocol ip flower \ dst_ip 192.168.8.100 \ action skbedit queue_mapping 9 skip_sw Packets destined to 192.168.8.100 will be forwarded to rx queue 4 of eth2 interface. Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Link: https://lore.kernel.org/r/20230619060638.1032304-1-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	be2net: Extend xmit workaround to BE3 chip	Ross Lagerwall
	We have seen a bug where the NIC incorrectly changes the length in the IP header of a padded packet to include the padding bytes. The driver already has a workaround for this so do the workaround for this NIC too. This resolves the issue. The NIC in question identifies itself as follows: [ 8.828494] be2net 0000:02:00.0: FW version is 10.7.110.31 [ 8.834759] be2net 0000:02:00.0: Emulex OneConnect(be3): PF FLEX10 port 1 02:00.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01) Fixes: ca34fe38f06d ("be2net: fix wrong usage of adapter->generation") Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Link: https://lore.kernel.org/r/20230616164549.2863037-1-ross.lagerwall@citrix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	net: fec: allow to build without PAGE_POOL_STATS	Lucas Stach
	Commit 6970ef27ff7f ("net: fec: add xdp and page pool statistics") selected CONFIG_PAGE_POOL_STATS from the FEC driver symbol, making it impossible to build without the page pool statistics when this driver is enabled. The help text of those statistics mentions increased overhead. Allow the user to choose between usefulness of the statistics and the added overhead. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230616191832.2944130-1-l.stach@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	net: dpaa2-mac: add 25gbase-r support	Josua Mayer
	Layerscape MACs support 25Gbps network speed with dpmac "CAUI" mode. Add the mappings between DPMAC_ETH_IF_* and HY_INTERFACE_MODE_*, as well as the 25000 mac capability. Tested on SolidRun LX2162a Clearfog, serdes 1 protocol 18. Signed-off-by: Josua Mayer <josua@solid-run.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-20	net/mlx5: Add .getmaxphase ptp_clock_info callback	Rahul Rameshbabu
	Implement .getmaxphase callback of ptp_clock_info in mlx5 driver. No longer do a range check in .adjphase callback implementation. Handled by the ptp stack. Cc: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-19	Merge tag 'mlx5-fixes-2023-06-16' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-fixes-2023-06-16 This series provides bug fixes to mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-19	net: qca_spi: Avoid high load if QCA7000 is not available	Stefan Wahren
	In case the QCA7000 is not available via SPI (e.g. in reset), the driver will cause a high load. The reason for this is that the synchronization is never finished and schedule() is never called. Since the synchronization is not timing critical, it's safe to drop this from the scheduling condition. Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-17	sfc: use budget for TX completions	Íñigo Huguet
	When running workloads heavy unbalanced towards TX (high TX, low RX traffic), sfc driver can retain the CPU during too long times. Although in many cases this is not enough to be visible, it can affect performance and system responsiveness. A way to reproduce it is to use a debug kernel and run some parallel netperf TX tests. In some systems, this will lead to this message being logged: kernel:watchdog: BUG: soft lockup - CPU#12 stuck for 22s! The reason is that sfc driver doesn't account any NAPI budget for the TX completion events work. With high-TX/low-RX traffic, this makes that the CPU is held for long time for NAPI poll. Documentations says "drivers can process completions for any number of Tx packets but should only process up to budget number of Rx packets". However, many drivers do limit the amount of TX completions that they process in a single NAPI poll. In the same way, this patch adds a limit for the TX work in sfc. With the patch applied, the watchdog warning never appears. Tested with netperf in different combinations: single process / parallel processes, TCP / UDP and different sizes of UDP messages. Repeated the tests before and after the patch, without any noticeable difference in network or CPU performance. Test hardware: Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz (4 cores, 2 threads/core) Solarflare Communications XtremeScale X2522-25G Network Adapter Fixes: 5227ecccea2d ("sfc: remove tx and MCDI handling from NAPI budget consideration") Fixes: d19a53721863 ("sfc_ef100: TX path for EF100 NICs") Reported-by: Fei Liu <feliu@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20230615084929.10506-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-16	net/mlx5e: Fix scheduling of IPsec ASO query while in atomic	Leon Romanovsky
	ASO query can be scheduled in atomic context as such it can't use usleep. Use udelay as recommended in Documentation/timers/timers-howto.rst. Fixes: 76e463f6508b ("net/mlx5e: Overcome slow response for first IPsec ASO WQE") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5e: Drop XFRM state lock when modifying flow steering	Leon Romanovsky
	XFRM state which is changed to be XFRM_STATE_EXPIRED doesn't really need to hold lock while modifying flow steering rules to drop traffic. That state can be deleted only and as such mlx5e_ipsec_handle_tx_limit() work will be canceled anyway and won't run in parallel. Fixes: b2f7b01d36a9 ("net/mlx5e: Simulate missing IPsec TX limits hardware functionality") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5e: Fix ESN update kernel panic	Patrisious Haddad
	Previously during mlx5e_ipsec_handle_event the driver tried to execute an operation that could sleep, while holding a spinlock, which caused the kernel panic mentioned below. Move the function call that can sleep outside of the spinlock context. Call Trace: <TASK> dump_stack_lvl+0x49/0x6c __schedule_bug.cold+0x42/0x4e schedule_debug.constprop.0+0xe0/0x118 __schedule+0x59/0x58a ? __mod_timer+0x2a1/0x3ef schedule+0x5e/0xd4 schedule_timeout+0x99/0x164 ? __pfx_process_timeout+0x10/0x10 __wait_for_common+0x90/0x1da ? __pfx_schedule_timeout+0x10/0x10 wait_func+0x34/0x142 [mlx5_core] mlx5_cmd_invoke+0x1f3/0x313 [mlx5_core] cmd_exec+0x1fe/0x325 [mlx5_core] mlx5_cmd_do+0x22/0x50 [mlx5_core] mlx5_cmd_exec+0x1c/0x40 [mlx5_core] mlx5_modify_ipsec_obj+0xb2/0x17f [mlx5_core] mlx5e_ipsec_update_esn_state+0x69/0xf0 [mlx5_core] ? wake_affine+0x62/0x1f8 mlx5e_ipsec_handle_event+0xb1/0xc0 [mlx5_core] process_one_work+0x1e2/0x3e6 ? __pfx_worker_thread+0x10/0x10 worker_thread+0x54/0x3ad ? __pfx_worker_thread+0x10/0x10 kthread+0xda/0x101 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x37 </TASK> BUG: workqueue leaked lock or atomic: kworker/u256:4/0x7fffffff/189754#012 last function: mlx5e_ipsec_handle_event [mlx5_core] CPU: 66 PID: 189754 Comm: kworker/u256:4 Kdump: loaded Tainted: G W 6.2.0-2596.20230309201517_5.el8uek.rc1.x86_64 #2 Hardware name: Oracle Corporation ORACLE SERVER X9-2/ASMMBX9-2, BIOS 61070300 08/17/2022 Workqueue: mlx5e_ipsec: eth%d mlx5e_ipsec_handle_event [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x49/0x6c process_one_work.cold+0x2b/0x3c ? __pfx_worker_thread+0x10/0x10 worker_thread+0x54/0x3ad ? __pfx_worker_thread+0x10/0x10 kthread+0xda/0x101 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x37 </TASK> BUG: scheduling while atomic: kworker/u256:4/189754/0x00000000 Fixes: cee137a63431 ("net/mlx5e: Handle ESN update events") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5e: Don't delay release of hardware objects	Leon Romanovsky
	XFRM core provides two callbacks to release resources, one is .xdo_dev_policy_delete() and another is .xdo_dev_policy_free(). This separation allows delayed release so "ip xfrm policy free" commands won't starve. Unfortunately, mlx5 command interface can't run in .xdo_dev_policy_free() callbacks as the latter runs in ATOMIC context. BUG: scheduling while atomic: swapper/7/0/0x00000100 Modules linked in: act_mirred act_tunnel_key cls_flower sch_ingress vxlan mlx5_vdpa vringh vhost_iotlb vdpa rpcrdma rdma_ucm ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_core zram zsmalloc fuse CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.3.0+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x33/0x50 __schedule_bug+0x4e/0x60 __schedule+0x5d5/0x780 ? __mod_timer+0x286/0x3d0 schedule+0x50/0x90 schedule_timeout+0x7c/0xf0 ? __bpf_trace_tick_stop+0x10/0x10 __wait_for_common+0x88/0x190 ? usleep_range_state+0x90/0x90 cmd_exec+0x42e/0xb40 [mlx5_core] mlx5_cmd_do+0x1e/0x40 [mlx5_core] mlx5_cmd_exec+0x18/0x30 [mlx5_core] mlx5_cmd_delete_fte+0xa8/0xd0 [mlx5_core] del_hw_fte+0x60/0x120 [mlx5_core] mlx5_del_flow_rules+0xec/0x270 [mlx5_core] ? default_send_IPI_single_phys+0x26/0x30 mlx5e_accel_ipsec_fs_del_pol+0x1a/0x60 [mlx5_core] mlx5e_xfrm_free_policy+0x15/0x20 [mlx5_core] xfrm_policy_destroy+0x5a/0xb0 xfrm4_dst_destroy+0x7b/0x100 dst_destroy+0x37/0x120 rcu_core+0x2d6/0x540 __do_softirq+0xcd/0x273 irq_exit_rcu+0x82/0xb0 sysvec_apic_timer_interrupt+0x72/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 RIP: 0010:default_idle+0x13/0x20 Code: c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 8b 05 7a 4d ee 00 85 c0 7e 07 0f 00 2d 2f 98 2e 00 fb f4 <fa> c3 66 66 2e 0f 1f 84 00 00 00 00 00 65 48 8b 04 25 40 b4 02 00 RSP: 0018:ffff888100843ee0 EFLAGS: 00000242 RAX: 0000000000000001 RBX: ffff888100812b00 RCX: 4000000000000000 RDX: 0000000000000001 RSI: 0000000000000083 RDI: 000000000002d2ec RBP: 0000000000000007 R08: 00000021daeded59 R09: 0000000000000001 R10: 0000000000000000 R11: 000000000000000f R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 default_idle_call+0x30/0xb0 do_idle+0x1c1/0x1d0 cpu_startup_entry+0x19/0x20 start_secondary+0xfe/0x120 secondary_startup_64_no_verify+0xf3/0xfb </TASK> bad: scheduling from the idle thread! Fixes: a5b8ca9471d3 ("net/mlx5e: Add XFRM policy offload logic") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5: Free IRQ rmap and notifier on kernel shutdown	Saeed Mahameed
	The kernel IRQ system needs the irq affinity notifier to be clear before attempting to free the irq, see WARN_ON log below. On a normal driver unload we don't have this issue since we do the complete cleanup of the irq resources. To fix this, put the important resources cleanup in a helper function and use it in both normal driver unload and shutdown flows. [ 4497.498434] ------------[ cut here ]------------ [ 4497.498726] WARNING: CPU: 0 PID: 9 at kernel/irq/manage.c:2034 free_irq+0x295/0x340 [ 4497.499193] Modules linked in: [ 4497.499386] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G W 6.4.0-rc4+ #10 [ 4497.499876] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014 [ 4497.500518] Workqueue: events do_poweroff [ 4497.500849] RIP: 0010:free_irq+0x295/0x340 [ 4497.501132] Code: 85 c0 0f 84 1d ff ff ff 48 89 ef ff d0 0f 1f 00 e9 10 ff ff ff 0f 0b e9 72 ff ff ff 49 8d 7f 28 ff d0 0f 1f 00 e9 df fd ff ff <0f> 0b 48 c7 80 c0 008 [ 4497.502269] RSP: 0018:ffffc90000053da0 EFLAGS: 00010282 [ 4497.502589] RAX: ffff888100949600 RBX: ffff88810330b948 RCX: 0000000000000000 [ 4497.503035] RDX: ffff888100949600 RSI: ffff888100400490 RDI: 0000000000000023 [ 4497.503472] RBP: ffff88810330c7e0 R08: ffff8881004005d0 R09: ffffffff8273a260 [ 4497.503923] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881009ae000 [ 4497.504359] R13: ffff8881009ae148 R14: 0000000000000000 R15: ffff888100949600 [ 4497.504804] FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 [ 4497.505302] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4497.505671] CR2: 00007fce98806298 CR3: 000000000262e005 CR4: 0000000000370ef0 [ 4497.506104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4497.506540] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4497.507002] Call Trace: [ 4497.507158] <TASK> [ 4497.507299] ? free_irq+0x295/0x340 [ 4497.507522] ? __warn+0x7c/0x130 [ 4497.507740] ? free_irq+0x295/0x340 [ 4497.507963] ? report_bug+0x171/0x1a0 [ 4497.508197] ? handle_bug+0x3c/0x70 [ 4497.508417] ? exc_invalid_op+0x17/0x70 [ 4497.508662] ? asm_exc_invalid_op+0x1a/0x20 [ 4497.508926] ? free_irq+0x295/0x340 [ 4497.509146] mlx5_irq_pool_free_irqs+0x48/0x90 [ 4497.509421] mlx5_irq_table_free_irqs+0x38/0x50 [ 4497.509714] mlx5_core_eq_free_irqs+0x27/0x40 [ 4497.509984] shutdown+0x7b/0x100 [ 4497.510184] pci_device_shutdown+0x30/0x60 [ 4497.510440] device_shutdown+0x14d/0x240 [ 4497.510698] kernel_power_off+0x30/0x70 [ 4497.510938] process_one_work+0x1e6/0x3e0 [ 4497.511183] worker_thread+0x49/0x3b0 [ 4497.511407] ? __pfx_worker_thread+0x10/0x10 [ 4497.511679] kthread+0xe0/0x110 [ 4497.511879] ? __pfx_kthread+0x10/0x10 [ 4497.512114] ret_from_fork+0x29/0x50 [ 4497.512342] </TASK> Fixes: 9c2d08010963 ("net/mlx5: Free irqs only on shutdown callback") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com>
2023-06-16	net/mlx5: DR, Fix wrong action data allocation in decap action	Yevgeny Kliteynik
	When TUNNEL_L3_TO_L2 decap action was created, a pointer to a local variable was passed as its HW action data, resulting in attempt to free invalid address: BUG: KASAN: invalid-free in mlx5dr_action_destroy+0x318/0x410 [mlx5_core] Fixes: 4781df92f4da ("net/mlx5: DR, Move STEv0 modify header logic") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5: DR, Support SW created encap actions for FW table	Yevgeny Kliteynik
	In some cases, steering might need to use SW-created action in FW table, which results in wrong packet reformat being used: mlx5_core 0000:81:00.1: mlx5_cmd_check:756:(pid 1154): SET_FLOW_TABLE_ENTRY(0×936) op_mod(0×0) failed, status bad resource(0×5), syndrome (0xf2ff71) This patch adds support for usage of SW-created packet reformat (encap) actions in FW tables, and adds clear error flow for attempt to use SW-created modify header on FW tables. Fixes: 6a48faeeca10 ("net/mlx5: Add direct rule fs_cmd implementation") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5e: TC, Cleanup ct resources for nic flow	Chris Mi
	The cited commit removes special handling of CT action. But it removes too much. Pre ct/ct_nat tables and some other resources are not destroyed due to the cited commit. Fix it by adding it back. Fixes: 08fe94ec5f77 ("net/mlx5e: TC, Remove special handling of CT action") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5e: TC, Add null pointer check for hardware miss support	Chris Mi
	The cited commits add hardware miss support to tc action. But if the rules can't be offloaded, the pointers are null and system will panic when accessing them. Fix it by checking null pointer. Fixes: 08fe94ec5f77 ("net/mlx5e: TC, Remove special handling of CT action") Fixes: 6702782845a5 ("net/mlx5e: TC, Set CT miss to the specific ct action instance") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5: Fix driver load with single msix vector	Eli Cohen
	When a PCI device has just one msix vector available, we want to share this vector between async and completion events. Current code fails to do that assuming it will always have at least one dedicated vector for completion events. Fix this by detecting when the pool contains just a single vector. Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation") Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5e: xsk: Set napi_id to support busy polling on XSK RQ	Maxim Mikityanskiy
	The cited commit missed setting napi_id on XSK RQs, it only affected regular RQs. Add the missing part to support socket busy polling on XSK RQs. Fixes: a2740f529da2 ("net/mlx5e: xsk: Set napi_id to support busy polling") Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5e: XDP, Allow growing tail for XDP multi buffer	Maxim Mikityanskiy
	The cited commits missed passing frag_size to __xdp_rxq_info_reg, which is required by bpf_xdp_adjust_tail to support growing the tail pointer in fragmented packets. Pass the missing parameter when the current RQ mode allows XDP multi buffer. Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ") Fixes: 9cb9482ef10e ("net/mlx5e: Use fragments of the same size in non-linear legacy RQ with XDP") Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> Cc: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5: Remove unused ecpu field from struct mlx5_sf_table	Jiri Pirko
	"ecpu" field in struct mlx5_sf_table is not used anywhere. Remove it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5: Add header file for events	Juhee Kang
	Separate the event API defined in the generic mlx5.h header into a dedicated header. And remove the TODO comment in commit 69c1280b1f3b ("net/mlx5: Device events, Use async events chain"). Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5: DR, update query of HCA caps for EC VFs	Daniel Jurgens
	This change is needed to use EC VFs with metadata based steering. There was an assumption that vport was equal to function ID. That's not the case for EC VF functions. Adjust to function ID and set the ec_vf_function bit accordingly. Fixes: 9ac0b128248e ("net/mlx5: Update vport caps query/set for EC VFs") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5: Fix the macro for accessing EC VF vports	Daniel Jurgens
	The last value is not set correctly. This results in representors not being created for all EC VFs when the base value is higher than 0. Fixes: a7719b29a821 ("net/mlx5: Add management of EC VF vports") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5e: Add local loopback counter to vport stats	Or Har-Toov
	Add counter for number of unicast, multicast and broadcast packets/ octets that were loopback. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5e: Remove mlx5e_dbg() and msglvl support	Gal Pressman
	The msglvl support was implemented using the mlx5e_dbg() macro which is rarely used in the driver, and is not very useful when you can just use dynamic debug instead. Remove mlx5e_dbg() and convert its usages to netdev_dbg(). Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5: E-Switch, remove redundant else statements	Saeed Mahameed
	These else statement blocks are redundant since the if block already jumps to the function abort label. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
2023-06-16	net/mlx5: Bridge, expose FDB state via debugfs	Vlad Buslov
	For debugging purposes expose offloaded FDB state (flags, counters, etc.) via debugfs inside 'esw' root directory. Example debugfs file output: $ cat mlx5/0000\:08\:00.0/esw/bridge/bridge1/fdb DEV MAC VLAN PACKETS BYTES LASTUSE FLAGS enp8s0f0_1 e4:0a:05:08:00:06 2 2 204 4295567112 0x0 enp8s0f0_0 e4:0a:05:08:00:03 2 3 278 4295567112 0x0 Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5: Bridge, pass net device when linking vport to bridge	Vlad Buslov
	Following patch requires access to additional data in bridge net_device. Pass the whole structure down the stack instead of adding necessary fields as function arguments one-by-one. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5: Create eswitch debugfs root directory	Vlad Buslov
	Following patch in series uses the new directory for bridge FDB debugfs. The new directory is intended for all future eswitch-specific debugfs files. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5: Handle sync reset unload event	Moshe Shemesh
	Added a new event handler to firmware sync reset, which is used to support firmware sync reset flow on smart NIC. Adding this new stage to the flow enables the firmware to ensure host PFs unload before ECPFs unload, to avoid race of PFs recovery. If firmware sends sync_reset_unload event to driver the driver should unload and close all HW resources of the function. Once the driver finishes unloading part, it can't get any more events from firmware as event queues are closed, so it polls the reset state field to know when to continue to next stage of the sync reset flow. Added capability bit for supporting sync_reset_unload event. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5: Check DTOR entry value is not zero	Moshe Shemesh
	The Default Timeout Register (DTOR) provides timeout values to driver for flows that are device dependent. Zero value for DTOR entry is not valid and should not be used. In case of reading zero value from DTOR, the driver should use the hard coded SW default value instead. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5: Expose timeout for sync reset unload stage	Moshe Shemesh
	Expose new timoueout in Default Timeouts Register to be used on sync reset flow running on smart NIC. In this flow the driver should know how much time to wait from getting unload request till firmware will ask the PF to continue to next stage of the flow. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16	net/mlx5: Ack on sync_reset_request only if PF can do reset_now	Moshe Shemesh
	Verify at reset_request stage that PF is capable to do reset_now. In case PF is not capable, notify the firmware that the sync reset can not happen and so firmware will abort the sync reset at early stage and will not send reset_now event to any PF. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>