linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2024-10-30	net: lan969x: add PTP handler function	Daniel Machon
	Add PTP IRQ handler for lan969x. This is required, as the PTP registers are placed in two different targets on Sparx5 and lan969x. The implementation is otherwise the same as on Sparx5. Also, expose sparx5_get_hwtimestamp() for use by lan969x. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-10-a0b5fae88a0f@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	net: lan969x: add lan969x ops to match data	Daniel Machon
	Add a bunch of small lan969x ops in bulk. These ops are explained in detail in a previous series [1]. [1] https://lore.kernel.org/netdev/20241004-b4-sparx5-lan969x-switch-driver-v2-8-d3290f581663@microchip.com/ Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-9-a0b5fae88a0f@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	net: lan969x: add constants to match data	Daniel Machon
	Add the lan969x constants to match data. These are already used throughout the Sparx5 code (introduced in earlier series [1]), so no need to update any code use. [1] https://lore.kernel.org/netdev/20241004-b4-sparx5-lan969x-switch-driver-v2-0-d3290f581663@microchip.com/ Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-8-a0b5fae88a0f@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	net: lan969x: add register diffs to match data	Daniel Machon
	Add new file lan969x_regs.c that defines all the register differences for lan969x, and add it to the lan969x match data. GW_DEV2G5_PHASE_DETECTOR_CTRL, FP_DEV2G5_PHAD_CTRL_PHAD_ENA and FP_DEV2G5_PHAD_CTRL_PHAD_FAILED are required by the new register macros which was introduced earlier. Add these for Sparx5 also. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-7-a0b5fae88a0f@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	net: lan969x: add match data for lan969x	Daniel Machon
	Add match data for lan969x, with initial fields for iomap, iomap_size and ioranges. Add new Kconfig symbol CONFIG_LAN969X_CONFIG for compiling the lan969x driver. It has been decided to give lan969x its own Kconfig symbol, as a considerable amount of code is needed, beside the Sparx5 code, to add full chip support (and more will be added in future series). Also this makes it possible to compile Sparx5 without lan969x. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-6-a0b5fae88a0f@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	net: sparx5: add registers required by lan969x	Daniel Machon
	Lan969x will require a few additional registers for certain operations. Some are shared, some are not. Add these. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-5-a0b5fae88a0f@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	net: sparx5: add sparx5 context pointer to a few functions	Daniel Machon
	In preparation for lan969x, add the sparx5 context pointer to certain IFH (Internal Frame Header) functions. This is required, as the is_sparx5() function will be used here in a subsequent patch. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-4-a0b5fae88a0f@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	net: sparx5: change frequency calculation for SDLB's	Daniel Machon
	In preparation for lan969x, rework the function that calculates the SDLB (Service Dual Leacky Bucket) clock. This is required, as the HSCH_SYS_CLK_PER register is Sparx5-exclusive. Instead derive the clock from the core clock, using the sparx5_clk_period() function. The clock stays the same before and after this patch, only now, sparx5_sdlb_clk_hz_get() can be used for lan969x too. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-3-a0b5fae88a0f@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	net: sparx5: change spx5_wr to spx5_rmw in cal update()	Daniel Machon
	In preparation for lan969x, use spx5_rmw() for enabling the update of the calendar. This is required to not overwrite the DSM_TAXI_CAL_CFG register, as an additional write will be added before this one, in a subsequent patch. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-2-a0b5fae88a0f@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	net: sparx5: add support for lan969x targets and core clock	Daniel Machon
	In preparation for lan969x, add lan969x targets to sparx5_target_chiptype and set the core clock frequency for these throughout. Lan969x only supports a core clock frequency of 328MHz. Also, set the policer update internal (pol_upd_int) matching the 328 MHz frequency of the lan969x targets. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-1-a0b5fae88a0f@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	gve: change to use page_pool_put_full_page when recycling pages	Harshitha Ramamurthy
	The driver currently uses page_pool_put_page() to recycle page pool pages. Since gve uses split pages, if the fragment being recycled is not the last fragment in the page, there is no dma sync operation. When the last fragment is recycled, dma sync is performed by page pool infra according to the value passed as dma_sync_size which right now is set to the size of fragment. But the correct thing to do is to dma sync the entire page when the last fragment is recycled. Hence change to using page_pool_put_full_page(). Link: https://lore.kernel.org/netdev/89d7ce83-cc1d-4791-87b5-6f7af29a031d@huawei.com/ Suggested-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Fixes: ebdfae0d377b ("gve: adopt page pool for DQ RDA mode") Link: https://patch.msgid.link/20241023221141.3008011-1-pkaligineedi@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	Merge branch 'refactoring-rvu-nic-driver'	Jakub Kicinski
	Geetha sowjanya says: ==================== Refactoring RVU NIC driver This is a preparation pathset for follow-up "Introducing RVU representors driver" patches. The RVU representor driver creates representor netdev of each rvu device when switch dev mode is enabled. RVU representor and NIC have a similar set of HW resources(NIX_LF,RQ/SQ/CQ) and implements a subset of NIC functionality. This patch set groups hw resources and queue configuration code into single API and export the existing functions so, that code can be shared between NIC and representor drivers. These patches are part of "Introduce RVU representors" patchset. https://lore.kernel.org/all/ZsdJ-w00yCI4NQ8T@nanopsycho.orion/T/ As suggested by "Jiri Pirko", submitting as separate patchset. ==================== Link: https://patch.msgid.link/20241023161843.15543-1-gakula@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	octeontx2-pf: Move shared APIs to header file	Geetha sowjanya
	Move mbox, hw resources and interrupt configuration functions to common header file. So, that they can be used later by the RVU representor driver. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241023161843.15543-5-gakula@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	octeontx2-pf: Reuse PF max mtu value	Geetha sowjanya
	Reuse the maximum support HW MTU value that is fetch during probe. Instead of fetching through mbox each time mtu is changed as the value is fixed for interface. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241023161843.15543-4-gakula@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	octeontx2-pf: Add new APIs for queue memory alloc/free.	Geetha sowjanya
	Group the queue(RX/TX/CQ) memory allocation and free code to single APIs. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241023161843.15543-3-gakula@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	octeontx2-pf: Define common API for HW resources configuration	Geetha sowjanya
	Define new API "otx2_init_rsrc" and move the HW blocks NIX/NPA resources configuration code under this API. So, that it can be used by the RVU representor driver that has similar resources of RVU NIC. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241023161843.15543-2-gakula@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	Merge branch 'mirroring-to-dsa-cpu-port'	Jakub Kicinski
	Vladimir Oltean says: ==================== Mirroring to DSA CPU port Users of the NXP LS1028A SoC (drivers/net/dsa/ocelot L2 switch inside) have requested to mirror packets from the ingress of a switch port to software. Both port-based and flow-based mirroring is required. The simplest way I could come up with was to set up tc mirred actions towards a dummy net_device, and make the offloading of that be accepted by the driver. Currently, the pattern in drivers is to reject mirred towards ports they don't know about, but I'm now permitting that, precisely by mirroring "to the CPU". For testers, this series depends on commit 34d35b4edbbe ("net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions created by classifiers") from net/main, which is absent from net-next as of the day of posting (Oct 23). Without the bug fix it is possible to create invalid configurations which are not rejected by the kernel. v2: https://lore.kernel.org/20241017165215.3709000-1-vladimir.oltean@nxp.com RFC: https://lore.kernel.org/20240913152915.2981126-1-vladimir.oltean@nxp.com For historical purposes, link to a much older (and much different) attempt: https://lore.kernel.org/20191002233750.13566-1-olteanv@gmail.com ==================== Link: https://patch.msgid.link/20241023135251.1752488-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	net: mscc: ocelot: allow tc-flower mirred action towards foreign interfaces	Vladimir Oltean
	Debugging certain flows in the offloaded switch data path can be done by installing two tc-mirred filters for mirroring: one in the hardware data path, which copies the frames to the CPU, and one which takes the frame from there and mirrors it to a virtual interface like a dummy device, where it can be seen with tcpdump. The effect of having 2 filters run on the same packet can be obtained by default using tc, by not specifying either the 'skip_sw' or 'skip_hw' keywords. Instead of refusing to offload mirroring/redirecting packets towards interfaces that aren't switch ports, just treat every other destination for what it is: something that is handled in software, behind the CPU port. Usage: $ ip link add dummy0 type dummy; ip link set dummy0 up $ tc qdisc add dev swp0 clsact $ tc filter add dev swp0 ingress protocol ip flower action mirred ingress mirror dev dummy0 Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241023135251.1752488-7-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	net: dsa: allow matchall mirroring rules towards the CPU	Vladimir Oltean
	If the CPU bandwidth capacity permits, it may be useful to mirror the entire ingress of a user port to software. This is in fact possible to express even if there is no net_device representation for the CPU port. In fact, that approach was already exhausted and that representation wouldn't have even helped [1]. The idea behind implementing this is that currently, we refuse to offload any mirroring towards a non-DSA target net_device. But if we acknowledge the fact that to reach any foreign net_device, the switch must send the packet to the CPU anyway, then we can simply offload just that part, and let the software do the rest. There is only one condition we need to uphold: the filter needs to be present in the software data path as well (no skip_sw). There are 2 actions to consider: FLOW_ACTION_MIRRED (redirect to egress of target interface) and FLOW_ACTION_MIRRED_INGRESS (redirect to ingress of target interface). We don't have the ability/API to offload FLOW_ACTION_MIRRED_INGRESS when the target port is also a DSA user port, but we could also permit that through mirred to the CPU + software. Example: $ ip link add dummy0 type dummy; ip link set dummy0 up $ tc qdisc add dev swp0 clsact $ tc filter add dev swp0 ingress matchall action mirred ingress mirror dev dummy0 Any DSA driver with a ds->ops->port_mirror_add() implementation can now make use of this with no additional change. [1] https://lore.kernel.org/netdev/20191002233750.13566-1-olteanv@gmail.com/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241023135251.1752488-6-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	net: dsa: add more extack messages in dsa_user_add_cls_matchall_mirred()	Vladimir Oltean
	Do not leave -EOPNOTSUPP errors without an explanation. It is confusing for the user to figure out what is wrong otherwise. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241023135251.1752488-5-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	net: dsa: use "extack" as argument to flow_action_basic_hw_stats_check()	Vladimir Oltean
	We already have an "extack" stack variable in dsa_user_add_cls_matchall_police() and dsa_user_add_cls_matchall_mirred(), there is no need to retrieve it again from cls->common.extack. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241023135251.1752488-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	net: dsa: clean up dsa_user_add_cls_matchall()	Vladimir Oltean
	The body is a bit hard to read, hard to extend, and has duplicated conditions. Clean up the "if (many conditions) else if (many conditions, some of them repeated)" pattern by: - Moving the repeated conditions out - Replacing the repeated tests for the same variable with a switch/case - Moving the protocol check inside the dsa_user_add_cls_matchall_mirred() function call. This is pure refactoring, no logic has been changed, though some tests were reordered. The order does not matter - they are independent things to be tested for. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241023135251.1752488-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	net: sched: propagate "skip_sw" flag to struct flow_cls_common_offload	Vladimir Oltean
	Background: switchdev ports offload the Linux bridge, and most of the packets they handle will never see the CPU. The ports between which there exists no hardware data path are considered 'foreign' to switchdev. These can either be normal physical NICs without switchdev offload, or incompatible switchdev ports, or virtual interfaces like veth/dummy/etc. In some cases, an offloaded filter can only do half the work, and the rest must be handled by software. Redirecting/mirroring from the ingress of a switchdev port towards a foreign interface is one example of combined hardware/software data path. The most that the switchdev port can do is to extract the matching packets from its offloaded data path and send them to the CPU. From there on, the software filter runs (a second time, after the first run in hardware) on the packet and performs the mirred action. It makes sense for switchdev drivers which allow this kind of "half offloading" to sense the "skip_sw" flag of the filter/action pair, and deny attempts from the user to install a filter that does not run in software, because that simply won't work. In fact, a mirred action on a switchdev port towards a dummy interface appears to be a valid way of (selectively) monitoring offloaded traffic that flows through it. IFF_PROMISC was also discussed years ago, but (despite initial disagreement) there seems to be consensus that this flag should not affect the destination taken by packets, but merely whether or not the NIC discards packets with unknown MAC DA for local processing. [1] https://lore.kernel.org/netdev/20190830092637.7f83d162@ceranb/ [2] https://lore.kernel.org/netdev/20191002233750.13566-1-olteanv@gmail.com/ Suggested-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/netdev/ZxUo0Dc0M5Y6l9qF@shredder.mtl.com/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20241023135251.1752488-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	Merge branch 'ptp-driver-for-s390-clocks'	Jakub Kicinski
	Sven Schnelle says: ==================== PtP driver for s390 clocks these patches add support for using the s390 physical and TOD clock as ptp clock. To do so, the first patch adds a clock id to the s390 TOD clock, while the second patch adds the PtP driver itself. ==================== Link: https://patch.msgid.link/20241023065601.449586-1-svens@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	s390/time: Add PtP driver	Sven Schnelle
	Add a small PtP driver which allows user space to get the values of the physical and tod clock. This allows programs like chrony to use STP as clock source and steer the kernel clock. The physical clock can be used as a debugging aid to get the clock without any additional offsets like STP steering or LPAR offset. Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Link: https://patch.msgid.link/20241023065601.449586-3-svens@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	s390/time: Add clocksource id to TOD clock	Sven Schnelle
	To allow specifying the clock source in the upcoming PtP driver, add a clocksource ID to the s390 TOD clock. Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Link: https://patch.msgid.link/20241023065601.449586-2-svens@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	tests: hsr: Increase timeout to 50 seconds	Yunshui Jiang
	The HSR test, hsr_ping.sh, actually needs 7 min to run. Around 375s to be exact, and even more on a debug kernel or kernel with other network security limits. The timeout setting for the kselftest is currently 45 seconds, which is way too short to integrate hsr tests to run_kselftest infrastructure. However, timeout of hundreds of seconds is quite a long time, especially in a CI/CD environment. It seems that we need accelerate the test and balance with timeout setting. The most time-consuming func is do_ping_long, where ping command sends 10 packages to the given address. The default interval between two ping packages is 1s according to the ping Mannual. There isn't any operation between pings thus we could pass -i 0.1 to ping to make it 10 times faster. While even with this short interval, the test still need about 46.4 seconds to finish because of the two HSR interfaces, each of which is tested by calling do_ping func 12 times and do_ping_long func 19 times and sleep for 3s. So, an explicit setting is also needed to slightly increase the timeout. And to leave us some slack, use 50 as default timeout. Signed-off-by: Yunshui Jiang <jiangyunshui@kylinos.cn> Link: https://patch.msgid.link/20241028082757.2945232-1-jiangyunshui@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30	x86/uaccess: Avoid barrier_nospec() in 64-bit copy_from_user()	Linus Torvalds
	The barrier_nospec() in 64-bit copy_from_user() is slow. Instead use pointer masking to force the user pointer to all 1's for an invalid address. The kernel test robot reports a 2.6% improvement in the per_thread_ops benchmark [1]. This is a variation on a patch originally by Josh Poimboeuf [2]. Link: https://lore.kernel.org/202410281344.d02c72a2-oliver.sang@intel.com [1] Link: https://lore.kernel.org/5b887fe4c580214900e21f6c61095adf9a142735.1730166635.git.jpoimboe@kernel.org [2] Tested-and-reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-10-30	Merge tag 'perf-tools-fixes-for-v6.12-2-2024-10-30' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Arnaldo Carvalho de Melo: - Update more header copies with the kernel sources, including const.h, msr-index.h, arm64's cputype.h, kvm's, bits.h and unaligned.h - The return from 'write' isn't a pid, fix cut'n'paste error in 'perf trace' - Fix up the python binding build on architectures without HAVE_KVM_STAT_SUPPORT - Add some more bounds checks to augmented_raw_syscalls.bpf.c (used to collect syscall pointer arguments in 'perf trace') to make the resulting bytecode to pass the kernel BPF verifier, allowing us to go back accepting clang 12.0.1 as the minimum version required for compiling BPF sources - Add __NR_capget for x86 to fix a regression on running perf + intel PT (hw tracing) as non-root setting up the capabilities as described in https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html - Fix missing syscalltbl in non-explicitly listed architectures, noticed on ARM 32-bit, that still needs a .tbl generator for the syscall id<->name tables, should be added for v6.13 - Handle 'perf test' failure when handling broken DWARF for ASM files * tag 'perf-tools-fixes-for-v6.12-2-2024-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf cap: Add __NR_capget to arch/x86 unistd tools headers: Update the linux/unaligned.h copy with the kernel sources tools headers arm64: Sync arm64's cputype.h with the kernel sources tools headers: Synchronize {uapi/}linux/bits.h with the kernel sources tools arch x86: Sync the msr-index.h copy with the kernel sources perf python: Fix up the build on architectures without HAVE_KVM_STAT_SUPPORT perf test: Handle perftool-testsuite_probe failure due to broken DWARF tools headers UAPI: Sync kvm headers with the kernel sources perf trace: Fix non-listed archs in the syscalltbl routines perf build: Change the clang check back to 12.0.1 perf trace augmented_raw_syscalls: Add more checks to pass the verifier perf trace augmented_raw_syscalls: Add extra array index bounds checking to satisfy some BPF verifiers perf trace: The return from 'write' isn't a pid tools headers UAPI: Sync linux/const.h with the kernel headers
2024-10-30	Merge branch 'fixes-for-bits-iterator'	Alexei Starovoitov
	Hou Tao says: ==================== The patch set fixes several issues in bits iterator. Patch #1 fixes the kmemleak problem of bits iterator. Patch #2~#3 fix the overflow problem of nr_bits. Patch #4 fixes the potential stack corruption when bits iterator is used on 32-bit host. Patch #5 adds more test cases for bits iterator. Please see the individual patches for more details. And comments are always welcome. --- v4: * patch #1: add ack from Yafang * patch #3: revert code-churn like changes: (1) compute nr_bytes and nr_bits before the check of nr_words. (2) use nr_bits == 64 to check for single u64, preventing build warning on 32-bit hosts. * patch #4: use "BITS_PER_LONG == 32" instead of "!defined(CONFIG_64BIT)" v3: https://lore.kernel.org/bpf/20241025013233.804027-1-houtao@huaweicloud.com/T/#t * split the bits-iterator related patches from "Misc fixes for bpf" patch set * patch #1: use "!nr_bits \|\| bits >= nr_bits" to stop the iteration * patch #2: add a new helper for the overflow problem * patch #3: decrease the limitation from 512 to 511 and check whether nr_bytes is too large for bpf memory allocator explicitly * patch #5: add two more test cases for bit iterator v2: http://lore.kernel.org/bpf/d49fa2f4-f743-c763-7579-c3cab4dd88cb@huaweicloud.com ==================== Link: https://lore.kernel.org/r/20241030100516.3633640-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-30	selftests/bpf: Add three test cases for bits_iter	Hou Tao
	Add more test cases for bits iterator: (1) huge word test Verify the multiplication overflow of nr_bits in bits_iter. Without the overflow check, when nr_words is 67108865, nr_bits becomes 64, causing bpf_probe_read_kernel_common() to corrupt the stack. (2) max word test Verify correct handling of maximum nr_words value (511). (3) bad word test Verify early termination of bits iteration when bits iterator initialization fails. Also rename bits_nomem to bits_too_big to better reflect its purpose. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241030100516.3633640-6-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-30	bpf: Use __u64 to save the bits in bits iterator	Hou Tao
	On 32-bit hosts (e.g., arm32), when a bpf program passes a u64 to bpf_iter_bits_new(), bpf_iter_bits_new() will use bits_copy to store the content of the u64. However, bits_copy is only 4 bytes, leading to stack corruption. The straightforward solution would be to replace u64 with unsigned long in bpf_iter_bits_new(). However, this introduces confusion and problems for 32-bit hosts because the size of ulong in bpf program is 8 bytes, but it is treated as 4-bytes after passed to bpf_iter_bits_new(). Fix it by changing the type of both bits and bit_count from unsigned long to u64. However, the change is not enough. The main reason is that bpf_iter_bits_next() uses find_next_bit() to find the next bit and the pointer passed to find_next_bit() is an unsigned long pointer instead of a u64 pointer. For 32-bit little-endian host, it is fine but it is not the case for 32-bit big-endian host. Because under 32-bit big-endian host, the first iterated unsigned long will be the bits 32-63 of the u64 instead of the expected bits 0-31. Therefore, in addition to changing the type, swap the two unsigned longs within the u64 for 32-bit big-endian host. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241030100516.3633640-5-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-30	bpf: Check the validity of nr_words in bpf_iter_bits_new()	Hou Tao
	Check the validity of nr_words in bpf_iter_bits_new(). Without this check, when multiplication overflow occurs for nr_bits (e.g., when nr_words = 0x0400-0001, nr_bits becomes 64), stack corruption may occur due to bpf_probe_read_kernel_common(..., nr_bytes = 0x2000-0008). Fix it by limiting the maximum value of nr_words to 511. The value is derived from the current implementation of BPF memory allocator. To ensure compatibility if the BPF memory allocator's size limitation changes in the future, use the helper bpf_mem_alloc_check_size() to check whether nr_bytes is too larger. And return -E2BIG instead of -ENOMEM for oversized nr_bytes. Fixes: 4665415975b0 ("bpf: Add bits iterator") Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241030100516.3633640-4-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-30	bpf: Add bpf_mem_alloc_check_size() helper	Hou Tao
	Introduce bpf_mem_alloc_check_size() to check whether the allocation size exceeds the limitation for the kmalloc-equivalent allocator. The upper limit for percpu allocation is LLIST_NODE_SZ bytes larger than non-percpu allocation, so a percpu argument is added to the helper. The helper will be used in the following patch to check whether the size parameter passed to bpf_mem_alloc() is too big. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241030100516.3633640-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-30	bpf: Free dynamically allocated bits in bpf_iter_bits_destroy()	Hou Tao
	bpf_iter_bits_destroy() uses "kit->nr_bits <= 64" to check whether the bits are dynamically allocated. However, the check is incorrect and may cause a kmemleak as shown below: unreferenced object 0xffff88812628c8c0 (size 32): comm "swapper/0", pid 1, jiffies 4294727320 hex dump (first 32 bytes): b0 c1 55 f5 81 88 ff ff f0 f0 f0 f0 f0 f0 f0 f0 ..U........... f0 f0 f0 f0 f0 f0 f0 f0 00 00 00 00 00 00 00 00 .............. backtrace (crc 781e32cc): [<00000000c452b4ab>] kmemleak_alloc+0x4b/0x80 [<0000000004e09f80>] __kmalloc_node_noprof+0x480/0x5c0 [<00000000597124d6>] __alloc.isra.0+0x89/0xb0 [<000000004ebfffcd>] alloc_bulk+0x2af/0x720 [<00000000d9c10145>] prefill_mem_cache+0x7f/0xb0 [<00000000ff9738ff>] bpf_mem_alloc_init+0x3e2/0x610 [<000000008b616eac>] bpf_global_ma_init+0x19/0x30 [<00000000fc473efc>] do_one_initcall+0xd3/0x3c0 [<00000000ec81498c>] kernel_init_freeable+0x66a/0x940 [<00000000b119f72f>] kernel_init+0x20/0x160 [<00000000f11ac9a7>] ret_from_fork+0x3c/0x70 [<0000000004671da4>] ret_from_fork_asm+0x1a/0x30 That is because nr_bits will be set as zero in bpf_iter_bits_next() after all bits have been iterated. Fix the issue by setting kit->bit to kit->nr_bits instead of setting kit->nr_bits to zero when the iteration completes in bpf_iter_bits_next(). In addition, use "!nr_bits \|\| bits >= nr_bits" to check whether the iteration is complete and still use "nr_bits > 64" to indicate whether bits are dynamically allocated. The "!nr_bits" check is necessary because bpf_iter_bits_new() may fail before setting kit->nr_bits, and this condition will stop the iteration early instead of accessing the zeroed or freed kit->bits. Considering the initial value of kit->bits is -1 and the type of kit->nr_bits is unsigned int, change the type of kit->nr_bits to int. The potential overflow problem will be handled in the following patch. Fixes: 4665415975b0 ("bpf: Add bits iterator") Acked-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241030100516.3633640-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-30	Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecs	Sungwoo Kim
	Fix __hci_cmd_sync_sk() to return not NULL for unknown opcodes. __hci_cmd_sync_sk() returns NULL if a command returns a status event. However, it also returns NULL where an opcode doesn't exist in the hci_cc table because hci_cmd_complete_evt() assumes status = skb->data[0] for unknown opcodes. This leads to null-ptr-deref in cmd_sync for HCI_OP_READ_LOCAL_CODECS as there is no hci_cc for HCI_OP_READ_LOCAL_CODECS, which always assumes status = skb->data[0]. KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077] CPU: 1 PID: 2000 Comm: kworker/u9:5 Not tainted 6.9.0-ga6bcb805883c-dirty #10 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: hci7 hci_power_on RIP: 0010:hci_read_supported_codecs+0xb9/0x870 net/bluetooth/hci_codec.c:138 Code: 08 48 89 ef e8 b8 c1 8f fd 48 8b 75 00 e9 96 00 00 00 49 89 c6 48 ba 00 00 00 00 00 fc ff df 4c 8d 60 70 4c 89 e3 48 c1 eb 03 <0f> b6 04 13 84 c0 0f 85 82 06 00 00 41 83 3c 24 02 77 0a e8 bf 78 RSP: 0018:ffff888120bafac8 EFLAGS: 00010212 RAX: 0000000000000000 RBX: 000000000000000e RCX: ffff8881173f0040 RDX: dffffc0000000000 RSI: ffffffffa58496c0 RDI: ffff88810b9ad1e4 RBP: ffff88810b9ac000 R08: ffffffffa77882a7 R09: 1ffffffff4ef1054 R10: dffffc0000000000 R11: fffffbfff4ef1055 R12: 0000000000000070 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88810b9ac000 FS: 0000000000000000(0000) GS:ffff8881f6c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6ddaa3439e CR3: 0000000139764003 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <TASK> hci_read_local_codecs_sync net/bluetooth/hci_sync.c:4546 [inline] hci_init_stage_sync net/bluetooth/hci_sync.c:3441 [inline] hci_init4_sync net/bluetooth/hci_sync.c:4706 [inline] hci_init_sync net/bluetooth/hci_sync.c:4742 [inline] hci_dev_init_sync net/bluetooth/hci_sync.c:4912 [inline] hci_dev_open_sync+0x19a9/0x2d30 net/bluetooth/hci_sync.c:4994 hci_dev_do_open net/bluetooth/hci_core.c:483 [inline] hci_power_on+0x11e/0x560 net/bluetooth/hci_core.c:1015 process_one_work kernel/workqueue.c:3267 [inline] process_scheduled_works+0x8ef/0x14f0 kernel/workqueue.c:3348 worker_thread+0x91f/0xe50 kernel/workqueue.c:3429 kthread+0x2cb/0x360 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Fixes: abfeea476c68 ("Bluetooth: hci_sync: Convert MGMT_OP_START_DISCOVERY") Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-30	Merge tag 'scsi-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two small fixes, both in drivers (ufs and scsi_debug)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: core: Fix another deadlock during RTC update scsi: scsi_debug: Fix do_device_access() handling of unexpected SG copy length
2024-10-30	selftests/bpf: Add a selftest for bpf_csum_diff()	Puranjay Mohan
	Add a selftest for the bpf_csum_diff() helper. This selftests runs the helper in all three configurations(push, pull, and diff) and verifies its output. The correct results have been computed by hand and by the helper's older implementation. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20241026125339.26459-5-puranjay@kernel.org
2024-10-30	selftests/bpf: Don't mask result of bpf_csum_diff() in test_verifier	Puranjay Mohan
	The bpf_csum_diff() helper has been fixed to return a 16-bit value for all archs, so now we don't need to mask the result. This commit is basically reverting the below: commit 6185266c5a85 ("selftests/bpf: Mask bpf_csum_diff() return value to 16 bits in test_verifier") Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20241026125339.26459-4-puranjay@kernel.org
2024-10-30	bpf: bpf_csum_diff: Optimize and homogenize for all archs	Puranjay Mohan
	1. Optimization ------------ The current implementation copies the 'from' and 'to' buffers to a scratchpad and it takes the bitwise NOT of 'from' buffer while copying. In the next step csum_partial() is called with this scratchpad. so, mathematically, the current implementation is doing: result = csum(to - from) Here, 'to' and '~ from' are copied in to the scratchpad buffer, we need it in the scratchpad buffer because csum_partial() takes a single contiguous buffer and not two disjoint buffers like 'to' and 'from'. We can re write this equation to: result = csum(to) - csum(from) using the distributive property of csum(). this allows 'to' and 'from' to be at different locations and therefore this scratchpad and copying is not needed. This in C code will look like: result = csum_sub(csum_partial(to, to_size, seed), csum_partial(from, from_size, 0)); 2. Homogenization -------------- The bpf_csum_diff() helper calls csum_partial() which is implemented by some architectures like arm and x86 but other architectures rely on the generic implementation in lib/checksum.c The generic implementation in lib/checksum.c returns a 16 bit value but the arch specific implementations can return more than 16 bits, this works out in most places because before the result is used, it is passed through csum_fold() that turns it into a 16-bit value. bpf_csum_diff() directly returns the value from csum_partial() and therefore the returned values could be different on different architectures. see discussion in [1]: for the int value 28 the calculated checksums are: x86 : -29 : 0xffffffe3 generic (arm64, riscv) : 65507 : 0x0000ffe3 arm : 131042 : 0x0001ffe2 Pass the result of bpf_csum_diff() through from32to16() before returning to homogenize this result for all architectures. NOTE: from32to16() is used instead of csum_fold() because csum_fold() does from32to16() + bitwise NOT of the result, which is not what we want to do here. [1] https://lore.kernel.org/bpf/CAJ+HfNiQbOcqCLxFUP2FMm5QrLXUUaj852Fxe3hn_2JNiucn6g@mail.gmail.com/ Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20241026125339.26459-3-puranjay@kernel.org
2024-10-30	net: checksum: Move from32to16() to generic header	Puranjay Mohan
	from32to16() is used by lib/checksum.c and also by arch/parisc/lib/checksum.c. The next patch will use it in the bpf_csum_diff helper. Move from32to16() to the include/net/checksum.h as csum_from32to16() and remove other implementations. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20241026125339.26459-2-puranjay@kernel.org
2024-10-30	ALSA: hda/realtek: Fix headset mic on TUXEDO Stellaris 16 Gen6 mb1	Christoffer Sandberg
	Quirk is needed to enable headset microphone on missing pin 0x19. Signed-off-by: Christoffer Sandberg <cs@tuxedo.de> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20241029151653.80726-2-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-10-30	ALSA: hda/realtek: Fix headset mic on TUXEDO Gemini 17 Gen3	Christoffer Sandberg
	Quirk is needed to enable headset microphone on missing pin 0x19. Signed-off-by: Christoffer Sandberg <cs@tuxedo.de> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20241029151653.80726-1-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-10-30	ALSA: usb-audio: Add quirks for Dell WD19 dock	Jan Schär
	The WD19 family of docks has the same audio chipset as the WD15. This change enables jack detection on the WD19. We don't need the dell_dock_mixer_init quirk for the WD19. It is only needed because of the dell_alc4020_map quirk for the WD15 in mixer_maps.c, which disables the volume controls. Even for the WD15, this quirk was apparently only needed when the dock firmware was not updated. Signed-off-by: Jan Schär <jan@jschaer.ch> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20241029221249.15661-1-jan@jschaer.ch Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-10-30	Merge tag 'asoc-fix-v6.12-rc5' of ↵	Takashi Iwai
	https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.12 The biggest set of changes here is Hans' fixes and quirks for various Baytrail based platforms with RT5640 CODECs, and there's one core fix for a missed length assignment for __counted_by() checking. Otherwise it's small device specific fixes, several of them in the DT bindings.
2024-10-30	Merge branch 'tcp-warn-once'	David S. Miller
	Jason Xing says: ==================== tcp: add tcp_warn_once() common helper Paolo Abeni suggested we can introduce a new helper to cover more cases in the future for better debug. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-30	tcp: add more warn of socket in tcp_send_loss_probe()	Jason Xing
	Add two fields to print in the helper which here covers tcp_send_loss_probe(). Link: https://lore.kernel.org/all/5632e043-bdba-4d75-bc7e-bf58014492fd@redhat.com/ Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Cc: Neal Cardwell <ncardwell@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-30	tcp: add a common helper to debug the underlying issue	Jason Xing
	Following the commit c8770db2d544 ("tcp: check skb is non-NULL in tcp_rto_delta_us()"), we decided to add a helper so that it's easier to get verbose warning on either cases. Link: https://lore.kernel.org/all/5632e043-bdba-4d75-bc7e-bf58014492fd@redhat.com/ Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Cc: Neal Cardwell <ncardwell@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-30	netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6()	Eric Dumazet
	I got a syzbot report without a repro [1] crashing in nf_send_reset6() I think the issue is that dev->hard_header_len is zero, and we attempt later to push an Ethernet header. Use LL_MAX_HEADER, as other functions in net/ipv6/netfilter/nf_reject_ipv6.c. [1] skbuff: skb_under_panic: text:ffffffff89b1d008 len:74 put:14 head:ffff88803123aa00 data:ffff88803123a9f2 tail:0x3c end:0x140 dev:syz_tun kernel BUG at net/core/skbuff.c:206 ! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 0 UID: 0 PID: 7373 Comm: syz.1.568 Not tainted 6.12.0-rc2-syzkaller-00631-g6d858708d465 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 RIP: 0010:skb_panic net/core/skbuff.c:206 [inline] RIP: 0010:skb_under_panic+0x14b/0x150 net/core/skbuff.c:216 Code: 0d 8d 48 c7 c6 60 a6 29 8e 48 8b 54 24 08 8b 0c 24 44 8b 44 24 04 4d 89 e9 50 41 54 41 57 41 56 e8 ba 30 38 02 48 83 c4 20 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 RSP: 0018:ffffc900045269b0 EFLAGS: 00010282 RAX: 0000000000000088 RBX: dffffc0000000000 RCX: cd66dacdc5d8e800 RDX: 0000000000000000 RSI: 0000000000000200 RDI: 0000000000000000 RBP: ffff88802d39a3d0 R08: ffffffff8174afec R09: 1ffff920008a4ccc R10: dffffc0000000000 R11: fffff520008a4ccd R12: 0000000000000140 R13: ffff88803123aa00 R14: ffff88803123a9f2 R15: 000000000000003c FS: 00007fdbee5ff6c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000005d322000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> skb_push+0xe5/0x100 net/core/skbuff.c:2636 eth_header+0x38/0x1f0 net/ethernet/eth.c:83 dev_hard_header include/linux/netdevice.h:3208 [inline] nf_send_reset6+0xce6/0x1270 net/ipv6/netfilter/nf_reject_ipv6.c:358 nft_reject_inet_eval+0x3b9/0x690 net/netfilter/nft_reject_inet.c:48 expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline] nft_do_chain+0x4ad/0x1da0 net/netfilter/nf_tables_core.c:288 nft_do_chain_inet+0x418/0x6b0 net/netfilter/nft_chain_filter.c:161 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xc3/0x220 net/netfilter/core.c:626 nf_hook include/linux/netfilter.h:269 [inline] NF_HOOK include/linux/netfilter.h:312 [inline] br_nf_pre_routing_ipv6+0x63e/0x770 net/bridge/br_netfilter_ipv6.c:184 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_bridge_pre net/bridge/br_input.c:277 [inline] br_handle_frame+0x9fd/0x1530 net/bridge/br_input.c:424 __netif_receive_skb_core+0x13e8/0x4570 net/core/dev.c:5562 __netif_receive_skb_one_core net/core/dev.c:5666 [inline] __netif_receive_skb+0x12f/0x650 net/core/dev.c:5781 netif_receive_skb_internal net/core/dev.c:5867 [inline] netif_receive_skb+0x1e8/0x890 net/core/dev.c:5926 tun_rx_batched+0x1b7/0x8f0 drivers/net/tun.c:1550 tun_get_user+0x3056/0x47e0 drivers/net/tun.c:2007 tun_chr_write_iter+0x10d/0x1f0 drivers/net/tun.c:2053 new_sync_write fs/read_write.c:590 [inline] vfs_write+0xa6d/0xc90 fs/read_write.c:683 ksys_write+0x183/0x2b0 fs/read_write.c:736 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fdbeeb7d1ff Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 c9 8d 02 00 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 1c 8e 02 00 48 RSP: 002b:00007fdbee5ff000 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007fdbeed36058 RCX: 00007fdbeeb7d1ff RDX: 000000000000008e RSI: 0000000020000040 RDI: 00000000000000c8 RBP: 00007fdbeebf12be R08: 0000000000000000 R09: 0000000000000000 R10: 000000000000008e R11: 0000000000000293 R12: 0000000000000000 R13: 0000000000000000 R14: 00007fdbeed36058 R15: 00007ffc38de06e8 </TASK> Fixes: c8d7b98bec43 ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-30	netfilter: Fix use-after-free in get_info()	Dong Chenchen
	ip6table_nat module unload has refcnt warning for UAF. call trace is: WARNING: CPU: 1 PID: 379 at kernel/module/main.c:853 module_put+0x6f/0x80 Modules linked in: ip6table_nat(-) CPU: 1 UID: 0 PID: 379 Comm: ip6tables Not tainted 6.12.0-rc4-00047-gc2ee9f594da8-dirty #205 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:module_put+0x6f/0x80 Call Trace: <TASK> get_info+0x128/0x180 do_ip6t_get_ctl+0x6a/0x430 nf_getsockopt+0x46/0x80 ipv6_getsockopt+0xb9/0x100 rawv6_getsockopt+0x42/0x190 do_sock_getsockopt+0xaa/0x180 __sys_getsockopt+0x70/0xc0 __x64_sys_getsockopt+0x20/0x30 do_syscall_64+0xa2/0x1a0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Concurrent execution of module unload and get_info() trigered the warning. The root cause is as follows: cpu0 cpu1 module_exit //mod->state = MODULE_STATE_GOING ip6table_nat_exit xt_unregister_template kfree(t) //removed from templ_list getinfo() t = xt_find_table_lock list_for_each_entry(tmpl, &xt_templates[af]...) if (strcmp(tmpl->name, name)) continue; //table not found try_module_get list_for_each_entry(t, &xt_net->tables[af]...) return t; //not get refcnt module_put(t->me) //uaf unregister_pernet_subsys //remove table from xt_net list While xt_table module was going away and has been removed from xt_templates list, we couldnt get refcnt of xt_table->me. Check module in xt_net->tables list re-traversal to fix it. Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default") Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>