Age | Commit message (Collapse) | Author |
|
Add PTP IRQ handler for lan969x. This is required, as the PTP registers
are placed in two different targets on Sparx5 and lan969x. The
implementation is otherwise the same as on Sparx5.
Also, expose sparx5_get_hwtimestamp() for use by lan969x.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-10-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a bunch of small lan969x ops in bulk. These ops are explained in
detail in a previous series [1].
[1] https://lore.kernel.org/netdev/20241004-b4-sparx5-lan969x-switch-driver-v2-8-d3290f581663@microchip.com/
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-9-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the lan969x constants to match data. These are already used
throughout the Sparx5 code (introduced in earlier series [1]), so no
need to update any code use.
[1] https://lore.kernel.org/netdev/20241004-b4-sparx5-lan969x-switch-driver-v2-0-d3290f581663@microchip.com/
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-8-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add new file lan969x_regs.c that defines all the register differences
for lan969x, and add it to the lan969x match data.
GW_DEV2G5_PHASE_DETECTOR_CTRL, FP_DEV2G5_PHAD_CTRL_PHAD_ENA and
FP_DEV2G5_PHAD_CTRL_PHAD_FAILED are required by the new register macros
which was introduced earlier. Add these for Sparx5 also.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-7-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add match data for lan969x, with initial fields for iomap, iomap_size
and ioranges. Add new Kconfig symbol CONFIG_LAN969X_CONFIG for compiling
the lan969x driver.
It has been decided to give lan969x its own Kconfig symbol, as a
considerable amount of code is needed, beside the Sparx5 code, to add
full chip support (and more will be added in future series). Also this
makes it possible to compile Sparx5 without lan969x.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-6-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Lan969x will require a few additional registers for certain operations.
Some are shared, some are not. Add these.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-5-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation for lan969x, add the sparx5 context pointer to certain
IFH (Internal Frame Header) functions. This is required, as the
is_sparx5() function will be used here in a subsequent patch.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-4-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation for lan969x, rework the function that calculates the SDLB
(Service Dual Leacky Bucket) clock. This is required, as the
HSCH_SYS_CLK_PER register is Sparx5-exclusive. Instead derive the clock
from the core clock, using the sparx5_clk_period() function. The clock
stays the same before and after this patch, only now,
sparx5_sdlb_clk_hz_get() can be used for lan969x too.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-3-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation for lan969x, use spx5_rmw() for enabling the update of
the calendar. This is required to not overwrite the DSM_TAXI_CAL_CFG
register, as an additional write will be added before this one, in a
subsequent patch.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-2-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation for lan969x, add lan969x targets to
sparx5_target_chiptype and set the core clock frequency for these
throughout. Lan969x only supports a core clock frequency of 328MHz.
Also, set the policer update internal (pol_upd_int) matching the 328 MHz
frequency of the lan969x targets.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241024-sparx5-lan969x-switch-driver-2-v2-1-a0b5fae88a0f@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The driver currently uses page_pool_put_page() to recycle
page pool pages. Since gve uses split pages, if the fragment
being recycled is not the last fragment in the page, there
is no dma sync operation. When the last fragment is recycled,
dma sync is performed by page pool infra according to the
value passed as dma_sync_size which right now is set to the
size of fragment.
But the correct thing to do is to dma sync the entire page when
the last fragment is recycled. Hence change to using
page_pool_put_full_page().
Link: https://lore.kernel.org/netdev/89d7ce83-cc1d-4791-87b5-6f7af29a031d@huawei.com/
Suggested-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Fixes: ebdfae0d377b ("gve: adopt page pool for DQ RDA mode")
Link: https://patch.msgid.link/20241023221141.3008011-1-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Geetha sowjanya says:
====================
Refactoring RVU NIC driver
This is a preparation pathset for follow-up "Introducing RVU representors
driver" patches. The RVU representor driver creates representor netdev
of each rvu device when switch dev mode is enabled.
RVU representor and NIC have a similar set of HW resources(NIX_LF,RQ/SQ/CQ)
and implements a subset of NIC functionality.
This patch set groups hw resources and queue configuration code into single
API and export the existing functions so, that code can be shared between
NIC and representor drivers.
These patches are part of "Introduce RVU representors" patchset.
https://lore.kernel.org/all/ZsdJ-w00yCI4NQ8T@nanopsycho.orion/T/
As suggested by "Jiri Pirko", submitting as separate patchset.
====================
Link: https://patch.msgid.link/20241023161843.15543-1-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move mbox, hw resources and interrupt configuration functions to common
header file. So, that they can be used later by the RVU representor driver.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023161843.15543-5-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Reuse the maximum support HW MTU value that is fetch during probe.
Instead of fetching through mbox each time mtu is changed as the
value is fixed for interface.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023161843.15543-4-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Group the queue(RX/TX/CQ) memory allocation and free code to single APIs.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023161843.15543-3-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Define new API "otx2_init_rsrc" and move the HW blocks
NIX/NPA resources configuration code under this API. So, that
it can be used by the RVU representor driver that has similar
resources of RVU NIC.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241023161843.15543-2-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Vladimir Oltean says:
====================
Mirroring to DSA CPU port
Users of the NXP LS1028A SoC (drivers/net/dsa/ocelot L2 switch inside)
have requested to mirror packets from the ingress of a switch port to
software. Both port-based and flow-based mirroring is required.
The simplest way I could come up with was to set up tc mirred actions
towards a dummy net_device, and make the offloading of that be accepted
by the driver. Currently, the pattern in drivers is to reject mirred
towards ports they don't know about, but I'm now permitting that,
precisely by mirroring "to the CPU".
For testers, this series depends on commit 34d35b4edbbe ("net/sched:
act_api: deny mismatched skip_sw/skip_hw flags for actions created by
classifiers") from net/main, which is absent from net-next as of the
day of posting (Oct 23). Without the bug fix it is possible to create
invalid configurations which are not rejected by the kernel.
v2: https://lore.kernel.org/20241017165215.3709000-1-vladimir.oltean@nxp.com
RFC: https://lore.kernel.org/20240913152915.2981126-1-vladimir.oltean@nxp.com
For historical purposes, link to a much older (and much different) attempt:
https://lore.kernel.org/20191002233750.13566-1-olteanv@gmail.com
====================
Link: https://patch.msgid.link/20241023135251.1752488-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Debugging certain flows in the offloaded switch data path can be done by
installing two tc-mirred filters for mirroring: one in the hardware data
path, which copies the frames to the CPU, and one which takes the frame
from there and mirrors it to a virtual interface like a dummy device,
where it can be seen with tcpdump.
The effect of having 2 filters run on the same packet can be obtained by
default using tc, by not specifying either the 'skip_sw' or 'skip_hw'
keywords.
Instead of refusing to offload mirroring/redirecting packets towards
interfaces that aren't switch ports, just treat every other destination
for what it is: something that is handled in software, behind the CPU
port.
Usage:
$ ip link add dummy0 type dummy; ip link set dummy0 up
$ tc qdisc add dev swp0 clsact
$ tc filter add dev swp0 ingress protocol ip flower action mirred ingress mirror dev dummy0
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-7-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If the CPU bandwidth capacity permits, it may be useful to mirror the
entire ingress of a user port to software.
This is in fact possible to express even if there is no net_device
representation for the CPU port. In fact, that approach was already
exhausted and that representation wouldn't have even helped [1].
The idea behind implementing this is that currently, we refuse to
offload any mirroring towards a non-DSA target net_device. But if we
acknowledge the fact that to reach any foreign net_device, the switch
must send the packet to the CPU anyway, then we can simply offload just
that part, and let the software do the rest. There is only one condition
we need to uphold: the filter needs to be present in the software data
path as well (no skip_sw).
There are 2 actions to consider: FLOW_ACTION_MIRRED (redirect to egress
of target interface) and FLOW_ACTION_MIRRED_INGRESS (redirect to ingress
of target interface). We don't have the ability/API to offload
FLOW_ACTION_MIRRED_INGRESS when the target port is also a DSA user port,
but we could also permit that through mirred to the CPU + software.
Example:
$ ip link add dummy0 type dummy; ip link set dummy0 up
$ tc qdisc add dev swp0 clsact
$ tc filter add dev swp0 ingress matchall action mirred ingress mirror dev dummy0
Any DSA driver with a ds->ops->port_mirror_add() implementation can now
make use of this with no additional change.
[1] https://lore.kernel.org/netdev/20191002233750.13566-1-olteanv@gmail.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Do not leave -EOPNOTSUPP errors without an explanation. It is confusing
for the user to figure out what is wrong otherwise.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We already have an "extack" stack variable in
dsa_user_add_cls_matchall_police() and
dsa_user_add_cls_matchall_mirred(), there is no need to retrieve
it again from cls->common.extack.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The body is a bit hard to read, hard to extend, and has duplicated
conditions.
Clean up the "if (many conditions) else if (many conditions, some
of them repeated)" pattern by:
- Moving the repeated conditions out
- Replacing the repeated tests for the same variable with a switch/case
- Moving the protocol check inside the dsa_user_add_cls_matchall_mirred()
function call.
This is pure refactoring, no logic has been changed, though some tests
were reordered. The order does not matter - they are independent things
to be tested for.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Background: switchdev ports offload the Linux bridge, and most of the
packets they handle will never see the CPU. The ports between which
there exists no hardware data path are considered 'foreign' to switchdev.
These can either be normal physical NICs without switchdev offload, or
incompatible switchdev ports, or virtual interfaces like veth/dummy/etc.
In some cases, an offloaded filter can only do half the work, and the
rest must be handled by software. Redirecting/mirroring from the ingress
of a switchdev port towards a foreign interface is one example of
combined hardware/software data path. The most that the switchdev port
can do is to extract the matching packets from its offloaded data path
and send them to the CPU. From there on, the software filter runs
(a second time, after the first run in hardware) on the packet and
performs the mirred action.
It makes sense for switchdev drivers which allow this kind of "half
offloading" to sense the "skip_sw" flag of the filter/action pair, and
deny attempts from the user to install a filter that does not run in
software, because that simply won't work.
In fact, a mirred action on a switchdev port towards a dummy interface
appears to be a valid way of (selectively) monitoring offloaded traffic
that flows through it. IFF_PROMISC was also discussed years ago, but
(despite initial disagreement) there seems to be consensus that this
flag should not affect the destination taken by packets, but merely
whether or not the NIC discards packets with unknown MAC DA for local
processing.
[1] https://lore.kernel.org/netdev/20190830092637.7f83d162@ceranb/
[2] https://lore.kernel.org/netdev/20191002233750.13566-1-olteanv@gmail.com/
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/netdev/ZxUo0Dc0M5Y6l9qF@shredder.mtl.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20241023135251.1752488-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Sven Schnelle says:
====================
PtP driver for s390 clocks
these patches add support for using the s390 physical and TOD clock as ptp
clock. To do so, the first patch adds a clock id to the s390 TOD clock,
while the second patch adds the PtP driver itself.
====================
Link: https://patch.msgid.link/20241023065601.449586-1-svens@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a small PtP driver which allows user space to get
the values of the physical and tod clock. This allows
programs like chrony to use STP as clock source and
steer the kernel clock. The physical clock can be used
as a debugging aid to get the clock without any additional
offsets like STP steering or LPAR offset.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://patch.msgid.link/20241023065601.449586-3-svens@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To allow specifying the clock source in the upcoming PtP driver,
add a clocksource ID to the s390 TOD clock.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://patch.msgid.link/20241023065601.449586-2-svens@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The HSR test, hsr_ping.sh, actually needs 7 min to run. Around 375s to
be exact, and even more on a debug kernel or kernel with other network
security limits. The timeout setting for the kselftest is currently 45
seconds, which is way too short to integrate hsr tests to run_kselftest
infrastructure. However, timeout of hundreds of seconds is quite a long
time, especially in a CI/CD environment. It seems that we need
accelerate the test and balance with timeout setting.
The most time-consuming func is do_ping_long, where ping command sends
10 packages to the given address. The default interval between two ping
packages is 1s according to the ping Mannual. There isn't any operation
between pings thus we could pass -i 0.1 to ping to make it 10 times
faster.
While even with this short interval, the test still need about 46.4
seconds to finish because of the two HSR interfaces, each of which is
tested by calling do_ping func 12 times and do_ping_long func 19 times
and sleep for 3s.
So, an explicit setting is also needed to slightly increase the
timeout. And to leave us some slack, use 50 as default timeout.
Signed-off-by: Yunshui Jiang <jiangyunshui@kylinos.cn>
Link: https://patch.msgid.link/20241028082757.2945232-1-jiangyunshui@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The barrier_nospec() in 64-bit copy_from_user() is slow. Instead use
pointer masking to force the user pointer to all 1's for an invalid
address.
The kernel test robot reports a 2.6% improvement in the per_thread_ops
benchmark [1].
This is a variation on a patch originally by Josh Poimboeuf [2].
Link: https://lore.kernel.org/202410281344.d02c72a2-oliver.sang@intel.com [1]
Link: https://lore.kernel.org/5b887fe4c580214900e21f6c61095adf9a142735.1730166635.git.jpoimboe@kernel.org [2]
Tested-and-reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Update more header copies with the kernel sources, including const.h,
msr-index.h, arm64's cputype.h, kvm's, bits.h and unaligned.h
- The return from 'write' isn't a pid, fix cut'n'paste error in 'perf
trace'
- Fix up the python binding build on architectures without
HAVE_KVM_STAT_SUPPORT
- Add some more bounds checks to augmented_raw_syscalls.bpf.c (used to
collect syscall pointer arguments in 'perf trace') to make the
resulting bytecode to pass the kernel BPF verifier, allowing us to go
back accepting clang 12.0.1 as the minimum version required for
compiling BPF sources
- Add __NR_capget for x86 to fix a regression on running perf + intel
PT (hw tracing) as non-root setting up the capabilities as described
in https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
- Fix missing syscalltbl in non-explicitly listed architectures,
noticed on ARM 32-bit, that still needs a .tbl generator for the
syscall id<->name tables, should be added for v6.13
- Handle 'perf test' failure when handling broken DWARF for ASM files
* tag 'perf-tools-fixes-for-v6.12-2-2024-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf cap: Add __NR_capget to arch/x86 unistd
tools headers: Update the linux/unaligned.h copy with the kernel sources
tools headers arm64: Sync arm64's cputype.h with the kernel sources
tools headers: Synchronize {uapi/}linux/bits.h with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf python: Fix up the build on architectures without HAVE_KVM_STAT_SUPPORT
perf test: Handle perftool-testsuite_probe failure due to broken DWARF
tools headers UAPI: Sync kvm headers with the kernel sources
perf trace: Fix non-listed archs in the syscalltbl routines
perf build: Change the clang check back to 12.0.1
perf trace augmented_raw_syscalls: Add more checks to pass the verifier
perf trace augmented_raw_syscalls: Add extra array index bounds checking to satisfy some BPF verifiers
perf trace: The return from 'write' isn't a pid
tools headers UAPI: Sync linux/const.h with the kernel headers
|
|
Hou Tao says:
====================
The patch set fixes several issues in bits iterator. Patch #1 fixes the
kmemleak problem of bits iterator. Patch #2~#3 fix the overflow problem
of nr_bits. Patch #4 fixes the potential stack corruption when bits
iterator is used on 32-bit host. Patch #5 adds more test cases for bits
iterator.
Please see the individual patches for more details. And comments are
always welcome.
---
v4:
* patch #1: add ack from Yafang
* patch #3: revert code-churn like changes:
(1) compute nr_bytes and nr_bits before the check of nr_words.
(2) use nr_bits == 64 to check for single u64, preventing build
warning on 32-bit hosts.
* patch #4: use "BITS_PER_LONG == 32" instead of "!defined(CONFIG_64BIT)"
v3: https://lore.kernel.org/bpf/20241025013233.804027-1-houtao@huaweicloud.com/T/#t
* split the bits-iterator related patches from "Misc fixes for bpf"
patch set
* patch #1: use "!nr_bits || bits >= nr_bits" to stop the iteration
* patch #2: add a new helper for the overflow problem
* patch #3: decrease the limitation from 512 to 511 and check whether
nr_bytes is too large for bpf memory allocator explicitly
* patch #5: add two more test cases for bit iterator
v2: http://lore.kernel.org/bpf/d49fa2f4-f743-c763-7579-c3cab4dd88cb@huaweicloud.com
====================
Link: https://lore.kernel.org/r/20241030100516.3633640-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add more test cases for bits iterator:
(1) huge word test
Verify the multiplication overflow of nr_bits in bits_iter. Without
the overflow check, when nr_words is 67108865, nr_bits becomes 64,
causing bpf_probe_read_kernel_common() to corrupt the stack.
(2) max word test
Verify correct handling of maximum nr_words value (511).
(3) bad word test
Verify early termination of bits iteration when bits iterator
initialization fails.
Also rename bits_nomem to bits_too_big to better reflect its purpose.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241030100516.3633640-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
On 32-bit hosts (e.g., arm32), when a bpf program passes a u64 to
bpf_iter_bits_new(), bpf_iter_bits_new() will use bits_copy to store the
content of the u64. However, bits_copy is only 4 bytes, leading to stack
corruption.
The straightforward solution would be to replace u64 with unsigned long
in bpf_iter_bits_new(). However, this introduces confusion and problems
for 32-bit hosts because the size of ulong in bpf program is 8 bytes,
but it is treated as 4-bytes after passed to bpf_iter_bits_new().
Fix it by changing the type of both bits and bit_count from unsigned
long to u64. However, the change is not enough. The main reason is that
bpf_iter_bits_next() uses find_next_bit() to find the next bit and the
pointer passed to find_next_bit() is an unsigned long pointer instead
of a u64 pointer. For 32-bit little-endian host, it is fine but it is
not the case for 32-bit big-endian host. Because under 32-bit big-endian
host, the first iterated unsigned long will be the bits 32-63 of the u64
instead of the expected bits 0-31. Therefore, in addition to changing
the type, swap the two unsigned longs within the u64 for 32-bit
big-endian host.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241030100516.3633640-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Check the validity of nr_words in bpf_iter_bits_new(). Without this
check, when multiplication overflow occurs for nr_bits (e.g., when
nr_words = 0x0400-0001, nr_bits becomes 64), stack corruption may occur
due to bpf_probe_read_kernel_common(..., nr_bytes = 0x2000-0008).
Fix it by limiting the maximum value of nr_words to 511. The value is
derived from the current implementation of BPF memory allocator. To
ensure compatibility if the BPF memory allocator's size limitation
changes in the future, use the helper bpf_mem_alloc_check_size() to
check whether nr_bytes is too larger. And return -E2BIG instead of
-ENOMEM for oversized nr_bytes.
Fixes: 4665415975b0 ("bpf: Add bits iterator")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241030100516.3633640-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Introduce bpf_mem_alloc_check_size() to check whether the allocation
size exceeds the limitation for the kmalloc-equivalent allocator. The
upper limit for percpu allocation is LLIST_NODE_SZ bytes larger than
non-percpu allocation, so a percpu argument is added to the helper.
The helper will be used in the following patch to check whether the size
parameter passed to bpf_mem_alloc() is too big.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241030100516.3633640-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
bpf_iter_bits_destroy() uses "kit->nr_bits <= 64" to check whether the
bits are dynamically allocated. However, the check is incorrect and may
cause a kmemleak as shown below:
unreferenced object 0xffff88812628c8c0 (size 32):
comm "swapper/0", pid 1, jiffies 4294727320
hex dump (first 32 bytes):
b0 c1 55 f5 81 88 ff ff f0 f0 f0 f0 f0 f0 f0 f0 ..U...........
f0 f0 f0 f0 f0 f0 f0 f0 00 00 00 00 00 00 00 00 ..............
backtrace (crc 781e32cc):
[<00000000c452b4ab>] kmemleak_alloc+0x4b/0x80
[<0000000004e09f80>] __kmalloc_node_noprof+0x480/0x5c0
[<00000000597124d6>] __alloc.isra.0+0x89/0xb0
[<000000004ebfffcd>] alloc_bulk+0x2af/0x720
[<00000000d9c10145>] prefill_mem_cache+0x7f/0xb0
[<00000000ff9738ff>] bpf_mem_alloc_init+0x3e2/0x610
[<000000008b616eac>] bpf_global_ma_init+0x19/0x30
[<00000000fc473efc>] do_one_initcall+0xd3/0x3c0
[<00000000ec81498c>] kernel_init_freeable+0x66a/0x940
[<00000000b119f72f>] kernel_init+0x20/0x160
[<00000000f11ac9a7>] ret_from_fork+0x3c/0x70
[<0000000004671da4>] ret_from_fork_asm+0x1a/0x30
That is because nr_bits will be set as zero in bpf_iter_bits_next()
after all bits have been iterated.
Fix the issue by setting kit->bit to kit->nr_bits instead of setting
kit->nr_bits to zero when the iteration completes in
bpf_iter_bits_next(). In addition, use "!nr_bits || bits >= nr_bits" to
check whether the iteration is complete and still use "nr_bits > 64" to
indicate whether bits are dynamically allocated. The "!nr_bits" check is
necessary because bpf_iter_bits_new() may fail before setting
kit->nr_bits, and this condition will stop the iteration early instead
of accessing the zeroed or freed kit->bits.
Considering the initial value of kit->bits is -1 and the type of
kit->nr_bits is unsigned int, change the type of kit->nr_bits to int.
The potential overflow problem will be handled in the following patch.
Fixes: 4665415975b0 ("bpf: Add bits iterator")
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241030100516.3633640-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Fix __hci_cmd_sync_sk() to return not NULL for unknown opcodes.
__hci_cmd_sync_sk() returns NULL if a command returns a status event.
However, it also returns NULL where an opcode doesn't exist in the
hci_cc table because hci_cmd_complete_evt() assumes status = skb->data[0]
for unknown opcodes.
This leads to null-ptr-deref in cmd_sync for HCI_OP_READ_LOCAL_CODECS as
there is no hci_cc for HCI_OP_READ_LOCAL_CODECS, which always assumes
status = skb->data[0].
KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077]
CPU: 1 PID: 2000 Comm: kworker/u9:5 Not tainted 6.9.0-ga6bcb805883c-dirty #10
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: hci7 hci_power_on
RIP: 0010:hci_read_supported_codecs+0xb9/0x870 net/bluetooth/hci_codec.c:138
Code: 08 48 89 ef e8 b8 c1 8f fd 48 8b 75 00 e9 96 00 00 00 49 89 c6 48 ba 00 00 00 00 00 fc ff df 4c 8d 60 70 4c 89 e3 48 c1 eb 03 <0f> b6 04 13 84 c0 0f 85 82 06 00 00 41 83 3c 24 02 77 0a e8 bf 78
RSP: 0018:ffff888120bafac8 EFLAGS: 00010212
RAX: 0000000000000000 RBX: 000000000000000e RCX: ffff8881173f0040
RDX: dffffc0000000000 RSI: ffffffffa58496c0 RDI: ffff88810b9ad1e4
RBP: ffff88810b9ac000 R08: ffffffffa77882a7 R09: 1ffffffff4ef1054
R10: dffffc0000000000 R11: fffffbfff4ef1055 R12: 0000000000000070
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88810b9ac000
FS: 0000000000000000(0000) GS:ffff8881f6c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6ddaa3439e CR3: 0000000139764003 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
hci_read_local_codecs_sync net/bluetooth/hci_sync.c:4546 [inline]
hci_init_stage_sync net/bluetooth/hci_sync.c:3441 [inline]
hci_init4_sync net/bluetooth/hci_sync.c:4706 [inline]
hci_init_sync net/bluetooth/hci_sync.c:4742 [inline]
hci_dev_init_sync net/bluetooth/hci_sync.c:4912 [inline]
hci_dev_open_sync+0x19a9/0x2d30 net/bluetooth/hci_sync.c:4994
hci_dev_do_open net/bluetooth/hci_core.c:483 [inline]
hci_power_on+0x11e/0x560 net/bluetooth/hci_core.c:1015
process_one_work kernel/workqueue.c:3267 [inline]
process_scheduled_works+0x8ef/0x14f0 kernel/workqueue.c:3348
worker_thread+0x91f/0xe50 kernel/workqueue.c:3429
kthread+0x2cb/0x360 kernel/kthread.c:388
ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Fixes: abfeea476c68 ("Bluetooth: hci_sync: Convert MGMT_OP_START_DISCOVERY")
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes, both in drivers (ufs and scsi_debug)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Fix another deadlock during RTC update
scsi: scsi_debug: Fix do_device_access() handling of unexpected SG copy length
|
|
Add a selftest for the bpf_csum_diff() helper. This selftests runs the
helper in all three configurations(push, pull, and diff) and verifies
its output. The correct results have been computed by hand and by the
helper's older implementation.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20241026125339.26459-5-puranjay@kernel.org
|
|
The bpf_csum_diff() helper has been fixed to return a 16-bit value for
all archs, so now we don't need to mask the result.
This commit is basically reverting the below:
commit 6185266c5a85 ("selftests/bpf: Mask bpf_csum_diff() return value
to 16 bits in test_verifier")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20241026125339.26459-4-puranjay@kernel.org
|
|
1. Optimization
------------
The current implementation copies the 'from' and 'to' buffers to a
scratchpad and it takes the bitwise NOT of 'from' buffer while copying.
In the next step csum_partial() is called with this scratchpad.
so, mathematically, the current implementation is doing:
result = csum(to - from)
Here, 'to' and '~ from' are copied in to the scratchpad buffer, we need
it in the scratchpad buffer because csum_partial() takes a single
contiguous buffer and not two disjoint buffers like 'to' and 'from'.
We can re write this equation to:
result = csum(to) - csum(from)
using the distributive property of csum().
this allows 'to' and 'from' to be at different locations and therefore
this scratchpad and copying is not needed.
This in C code will look like:
result = csum_sub(csum_partial(to, to_size, seed),
csum_partial(from, from_size, 0));
2. Homogenization
--------------
The bpf_csum_diff() helper calls csum_partial() which is implemented by
some architectures like arm and x86 but other architectures rely on the
generic implementation in lib/checksum.c
The generic implementation in lib/checksum.c returns a 16 bit value but
the arch specific implementations can return more than 16 bits, this
works out in most places because before the result is used, it is passed
through csum_fold() that turns it into a 16-bit value.
bpf_csum_diff() directly returns the value from csum_partial() and
therefore the returned values could be different on different
architectures. see discussion in [1]:
for the int value 28 the calculated checksums are:
x86 : -29 : 0xffffffe3
generic (arm64, riscv) : 65507 : 0x0000ffe3
arm : 131042 : 0x0001ffe2
Pass the result of bpf_csum_diff() through from32to16() before returning
to homogenize this result for all architectures.
NOTE: from32to16() is used instead of csum_fold() because csum_fold()
does from32to16() + bitwise NOT of the result, which is not what we want
to do here.
[1] https://lore.kernel.org/bpf/CAJ+HfNiQbOcqCLxFUP2FMm5QrLXUUaj852Fxe3hn_2JNiucn6g@mail.gmail.com/
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20241026125339.26459-3-puranjay@kernel.org
|
|
from32to16() is used by lib/checksum.c and also by
arch/parisc/lib/checksum.c. The next patch will use it in the
bpf_csum_diff helper.
Move from32to16() to the include/net/checksum.h as csum_from32to16() and
remove other implementations.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20241026125339.26459-2-puranjay@kernel.org
|
|
Quirk is needed to enable headset microphone on missing pin 0x19.
Signed-off-by: Christoffer Sandberg <cs@tuxedo.de>
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20241029151653.80726-2-wse@tuxedocomputers.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Quirk is needed to enable headset microphone on missing pin 0x19.
Signed-off-by: Christoffer Sandberg <cs@tuxedo.de>
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20241029151653.80726-1-wse@tuxedocomputers.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
The WD19 family of docks has the same audio chipset as the WD15. This
change enables jack detection on the WD19.
We don't need the dell_dock_mixer_init quirk for the WD19. It is only
needed because of the dell_alc4020_map quirk for the WD15 in
mixer_maps.c, which disables the volume controls. Even for the WD15,
this quirk was apparently only needed when the dock firmware was not
updated.
Signed-off-by: Jan Schär <jan@jschaer.ch>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20241029221249.15661-1-jan@jschaer.ch
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.12
The biggest set of changes here is Hans' fixes and quirks for various
Baytrail based platforms with RT5640 CODECs, and there's one core fix
for a missed length assignment for __counted_by() checking. Otherwise
it's small device specific fixes, several of them in the DT bindings.
|
|
Jason Xing says:
====================
tcp: add tcp_warn_once() common helper
Paolo Abeni suggested we can introduce a new helper to cover more cases
in the future for better debug.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add two fields to print in the helper which here covers tcp_send_loss_probe().
Link: https://lore.kernel.org/all/5632e043-bdba-4d75-bc7e-bf58014492fd@redhat.com/
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Cc: Neal Cardwell <ncardwell@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Following the commit c8770db2d544 ("tcp: check skb is non-NULL
in tcp_rto_delta_us()"), we decided to add a helper so that it's
easier to get verbose warning on either cases.
Link: https://lore.kernel.org/all/5632e043-bdba-4d75-bc7e-bf58014492fd@redhat.com/
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Cc: Neal Cardwell <ncardwell@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I got a syzbot report without a repro [1] crashing in nf_send_reset6()
I think the issue is that dev->hard_header_len is zero, and we attempt
later to push an Ethernet header.
Use LL_MAX_HEADER, as other functions in net/ipv6/netfilter/nf_reject_ipv6.c.
[1]
skbuff: skb_under_panic: text:ffffffff89b1d008 len:74 put:14 head:ffff88803123aa00 data:ffff88803123a9f2 tail:0x3c end:0x140 dev:syz_tun
kernel BUG at net/core/skbuff.c:206 !
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 0 UID: 0 PID: 7373 Comm: syz.1.568 Not tainted 6.12.0-rc2-syzkaller-00631-g6d858708d465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:skb_panic net/core/skbuff.c:206 [inline]
RIP: 0010:skb_under_panic+0x14b/0x150 net/core/skbuff.c:216
Code: 0d 8d 48 c7 c6 60 a6 29 8e 48 8b 54 24 08 8b 0c 24 44 8b 44 24 04 4d 89 e9 50 41 54 41 57 41 56 e8 ba 30 38 02 48 83 c4 20 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
RSP: 0018:ffffc900045269b0 EFLAGS: 00010282
RAX: 0000000000000088 RBX: dffffc0000000000 RCX: cd66dacdc5d8e800
RDX: 0000000000000000 RSI: 0000000000000200 RDI: 0000000000000000
RBP: ffff88802d39a3d0 R08: ffffffff8174afec R09: 1ffff920008a4ccc
R10: dffffc0000000000 R11: fffff520008a4ccd R12: 0000000000000140
R13: ffff88803123aa00 R14: ffff88803123a9f2 R15: 000000000000003c
FS: 00007fdbee5ff6c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000005d322000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
skb_push+0xe5/0x100 net/core/skbuff.c:2636
eth_header+0x38/0x1f0 net/ethernet/eth.c:83
dev_hard_header include/linux/netdevice.h:3208 [inline]
nf_send_reset6+0xce6/0x1270 net/ipv6/netfilter/nf_reject_ipv6.c:358
nft_reject_inet_eval+0x3b9/0x690 net/netfilter/nft_reject_inet.c:48
expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
nft_do_chain+0x4ad/0x1da0 net/netfilter/nf_tables_core.c:288
nft_do_chain_inet+0x418/0x6b0 net/netfilter/nft_chain_filter.c:161
nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
nf_hook_slow+0xc3/0x220 net/netfilter/core.c:626
nf_hook include/linux/netfilter.h:269 [inline]
NF_HOOK include/linux/netfilter.h:312 [inline]
br_nf_pre_routing_ipv6+0x63e/0x770 net/bridge/br_netfilter_ipv6.c:184
nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:277 [inline]
br_handle_frame+0x9fd/0x1530 net/bridge/br_input.c:424
__netif_receive_skb_core+0x13e8/0x4570 net/core/dev.c:5562
__netif_receive_skb_one_core net/core/dev.c:5666 [inline]
__netif_receive_skb+0x12f/0x650 net/core/dev.c:5781
netif_receive_skb_internal net/core/dev.c:5867 [inline]
netif_receive_skb+0x1e8/0x890 net/core/dev.c:5926
tun_rx_batched+0x1b7/0x8f0 drivers/net/tun.c:1550
tun_get_user+0x3056/0x47e0 drivers/net/tun.c:2007
tun_chr_write_iter+0x10d/0x1f0 drivers/net/tun.c:2053
new_sync_write fs/read_write.c:590 [inline]
vfs_write+0xa6d/0xc90 fs/read_write.c:683
ksys_write+0x183/0x2b0 fs/read_write.c:736
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fdbeeb7d1ff
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 c9 8d 02 00 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 1c 8e 02 00 48
RSP: 002b:00007fdbee5ff000 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fdbeed36058 RCX: 00007fdbeeb7d1ff
RDX: 000000000000008e RSI: 0000000020000040 RDI: 00000000000000c8
RBP: 00007fdbeebf12be R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000008e R11: 0000000000000293 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fdbeed36058 R15: 00007ffc38de06e8
</TASK>
Fixes: c8d7b98bec43 ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
ip6table_nat module unload has refcnt warning for UAF. call trace is:
WARNING: CPU: 1 PID: 379 at kernel/module/main.c:853 module_put+0x6f/0x80
Modules linked in: ip6table_nat(-)
CPU: 1 UID: 0 PID: 379 Comm: ip6tables Not tainted 6.12.0-rc4-00047-gc2ee9f594da8-dirty #205
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:module_put+0x6f/0x80
Call Trace:
<TASK>
get_info+0x128/0x180
do_ip6t_get_ctl+0x6a/0x430
nf_getsockopt+0x46/0x80
ipv6_getsockopt+0xb9/0x100
rawv6_getsockopt+0x42/0x190
do_sock_getsockopt+0xaa/0x180
__sys_getsockopt+0x70/0xc0
__x64_sys_getsockopt+0x20/0x30
do_syscall_64+0xa2/0x1a0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Concurrent execution of module unload and get_info() trigered the warning.
The root cause is as follows:
cpu0 cpu1
module_exit
//mod->state = MODULE_STATE_GOING
ip6table_nat_exit
xt_unregister_template
kfree(t)
//removed from templ_list
getinfo()
t = xt_find_table_lock
list_for_each_entry(tmpl, &xt_templates[af]...)
if (strcmp(tmpl->name, name))
continue; //table not found
try_module_get
list_for_each_entry(t, &xt_net->tables[af]...)
return t; //not get refcnt
module_put(t->me) //uaf
unregister_pernet_subsys
//remove table from xt_net list
While xt_table module was going away and has been removed from
xt_templates list, we couldnt get refcnt of xt_table->me. Check
module in xt_net->tables list re-traversal to fix it.
Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default")
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|