Age | Commit message (Collapse) | Author |
|
Since 1b620d539ccc ("kbuild: disable header exports for UML in a
straightforward way"), installing headers fails on UML, so just disable
installing them, since they're not needed anyway on the architecture.
Fixes: b438b3b8d6e6 ("wireguard: selftests: support UML")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A previous commit tried to make the ratelimiter timings test more
reliable but in the process made it less reliable on other
configurations. This is an impossible problem to solve without
increasingly ridiculous heuristics. And it's not even a problem that
actually needs to be solved in any comprehensive way, since this is only
ever used during development. So just cordon this off with a DEBUG_
ifdef, just like we do for the trie's randomized tests, so it can be
enabled while hacking on the code, and otherwise disabled in CI. In the
process we also revert 151c8e499f47.
Fixes: 151c8e499f47 ("wireguard: ratelimiter: use hrtimer in selftest")
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Like in previous patch for sfc, prevent potential (but unlikely) NULL
pointer dereference.
Fixes: 12804793b17c ("sfc: decouple TXQ type from label")
Reported-by: Tianhao Zhao <tizhao@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Link: https://lore.kernel.org/r/20220915141958.16458-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As in previous commit for sfc, fix TX channels offset when
efx_siena_separate_tx_channels is false (the default)
Fixes: 25bde571b4a8 ("sfc/siena: fix wrong tx channel offset with efx_separate_tx_channels")
Reported-by: Tianhao Zhao <tizhao@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Link: https://lore.kernel.org/r/20220915141653.15504-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- two fixes for hangs in the umount sequence where threads depend on
each other and the work must be finished in the right order
- in zoned mode, wait for flushing all block group metadata IO before
finishing the zone
* tag 'for-6.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: wait for extent buffer IOs before finishing a zone
btrfs: fix hang during unmount when stopping a space reclaim worker
btrfs: fix hang during unmount when stopping block group reclaim worker
|
|
Kuniyuki Iwashima says:
====================
tcp: Introduce optional per-netns ehash.
The more sockets we have in the hash table, the longer we spend looking
up the socket. While running a number of small workloads on the same
host, they penalise each other and cause performance degradation.
The root cause might be a single workload that consumes much more
resources than the others. It often happens on a cloud service where
different workloads share the same computing resource.
On EC2 c5.24xlarge instance (196 GiB memory and 524288 (1Mi / 2) ehash
entries), after running iperf3 in different netns, creating 24Mi sockets
without data transfer in the root netns causes about 10% performance
regression for the iperf3's connection.
thash_entries sockets length Gbps
524288 1 1 50.7
24Mi 48 45.1
It is basically related to the length of the list of each hash bucket.
For testing purposes to see how performance drops along the length,
I set 131072 (1Mi / 8) to thash_entries, and here's the result.
thash_entries sockets length Gbps
131072 1 1 50.7
1Mi 8 49.9
2Mi 16 48.9
4Mi 32 47.3
8Mi 64 44.6
16Mi 128 40.6
24Mi 192 36.3
32Mi 256 32.5
40Mi 320 27.0
48Mi 384 25.0
To resolve the socket lookup degradation, we introduce an optional
per-netns hash table for TCP, but it's just ehash, and we still share
the global bhash, bhash2 and lhash2.
With a smaller ehash, we can look up non-listener sockets faster and
isolate such noisy neighbours. Also, we can reduce lock contention.
For details, please see the last patch.
patch 1 - 4: prep for per-netns ehash
patch 5: small optimisation for netns dismantle without TIME_WAIT sockets
patch 6: add per-netns ehash
Many thanks to Eric Dumazet for reviewing and advising.
====================
Link: https://lore.kernel.org/r/20220908011022.45342-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The more sockets we have in the hash table, the longer we spend looking
up the socket. While running a number of small workloads on the same
host, they penalise each other and cause performance degradation.
The root cause might be a single workload that consumes much more
resources than the others. It often happens on a cloud service where
different workloads share the same computing resource.
On EC2 c5.24xlarge instance (196 GiB memory and 524288 (1Mi / 2) ehash
entries), after running iperf3 in different netns, creating 24Mi sockets
without data transfer in the root netns causes about 10% performance
regression for the iperf3's connection.
thash_entries sockets length Gbps
524288 1 1 50.7
24Mi 48 45.1
It is basically related to the length of the list of each hash bucket.
For testing purposes to see how performance drops along the length,
I set 131072 (1Mi / 8) to thash_entries, and here's the result.
thash_entries sockets length Gbps
131072 1 1 50.7
1Mi 8 49.9
2Mi 16 48.9
4Mi 32 47.3
8Mi 64 44.6
16Mi 128 40.6
24Mi 192 36.3
32Mi 256 32.5
40Mi 320 27.0
48Mi 384 25.0
To resolve the socket lookup degradation, we introduce an optional
per-netns hash table for TCP, but it's just ehash, and we still share
the global bhash, bhash2 and lhash2.
With a smaller ehash, we can look up non-listener sockets faster and
isolate such noisy neighbours. In addition, we can reduce lock contention.
We can control the ehash size by a new sysctl knob. However, depending
on workloads, it will require very sensitive tuning, so we disable the
feature by default (net.ipv4.tcp_child_ehash_entries == 0). Moreover,
we can fall back to using the global ehash in case we fail to allocate
enough memory for a new ehash. The maximum size is 16Mi, which is large
enough that even if we have 48Mi sockets, the average list length is 3,
and regression would be less than 1%.
We can check the current ehash size by another read-only sysctl knob,
net.ipv4.tcp_ehash_entries. A negative value means the netns shares
the global ehash (per-netns ehash is disabled or failed to allocate
memory).
# dmesg | cut -d ' ' -f 5- | grep "established hash"
TCP established hash table entries: 524288 (order: 10, 4194304 bytes, vmalloc hugepage)
# sysctl net.ipv4.tcp_ehash_entries
net.ipv4.tcp_ehash_entries = 524288 # can be changed by thash_entries
# sysctl net.ipv4.tcp_child_ehash_entries
net.ipv4.tcp_child_ehash_entries = 0 # disabled by default
# ip netns add test1
# ip netns exec test1 sysctl net.ipv4.tcp_ehash_entries
net.ipv4.tcp_ehash_entries = -524288 # share the global ehash
# sysctl -w net.ipv4.tcp_child_ehash_entries=100
net.ipv4.tcp_child_ehash_entries = 100
# ip netns add test2
# ip netns exec test2 sysctl net.ipv4.tcp_ehash_entries
net.ipv4.tcp_ehash_entries = 128 # own a per-netns ehash with 2^n buckets
When more than two processes in the same netns create per-netns ehash
concurrently with different sizes, we need to guarantee the size in
one of the following ways:
1) Share the global ehash and create per-netns ehash
First, unshare() with tcp_child_ehash_entries==0. It creates dedicated
netns sysctl knobs where we can safely change tcp_child_ehash_entries
and clone()/unshare() to create a per-netns ehash.
2) Control write on sysctl by BPF
We can use BPF_PROG_TYPE_CGROUP_SYSCTL to allow/deny read/write on
sysctl knobs.
Note that the global ehash allocated at the boot time is spread over
available NUMA nodes, but inet_pernet_hashinfo_alloc() will allocate
pages for each per-netns ehash depending on the current process's NUMA
policy. By default, the allocation is done in the local node only, so
the per-netns hash table could fully reside on a random node. Thus,
depending on the NUMA policy the netns is created with and the CPU the
current thread is running on, we could see some performance differences
for highly optimised networking applications.
Note also that the default values of two sysctl knobs depend on the ehash
size and should be tuned carefully:
tcp_max_tw_buckets : tcp_child_ehash_entries / 2
tcp_max_syn_backlog : max(128, tcp_child_ehash_entries / 128)
As a bonus, we can dismantle netns faster. Currently, while destroying
netns, we call inet_twsk_purge(), which walks through the global ehash.
It can be potentially big because it can have many sockets other than
TIME_WAIT in all netns. Splitting ehash changes that situation, where
it's only necessary for inet_twsk_purge() to clean up TIME_WAIT sockets
in each netns.
With regard to this, we do not free the per-netns ehash in inet_twsk_kill()
to avoid UAF while iterating the per-netns ehash in inet_twsk_purge().
Instead, we do it in tcp_sk_exit_batch() after calling tcp_twsk_purge() to
keep it protocol-family-independent.
In the future, we could optimise ehash lookup/iteration further by removing
netns comparison for the per-netns ehash.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
While destroying netns, we call inet_twsk_purge() in tcp_sk_exit_batch()
and tcpv6_net_exit_batch() for AF_INET and AF_INET6. These commands
trigger the kernel to walk through the potentially big ehash twice even
though the netns has no TIME_WAIT sockets.
# ip netns add test
# ip netns del test
or
# unshare -n /bin/true >/dev/null
When tw_refcount is 1, we need not call inet_twsk_purge() at least
for the net. We can save such unneeded iterations if all netns in
net_exit_list have no TIME_WAIT sockets. This change eliminates
the tax by the additional unshare() described in the next patch to
guarantee the per-netns ehash size.
Tested:
# mount -t debugfs none /sys/kernel/debug/
# echo cleanup_net > /sys/kernel/debug/tracing/set_ftrace_filter
# echo inet_twsk_purge >> /sys/kernel/debug/tracing/set_ftrace_filter
# echo function > /sys/kernel/debug/tracing/current_tracer
# cat ./add_del_unshare.sh
for i in `seq 1 40`
do
(for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
done
wait;
# ./add_del_unshare.sh
Before the patch:
# cat /sys/kernel/debug/tracing/trace_pipe
kworker/u128:0-8 [031] ...1. 174.162765: cleanup_net <-process_one_work
kworker/u128:0-8 [031] ...1. 174.240796: inet_twsk_purge <-cleanup_net
kworker/u128:0-8 [032] ...1. 174.244759: inet_twsk_purge <-tcp_sk_exit_batch
kworker/u128:0-8 [034] ...1. 174.290861: cleanup_net <-process_one_work
kworker/u128:0-8 [039] ...1. 175.245027: inet_twsk_purge <-cleanup_net
kworker/u128:0-8 [046] ...1. 175.290541: inet_twsk_purge <-tcp_sk_exit_batch
kworker/u128:0-8 [037] ...1. 175.321046: cleanup_net <-process_one_work
kworker/u128:0-8 [024] ...1. 175.941633: inet_twsk_purge <-cleanup_net
kworker/u128:0-8 [025] ...1. 176.242539: inet_twsk_purge <-tcp_sk_exit_batch
After:
# cat /sys/kernel/debug/tracing/trace_pipe
kworker/u128:0-8 [038] ...1. 428.116174: cleanup_net <-process_one_work
kworker/u128:0-8 [038] ...1. 428.262532: cleanup_net <-process_one_work
kworker/u128:0-8 [030] ...1. 429.292645: cleanup_net <-process_one_work
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We will soon introduce an optional per-netns ehash.
This means we cannot use tcp_hashinfo directly in most places.
Instead, access it via net->ipv4.tcp_death_row.hashinfo.
The access will be valid only while initialising tcp_hashinfo
itself and creating/destroying each netns.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We will soon introduce an optional per-netns ehash.
This means we cannot use the global sk->sk_prot->h.hashinfo
to fetch a TCP hashinfo.
Instead, set NULL to sk->sk_prot->h.hashinfo for TCP and get
a proper hashinfo from net->ipv4.tcp_death_row.hashinfo.
Note that we need not use sk->sk_prot->h.hashinfo if DCCP is
disabled.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We will soon introduce an optional per-netns ehash and access hash
tables via net->ipv4.tcp_death_row->hashinfo instead of &tcp_hashinfo
in most places.
It could harm the fast path because dereferences of two fields in net
and tcp_death_row might incur two extra cache line misses. To save one
dereference, let's place tcp_death_row back in netns_ipv4 and fetch
hashinfo via net->ipv4.tcp_death_row"."hashinfo.
Note tcp_death_row was initially placed in netns_ipv4, and commit
fbb8295248e1 ("tcp: allocate tcp_death_row outside of struct netns_ipv4")
changed it to a pointer so that we can fire TIME_WAIT timers after freeing
net. However, we don't do so after commit 04c494e68a13 ("Revert "tcp/dccp:
get rid of inet_twsk_purge()""), so we need not define tcp_death_row as a
pointer.
Also, we move refcount_dec_and_test(&tw_refcount) from tcp_sk_exit() to
tcp_sk_exit_batch() as a debug check.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds no functional change and cleans up some functions
that the following patches touch around so that we make them tidy
and easy to review/revert. The changes are
- Keep reverse christmas tree order
- Remove unnecessary init of port in inet_csk_find_open_port()
- Use req_to_sk() once in reqsk_queue_unlink()
- Use sock_net(sk) once in tcp_time_wait() and tcp_v[46]_connect()
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull vfs fix from Christian Brauner:
"Beginning of the merge window we introduced the vfs{g,u}id_t types in
b27c82e12965 ("attr: port attribute changes to new types") and changed
various codepaths over including chown_common().
When userspace passes -1 for an ownership change the ownership fields
in struct iattr stay uninitialized. Usually this is fine because any
code making use of any fields in struct iattr must check the
->ia_valid field whether the value of interest has been initialized.
That's true for all struct iattr passing code.
However, over the course of the last year with more heavy use of KMSAN
we found quite a few places that got this wrong. A recent one I fixed
was 3cb6ee991496 ("9p: only copy valid iattrs in 9P2000.L setattr
implementation").
But we also have LSM hooks. Actually we have two. The first one is
security_inode_setattr() in notify_change() which does the right thing
and passes the full struct iattr down to LSMs and thus LSMs can check
whether it is initialized.
But then we also have security_path_chown() which passes down a path
argument and the target ownership as the filesystem would see it. For
the latter we now generate the target values based on struct iattr and
pass it down. However, when userspace passes -1 then struct iattr
isn't initialized.
This patch simply initializes ->ia_vfs{g,u}id with INVALID_VFS{G,U}ID
so the hook continue to see invalid ownership when -1 is passed from
userspace. The only LSM that cares about the actual values is Tomoyo.
The vfs codepaths don't look at these fields without ->ia_valid being
set so there's no harm in initializing ->ia_vfs{g,u}id. Arguably this
is also safer since we can't end up copying valid ownership values
when invalid ownership values should be passed.
This only affects mainline. No kernel has been released with this and
thus no backport is needed. The commit is thus marked with a Fixes:
tag but annotated with "# mainline only" (I didn't quite remember what
Greg said about how to tell stable autoselect to not bother with fixes
for mainline only)"
* tag 'fs.fixes.v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
open: always initialize ownership fields
|
|
There is a single kmalloc in this driver, and it's not currently
guarded against allocation failure. Do it here by just bailing-out
the reboot handler, in case this tentative allocation fails.
Fixes: 416581e48679 ("efi: efibc: avoid efivar API for setting variables")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Remove a left-over from commit 2874c5fd2842 ("treewide: Replace GPLv2
boilerplate/reference with SPDX - rule 152")
There is no need for an empty "License:".
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/0e5ff727626b748238f4b78932f81572143d8f0b.1662896317.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is a regression test for commit 592335a4164c ("bonding: accept
unsolicited NA message") and commit b7f14132bf58 ("bonding: use unspecified
address if no available link local address"). When the bond interface
up and no available link local address, unspecified address(::) is used to
send the NS message. The unsolicited NA message should also be accepted
for validation.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Link: https://lore.kernel.org/r/20220920033047.173244-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
dev_err() can be replace with dev_err_probe() which will check if error
code is -EPROBE_DEFER.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220915065043.665138-3-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
dev_err() can be replace with dev_err_probe() which will check if error
code is -EPROBE_DEFER.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220915065043.665138-2-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
dev_err() can be replace with dev_err_probe() which will check if error
code is -EPROBE_DEFER.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220915065043.665138-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
EMAC IP found on RZ/G2L Gb ethernet supports MII interface.
This patch adds support for selecting MII interface mode.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/20220914192604.265859-1-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve reverts from Kees Cook:
"The recent work to support time namespace unsharing turns out to have
some undesirable corner cases, so rather than allowing the API to stay
exposed for another release, it'd be best to remove it ASAP, with the
replacement getting another cycle of testing. Nothing is known to use
this yet, so no userspace breakage is expected.
For more details, see:
https://lore.kernel.org/lkml/ed418e43ad28b8688cfea2b7c90fce1c@ispras.ru
Summary:
- Remove the recent 'unshare time namespace on vfork+exec' feature
(Andrei Vagin)"
* tag 'execve-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
Revert "fs/exec: allow to unshare a time namespace on vfork+exec"
Revert "selftests/timens: add a test for vfork+exit"
|
|
Unlike with bridges, one can't add an interface to a bond and set it up
at the same time:
| # ip link set dummy0 down
| # ip link set dummy0 master bond0 up
| Error: Device can not be enslaved while up.
Of all drivers with ndo_add_slave callback, bond and team decline if
IFF_UP flag is set, vrf cycles the interface (i.e., sets it down and
immediately up again) and the others just don't care.
Support the common notion of setting the interface up after enslaving it
by sorting the operations accordingly.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220914150623.24152-1-phil@nwl.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Radhey Shyam Pandey says:
====================
macb: add zynqmp SGMII dynamic configuration support
This patchset add firmware and driver support to do SD/GEM dynamic
configuration. In traditional flow GEM secure space configuration
is done by FSBL. However in specific usescases like dynamic designs
where GEM is not enabled in base vivado design, FSBL skips GEM
initialization and we need a mechanism to configure GEM secure space
in linux space at runtime.
====================
Link: https://lore.kernel.org/r/1663158796-14869-1-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for the dynamic configuration which takes care of
configuring the GEM secure space configuration registers
using EEMI APIs.
High level sequence is to:
- Check for the PM dynamic configuration support, if no error proceed with
GEM dynamic configurations(next steps) otherwise skip the dynamic
configuration.
- Configure GEM Fixed configurations.
- Configure GEM_CLK_CTRL (gemX_sgmii_mode).
- Trigger GEM reset.
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Conor Dooley <conor.dooley@microchip.com> (for MPFS)
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add new APIs in firmware to configure SD/GEM registers. Internally
it calls PM IOCTL for below SD/GEM register configuration:
- SD/EMMC select
- SD slot type
- SD base clock
- SD 8 bit support
- SD fixed config
- GEM SGMII Mode
- GEM fixed config
Signed-off-by: Ronak Jain <ronak.jain@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot is still complaining uninit-value in tcp_recvmsg(), for
commit 1228b34c8d0ecf6d ("net: clear msg_get_inq in __sys_recvfrom() and
__copy_msghdr_from_user()") missed that __get_compat_msghdr() is called
instead of copy_msghdr_from_user() when MSG_CMSG_COMPAT is specified.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 1228b34c8d0ecf6d ("net: clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user()")
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/d06d0f7f-696c-83b4-b2d5-70b5f2730a37@I-love.SAKURA.ne.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'ipmr-always-call-ip-6-_mr_forward-from-rcu-read-side-critical-section'
Ido Schimmel says:
====================
ipmr: Always call ip{,6}_mr_forward() from RCU read-side critical section
Patch #1 fixes a bug in ipmr code.
Patch #2 adds corresponding test cases.
====================
Link: https://lore.kernel.org/r/20220914075339.4074096-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add IPv4 and IPv6 test cases for unresolved multicast routes, testing
that queued packets are forwarded after installing a matching (S, G)
route.
The test cases can be used to reproduce the bugs fixed in "ipmr: Always
call ip{,6}_mr_forward() from RCU read-side critical section".
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These functions expect to be called from RCU read-side critical section,
but this only happens when invoked from the data path via
ip{,6}_mr_input(). They can also be invoked from process context in
response to user space adding a multicast route which resolves a cache
entry with queued packets [1][2].
Fix by adding missing rcu_read_lock() / rcu_read_unlock() in these call
paths.
[1]
WARNING: suspicious RCU usage
6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387 Not tainted
-----------------------------
net/ipv4/ipmr.c:84 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by smcrouted/246:
#0: ffffffff862389b0 (rtnl_mutex){+.+.}-{3:3}, at: ip_mroute_setsockopt+0x11c/0x1420
stack backtrace:
CPU: 0 PID: 246 Comm: smcrouted Not tainted 6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x91/0xb9
vif_dev_read+0xbf/0xd0
ipmr_queue_xmit+0x135/0x1ab0
ip_mr_forward+0xe7b/0x13d0
ipmr_mfc_add+0x1a06/0x2ad0
ip_mroute_setsockopt+0x5c1/0x1420
do_ip_setsockopt+0x23d/0x37f0
ip_setsockopt+0x56/0x80
raw_setsockopt+0x219/0x290
__sys_setsockopt+0x236/0x4d0
__x64_sys_setsockopt+0xbe/0x160
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[2]
WARNING: suspicious RCU usage
6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387 Not tainted
-----------------------------
net/ipv6/ip6mr.c:69 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by smcrouted/246:
#0: ffffffff862389b0 (rtnl_mutex){+.+.}-{3:3}, at: ip6_mroute_setsockopt+0x6b9/0x2630
stack backtrace:
CPU: 1 PID: 246 Comm: smcrouted Not tainted 6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x91/0xb9
vif_dev_read+0xbf/0xd0
ip6mr_forward2.isra.0+0xc9/0x1160
ip6_mr_forward+0xef0/0x13f0
ip6mr_mfc_add+0x1ff2/0x31f0
ip6_mroute_setsockopt+0x1825/0x2630
do_ipv6_setsockopt+0x462/0x4440
ipv6_setsockopt+0x105/0x140
rawv6_setsockopt+0xd8/0x690
__sys_setsockopt+0x236/0x4d0
__x64_sys_setsockopt+0xbe/0x160
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: ebc3197963fc ("ipmr: add rcu protection over (struct vif_device)->dev")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The symbol is not used outside of the file, so mark it static.
Fixes the following warning:
./drivers/net/xen-netfront.c:676:16: warning: symbol 'bounce_skb' was not declared. Should it be static?
Signed-off-by: ruanjinjie <ruanjinjie@huawei.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20220914064339.49841-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
IPA can route packets between IPA-connected entities. The AP and
modem are currently the only such entities supported, and no routing
is required to transfer packets between them.
The number of entries in each routing table is fixed, and defined at
initialization time. Some of these entries are designated for use
by the modem, and the rest are available for the AP to use. The AP
sends a QMI message to the modem which describes (among other
things) information about routing table memory available for the
modem to use.
Currently the QMI initialization packet gives wrong information in
its description of routing tables. What *should* be supplied is the
maximum index that the modem can use for the routing table memory
located at a given location. The current code instead supplies the
total *number* of routing table entries. Furthermore, the modem is
granted the entire table, not just the subset it's supposed to use.
This patch fixes this. First, the ipa_mem_bounds structure is
generalized so its "end" field can be interpreted either as a final
byte offset, or a final array index. Second, the IPv4 and IPv6
(non-hashed and hashed) table information fields in the QMI
ipa_init_modem_driver_req structure are changed to be ipa_mem_bounds
rather than ipa_mem_array structures. Third, we set the "end" value
for each routing table to be the last index, rather than setting the
"count" to be the number of indices. Finally, instead of allowing
the modem to use all of a routing table's memory, it is limited to
just the portion meant to be used by the modem. In all versions of
IPA currently supported, that is IPA_ROUTE_MODEM_COUNT (8) entries.
Update a few comments for clarity.
Fixes: 530f9216a9537 ("soc: qcom: ipa: AP/modem communications")
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20220913204602.1803004-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for interrupts for LAN8804 PHY.
Tested-by: Michael Walle <michael@walle.cc> # on kontron-kswitch-d10
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20220913142926.816746-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Russell King says:
====================
sfp: add support for HALNy GPON module
This series adds support for the HALNy GPON SFP module. In order to do
this sensibly, we need a more flexible quirk system, since we need to
change the behaviour of the SFP cage driver to ignore the LOS and
TX_FAULT signals after module detection.
Since we move the SFP quirks into the SFP cage driver, we can use it
for the MA5671A and 3FE46541AA modules as well.
====================
Link: https://lore.kernel.org/r/YyDUnvM1b0dZPmmd@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a quirk for the HALNy HL-GSFP module, which appears to have an
inverted RX_LOS signal, and maybe uses TX_FAULT as a serial port
transmit pin. Rather than use these hardware signals, switch to
using software polling for these status signals.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move this module over to the new fixup mechanism.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new fixup mechanism to the SFP quirks, and use it for this
module.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We need to handle more quirks than just those which affect the link
modes of the module. Move the quirk lookup into sfp.c, and pass the
quirk to sfp-bus.c
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Re-implement the decision making for soft state polling. Instead of
generating the soft state mask in sfp_soft_start_poll() by looking at
which GPIOs are available, record their availability in
sfp_sm_mod_probe() in sfp->state_hw_mask.
This will then allow us to clear bits in sfp->state_hw_mask in module
specific quirks when the hardware signals should not be used, thereby
allowing us to switch to using the software state polling.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace the free-form description of device tree bindings for VSC9959
and VSC9953 with a YAML formatted dt-schema description. This contains
more or less the same information, but reworded to be a bit more
succint.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220913125806.524314-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Alex Elder says:
====================
net: ipa: a mix of cleanups
This series contains a set of cleanups done in preparation for a
more substantitive upcoming series that reworks how IPA registers
and their fields are defined.
The first eliminates about half of the possible GSI register
constant symbols by removing offset definitions that are not
currently required.
The next two mainly rearrange code for some common enumerated types.
The next one fixes two spots that reuse local variable names in
inner scopes when defining offsets.
The next adds some additional restrictions on the value held in a
register.
And the last one just fixes two field mask symbol names so they
adhere to the common naming convention.
====================
Link: https://lore.kernel.org/r/20220910011131.1431934-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
All field mask symbols are defined with a "_FMASK" suffix, but
EOT_COAL_GRANULARITY and DRBIP_ACL_ENABLE are defined without one.
Fix that.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Starting with IPA v4.5, replication is done differently from before,
and as a result the "replication" portion of the how the sequencer
is specified must be zero.
Add a check for the configuration data failing that requirement, and
only update the sesquencer type value when it's supported.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In ipa_endpoint_init_hdr(), as well as ipa_endpoint_init_hdr_ext(),
a top-level automatic variable named "offset" is used to represent
the offset of a register.
However, deeper within each of those functions is *another*
definition of a local variable with the same name, representing
something else. Scoping rules ensure the result is what was
intended, but this variable name reuse is bad practice and makes
the code confusing.
Fix this by naming the inner variable "off". Use "off" instead of
"checksum_offset" in ipa_endpoint_init_cfg() for consistency.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the definition of ipa_version_valid(), making it a static
inline function defined together with the enumerated type in
"ipa_version.h". Define a new count value in the type.
Rename the function to be ipa_version_supported(), and have it
return true only if the IPA version supplied is explicitly supported
by the driver.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the definition of the gsi_ee_id enumerated type out of "gsi.h"
and into "ipa_version.h". That latter header file isolates the
definition of the ipa_version enumerated type, allowing it to be
included in both IPA and GSI code. We have the same requirement for
gsi_ee_id, and moving it here makes it easier to get only that
definition without everything else defined in "gsi.h".
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Each GSI execution environment (EE) is able to access many of the
GSI registers associated with the other EEs. A block of GSI
registers is contained within a region of memory, and an EE's
register offset can be determined by adding the register's base
offset to the product of the EE ID and a fixed constant.
Despite this possibility, the AP IPA code *never* accesses any GSI
registers other than its own. So there's no need to define the
macros that compute register offsets for other EEs.
Redefine the AP access macros to compute the offset the way the more
general "any EE" macro would, and get rid of the unneeded macros.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In of_mdiobus_register(), we should call of_node_put() for 'child'
escaped out of for_each_available_child_of_node().
Fixes: 66bdede495c7 ("of_mdio: Fix broken PHY IRQ in case of probe deferral")
Co-developed-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Liang He <windhl@126.com>
Link: https://lore.kernel.org/r/20220913125659.3331969-1-windhl@126.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This turns the FTGPIO010 irqchip immutable.
Tested on the D-Link DIR-685.
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
|
|
If creation of software node fails, the locally allocated string
array is left unfreed. Free it on error path.
Fixes: 6fda593f3082 ("gpio: mockup: Convert to use software nodes")
Cc: stable@vger.kernel.org
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
|
|
We now remove the device's debugfs entries when unbinding the driver.
This now causes a NULL-pointer dereference on module exit because the
platform devices are unregistered *after* the global debugfs directory
has been recursively removed. Fix it by unregistering the devices first.
Fixes: 303e6da99429 ("gpio: mockup: remove gpio debugfs when remove device")
Cc: Wei Yongjun <weiyongjun1@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
|