Age | Commit message (Collapse) | Author |
|
When I was doing some experiments, I found that when using the first
parameter, namely, struct net, in ip_metrics_convert() always triggers NULL
pointer crash. Then I digged into this part, realizing that we can remove
this one due to its uselessness.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Smatch complains:
net/bridge/br_netlink_tunnel.c:
318 br_process_vlan_tunnel_info() warn: inconsistent indenting
Fix it with a proper indenting
Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When copying smc settings to clcsock, avoid setting clcsock's sk_sndbuf
to sysctl_tcp_wmem[1], since this may overwrite the value set by
tcp_sndbuf_expand() in TCP connection establishment.
And the other setting sk_{snd|rcv}buf to sysctl value in
smc_adjust_sock_bufsizes() can also be omitted since the initialization
of smc sock and clcsock has set sk_{snd|rcv}buf to smc.sysctl_{w|r}mem
or ipv4_sysctl_tcp_{w|r}mem[1].
Fixes: 30c3c4a4497c ("net/smc: Use correct buffer sizes when switching between TCP and SMC")
Link: https://lore.kernel.org/r/5eaf3858-e7fd-4db8-83e8-3d7a3e0e9ae2@linux.alibaba.com
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>, too.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 2863d665ea41282379f108e4da6c8a2366ba66db.
This patch introduced a potential kernel crash when multiple napi instances
redirect to the same AF_XDP socket. By removing the queue_index check, it is
possible for multiple napi instances to access the Rx ring at the same time,
which will result in a corrupted ring state which can lead to a crash when
flushing the rings in __xsk_flush(). This can happen when the linked list of
sockets to flush gets corrupted by concurrent accesses. A quick and small fix
is not possible, so let us revert this for now.
Reported-by: Yuval El-Hanany <YuvalE@radware.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/xdp-newbies/8100DBDC-0B7C-49DB-9995-6027F6E63147@radware.com
Link: https://lore.kernel.org/bpf/20240604122927.29080-2-magnus.karlsson@gmail.com
|
|
syzbot reported crash when rawtp program executed through the
test_run interface calls bpf_get_attach_cookie helper or any
other helper that touches task->bpf_ctx pointer.
Setting the run context (task->bpf_ctx pointer) for test_run
callback.
Fixes: 7adfc6c9b315 ("bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value")
Reported-by: syzbot+3ab78ff125b7979e45f9@syzkaller.appspotmail.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://syzkaller.appspot.com/bug?extid=3ab78ff125b7979e45f9
Link: https://lore.kernel.org/bpf/20240604150024.359247-1-jolsa@kernel.org
|
|
Commit 3e2f544dd8a33 ("net: get stats64 if device if driver is
configured") moved the callback to dev_get_tstats64() to net core, so,
unless the driver is doing some custom stats collection, it does not
need to set .ndo_get_stats64.
Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it
doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64
function pointer.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://lore.kernel.org/r/20240531111552.3209198-2-leitao@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core instead
of this driver.
With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.
Move openvswitch driver to leverage the core allocation.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240531111552.3209198-1-leitao@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
According to current semantics we should never try to shift data
between skbs which differ on decrypted or pp_recycle status.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
TLS (and hopefully soon PSP will) use EOR to prevent skbs
with different decrypted state from getting merged, without
adding new tests to the skb handling. In both cases once
the connection switches to an "encrypted" state, all subsequent
skbs will be encrypted, so a single "EOR fence" is sufficient
to prevent mixing.
Add a helper for setting the EOR bit, to make this arrangement
more explicit.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
tcp_skb_can_collapse() checks for conditions which don't make
sense on input. Because of this we ended up sprinkling a few
pairs of mptcp_skb_can_collapse() and skb_cmp_decrypted() calls
on the input path. Group them in a new helper. This should make
it less likely that someone will check mptcp and not decrypted
or vice versa when adding new code.
This implicitly adds a decrypted check early in tcp_collapse().
AFAIU this will very slightly increase our ability to collapse
packets under memory pressure, not a real bug.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
For TLS offload we mark packets with skb->decrypted to make sure
they don't escape the host without getting encrypted first.
The crypto state lives in the socket, so it may get detached
by a call to skb_orphan(). As a safety check - the egress path
drops all packets with skb->decrypted and no "crypto-safe" socket.
The skb marking was added to sendpage only (and not sendmsg),
because tls_device injected data into the TCP stack using sendpage.
This special case was missed when sendpage got folded into sendmsg.
Fixes: c5c37af6ecad ("tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240530232607.82686-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
extend cls_flower to match TUNNEL_FLAGS_PRESENT bits in tunnel metadata.
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Dissect [no]csum, [no]dontfrag, [no]oam, [no]crit flags from skb metadata.
This is a prerequisite for matching these control flags using TC flower.
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Catching and debugging missing qdiscs is pretty tricky. When qdisc
is deleted we replace it with a noop qdisc, which silently drops
all the packets. Since the noop qdisc has a single static instance
we can't count drops at the qdisc level. Count them as dev->tx_drops.
ip netns add red
ip link add type veth peer netns red
ip link set dev veth0 up
ip -netns red link set dev veth0 up
ip a a dev veth0 10.0.0.1/24
ip -netns red a a dev veth0 10.0.0.2/24
ping -c 2 10.0.0.2
# 2 packets transmitted, 2 received, 0% packet loss, time 1031ms
ip -s link show dev veth0
# TX: bytes packets errors dropped carrier collsns
# 1314 17 0 0 0 0
tc qdisc replace dev veth0 root handle 1234: mq
tc qdisc replace dev veth0 parent 1234:1 pfifo
tc qdisc del dev veth0 parent 1234:1
ping -c 2 10.0.0.2
# 2 packets transmitted, 0 received, 100% packet loss, time 1034ms
ip -s link show dev veth0
# TX: bytes packets errors dropped carrier collsns
# 1314 17 0 3 0 0
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240529162527.3688979-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.10-rc3
The first fixes for v6.10. And we have a big one, I suspect the
biggest wireless pull request we ever had. There are fixes all over,
both in stack and drivers. Likely the most important here are mt76 not
working on mt7615 devices, ath11k not being able to connect to 6 GHz
networks and rtlwifi suffering from packet loss. But of course there's
much more.
* tag 'wireless-2024-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (37 commits)
wifi: rtlwifi: Ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS
wifi: mt76: mt7615: add missing chanctx ops
wifi: wilc1000: document SRCU usage instead of SRCU
Revert "wifi: wilc1000: set atomic flag on kmemdup in srcu critical section"
Revert "wifi: wilc1000: convert list management to RCU"
wifi: mac80211: fix UBSAN noise in ieee80211_prep_hw_scan()
wifi: mac80211: correctly parse Spatial Reuse Parameter Set element
wifi: mac80211: fix Spatial Reuse element size check
wifi: iwlwifi: mvm: don't read past the mfuart notifcation
wifi: iwlwifi: mvm: Fix scan abort handling with HW rfkill
wifi: iwlwifi: mvm: check n_ssids before accessing the ssids
wifi: iwlwifi: mvm: properly set 6 GHz channel direct probe option
wifi: iwlwifi: mvm: handle BA session teardown in RF-kill
wifi: iwlwifi: mvm: Handle BIGTK cipher in kek_kck cmd
wifi: iwlwifi: mvm: remove stale STA link data during restart
wifi: iwlwifi: dbg_ini: move iwl_dbg_tlv_free outside of debugfs ifdef
wifi: iwlwifi: mvm: set properly mac header
wifi: iwlwifi: mvm: revert gen2 TX A-MPDU size to 64
wifi: iwlwifi: mvm: d3: fix WoWLAN command version lookup
wifi: iwlwifi: mvm: fix a crash on 7265
...
====================
Link: https://lore.kernel.org/r/20240603115129.9494CC2BD10@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After fixing four different bugs involving dst_cache
users, it might be worth adding a check about BH being
blocked by dst_cache callers.
DEBUG_NET_WARN_ON_ONCE(!in_softirq());
It is not fatal, if we missed valid case where no
BH deadlock is to be feared, we might change this.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240531132636.2637995-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As explained in commit 1378817486d6 ("tipc: block BH
before using dst_cache"), net/core/dst_cache.c
helpers need to be called with BH disabled.
ila_output() is called from lwtunnel_output()
possibly from process context, and under rcu_read_lock().
We might be interrupted by a softirq, re-enter ila_output()
and corrupt dst_cache data structures.
Fix the race by using local_bh_disable().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240531132636.2637995-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As explained in commit 1378817486d6 ("tipc: block BH
before using dst_cache"), net/core/dst_cache.c
helpers need to be called with BH disabled.
Disabling preemption in seg6_output_core() is not good enough,
because seg6_output_core() is called from process context,
lwtunnel_output() only uses rcu_read_lock().
We might be interrupted by a softirq, re-enter seg6_output_core()
and corrupt dst_cache data structures.
Fix the race by using local_bh_disable() instead of
preempt_disable().
Apply a similar change in seg6_input_core().
Fixes: fa79581ea66c ("ipv6: sr: fix several BUGs when preemption is enabled")
Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Lebrun <dlebrun@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240531132636.2637995-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As explained in commit 1378817486d6 ("tipc: block BH
before using dst_cache"), net/core/dst_cache.c
helpers need to be called with BH disabled.
Disabling preemption in rpl_output() is not good enough,
because rpl_output() is called from process context,
lwtunnel_output() only uses rcu_read_lock().
We might be interrupted by a softirq, re-enter rpl_output()
and corrupt dst_cache data structures.
Fix the race by using local_bh_disable() instead of
preempt_disable().
Apply a similar change in rpl_input().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Aring <aahringo@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240531132636.2637995-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As explained in commit 1378817486d6 ("tipc: block BH
before using dst_cache"), net/core/dst_cache.c
helpers need to be called with BH disabled.
Disabling preemption in ioam6_output() is not good enough,
because ioam6_output() is called from process context,
lwtunnel_output() only uses rcu_read_lock().
We might be interrupted by a softirq, re-enter ioam6_output()
and corrupt dst_cache data structures.
Fix the race by using local_bh_disable() instead of
preempt_disable().
Fixes: 8cb3bf8bff3c ("ipv6: ioam: Add support for the ip6ip6 encapsulation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Justin Iurman <justin.iurman@uliege.be>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240531132636.2637995-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The in_token->pages[] array is not NULL terminated. This results in
the following KASAN splat:
KASAN: maybe wild-memory-access in range [0x04a2013400000008-0x04a201340000000f]
Fixes: bafa6b4d95d9 ("SUNRPC: Fix gss_free_in_token_pages()")
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
SMCR_RMBE_SIZES is the upper boundary of SMC-R's snd_buf and rcv_buf.
The maximum bytes of snd_buf and rcv_buf can be calculated by 2^SMCR_
RMBE_SIZES * 16KB. SMCR_RMBE_SIZES = 5 means the upper boundary is 512KB.
TCP's snd_buf and rcv_buf max size is configured by net.ipv4.tcp_w/rmem[2]
whose default value is 4MB or 6MB, is much larger than SMC-R's upper
boundary.
In some scenarios, such as Recommendation System, the communication
pattern is mainly large size send/recv, where the size of snd_buf and
rcv_buf greatly affects performance. Due to the upper boundary
disadvantage, SMC-R performs poor than TCP in those scenarios. So it
is time to enlarge the upper boundary size of SMC-R's snd_buf and rcv_buf,
so that the SMC-R's snd_buf and rcv_buf can be configured to larger size
for performance gain in such scenarios.
The SMC-R rcv_buf's size will be transferred to peer by the field
rmbe_size in clc accept and confirm message. The length of the field
rmbe_size is four bits, which means the maximum value of SMCR_RMBE_SIZES
is 15. In case of frequently adjusting the value of SMCR_RMBE_SIZES
in different scenarios, set the value of SMCR_RMBE_SIZES to the maximum
value 15, which means the upper boundary of SMC-R's snd_buf and rcv_buf
is 512MB. As the real memory usage is determined by the value of
net.smc.w/rmem, not by the upper boundary, set the value of SMCR_RMBE_SIZES
to the maximum value has no side affects.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Co-developed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
CONFIG_ARCH_NO_SG_CHAIN is defined
SG_MAX_SINGLE_ALLOC is used to limit maximum number of entries that
will be allocated in one piece of scatterlist. When the entries of
scatterlist exceeds SG_MAX_SINGLE_ALLOC, sg chain will be used. From
commit 7c703e54cc71 ("arch: switch the default on ARCH_HAS_SG_CHAIN"),
we can know that the macro CONFIG_ARCH_NO_SG_CHAIN is used to identify
whether sg chain is supported. So, SMC-R's rmb buffer should be limited
by SG_MAX_SINGLE_ALLOC only when the macro CONFIG_ARCH_NO_SG_CHAIN is
defined.
Fixes: a3fe3d01bd0d ("net/smc: introduce sg-logic for RMBs")
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Co-developed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When splice() support was added in commit 2b514574f7e8 ("net:
af_unix: implement splice for stream af_unix sockets"), we had
to release unix_sk(sk)->readlock (current iolock) before calling
splice_to_pipe().
Due to the unlock, commit 73ed5d25dce0 ("af-unix: fix use-after-free
with concurrent readers while splicing") added a safeguard in
unix_stream_read_generic(); we had to bump the skb refcount before
calling ->recv_actor() and then check if the skb was consumed by a
concurrent reader.
However, the pipe side locking was refactored, and since commit
25869262ef7a ("skb_splice_bits(): get rid of callback"), we can
call splice_to_pipe() without releasing unix_sk(sk)->iolock.
Now, the skb is always alive after the ->recv_actor() callback,
so let's remove the unnecessary drop_skb logic.
This is mostly the revert of commit 73ed5d25dce0 ("af-unix: fix
use-after-free with concurrent readers while splicing").
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240529144648.68591-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
TCP_CLOSE may or may not have current/rnext keys and should not be
considered "established". The fast-path for TCP_CLOSE is
SKB_DROP_REASON_TCP_CLOSE. This is what tcp_rcv_state_process() does
anyways. Add an early drop path to not spend any time verifying
segment signatures for sockets in TCP_CLOSE state.
Cc: stable@vger.kernel.org # v6.7
Fixes: 0a3a809089eb ("net/tcp: Verify inbound TCP-AO signed segments")
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://lore.kernel.org/r/20240529-tcp_ao-sk_state-v1-1-d69b5d323c52@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently NCSI driver will send several NCSI commands back to back without
waiting the response of previous NCSI command or timeout in some state
when NIC have multi channel. This operation against the single thread
manner defined by NCSI SPEC(section 6.3.2.3 in DSP0222_1.1.1)
According to NCSI SPEC(section 6.2.13.1 in DSP0222_1.1.1), we should probe
one channel at a time by sending NCSI commands (Clear initial state, Get
version ID, Get capabilities...), than repeat this steps until the max
number of channels which we got from NCSI command (Get capabilities) has
been probed.
Fixes: e6f44ed6d04d ("net/ncsi: Package and channel management")
Signed-off-by: DelphineCCChiu <delphine_cc_chiu@wiwynn.com>
Link: https://lore.kernel.org/r/20240529065856.825241-1-delphine_cc_chiu@wiwynn.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The following sysctl are global and can't be read from a netns:
net.core.rmem_default
net.core.rmem_max
net.core.wmem_default
net.core.wmem_max
Make the following sysctl parameters available readonly from within a
network namespace, allowing a container to read them.
Signed-off-by: Matteo Croce <teknoraver@meta.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://lore.kernel.org/r/20240530232722.45255-2-technoboy85@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
John Sperbeck reported that if we turn off CONFIG_RFS_ACCEL, the 'head'
is not defined, which will trigger compile error. So I move the 'head'
out of the CONFIG_RFS_ACCEL scope.
Fixes: 84b6823cd96b ("net: rps: protect last_qtail with rps_input_queue_tail_save() helper")
Reported-by: John Sperbeck <jsperbeck@google.com>
Closes: https://lore.kernel.org/all/20240529203421.2432481-1-jsperbeck@google.com/
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240530032717.57787-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The object "ax25_dev" is managed by reference counting. Thus it should
not be directly released by kfree(), replace with ax25_dev_put().
Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs")
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/20240530051733.11416-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When releasing a socket in ax25_release(), we call netdev_put() to
decrease the refcount on the associated ax.25 device. However, the
execution path for accepting an incoming connection never calls
netdev_hold(). This imbalance leads to refcount errors, and ultimately
to kernel crashes.
A typical call trace for the above situation will start with one of the
following errors:
refcount_t: decrement hit 0; leaking memory.
refcount_t: underflow; use-after-free.
And will then have a trace like:
Call Trace:
<TASK>
? show_regs+0x64/0x70
? __warn+0x83/0x120
? refcount_warn_saturate+0xb2/0x100
? report_bug+0x158/0x190
? prb_read_valid+0x20/0x30
? handle_bug+0x3e/0x70
? exc_invalid_op+0x1c/0x70
? asm_exc_invalid_op+0x1f/0x30
? refcount_warn_saturate+0xb2/0x100
? refcount_warn_saturate+0xb2/0x100
ax25_release+0x2ad/0x360
__sock_release+0x35/0xa0
sock_close+0x19/0x20
[...]
On reboot (or any attempt to remove the interface), the kernel gets
stuck in an infinite loop:
unregister_netdevice: waiting for ax0 to become free. Usage count = 0
This patch corrects these issues by ensuring that we call netdev_hold()
and ax25_dev_hold() for new connections in ax25_accept(). This makes the
logic leading to ax25_accept() match the logic for ax25_bind(): in both
cases we increment the refcount, which is ultimately decremented in
ax25_release().
Fixes: 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()")
Signed-off-by: Lars Kellogg-Stedman <lars@oddbit.com>
Tested-by: Duoming Zhou <duoming@zju.edu.cn>
Tested-by: Dan Cross <crossd@gmail.com>
Tested-by: Chris Maness <christopher.maness@gmail.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/20240529210242.3346844-2-lars@oddbit.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently there are no strict checks while setting SO_TXTIME
from userspace. With the recent development in skb->tstamp_type
clockid with unsupported clocks results in warn_on_once, which causes
unnecessary aborts in some systems which enables panic on warns.
Add validation in setsockopt to support only CLOCK_REALTIME,
CLOCK_MONOTONIC and CLOCK_TAI to be set from userspace.
Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/
Link: https://lore.kernel.org/lkml/6bdba7b6-fd22-4ea5-a356-12268674def1@quicinc.com/
Fixes: 1693c5db6ab8 ("net: Add additional bit to support clockid_t timestamp type")
Reported-by: syzbot+d7b227731ec589e7f4f0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d7b227731ec589e7f4f0
Reported-by: syzbot+30a35a2e9c5067cc43fa@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=30a35a2e9c5067cc43fa
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240529183130.1717083-1-quic_abchauha@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Statistic values should be set to ETHTOOL_STAT_NOT_SET even if the
device doesn't support statistics. Otherwise zeros will be returned as
if they are proper values:
host# ethtool -I -T lo
Time stamping parameters for lo:
Capabilities:
software-transmit
software-receive
software-system-clock
PTP Hardware Clock: none
Hardware Transmit Timestamp Modes: none
Hardware Receive Filter Modes: none
Statistics:
tx_pkts: 0
tx_lost: 0
tx_err: 0
Fixes: 0e9c127729be ("ethtool: add interface to read Tx hardware timestamping statistics")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Link: https://lore.kernel.org/r/20240530040814.1014446-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/ti/icssg/icssg_classifier.c
abd5576b9c57 ("net: ti: icssg-prueth: Add support for ICSSG switch firmware")
56a5cf538c3f ("net: ti: icssg-prueth: Fix start counter for ft1 filter")
https://lore.kernel.org/all/20240531123822.3bb7eadf@canb.auug.org.au/
No other adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are too many ifdef in IPv6 segment routing code that may cause logic
problems. like commit 160e9d275218 ("ipv6: sr: fix invalid unregister error
path"). To avoid this, the init functions are redefined for both cases. The
code could be more clear after all fidefs are removed.
Suggested-by: Simon Horman <horms@kernel.org>
Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240529040908.3472952-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pass an additional pointer of bpf_struct_ops_link to callback function reg,
unreg, and update provided by subsystems defined in bpf_struct_ops. A
bpf_struct_ops_map can be registered for multiple links. Passing a pointer
of bpf_struct_ops_link helps subsystems to distinguish them.
This pointer will be used in the later patches to let the subsystem
initiate a detachment on a link that was registered to it previously.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240530065946.979330-2-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
don't return 0 if snd_buf->len really greater than snd_buf->buflen
Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Fixes: 0c77668ddb4e ("SUNRPC: Introduce trace points in rpc_auth_gss.ko")
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf and netfilter.
Current release - regressions:
- gro: initialize network_offset in network layer
- tcp: reduce accepted window in NEW_SYN_RECV state
Current release - new code bugs:
- eth: mlx5e: do not use ptp structure for tx ts stats when not
initialized
- eth: ice: check for unregistering correct number of devlink params
Previous releases - regressions:
- bpf: Allow delete from sockmap/sockhash only if update is allowed
- sched: taprio: extend minimum interval restriction to entire cycle
too
- netfilter: ipset: add list flush to cancel_gc
- ipv4: fix address dump when IPv4 is disabled on an interface
- sock_map: avoid race between sock_map_close and sk_psock_put
- eth: mlx5: use mlx5_ipsec_rx_status_destroy to correctly delete
status rules
Previous releases - always broken:
- core: fix __dst_negative_advice() race
- bpf:
- fix multi-uprobe PID filtering logic
- fix pkt_type override upon netkit pass verdict
- netfilter: tproxy: bail out if IP has been disabled on the device
- af_unix: annotate data-race around unix_sk(sk)->addr
- eth: mlx5e: fix UDP GSO for encapsulated packets
- eth: idpf: don't enable NAPI and interrupts prior to allocating Rx
buffers
- eth: i40e: fully suspend and resume IO operations in EEH case
- eth: octeontx2-pf: free send queue buffers incase of leaf to inner
- eth: ipvlan: dont Use skb->sk in ipvlan_process_v{4,6}_outbound"
* tag 'net-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
netdev: add qstat for csum complete
ipvlan: Dont Use skb->sk in ipvlan_process_v{4,6}_outbound
net: ena: Fix redundant device NUMA node override
ice: check for unregistering correct number of devlink params
ice: fix 200G PHY types to link speed mapping
i40e: Fully suspend and resume IO operations in EEH case
i40e: factoring out i40e_suspend/i40e_resume
e1000e: move force SMBUS near the end of enable_ulp function
net: dsa: microchip: fix RGMII error in KSZ DSA driver
ipv4: correctly iterate over the target netns in inet_dump_ifaddr()
net: fix __dst_negative_advice() race
nfc/nci: Add the inconsistency check between the input data length and count
MAINTAINERS: dwmac: starfive: update Maintainer
net/sched: taprio: extend minimum interval restriction to entire cycle too
net/sched: taprio: make q->picos_per_byte available to fill_sched_entry()
netfilter: nft_fib: allow from forward/input without iif selector
netfilter: tproxy: bail out if IP has been disabled on the device
netfilter: nft_payload: skbuff vlan metadata mangle support
net: ti: icssg-prueth: Fix start counter for ft1 filter
sock_map: avoid race between sock_map_close and sk_psock_put
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
Patch #1 syzbot reports that nf_reinject() could be called without
rcu_read_lock() when flushing pending packets at nfnetlink
queue removal, from Eric Dumazet.
Patch #2 flushes ipset list:set when canceling garbage collection to
reference to other lists to fix a race, from Jozsef Kadlecsik.
Patch #3 restores q-in-q matching with nft_payload by reverting
f6ae9f120dad ("netfilter: nft_payload: add C-VLAN support").
Patch #4 fixes vlan mangling in skbuff when vlan offload is present
in skbuff, without this patch nft_payload corrupts packets
in this case.
Patch #5 fixes possible nul-deref in tproxy no IP address is found in
netdevice, reported by syzbot and patch from Florian Westphal.
Patch #6 removes a superfluous restriction which prevents loose fib
lookups from input and forward hooks, from Eric Garver.
My assessment is that patches #1, #2 and #5 address possible kernel
crash, anything else in this batch fixes broken features.
netfilter pull request 24-05-29
* tag 'nf-24-05-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_fib: allow from forward/input without iif selector
netfilter: tproxy: bail out if IP has been disabled on the device
netfilter: nft_payload: skbuff vlan metadata mangle support
netfilter: nft_payload: restore vlan q-in-q match support
netfilter: ipset: Add list flush to cancel_gc
netfilter: nfnetlink_queue: acquire rcu_read_lock() in instance_destroy_rcu()
====================
Link: https://lore.kernel.org/r/20240528225519.1155786-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
A recent change to inet_dump_ifaddr had the function incorrectly iterate
over net rather than tgt_net, resulting in the data coming for the
incorrect network namespace.
Fixes: cdb2f80f1c10 ("inet: use xa_array iterator to implement inet_dump_ifaddr()")
Reported-by: Stéphane Graber <stgraber@stgraber.org>
Closes: https://github.com/lxc/incus/issues/892
Bisected-by: Stéphane Graber <stgraber@stgraber.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Tested-by: Stéphane Graber <stgraber@stgraber.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240528203030.10839-1-aleksandr.mikhalitsyn@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
No DSA driver makes use of the mac_prepare()/mac_finish() shimmed
operations anymore, so we can remove these.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/E1sByNx-00ELW1-Vp@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
__dst_negative_advice() does not enforce proper RCU rules when
sk->dst_cache must be cleared, leading to possible UAF.
RCU rules are that we must first clear sk->sk_dst_cache,
then call dst_release(old_dst).
Note that sk_dst_reset(sk) is implementing this protocol correctly,
while __dst_negative_advice() uses the wrong order.
Given that ip6_negative_advice() has special logic
against RTF_CACHE, this means each of the three ->negative_advice()
existing methods must perform the sk_dst_reset() themselves.
Note the check against NULL dst is centralized in
__dst_negative_advice(), there is no need to duplicate
it in various callbacks.
Many thanks to Clement Lecigne for tracking this issue.
This old bug became visible after the blamed commit, using UDP sockets.
Fixes: a87cb3e48ee8 ("net: Facility to report route quality of connected sockets")
Reported-by: Clement Lecigne <clecigne@google.com>
Diagnosed-by: Clement Lecigne <clecigne@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240528114353.1794151-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These functions have races when they:
1) Write sk->sk_err
2) call sk_error_report(sk)
3) call tcp_done(sk)
As described in prior patches in this series:
An smp_wmb() is missing.
We should call tcp_done() before sk_error_report(sk)
to have consistent tcp_poll() results on SMP hosts.
Use tcp_done_with_error() where we centralized the
correct sequence.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20240528125253.1966136-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tcp_abort() has the same issue than the one fixed in the prior patch
in tcp_write_err().
In order to get consistent results from tcp_poll(), we must call
sk_error_report() after tcp_done().
We can use tcp_done_with_error() to centralize this logic.
Fixes: c1e64e298b8c ("net: diag: Support destroying TCP sockets.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20240528125253.1966136-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I noticed flakes in a packetdrill test, expecting an epoll_wait()
to return EPOLLERR | EPOLLHUP on a failed connect() attempt,
after multiple SYN retransmits. It sometimes return EPOLLERR only.
The issue is that tcp_write_err():
1) writes an error in sk->sk_err,
2) calls sk_error_report(),
3) then calls tcp_done().
tcp_done() is writing SHUTDOWN_MASK into sk->sk_shutdown,
among other things.
Problem is that the awaken user thread (from 2) sk_error_report())
might call tcp_poll() before tcp_done() has written sk->sk_shutdown.
tcp_poll() only sees a non zero sk->sk_err and returns EPOLLERR.
This patch fixes the issue by making sure to call sk_error_report()
after tcp_done().
tcp_write_err() also lacks an smp_wmb().
We can reuse tcp_done_with_error() to factor out the details,
as Neal suggested.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20240528125253.1966136-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tcp_reset() ends with a sequence that is carefuly ordered.
We need to fix [e]poll bugs in the following patches,
it makes sense to use a common helper.
Suggested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20240528125253.1966136-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull 9p fixes from Dominique Martinet:
"Two fixes headed to stable trees:
- a trace event was dumping uninitialized values
- a missing lock that was thought to have exclusive access, and it
turned out not to"
* tag '9p-for-6.10-rc2' of https://github.com/martinetd/linux:
9p: add missing locking around taking dentry fid list
net/9p: fix uninit-value in p9_client_rpc()
|
|
When testing the previous patch with CONFIG_UBSAN_BOUNDS, I've
noticed the following:
UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:372:4
index 0 is out of range for type 'struct ieee80211_channel *[]'
CPU: 0 PID: 1435 Comm: wpa_supplicant Not tainted 6.9.0+ #1
Hardware name: LENOVO 20UN005QRT/20UN005QRT <...BIOS details...>
Call Trace:
<TASK>
dump_stack_lvl+0x2d/0x90
__ubsan_handle_out_of_bounds+0xe7/0x140
? timerqueue_add+0x98/0xb0
ieee80211_prep_hw_scan+0x2db/0x480 [mac80211]
? __kmalloc+0xe1/0x470
__ieee80211_start_scan+0x541/0x760 [mac80211]
rdev_scan+0x1f/0xe0 [cfg80211]
nl80211_trigger_scan+0x9b6/0xae0 [cfg80211]
...<the rest is not too useful...>
Since '__ieee80211_start_scan()' leaves 'hw_scan_req->req.n_channels'
uninitialized, actual boundaries of 'hw_scan_req->req.channels' can't
be checked in 'ieee80211_prep_hw_scan()'. Although an initialization
of 'hw_scan_req->req.n_channels' introduces some confusion around
allocated vs. used VLA members, this shouldn't be a problem since
everything is correctly adjusted soon in 'ieee80211_prep_hw_scan()'.
Cleanup 'kmalloc()' math in '__ieee80211_start_scan()' by using the
convenient 'struct_size()' as well.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Link: https://msgid.link/20240517153332.18271-2-dmantipov@yandex.ru
[improve (imho) indentation a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Currently, the way of parsing Spatial Reuse Parameter Set element is
incorrect and some members of struct ieee80211_he_obss_pd are not assigned.
To address this issue, it must be parsed in the order of the elements of
Spatial Reuse Parameter Set defined in the IEEE Std 802.11ax specification.
The diagram of the Spatial Reuse Parameter Set element (IEEE Std 802.11ax
-2021-9.4.2.252).
-------------------------------------------------------------------------
| | | | |Non-SRG| SRG | SRG | SRG | SRG |
|Element|Length| Element | SR |OBSS PD|OBSS PD|OBSS PD| BSS |Partial|
| ID | | ID |Control| Max | Min | Max |Color | BSSID |
| | |Extension| | Offset| Offset|Offset |Bitmap|Bitmap |
-------------------------------------------------------------------------
Fixes: 1ced169cc1c2 ("mac80211: allow setting spatial reuse parameters from bss_conf")
Signed-off-by: Lingbo Kong <quic_lingbok@quicinc.com>
Link: https://msgid.link/20240516021854.5682-3-quic_lingbok@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Currently, the way to check the size of Spatial Reuse IE data in the
ieee80211_parse_extension_element() is incorrect.
This is because the len variable in the ieee80211_parse_extension_element()
function is equal to the size of Spatial Reuse IE data minus one and the
value of returned by the ieee80211_he_spr_size() function is equal to
the length of Spatial Reuse IE data. So the result of the
len >= ieee80211_he_spr_size(data) statement always false.
To address this issue and make it consistent with the logic used elsewhere
with ieee80211_he_oper_size(), change the
"len >= ieee80211_he_spr_size(data)" to
“len >= ieee80211_he_spr_size(data) - 1”.
Fixes: 9d0480a7c05b ("wifi: mac80211: move element parsing to a new file")
Signed-off-by: Lingbo Kong <quic_lingbok@quicinc.com>
Link: https://msgid.link/20240516021854.5682-2-quic_lingbok@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Original changes[1] posted is having proper changes. However, at the same
time, there was chandef puncturing changes which had a conflict with this.
While applying, two errors crept in -
a) Whitespace error.
b) Link ID being passed to channel switch started notifier function is
0. However proper link ID is present in the function.
Fix these now.
[1] https://lore.kernel.org/all/20240130140918.1172387-5-quic_adisi@quicinc.com/
Fixes: 1a96bb4e8a79 ("wifi: mac80211: start and finalize channel switch on link basis")
Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Link: https://msgid.link/20240509032555.263933-1-quic_adisi@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|