Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Remove usermode driver (UMD) framework (Thomas Weißschuh)
- Introduce Strongly Connected Component (SCC) in the verifier to
detect loops and refine register liveness (Eduard Zingerman)
- Allow 'void *' cast using bpf_rdonly_cast() and corresponding
'__arg_untrusted' for global function parameters (Eduard Zingerman)
- Improve precision for BPF_ADD and BPF_SUB operations in the verifier
(Harishankar Vishwanathan)
- Teach the verifier that constant pointer to a map cannot be NULL
(Ihor Solodrai)
- Introduce BPF streams for error reporting of various conditions
detected by BPF runtime (Kumar Kartikeya Dwivedi)
- Teach the verifier to insert runtime speculation barrier (lfence on
x86) to mitigate speculative execution instead of rejecting the
programs (Luis Gerhorst)
- Various improvements for 'veristat' (Mykyta Yatsenko)
- For CONFIG_DEBUG_KERNEL config warn on internal verifier errors to
improve bug detection by syzbot (Paul Chaignon)
- Support BPF private stack on arm64 (Puranjay Mohan)
- Introduce bpf_cgroup_read_xattr() kfunc to read xattr of cgroup's
node (Song Liu)
- Introduce kfuncs for read-only string opreations (Viktor Malik)
- Implement show_fdinfo() for bpf_links (Tao Chen)
- Reduce verifier's stack consumption (Yonghong Song)
- Implement mprog API for cgroup-bpf programs (Yonghong Song)
* tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (192 commits)
selftests/bpf: Migrate fexit_noreturns case into tracing_failure test suite
selftests/bpf: Add selftest for attaching tracing programs to functions in deny list
bpf: Add log for attaching tracing programs to functions in deny list
bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions
bpf: Fix various typos in verifier.c comments
bpf: Add third round of bounds deduction
selftests/bpf: Test invariants on JSLT crossing sign
selftests/bpf: Test cross-sign 64bits range refinement
selftests/bpf: Update reg_bound range refinement logic
bpf: Improve bounds when s64 crosses sign boundary
bpf: Simplify bounds refinement from s32
selftests/bpf: Enable private stack tests for arm64
bpf, arm64: JIT support for private stack
bpf: Move bpf_jit_get_prog_name() to core.c
bpf, arm64: Fix fp initialization for exception boundary
umd: Remove usermode driver framework
bpf/preload: Don't select USERMODE_DRIVER
selftests/bpf: Fix test dynptr/test_dynptr_memset_xdp_chunks failure
selftests/bpf: Fix test dynptr/test_dynptr_copy_xdp failure
selftests/bpf: Increase xdp data size for arm64 64K page size
...
|
|
Both taprio and mqprio have code to validate respective entry index
attributes. The validation is indented to ensure that the attribute is
present, and that it's value is in range, and that each value is only
used once.
The purpose of this patch is to align the implementation of taprio with
that of mqprio as there seems to be no good reason for them to differ.
For one thing, this way, bugs will be present in both or neither.
As a follow-up some consideration could be given to a common function
used by both sch.
No functional change intended.
Except of tdc run: the results of the taprio tests
# ok 81 ba39 - Add taprio Qdisc to multi-queue device (8 queues)
# ok 82 9462 - Add taprio Qdisc with multiple sched-entry
# ok 83 8d92 - Add taprio Qdisc with txtime-delay
# ok 84 d092 - Delete taprio Qdisc with valid handle
# ok 85 8471 - Show taprio class
# ok 86 0a85 - Add taprio Qdisc to single-queue device
# ok 87 6f62 - Add taprio Qdisc with too short interval
# ok 88 831f - Add taprio Qdisc with too short cycle-time
# ok 89 3e1e - Add taprio Qdisc with an invalid cycle-time
# ok 90 39b4 - Reject grafting taprio as child qdisc of software taprio
# ok 91 e8a1 - Reject grafting taprio as child qdisc of offloaded taprio
# ok 92 a7bf - Graft cbs as child of software taprio
# ok 93 6a83 - Graft cbs as child of offloaded taprio
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Cc: Maher Azzouzi <maherazz04@gmail.com>
Link: https://lore.kernel.org/netdev/20250723125521.GA2459@horms.kernel.org/
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://patch.msgid.link/20250725-taprio-idx-parse-v1-1-b582fffcde37@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, packets dropped by pfifo_fast due to queue overflow are
marked with a generic SKB_DROP_REASON_QDISC_DROP in __dev_xmit_skb().
This patch adds explicit drop reason SKB_DROP_REASON_QDISC_OVERLIMIT
for queue-full cases, providing better distinction from other qdisc drops.
Signed-off-by: Fan Yu <fan.yu9@zte.com.cn>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://patch.msgid.link/20250724212837119BP9HOs0ibXDRWgsXMMir7@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.16-rc8).
Conflicts:
drivers/net/ethernet/microsoft/mana/gdma_main.c
9669ddda18fb ("net: mana: Fix warnings for missing export.h header inclusion")
755391121038 ("net: mana: Allocate MSI-X vectors dynamically")
https://lore.kernel.org/20250711130752.23023d98@canb.auug.org.au
Adjacent changes:
drivers/net/ethernet/ti/icssg/icssg_prueth.h
6e86fb73de0f ("net: ti: icssg-prueth: Fix buffer allocation for ICSSG")
ffe8a4909176 ("net: ti: icssg-prueth: Read firmware-names from device tree")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
DualPI2 provides L4S-type low latency & loss to traffic that uses a
scalable congestion controller (e.g. TCP-Prague, DCTCP) without
degrading the performance of 'classic' traffic (e.g. Reno,
Cubic etc.). It is to be the reference implementation of IETF RFC9332
DualQ Coupled AQM (https://datatracker.ietf.org/doc/html/rfc9332).
Note that creating two independent queues cannot meet the goal of
DualPI2 mentioned in RFC9332: "...to preserve fairness between
ECN-capable and non-ECN-capable traffic." Further, it could even
lead to starvation of Classic traffic, which is also inconsistent
with the requirements in RFC9332: "...although priority MUST be
bounded in order not to starve Classic traffic." DualPI2 is
designed to maintain approximate per-flow fairness on L-queue and
C-queue by forming a single qdisc using the coupling factor and
scheduler between two queues.
The qdisc provides two queues called low latency and classic. It
classifies packets based on the ECN field in the IP headers. By
default it directs non-ECN and ECT(0) into the classic queue and
ECT(1) and CE into the low latency queue, as per the IETF spec.
Each queue runs its own AQM:
* The classic AQM is called PI2, which is similar to the PIE AQM but
more responsive and simpler. Classic traffic requires a decent
target queue (default 15ms for Internet deployment) to fully
utilize the link and to avoid high drop rates.
* The low latency AQM is, by default, a very shallow ECN marking
threshold (1ms) similar to that used for DCTCP.
The DualQ isolates the low queuing delay of the Low Latency queue
from the larger delay of the 'Classic' queue. However, from a
bandwidth perspective, flows in either queue will share out the link
capacity as if there was just a single queue. This bandwidth pooling
effect is achieved by coupling together the drop and ECN-marking
probabilities of the two AQMs.
The PI2 AQM has two main parameters in addition to its target delay.
The integral gain factor alpha is used to slowly correct any persistent
standing queue error from the target delay, while the proportional gain
factor beta is used to quickly compensate for queue changes (growth or
shrinkage). Either alpha and beta are given as a parameter, or they can
be calculated by tc from alternative typical and maximum RTT parameters.
Internally, the output of a linear Proportional Integral (PI)
controller is used for both queues. This output is squared to
calculate the drop or ECN-marking probability of the classic queue.
This counterbalances the square-root rate equation of Reno/Cubic,
which is the trick that balances flow rates across the queues. For
the ECN-marking probability of the low latency queue, the output of
the base AQM is multiplied by a coupling factor. This determines the
balance between the flow rates in each queue. The default setting
makes the flow rates roughly equal, which should be generally
applicable.
If DUALPI2 AQM has detected overload (due to excessive non-responsive
traffic in either queue), it will switch to signaling congestion
solely using drop, irrespective of the ECN field. Alternatively, it
can be configured to limit the drop probability and let the queue
grow and eventually overflow (like tail-drop).
GSO splitting in DUALPI2 is configurable from userspace while the
default behavior is to split gso. When running DUALPI2 at unshaped
10gigE with 4 download streams test, splitting gso apart results in
halving the latency with no loss in throughput:
Summary of tcp_4down run 'no_split_gso':
avg median # data pts
Ping (ms) ICMP : 0.53 0.30 ms 350
TCP download avg : 2326.86 N/A Mbits/s 350
TCP download sum : 9307.42 N/A Mbits/s 350
TCP download::1 : 2672.99 2568.73 Mbits/s 350
TCP download::2 : 2586.96 2570.51 Mbits/s 350
TCP download::3 : 1786.26 1798.82 Mbits/s 350
TCP download::4 : 2261.21 2309.49 Mbits/s 350
Summart of tcp_4down run 'split_gso':
avg median # data pts
Ping (ms) ICMP : 0.22 0.23 ms 350
TCP download avg : 2335.02 N/A Mbits/s 350
TCP download sum : 9340.09 N/A Mbits/s 350
TCP download::1 : 2335.30 2334.22 Mbits/s 350
TCP download::2 : 2334.72 2334.20 Mbits/s 350
TCP download::3 : 2335.28 2334.58 Mbits/s 350
TCP download::4 : 2334.79 2334.39 Mbits/s 350
A similar result is observed when running DUALPI2 at unshaped 1gigE
with 1 download stream test:
Summary of tcp_1down run 'no_split_gso':
avg median # data pts
Ping (ms) ICMP : 1.13 1.25 ms 350
TCP download : 941.41 941.46 Mbits/s 350
Summart of tcp_1down run 'split_gso':
avg median # data pts
Ping (ms) ICMP : 0.51 0.55 ms 350
TCP download : 941.41 941.45 Mbits/s 350
Additional details can be found in the draft:
https://datatracker.ietf.org/doc/html/rfc9332
Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Co-developed-by: Olga Albisser <olga@albisser.org>
Signed-off-by: Olga Albisser <olga@albisser.org>
Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Co-developed-by: Henrik Steen <henrist@henrist.net>
Signed-off-by: Henrik Steen <henrist@henrist.net>
Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: Bob Briscoe <research@bobbriscoe.net>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Acked-by: Dave Taht <dave.taht@gmail.com>
Link: https://patch.msgid.link/20250722095915.24485-4-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The configuration and statistics dump of the DualPI2 Qdisc provides
information related to both queues, such as packet numbers and queuing
delays in the L-queue and C-queue, as well as general information such as
probability value, WRR credits, memory usage, packet marking counters, max
queue size, etc.
The following patch includes enqueue/dequeue for DualPI2.
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Link: https://patch.msgid.link/20250722095915.24485-3-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
DualPI2 is the reference implementation of IETF RFC9332 DualQ Coupled
AQM (https://datatracker.ietf.org/doc/html/rfc9332) providing two
queues called low latency (L-queue) and classic (C-queue). By default,
it enqueues non-ECN and ECT(0) packets into the C-queue and ECT(1) and
CE packets into the low latency queue (L-queue), as per IETF RFC9332 spec.
This patch defines the dualpi2 Qdisc structure and parsing, and the
following two patches include dumping and enqueue/dequeue for the DualPI2.
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Link: https://patch.msgid.link/20250722095915.24485-2-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
qfq_delete_class
might_sleep could be trigger in the atomic context in qfq_delete_class.
qfq_destroy_class was moved into atomic context locked
by sch_tree_lock to avoid a race condition bug on
qfq_aggregate. However, might_sleep could be triggered by
qfq_destroy_class, which introduced sleeping in atomic context (path:
qfq_destroy_class->qdisc_put->__qdisc_destroy->lockdep_unregister_key
->might_sleep).
Considering the race is on the qfq_aggregate objects, keeping
qfq_rm_from_agg in the lock but moving the left part out can solve
this issue.
Fixes: 5e28d5a3f774 ("net/sched: sch_qfq: Fix race condition on qfq_aggregate")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Link: https://patch.msgid.link/4a04e0cc-a64b-44e7-9213-2880ed641d77@sabinyo.mountain
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/20250717230128.159766-1-xmei5@asu.edu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Cross-merge BPF and other fixes after downstream PR.
No conflicts.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.16-rc7).
Conflicts:
Documentation/netlink/specs/ovpn.yaml
880d43ca9aa4 ("netlink: specs: clean up spaces in brackets")
af52020fc599 ("ovpn: reject unexpected netlink attributes")
drivers/net/phy/phy_device.c
a44312d58e78 ("net: phy: Don't register LEDs for genphy")
f0f2b992d818 ("net: phy: Don't register LEDs for genphy")
https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org
drivers/net/wireless/intel/iwlwifi/fw/regulatory.c
drivers/net/wireless/intel/iwlwifi/mld/regulatory.c
5fde0fcbd760 ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap")
ea045a0de3b9 ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware")
net/ipv6/mcast.c
ae3264a25a46 ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()")
a8594c956cc9 ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()")
https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
htb_lookup_leaf has a BUG_ON that can trigger with the following:
tc qdisc del dev lo root
tc qdisc add dev lo root handle 1: htb default 1
tc class add dev lo parent 1: classid 1:1 htb rate 64bit
tc qdisc add dev lo parent 1:1 handle 2: netem
tc qdisc add dev lo parent 2:1 handle 3: blackhole
ping -I lo -c1 -W0.001 127.0.0.1
The root cause is the following:
1. htb_dequeue calls htb_dequeue_tree which calls the dequeue handler on
the selected leaf qdisc
2. netem_dequeue calls enqueue on the child qdisc
3. blackhole_enqueue drops the packet and returns a value that is not
just NET_XMIT_SUCCESS
4. Because of this, netem_dequeue calls qdisc_tree_reduce_backlog, and
since qlen is now 0, it calls htb_qlen_notify -> htb_deactivate ->
htb_deactiviate_prios -> htb_remove_class_from_row -> htb_safe_rb_erase
5. As this is the only class in the selected hprio rbtree,
__rb_change_child in __rb_erase_augmented sets the rb_root pointer to
NULL
6. Because blackhole_dequeue returns NULL, netem_dequeue returns NULL,
which causes htb_dequeue_tree to call htb_lookup_leaf with the same
hprio rbtree, and fail the BUG_ON
The function graph for this scenario is shown here:
0) | htb_enqueue() {
0) + 13.635 us | netem_enqueue();
0) 4.719 us | htb_activate_prios();
0) # 2249.199 us | }
0) | htb_dequeue() {
0) 2.355 us | htb_lookup_leaf();
0) | netem_dequeue() {
0) + 11.061 us | blackhole_enqueue();
0) | qdisc_tree_reduce_backlog() {
0) | qdisc_lookup_rcu() {
0) 1.873 us | qdisc_match_from_root();
0) 6.292 us | }
0) 1.894 us | htb_search();
0) | htb_qlen_notify() {
0) 2.655 us | htb_deactivate_prios();
0) 6.933 us | }
0) + 25.227 us | }
0) 1.983 us | blackhole_dequeue();
0) + 86.553 us | }
0) # 2932.761 us | qdisc_warn_nonwc();
0) | htb_lookup_leaf() {
0) | BUG_ON();
------------------------------------------
The full original bug report can be seen here [1].
We can fix this just by returning NULL instead of the BUG_ON,
as htb_dequeue_tree returns NULL when htb_lookup_leaf returns
NULL.
[1] https://lore.kernel.org/netdev/pF5XOOIim0IuEfhI-SOxTgRvNoDwuux7UHKnE_Y5-zVd4wmGvNk2ceHjKb8ORnzw0cGwfmVu42g9dL7XyJLf1NEzaztboTWcm0Ogxuojoeo=@willsroot.io/
Fixes: 512bb43eb542 ("pkt_sched: sch_htb: Optimize WARN_ONs in htb_dequeue_tree() etc.")
Signed-off-by: William Liu <will@willsroot.io>
Signed-off-by: Savino Dicanosa <savy@syst3mfailure.io>
Link: https://patch.msgid.link/20250717022816.221364-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use BTF_ID_LIST_SINGLE(a, b, c) instead of
BTF_ID_LIST(a)
BTF_ID(b, c)
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Link: https://lore.kernel.org/r/20250710055419.70544-1-yangfeng59949@163.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
A race condition can occur when 'agg' is modified in qfq_change_agg
(called during qfq_enqueue) while other threads access it
concurrently. For example, qfq_dump_class may trigger a NULL
dereference, and qfq_delete_class may cause a use-after-free.
This patch addresses the issue by:
1. Moved qfq_destroy_class into the critical section.
2. Added sch_tree_lock protection to qfq_dump_class and
qfq_dump_class_stats.
Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Also storing tcf_action into struct tcf_skbedit_params
makes sure there is no discrepancy in tcf_skbedit_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-12-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_police_params
makes sure there is no discrepancy in tcf_police_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-11-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_pedit_params
makes sure there is no discrepancy in tcf_pedit_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-10-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_nat_params
makes sure there is no discrepancy in tcf_nat_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_mpls_params
makes sure there is no discrepancy in tcf_mpls_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_ctinfo_params
makes sure there is no discrepancy in tcf_ctinfo_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 21c167aa0ba9 ("net/sched: act_ctinfo: use percpu stats")
missed that stats_dscp_set, stats_dscp_error and stats_cpmark_set
might be written (and read) locklessly.
Use atomic64_t for these three fields, I doubt act_ctinfo is used
heavily on big SMP hosts anyway.
Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pedro Tammela <pctammela@mojatatu.com>
Link: https://patch.msgid.link/20250709090204.797558-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_ct_params
makes sure there is no discrepancy in tcf_ct_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_csum_params
makes sure there is no discrepancy in tcf_csum_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also storing tcf_action into struct tcf_connmark_parms
makes sure there is no discrepancy in tcf_connmark_act().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
netem_enqueue's duplication prevention logic breaks when a netem
resides in a qdisc tree with other netems - this can lead to a
soft lockup and OOM loop in netem_dequeue, as seen in [1].
Ensure that a duplicating netem cannot exist in a tree with other
netems.
Previous approaches suggested in discussions in chronological order:
1) Track duplication status or ttl in the sk_buff struct. Considered
too specific a use case to extend such a struct, though this would
be a resilient fix and address other previous and potential future
DOS bugs like the one described in loopy fun [2].
2) Restrict netem_enqueue recursion depth like in act_mirred with a
per cpu variable. However, netem_dequeue can call enqueue on its
child, and the depth restriction could be bypassed if the child is a
netem.
3) Use the same approach as in 2, but add metadata in netem_skb_cb
to handle the netem_dequeue case and track a packet's involvement
in duplication. This is an overly complex approach, and Jamal
notes that the skb cb can be overwritten to circumvent this
safeguard.
4) Prevent the addition of a netem to a qdisc tree if its ancestral
path contains a netem. However, filters and actions can cause a
packet to change paths when re-enqueued to the root from netem
duplication, leading us to the current solution: prevent a
duplicating netem from inhabiting the same tree as other netems.
[1] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D6NMoigR_eIRIG0LOjMc3r10nUUZtArXx4oZBIdUfZQrwjcQhdinnMis_0G7VEk=@willsroot.io/
[2] https://lwn.net/Articles/719297/
Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
Reported-by: William Liu <will@willsroot.io>
Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
Signed-off-by: William Liu <will@willsroot.io>
Signed-off-by: Savino Dicanosa <savy@syst3mfailure.io>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250708164141.875402-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.16-rc6).
No conflicts.
Adjacent changes:
Documentation/devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml
0a12c435a1d6 ("dt-bindings: net: sun8i-emac: Add A100 EMAC compatible")
b3603c0466a8 ("dt-bindings: net: sun8i-emac: Rename A523 EMAC0 to GMAC0")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To prevent a potential crash in agg_dequeue (net/sched/sch_qfq.c)
when cl->qdisc->ops->peek(cl->qdisc) returns NULL, we check the return
value before using it, similar to the existing approach in sch_hfsc.c.
To avoid code duplication, the following changes are made:
1. Changed qdisc_warn_nonwc(include/net/pkt_sched.h) into a static
inline function.
2. Moved qdisc_peek_len from net/sched/sch_hfsc.c to
include/net/pkt_sched.h so that sch_qfq can reuse it.
3. Applied qdisc_peek_len in agg_dequeue to avoid crashing.
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://patch.msgid.link/20250705212143.3982664-1-xmei5@asu.edu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Lion's patch [1] revealed an ancient bug in the qdisc API.
Whenever a user creates/modifies a qdisc specifying as a parent another
qdisc, the qdisc API will, during grafting, detect that the user is
not trying to attach to a class and reject. However grafting is
performed after qdisc_create (and thus the qdiscs' init callback) is
executed. In qdiscs that eventually call qdisc_tree_reduce_backlog
during init or change (such as fq, hhf, choke, etc), an issue
arises. For example, executing the following commands:
sudo tc qdisc add dev lo root handle a: htb default 2
sudo tc qdisc add dev lo parent a: handle beef fq
Qdiscs such as fq, hhf, choke, etc unconditionally invoke
qdisc_tree_reduce_backlog() in their control path init() or change() which
then causes a failure to find the child class; however, that does not stop
the unconditional invocation of the assumed child qdisc's qlen_notify with
a null class. All these qdiscs make the assumption that class is non-null.
The solution is ensure that qdisc_leaf() which looks up the parent
class, and is invoked prior to qdisc_create(), should return failure on
not finding the class.
In this patch, we leverage qdisc_leaf to return ERR_PTRs whenever the
parentid doesn't correspond to a class, so that we can detect it
earlier on and abort before qdisc_create is called.
[1] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/
Fixes: 5e50da01d0ce ("[NET_SCHED]: Fix endless loops (part 2): "simple" qdiscs")
Reported-by: syzbot+d8b58d7b0ad89a678a16@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68663c93.a70a0220.5d25f.0857.GAE@google.com/
Reported-by: syzbot+5eccb463fa89309d8bdc@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68663c94.a70a0220.5d25f.0858.GAE@google.com/
Reported-by: syzbot+1261670bbdefc5485a06@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/686764a5.a00a0220.c7b3.0013.GAE@google.com/
Reported-by: syzbot+15b96fc3aac35468fe77@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/686764a5.a00a0220.c7b3.0014.GAE@google.com/
Reported-by: syzbot+4dadc5aecf80324d5a51@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68679e81.a70a0220.29cf51.0016.GAE@google.com/
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250707210801.372995-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refine qdisc_pkt_len_init to include headers up through
the inner transport header when computing header size
for encapsulations. Also refine net/sched/sch_cake.c
borrowed from qdisc_pkt_len_init().
Signed-off-by: Fengyuan Gong <gfengyuan@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250702160741.1204919-1-gfengyuan@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tc_action_net_exit() got an rtnl exclusion in commit
a159d3c4b829 ("net_sched: acquire RTNL in tc_action_net_exit()")
Since then, commit 16af6067392c ("net: sched: implement reference
counted action release") made this RTNL exclusion obsolete for
most cases.
Only tcf_action_offload_del() might still require it.
Move the rtnl locking into tcf_idrinfo_destroy() when
an offload action is found.
Most netns do not have actions, yet deleting them is adding a lot
of pressure on RTNL, which is for many the most contended mutex
in the kernel.
We are moving to a per-netns 'rtnl', so tc_action_net_exit()
will not be able to grab 'rtnl' a single time for a batch of netns.
Before the patch:
perf probe -a rtnl_lock
perf record -e probe:rtnl_lock -a /bin/bash -c 'unshare -n "/bin/true"; sleep 1'
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.305 MB perf.data (25 samples) ]
After the patch:
perf record -e probe:rtnl_lock -a /bin/bash -c 'unshare -n "/bin/true"; sleep 1'
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.304 MB perf.data (9 samples) ]
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vlad Buslov <vladbu@nvidia.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://patch.msgid.link/20250702071230.1892674-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.16-rc5).
No conflicts.
No adjacent changes.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Certain classful qdiscs may invoke their classes' dequeue handler on an
enqueue operation. This may unexpectedly empty the child qdisc and thus
make an in-flight class passive via qlen_notify(). Most qdiscs do not
expect such behaviour at this point in time and may re-activate the
class eventually anyways which will lead to a use-after-free.
The referenced fix commit attempted to fix this behavior for the HFSC
case by moving the backlog accounting around, though this turned out to
be incomplete since the parent's parent may run into the issue too.
The following reproducer demonstrates this use-after-free:
tc qdisc add dev lo root handle 1: drr
tc filter add dev lo parent 1: basic classid 1:1
tc class add dev lo parent 1: classid 1:1 drr
tc qdisc add dev lo parent 1:1 handle 2: hfsc def 1
tc class add dev lo parent 2: classid 2:1 hfsc rt m1 8 d 1 m2 0
tc qdisc add dev lo parent 2:1 handle 3: netem
tc qdisc add dev lo parent 3:1 handle 4: blackhole
echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
tc class delete dev lo classid 1:1
echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
Since backlog accounting issues leading to a use-after-frees on stale
class pointers is a recurring pattern at this point, this patch takes
a different approach. Instead of trying to fix the accounting, the patch
ensures that qdisc_tree_reduce_backlog always calls qlen_notify when
the child qdisc is empty. This solves the problem because deletion of
qdiscs always involves a call to qdisc_reset() and / or
qdisc_purge_queue() which ultimately resets its qlen to 0 thus causing
the following qdisc_tree_reduce_backlog() to report to the parent. Note
that this may call qlen_notify on passive classes multiple times. This
is not a problem after the recent patch series that made all the
classful qdiscs qlen_notify() handlers idempotent.
Fixes: 3f981138109f ("sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()")
Signed-off-by: Lion Ackermann <nnamrec@gmail.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace the deprecated strncpy() with the two-argument version of
strscpy() as the destination is an array
and buffer should be NUL-terminated.
Signed-off-by: Pranav Tyagi <pranav.tyagi03@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250620103653.6957-1-pranav.tyagi03@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since taprio’s taprio_dev_notifier() isn’t protected by an
RCU read-side critical section, a race with advance_sched()
can lead to a use-after-free.
Adding rcu_read_lock() inside taprio_dev_notifier() prevents this.
Fixes: fed87cc6718a ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations")
Cc: stable@vger.kernel.org
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/aEzIYYxt0is9upYG@v4bel-B760M-AORUS-ELITE-AX
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and wireless.
Current release - regressions:
- af_unix: allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD
Current release - new code bugs:
- eth: airoha: correct enable mask for RX queues 16-31
- veth: prevent NULL pointer dereference in veth_xdp_rcv when peer
disappears under traffic
- ipv6: move fib6_config_validate() to ip6_route_add(), prevent
invalid routes
Previous releases - regressions:
- phy: phy_caps: don't skip better duplex match on non-exact match
- dsa: b53: fix untagged traffic sent via cpu tagged with VID 0
- Revert "wifi: mwifiex: Fix HT40 bandwidth issue.", it caused
transient packet loss, exact reason not fully understood, yet
Previous releases - always broken:
- net: clear the dst when BPF is changing skb protocol (IPv4 <> IPv6)
- sched: sfq: fix a potential crash on gso_skb handling
- Bluetooth: intel: improve rx buffer posting to avoid causing issues
in the firmware
- eth: intel: i40e: make reset handling robust against multiple
requests
- eth: mlx5: ensure FW pages are always allocated on the local NUMA
node, even when device is configure to 'serve' another node
- wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850,
prevent kernel crashes
- wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request()
for 3 sec if fw_stats_done is not set"
* tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context
net: ethtool: Don't check if RSS context exists in case of context 0
af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD.
ipv6: Move fib6_config_validate() to ip6_route_add().
net: drv: netdevsim: don't napi_complete() from netpoll
net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get()
veth: prevent NULL pointer dereference in veth_xdp_rcv
net_sched: remove qdisc_tree_flush_backlog()
net_sched: ets: fix a race in ets_qdisc_change()
net_sched: tbf: fix a race in tbf_change()
net_sched: red: fix a race in __red_change()
net_sched: prio: fix a race in prio_tune()
net_sched: sch_sfq: reject invalid perturb period
net: phy: phy_caps: Don't skip better duplex macth on non-exact match
MAINTAINERS: Update Kuniyuki Iwashima's email address.
selftests: net: add test case for NAT46 looping back dst
net: clear the dst when changing skb protocol
net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper
net/mlx5e: Fix leak of Geneve TLV option object
net/mlx5: HWS, make sure the uplink is the last destination
...
|
|
Gerrard Tai reported a race condition in ETS, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes: b05972f01e7d ("net: sched: tbf: don't call qdisc_put() while holding tree lock")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gerrard Tai reported a race condition in TBF, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes: b05972f01e7d ("net: sched: tbf: don't call qdisc_put() while holding tree lock")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://patch.msgid.link/20250611111515.1983366-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gerrard Tai reported a race condition in RED, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes: 0c8d13ac9607 ("net: sched: red: delay destroying child qdisc on replace")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gerrard Tai reported a race condition in PRIO, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes: 7b8e0b6e6599 ("net: sched: prio: delay destroying child qdiscs on change")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gerrard Tai reported that SFQ perturb_period has no range check yet,
and this can be used to trigger a race condition fixed in a separate patch.
We want to make sure ctl->perturb_period * HZ will not overflow
and is positive.
Tested:
tc qd add dev lo root sfq perturb -10 # negative value : error
Error: sch_sfq: invalid perturb period.
tc qd add dev lo root sfq perturb 1000000000 # too big : error
Error: sch_sfq: invalid perturb period.
tc qd add dev lo root sfq perturb 2000000 # acceptable value
tc -s -d qd sh dev lo
qdisc sfq 8005: root refcnt 2 limit 127p quantum 64Kb depth 127 flows 128 divisor 1024 perturb 2000000sec
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250611083501.1810459-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
SFQ has an assumption of always being able to queue at least one packet.
However, after the blamed commit, sch->q.len can be inflated by packets
in sch->gso_skb, and an enqueue() on an empty SFQ qdisc can be followed
by an immediate drop.
Fix sfq_drop() to properly clear q->tail in this situation.
Tested:
ip netns add lb
ip link add dev to-lb type veth peer name in-lb netns lb
ethtool -K to-lb tso off # force qdisc to requeue gso_skb
ip netns exec lb ethtool -K in-lb gro on # enable NAPI
ip link set dev to-lb up
ip -netns lb link set dev in-lb up
ip addr add dev to-lb 192.168.20.1/24
ip -netns lb addr add dev in-lb 192.168.20.2/24
tc qdisc replace dev to-lb root sfq limit 100
ip netns exec lb netserver
netperf -H 192.168.20.2 -l 100 &
netperf -H 192.168.20.2 -l 100 &
netperf -H 192.168.20.2 -l 100 &
netperf -H 192.168.20.2 -l 100 &
Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback")
Reported-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Closes: https://lore.kernel.org/netdev/9da42688-bfaa-4364-8797-e9271f3bdaef@hetzner-cloud.de/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250606165127.3629486-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
|
|
Merge in late fixes to prepare for the 6.16 net-next PR.
No conflicts nor adjacent changes.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Savino says:
"We are writing to report that this recent patch
(141d34391abbb315d68556b7c67ad97885407547) [1]
can be bypassed, and a UAF can still occur when HFSC is utilized with
NETEM.
The patch only checks the cl->cl_nactive field to determine whether
it is the first insertion or not [2], but this field is only
incremented by init_vf [3].
By using HFSC_RSC (which uses init_ed) [4], it is possible to bypass the
check and insert the class twice in the eltree.
Under normal conditions, this would lead to an infinite loop in
hfsc_dequeue for the reasons we already explained in this report [5].
However, if TBF is added as root qdisc and it is configured with a
very low rate,
it can be utilized to prevent packets from being dequeued.
This behavior can be exploited to perform subsequent insertions in the
HFSC eltree and cause a UAF."
To fix both the UAF and the infinite loop, with netem as an hfsc child,
check explicitly in hfsc_enqueue whether the class is already in the eltree
whenever the HFSC_RSC flag is set.
[1] https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=141d34391abbb315d68556b7c67ad97885407547
[2] https://elixir.bootlin.com/linux/v6.15-rc5/source/net/sched/sch_hfsc.c#L1572
[3] https://elixir.bootlin.com/linux/v6.15-rc5/source/net/sched/sch_hfsc.c#L677
[4] https://elixir.bootlin.com/linux/v6.15-rc5/source/net/sched/sch_hfsc.c#L1574
[5] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D6NMoigR_eIRIG0LOjMc3r10nUUZtArXx4oZBIdUfZQrwjcQhdinnMis_0G7VEk=@willsroot.io/T/#u
Fixes: 37d9cf1a3ce3 ("sched: Fix detection of empty queues in child qdiscs")
Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
Reported-by: William Liu <will@willsroot.io>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://patch.msgid.link/20250522181448.1439717-2-pctammela@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.15-rc8).
Conflicts:
80f2ab46c2ee ("irdma: free iwdev->rf after removing MSI-X")
4bcc063939a5 ("ice, irdma: fix an off by one in error handling code")
c24a65b6a27c ("iidc/ice/irdma: Update IDC to support multiple consumers")
https://lore.kernel.org/20250513130630.280ee6c5@canb.auug.org.au
No extra adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When enqueuing the first packet to an HFSC class, hfsc_enqueue() calls the
child qdisc's peek() operation before incrementing sch->q.qlen and
sch->qstats.backlog. If the child qdisc uses qdisc_peek_dequeued(), this may
trigger an immediate dequeue and potential packet drop. In such cases,
qdisc_tree_reduce_backlog() is called, but the HFSC qdisc's qlen and backlog
have not yet been updated, leading to inconsistent queue accounting. This
can leave an empty HFSC class in the active list, causing further
consequences like use-after-free.
This patch fixes the bug by moving the increment of sch->q.qlen and
sch->qstats.backlog before the call to the child qdisc's peek() operation.
This ensures that queue length and backlog are always accurate when packet
drops or dequeues are triggered during the peek.
Fixes: 12d0ad3be9c3 ("net/sched/sch_hfsc.c: handle corner cases where head may change invalidating calculated deadline")
Reported-by: Mingi Cho <mincho@theori.io>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250518222038.58538-2-xiyou.wangcong@gmail.com
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Make sctp_compute_cksum() just use the new function skb_crc32c(),
instead of calling __skb_checksum() with a skb_checksum_ops struct that
does CRC32C. This is faster and simpler.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://patch.msgid.link/20250519175012.36581-6-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.15-rc7).
Conflicts:
tools/testing/selftests/drivers/net/hw/ncdevmem.c
97c4e094a4b2 ("tests/ncdevmem: Fix double-free of queue array")
2f1a805f32ba ("selftests: ncdevmem: Implement devmem TCP TX")
https://lore.kernel.org/20250514122900.1e77d62d@canb.auug.org.au
Adjacent changes:
net/core/devmem.c
net/core/devmem.h
0afc44d8cdf6 ("net: devmem: fix kernel panic when netlink socket close after module unload")
bd61848900bf ("net: devmem: Implement TX path")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
sch_frag_data_storage is a per-CPU variable and relies on disabled BH
for its locking. Without per-CPU locking in local_bh_disable() on
PREEMPT_RT this data structure requires explicit locking.
Add local_lock_t to the struct and use local_lock_nested_bh() for locking.
This change adds only lockdep coverage and does not alter the functional
behaviour for !PREEMPT_RT.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20250512092736.229935-12-bigeasy@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
mirred_nest_level is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.
Move mirred_nest_level to struct netdev_xmit as u8, provide wrappers.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://patch.msgid.link/20250512092736.229935-11-bigeasy@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Previously, when reducing a qdisc's limit via the ->change() operation, only
the main skb queue was trimmed, potentially leaving packets in the gso_skb
list. This could result in NULL pointer dereference when we only check
sch->limit against sch->q.qlen.
This patch introduces a new helper, qdisc_dequeue_internal(), which ensures
both the gso_skb list and the main queue are properly flushed when trimming
excess packets. All relevant qdiscs (codel, fq, fq_codel, fq_pie, hhf, pie)
are updated to use this helper in their ->change() routines.
Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM")
Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler")
Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler")
Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
Fixes: d4b36210c2e6 ("net: pkt_sched: PIE AQM scheme")
Reported-by: Will <willsroot@protonmail.com>
Reported-by: Savy <savy@syst3mfailure.io>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|