Age | Commit message (Collapse) | Author |
|
Add sriov_configure for driver ops.
Add mailbox handler wx_msg_task for txgbe.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/ECDC57CF4F2316B9+20250408091556.9640-7-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add sriov_configure for driver ops.
Add mailbox handler wx_msg_task for ngbe in
the interrupt handler.
Add the notification flow when the vfs exist.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/C9A0A43732966022+20250408091556.9640-6-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement wx_msg_task which is used to process mailbox
messages sent by vf.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/8257B39B95CDB469+20250408091556.9640-5-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Reallocate queue and int resources when sriov is enabled.
Redefine macro VMDQ to make it work in VT mode.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/64B616774ABE3C5A+20250408091556.9640-4-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement sriov_configure interface for wangxun nics in libwx.
Enable VT mode and initialize vf control structure, when sriov
is enabled. Do not be allowed to disable sriov when vfs are
assigned.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/81EA45C21B0A98B0+20250408091556.9640-3-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implements the mailbox interfaces for wangxun pf drivers
ngbe and txgbe.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Link: https://patch.msgid.link/70017BD4D67614A4+20250408091556.9640-2-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It is desireable to push the hardware accelerator to also
process non-segmented TCP frames: we pass the skb->len
to the "TOE/TSO" offloader and it will handle them.
Without this quirk the driver becomes unstable and lock
up and and crash.
I do not know exactly why, but it is probably due to the
TOE (TCP offload engine) feature that is coupled with the
segmentation feature - it is not possible to turn one
part off and not the other, either both TOE and TSO are
active, or neither of them.
Not having the TOE part active seems detrimental, as if
that hardware feature is not really supposed to be turned
off.
The datasheet says:
"Based on packet parsing and TCP connection/NAT table
lookup results, the NetEngine puts the packets
belonging to the same TCP connection to the same queue
for the software to process. The NetEngine puts
incoming packets to the buffer or series of buffers
for a jumbo packet. With this hardware acceleration,
IP/TCP header parsing, checksum validation and
connection lookup are offloaded from the software
processing."
After numerous tests with the hardware locking up after
something between minutes and hours depending on load
using iperf3 I have concluded this is necessary to stabilize
the hardware.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patch.msgid.link/20250408-gemini-ethernet-tso-always-v1-1-e669f932359c@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'bridge-prevent-unicast-arp-ns-packets-from-being-suppressed-by-bridge'
Amit Cohen says:
====================
bridge: Prevent unicast ARP/NS packets from being suppressed by bridge
Currently, unicast ARP requests/NS packets are replied by bridge when
suppression is enabled, then they are also forwarded, which results two
replicas of ARP reply/NA - one from the bridge and second from the target.
The purpose of ARP/ND suppression is to reduce flooding in the broadcast
domain, which is not relevant for unicast packets. In addition, the use
case of unicast ARP/NS is to poll a specific host, so it does not make
sense to have the switch answer on behalf of the host.
Forward ARP requests/NS packets and prevent the bridge from replying to
them.
Patch set overview:
Patch #1 prevents unicast ARP/NS packets from being suppressed by bridge
Patch #2 adds test cases for unicast ARP/NS with suppression enabled
====================
Link: https://patch.msgid.link/cover.1744123493.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add test cases to check that unicast ARP/NS packets are replied once, even
if ARP/ND suppression is enabled.
Without the previous patch:
$ ./test_bridge_neigh_suppress.sh
...
Unicast ARP, per-port ARP suppression - VLAN 10
-----------------------------------------------
TEST: "neigh_suppress" is on [ OK ]
TEST: Unicast ARP, suppression on, h1 filter [FAIL]
TEST: Unicast ARP, suppression on, h2 filter [ OK ]
Unicast ARP, per-port ARP suppression - VLAN 20
-----------------------------------------------
TEST: "neigh_suppress" is on [ OK ]
TEST: Unicast ARP, suppression on, h1 filter [FAIL]
TEST: Unicast ARP, suppression on, h2 filter [ OK ]
...
Unicast NS, per-port NS suppression - VLAN 10
---------------------------------------------
TEST: "neigh_suppress" is on [ OK ]
TEST: Unicast NS, suppression on, h1 filter [FAIL]
TEST: Unicast NS, suppression on, h2 filter [ OK ]
Unicast NS, per-port NS suppression - VLAN 20
---------------------------------------------
TEST: "neigh_suppress" is on [ OK ]
TEST: Unicast NS, suppression on, h1 filter [FAIL]
TEST: Unicast NS, suppression on, h2 filter [ OK ]
...
Tests passed: 156
Tests failed: 4
With the previous patch:
$ ./test_bridge_neigh_suppress.sh
...
Unicast ARP, per-port ARP suppression - VLAN 10
-----------------------------------------------
TEST: "neigh_suppress" is on [ OK ]
TEST: Unicast ARP, suppression on, h1 filter [ OK ]
TEST: Unicast ARP, suppression on, h2 filter [ OK ]
Unicast ARP, per-port ARP suppression - VLAN 20
-----------------------------------------------
TEST: "neigh_suppress" is on [ OK ]
TEST: Unicast ARP, suppression on, h1 filter [ OK ]
TEST: Unicast ARP, suppression on, h2 filter [ OK ]
...
Unicast NS, per-port NS suppression - VLAN 10
---------------------------------------------
TEST: "neigh_suppress" is on [ OK ]
TEST: Unicast NS, suppression on, h1 filter [ OK ]
TEST: Unicast NS, suppression on, h2 filter [ OK ]
Unicast NS, per-port NS suppression - VLAN 20
---------------------------------------------
TEST: "neigh_suppress" is on [ OK ]
TEST: Unicast NS, suppression on, h1 filter [ OK ]
TEST: Unicast NS, suppression on, h2 filter [ OK ]
...
Tests passed: 160
Tests failed: 0
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/dc240b9649b31278295189f412223f320432c5f2.1744123493.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When Proxy ARP or ARP/ND suppression are enabled, ARP/NS packets can be
handled by bridge in br_do_proxy_suppress_arp()/br_do_suppress_nd().
For broadcast packets, they are replied by bridge, but later they are not
flooded. Currently, unicast packets are replied by bridge when suppression
is enabled, and they are also forwarded, which results two replicas of
ARP reply/NA - one from the bridge and second from the target.
RFC 1122 describes use case for unicat ARP packets - "unicast poll" -
actively poll the remote host by periodically sending a point-to-point ARP
request to it, and delete the entry if no ARP reply is received from N
successive polls.
The purpose of ARP/ND suppression is to reduce flooding in the broadcast
domain. If a host is sending a unicast ARP/NS, then it means it already
knows the address and the switches probably know it as well and there
will not be any flooding.
In addition, the use case of unicast ARP/NS is to poll a specific host,
so it does not make sense to have the switch answer on behalf of the host.
According to RFC 9161:
"A PE SHOULD reply to broadcast/multicast address resolution messages,
i.e., ARP Requests, ARP probes, NS messages, as well as DAD NS messages.
An ARP probe is an ARP Request constructed with an all-zero sender IP
address that may be used by hosts for IPv4 Address Conflict Detection as
specified in [RFC5227]. A PE SHOULD NOT reply to unicast address resolution
requests (for instance, NUD NS messages)."
Forward such requests and prevent the bridge from replying to them.
Reported-by: Denis Yulevych <denisyu@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/6bf745a149ddfe5e6be8da684a63aa574a326f8d.1744123493.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
txq_trans_update() currently uses txq->xmit_lock_owner
to conditionally update txq->trans_start.
For regular devices, txq->xmit_lock_owner is updated
from HARD_TX_LOCK() and HARD_TX_UNLOCK(), and this apparently
causes cpu stalls.
Using dev->lltx, which sits in a read-mostly cache-line,
and already used in HARD_TX_LOCK() and HARD_TX_UNLOCK()
helps cpu prediction.
On an AMD EPYC 7B12 dual socket server, tcp_rr with 128 threads
and 30,000 flows gets a 5 % increase in throughput.
As explained in commit 95ecba62e2fd ("net: fix races in
netdev_tx_sent_queue()/dev_watchdog()") I am planning
to no longer update txq->trans_start in the fast path
in a followup patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250408202742.2145516-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The cn10k_free_matchall_ipolicer() calls the cn10k_map_unmap_rq_policer()
for each queue in a for loop without checking for any errors.
Check the return value of the cn10k_map_unmap_rq_policer() function during
each loop, and report a warning if the function fails.
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250408032602.2909-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Recent change [0] resulted in a "BUG: using __this_cpu_read() in
preemptible" splat [1]. PREEMPT kernels have additional requirements
on what can and can not run with/without preemption enabled.
Expose those constrains in the debug kernels.
0: https://lore.kernel.org/netdev/20250314120048.12569-2-justin.iurman@uliege.be/
1: https://lore.kernel.org/netdev/20250402094458.006ba2a7@kernel.org/T/#mbf72641e9d7d274daee9003ef5edf6833201f1bc
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250402172305.1775226-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The __get_unaligned_cpu32 function is deprecated. So, replace it with
the more generic get_unaligned and just cast the input parameter.
Signed-off-by: Julian Vetter <julian@outer-limits.org>
Link: https://patch.msgid.link/20250408091946.2266271-1-julian@outer-limits.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The __get_unaligned_cpu32 function is deprecated. So, replace it with
the more generic get_unaligned and just cast the input parameter.
Signed-off-by: Julian Vetter <julian@outer-limits.org>
Link: https://patch.msgid.link/20250408091548.2263911-1-julian@outer-limits.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub Kicinski says:
====================
net: depend on instance lock for queue related netlink ops
netdev-genl used to be protected by rtnl_lock. In previous release
we already switched the queue management ops (for Rx zero-copy) to
the instance lock. This series converts other ops to depend on the
instance lock when possible.
Unfortunately queue related state is hard to lock (unlike NAPI)
as the process of switching the number of queues usually involves
a large reconfiguration of the driver. The reconfig process has
historically been under rtnl_lock, but for drivers which opt into
ops locking it is also under the instance lock. Leverage that
and conditionally take rtnl_lock or instance lock depending
on the device capabilities.
v1: https://lore.kernel.org/20250407190117.16528-1-kuba@kernel.org
====================
Link: https://patch.msgid.link/20250408195956.412733-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We mostly needed rtnl_lock in qstat to make sure the queue count
is stable while we work. For "ops locked" drivers the instance
lock protects the queue count, so we don't have to take rtnl_lock.
For currently ops-locked drivers: netdevsim and bnxt need
the protection from netdev going down while we dump, which
instance lock provides. gve doesn't care.
Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250408195956.412733-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Explicitly list all the ops structs and what locking they provide.
Use "ops locked" as a term for drivers which have ops called under
the instance lock.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250408195956.412733-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Writes to XDP features are now protected by netdev->lock.
Other things we report are based on ops which don't change
once device has been registered. It is safe to stop taking
rtnl_lock, and depend on netdev->lock instead.
Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250408195956.412733-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Protect xdp_features with netdev->lock. This way pure readers
no longer have to take rtnl_lock to access the field.
This includes calling NETDEV_XDP_FEAT_CHANGE under the lock.
Looks like that's fine for bonding, the only "real" listener,
it's the same as ethtool feature change.
In terms of normal drivers - only GVE need special consideration
(other drivers don't use instance lock or don't support XDP).
It calls xdp_set_features_flag() helper from gve_init_priv() which
in turn is called from gve_reset_recovery() (locked), or prior
to netdev registration. So switch to _locked.
Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250408195956.412733-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Netdev queue dump accesses: NAPI, memory providers, XSk pointers.
All three are "ops protected" now, switch to the op compat locking.
rtnl lock does not have to be taken for "ops locked" devices.
Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250408195956.412733-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add helpers to "lock a netdev in a backward-compatible way",
which for ops-locked netdevs will mean take the instance lock.
For drivers which haven't opted into the ops locking we'll take
rtnl_lock.
The scoped foreach is dropping and re-taking the lock for each
device, even if prev and next are both under rtnl_lock.
I hope that's fine since we expect that netdev nl to be mostly
supported by modern drivers, and modern drivers should also
opt into the instance locking.
Note that these helpers are mostly needed for queue related state,
because drivers modify queue config in their ops in a non-atomic
way. Or differently put, queue changes don't have a clear-cut API
like NAPI configuration. Any state that can should just use the
instance lock directly, not the "compat" hacks.
Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250408195956.412733-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Read accesses go via xsk_get_pool_from_qid(), the call coming
from the core and gve look safe (other "ops locked" drivers
don't support XSK).
Write accesses go via xsk_reg_pool_at_qid() and xsk_clear_pool_at_qid().
Former is already under the ops lock, latter is not (both coming from
the workqueue via xp_clear_dev() and NETDEV_UNREGISTER via xsk_notifier()).
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250408195956.412733-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
netdev_get_by_index_lock() performs following steps:
rcu_lock();
dev = lookup(netns, ifindex);
dev_get(dev);
rcu_unlock();
[... lock & validate the dev ...]
return dev
Validation right now only checks if the device is registered but since
the lookup is netns-aware we must also protect against the device
switching netns right after we dropped the RCU lock. Otherwise
the caller in netns1 may get a pointer to a device which has just
switched to netns2.
We can't hold the lock for the entire netns change process (because of
the NETDEV_UNREGISTER notifier), and there's no existing marking to
indicate that the netns is unlisted because of netns move, so add one.
AFAIU none of the existing netdev_get_by_index_lock() callers can
suffer from this problem (NAPI code double checks the netns membership
and other callers are either under rtnl_lock or not ns-sensitive),
so this patch does not have to be treated as a fix.
Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250408195956.412733-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
__skb_try_recv_from_queue() deals with a queue, @sk is not used
since commit e427cad6eee4 ("net: datagram: drop 'destructor'
argument from several helpers"). Remove sk from function parameters,
adapt callers.
No functional change intended.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250407-cleanup-drop-param-sk-v1-1-cd076979afac@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Paolo Abeni says:
====================
udp_tunnel: GRO optimizations
The UDP tunnel GRO stage is source of measurable overhead for workload
based on UDP-encapsulated traffic: each incoming packets requires a full
UDP socket lookup and an indirect call.
In the most common setups a single UDP tunnel device is used. In such
case we can optimize both the lookup and the indirect call.
Patch 1 tracks per netns the active UDP tunnels and replaces the socket
lookup with a single destination port comparison when possible.
Patch 2 tracks the different types of UDP tunnels and replaces the
indirect call with a static one when there is a single UDP tunnel type
active.
I measure ~10% performance improvement in TCP over UDP tunnel stream
tests on top of this series.
v4: https://lore.kernel.org/cover.1741718157.git.pabeni@redhat.com
v3: https://lore.kernel.org/cover.1741632298.git.pabeni@redhat.com
v2: https://lore.kernel.org/cover.1741338765.git.pabeni@redhat.com
v1: https://lore.kernel.org/cover.1741275846.git.pabeni@redhat.com
====================
Link: https://patch.msgid.link/cover.1744040675.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It's quite common to have a single UDP tunnel type active in the
whole system. In such a case we can replace the indirect call for
the UDP tunnel GRO callback with a static call.
Add the related accounting in the control path and switch to static
call when possible. To keep the code simple use a static array for
the registered tunnel types, and size such array based on the kernel
config.
Note that there are valid kernel configurations leading to
UDP_MAX_TUNNEL_TYPES == 0 even with IS_ENABLED(CONFIG_NET_UDP_TUNNEL),
Explicitly skip the accounting in such a case, to avoid compile warning
when accessing "udp_tunnel_gro_types".
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/53d156cdfddcc9678449e873cc83e68fa1582653.1744040675.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Most UDP tunnels bind a socket to a local port, with ANY address, no
peer and no interface index specified.
Additionally it's quite common to have a single tunnel device per
namespace.
Track in each namespace the UDP tunnel socket respecting the above.
When only a single one is present, store a reference in the netns.
When such reference is not NULL, UDP tunnel GRO lookup just need to
match the incoming packet destination port vs the socket local port.
The tunnel socket never sets the reuse[port] flag[s]. When bound to no
address and interface, no other socket can exist in the same netns
matching the specified local port.
Matching packets with non-local destination addresses will be
aggregated, and eventually segmented as needed - no behavior changes
intended.
Restrict the optimization to kernel sockets only: it covers all the
relevant use-cases, and user-space owned sockets could be disconnected
and rebound after setup_udp_tunnel_sock(), breaking the uniqueness
assumption
Note that the UDP tunnel socket reference is stored into struct
netns_ipv4 for both IPv4 and IPv6 tunnels. That is intentional to keep
all the fastpath-related netns fields in the same struct and allow
cacheline-based optimization. Currently both the IPv4 and IPv6 socket
pointer share the same cacheline as the `udp_table` field.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/41d16bc8d1257d567f9344c445b4ae0b4a91ede4.1744040675.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Recently we had some issues in parallel TDC where some of IFE tests are
failing due to some of IFE's submodules (like act_meta_skbtcindex and
act_meta_skbprio) taking too long to load [1]. To avoid that issue,
pre-load IFE and all its submodules before running any of the tests in
tdc.sh
[1] https://lore.kernel.org/netdev/e909b2a0-244e-4141-9fa9-1b7d96ab7d71@mojatatu.com/T/#u
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250407215656.2535990-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Let's pass the queue index to netif_napi_add_config() to preserve
per-NAPI config.
Test:
Set 100 to defer-hard-irqs (default is 0) and check the value after
link down & up.
$ cat /sys/class/net/enp39s0/napi_defer_hard_irqs
0
$ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 2}'
[{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 65,
'ifindex': 2,
'irq': 29,
'irq-suspend-timeout': 0}]
$ sudo ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--do napi-set --json='{"id": 65, "defer-hard-irqs": 100}'
$ sudo ip link set enp39s0 down && sudo ip link set enp39s0 up
Without patch:
$ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 2}'
[{'defer-hard-irqs': 0, <------------------- Reset to 0
'gro-flush-timeout': 0,
'id': 66, <------------------------------- New ID
'ifindex': 2,
'irq': 29,
'irq-suspend-timeout': 0}]
With patch:
$ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 2}'
[{'defer-hard-irqs': 100, <--------------+-- Preserved
'gro-flush-timeout': 0, |
'id': 65, <----------------------------'
'ifindex': 2,
'irq': 29,
'irq-suspend-timeout': 0}]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Link: https://patch.msgid.link/20250407164802.25184-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eric Dumazet says:
====================
rps: misc changes
Minor changes in rps:
skb_flow_limit() is probably unused these days,
and data-races are quite theoretical.
====================
Link: https://patch.msgid.link/20250407163602.170356-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add an rcu_head to sd_flow_limit and rps_sock_flow_table structs
to use the more conventional and predictable k[v]free_rcu().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250407163602.170356-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
softnet_seq_show() reads several fields that might be updated
concurrently. Add READ_ONCE() and WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250407163602.170356-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
softnet_seq_show() can read fl->count while another cpu
updates this field from skb_flow_limit().
Make this field an 'unsigned int', as its only consumer
only deals with 32 bit.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250407163602.170356-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As explained in commit f3483c8e1da6 ("net: rfs: hash function change"),
masking low order bits of skb_get_hash(skb) has low entropy.
A NIC with 32 RX queues uses the 5 low order bits of rss key
to select a queue. This means all packets landing to a given
queue share the same 5 low order bits.
Switch to hash_32() to reduce hash collisions.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250407163602.170356-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use SPDX-License-Identifier accross all the files of the xgbe driver to
ensure compliance with Linux kernel standards, thus removing the
boiler-plate template license text.
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250407102913.3063691-1-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Remove the double negation and simplify the if condition.
No functional changes intended.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250407091442.743478-1-thorsten.blum@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The __get_unaligned_cpu32 function is deprecated. So, replace it with
the more generic get_unaligned and just cast the input parameter.
Signed-off-by: Julian Vetter <julian@outer-limits.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250407083306.1553921-1-julian@outer-limits.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
If the destination buffer has a fixed length, strscpy_pad()
automatically determines its size using sizeof() when the argument is
omitted. This makes the explicit sizeof() calls unnecessary - remove
them.
No functional changes intended.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20250407082607.741919-2-thorsten.blum@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter.
Current release - regressions:
- four fixes for the netdev per-instance locking
Current release - new code bugs:
- consolidate more code between existing Rx zero-copy and uring so
that the latter doesn't miss / have to duplicate the safety checks
Previous releases - regressions:
- ipv6: fix omitted Netlink attributes when using SKIP_STATS
Previous releases - always broken:
- net: fix geneve_opt length integer overflow
- udp: fix multiple wrap arounds of sk->sk_rmem_alloc when it
approaches INT_MAX
- dsa: mvpp2: add a lock to avoid corruption of the shared TCAM
- dsa: airoha: fix issues with traffic QoS configuration / offload,
and flow table offload
Misc:
- touch up the Netlink YAML specs of old families to make them usable
for user space C codegen"
* tag 'net-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits)
selftests: net: amt: indicate progress in the stress test
netlink: specs: rt_route: pull the ifa- prefix out of the names
netlink: specs: rt_addr: pull the ifa- prefix out of the names
netlink: specs: rt_addr: fix get multi command name
netlink: specs: rt_addr: fix the spec format / schema failures
net: avoid false positive warnings in __net_mp_close_rxq()
net: move mp dev config validation to __net_mp_open_rxq()
net: ibmveth: make veth_pool_store stop hanging
arcnet: Add NULL check in com20020pci_probe()
ipv6: Do not consider link down nexthops in path selection
ipv6: Start path selection from the first nexthop
usbnet:fix NPE during rx_complete
net: octeontx2: Handle XDP_ABORTED and XDP invalid as XDP_DROP
net: fix geneve_opt length integer overflow
io_uring/zcrx: fix selftests w/ updated netdev Python helpers
selftests: net: use netdevsim in netns test
docs: net: document netdev notifier expectations
net: dummy: request ops lock
netdevsim: add dummy device notifiers
net: rename rtnl_net_debug to lock_debug
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A small collection of fixes that came in during the merge window,
everything is driver specific with nothing standing out particularly"
* tag 'spi-fix-v6.15-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: bcm2835: Restore native CS probing when pinctrl-bcm2835 is absent
spi: bcm2835: Do not call gpiod_put() on invalid descriptor
spi: cadence-qspi: revert "Improve spi memory performance"
spi: cadence: Fix out-of-bounds array access in cdns_mrvl_xspi_setup_clock()
spi: fsl-qspi: use devm function instead of driver remove
spi: SPI_QPIC_SNAND should be tristate and depend on MTD
spi-rockchip: Fix register out of bounds access
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull more SoC driver updates from Arnd Bergmann:
"This is the promised follow-up to the soc drivers branch, adding minor
updates to omap and freescale drivers.
Most notably, Ioana Ciornei takes over maintenance of the DPAA bus
driver used in some NXP (originally Freescale) chips"
* tag 'soc-drivers-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
bus: fsl-mc: Remove deadcode
MAINTAINERS: add the linuppc-dev list to the fsl-mc bus entry
MAINTAINERS: fix nonexistent dtbinding file name
MAINTAINERS: add myself as maintainer for the fsl-mc bus
irqdomain: soc: Switch to irq_find_mapping()
Input: tsc2007 - accept standard properties
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
- thinkpad_acpi:
- Fix NULL pointer dereferences while probing
- Disable ACPI fan access for T495* and E560
- ISST: Correct command storage data length
* tag 'platform-drivers-x86-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
MAINTAINERS: consistently use my dedicated email address
platform/x86: ISST: Correct command storage data length
platform/x86: thinkpad_acpi: disable ACPI fan access for T495* and E560
platform/x86: thinkpad_acpi: Fix NULL pointer dereferences while probing
|
|
Our CI expects output from the test at least once every 10 minutes.
The AMT test when running on debug kernel is just on the edge
of that time for the stress test. Improve the output:
- print the name of the test first, before starting it,
- output a dot every 10% of the way.
Output after:
TEST: amt discovery [ OK ]
TEST: IPv4 amt multicast forwarding [ OK ]
TEST: IPv6 amt multicast forwarding [ OK ]
TEST: IPv4 amt traffic forwarding torture .......... [ OK ]
TEST: IPv6 amt traffic forwarding torture .......... [ OK ]
Reviewed-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250403145636.2891166-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub Kicinski says:
====================
netlink: specs: rt_addr: fix problems revealed by C codegen
I put together basic YNL C support for classic netlink. This revealed
a few problems in the rt_addr spec.
v1: https://lore.kernel.org/20250401012939.2116915-1-kuba@kernel.org
====================
Link: https://patch.msgid.link/20250403013706.2828322-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
YAML specs don't normally include the C prefix name in the name
of the YAML attr. Remove the ifa- prefix from all attributes
in route-attrs and metrics and specify name-prefix instead.
This is a bit risky, hopefully there aren't many users out there.
Fixes: 023289b4f582 ("doc/netlink: Add spec for rt route messages")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250403013706.2828322-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
YAML specs don't normally include the C prefix name in the name
of the YAML attr. Remove the ifa- prefix from all attributes
in addr-attrs and specify name-prefix instead.
This is a bit risky, hopefully there aren't many users out there.
Fixes: dfb0f7d9d979 ("doc/netlink: Add spec for rt addr messages")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250403013706.2828322-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Command names should match C defines, codegens may depend on it.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Fixes: 4f280376e531 ("selftests/net: Add selftest for IPv4 RTM_GETMULTICAST support")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250403013706.2828322-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The spec is mis-formatted, schema validation says:
Failed validating 'type' in schema['properties']['operations']['properties']['list']['items']['properties']['dump']['properties']['request']['properties']['value']:
{'minimum': 0, 'type': 'integer'}
On instance['operations']['list'][3]['dump']['request']['value']:
'58 - ifa-family'
The ifa-family clearly wants to be part of an attribute list.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Yuyang Huang <yuyanghuang@google.com>
Fixes: 4f280376e531 ("selftests/net: Add selftest for IPv4 RTM_GETMULTICAST support")
Link: https://patch.msgid.link/20250403013706.2828322-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub Kicinski says:
====================
net: make memory provider install / close paths more common
We seem to be fixing bugs in config path for devmem which also exist
in the io_uring ZC path. Let's try to make the two paths more common,
otherwise this is bound to keep happening.
Found by code inspection and compile tested only.
v1: https://lore.kernel.org/20250331194201.2026422-1-kuba@kernel.org
====================
Link: https://patch.msgid.link/20250403013405.2827250-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|