- minor code cleanup (Colin Ian King)
|
|
- SteelSeries Arctis 9 support (Christian Mayer)
|
|
- support for MD/GEN 6B controller (Ryan McClelland)
|
|
- improved support for Thinkpad-X12-TAB-1/2 (Vishnu Sankar)
|
|
- newly added support for Intel Touch Host Controller (Even Xu, Xinpeng Sun)
|
|
- dead code removal in intel-ish-hid driver (Dr. David Alan Gilbert)
|
|
- hid-core fix for long-standing corner case of Resolution Multiplier not being
present in any of the Logical Collections in the device HID report descriptor
(Alan Stern)
|
|
- constification of 'struct bin_attribute' in various HID drivers (Thomas Weißschuh)
|
|
This merges the vsnprintf internal cleanups I did, which were triggered
by a combination of performance issues (see for example commit
f9ed1f7c2e26: "genirq/proc: Use seq_put_decimal_ull_width() for decimal
values") and discussion about tracing abusing the vsnprintf code in odd
ways.
The intent was to improve code generation, but also to possibly
eventually expose the cleaned-up printf format decoding state machine.
It certainly didn't get to the point where we'd want to expose the
format decoding to external users, but it's an improvement over what we
used to have. Several of the complex case statements have been
simplified, or removed entirely to be replaced by simple table lookups.
* branch 'vsnprintf':
vsnprintf: fix the number base for non-numeric formats
vsnprintf: fix up kerneldoc for argument name changes
vsprintf: don't make the 'binary' version pack small integer arguments
vsnprintf: collapse the number format state into one single state
vsnprintf: mark the indirect width and precision cases unlikely
vsnprintf: inline skip_atoi() again
vsprintf: deal with format specifiers with a lookup table
vsprintf: deal with format flags with a simple lookup table
vsprintf: associate the format state with the format pointer
vsprintf: fix calling convention for format_decode()
vsprintf: avoid nested switch statement on same variable
vsprintf: simplify number handling
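As a rough illustration of what "replaced by simple table lookups" means for flag decoding, here is a minimal Python sketch. The flag names, bit values and the decode_flags() helper are hypothetical and are not the kernel's definitions; the point is only that one dictionary lookup per character stands in for a multi-way case statement.

```python
# Hypothetical flag bits for illustration; the kernel's names and values differ.
LEFT, PLUS, SPACE, SPECIAL, ZEROPAD = 1, 2, 4, 8, 16

# One table lookup per character replaces a multi-way switch.
FLAG_TABLE = {"-": LEFT, "+": PLUS, " ": SPACE, "#": SPECIAL, "0": ZEROPAD}


def decode_flags(fmt, pos):
    """Consume flag characters from fmt[pos:]; return (flags, next_pos)."""
    flags = 0
    while pos < len(fmt) and fmt[pos] in FLAG_TABLE:
        flags |= FLAG_TABLE[fmt[pos]]
        pos += 1
    return flags, pos


if __name__ == "__main__":
    # Start decoding right after the '%' of "%-08d".
    print(decode_flags("%-08d", 1))
```

Adding a flag in this style means adding a table entry rather than another case label.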
|
|
To ensure that resources such as OPP tables or OPP nodes are not freed
while in use by the Rust implementation, it is necessary to increment
their reference count from Rust code.
This commit introduces a new helper function,
dev_pm_opp_get_opp_table_ref(), to increment the reference count of an
OPP table and declares the existing helper dev_pm_opp_get() in pm_opp.h.
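The reference counting idea can be modelled in a few lines of Python. The RefCounted class below is a hypothetical stand-in, not the OPP table structure, and its get()/put() methods only mirror the general pattern of taking a reference before use and dropping it afterwards.

```python
class RefCounted:
    """Hypothetical stand-in for a refcounted resource such as an OPP table."""

    def __init__(self):
        self.refs = 1        # the creator holds the first reference
        self.freed = False

    def get(self):
        # Take an extra reference so the object cannot go away while in use.
        assert not self.freed, "cannot take a reference on a freed object"
        self.refs += 1
        return self

    def put(self):
        # Drop one reference; release the resource when the last one goes.
        self.refs -= 1
        if self.refs == 0:
            self.freed = True
```

As long as a second user holds its own reference, the creator dropping its reference does not free the object.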
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Mark serialize() noinstr so that it can be used from instrumentation-
free code
- Make sure FRED's RSP0 MSR is synchronized with its corresponding
per-CPU value in order to avoid double faults in hotplug scenarios
- Disable EXECMEM_ROX on x86 for now because it didn't receive proper
x86 maintainers review, went in and broke a bunch of things
* tag 'x86_urgent_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Make serialize() always_inline
x86/fred: Fix the FRED RSP0 MSR out of sync with its per-CPU cache
x86: Disable EXECMEM_ROX support
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Borislav Petkov:
- Reset hrtimers correctly when a CPU hotplug state traversal happens
"half-ways" and leaves hrtimers not (re-)initialized properly
- Annotate accesses to a timer group's ignore flag to prevent KCSAN
from raising data_race warnings
- Make sure timer group initialization is visible to timer tree walkers
and avoid a hypothetical race
- Fix another race between CPU hotplug and idle entry/exit where timers
on a fully idle system are getting ignored
- Fix a case where an ignored signal is still being handled which it
shouldn't be
* tag 'timers_urgent_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimers: Handle CPU state correctly on hotplug
timers/migration: Annotate accesses to ignore flag
timers/migration: Enforce group initialization visibility to tree walkers
timers/migration: Fix another race between hotplug and idle entry/exit
signal/posixtimers: Handle ignore/blocked sequences correctly
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Fix an OF node leak in irqchip init's error handling path
- Fix sunxi systems to wake up from suspend with an NMI by
pressing the power button
- Do not spuriously enable interrupts in gic-v3 in a nested
interrupts-off section
- Make sure gic-v3 properly handles a failure to enter a
low power state
* tag 'irq_urgent_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: Plug a OF node reference leak in platform_irqchip_probe()
irqchip/sunxi-nmi: Add missing SKIP_WAKE flag
irqchip/gic-v3-its: Don't enable interrupts in its_irq_set_vcpu_affinity()
irqchip/gic-v3: Handle CPU_PM_ENTER_FAILED correctly
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Do not adjust the weight of empty group entities and avoid
scheduling artifacts
- Avoid scheduling lag by computing lag properly and thus address
an EEVDF entity placement issue
* tag 'sched_urgent_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix update_cfs_group() vs DELAY_DEQUEUE
sched/fair: Fix EEVDF entity placement bug causing scheduling lag
|
|
tcp rst/fin packet triggers an immediate teardown of the flow which
results in sending flows back to the classic forwarding path.
This behaviour was introduced by:
da5984e51063 ("netfilter: nf_flow_table: add support for sending flows back to the slow path")
b6f27d322a0a ("netfilter: nf_flow_table: tear down TCP flows if RST or FIN was seen")
whose goal is to expedite removal of flow entries from the hardware
table. Before these patches, the flow was released after the flow entry
timed out.
However, this approach leads to packet races when restoring the
conntrack state as well as late flow re-offload situations when the TCP
connection is ending.
This patch adds a new CLOSING state that is entered when tcp rst/fin
packet is seen. This allows for an early removal of the flow entry from
the hardware table. But the flow entry still remains in software, so tcp
packets to shut down the flow are not sent back to slow path.
If syn packet is seen from this new CLOSING state, then this flow enters
teardown state, ct state is set to TCP_CONNTRACK_CLOSE state and packet
is sent to slow path, so this TCP reopen scenario can be handled by
conntrack. TCP_CONNTRACK_CLOSE provides a small timeout that aims at
quickly releasing this stale entry from the conntrack table.
Moreover, skip hardware re-offload from flowtable software packet if the
flow is in CLOSING state.
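The transitions described above can be sketched as a small state machine. This is a simplified model under stated assumptions: the FlowState names, the on_tcp_packet() helper and its return tuple are hypothetical, and the real flowtable code tracks more state than this.

```python
from enum import Enum, auto


class FlowState(Enum):
    OFFLOADED = auto()   # hypothetical name for a normally offloaded flow
    CLOSING = auto()
    TEARDOWN = auto()


def on_tcp_packet(state, syn=False, fin=False, rst=False):
    """Return (next_state, path, hw_offload_allowed) for one packet."""
    if state is FlowState.OFFLOADED and (fin or rst):
        # Leave the hardware table early, but keep the software entry.
        return FlowState.CLOSING, "flowtable", False
    if state is FlowState.CLOSING:
        if syn:
            # TCP reopen: hand the packet to conntrack via the slow path.
            return FlowState.TEARDOWN, "slow path", False
        # Shutdown packets stay in software; no hardware re-offload.
        return FlowState.CLOSING, "flowtable", False
    if state is FlowState.TEARDOWN:
        return FlowState.TEARDOWN, "slow path", False
    return FlowState.OFFLOADED, "flowtable", True
```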
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Tear down the flow entry in the unlikely case that the interface mtu
changes. This gives the flow a chance to refresh the cached mtu;
otherwise such a refresh does not occur until the flow entry expires.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Offload nf_conn entries may not see traffic for a very long time.
To prevent incorrect 'ct is stale' checks during nf_conntrack table
lookup, the gc worker extends the timeout of nf_conn entries marked for
offload to a large value.
The existing logic suffers from a few problems.
Garbage collection runs without locks; it's unlikely but possible
that @ct is removed right after the 'offload' bit test.
In that case, the timeout of a new/reallocated nf_conn entry will
be increased.
Prevent this by obtaining a reference count on the ct object and
re-checking the confirmed and offload bits.
If those are not set, the ct is being removed; skip the timeout
extension in this case.
Parallel teardown is also problematic:
cpu1 cpu2
gc_worker
calls flow_offload_teardown()
tests OFFLOAD bit, set
clear OFFLOAD bit
ct->timeout is repaired (e.g. set to timeout[UDP_CT_REPLIED])
nf_ct_offload_timeout() called
expire value is fetched
<INTERRUPT>
-> NF_CT_DAY timeout for flow that isn't offloaded
(and might not see any further packets).
Use cmpxchg: if ct->timeout was repaired after the 2nd 'offload bit' test
passed, then ct->timeout will only be updated if ct->timeout was not
altered in between.
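The compare-and-exchange idea can be modelled as follows. This is only a sketch: the Timeout class and extend_timeout() are hypothetical, and a Python lock stands in for the atomicity that the hardware instruction provides.

```python
import threading


class Timeout:
    """Hypothetical model of a timeout word updated with compare-and-exchange."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()   # stands in for hardware atomicity

    def cmpxchg(self, old, new):
        # Store `new` only if the current value still equals `old`;
        # always return the value that was found.
        with self._lock:
            found = self.value
            if found == old:
                self.value = new
            return found


def extend_timeout(timeout, observed, extended):
    """Extend only if nobody changed the timeout since it was observed."""
    return timeout.cmpxchg(observed, extended) == observed
```

If another CPU repairs the timeout between the read and the update, the exchange fails and the repaired value is left untouched.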
As we already have a gc worker for flowtable entries, ct->timeout repair
can be handled from the flowtable gc worker.
This avoids having flowtable specific logic in the conntrack core
and avoids checking entries that were never offloaded.
This allows removing the nf_ct_offload_timeout helper.
It's safe to use in the add case, but not on teardown.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
It's not used (and could be NULL), so remove it.
This allows using nf_ct_refresh in places where we don't have
an skb without having to double-check that skb == NULL would be safe.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The conntrack entry is already public, there is a small chance that another
CPU is handling a packet in reply direction and racing with the tcp state
update.
Move this under ct spinlock.
This is done once, when ct is about to be offloaded, so this should
not result in a noticeable performance hit.
Fixes: 8437a6209f76 ("netfilter: nft_flow_offload: set liberal tracking mode for tcp")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This state reset is racy, no locks are held here.
Since commit
8437a6209f76 ("netfilter: nft_flow_offload: set liberal tracking mode for tcp"),
the window checks are disabled for normal data packets, but MAXACK flag
is checked when validating TCP resets.
Clear the flag so tcp reset validation checks are ignored.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
With conditional chain deletion gone, callback code simplifies: Instead
of filling an nft_ctx object, just pass basechain to the per-chain
function. Also plain list_for_each_entry() is safe now.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Do not drop a netdev-family chain if the last interface it is registered
for vanishes. Users dumping and storing the ruleset upon shutdown to
restore it upon next boot may otherwise lose the chain and all contained
rules. They will still lose the list of devices; a later patch will fix
that. For now, this aligns the event handler's behaviour with that for
flowtables.
The controversial situation at netns exit should be no problem here:
event handler will unregister the hooks, core nftables cleanup code will
drop the chain itself.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The 1:1 relationship between nft_hook and nf_hook_ops is about to break,
so choose the stored ifname to uniquely identify hooks.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The stored ifname and ops.dev->name may deviate after creation due to
interface name changes. Prefer the more deterministic stored name in
dumps which also helps avoid inadvertent changes to stored ruleset
dumps.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Prepare for hooks with NULL ops.dev pointer (due to non-existent device)
and store the interface name and length as specified by the user upon
creation. No functional change intended.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When checking for duplicate hooks in nft_register_flowtable_net_hooks(),
comparing ops.pf value is pointless as it is always NFPROTO_NETDEV with
flowtable hooks.
Dropping the check leaves the search identical to the one in
nft_hook_list_find() so call that function instead of open coding.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The SKB_DROP_REASON_IP_INADDRERRORS drop reason is never returned from
any function, as such it cannot be returned from the ip_route_input call
tree. The 'reason != SKB_DROP_REASON_IP_INADDRERRORS' conditional is
thus always true.
Looking back at history, commit 50038bf38e65 ("net: ip: make
ip_route_input() return drop reasons") changed the ip_route_input
returned value check in br_nf_pre_routing_finish from -EHOSTUNREACH to
SKB_DROP_REASON_IP_INADDRERRORS. It turns out -EHOSTUNREACH could not be
returned from the ip_route_input call tree either, ever since commit
251da4130115 ("ipv4: Cache ip_error() routes even when not
forwarding.").
Not a fix as this won't change the behavior. While at it use
kfree_skb_reason.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The existing rbtree implementation uses singleton elements to represent
ranges, however, userspace provides a set size according to the number
of ranges in the set.
Adjust provided userspace set size to the number of singleton elements
in the kernel by multiplying the range by two.
Check if the no-match all-zero element is already in the set; in that
case, release one slot in the set size.
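A worked version of the size adjustment, as a sketch only. The function names are hypothetical, and reading "release one slot" as "allow one additional element when the all-zero element is present" is an assumption of this example, not something the message spells out.

```python
def kernel_set_size(user_ranges):
    """Each userspace range is stored as two singleton elements."""
    return user_ranges * 2


def max_elements(user_ranges, has_all_zero_element):
    # Assumption: the no-match all-zero element does not count against
    # the size the user asked for, so it gets one extra slot.
    extra = 1 if has_all_zero_element else 0
    return kernel_set_size(user_ranges) + extra
```

For a user-visible size of 4 ranges this gives 8 kernel elements, or 9 when the all-zero element is in the set.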
Fixes: 0ed6389c483d ("netfilter: nf_tables: rename set implementations")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Output of io_uring_show_fdinfo() has several problems:
* racy use of ->d_iname
* junk if the name is long - in that case it's not stored in ->d_iname
at all
* lack of quoting (names can contain newlines, etc. - or be equal to "<none>",
for that matter).
* lines for empty slots are pointless noise - we already have the total
amount, so having just the non-empty ones would carry the same information.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- Reorder includes for distributed-arp-table.c, by Sven Eckelmann
- Fix translation table change handling, by Remi Pommarel (2 patches)
- Map VID 0 to untagged TT VLAN, by Sven Eckelmann
- Update MAINTAINERS/mailmap e-mail addresses, by the respective authors
(4 patches)
- netlink: reduce duplicate code by returning interfaces,
by Linus Lüssing
* tag 'batadv-next-pullrequest-20250117' of git://git.open-mesh.org/linux-merge:
batman-adv: netlink: reduce duplicate code by returning interfaces
MAINTAINERS: mailmap: add entries for Antonio Quartulli
mailmap: add entries for Sven Eckelmann
mailmap: add entries for Simon Wunderlich
MAINTAINERS: update email address of Marek Linder
batman-adv: Map VID 0 to untagged TT VLAN
batman-adv: Don't keep redundant TT change events
batman-adv: Remove atomic usage for tt.local_changes
batman-adv: Reorder includes for distributed-arp-table.c
batman-adv: Start new development cycle
====================
Link: https://patch.msgid.link/20250117123910.219278-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
- btusb: Add new VID/PID 13d3/3610 for MT7922
- btusb: Add new VID/PID 13d3/3628 for MT7925
- btusb: Add MT7921e device 13d3:3576
- btusb: Add RTL8851BE device 13d3:3600
- btusb: Add ID 0x2c7c:0x0130 for Qualcomm WCN785x
- btusb: add sysfs attribute to control USB alt setting
- qca: Expand firmware-name property
- qca: Fix poor RF performance for WCN6855
- L2CAP: handle NULL sock pointer in l2cap_sock_alloc
- Allow reset via sysfs
- ISO: Allow BIG re-sync
- dt-bindings: Utilize PMU abstraction for WCN6750
- MGMT: Mark LL Privacy as stable
* tag 'for-net-next-2025-01-15' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (23 commits)
Bluetooth: MGMT: Fix slab-use-after-free Read in mgmt_remove_adv_monitor_sync
Bluetooth: qca: Fix poor RF performance for WCN6855
Bluetooth: Allow reset via sysfs
Bluetooth: Get rid of cmd_timeout and use the reset callback
Bluetooth: Remove the cmd timeout count in btusb
Bluetooth: Use str_enable_disable-like helpers
Bluetooth: btmtk: Remove resetting mt7921 before downloading the fw
Bluetooth: L2CAP: handle NULL sock pointer in l2cap_sock_alloc
Bluetooth: btusb: Add RTL8851BE device 13d3:3600
dt-bindings: bluetooth: Utilize PMU abstraction for WCN6750
Bluetooth: btusb: Add MT7921e device 13d3:3576
Bluetooth: btrtl: check for NULL in btrtl_setup_realtek()
Bluetooth: btbcm: Fix NULL deref in btbcm_get_board_name()
Bluetooth: qca: Expand firmware-name to load specific rampatch
Bluetooth: qca: Update firmware-name to support board specific nvm
dt-bindings: net: bluetooth: qca: Expand firmware-name property
Bluetooth: btusb: Add new VID/PID 13d3/3628 for MT7925
Bluetooth: btusb: Add new VID/PID 13d3/3610 for MT7922
Bluetooth: btusb: add sysfs attribute to control USB alt setting
Bluetooth: btusb: Add ID 0x2c7c:0x0130 for Qualcomm WCN785x
...
====================
Link: https://patch.msgid.link/20250117213203.3921910-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.14
Most likely the last "new features" pull request for v6.14 and this is
a bigger one. Multi-Link Operation (MLO) work continues both in stack
and in drivers. A few new devices are supported, along with the usual fixes all over.
Major changes:
cfg80211
* Emergency Preparedness Communication Services (EPCS) station mode support
mac80211
* an option to filter a sta from being flushed
* some support for RX Operating Mode Indication (OMI) power saving
* support for adding and removing station links for MLO
iwlwifi
* new device ids
* rework firmware error handling and restart
rtw88
* RTL8812A: RFE type 2 support
* LED support
rtw89
* variant info to support RTL8922AE-VS
mt76
* mt7996: single wiphy multiband support (preparation for MLO)
* mt7996: support for more variants
* mt792x: P2P_DEVICE support
* mt7921u: TP-Link TXE50UH support
ath12k
* enable MLO for QCN9274 (although it seems to be broken with dual
band devices)
* MLO radar detection support
* debugfs: transmit buffer OFDMA, AST entry and puncture stats
* tag 'wireless-next-2025-01-17' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (322 commits)
wifi: brcmfmac: fix NULL pointer dereference in brcmf_txfinalize()
wifi: rtw88: add RTW88_LEDS depends on LEDS_CLASS to Kconfig
wifi: wilc1000: unregister wiphy only after netdev registration
wifi: cfg80211: adjust allocation of colocated AP data
wifi: mac80211: fix memory leak in ieee80211_mgd_assoc_ml_reconf()
wifi: ath12k: fix key cache handling
wifi: ath12k: Fix uninitialized variable access in ath12k_mac_allocate() function
wifi: ath12k: Remove ath12k_get_num_hw() helper function
wifi: ath12k: Refactor the ath12k_hw get helper function argument
wifi: ath12k: Refactor ath12k_hw set helper function argument
wifi: mt76: mt7996: add implicit beamforming support for mt7992
wifi: mt76: mt7996: fix beacon command during disabling
wifi: mt76: mt7996: fix ldpc setting
wifi: mt76: mt7996: fix definition of tx descriptor
wifi: mt76: connac: adjust phy capabilities based on band constraints
wifi: mt76: mt7996: fix incorrect indexing of MIB FW event
wifi: mt76: mt7996: fix HE Phy capability
wifi: mt76: mt7996: fix the capability of reception of EHT MU PPDU
wifi: mt76: mt7996: add max mpdu len capability
wifi: mt76: mt7921: avoid undesired changes of the preset regulatory domain
...
====================
Link: https://patch.msgid.link/20250117203529.72D45C4CEDD@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After 1b23cdbd2bbc ("net: protect netdev->napi_list with netdev_lock()")
it makes sense to iterate through dev->napi_list while holding
the device lock.
Also call synchronize_net() at most one time.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250117232113.1612899-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We have some leftovers from the switch to linkmode bitmaps which
- have never been used
- are not used any longer
- have no user outside phy_device.c
So remove them.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/5493b96e-88bb-4230-a911-322659ec5167@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Lion Ackermann was able to create a UAF which can be abused for privilege
escalation with the following script
Step 1. create root qdisc
tc qdisc add dev lo root handle 1:0 drr
step2. a class for packet aggregation to demonstrate uaf
tc class add dev lo classid 1:1 drr
step3. a class for nesting
tc class add dev lo classid 1:2 drr
step4. a class to graft qdisc to
tc class add dev lo classid 1:3 drr
step5.
tc qdisc add dev lo parent 1:1 handle 2:0 plug limit 1024
step6.
tc qdisc add dev lo parent 1:2 handle 3:0 drr
step7.
tc class add dev lo classid 3:1 drr
step 8.
tc qdisc add dev lo parent 3:1 handle 4:0 pfifo
step 9. Display the class/qdisc layout
tc class ls dev lo
class drr 1:1 root leaf 2: quantum 64Kb
class drr 1:2 root leaf 3: quantum 64Kb
class drr 3:1 root leaf 4: quantum 64Kb
tc qdisc ls
qdisc drr 1: dev lo root refcnt 2
qdisc plug 2: dev lo parent 1:1
qdisc pfifo 4: dev lo parent 3:1 limit 1000p
qdisc drr 3: dev lo parent 1:2
step10. trigger the bug <=== prevented by this patch
tc qdisc replace dev lo parent 1:3 handle 4:0
step 11. Redisplay again the qdiscs/classes
tc class ls dev lo
class drr 1:1 root leaf 2: quantum 64Kb
class drr 1:2 root leaf 3: quantum 64Kb
class drr 1:3 root leaf 4: quantum 64Kb
class drr 3:1 root leaf 4: quantum 64Kb
tc qdisc ls
qdisc drr 1: dev lo root refcnt 2
qdisc plug 2: dev lo parent 1:1
qdisc pfifo 4: dev lo parent 3:1 refcnt 2 limit 1000p
qdisc drr 3: dev lo parent 1:2
Observe that a) parent for 4:0 does not change despite the replace request.
There can only be one parent. b) refcount has gone up by two for 4:0 and
c) both class 1:3 and 3:1 are pointing to it.
Step 12. send one packet to plug
echo "" | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888,priority=$((0x10001))
step13. send one packet to the grafted fifo
echo "" | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888,priority=$((0x10003))
step14. let's trigger the uaf
tc class delete dev lo classid 1:3
tc class delete dev lo classid 1:1
The semantics of "replace" is for a del/add _on the same node_ and not
a delete from one node(3:1) and add to another node (1:3) as in step10.
While we could "fix" with a more complex approach there could be
consequences to expectations so the patch takes the preventive approach of
"disallow such config".
Joint work with Lion Ackermann <nnamrec@gmail.com>
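The "disallow such config" rule can be illustrated with a toy model. The QdiscTable class is hypothetical and ignores everything about real qdiscs except which parent a handle is attached to; it only shows a replace being refused when it would move an existing qdisc to a different parent.

```python
class QdiscTable:
    """Toy model: maps a qdisc handle to the parent it is attached to."""

    def __init__(self):
        self.parent_of = {}

    def add(self, handle, parent):
        if handle in self.parent_of:
            raise ValueError("qdisc handle already exists")
        self.parent_of[handle] = parent

    def replace(self, handle, parent):
        current = self.parent_of.get(handle)
        # Replace means del/add on the same node, never a move.
        if current is not None and current != parent:
            raise ValueError("cannot move an existing qdisc to a different parent")
        self.parent_of[handle] = parent
```

With 4:0 attached under 3:1 as in step 8, the replace from step10 (parent 1:3) is rejected in this model, so no second class ends up pointing at the same qdisc.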
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250116013713.900000-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The following trace can be seen if a device is being unregistered while
its number of channels is being modified.
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 3 PID: 3754 at kernel/locking/mutex.c:564 __mutex_lock+0xc8a/0x1120
CPU: 3 UID: 0 PID: 3754 Comm: ethtool Not tainted 6.13.0-rc6+ #771
RIP: 0010:__mutex_lock+0xc8a/0x1120
Call Trace:
<TASK>
ethtool_check_max_channel+0x1ea/0x880
ethnl_set_channels+0x3c3/0xb10
ethnl_default_set_doit+0x306/0x650
genl_family_rcv_msg_doit+0x1e3/0x2c0
genl_rcv_msg+0x432/0x6f0
netlink_rcv_skb+0x13d/0x3b0
genl_rcv+0x28/0x40
netlink_unicast+0x42e/0x720
netlink_sendmsg+0x765/0xc20
__sys_sendto+0x3ac/0x420
__x64_sys_sendto+0xe0/0x1c0
do_syscall_64+0x95/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
This is because unregister_netdevice_many_notify might run before the
rtnl lock section of ethnl operations, eg. set_channels in the above
example. In this example the rss lock would be destroyed by the device
unregistration path before being used again, but in general running
ethnl operations while dismantle has started is not a good idea.
Fix this by denying any operation on devices being unregistered. A check
was already there in ethnl_ops_begin, but not wide enough.
Note that the same issue cannot be seen on the ioctl version
(__dev_ethtool) because the device reference is retrieved from within
the rtnl lock section there. Once dismantle started, the net device is
unlisted and no reference will be found.
Fixes: dde91ccfa25f ("ethtool: do not perform operations on net devices being unregistered")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250116092159.50890-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot complained that free_netdev() was calling netif_napi_del()
after dev->lock mutex has been destroyed.
This fires a warning for CONFIG_DEBUG_MUTEXES=y builds.
Move mutex_destroy(&dev->lock) near the end of free_netdev().
[1]
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 0 PID: 5971 at kernel/locking/mutex.c:564 __mutex_lock_common kernel/locking/mutex.c:564 [inline]
WARNING: CPU: 0 PID: 5971 at kernel/locking/mutex.c:564 __mutex_lock+0xdac/0xee0 kernel/locking/mutex.c:735
Modules linked in:
CPU: 0 UID: 0 PID: 5971 Comm: syz-executor Not tainted 6.13.0-rc7-syzkaller-01131-g8d20dcda404d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
RIP: 0010:__mutex_lock_common kernel/locking/mutex.c:564 [inline]
RIP: 0010:__mutex_lock+0xdac/0xee0 kernel/locking/mutex.c:735
Code: 0f b6 04 38 84 c0 0f 85 1a 01 00 00 83 3d 6f 40 4c 04 00 75 19 90 48 c7 c7 60 84 0a 8c 48 c7 c6 00 85 0a 8c e8 f5 dc 91 f5 90 <0f> 0b 90 90 90 e9 c7 f3 ff ff 90 0f 0b 90 e9 29 f8 ff ff 90 0f 0b
RSP: 0018:ffffc90003317580 EFLAGS: 00010246
RAX: ee0f97edaf7b7d00 RBX: ffff8880299f8cb0 RCX: ffff8880323c9e00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90003317710 R08: ffffffff81602ac2 R09: 1ffff110170c519a
R10: dffffc0000000000 R11: ffffed10170c519b R12: 0000000000000000
R13: 0000000000000000 R14: 1ffff92000662ec4 R15: dffffc0000000000
FS: 000055557a046500(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd581d46ff8 CR3: 000000006f870000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
netdev_lock include/linux/netdevice.h:2691 [inline]
__netif_napi_del include/linux/netdevice.h:2829 [inline]
netif_napi_del include/linux/netdevice.h:2848 [inline]
free_netdev+0x2d9/0x610 net/core/dev.c:11621
netdev_run_todo+0xf21/0x10d0 net/core/dev.c:11189
nsim_destroy+0x3c3/0x620 drivers/net/netdevsim/netdev.c:1028
__nsim_dev_port_del+0x14b/0x1b0 drivers/net/netdevsim/dev.c:1428
nsim_dev_port_del_all drivers/net/netdevsim/dev.c:1440 [inline]
nsim_dev_reload_destroy+0x28a/0x490 drivers/net/netdevsim/dev.c:1661
nsim_drv_remove+0x58/0x160 drivers/net/netdevsim/dev.c:1676
device_remove drivers/base/dd.c:567 [inline]
Fixes: 1b23cdbd2bbc ("net: protect netdev->napi_list with netdev_lock()")
Reported-by: syzbot+85ff1051228a04613a32@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/678add43.050a0220.303755.0016.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250117224626.1427577-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
W=1 builds with gcc 14.2.1 report:
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c:4193:32: error: ‘%s’ directive output may be truncated writing up to 31 bytes into a region of size 27 [-Werror=format-truncation=]
4193 | "/pkg %s", buf);
It's upset that we let buf be full length but then we use 5
characters for "/pkg ".
The build is also clean with clang version 19.1.5 now.
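The arithmetic behind the warning can be checked with a short sketch. The worst_case_fits() helper is hypothetical; the numbers come from the compiler message above (up to 31 bytes for '%s', a region of 27 bytes left after the 5-character "/pkg " prefix).

```python
def worst_case_fits(region_size, literal_prefix, max_arg_len):
    """True if prefix + longest argument + NUL terminator fit in the region."""
    needed = len(literal_prefix) + max_arg_len + 1   # +1 for the trailing NUL
    return needed <= region_size


# 27 bytes remain once "/pkg " is written, so the region seen by the
# whole format string is 27 + 5 = 32 bytes.
REGION = 27 + len("/pkg ")

if __name__ == "__main__":
    print(worst_case_fits(REGION, "/pkg ", 31))   # the case gcc complains about
```

Under these numbers the string argument would have to be limited to 26 characters for the worst case to fit.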
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250117183726.1481524-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The number of SYN + MPC retransmissions before falling back to TCP was
fixed to 2. This is certainly a good default value, but having a fixed
number can be a problem in some environments.
The current behaviour means that if all packets are dropped, there will
be:
- The initial SYN + MPC
- 2 retransmissions with MPC
- The next ones will be without MPTCP.
So typically ~3 seconds before falling back to TCP. In some networks
where temporary blackholes are unfortunately frequent, or when a
client tries to initiate connections while the network is not ready yet,
this can cause new connections to fall back to TCP instead of using MPTCP.
In such environments, it is now possible to increase the number of SYN
retransmissions with MPTCP options to make sure MPTCP is used.
Interesting values are:
- 0: the first retransmission will be done without MPTCP options: quite
aggressive, but also a higher risk of detecting false-positive
MPTCP blackholes.
- >= 128: all SYN retransmissions will keep the MPTCP options: back to
the < 6.12 behaviour.
The default behaviour is not changed here.
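The effect of the limit can be sketched as a small predicate. The function and constant names are hypothetical; the behaviour follows the values listed above (default of 2, 0 meaning the first retransmission already drops the MPTCP options, and 128 or more meaning the options are always kept).

```python
ALWAYS_MPTCP = 128   # values at or above this keep MPTCP options forever


def syn_keeps_mptcp(attempt, limit=2):
    """attempt 0 is the initial SYN, attempt n is the n-th retransmission."""
    if limit >= ALWAYS_MPTCP:
        return True
    return attempt <= limit


if __name__ == "__main__":
    # Which of the first four SYNs carry MPTCP options with the default limit.
    print([syn_keeps_mptcp(n) for n in range(4)])
```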
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250117-net-next-mptcp-syn_retrans_before_tcp_fallback-v1-1-ab4b187099b0@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Shinas Rasheed says:
====================
Fix race conditions in ndo_get_stats64
Fix race conditions in ndo_get_stats64 by storing tx/rx stats
locally and not relying on per-queue resources which could be torn
down during interface stop. Also remove the stats fetch from
firmware, which is currently unnecessary.
====================
Link: https://patch.msgid.link/20250117094653.2588578-1-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Update tx/rx stats locally, so that ndo_get_stats64()
can use that and not rely on per queue resources to obtain statistics.
The latter used to cause race conditions when the device stopped.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://patch.msgid.link/20250117094653.2588578-5-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The firmware stats fetch call that happens in ndo_get_stats64()
is currently not required, and triggers a warning.
The corresponding warn log for the PF is given below:
[ 123.316837] ------------[ cut here ]------------
[ 123.316840] Voluntary context switch within RCU read-side critical section!
[ 123.316917] pc : rcu_note_context_switch+0x2e4/0x300
[ 123.316919] lr : rcu_note_context_switch+0x2e4/0x300
[ 123.316947] Call trace:
[ 123.316949] rcu_note_context_switch+0x2e4/0x300
[ 123.316952] __schedule+0x84/0x584
[ 123.316955] schedule+0x38/0x90
[ 123.316956] schedule_timeout+0xa0/0x1d4
[ 123.316959] octep_send_mbox_req+0x190/0x230 [octeon_ep]
[ 123.316966] octep_ctrl_net_get_if_stats+0x78/0x100 [octeon_ep]
[ 123.316970] octep_get_stats64+0xd4/0xf0 [octeon_ep]
[ 123.316975] dev_get_stats+0x4c/0x114
[ 123.316977] dev_seq_printf_stats+0x3c/0x11c
[ 123.316980] dev_seq_show+0x1c/0x40
[ 123.316982] seq_read_iter+0x3cc/0x4e0
[ 123.316985] seq_read+0xc8/0x110
[ 123.316987] proc_reg_read+0x9c/0xec
[ 123.316990] vfs_read+0xc8/0x2ec
[ 123.316993] ksys_read+0x70/0x100
[ 123.316995] __arm64_sys_read+0x20/0x30
[ 123.316997] invoke_syscall.constprop.0+0x7c/0xd0
[ 123.317000] do_el0_svc+0xb4/0xd0
[ 123.317002] el0_svc+0xe8/0x1f4
[ 123.317005] el0t_64_sync_handler+0x134/0x150
[ 123.317006] el0t_64_sync+0x17c/0x180
[ 123.317008] ---[ end trace 63399811432ab69b ]---
Fixes: c3fad23cdc06 ("octeon_ep_vf: add support for ndo ops")
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://patch.msgid.link/20250117094653.2588578-4-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Update tx/rx stats locally so that ndo_get_stats64() can use them
instead of relying on per-queue resources to obtain statistics.
Relying on per-queue resources used to cause race conditions when the
device stopped.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://patch.msgid.link/20250117094653.2588578-3-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The firmware stats fetch call that happens in ndo_get_stats64()
is currently not required, and it triggers a warning.
The warn log is given below:
[ 123.316837] ------------[ cut here ]------------
[ 123.316840] Voluntary context switch within RCU read-side critical section!
[ 123.316917] pc : rcu_note_context_switch+0x2e4/0x300
[ 123.316919] lr : rcu_note_context_switch+0x2e4/0x300
[ 123.316947] Call trace:
[ 123.316949] rcu_note_context_switch+0x2e4/0x300
[ 123.316952] __schedule+0x84/0x584
[ 123.316955] schedule+0x38/0x90
[ 123.316956] schedule_timeout+0xa0/0x1d4
[ 123.316959] octep_send_mbox_req+0x190/0x230 [octeon_ep]
[ 123.316966] octep_ctrl_net_get_if_stats+0x78/0x100 [octeon_ep]
[ 123.316970] octep_get_stats64+0xd4/0xf0 [octeon_ep]
[ 123.316975] dev_get_stats+0x4c/0x114
[ 123.316977] dev_seq_printf_stats+0x3c/0x11c
[ 123.316980] dev_seq_show+0x1c/0x40
[ 123.316982] seq_read_iter+0x3cc/0x4e0
[ 123.316985] seq_read+0xc8/0x110
[ 123.316987] proc_reg_read+0x9c/0xec
[ 123.316990] vfs_read+0xc8/0x2ec
[ 123.316993] ksys_read+0x70/0x100
[ 123.316995] __arm64_sys_read+0x20/0x30
[ 123.316997] invoke_syscall.constprop.0+0x7c/0xd0
[ 123.317000] do_el0_svc+0xb4/0xd0
[ 123.317002] el0_svc+0xe8/0x1f4
[ 123.317005] el0t_64_sync_handler+0x134/0x150
[ 123.317006] el0t_64_sync+0x17c/0x180
[ 123.317008] ---[ end trace 63399811432ab69b ]---
Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://patch.msgid.link/20250117094653.2588578-2-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Sean Anderson says:
====================
net: xilinx: axienet: Report an error for bad coalesce settings
====================
Link: https://patch.msgid.link/20250116232954.2696930-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Instead of silently ignoring invalid/unsupported settings, report an
error. Additionally, relax the check for non-zero usecs to apply only
when it will be used (i.e. when frames != 1).
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20250116232954.2696930-3-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
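
The validation rule described above can be sketched in userspace C. This is an illustration under assumptions, not the axienet code: the structure, the function name and the SKETCH_MAX_FRAMES limit are hypothetical, and the real driver's limits are not stated in this message. What the sketch does take from the message is that bad values return an error instead of being ignored, and that a zero usecs value is rejected only when frames != 1.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_FRAMES 255u /* hypothetical hardware limit */

/* Hypothetical subset of the coalesce parameters a user can request. */
struct sketch_coalesce {
	uint32_t max_frames; /* frames to batch before raising an interrupt */
	uint32_t usecs;      /* delay timer, used only when batching */
};

/* Returns 0 if the settings are usable, -EINVAL otherwise. */
static int sketch_validate_coalesce(const struct sketch_coalesce *c)
{
	assert(c != NULL);

	/* Report out-of-range frame counts instead of ignoring them. */
	if (c->max_frames == 0 || c->max_frames > SKETCH_MAX_FRAMES)
		return -EINVAL;

	/* The timer is consulted only when more than one frame is batched,
	 * so a zero delay is an error only in that case.
	 */
	if (c->max_frames != 1 && c->usecs == 0)
		return -EINVAL;

	return 0;
}
```

With these assumptions, a request for one frame and zero usecs is accepted, while a request for several frames and zero usecs is rejected with -EINVAL.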
|
|
Instead of using literals, add some symbolic constants for the IRQ delay
timer calculation.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20250116232954.2696930-2-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Update header inclusions to follow the IWYU (Include What You Use)
principle.
In this case, replace of_gpio.h, which is subject to removal by the GPIOLIB
subsystem, with the respective headers that the driver actually uses.
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20250116153119.148097-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
of_gpio.h is deprecated and subject to removal. The drivers in question
don't use it, so simply remove the unused header.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|