Age | Commit message (Collapse) | Author |
|
The Linux client assumes that all filehandles are non-volatile for
renames within the same directory (otherwise sillyrename cannot work).
However, the existence of the Linux 'subtree_check' export option has
meant that nfs_rename() has always assumed it needs to flush writes
before attempting to rename.
Since NFSv4 does allow the client to query whether or not the server
exhibits this behaviour, and since knfsd does actually set the
appropriate flag when 'subtree_check' is enabled on an export, it
should be OK to optimise away the write flushing behaviour in the cases
where it is clearly not needed.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
If there isn't a valid layout, or the layout stateid has changed, the
cleanup after a layout return should clear out the old data.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If there are still layout segments in the layout plh_return_lsegs list
after a layout return, we should be resetting the state to ensure they
eventually get returned as well.
Fixes: 68f744797edd ("pNFS: Do not free layout segments that are marked for return")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"Fix to zone block devices to make the maximum segment count match what
the block layer is capable of"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd_zbc: block: Respect bio vector limits for REPORT ZONES buffer
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- fixes for atomic writes (Alan Adamson)
- fixes for polled CQs in nvmet-epf (Damien Le Moal)
- fix for polled CQs in nvme-pci (Keith Busch)
- fix compile on odd configs that need to be forced to inline
(Kees Cook)
- one more quirk (Ilya Guterman)
- Fix for missing allocation of an integrity buffer for some cases
- Fix for a regression with ublk command cancelation
* tag 'block-6.15-20250515' of git://git.kernel.dk/linux:
ublk: fix dead loop when canceling io command
nvme-pci: add NVME_QUIRK_NO_DEEPEST_PS quirk for SOLIDIGM P44 Pro
nvme: all namespaces in a subsystem must adhere to a common atomic write size
nvme: multipath: enable BLK_FEAT_ATOMIC_WRITES for multipathing
nvmet: pci-epf: remove NVMET_PCI_EPF_Q_IS_SQ
nvmet: pci-epf: improve debug message
nvmet: pci-epf: cleanup nvmet_pci_epf_raise_irq()
nvmet: pci-epf: do not fall back to using INTX if not supported
nvmet: pci-epf: clear completion queue IRQ flag on delete
nvme-pci: acquire cq_poll_lock in nvme_poll_irqdisable
nvme-pci: make nvme_pci_npages_prp() __always_inline
block: always allocate integrity buffer when required
|
|
Pull io_uring fixes from Jens Axboe:
- Fix a regression with highmem and mapping of regions, where
the coalescing code assumes any page is directly mapped
- Fix an issue with HYBRID_IOPOLL and passthrough commands,
where the timer wasn't always setup correctly
- Fix an issue with fdinfo not correctly locking around reading
the rings, which can be an issue if the ring is being resized
at the same time
* tag 'io_uring-6.15-20250515' of git://git.kernel.dk/linux:
io_uring/fdinfo: grab ctx->uring_lock around io_uring_show_fdinfo()
io_uring/memmap: don't use page_address() on a highmem page
io_uring/uring_cmd: fix hybrid polling initialization issue
|
|
Pull xfs fixes from Carlos Maiolino:
"This includes a bug fix for a possible data corruption vector on the
zoned allocator garbage collector"
* tag 'xfs-fixes-6.15-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Fix comment on xfs_trans_ail_update_bulk()
xfs: Fix a comment on xfs_ail_delete
xfs: Fail remount with noattr2 on a v5 with v4 enabled
xfs: fix zoned GC data corruption due to wrong bv_offset
xfs: free up mp->m_free[0].count in error case
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix ACPI PPTT parsing code to address a regression introduced recently
and add more sanity checking of data supplied by the platform firmware
to avoid using invalid data (Jeremy Linton)"
* tag 'acpi-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PPTT: Fix processor subtable walk
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few small driver specific fixes, the most substantial one being the
Tegra one which fixes spurious errors with default delays for chip
select hold times"
* tag 'spi-fix-v6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-sun4i: fix early activation
spi: tegra114: Use value to check for invalid delays
spi: loopback-test: Do not split 1024-byte hexdumps
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"This fixes an invalid memory access in the MAX20086 driver which could
occur during error handling for failed probe due to a hidden use of
devres in the core DT parsing code"
* tag 'regulator-fix-v6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: max20086: fix invalid memory access
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix an interrupt storm on system wake-up in gpio-pca953x
- fix an out-of-bounds write in gpio-virtuser
- update MAINTAINERS with an entry for the sloppy logic analyzer
* tag 'gpio-fixes-for-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: virtuser: fix potential out-of-bound write
gpio: pca953x: fix IRQ storm on system wake up
MAINTAINERS: add me as maintainer for the gpio sloppy logic analyzer
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A handful small fixes. The only significant change is the fix for MIDI
2.0 UMP handling in ALSA sequencer, but as MIDI 2.0 stuff is still new
and rarely used, the impact should be pretty limited.
Other than that, quirks for USB-audio and a few cosmetic fixes and
changes in drivers that should be safe to apply"
* tag 'sound-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Add sample rate quirk for Microdia JP001 USB Camera
ALSA: es1968: Add error handling for snd_pcm_hw_constraint_pow2()
ALSA: sh: SND_AICA should depend on SH_DMA_API
ALSA: usb-audio: Add sample rate quirk for Audioengine D1
ALSA: ump: Fix a typo of snd_ump_stream_msg_device_info
ALSA/hda: intel-sdw-acpi: Correct sdw_intel_acpi_scan() function parameter
ALSA: seq: Fix delivery of UMP events to group ports
|
|
Similar to many other Lenovo models with AMD chips, the Lenovo
Yoga Pro 7 14ASP9 (product name 83HN) requires a specific quirk
to ensure internal mic detection. This patch adds a quirk fixing this.
Signed-off-by: Talhah Peerbhai <talhah.peerbhai@gmail.com>
Link: https://patch.msgid.link/20250515222741.144616-1-talhah.peerbhai@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
XCVR driver is not only used for i.MX8MP platform, so update driver name
to make it more generic.
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Link: https://patch.msgid.link/20250516080334.3272878-1-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
In the recently added snd_soc_dlc_is_dummy(), the helper uses the .name
and .dai_name fields without checking their validity.
For .name, this field is NULL if the component is matched by .of_node
instead. In fact, only one of these fields may be set. This caused a
NULL pointer dereference on MediaTek MT8195 and MT8188 platforms with
the subsequent conversion to snd_soc_dlc_is_dummy() in their machine
drivers. The codecs are all matches through the device tree, so their
.name fields are empty.
For .dai_name, there are cases where this field is empty, such as for
the following component definitions:
#define COMP_EMPTY() { }
#define COMP_PLATFORM(_name) { .name = _name }
#define COMP_AUX(_name) { .name = _name }
#define COMP_CODEC_CONF(_name) { .name = _name }
Or the single link CPU DAI case in the simple audio card family, as
covered by simple_util_canonicalize_cpu(), in which the .dai_name
field is explicitly cleared.
To fix this, check the validity of the fields before using them in
string comparison.
Fixes: 3e021f3b8115 ("ASoC: soc-utils: add snd_soc_dlc_is_dummy()")
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/20250516050549.407133-1-wenst@chromium.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Commit bbeb69ce3013 ("x86/mm: Remove CONFIG_HIGHMEM64G support") introduces
a new warning message MSG_HIGHMEM_TRIMMED, which accidentally introduces a
duplicated 'for for' in the warning message.
Remove this duplicated word.
This was noticed while reviewing for references to obsolete kernel build
config options.
Fixes: bbeb69ce3013 ("x86/mm: Remove CONFIG_HIGHMEM64G support")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: https://lore.kernel.org/r/20250516090810.556623-1-lukas.bulwahn@redhat.com
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Core Changes:
- Add timeslicing and allocation restriction for SVM
Driver Changes:
- Fix shrinker debugfs name
- Add HW workaround to Xe2
- Fix SVM when mixing GPU and CPU atomics
- Fix per client engine utilization due to active contexts
not saving timestamp with lite restore enabled.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/qil4scyn6ucnt43u5ju64bi7r7n5r36k4pz5rsh2maz7isle6g@lac3jpsjrrvs
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:
dma-buf:
- Avoid memory reordering in fence handling
ivpu:
- Fix buffer size in debugfs code
meson:
- Avoid integer overflow in mode-clock calculations
panel-mipi-dbi:
- Fix output with drm_client_setup_with_fourcc()
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250515125534.GA41174@linux.fritz.box
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.15-2025-05-14:
amdgpu:
- Fix CSA unmap
- Fix MALL size reporting on GFX11.5
- AUX fix
- DCN 3.5 fix
- VRR fix
- DP MST fix
- DML 2.1 fixes
- Silence DP AUX spam
- DCN 4.0.1 cursor fix
- VCN 4.0.5 fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250514185117.758496-1-alexander.deucher@amd.com
|
|
Pull bcachefs fixes from Kent Overstreet:
"The main user reported ones are:
- Fix a btree iterator locking inconsistency that's been causing us
to go emergency read-only in evacuate: "Fix broken btree_path lock
invariants in next_node()"
- Minor btree node cache reclaim tweak that should help with OOMs:
don't set btree nodes as accessed on fill
- Fix a bch2_bkey_clear_rebalance() issue that was causing rebalance
to do needless work"
* tag 'bcachefs-2025-05-15' of git://evilpiepirate.org/bcachefs:
bcachefs: fix wrong arg to fsck_err()
bcachefs: Fix missing commit in backpointer to missing target
bcachefs: Fix accidental O(n^2) in fiemap
bcachefs: Fix set_should_be_locked() call in peek_slot()
bcachefs: Fix self deadlock
bcachefs: Don't set btree nodes as accessed on fill
bcachefs: Fix livelock in journal_entry_open()
bcachefs: Fix broken btree_path lock invariants in next_node()
bcachefs: Don't strip rebalance_opts from indirect extents
|
|
Pull rdma fixes from Jason Gunthorpe:
"Four small fixes for crashes:
- Double free in rxe
- UAF in irdma from early freeing the rf
- Off by one undoing the IRQ allocations during error unwind in irdma
- Another race with device rename and uevent generation. uevents
accesses the struct device name and UAF when it is changed"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/core: Fix "KASAN: slab-use-after-free Read in ib_register_device" problem
ice, irdma: fix an off by one in error handling code
irdma: free iwdev->rf after removing MSI-X
RDMA/rxe: Fix slab-use-after-free Read in rxe_queue_cleanup bug
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fixes from Mickaël Salaün:
"This fixes a KUnit issue, simplifies code, and adds new tests"
* tag 'landlock-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Improve bit operations in audit code
landlock: Remove KUnit test that triggers a warning
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Benjamin Tissoires:
- fix a few potential memory leaks in the wacom driver (Qasim Ijaz)
- AMD SFH fixes when there is only one SRA sensor (Mario Limonciello)
- HID-BPF dispatch UAF fix that happens on removal of the Logitech DJ
receiver (Rong Zhang)
- various minor fixes and usual device ID additions
* tag 'hid-for-linus-2025051501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: bpf: abort dispatch if device destroyed
HID: quirks: Add ADATA XPG alpha wireless mouse support
HID: hid-steam: Remove the unused variable connected
HID: amd_sfh: Avoid clearing reports for SRA sensor
HID: amd_sfh: Fix SRA sensor when it's the only sensor
HID: wacom: fix shift OOB in kfifo allocation for zero pktlen
HID: uclogic: Add NULL check in uclogic_input_configured()
HID: wacom: fix memory leak on size mismatch in wacom_wac_queue_flush()
HID: wacom: handle kzalloc() allocation failure in wacom_wac_queue_flush()
HID: thrustmaster: fix memory leak in thrustmaster_interrupts()
HID: hid-appletb-kbd: Fix wrong date and kernel version in sysfs interface docs
HID: bpf: fix BTN_STYLUS for the XP Pen ACK05 remote
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from Bluetooth and wireless.
A few more fixes for the locking changes trickling in. Nothing too
alarming, I suspect those will continue for another release. Other
than that things are slowing down nicely.
Current release - fix to a fix:
- Bluetooth: hci_event: use key encryption size when its known
- tools: ynl-gen: allow multi-attr without nested-attributes again
Current release - regressions:
- locking fixes:
- lock lower level devices when updating features
- eth: bnxt_en: bring back rtnl_lock() in the bnxt_open() path
- devmem: fix panic when Netlink socket closes after module unload
Current release - new code bugs:
- eth: txgbe: fixes for FW communication on new AML devices
Previous releases - always broken:
- sched: flush gso_skb list too during ->change(), avoid potential
null-deref on reconfig
- wifi: mt76: disable NAPI on driver removal
- hv_netvsc: fix error 'nvsp_rndis_pkt_complete error status: 2'"
* tag 'net-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits)
net: devmem: fix kernel panic when netlink socket close after module unload
tsnep: fix timestamping with a stacked DSA driver
net/tls: fix kernel panic when alloc_page failed
bnxt_en: bring back rtnl_lock() in the bnxt_open() path
mlxsw: spectrum_router: Fix use-after-free when deleting GRE net devices
wifi: mac80211: Set n_channels after allocating struct cfg80211_scan_request
octeontx2-pf: Do not reallocate all ntuple filters
wifi: mt76: mt7925: fix missing hdr_trans_tlv command for broadcast wtbl
wifi: mt76: disable napi on driver removal
Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer()
hv_netvsc: Remove rmsg_pgcnt
hv_netvsc: Preserve contiguous PFN grouping in the page buffer array
hv_netvsc: Use vmbus_sendpacket_mpb_desc() to send VMBus messages
Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple ranges
octeontx2-af: Fix CGX Receive counters
net: ethernet: mtk_eth_soc: fix typo for declaration MT7988 ESW capability
net: libwx: Fix FW mailbox unknown command
net: libwx: Fix FW mailbox reply timeout
net: txgbe: Fix to calculate EEPROM checksum for AML devices
octeontx2-pf: macsec: Fix incorrect max transmit size in TX secy
...
|
|
Commit:
f40139fde527 ("ublk: fix race between io_uring_cmd_complete_in_task and
ublk_cancel_cmd")
adds a request state check in ublk_cancel_cmd(), and if the request is
started, skips canceling this uring_cmd.
However, the current uring_cmd may be in ACTIVE state, without block
request coming to the uring command. Meantime, if the cached request in
tag_set.tags[tag] has been delivered to ublk server and reycycled, then
this uring_cmd can't be canceled.
ublk requests are aborted in ublk char device release handler, which
depends on canceling all ACTIVE uring_cmd. So it causes a dead loop.
Fix this issue by not taking a stale request into account when canceling
uring_cmd in ublk_cancel_cmd().
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Closes: https://lore.kernel.org/linux-block/mruqwpf4tqenkbtgezv5oxwq7ngyq24jzeyqy4ixzvivatbbxv@4oh2wzz4e6qn/
Fixes: f40139fde527 ("ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250515162601.77346-1-ming.lei@redhat.com
[axboe: rewording of commit message]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently, using PEBS-via-PT with a sample frequency instead of a sample
period, causes a segfault. For example:
BUG: kernel NULL pointer dereference, address: 0000000000000195
<NMI>
? __die_body.cold+0x19/0x27
? page_fault_oops+0xca/0x290
? exc_page_fault+0x7e/0x1b0
? asm_exc_page_fault+0x26/0x30
? intel_pmu_pebs_event_update_no_drain+0x40/0x60
? intel_pmu_pebs_event_update_no_drain+0x32/0x60
intel_pmu_drain_pebs_icl+0x333/0x350
handle_pmi_common+0x272/0x3c0
intel_pmu_handle_irq+0x10a/0x2e0
perf_event_nmi_handler+0x2a/0x50
That happens because intel_pmu_pebs_event_update_no_drain() assumes all the
pebs_enabled bits represent counter indexes, which is not always the case.
In this particular case, bits 60 and 61 are set for PEBS-via-PT purposes.
The behaviour of PEBS-via-PT with sample frequency is questionable because
although a PMI is generated (PEBS_PMI_AFTER_EACH_RECORD), the period is not
adjusted anyway.
Putting that aside, fix intel_pmu_pebs_event_update_no_drain() by passing
the mask of counter bits instead of 'size'. Note, prior to the Fixes
commit, 'size' would be limited to the maximum counter index, so the issue
was not hit.
Fixes: 722e42e45c2f1 ("perf/x86: Support counter mask")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20250508134452.73960-1-adrian.hunter@intel.com
|
|
Kernel panic occurs when a devmem TCP socket is closed after NIC module
is unloaded.
This is Devmem TCP unregistration scenarios. number is an order.
(a)netlink socket close (b)pp destroy (c)uninstall result
1 2 3 OK
1 3 2 (d)Impossible
2 1 3 OK
3 1 2 (e)Kernel panic
2 3 1 (d)Impossible
3 2 1 (d)Impossible
(a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
closed.
(b) page_pool_destroy() is called when the interface is down.
(c) mp_ops->uninstall() is called when an interface is unregistered.
(d) There is no scenario in mp_ops->uninstall() is called before
page_pool_destroy().
Because unregister_netdevice_many_notify() closes interfaces first
and then calls mp_ops->uninstall().
(e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire
netdev_lock().
But if the interface module has already been removed, net_device
pointer is invalid, so it causes kernel panic.
In summary, there are only 3 possible scenarios.
A. sk close -> pp destroy -> uninstall.
B. pp destroy -> sk close -> uninstall.
C. pp destroy -> uninstall -> sk close.
Case C is a kernel panic scenario.
In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set
binding->dev to NULL.
It indicates an bound net_device was unregistered.
It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock()
if binding->dev is NULL.
A new binding->lock is added to protect a dev of a binding.
So, lock ordering is like below.
priv->lock
netdev_lock(dev)
binding->lock
Tests:
Scenario A:
./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
-v 7 -t 1 -q 1 &
pid=$!
sleep 10
kill $pid
ip link set $interface down
modprobe -rv $module
Scenario B:
./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
-v 7 -t 1 -q 1 &
pid=$!
sleep 10
ip link set $interface down
kill $pid
modprobe -rv $module
Scenario C:
./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
-v 7 -t 1 -q 1 &
pid=$!
sleep 10
modprobe -rv $module
sleep 5
kill $pid
Splat looks like:
Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf]
CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G B W 6.15.0-rc1+ #2 PREEMPT(undef) 0947ec89efa0fd68838b78e36aa1617e97ff5d7f
Tainted: [B]=BAD_PAGE, [W]=WARN
RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746)
Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f
RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811
RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc
RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4
R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e
R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88
FS: 0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<TASK>
...
netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3))
genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705)
...
netlink_release (net/netlink/af_netlink.c:737)
...
__sock_release (net/socket.c:647)
sock_close (net/socket.c:1393)
Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250514154028.1062909-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This driver is susceptible to a form of the bug explained in commit
c26a2c2ddc01 ("gianfar: Fix TX timestamping with a stacked DSA driver")
and in Documentation/networking/timestamping.rst section "Other caveats
for MAC drivers", specifically it timestamps any skb which has
SKBTX_HW_TSTAMP, and does not consider if timestamping has been enabled
in adapter->hwtstamp_config.tx_type.
Evaluate the proper TX timestamping condition only once on the TX
path (in tsnep_xmit_frame_ring()) and store the result in an additional
TX entry flag. Evaluate the new TX entry flag in the TX confirmation path
(in tsnep_tx_poll()).
This way SKBTX_IN_PROGRESS is set by the driver as required, but never
evaluated. SKBTX_IN_PROGRESS shall not be evaluated as it can be set
by a stacked DSA driver and evaluating it would lead to unwanted
timestamps.
Fixes: 403f69bbdbad ("tsnep: Add TSN endpoint Ethernet MAC driver")
Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250514195657.25874-1-gerhard@engleder-embedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We cannot set frag_list to NULL pointer when alloc_page failed.
It will be used in tls_strp_check_queue_ok when the next time
tls_strp_read_sock is called.
This is because we don't reset full_len in tls_strp_flush_anchor_copy()
so the recv path will try to continue handling the partial record
on the next call but we dettached the rcvq from the frag list.
Alternative fix would be to reset full_len.
Unable to handle kernel NULL pointer dereference
at virtual address 0000000000000028
Call trace:
tls_strp_check_rcv+0x128/0x27c
tls_strp_data_ready+0x34/0x44
tls_data_ready+0x3c/0x1f0
tcp_data_ready+0x9c/0xe4
tcp_data_queue+0xf6c/0x12d0
tcp_rcv_established+0x52c/0x798
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Signed-off-by: Pengtao He <hept.hept.hept@gmail.com>
Link: https://patch.msgid.link/20250514132013.17274-1-hept.hept.hept@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Couple of stragglers:
- mac80211: fix syzbot/ubsan in scan counted-by
- mt76: fix NAPI handling on driver remove
- mt67: fix multicast/ipv6 receive
* tag 'wireless-2025-05-15' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: Set n_channels after allocating struct cfg80211_scan_request
wifi: mt76: mt7925: fix missing hdr_trans_tlv command for broadcast wtbl
wifi: mt76: disable napi on driver removal
====================
Link: https://patch.msgid.link/20250515121749.61912-4-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Error recovery, PCIe AER, resume, and TX timeout will invoke bnxt_open()
with netdev_lock only. This will cause RTNL assert failure in
netif_set_real_num_tx_queues(), netif_set_real_num_tx_queues(),
and netif_set_real_num_tx_queues().
Example error recovery assert:
RTNL: assertion failed at net/core/dev.c (3178)
WARNING: CPU: 3 PID: 3392 at net/core/dev.c:3178 netif_set_real_num_tx_queues+0x1fd/0x210
Call Trace:
<TASK>
? __pfx_bnxt_msix+0x10/0x10 [bnxt_en]
__bnxt_open_nic+0x1ef/0xb20 [bnxt_en]
bnxt_open+0xda/0x130 [bnxt_en]
bnxt_fw_reset_task+0x21f/0x780 [bnxt_en]
process_scheduled_works+0x9d/0x400
For now, bring back rtnl_lock() in all these code paths that can invoke
bnxt_open(). In the bnxt_queue_start() error path, we don't have
rtnl_lock held so we just change it to call netif_close() instead of
bnxt_reset_task() for simplicity. This error path is unlikely so it
should be fine.
Fixes: 004b5008016a ("eth: bnxt: remove most dependencies on RTNL")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250514062908.2766677-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The driver only offloads neighbors that are constructed on top of net
devices registered by it or their uppers (which are all Ethernet). The
device supports GRE encapsulation and decapsulation of forwarded
traffic, but the driver will not offload dummy neighbors constructed on
top of GRE net devices as they are not uppers of its net devices:
# ip link add name gre1 up type gre tos inherit local 192.0.2.1 remote 198.51.100.1
# ip neigh add 0.0.0.0 lladdr 0.0.0.0 nud noarp dev gre1
$ ip neigh show dev gre1 nud noarp
0.0.0.0 lladdr 0.0.0.0 NOARP
(Note that the neighbor is not marked with 'offload')
When the driver is reloaded and the existing configuration is replayed,
the driver does not perform the same check regarding existing neighbors
and offloads the previously added one:
# devlink dev reload pci/0000:01:00.0
$ ip neigh show dev gre1 nud noarp
0.0.0.0 lladdr 0.0.0.0 offload NOARP
If the neighbor is later deleted, the driver will ignore the
notification (given the GRE net device is not its upper) and will
therefore keep referencing freed memory, resulting in a use-after-free
[1] when the net device is deleted:
# ip neigh del 0.0.0.0 lladdr 0.0.0.0 dev gre1
# ip link del dev gre1
Fix by skipping neighbor replay if the net device for which the replay
is performed is not our upper.
[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_neigh_entry_update+0x1ea/0x200
Read of size 8 at addr ffff888155b0e420 by task ip/2282
[...]
Call Trace:
<TASK>
dump_stack_lvl+0x6f/0xa0
print_address_description.constprop.0+0x6f/0x350
print_report+0x108/0x205
kasan_report+0xdf/0x110
mlxsw_sp_neigh_entry_update+0x1ea/0x200
mlxsw_sp_router_rif_gone_sync+0x2a8/0x440
mlxsw_sp_rif_destroy+0x1e9/0x750
mlxsw_sp_netdevice_ipip_ol_event+0x3c9/0xdc0
mlxsw_sp_router_netdevice_event+0x3ac/0x15e0
notifier_call_chain+0xca/0x150
call_netdevice_notifiers_info+0x7f/0x100
unregister_netdevice_many_notify+0xc8c/0x1d90
rtnl_dellink+0x34e/0xa50
rtnetlink_rcv_msg+0x6fb/0xb70
netlink_rcv_skb+0x131/0x360
netlink_unicast+0x426/0x710
netlink_sendmsg+0x75a/0xc20
__sock_sendmsg+0xc1/0x150
____sys_sendmsg+0x5aa/0x7b0
___sys_sendmsg+0xfc/0x180
__sys_sendmsg+0x121/0x1b0
do_syscall_64+0xbb/0x1d0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Fixes: 8fdb09a7674c ("mlxsw: spectrum_router: Replay neighbours when RIF is made")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/c53c02c904fde32dad484657be3b1477884e9ad6.1747225701.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When starting the local sync timer to synchronize the state of a remote
CPU it should be added on the CPU to be synchronized, not the initiating
CPU. This results in interrupt delivery being delayed until the timer
eventually runs (due to another mask/unmask/migrate operation) on the
target CPU.
Fixes: 0f67911e821c ("irqchip/riscv-imsic: Separate next and previous pointers in IMSIC vector")
Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/all/20250514171320.3494917-1-abrestic@rivosinc.com
|
|
Pull NVMe fixes from Christoph:
"nvme fixes for linux 6.15
- fixes for atomic writes (Alan Adamson)
- fixes for polled CQs in nvmet-epf (Damien Le Moal)
- fix for polled CQs in nvme-pci (Keith Busch)
- fix compile on odd configs that need to be forced to inline
(Kees Cook)
- one more quirk (Ilya Guterman)"
* tag 'nvme-6.15-2025-05-15' of git://git.infradead.org/nvme:
nvme-pci: add NVME_QUIRK_NO_DEEPEST_PS quirk for SOLIDIGM P44 Pro
nvme: all namespaces in a subsystem must adhere to a common atomic write size
nvme: multipath: enable BLK_FEAT_ATOMIC_WRITES for multipathing
nvmet: pci-epf: remove NVMET_PCI_EPF_Q_IS_SQ
nvmet: pci-epf: improve debug message
nvmet: pci-epf: cleanup nvmet_pci_epf_raise_irq()
nvmet: pci-epf: do not fall back to using INTX if not supported
nvmet: pci-epf: clear completion queue IRQ flag on delete
nvme-pci: acquire cq_poll_lock in nvme_poll_irqdisable
nvme-pci: make nvme_pci_npages_prp() __always_inline
|
|
Merge series from Chen-Yu Tsai <wenst@chromium.org>:
This series is meant as an example on how to use macros and range cases
to shorten the MediaTek audio frontend drivers. The drivers have large
tables describing the registers and register fields for every supported
audio DMA interface. (Some are actually skipped!) There's a lot of
duplication which can be eliminated using macros. This should serve as
a reference for the MT8196 AFE driver that I had commented on.
The three patches tackle separate tables in the driver. The remaining
one that could be tackled is the list of DAIs; but that one has more
differences between each entry, so I haven't done it yet.
|
|
Felix Fietkau says:
===================
mt76 fix for 6.15
- disable napi on driver removal to fix warning
- fix multicast rx regression on mt7925
===================
Link: https://patch.msgid.link/3b526d06-b717-4d47-817c-a9f47b796a31@nbd.name/
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Make sure that n_channels is set after allocating the
struct cfg80211_registered_device::int_scan_req member. Seen with
syzkaller:
UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:1208:5
index 0 is out of range for type 'struct ieee80211_channel *[] __counted_by(n_channels)' (aka 'struct ieee80211_channel *[]')
This was missed in the initial conversions because I failed to locate
the allocation likely due to the "sizeof(void *)" not matching the
"channels" array type.
Reported-by: syzbot+4bcdddd48bb6f0be0da1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/680fd171.050a0220.2b69d1.045e.GAE@google.com/
Fixes: e3eac9f32ec0 ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by")
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/20250509184641.work.542-kees@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Microdia JP001 does not support reading the sample rate which leads to
many lines of "cannot get freq at ep 0x84".
This patch adds the USB ID to quirks.c and avoids those error messages.
usb 7-4: New USB device found, idVendor=0c45, idProduct=636b, bcdDevice= 1.00
usb 7-4: New USB device strings: Mfr=2, Product=1, SerialNumber=3
usb 7-4: Product: JP001
usb 7-4: Manufacturer: JP001
usb 7-4: SerialNumber: JP001
usb 7-4: 3:1: cannot get freq at ep 0x84
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicolas Chauvet <kwizart@gmail.com>
Link: https://patch.msgid.link/20250515102132.73062-1-kwizart@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Commit 157ae5ffd76a dmaengine: mediatek: Fix a possible deadlock error
in mtk_cqdma_tx_status() fixed locks but kept unused varibale leading to
warning and build failure (due to warning treated as errors)
drivers/dma/mediatek/mtk-cqdma.c: In function 'mtk_cqdma_find_active_desc':
drivers/dma/mediatek/mtk-cqdma.c:423:23: error: unused variable 'flags' [-Werror=unused-variable]
423 | unsigned long flags;
| ^~~~~
Fix by dropping this unused flag
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 157ae5ffd76a ("dmaengine: mediatek: Fix a possible deadlock error in mtk_cqdma_tx_status()")
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
If ntuple filters count is modified followed by
unicast filters count using devlink then the ntuple count
set by user is ignored and all the ntuple filters are
being reallocated. Fix this by storing the ntuple count
set by user. Without this patch, say if user tries
to modify ntuple count as 8 followed by ucast filter count as 4
using devlink commands then ntuple count is being reverted to
default value 16 i.e, not retaining user set value 8.
Fixes: 39c469188b6d ("octeontx2-pf: Add ucast filter count configurability via devlink.")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1747054357-5850-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Merge series from Simon Trimmer <simont@opensource.cirrus.com>:
These two patches introduce a log message when provisioning the cs35l56
family of devices that uniquely identifies the firmware tuning.
|
|
Merge series from Zhang Yi <zhangyi@everest-semi.com>:
The driver is for codec ES8389 of everest-semi.
|
|
Ensure that the hdr_trans_tlv command is included in the broadcast wtbl to
prevent the IPv6 and multicast packet from being dropped by the chip.
Cc: stable@vger.kernel.org
Fixes: cb1353ef3473 ("wifi: mt76: mt7925: integrate *mlo_sta_cmd and *sta_cmd")
Reported-by: Benjamin Xiao <fossben@pm.me>
Tested-by: Niklas Schnelle <niks@kernel.org>
Signed-off-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com>
Link: https://lore.kernel.org/lkml/EmWnO5b-acRH1TXbGnkx41eJw654vmCR-8_xMBaPMwexCnfkvKCdlU5u19CGbaapJ3KRu-l3B-tSUhf8CCQwL0odjo6Cd5YG5lvNeB-vfdg=@pm.me/
Link: https://patch.msgid.link/20250509010421.403022-1-mingyen.hsieh@mediatek.com
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
A warning on driver removal started occurring after commit 9dd05df8403b
("net: warn if NAPI instance wasn't shut down"). Disable tx napi before
deleting it in mt76_dma_cleanup().
WARNING: CPU: 4 PID: 18828 at net/core/dev.c:7288 __netif_napi_del_locked+0xf0/0x100
CPU: 4 UID: 0 PID: 18828 Comm: modprobe Not tainted 6.15.0-rc4 #4 PREEMPT(lazy)
Hardware name: ASUS System Product Name/PRIME X670E-PRO WIFI, BIOS 3035 09/05/2024
RIP: 0010:__netif_napi_del_locked+0xf0/0x100
Call Trace:
<TASK>
mt76_dma_cleanup+0x54/0x2f0 [mt76]
mt7921_pci_remove+0xd5/0x190 [mt7921e]
pci_device_remove+0x47/0xc0
device_release_driver_internal+0x19e/0x200
driver_detach+0x48/0x90
bus_remove_driver+0x6d/0xf0
pci_unregister_driver+0x2e/0xb0
__do_sys_delete_module.isra.0+0x197/0x2e0
do_syscall_64+0x7b/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Tested with mt7921e but the same pattern can be actually applied to other
mt76 drivers calling mt76_dma_cleanup() during removal. Tx napi is enabled
in their *_dma_init() functions and only toggled off and on again inside
their suspend/resume/reset paths. So it should be okay to disable tx
napi in such a generic way.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 2ac515a5d74f ("mt76: mt76x02: use napi polling for tx cleanup")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Tested-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com>
Link: https://patch.msgid.link/20250506115540.19045-1-pchelkin@ispras.ru
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
mt8183_is_volatile_reg() is a large switch-case block that lists out
every register that is volatile. Since many pairs of registers have
consecutive addresses, the cases can be compressed down with the
ellipsis, i.e. GCC extension "case ranges" [1] to cover more addresses
in one case, shortening the source code.
This is not completely the same, since the addresses are 4-byte aligned,
and using the case ranges feature adds all unaligned addresses in
between. In practice this doesn't matter since the unaligned addresses
are blocked by the regmap core. This also ends up compiling slightly
smaller with a reduction of 128 bytes in the text section.
[1] https://gcc.gnu.org/onlinedocs/gcc/Case-Ranges.html
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/20250515073825.4155297-4-wenst@chromium.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The irq_data table describes all the supported interrupts for the audio
frontend. The parameters are either the same or can be derived from the
interrupt number. This results in a very long table (in source code)
that can be shortened with macros.
Do just that.
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/20250515073825.4155297-3-wenst@chromium.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The memif_data table describes all the supported PCM channels for the
audio frontend. Most of the fields are either the same or can be derived
from the interface's name. This results in a very long table (in source
code) that can be shortened with macros.
Do just that. Some "convenience" macros were added to cover non-existent
register fields that would otherwise require multiple layers of macros
to handle.
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/20250515073825.4155297-2-wenst@chromium.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Change the port enable failure error message format specifier to make
it less confusing.
Take the chance to align the style ('fail'->'Failed') while at it.
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20250514-topic-asoc_print_hexdec-v1-1-85e90947ec4f@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Add proper pahole version dependency to CONFIG_GENDWARFKSYMS to avoid
module loading errors
- Fix UAPI header tests for the OpenRISC architecture
- Add dependency on the libdw package in Debian and RPM packages
- Disable -Wdefault-const-init-unsafe warnings on Clang
- Make "make clean ARCH=um" also clean the arch/x86/ directory
- Revert the use of -fmacro-prefix-map=, which causes issues with
debugger usability
* tag 'kbuild-fixes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: fix typos "module.builtin" to "modules.builtin"
Revert "kbuild, rust: use -fremap-path-prefix to make paths relative"
Revert "kbuild: make all file references relative to source root"
kbuild: fix dependency on sorttable
init: remove unused CONFIG_CC_CAN_LINK_STATIC
um: let 'make clean' properly clean underlying SUBARCH as well
kbuild: Disable -Wdefault-const-init-unsafe
kbuild: rpm-pkg: Add (elfutils-devel or libdw-devel) to BuildRequires
kbuild: deb-pkg: Add libdw-dev:native to Build-Depends-Arch
usr/include: openrisc: don't HDRTEST bpf_perf_event.h
kbuild: Require pahole <v1.28 or >v1.29 with GENDWARFKSYMS on X86
|
|
Michael Kelley says:
====================
hv_netvsc: Fix error "nvsp_rndis_pkt_complete error status: 2"
Starting with commit dca5161f9bd0 in the 6.3 kernel, the Linux driver
for Hyper-V synthetic networking (netvsc) occasionally reports
"nvsp_rndis_pkt_complete error status: 2".[1] This error indicates
that Hyper-V has rejected a network packet transmit request from the
guest, and the outgoing network packet is dropped. Higher level
network protocols presumably recover and resend the packet so there is
no functional error, but performance is slightly impacted. Commit
dca5161f9bd0 is not the cause of the error -- it only added reporting
of an error that was already happening without any notice. The error
has presumably been present since the netvsc driver was originally
introduced into Linux.
This patch set fixes the root cause of the problem, which is that the
netvsc driver in Linux may send an incorrectly formatted VMBus message
to Hyper-V when transmitting the network packet. The incorrect
formatting occurs when the rndis header of the VMBus message crosses a
page boundary due to how the Linux skb head memory is aligned. In such
a case, two PFNs are required to describe the location of the rndis
header, even though they are contiguous in guest physical address
(GPA) space. Hyper-V requires that two PFNs be in a single "GPA range"
data struture, but current netvsc code puts each PFN in its own GPA
range, which Hyper-V rejects as an error in the case of the rndis
header.
The incorrect formatting occurs only for larger packets that netvsc
must transmit via a VMBus "GPA Direct" message. There's no problem
when netvsc transmits a smaller packet by copying it into a pre-
allocated send buffer slot because the pre-allocated slots don't have
page crossing issues.
After commit 14ad6ed30a10 in the 6.14 kernel, the error occurs much
more frequently in VMs with 16 or more vCPUs. It may occur every few
seconds, or even more frequently, in a ssh session that outputs a lot
of text. Commit 14ad6ed30a10 subtly changes how skb head memory is
allocated, making it much more likely that the rndis header will cross
a page boundary when the vCPU count is 16 or more. The changes in
commit 14ad6ed30a10 are perfectly valid -- they just had the side
effect of making the netvsc bug more prominent.
One fix is to check for adjacent PFNs in vmbus_sendpacket_pagebuffer()
and just combine them into a single GPA range. Such a fix is very
contained. But conceptually it is fixing the problem at the wrong
level. So this patch set takes the broader approach of maintaining
the already known grouping of contiguous PFNs at a higher level in
the netvsc driver code, and propagating that grouping down to the
creation of the VMBus message to send to Hyper-V. Maintaining the
grouping fixes this problem, and has the added benefit of allowing
netvsc_dma_map() to make fewer calls to dma_map_single() to do bounce
buffering in CoCo VMs.
Patch 1 is a preparatory change to allow vmbus_sendpacket_mpb_desc()
to specify multiple GPA ranges. In current code
vmbus_sendpacket_mpb_desc() is used only by the storvsc synthetic SCSI
driver, and it always creates a single GPA range.
Patch 2 updates the netvsc driver to use vmbus_sendpacket_mpb_desc()
instead of vmbus_sendpacket_pagebuffer(). Because the higher levels of
netvsc still don't group contiguous PFNs, this patch is functionally
neutral. The VMBus message to Hyper-V still has many GPA ranges, each
with a single PFN. But it lays the groundwork for the next patch.
Patch 3 changes the higher levels of netvsc to preserve the already
known grouping of contiguous PFNs. When the contiguous groupings are
passed to vmbus_sendpacket_mpb_desc(), GPA ranges containing multiple
PFNs are produced, as expected by Hyper-V. This is point at which the
core problem is fixed.
Patches 4 and 5 remove code that is no longer necessary after the
previous patches.
These changes provide a net reduction of about 65 lines of code, which
is an added benefit.
These changes have been tested in normal VMs, in SEV-SNP and TDX CoCo
VMs, and in Dv6-series VMs where the netvsp implementation is in the
OpenHCL paravisor instead of the Hyper-V host.
These changes are built against kernel version 6.15-rc6.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217503
====================
Link: https://patch.msgid.link/20250513000604.1396-1-mhklinux@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|