|
Lists should have a fixed number of items, so add the missing constraint to
the 'reg' property (only one address space entry).
Fixes: c5eda0333076 ("dt-bindings: i2c: Add Realtek RTL I2C Controller")
Cc: <stable@vger.kernel.org> # v6.13+
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20250702061530.6940-2-krzysztof.kozlowski@linaro.org
|
|
Making anonymous inodes regular files comes with a lot of risk and
regression potential, as evidenced by a recent hiccup in io_uring. We're
better off continuing to not have them be regular files. Since we have
S_ANON_INODE we can port all of our assertions easily.
Link: https://lore.kernel.org/20250702-work-fixes-v1-1-ff76ea589e33@kernel.org
Fixes: cfd86ef7e8e7 ("anon_inode: use a proper mode internally")
Acked-by: Jens Axboe <axboe@kernel.dk>
Cc: stable@kernel.org
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The dma_map_XXX() functions return DMA_MAPPING_ERROR as their error value,
which is often ~0. The error value should be tested with
dma_mapping_error().
This patch creates a new function in niu_ops to test whether the mapping
failed. The test is fixed in niu_rbr_add_page(), added in
niu_start_xmit(), and the successfully mapped pages are unmapped upon error.
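For reference, the usual shape of such a check (a generic sketch, not the niu
driver's exact code; 'dev', 'buf' and 'len' are placeholders):

    dma_addr_t addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

    if (dma_mapping_error(dev, addr)) {
            /* mapping failed: addr must not be used; unwind anything
             * already mapped and bail out */
            return -ENOMEM;
    }
    /* ... program addr into the descriptor, dma_unmap_single() later ... */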
Fixes: ec2deec1f352 ("niu: Fix to check for dma mapping errors.")
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Many error paths in tlmi_sysfs_init() lead to sysfs groups being removed
when they were not even created.
Fix this by letting the kobject core manage these groups through their
kobj_type's default_groups.
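For illustration, the general pattern looks roughly like this (names are made
up for the sketch, not the actual think-lmi structures):

    static struct attribute *example_attrs[] = {
            &example_attr.attr,
            NULL,
    };
    ATTRIBUTE_GROUPS(example);              /* provides example_groups */

    static const struct kobj_type example_ktype = {
            .sysfs_ops      = &kobj_sysfs_ops,
            .release        = example_release,      /* placeholder */
            .default_groups = example_groups,       /* created by kobject_add(),
                                                     * removed by kobject_del() */
    };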
Fixes: a40cd7ef22fb ("platform/x86: think-lmi: Add WMI interface support on Lenovo platforms")
Cc: stable@vger.kernel.org
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Link: https://lore.kernel.org/r/20250630-lmi-fix-v3-3-ce4f81c9c481@gmail.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
In tlmi_analyze(), allocated structs with an embedded kobject are freed
in error paths after they were already initialized.
Fix this first by avoiding the initialization of kobjects in
tlmi_analyze() and then by correctly cleaning them up in
tlmi_release_attr() using their kset's kobject list.
Fixes: a40cd7ef22fb ("platform/x86: think-lmi: Add WMI interface support on Lenovo platforms")
Fixes: 30e78435d3bf ("platform/x86: think-lmi: Split kobject_init() and kobject_add() calls")
Cc: stable@vger.kernel.org
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Link: https://lore.kernel.org/r/20250630-lmi-fix-v3-2-ce4f81c9c481@gmail.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Avoid entering tlmi_release_attr() in error paths if both ksets are not
yet created.
This is accomplished by initializing them side by side.
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Link: https://lore.kernel.org/r/20250630-lmi-fix-v3-1-ce4f81c9c481@gmail.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
A GEM handle can be released while the GEM buffer object is attached
to a DRM framebuffer. This leads to the release of the dma-buf backing
the buffer object, if any. [1] Trying to use the framebuffer in further
mode-setting operations leads to a segmentation fault. This most easily
happens with drivers that use shadow planes for vmap-ing the dma-buf
during a page flip. An example is shown below.
[ 156.791968] ------------[ cut here ]------------
[ 156.796830] WARNING: CPU: 2 PID: 2255 at drivers/dma-buf/dma-buf.c:1527 dma_buf_vmap+0x224/0x430
[...]
[ 156.942028] RIP: 0010:dma_buf_vmap+0x224/0x430
[ 157.043420] Call Trace:
[ 157.045898] <TASK>
[ 157.048030] ? show_trace_log_lvl+0x1af/0x2c0
[ 157.052436] ? show_trace_log_lvl+0x1af/0x2c0
[ 157.056836] ? show_trace_log_lvl+0x1af/0x2c0
[ 157.061253] ? drm_gem_shmem_vmap+0x74/0x710
[ 157.065567] ? dma_buf_vmap+0x224/0x430
[ 157.069446] ? __warn.cold+0x58/0xe4
[ 157.073061] ? dma_buf_vmap+0x224/0x430
[ 157.077111] ? report_bug+0x1dd/0x390
[ 157.080842] ? handle_bug+0x5e/0xa0
[ 157.084389] ? exc_invalid_op+0x14/0x50
[ 157.088291] ? asm_exc_invalid_op+0x16/0x20
[ 157.092548] ? dma_buf_vmap+0x224/0x430
[ 157.096663] ? dma_resv_get_singleton+0x6d/0x230
[ 157.101341] ? __pfx_dma_buf_vmap+0x10/0x10
[ 157.105588] ? __pfx_dma_resv_get_singleton+0x10/0x10
[ 157.110697] drm_gem_shmem_vmap+0x74/0x710
[ 157.114866] drm_gem_vmap+0xa9/0x1b0
[ 157.118763] drm_gem_vmap_unlocked+0x46/0xa0
[ 157.123086] drm_gem_fb_vmap+0xab/0x300
[ 157.126979] drm_atomic_helper_prepare_planes.part.0+0x487/0xb10
[ 157.133032] ? lockdep_init_map_type+0x19d/0x880
[ 157.137701] drm_atomic_helper_commit+0x13d/0x2e0
[ 157.142671] ? drm_atomic_nonblocking_commit+0xa0/0x180
[ 157.147988] drm_mode_atomic_ioctl+0x766/0xe40
[...]
[ 157.346424] ---[ end trace 0000000000000000 ]---
Acquiring GEM handles for the framebuffer's GEM buffer objects prevents
this from happening. The framebuffer's cleanup later puts the handle
references.
Commit 1a148af06000 ("drm/gem-shmem: Use dma_buf from GEM object
instance") triggers the segmentation fault easily by using the dma-buf
field more widely. The underlying issue with reference counting has
been present before.
v2:
- acquire the handle instead of the BO (Christian)
- fix comment style (Christian)
- drop the Fixes tag (Christian)
- rename err_ gotos
- add missing Link tag
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://elixir.bootlin.com/linux/v6.15/source/drivers/gpu/drm/drm_gem.c#L241 # [1]
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Anusha Srivatsa <asrivats@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: <stable@vger.kernel.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250630084001.293053-1-tzimmermann@suse.de
|
|
There are two bugs in rose_rt_device_down() that can cause
use-after-free:
1. The loop bound `t->count` is modified within the loop, which can
cause the loop to terminate early and miss some entries.
2. When removing an entry from the neighbour array, the subsequent entries
are moved up to fill the gap, but the loop index `i` is still
incremented, causing the next entry to be skipped.
For example, if a node has three neighbours (A, A, B) with count=3 and A
is being removed, the second A is not checked.
i=0: (A, A, B) -> (A, B) with count=2
      ^ checked
i=1: (A, B) -> (A, B) with count=2
         ^ checked (B, not A!)
i=2: (doesn't occur because i < count is false)
This leaves the second A in the array with count=2, but the rose_neigh
structure has been freed. Code that accesses these entries assumes that
the first `count` entries are valid pointers, causing a use-after-free
when it accesses the dangling pointer.
Fix both issues by iterating over the array in reverse order with a fixed
loop bound. This ensures that all entries are examined and that the removal
of an entry doesn't affect subsequent iterations.
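For illustration, the shape of the fixed loop (a simplified sketch, not the
exact rose_rt_device_down() code):

    int count = t->count;   /* fixed bound, taken before the loop */
    int i;

    for (i = count - 1; i >= 0; i--) {
            if (t->neighbour[i] != s)
                    continue;
            t->count--;
            /* close the gap; entries below i are untouched, entries
             * above i were already examined */
            memmove(&t->neighbour[i], &t->neighbour[i + 1],
                    sizeof(t->neighbour[0]) * (t->count - i));
    }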
Reported-by: syzbot+e04e2c007ba2c80476cb@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e04e2c007ba2c80476cb
Tested-by: syzbot+e04e2c007ba2c80476cb@syzkaller.appspotmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250629030833.6680-1-enjuk@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The comparison in enic_change_mtu() incorrectly used the current
netdev->mtu instead of the new_mtu value when warning about
an MTU exceeding the port MTU. This could suppress valid warnings
or issue incorrect ones.
Fix the condition and the log message to properly reflect new_mtu.
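Sketch of the change (illustrative, not the literal enic code):

    /* before: the warning was based on the old MTU */
    if (netdev->mtu > enic->port_mtu)
            netdev_warn(netdev, "interface MTU (%d) set higher than port MTU (%d)\n",
                        netdev->mtu, enic->port_mtu);

    /* after: check and report the MTU actually being set */
    if (new_mtu > enic->port_mtu)
            netdev_warn(netdev, "interface MTU (%d) set higher than port MTU (%d)\n",
                        new_mtu, enic->port_mtu);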
Fixes: ab123fe071c9 ("enic: handle mtu change for vf properly")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Acked-by: John Daley <johndale@cisco.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250628145612.476096-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.16-2025-07-01:
amdgpu:
- SDMA 5.x reset fix
- Add missing firmware declaration
- Fix leak in amdgpu_ctx_mgr_entity_fini()
- Freesync fix
- OLED backlight fix
amdkfd:
- mtype fix for ext coherent system memory
- MMU notifier fix
- gfx7/8 fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250701192642.32490-1-alexander.deucher@amd.com
|
|
Update the Clause 37 Auto-Negotiation implementation to properly align
with the PCS hardware specifications:
- Fix incorrect bit settings in Link Status and Link Duplex fields
- Implement missing sequence steps 2 and 7
These changes ensure CL37 auto-negotiation protocol follows the exact
sequence patterns as specified in the hardware databook.
Fixes: 1bf40ada6290 ("amd-xgbe: Add support for clause 37 auto-negotiation")
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20250630192636.3838291-1-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Smatch complains that the error message isn't set in the caller:
lib/test_objagg.c:923 test_hints_case2()
error: uninitialized symbol 'errmsg'.
This static checker warning only showed up after a recent refactoring
but the bug dates back to when the code was originally added. This
likely doesn't affect anything in real life.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202506281403.DsuyHFTZ-lkp@intel.com/
Fixes: 0a020d416d0a ("lib: introduce initial implementation of object aggregation manager")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/8548f423-2e3b-4bb7-b816-5041de2762aa@sabinyo.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add an option for completely disabling casefolding on a filesystem, as a
workaround for overlayfs.
This should only be needed as a temporary workaround, until the
overlayfs fix arrives.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Don't mark btree nodes for rewrites, if they are or would be degraded,
if journal replay hasn't finished, to avoid a deadlock.
This is because btree node rewrites generate more updates for the
interior updates (alloc, backpointers), and if those updates touch
new nodes and generate more rewrites - we can only have so many interior
btree updates in flight before we deadlock on open_buckets.
The biggest cause is that we don't use the btree write buffer for the
backpointer updates - this needs some real thought on locking in
order to fix.
The problem with this workaround (not doing the rewrite for degraded
nodes in journal replay) is that those degraded nodes persist, and we
don't want that (this is a real bug when a btree node write completes
with fewer replicas than we wanted and leaves a degraded node due to
device _removal_, i.e. the device went away mid write).
It's less of a bug here, but still a problem because we don't yet
have a way of tracking degraded data - we need another index (all
extents/btree nodes, by replicas entry) in order to fix this properly
(re-replicate degraded data at the earliest possible time).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
A small race exists between spsc_queue_push and the run-job worker, in
which spsc_queue_push may return not-first while the run-job worker has
already idled due to the job count being zero. If this race occurs, job
scheduling stops, leading to hangs while waiting on the job’s DMA
fences.
Seal this race by incrementing the job count before appending to the
SPSC queue.
This race was observed on a drm-tip 6.16-rc1 build with the Xe driver in
an SVM test case.
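The essence of the fix, as a rough sketch (field names follow the description
above, not necessarily the exact spsc_queue code): the count must become
visible before the node is published, so the worker can never observe a
non-empty queue with a zero job count:

    atomic_inc(&queue->job_count);          /* 1: account for the job ...  */
    smp_mb__after_atomic();
    tail = xchg(&queue->tail, &node->next); /* 2: ... then append the node */
    WRITE_ONCE(*tail, node);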
Fixes: 1b1f42d8fde4 ("drm: move amd_gpu_scheduler into common location")
Fixes: 27105db6c63a ("drm/amdgpu: Add SPSC queue to scheduler.")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250613212013.719312-1-matthew.brost@intel.com
|
|
Fix Kconfig symbol dependency on KUNIT, which isn't actually required
for XE to be built-in. However, if KUNIT is enabled, it must be built-in
too.
Fixes: 08987a8b6820 ("drm/xe: Fix build with KUNIT=m")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Harry Austen <hpausten@protonmail.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20250627-xe-kunit-v2-2-756fe5cd56cf@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit a559434880b320b83733d739733250815aecf1b0)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
The xe driver is the official driver for Intel Xe2 and later, while
maintaining experimental support for earlier GPUs. Reword the help
message accordingly.
Reviewed-by: Maarten Lankhorst <dev@lankhorst.se>
Link: https://lore.kernel.org/r/20250611-xe-kconfig-help-v1-1-8bcc6b47d11a@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 1488a3089de3d0bcdc9532da7ce04cf0af9d7dd0)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Limit GT max frequency to 2600MHz and wait for frequency to reduce
before proceeding with a transient flush. This is really only needed for
the transient flush: if L2 flush is needed due to 16023588340 then
there's no need to do this additional wait since we are already using
the bigger hammer.
v2: Use generic names, ensure user set max frequency requests wait
for flush to complete (Rodrigo)
v3:
- User requests wait via wait_var_event_timeout (Lucas)
- Close races on flush + user requests (Lucas)
- Fix xe_guc_pc_remove_flush_freq_limit() being called on last gt
rather than root gt (Lucas)
v4:
- Only apply the freq reducing part if a TDF is needed: L2 flush trumps
the need for waiting a lower frequency
Fixes: aaa08078e725 ("drm/xe/bmg: Apply Wa_22019338487")
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-4-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit deea6a7d6d803d6bb874a3e6f1b312e560e6c6df)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Set GT min frequency to 1200 MHz once driver load is complete.
v2: Review comments (Rodrigo)
v3: Apply Wa earlier so user_req_min is not clobbered.
v4: Apply to all GTs (Lucas)
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250612-wa-14022085890-v4-3-94ba5dcc1e30@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit bdde16c9ac5cb56ad2ee19792222fa1853577af7)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
xe_device_td_flush() has 2 possible implementations: an entire L2 flush
or a transient flush, depending on WA 16023588340. Make this clear by
splitting the function so it calls each of them.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-3-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 5e300ed8a545bdffc26b579c526b5fef7b2d5365)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
pc_set_mert_freq_cap() currently locks/unlocks the mutex multiple times
to stash the current frequencies. It's not a problem since
xe_guc_pc_restore_stashed_freq() is guaranteed to be called only later
in the init sequence. However, now that we have _locked() variants for
these functions, use them and avoid potential issues when called from
other places or using the same pattern.
While at it, prefer an early return for the WA check to reduce
indentation.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-2-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit d878c97daa603573e5af01fd8beec2fffdb42ad1)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
There are places in which the getters/setters are called one after the
other, causing multiple lock()/unlock() cycles. These are not currently a
problem since they are all happening from the same thread, but there's a
problem since they are all happening from the same thread, but there's a
race possibility as calls are added outside of the early init when the
max/min and stashed values need to be correlated.
Add the _locked() variants to prepare for that.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-1-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 1beae9aa2b88d3a02eb666e7b777eb2d7bc645f4)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD fix from Lee Jones:
- Fix some -Werror=unused-variable build errors
* tag 'mfd-fixes-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: Fix building without CONFIG_OF
|
|
No idea why, but without this GuC context switches randomly fail when
running IGTs in a loop. Need to follow up why this fixes the
aforementioned issue but can live with a stable driver for now.
Fixes: 617d824c5323 ("drm/xe: Add WA BB to capture active context utilization")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Tested-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250612031925.4009701-1-matthew.brost@intel.com
(cherry picked from commit 3a1edef8f4b58b0ba826bc68bf4bce4bdf59ecf3)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Pull NFS client fixes from Anna Schumaker:
- Fix loop in GSS sequence number cache
- Clean up /proc/net/rpc/nfs if nfs_fs_proc_net_init() fails
- Fix a race to wake on NFS_LAYOUT_DRAIN
- Fix handling of NFS level errors in I/O
* tag 'nfs-for-6.16-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4/flexfiles: Fix handling of NFS level errors in I/O
NFSv4/pNFS: Fix a race to wake on NFS_LAYOUT_DRAIN
nfs: Clean up /proc/net/rpc/nfs when nfs_fs_proc_net_init() fails.
sunrpc: fix loop in gss seqno cache
|
|
David Howells <dhowells@redhat.com> says:
Here are some miscellaneous fixes and changes for netfslib and cifs, if you
could consider pulling them.
Many of these were found because a bug in Samba was causing smbd to crash
and restart after about 1-2s and this was vigorously and abruptly
exercising the netfslib retry paths.
Subsequent testing of the cifs RDMA support showed up some more bugs, but
the fixes for those went via the cifs tree and have been removed from this set
as they're now upstream.
First, there are some netfs fixes:
(1) Fix a hang due to a missing case in final DIO read result collection
not breaking out of a loop if the request finished, but there were no
subrequests being processed and NETFS_RREQ_ALL_QUEUED wasn't yet set.
(2) Fix a double put of the netfs_io_request struct if completion happened
in the pause loop.
(3) Provide some helpers to abstract out NETFS_RREQ_IN_PROGRESS flag
wrangling.
(4) Fix infinite looping in netfs_wait_for_pause/request() which was caused
by a loop waiting for NETFS_RREQ_ALL_QUEUED to get set - but which
wouldn't get set until the looping function returned. This uses patch
(3) above.
(5) Fix a ref leak on an extra subrequest inserted into a request's list
of subreqs because more subreq records were needed for retrying than
were needed for the original request (say, for instance, that the
amount of cifs credit available was reduced and, subsequently, the ops
had to be smaller).
Then a bunch of cifs fixes, some of which are from other people:
(6-8) cifs: Fix various RPC callbacks to set NETFS_SREQ_NEED_RETRY if a
subrequest fails retriably.
(10) Fix a warning in the workqueue code when reconnecting a channel.
Followed by some patches to deal with i_size handling:
(11) Fix the updating of i_size to use a lock to avoid a race between
testing if we should have extended the file with a DIO write and
changing i_size.
(12) A follow-up patch to (11) to merge the places in netfslib that update
i_size on write.
And finally a couple of patches to improve tracing output, but that should
otherwise not affect functionality:
(13) Renumber the NETFS_RREQ_* flags to make the hex values easier to
interpret by eye, including moving the main status flags down to the
lowest bits, with IN_PROGRESS in bit 0.
(14) Update the tracepoints in a number of ways, including adding more
tracepoints into the cifs read/write RPC callback so that different
MID_RESPONSE_* values can be differentiated.
* patches from https://lore.kernel.org/20250701163852.2171681-1-dhowells@redhat.com:
netfs: Update tracepoints in a number of ways
netfs: Renumber the NETFS_RREQ_* flags to make traces easier to read
netfs: Merge i_size update functions
netfs: Fix i_size updating
smb: client: set missing retry flag in cifs_writev_callback()
smb: client: set missing retry flag in cifs_readv_callback()
smb: client: set missing retry flag in smb2_writev_callback()
netfs: Fix ref leak on inserted extra subreq in write retry
netfs: Fix looping in wait functions
netfs: Provide helpers to perform NETFS_RREQ_IN_PROGRESS flag wangling
netfs: Fix double put of request
netfs: Fix hang due to missing case in final DIO read result collection
Link: https://lore.kernel.org/20250701163852.2171681-1-dhowells@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Make a number of updates to the netfs tracepoints:
(1) Remove a duplicate trace from netfs_unbuffered_write_iter_locked().
(2) Move the trace in netfs_wake_rreq_flag() to after the flag is cleared
so that the change appears in the trace.
(3) Differentiate the use of netfs_rreq_trace_wait/woke_queue symbols.
(4) Don't do so many trace emissions in the wait functions as some of them
are redundant.
(5) In netfs_collect_read_results(), differentiate a subreq that's being
abandoned vs one that has been consumed in a regular way.
(6) Add a tracepoint to indicate the call to ->ki_complete().
(7) Don't double-increment the subreq_counter when retrying a write.
(8) Move the netfs_sreq_trace_io_progress tracepoint within cifs code to
just MID_RESPONSE_RECEIVED and add different tracepoints for other MID
states and note check failure.
Signed-off-by: David Howells <dhowells@redhat.com>
Co-developed-by: Paulo Alcantara <pc@manguebit.org>
Signed-off-by: Paulo Alcantara <pc@manguebit.org>
Link: https://lore.kernel.org/20250701163852.2171681-14-dhowells@redhat.com
cc: Steve French <sfrench@samba.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Renumber the NETFS_RREQ_* flags to put the most useful status bits in the
bottom nibble - and therefore the last hex digit in the trace output -
making it easier to grasp the state at a glance.
In particular, put the IN_PROGRESS flag in bit 0 and ALL_QUEUED at bit 1.
Also make the flags field in /proc/fs/netfs/requests larger to accommodate
all the flags.
Also make the flags field in the netfs_sreq tracepoint larger to
accommodate all the NETFS_SREQ_* flags.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250701163852.2171681-13-dhowells@redhat.com
Reviewed-by: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Netfslib has two functions for updating the i_size after a write: one for
buffered writes into the pagecache and one for direct/unbuffered writes.
However, what needs to be done is much the same in both cases, so merge
them together.
This does raise one question, though: should updating the i_size after a
direct write do the same estimated update of i_blocks as is done for
buffered writes.
Also get rid of the cleanup function pointer from netfs_io_request as it's
only used for direct write to update i_size; instead do the i_size setting
directly from write collection.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250701163852.2171681-12-dhowells@redhat.com
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Fix the updating of i_size, particularly in regard to the completion of DIO
writes and especially async DIO writes by using a lock.
The bug is triggered occasionally by the generic/207 xfstest as it chucks a
bunch of AIO DIO writes at the filesystem and then checks that fstat()
returns a reasonable st_size as each completes.
The problem is that netfs is trying to do "if new_size > inode->i_size,
update inode->i_size" sort of thing but without a lock around it.
This can be seen with cifs, but shouldn't be seen with kafs because kafs
serialises modification ops on the client whereas cifs sends the requests
to the server as they're generated and lets the server order them.
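The pattern being fixed, in spirit (illustrative, not the literal netfs code):

    /* racy: compare and store are not atomic, so a smaller size can win */
    if (new_size > i_size_read(inode))
            i_size_write(inode, new_size);

    /* fixed: serialize the compare-and-update */
    spin_lock(&inode->i_lock);
    if (new_size > i_size_read(inode))
            i_size_write(inode, new_size);
    spin_unlock(&inode->i_lock);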
Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250701163852.2171681-11-dhowells@redhat.com
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Set NETFS_SREQ_NEED_RETRY flag to tell netfslib that the subreq needs
to be retried.
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250701163852.2171681-9-dhowells@redhat.com
Tested-by: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Cc: netfs@lists.linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Set NETFS_SREQ_NEED_RETRY flag to tell netfslib that the subreq needs
to be retried.
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250701163852.2171681-8-dhowells@redhat.com
Tested-by: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Cc: netfs@lists.linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Set NETFS_SREQ_NEED_RETRY flag to tell netfslib that the subreq needs
to be retried.
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250701163852.2171681-7-dhowells@redhat.com
Tested-by: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Cc: netfs@lists.linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The write-retry algorithm will insert extra subrequests into the list if it
can't get sufficient capacity to split the range that needs to be retried
into the sequence of subrequests it currently has (for instance, if the
cifs credit pool has fewer credits available than it did when the range was
originally divided).
However, the allocator furnishes each new subreq with 2 refs and then
another is added for resubmission, causing one to be leaked.
Fix this by replacing the ref-getting line with a neutral trace line.
Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250701163852.2171681-6-dhowells@redhat.com
Tested-by: Steve French <sfrench@samba.org>
Reviewed-by: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
netfs_wait_for_request() and netfs_wait_for_pause() can loop forever if
netfs_collect_in_app() returns 2, indicating that it wants to repeat
because the ALL_QUEUED flag isn't yet set and there are no subreqs left
that haven't been collected.
The problem is that, unless collection is offloaded (OFFLOAD_COLLECTION),
we have to return to the application thread to continue and eventually set
ALL_QUEUED after pausing to deal with a retry - but we never get there.
Fix this by inserting checks for the IN_PROGRESS and PAUSE flags as
appropriate before cycling round - and add cond_resched() for good measure.
Fixes: 2b1424cd131c ("netfs: Fix wait/wake to be consistent about the waitqueue used")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250701163852.2171681-5-dhowells@redhat.com
Tested-by: Steve French <sfrench@samba.org>
Reviewed-by: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Provide helpers to clear and test the NETFS_RREQ_IN_PROGRESS flag and to
insert the appropriate barrierage.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250701163852.2171681-4-dhowells@redhat.com
Tested-by: Steve French <sfrench@samba.org>
Reviewed-by: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
If a netfs request finishes during the pause loop, it will have the ref
that belongs to the IN_PROGRESS flag removed at that point - however, if it
then goes to the final wait loop, that will *also* put the ref because it
sees that the IN_PROGRESS flag is clear and incorrectly assumes that this
happened when it called the collector.
In fact, since IN_PROGRESS is clear, we shouldn't call the collector again
since it's done all the cleanup, such as calling ->ki_complete().
Fix this by making netfs_collect_in_app() just return, indicating that
we're done if IN_PROGRESS is removed.
Fixes: 2b1424cd131c ("netfs: Fix wait/wake to be consistent about the waitqueue used")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250701163852.2171681-3-dhowells@redhat.com
Tested-by: Steve French <sfrench@samba.org>
Reviewed-by: Paulo Alcantara <pc@manguebit.org>
cc: Steve French <sfrench@samba.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
When doing a DIO read, if the subrequests we issue fail and cause the
request PAUSE flag to be set to put a pause on subrequest generation, we
may complete collection of the subrequests (possibly discarding them) prior
to the ALL_QUEUED flags being set.
In such a case, netfs_read_collection() doesn't see ALL_QUEUED being set
after netfs_collect_read_results() returns and will just return to the app
(the collector can be seen unpausing the generator in the trace log).
The subrequest generator can then set ALL_QUEUED and the app thread reaches
netfs_wait_for_request(). This causes netfs_collect_in_app() to be called
to see if we're done yet, but there's a missing case here.
netfs_collect_in_app() will see that a thread is active and set inactive to
false, but won't see any subrequests in the read stream, and so won't set
need_collect to true. The function will then just return 0, indicating
that the caller should just sleep until further activity (which won't be
forthcoming) occurs.
Fix this by making netfs_collect_in_app() check to see if an active thread
is complete - i.e. that ALL_QUEUED is set and the subrequests list is empty
- and to skip the sleep return path. The collector will then be called
which will clear the request IN_PROGRESS flag, allowing the app to
progress.
Fixes: 2b1424cd131c ("netfs: Fix wait/wake to be consistent about the waitqueue used")
Reported-by: Steve French <sfrench@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250701163852.2171681-2-dhowells@redhat.com
Tested-by: Steve French <sfrench@samba.org>
Reviewed-by: Paulo Alcantara <pc@manguebit.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The ready event list of an epoll object is protected by a read-write
semaphore:
- The consumer (waiter) acquires the write lock and takes items.
- The producer (waker) takes the read lock and adds items.
The point of this design is to let epoll scale well with a large number
of producers, as multiple producers can hold the read lock at the same
time.
Unfortunately, this implementation may cause scheduling priority inversion
problem. Suppose the consumer has higher scheduling priority than the
producer. The consumer needs to acquire the write lock, but may be blocked
by the producer holding the read lock. Since read-write semaphore does not
support priority-boosting for the readers (even with CONFIG_PREEMPT_RT=y),
we have a case of priority inversion: a higher priority consumer is blocked
by a lower priority producer. This problem was reported in [1].
Furthermore, this could also cause stall problem, as described in [2].
To fix this problem, make the event list half-lockless:
- The consumer acquires a mutex (ep->mtx) and takes items.
- The producer locklessly adds items to the list.
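Conceptually, the producer side becomes a lock-free push onto a singly linked
list, roughly like the generic sketch below (illustrative only; the real code
uses epoll's own item and list types):

    struct item {
            struct item *next;
    };

    /* may be called by many producers concurrently, no lock needed */
    static void lockless_push(struct item **head, struct item *new)
    {
            struct item *old;

            do {
                    old = READ_ONCE(*head);
                    new->next = old;
            } while (cmpxchg(head, old, new) != old);
    }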
Performance is not the main goal of this patch, but as the producer can now
add items without waiting for the consumer to release the lock, a performance
improvement is observed using the stress test from
https://github.com/rouming/test-tools/blob/master/stress-epoll.c. This is
the same test that justified using a read-write semaphore in the past.
Testing using 12 x86_64 CPUs:
  threads   Before      After       Diff
            events/ms   events/ms
        8        6932       19753   +185%
       16        7820       27923   +257%
       32        7648       35164   +360%
       64        9677       37780   +290%
      128       11166       38174   +242%
Testing using 1 riscv64 CPU (averaged over 10 runs, as the numbers are
noisy):
  threads   Before      After       Diff
            events/ms   events/ms
        1          73         129    +77%
        2         151         216    +43%
        4         216         364    +69%
        8         234         382    +63%
       16         251         392    +56%
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Closes: https://lore.kernel.org/linux-rt-users/20210825132754.GA895675@lothringen/ [1]
Reported-by: Valentin Schneider <vschneid@redhat.com>
Closes: https://lore.kernel.org/linux-rt-users/xhsmhttqvnall.mognet@vschneid.remote.csb/ [2]
Signed-off-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/20250527090836.1290532-1-namcao@linutronix.de
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
According to Bspec, bits 0~9 of MI_STORE_DATA_IMM must not exceed 0x3FE.
The macro MI_SDI_NUM_QW(x) evaluates to 2 * x + 1, which means the
condition 2 * x + 1 <= 0x3FE must be satisfied. Therefore, the maximum
valid value for x is 0x1FE, not 0x1FF.
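Spelled out (the macro value matches the v3 note below):

    /* MI_SDI_NUM_QW(x) = 2 * x + 1 must fit the length field limit 0x3FE:
     *
     *   2 * x + 1 <= 0x3FE   =>   x <= 0x1FE (510), not 0x1FF (511)
     */
    #define MAX_PTE_PER_SDI 0x1feU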
v2
- Replace 0x1fe with macro MAX_PTE_PER_SDI (Auld, Matthew & Patelczyk, Maciej)
v3
- Change macro MAX_PTE_PER_SDI from 0x1fe to 0x1feU (De Marchi, Lucas)
Bspec: 60246
Fixes: 9c44fd5f6e8a ("drm/xe: Add migrate layer functions for SVM support")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Brian3 Nguyen <brian3.nguyen@intel.com>
Cc: Alex Zuo <alex.zuo@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Suggested-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Jia Yao <jia.yao@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Link: https://lore.kernel.org/r/20250612224620.161105-1-jia.yao@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit c038bdba98c9f6a36378044a9d4385531a194d3e)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Introducing support for smbus re-broke i2cdetect, causing it to detect
devices at every i2c address, just as it did prior to being fixed in
commit 49e1f0fd0d4cb ("i2c: microchip-core: fix "ghost" detections").
This was caused by an oversight, where the new smbus code failed to
check the return value of mchp_corei2c_xfer(). Check it, and propagate
any errors.
Fixes: d6ceb40538263 ("i2c: microchip-corei2c: add smbus support")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20250630-shopper-proven-500f4075e7d6@spud
|
|
I226 devices advertise support for the PCI-E link L1.2 substate. However,
due to a hardware limitation, the exit latency from this low-power state
is longer than the packet buffer can tolerate under high traffic
conditions. This can lead to packet loss and degraded performance.
To mitigate this, disable the L1.2 substate. The increased power draw
between L1.1 and L1.2 is insignificant.
Fixes: 43546211738e ("igc: Add new device ID's")
Link: https://lore.kernel.org/intel-wired-lan/15248b4f-3271-42dd-8e35-02bfc92b25e1@intel.com
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
With VIRTCHNL2_CAP_MACFILTER enabled, the following warning is generated
on module load:
[ 324.701677] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578
[ 324.701684] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1582, name: NetworkManager
[ 324.701689] preempt_count: 201, expected: 0
[ 324.701693] RCU nest depth: 0, expected: 0
[ 324.701697] 2 locks held by NetworkManager/1582:
[ 324.701702] #0: ffffffff9f7be770 (rtnl_mutex){....}-{3:3}, at: rtnl_newlink+0x791/0x21e0
[ 324.701730] #1: ff1100216c380368 (_xmit_ETHER){....}-{2:2}, at: __dev_open+0x3f0/0x870
[ 324.701749] Preemption disabled at:
[ 324.701752] [<ffffffff9cd23b9d>] __dev_open+0x3dd/0x870
[ 324.701765] CPU: 30 UID: 0 PID: 1582 Comm: NetworkManager Not tainted 6.15.0-rc5+ #2 PREEMPT(voluntary)
[ 324.701771] Hardware name: Intel Corporation M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 11/14/2022
[ 324.701774] Call Trace:
[ 324.701777] <TASK>
[ 324.701779] dump_stack_lvl+0x5d/0x80
[ 324.701788] ? __dev_open+0x3dd/0x870
[ 324.701793] __might_resched.cold+0x1ef/0x23d
<..>
[ 324.701818] __mutex_lock+0x113/0x1b80
<..>
[ 324.701917] idpf_ctlq_clean_sq+0xad/0x4b0 [idpf]
[ 324.701935] ? kasan_save_track+0x14/0x30
[ 324.701941] idpf_mb_clean+0x143/0x380 [idpf]
<..>
[ 324.701991] idpf_send_mb_msg+0x111/0x720 [idpf]
[ 324.702009] idpf_vc_xn_exec+0x4cc/0x990 [idpf]
[ 324.702021] ? rcu_is_watching+0x12/0xc0
[ 324.702035] idpf_add_del_mac_filters+0x3ed/0xb50 [idpf]
<..>
[ 324.702122] __hw_addr_sync_dev+0x1cf/0x300
[ 324.702126] ? find_held_lock+0x32/0x90
[ 324.702134] idpf_set_rx_mode+0x317/0x390 [idpf]
[ 324.702152] __dev_open+0x3f8/0x870
[ 324.702159] ? __pfx___dev_open+0x10/0x10
[ 324.702174] __dev_change_flags+0x443/0x650
<..>
[ 324.702208] netif_change_flags+0x80/0x160
[ 324.702218] do_setlink.isra.0+0x16a0/0x3960
<..>
[ 324.702349] rtnl_newlink+0x12fd/0x21e0
The sequence is as follows:
rtnl_newlink() ->
  __dev_change_flags() ->
    __dev_open() ->
      dev_set_rx_mode() ->           # disables BH and grabs "dev->addr_list_lock"
        idpf_set_rx_mode() ->        # proceed only if VIRTCHNL2_CAP_MACFILTER is ON
          __dev_uc_sync() ->
            idpf_add_mac_filter() ->
              idpf_add_del_mac_filters() ->
                idpf_send_mb_msg() ->
                  idpf_mb_clean() ->
                    idpf_ctlq_clean_sq()   # mutex_lock(cq_lock)
Fix by converting cq_lock to a spinlock. All operations under the new
lock are safe except freeing the DMA memory, which may use vunmap(). Fix
that by requesting contiguous physical memory for the DMA mapping.
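Sketch of the conversion (simplified; the real lock lives in the idpf control
queue structure):

    /* before: sleeps, so it must not be taken in the atomic context above */
    mutex_lock(&cq->cq_lock);
    /* ... clean the send queue ... */
    mutex_unlock(&cq->cq_lock);

    /* after: safe under the BH-disabled addr_list_lock path */
    spin_lock(&cq->cq_lock);
    /* ... clean the send queue ... */
    spin_unlock(&cq->cq_lock);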
Fixes: a251eee62133 ("idpf: add SRIOV support and other ndo_ops")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Returning -EOPNOTSUPP from a function returning u32 leads to a cast and
an invalid size value as a result. -EOPNOTSUPP as a size will most likely
lead to an allocation failure.
Command: ethtool -x eth0
It is visible on all devices that don't have RSS caps set.
[ 136.615917] Call Trace:
[ 136.615921] <TASK>
[ 136.615927] ? __warn+0x89/0x130
[ 136.615942] ? __alloc_frozen_pages_noprof+0x322/0x330
[ 136.615953] ? report_bug+0x164/0x190
[ 136.615968] ? handle_bug+0x58/0x90
[ 136.615979] ? exc_invalid_op+0x17/0x70
[ 136.615987] ? asm_exc_invalid_op+0x1a/0x20
[ 136.616001] ? rss_prepare_get.constprop.0+0xb9/0x170
[ 136.616016] ? __alloc_frozen_pages_noprof+0x322/0x330
[ 136.616028] __alloc_pages_noprof+0xe/0x20
[ 136.616038] ___kmalloc_large_node+0x80/0x110
[ 136.616072] __kmalloc_large_node_noprof+0x1d/0xa0
[ 136.616081] __kmalloc_noprof+0x32c/0x4c0
[ 136.616098] ? rss_prepare_get.constprop.0+0xb9/0x170
[ 136.616105] rss_prepare_get.constprop.0+0xb9/0x170
[ 136.616114] ethnl_default_doit+0x107/0x3d0
[ 136.616131] genl_family_rcv_msg_doit+0x100/0x160
[ 136.616147] genl_rcv_msg+0x1b8/0x2c0
[ 136.616156] ? __pfx_ethnl_default_doit+0x10/0x10
[ 136.616168] ? __pfx_genl_rcv_msg+0x10/0x10
[ 136.616176] netlink_rcv_skb+0x58/0x110
[ 136.616186] genl_rcv+0x28/0x40
[ 136.616195] netlink_unicast+0x19b/0x290
[ 136.616206] netlink_sendmsg+0x222/0x490
[ 136.616215] __sys_sendto+0x1fd/0x210
[ 136.616233] __x64_sys_sendto+0x24/0x30
[ 136.616242] do_syscall_64+0x82/0x160
[ 136.616252] ? __sys_recvmsg+0x83/0xe0
[ 136.616265] ? syscall_exit_to_user_mode+0x10/0x210
[ 136.616275] ? do_syscall_64+0x8e/0x160
[ 136.616282] ? __count_memcg_events+0xa1/0x130
[ 136.616295] ? count_memcg_events.constprop.0+0x1a/0x30
[ 136.616306] ? handle_mm_fault+0xae/0x2d0
[ 136.616319] ? do_user_addr_fault+0x379/0x670
[ 136.616328] ? clear_bhb_loop+0x45/0xa0
[ 136.616340] ? clear_bhb_loop+0x45/0xa0
[ 136.616349] ? clear_bhb_loop+0x45/0xa0
[ 136.616359] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 136.616369] RIP: 0033:0x7fd30ba7b047
[ 136.616376] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 80 3d bd d5 0c 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 71 c3 55 48 83 ec 30 44 89 4c 24 2c 4c 89 44
[ 136.616381] RSP: 002b:00007ffde1796d68 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 136.616388] RAX: ffffffffffffffda RBX: 000055d7bd89f2a0 RCX: 00007fd30ba7b047
[ 136.616392] RDX: 0000000000000028 RSI: 000055d7bd89f3b0 RDI: 0000000000000003
[ 136.616396] RBP: 00007ffde1796e10 R08: 00007fd30bb4e200 R09: 000000000000000c
[ 136.616399] R10: 0000000000000000 R11: 0000000000000202 R12: 000055d7bd89f340
[ 136.616403] R13: 000055d7bd89f3b0 R14: 000055d78943f200 R15: 0000000000000000
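Why the cast is a problem, as a tiny generic illustration (not the idpf code):

    u32 get_size(void)              /* illustrative signature */
    {
            return -EOPNOTSUPP;     /* -95 becomes 0xffffffa1 == 4294967201 as u32 */
    }
    /* the caller then attempts a ~4 GiB allocation, which fails */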
Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks")
Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
__xa_cmpxchg() is called with rcu_read_lock() held, and it will allocate
memory if necessary.
Fix the problem by moving rcu_read_lock() after __xa_cmpxchg(); meanwhile,
it still should be held before xa_unlock(), to prevent the returned page
from being freed by a concurrent discard.
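The shape of the fix, sketched (simplified; variable names are illustrative,
not the exact brd code):

    xa_lock(&brd->brd_pages);
    /* may allocate internally, so it must not run inside rcu_read_lock() */
    old = __xa_cmpxchg(&brd->brd_pages, idx, NULL, page, gfp);
    rcu_read_lock();            /* taken only after the allocation, but
                                 * still before xa_unlock(), so a concurrent
                                 * discard can't free the returned page */
    xa_unlock(&brd->brd_pages);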
Fixes: bbcacab2e8ee ("brd: avoid extra xarray lookups on first write")
Reported-by: syzbot+ea4c8fd177a47338881a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/685ec4c9.a00a0220.129264.000c.GAE@google.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250630112828.421219-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit 524346e9d79f ("ublk: build batch from IOs in same io_ring_ctx and io task")
needs to dereference `io->cmd` to check whether the IO can be added to the
current batch, see ublk_belong_to_same_batch() and io_uring_cmd_ctx_handle().
However, `io->cmd` may become invalid after the uring_cmd is canceled.
Fix it by only allowing this IO to be queued when ublk_prep_req() returns
`BLK_STS_OK`, in which case 'io->cmd' is guaranteed to be valid.
Reported-by: Changhui Zhong <czhong@redhat.com>
Fixes: 524346e9d79f ("ublk: build batch from IOs in same io_ring_ctx and io task")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250701072325.1458109-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove incorrect checks on cqspi->rx_chan that cause driver breakage
during failure cleanup. Ensure proper resource freeing on the success
path when operating in cqspi->use_direct_mode, preventing leaks and
improving stability.
Signed-off-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/89765a2b94f047ded4f14babaefb7ef92ba07cb2.1751274389.git.khairul.anuar.romli@altera.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Chris Mason reported a performance regression on big iron. Reports of
this kind were usually part of a micro benchmark, but Chris' test mimicked
his real workload. This makes it a real regression.
operation. If all threads of an application do this simultaneously then
it leads to cache line bouncing and the performance drops.
Disable FUTEX_PRIVATE_HASH entirely for this cycle. The performance
regression will be addressed in the following cycle enabling the option
again.
Closes: https://lore.kernel.org/all/3ad05298-351e-4d61-9972-ca45a0a50e33@meta.com/
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250630145034.8JnINEaS@linutronix.de
|
|
Trying to compile an x86 kernel on big endian results in this error:
net/ipv4/netfilter/iptable_nat.o: warning: objtool: iptable_nat_table_init+0x150: Unknown annotation type: 50331648
make[5]: *** [scripts/Makefile.build:287: net/ipv4/netfilter/iptable_nat.o] Error 255
The reason is a missing endian conversion in read_annotate().
Add the missing conversion to fix this.
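A minimal sketch of the kind of conversion needed (illustrative only; objtool
has its own byte-swap helpers, and the flags used here are hypothetical):

    u32 raw = *(u32 *)(sec->data->d_buf + offset);

    /* the value is stored in the target ELF's byte order; convert it to
     * host order before use (swap only when the two differ) */
    if (elf_is_little_endian != host_is_little_endian)     /* hypothetical flags */
            raw = __builtin_bswap32(raw);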
Fixes: 2116b349e29a ("objtool: Generic annotation infrastructure")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250630131230.4130185-1-hca@linux.ibm.com
|
|
On Mon, Jun 02, 2025 at 03:22:13PM +0800, Kuyo Chang wrote:
> So, the potential race scenario is:
>
> CPU0 CPU1
> // doing migrate_swap(cpu0/cpu1)
> stop_two_cpus()
> ...
> // doing _cpu_down()
> sched_cpu_deactivate()
> set_cpu_active(cpu, false);
> balance_push_set(cpu, true);
> cpu_stop_queue_two_works
> __cpu_stop_queue_work(stopper1,...);
> __cpu_stop_queue_work(stopper2,..);
> stop_cpus_in_progress -> true
> preempt_enable();
> ...
> 1st balance_push
> stop_one_cpu_nowait
> cpu_stop_queue_work
> __cpu_stop_queue_work
> list_add_tail -> 1st add push_work
> wake_up_q(&wakeq); -> "wakeq is empty.
> This implies that the stopper is at wakeq@migrate_swap."
> preempt_disable
> wake_up_q(&wakeq);
> wake_up_process // wakeup migrate/0
> try_to_wake_up
> ttwu_queue
> ttwu_queue_cond ->meet below case
> if (cpu == smp_processor_id())
> return false;
> ttwu_do_activate
> //migrate/0 wakeup done
> wake_up_process // wakeup migrate/1
> try_to_wake_up
> ttwu_queue
> ttwu_queue_cond
> ttwu_queue_wakelist
> __ttwu_queue_wakelist
> __smp_call_single_queue
> preempt_enable();
>
> 2nd balance_push
> stop_one_cpu_nowait
> cpu_stop_queue_work
> __cpu_stop_queue_work
> list_add_tail -> 2nd add push_work, so the double list add is detected
> ...
> ...
> cpu1 get ipi, do sched_ttwu_pending, wakeup migrate/1
>
So this balance_push() is part of schedule(), and schedule() is supposed
to switch to stopper task, but because of this race condition, stopper
task is stuck in WAKING state and not actually visible to be picked.
Therefore CPU1 can do another schedule() and end up doing another
balance_push() even though the last one hasn't been done yet.
This is a confluence of fail, where both wake_q and ttwu_wakelist can
cause crucial wakeups to be delayed, resulting in the malfunction of
balance_push.
Since there is only a single stopper thread to be woken, the wake_q
doesn't really add anything here, and can be removed in favour of
direct wakeups of the stopper thread.
Then add a clause to ttwu_queue_cond() to ensure the stopper threads
are never queued / delayed.
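The shape of the added clause, as a hedged sketch (the exact condition in the
real patch may differ):

    static inline bool ttwu_queue_cond(struct task_struct *p, int cpu)
    {
            /* ... existing conditions elided ... */

            /* Never defer the per-CPU stopper's wakeup via the remote
             * wakelist/IPI path; it has to become runnable immediately
             * so balance_push() can make progress. */
            if (p->sched_class == &stop_sched_class)
                    return false;

            return true;
    }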
Of all 3 moving parts, the last addition was the balance_push()
machinery, so pick that as the point the bug was introduced.
Fixes: 2558aacff858 ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug")
Reported-by: Kuyo Chang <kuyo.chang@mediatek.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Kuyo Chang <kuyo.chang@mediatek.com>
Link: https://lkml.kernel.org/r/20250605100009.GO39944@noisy.programming.kicks-ass.net
|