linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2025-06-12	fs: unlock the superblock during iterate_supers_type	Darrick J. Wong
	This function takes super_lock in shared mode, so it should release the same lock. Cc: stable@vger.kernel.org # v6.16-rc1 Fixes: af7551cf13cf7f ("super: remove pointless s_root checks") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Link: https://lore.kernel.org/20250611164044.GF6138@frogsfrogsfrogs Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-12	ovl: fix debug print in case of mkdir error	Amir Goldstein
	We want to print the name in case of mkdir failure and now we will get a cryptic (efault) as name. Fixes: c54b386969a5 ("VFS: Change vfs_mkdir() to return the dentry.") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/20250612072245.2825938-1-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-12	coredump: cleanup coredump socket functions	Christian Brauner
	We currently use multiple CONFIG_UNIX guards. This looks messy and makes the code harder to follow and maintain. Use a helper function coredump_sock_connect() that handles the connect portion. This allows us to remove the CONFIG_UNIX guard in the main do_coredump() function. Link: https://lore.kernel.org/20250605-schlamm-touren-720ba2b60a85@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-12	coredump: allow for flexible coredump handling	Christian Brauner
	Extend the coredump socket to allow the coredump server to tell the kernel how to process individual coredumps. When the crashing task connects to the coredump socket the kernel will send a struct coredump_req to the coredump server. The kernel will set the size member of struct coredump_req allowing the coredump server how much data can be read. The coredump server uses MSG_PEEK to peek the size of struct coredump_req. If the kernel uses a newer struct coredump_req the coredump server just reads the size it knows and discard any remaining bytes in the buffer. If the kernel uses an older struct coredump_req the coredump server just reads the size the kernel knows. The returned struct coredump_req will inform the coredump server what features the kernel supports. The coredump_req->mask member is set to the currently know features. The coredump server may only use features whose bits were raised by the kernel in coredump_req->mask. In response to a coredump_req from the kernel the coredump server sends a struct coredump_ack to the kernel. The kernel informs the coredump server what version of struct coredump_ack it supports by setting struct coredump_req->size_ack to the size it knows about. The coredump server may only send as many bytes as coredump_req->size_ack indicates (a smaller size is fine of course). The coredump server must set coredump_ack->size accordingly. The coredump server sets the features it wants to use in struct coredump_ack->mask. Only bits returned in struct coredump_req->mask may be used. In case an invalid struct coredump_ack is sent to the kernel a non-zero u32 integer is sent indicating the reason for the failure. If it was successful a zero u32 integer is sent. In the initial version the following features are supported in coredump_{req,ack}->mask: * COREDUMP_KERNEL The kernel will write the coredump data to the socket. * COREDUMP_USERSPACE The kernel will not write coredump data but will indicate to the parent that a coredump has been generated. This is used when userspace generates its own coredumps. * COREDUMP_REJECT The kernel will skip generating a coredump for this task. * COREDUMP_WAIT The kernel will prevent the task from exiting until the coredump server has shutdown the socket connection. The flexible coredump socket can be enabled by using the "@@" prefix instead of the single "@" prefix for the regular coredump socket: @@/run/systemd/coredump.socket will enable flexible coredump handling. Current kernels already enforce that "@" must be followed by "/" and will reject anything else. So extending this is backward and forward compatible. Link: https://lore.kernel.org/20250603-work-coredump-socket-protocol-v2-1-05a5f0c18ecc@kernel.org Acked-by: Lennart Poettering <lennart@poettering.net> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-11	bcachefs: Don't trace should_be_locked unless changing	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Ensure that snapshot creation propagates has_case_insensitive	Kent Overstreet
	We normally can't create a new directory with the case-insensitive option already set - except when we're creating a snapshot. And if casefolding is enabled filesystem wide, we should still set it even though not strictly required, for consistency. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Print devices we're mounting on multi device filesystems	Kent Overstreet
	Previously, we only ever logged the filesystem UUID. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Don't trust sb->nr_devices in members_to_text()	Kent Overstreet
	We have to be able to print superblock sections even if they fail to validate (for debugging), so we have to calculate the number of entries from the field size. Reported-by: syzbot+5138f00559ffb3cb3610@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Fix version checks in validate_bset()	Kent Overstreet
	It seems btree node scan picked up a partially overwritten btree node, and corrected the "bset version older than sb version_min" error - resulting in an invalid superblock with a bad version_min field. Don't run this check at all when we're in btree node scan, and when we do run it, do something saner if the bset version is totally crazy. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: ioctl: avoid stack overflow warning	Arnd Bergmann
	Multiple ioctl handlers individually use a lot of stack space, and clang chooses to inline them into the bch2_fs_ioctl() function, blowing through the warning limit: fs/bcachefs/chardev.c:655:6: error: stack frame size (1032) exceeds limit (1024) in 'bch2_fs_ioctl' [-Werror,-Wframe-larger-than] 655 \| long bch2_fs_ioctl(struct bch_fs c, unsigned cmd, void __user arg) By marking the largest two of them as noinline_for_stack, no indidual code path ends up using this much, which avoids the warning and reduces the possible total stack usage in the ioctl handler. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Don't pass trans to fsck_err() in gc_accounting_done	Kent Overstreet
	fsck_err() can return a transaction restart if passed a transaction object - this has always been true when it has to drop locks to prompt for user input, but we're seeing this more now that we're logging the error being corrected in the journal. gc_accounting_done() doesn't call fsck_err() from an actual commit loop, and it doesn't need to be holding btree locks when it calls fsck_err(), so the easy fix here for the unhandled transaction restart is to just not pass it the transaction object. We'll miss out on the fancy new logging, but that's ok. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Fix leak in bch2_fs_recovery() error path	Kent Overstreet
	Fix a small leak of the superblock 'clean' section. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Fix rcu_pending for PREEMPT_RT	Kent Overstreet
	PREEMPT_RT redefines how standard spinlocks work, so local_irq_save() + spin_lock() is no longer equivalent to spin_lock_irqsave(). Fortunately, we don't strictly need to do it that way. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Fix downgrade_table_extra()	Kent Overstreet
	Fix a UAF: we were calling darray_make_room() and retaining a pointer to the old buffer. And fix an UBSAN warning: struct bch_sb_field_downgrade_entry uses __counted_by, so set dst->nr_errors before assigning to the array entry. Reported-by: syzbot+14c52d86ddbd89bea13e@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Don't put rhashtable on stack	Kent Overstreet
	Object debugging generally needs special provisions for putting said objects on the stack, which rhashtable does not have. Reported-by: syzbot+bcc38a9556d0324c2ec2@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Make sure opts.read_only gets propagated back to VFS	Kent Overstreet
	If we think we're read-only but the VFS doesn't, fun will ensue. And now that we know we have to be able to do this safely, just make nochanges imply ro. Reported-by: syzbot+a7d6ceaba099cc21dee4@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Fix possible console lock involved deadlock	Alan Huang
	Link: https://lore.kernel.org/all/6822ab02.050a0220.f2294.00cb.GAE@google.com/T/ Reported-by: syzbot+2c3ef91c9523c3d1a25c@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: mark more errors autofix	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Don't persistently run scan_for_btree_nodes	Kent Overstreet
	bch2_btree_lost_data() gets called on btree node read error, but the error might be transient. btree_node_scan is expensive, and there's no need to run it persistently (marking it in the superblock as required to run) - check_topology will run it if required, via bch2_get_scanned_nodes(). Running it non-persistently is fine, to avoid check_topology having to rewind recovery to run it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Read error message now prints if self healing	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Only run 'increase_depth' for keys from btree node csan	Kent Overstreet
	bch2_btree_increase_depth() was originally for disaster recovery, to get some data back from the journal when a btree root was bad. We don't need it for that purpose anymore; on bad btree root we'll launch btree node scan and reconstruct all the interior nodes. If there's a key in the journal for a depth that doesn't exists, and it's not from check_topology/btree node scan, we should just ignore it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Mark need_discard_freespace_key_bad autofix	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Update /dev/disk/by-uuid on device add	Kent Overstreet
	Invalidate pagecache after we write the new superblock and send a uevent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Add more flags to btree nodes for rewrite reason	Kent Overstreet
	It seems excessive forced btree node rewrites can cause interior btree updates to become wedged during recovery, before we're using the write buffer for backpointer updates. Add more flags so we can determine where these are coming from. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Add range being updated to btree_update_to_text()	Kent Overstreet
	We had a deadlock during recovery where interior btree updates became wedged and all open_buckets were consumed; start adding more introspection. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Log fsck errors in the journal	Kent Overstreet
	Log the specific error being corrected in the journal when we're repairing, this helps greatly with 'bcachefs list_journal' analysis. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	bcachefs: Add missing restart handling to check_topology()	Kent Overstreet
	The next patch will add logging of the specific error being corrected in repair paths to the journal; this means __bch2_fsck_err() can return transaction restarts in places that previously weren't expecting them. check_topology() is old code that doesn't use btree iterators for btree node locking - it'll have to be rewritten in the future to work online. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11	configfs: use DCACHE_DONTCACHE	Al Viro
	Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11	debugfs: use DCACHE_DONTCACHE	Al Viro
	Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11	efivarfs: use DCACHE_DONTCACHE instead of always_delete_dentry()	Al Viro
	Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11	9p: don't bother with always_delete_dentry	Al Viro
	just set DCACHE_DONTCACHE for "don't cache" mounts... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11	ramfs, hugetlbfs, mqueue: set DCACHE_DONTCACHE	Al Viro
	makes simple_lookup() slightly cheaper there - no need for simple_lookup() to set the flag and we want it on everything on those anyway. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11	kill simple_dentry_operations	Al Viro
	No users left and anything that wants it would be better off just setting DCACHE_DONTCACHE in their ->s_d_flags. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11	devpts, sunrpc, hostfs: don't bother with ->d_op	Al Viro
	Default ->d_op being simple_dentry_operations is equivalent to leaving it NULL and putting DCACHE_DONTCACHE into ->s_d_flags. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11	d_alloc_parallel(): set DCACHE_PAR_LOOKUP earlier	Al Viro
	Do that before new dentry is visible anywhere. It does create a new possible state for dentries present in ->d_children/->d_sib - DCACHE_PAR_LOOKUP present, negative, unhashed, not in in-lookup hash chains, refcount positive. Those are going to be skipped by all tree-walkers (both d_walk() callbacks in fs/dcache.c and explicit loops over children/sibling lists elsewhere) and dput() is fine with those. NOTE: dropping the final reference to a "normal" in-lookup dentry (in in-lookup hash) is a bug - somebody must've forgotten to call d_lookup_done() on it and bad things will happen. With those it's OK; if/when we get around to making __dentry_kill() complain about such breakage, remember that predicate to check should not be just d_in_lookup(victim) but rather a combination of that with !hlist_bl_unhashed(&victim->d_u.d_in_lookup_hash). Might be worth considering later... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11	make d_set_d_op() static	Al Viro
	Convert the last user (d_alloc_pseudo()) and be done with that. Any out-of-tree filesystem using it should switch to d_splice_alias_ops() or, better yet, check whether it really needs to have ->d_op vary among its dentries. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11	simple_lookup(): just set DCACHE_DONTCACHE	Al Viro
	No need to mess with ->d_op at all. Note that ->d_delete that always returns 1 is equivalent to having DCACHE_DONTCACHE in ->d_flags. Later the same thing will be placed into ->s_d_flags of the filesystems where we want that behaviour for all dentries; then the check in simple_lookup() will at least get unlikely() slapped on it. NOTE: there are only two filesystems where * simple_lookup() might be called * default ->d_op is non-NULL * its ->d_delete() doesn't always return 1 If not for those, we could have simple_lookup() just set DCACHE_DONTCACHE without even looking at ->d_op. Filesystems in question are btrfs and tracefs; both have ->d_delete() returning 1 on anything fed to simple_lookup(), so both would be fine with simple_lookup() setting DCACHE_DONTCACHE regardless of ->d_op. IOW, we might want to drop the check for ->d_op in simple_lookup(); it's definitely a separate story, though. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11	tracefs: Add d_delete to remove negative dentries	Steven Rostedt
	If a lookup in tracefs is done on a file that does not exist, it leaves a dentry hanging around until memory pressure removes it. But eventfs dentries should hang around as when their ref count goes to zero, it requires more work to recreate it. For the rest of the tracefs dentries, they hang around as their dentry is used as a descriptor for the tracing system. But if a file lookup happens for a file in tracefs that does not exist, it should be deleted. Add a .d_delete callback that checks if dentry->fsdata is set or not. Only eventfs dentries set fsdata so if it has content it should not be deleted and should hang around in the cache. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11	set_default_d_op(): calculate the matching value for ->d_flags	Al Viro
	... and store it in ->s_d_flags, to be used by __d_alloc() Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11	correct the set of flags forbidden at d_set_d_op() time	Al Viro
	DCACHE_OP_PRUNE in ->d_flags at the time of d_set_d_op() should've been treated the same as any other DCACHE_OP_... - we forgot to adjust that WARN_ON() when DCACHE_OP_PRUNE had been introduced... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-11	exportfs: use lookup_one_unlocked()	NeilBrown
	rather than locking the directory and using lookup_one(), just use lookup_one_unlocked(). This keeps locking code centralised. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250608230952.20539-5-neil@brown.name Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-11	coda: use iterate_dir() in coda_readdir()	NeilBrown
	The code in coda_readdir() is nearly identical to iterate_dir(). Differences are: - iterate_dir() is killable - iterate_dir() adds permission checking and accessing notifications I believe these are not harmful for coda so it is best to use iterate_dir() directly. This will allow locking changes without touching the code in coda. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250608230952.20539-4-neil@brown.name Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-11	VFS: merge lookup_one_qstr_excl_raw() back into lookup_one_qstr_excl()	NeilBrown
	The effect of lookup_one_qstr_excl_raw() can be achieved by passing LOOKUP_CREATE() to lookup_one_qstr_excl() - we don't need a separate function. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250608230952.20539-2-neil@brown.name Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-11	VFS: change try_lookup_noperm() to skip revalidation	NeilBrown
	The recent change from using d_hash_and_lookup() to using try_lookup_noperm() inadvertently introduce a d_revalidate() call when the lookup was successful. Steven French reports that this resulted in worse than halving of performance in some cases. Prior to the offending patch the only caller of try_lookup_noperm() was autofs which does not need the d_revalidate(). So it is safe to remove the d_revalidate() call providing we stop using try_lookup_noperm() to implement lookup_noperm(). The "try_" in the name is strongly suggestive that the caller isn't expecting much effort, so it seems reasonable to avoid the effort of d_revalidate(). Fixes: 06c567403ae5 ("Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS") Reported-by: Steve French <smfrench@gmail.com> Link: https://lore.kernel.org/all/CAH2r5mu5SfBrdc2CFHwzft8=n9koPMk+Jzwpy-oUMx-wCRCesQ@mail.gmail.com/ Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/174951744454.608730.18354002683881684261@noble.neil.brown.name Tested-by: Steve French <stfrench@microsoft.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-11	mntns: use stable inode number for initial mount ns	Christian Brauner
	Apart from the network and mount namespace all other namespaces expose a stable inode number and userspace has been relying on that for a very long time now. It's very much heavily used API. Align the mount namespace and use a stable inode number from the reserved procfs inode number space so this is consistent across all namespaces. Link: https://lore.kernel.org/20250606-work-nsfs-v1-3-b8749c9a8844@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-10	split d_flags calculation out of d_set_d_op()	Al Viro
	Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-10	new helper: set_default_d_op()	Al Viro
	... to be used instead of manually assigning to ->s_d_op. All in-tree filesystem converted (and field itself is renamed, so any out-of-tree ones in need of conversion will be caught by compiler). Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-10	fuse: no need for special dentry_operations for root dentry	Al Viro
	->d_revalidate() is never called for root anyway... Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-10	switch procfs from d_set_d_op() to d_splice_alias_ops()	Al Viro
	Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-10	new helper: d_splice_alias_ops()	Al Viro
	Uses of d_set_d_op() on live dentry can be very dangerous; it is going to be withdrawn and replaced with saner things. The best way for a filesystem is to have the default dentry_operations set at mount time and be done with that - __d_alloc() will use that. Currently there are two cases when d_set_d_op() is used on a live dentry - one is procfs, which has several genuinely different dentry_operations instances (different ->d_revalidate(), etc.) and another is simple_lookup(), where we would be better off without overriding ->d_op. For procfs we have d_set_d_op() calls followed by d_splice_alias(); provide a new helper (d_splice_alias_ops(inode, dentry, d_ops)) that would combine those two, and do the d_set_d_op() part while under ->d_lock. That eliminates one of the places where ->d_flags had been modified without holding ->d_lock; current behaviour is not racy, but the reasons for that are far too brittle. Better move to uniform locking rules and simpler proof of correctness... The next commit will convert procfs to use of that helper; it is not exported and won't be until somebody comes up with convincing modular user for it. Again, the best approach is to have default ->d_op and let __d_alloc() do the right thing; filesystem _may_ need non-uniform ->d_op (procfs does), but there'd better be good reasons for that. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>