Age | Commit message (Collapse) | Author |
|
If the delayed_root is not empty we are increasing the number of
references to a delayed_node without decreasing it, causing a leak. Fix
by decrementing the delayed_node reference count.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Leo Martins <loemra.dev@gmail.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
[ Remove the changelog from the commit message. ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
To help debugging include the root number in the error message, and since
this is a critical error that implies a metadata inconsistency and results
in a transaction abort change the log message level from "info" to
"critical", which is a much better fit.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Prepare for allowing extended attributes to be set on pidfd inodes by
allowing them to be mutable.
Link: https://lore.kernel.org/20250618-work-pidfs-persistent-v2-11-98f3456fd552@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Allow for S_IMMUTABLE to be stripped so that we can support xattrs.
Link: https://lore.kernel.org/20250618-work-pidfs-persistent-v2-10-98f3456fd552@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The validation is now completely handled in path_from_stashed().
Link: https://lore.kernel.org/20250618-work-pidfs-persistent-v2-9-98f3456fd552@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Now that we stash persistent information in struct pid there's no need
to play volatile games with pinning struct pid via dentries in pidfs.
Link: https://lore.kernel.org/20250618-work-pidfs-persistent-v2-8-98f3456fd552@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
We don't need it anymore as persistent information is allocated lazily
and stashed in struct pid.
Link: https://lore.kernel.org/20250618-work-pidfs-persistent-v2-7-98f3456fd552@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
We've moved persistent information to struct pid.
So there's no need for these anymore.
Link: https://lore.kernel.org/20250618-work-pidfs-persistent-v2-6-98f3456fd552@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Persist exit and coredump information independent of whether anyone
currently holds a pidfd for the struct pid.
The current scheme allocated pidfs dentries on-demand repeatedly.
This scheme is reaching it's limits as it makes it impossible to pin
information that needs to be available after the task has exited or
coredumped and that should not be lost simply because the pidfd got
closed temporarily. The next opener should still see the stashed
information.
This is also a prerequisite for supporting extended attributes on
pidfds to allow attaching meta information to them.
If someone opens a pidfd for a struct pid a pidfs dentry is allocated
and stashed in pid->stashed. Once the last pidfd for the struct pid is
closed the pidfs dentry is released and removed from pid->stashed.
So if 10 callers create a pidfs dentry for the same struct pid
sequentially, i.e., each closing the pidfd before the other creates a
new one then a new pidfs dentry is allocated every time.
Because multiple tasks acquiring and releasing a pidfd for the same
struct pid can race with each another a task may still find a valid
pidfs entry from the previous task in pid->stashed and reuse it. Or it
might find a dead dentry in there and fail to reuse it and so stashes a
new pidfs dentry. Multiple tasks may race to stash a new pidfs dentry
but only one will succeed, the other ones will put their dentry.
The current scheme aims to ensure that a pidfs dentry for a struct pid
can only be created if the task is still alive or if a pidfs dentry
already existed before the task was reaped and so exit information has
been was stashed in the pidfs inode.
That's great except that it's buggy. If a pidfs dentry is stashed in
pid->stashed after pidfs_exit() but before __unhash_process() is called
we will return a pidfd for a reaped task without exit information being
available.
The pidfds_pid_valid() check does not guard against this race as it
doens't sync at all with pidfs_exit(). The pid_has_task() check might be
successful simply because we're before __unhash_process() but after
pidfs_exit().
Introduce a new scheme where the lifetime of information associated with
a pidfs entry (coredump and exit information) isn't bound to the
lifetime of the pidfs inode but the struct pid itself.
The first time a pidfs dentry is allocated for a struct pid a struct
pidfs_attr will be allocated which will be used to store exit and
coredump information.
If all pidfs for the pidfs dentry are closed the dentry and inode can be
cleaned up but the struct pidfs_attr will stick until the struct pid
itself is freed. This will ensure minimal memory usage while persisting
relevant information.
The new scheme has various advantages. First, it allows to close the
race where we end up handing out a pidfd for a reaped task for which no
exit information is available. Second, it minimizes memory usage.
Third, it allows to remove complex lifetime tracking via dentries when
registering a struct pid with pidfs. There's no need to get or put a
reference. Instead, the lifetime of exit and coredump information
associated with a struct pid is bound to the lifetime of struct pid
itself.
Link: https://lore.kernel.org/20250618-work-pidfs-persistent-v2-5-98f3456fd552@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Make it a littler easier to follow.
Link: https://lore.kernel.org/20250618-work-pidfs-persistent-v2-3-98f3456fd552@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
* Add a callback to struct stashed_operations so it's possible to
implement custom behavior for pidfs and allow for it to return errors.
* Teach stashed_dentry_get() to handle error pointers.
Link: https://lore.kernel.org/20250618-work-pidfs-persistent-v2-2-98f3456fd552@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Similar to commit 1ed95281c0c7 ("anon_inode: raise SB_I_NODEV and SB_I_NOEXEC"):
it shouldn't be possible to execute pidfds via
execveat(fd_anon_inode, "", NULL, NULL, AT_EMPTY_PATH)
so raise SB_I_NOEXEC so that no one gets any creative ideas.
Also raise SB_I_NODEV as we don't expect or support any devices on pidfs.
Link: https://lore.kernel.org/20250618-work-pidfs-persistent-v2-1-98f3456fd552@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
callback"), the f_op->mmap() hook has been deprecated in favour of
f_op->mmap_prepare().
This callback is invoked in the mmap() logic far earlier, so error handling
can be performed more safely without complicated and bug-prone state
unwinding required should an error arise.
This hook also avoids passing a pointer to a not-yet-correctly-established
VMA avoiding any issues with referencing this data structure.
It rather provides a pointer to the new struct vm_area_desc descriptor type
which contains all required state and allows easy setting of required
parameters without any consideration needing to be paid to locking or
reference counts.
Note that nested filesystems like overlayfs are compatible with an
.mmap_prepare() callback since commit bb666b7c2707 ("mm: add mmap_prepare()
compatibility layer for nested file systems").
In this patch we apply this change to file systems with relatively simple
mmap() hook logic - exfat, ceph, f2fs, bcachefs, zonefs, btrfs, ocfs2,
orangefs, nilfs2, romfs, ramfs and aio.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/f528ac4f35b9378931bd800920fee53fc0c5c74d.1750099179.git.lorenzo.stoakes@oracle.com
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Update nearly all generic_file_mmap() and generic_file_readonly_mmap()
callers to use generic_file_mmap_prepare() and
generic_file_readonly_mmap_prepare() respectively.
We update blkdev, 9p, afs, erofs, ext2, nfs, ntfs3, smb, ubifs and vboxsf
file systems this way.
Remaining users we cannot yet update are ecryptfs, fuse and cramfs. The
former two are nested file systems that must support any underlying file
ssytem, and cramfs inserts a mixed mapping which currently requires a VMA.
Once all file systems have been converted to mmap_prepare(), we can then
update nested file systems.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/08db85970d89b17a995d2cffae96fb4cc462377f.1750099179.git.lorenzo.stoakes@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Commit e1defc4ff0cf ("block: Do away with the notion of hardsect_size")
changed hardsect_size to logical block size. The comment on top still
says hardsect_size.
Remove the comment as the code is pretty clear. While we are at it,
format the relevant code.
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Link: https://lore.kernel.org/20250618075821.111459-1-p.raghav@samsung.com
Reviewed-by: Daniel Gomez <da.gomez@samsung.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Crafted encoded extents could record out-of-range `lstart`, which should
not happen in normal cases.
It caused an iomap_iter_done() complaint [1] reported by syzbot.
[1] https://lore.kernel.org/r/684cb499.a00a0220.c6bd7.0010.GAE@google.com
Fixes: 1d191b4ca51d ("erofs: implement encoded extent metadata")
Reported-and-tested-by: syzbot+d8f000c609f05f52d9b5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d8f000c609f05f52d9b5
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250619032839.2642193-1-hsiangkao@linux.alibaba.com
|
|
Pull smb server fixes from Steve French:
- Fix alternate data streams bug
- Important fix for null pointer deref with Kerberos authentication
- Fix oops in smbdirect (RDMA) in free_transport
* tag '6.16-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: handle set/get info file for streamed file
ksmbd: fix null pointer dereference in destroy_previous_session
ksmbd: add free_transport ops in ksmbd connection
|
|
fstest reports a f2fs bug:
generic/363 42s ... [failed, exit status 1]- output mismatch (see /share/git/fstests/results//generic/363.out.bad)
--- tests/generic/363.out 2025-01-12 21:57:40.271440542 +0800
+++ /share/git/fstests/results//generic/363.out.bad 2025-05-19 19:55:58.000000000 +0800
@@ -1,2 +1,78 @@
QA output created by 363
fsx -q -S 0 -e 1 -N 100000
+READ BAD DATA: offset = 0xd6fb, size = 0xf044, fname = /mnt/f2fs/junk
+OFFSET GOOD BAD RANGE
+0x1540d 0x0000 0x2a25 0x0
+operation# (mod 256) for the bad data may be 37
+0x1540e 0x0000 0x2527 0x1
...
(Run 'diff -u /share/git/fstests/tests/generic/363.out /share/git/fstests/results//generic/363.out.bad' to see the entire diff)
Ran: generic/363
Failures: generic/363
Failed 1 of 1 tests
The root cause is user can update post-eof page via mmap [1], however, f2fs
missed to zero post-eof page in below operations, so, once it expands i_size,
then it will include dummy data locates previous post-eof page, so during
below operations, we need to zero post-eof page.
Operations which can include dummy data after previous i_size after expanding
i_size:
- write
- mapwrite [1]
- truncate
- fallocate
* preallocate
* zero_range
* insert_range
* collapse_range
- clone_range (doesn’t support in f2fs)
- copy_range (doesn’t support in f2fs)
[1] https://man7.org/linux/man-pages/man2/mmap.2.html 'BUG section'
Cc: stable@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Commit 8bd25b61c5a5 ("smb: client: set correct d_type for reparse DFS/DFSR
and mount point") deduplicated assignment of fattr->cf_dtype member from
all places to end of the function cifs_reparse_point_to_fattr(). The only
one missing place which was not deduplicated is wsl_to_fattr(). Fix it.
Fixes: 8bd25b61c5a5 ("smb: client: set correct d_type for reparse DFS/DFSR and mount point")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
hostname when adding channels
When mounting a share with kerberos authentication with multichannel
support, share mounts correctly, but fails to create secondary
channels. This occurs because the hostname is not populated when
adding the channels. The hostname is necessary for the userspace
cifs.upcall program to retrieve the required credentials and pass
it back to kernel, without hostname secondary channels fails
establish.
Cc: stable@vger.kernel.org
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Reported-by: xfuren <xfuren@gmail.com>
Link: https://bugzilla.samba.org/show_bug.cgi?id=15824
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Previously, file operations on a file-backed mount used the current
process' credentials to access the backing FD. Attempting to do so on
Android lead to SELinux denials, as ACL rules on the backing file (e.g.
/system/apex/foo.apex) is restricted to a small set of process.
Arguably, this error is redundant and leaking implementation details, as
access to files on a mount is already ACL'ed by path.
Instead, override to use the opener's cred when accessing the backing
file. This makes the behavior similar to a loop-backed mount, which
uses kworker cred when accessing the backing file and does not cause
SELinux denials.
Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@google.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://lore.kernel.org/r/20250612-b4-erofs-impersonate-v1-1-8ea7d6f65171@google.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The normal fsck code doesn't call key_visible_in_snapshot() with an
empty list of snapshot IDs seen (the current snapshot ID will always be
on the list), but str_hash_repair_key() ->
bch2_get_snapshot_overwrites() can, and that's totally fine as long as
we check for it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Lift copying the name into callers of ceph_encode_encrypted_dname()
that do not have it already copied; ceph_encode_encrypted_fname()
disappears.
That fixes a UAF in ceph_mdsc_build_path() - while the initial copy
of plaintext into buf is done under ->d_lock, we access the
original name again in ceph_encode_encrypted_fname() and that is
done without any locking. With ceph_encode_encrypted_dname() using
the stable copy the problem goes away.
Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
ceph_encode_encrypted_dname() would be better off with plaintext name
already copied into buffer; we'll lift that into the callers on the
next step, which will allow to fix UAF on races with rename; for now
copy it in the very beginning of ceph_encode_encrypted_dname().
That has a pleasant side benefit - we don't need to mess with tmp_buf
anymore (i.e. that's 256 bytes off the stack footprint).
Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and parse_longname() is not guaranteed that. That's the reason
why it uses kmemdup_nul() to build the argument for kstrtou64();
the problem is, kstrtou64() is not the only thing that need it.
Just get a NUL-terminated copy of the entire thing and be done
with that...
Fixes: dd66df0053ef "ceph: add support for encrypted snapshot names"
Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The bug only appears when:
- windows 11 copies a file that has an alternate data stream
- streams_xattr is enabled on the share configuration.
Microsoft Edge adds a ZoneIdentifier data stream containing the URL
for files it downloads.
Another way to create a test file:
- open cmd.exe
- echo "hello from default data stream" > hello.txt
- echo "hello again from ads" > hello.txt:ads.txt
If you open the file using notepad, we'll see the first message.
If you run "notepad hello.txt:ads.txt" in cmd.exe, we should see
the second message.
dir /s /r should least all streams for the file.
The truncation happens because the windows 11 client sends
a SetInfo/EndOfFile message on the ADS, but it is instead applied
on the main file, because we don't check fp->stream.
When receiving set/get info file for a stream file, Change to process
requests using stream position and size.
Truncate is unnecessary for stream files, so we skip
set_file_allocation_info and set_end_of_file_info operations.
Reported-by: Marios Makassikis <mmakassikis@freebox.fr>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
If client set ->PreviousSessionId on kerberos session setup stage,
NULL pointer dereference error will happen. Since sess->user is not
set yet, It can pass the user argument as NULL to destroy_previous_session.
sess->user will be set in ksmbd_krb5_authenticate(). So this patch move
calling destroy_previous_session() after ksmbd_krb5_authenticate().
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-27391
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
free_transport function for tcp connection can be called from smbdirect.
It will cause kernel oops. This patch add free_transport ops in ksmbd
connection, and add each free_transports for tcp and smbdirect.
Fixes: 21a4e47578d4 ("ksmbd: fix use-after-free in __smb2_lease_break_noti()")
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
check_directory_structure runs after check_dirents, so it expects that
it won't see any inodes with missing backpointers - normally.
But online fsck can't run check_dirents yet, or the user might only be
running a specific pass, so we need to be careful that this isn't an
error. If an inode is unreachable, that's handled by a separate pass.
Also, add a new 'bch2_inode_has_backpointer()' helper, since we were
doing this inconsistently.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
btree node scrub was sometimes failing to rewrite nodes with errors;
bch2_btree_node_rewrite() can return a transaction restart and we
weren't checking - the lockrestart_do() needs to wrap the entire
operation.
And there's a better helper it should've been using,
bch2_btree_node_rewrite_key(), which makes all this more convenient.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
callback"), the f_op->mmap() hook has been deprecated in favour of
f_op->mmap_prepare().
We have provided generic .mmap_prepare() equivalents, so update all file
systems that specify these directly in their file_operations structures.
This updates 9p, adfs, affs, bfs, fat, hfs, hfsplus, hostfs, hpfs, jffs2,
jfs, minix, omfs, ramfs and ufs file systems directly.
It updates generic_ro_fops which impacts qnx4, cramfs, befs, squashfs,
frebxfs, qnx6, efs, romfs, erofs and isofs file systems.
There are remaining file systems which use generic hooks in a less direct
way which we address in a subsequent commit.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/c7dc90e44a9e75e750939ea369290d6e441a18e6.1750099179.git.lorenzo.stoakes@oracle.com
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
callback"), the f_op->mmap() hook has been deprecated in favour of
f_op->mmap_prepare().
This callback is invoked in the mmap() logic far earlier, so error handling
can be performed more safely without complicated and bug-prone state
unwinding required should an error arise.
This hook also avoids passing a pointer to a not-yet-correctly-established
VMA avoiding any issues with referencing this data structure.
It rather provides a pointer to the new struct vm_area_desc descriptor type
which contains all required state and allows easy setting of required
parameters without any consideration needing to be paid to locking or
reference counts.
Note that nested filesystems like overlayfs are compatible with an
.mmap_prepare() callback since commit bb666b7c2707 ("mm: add mmap_prepare()
compatibility layer for nested file systems").
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/cba8b29ba5f225df8f63f50182d5f6e0fcf94456.1750099179.git.lorenzo.stoakes@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
callback"), the f_op->mmap() hook has been deprecated in favour of
f_op->mmap_prepare().
This callback is invoked in the mmap() logic far earlier, so error handling
can be performed more safely without complicated and bug-prone state
unwinding required should an error arise.
This hook also avoids passing a pointer to a not-yet-correctly-established
VMA avoiding any issues with referencing this data structure.
It rather provides a pointer to the new struct vm_area_desc descriptor type
which contains all required state and allows easy setting of required
parameters without any consideration needing to be paid to locking or
reference counts.
Note that nested filesystems like overlayfs are compatible with an
.mmap_prepare() callback since commit bb666b7c2707 ("mm: add mmap_prepare()
compatibility layer for nested file systems").
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/5abfe526032a6698fd1bcd074a74165cda7ea57c.1750099179.git.lorenzo.stoakes@oracle.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This is a prerequisite for adapting those filesystems to use the
.mmap_prepare() hook for mmap()'ing which invoke this check as this hook
does not have access to a VMA pointer.
To effect this, change the signature of daxdev_mapping_supported() and
update its callers (ext4 and xfs mmap()'ing hook code).
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/b09de1e8544384074165d92d048e80058d971286.1750099179.git.lorenzo.stoakes@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
callback"), the f_op->mmap() hook has been deprecated in favour of
f_op->mmap_prepare().
Additionally, commit bb666b7c2707 ("mm: add mmap_prepare() compatibility
layer for nested file systems") permits the use of the .mmap_prepare() hook
even in nested filesystems like overlayfs.
There are a number of places where we check only for f_op->mmap - this is
incorrect now mmap_prepare exists, so update all of these to use the
general helper can_mmap_file().
Most notably, this updates the elf logic to allow for the ability to
execute binaries on filesystems which have the .mmap_prepare hook, but
additionally we update nested filesystems.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/b68145b609532e62bab603dd9686faa6562046ec.1750099179.git.lorenzo.stoakes@oracle.com
Acked-by: Kees Cook <kees@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The call_mmap() function violates the existing convention in
include/linux/fs.h whereby invocations of virtual file system hooks is
performed by functions prefixed with vfs_xxx().
Correct this by renaming call_mmap() to vfs_mmap(). This also avoids
confusion as to the fact that f_op->mmap_prepare may be invoked here.
Also rename __call_mmap_prepare() function to vfs_mmap_prepare() and adjust
to accept a file parameter, this is useful later for nested file systems.
Finally, fix up the VMA userland tests and ensure the mmap_prepare -> mmap
shim is implemented there.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/8d389f4994fa736aa8f9172bef8533c10a9e9011.1750099179.git.lorenzo.stoakes@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The sysfs core handles 'const struct bin_attribute *'.
Adapt the internal references.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20250530-sysfs-const-bin_attr-final-v3-2-724bfcf05b99@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We can only pass negative error codes to bch2_err_str(); if it's a
positive integer it's not an error and we trip an assert.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
A path exists in a particular snapshot: we should do the pathwalk in the
snapshot ID of the inode we started from, _not_ change snapshot ID as we
walk inodes and dirents.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We can easily go from inode number -> path now, which makes for more
useful log messages.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Log the inode's new path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When we find a directory connectivity problem, we should do the repair
in the oldest snapshot that has the issue - so that we don't end up
duplicating work or making a real mess of things.
Oldest snapshot IDs have the highest integer value, so - just walk
inodes in reverse order.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch_subvolume.fs_path_parent needs to be updated as well, it should
match inode.bi_parent_subvol.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The dirent will be in a different snapshot if the inode is a subvolume
root.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The bch2_subvolume_get_snapshot() call needs to happen before the dirent
lookup - the dirent is in the parent subvolume.
Also, check for loops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
kthread creation checks for pending signals, which is _very_ annoying if
we have to do a long recovery and don't go rw until we've done
significant work.
Check if we'll be going rw and pre-allocate kthreads/workqueues.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|