summaryrefslogtreecommitdiff
path: root/fs/overlayfs
AgeCommit message (Collapse)Author
2025-07-06ovl: Check for NULL d_inode() in ovl_dentry_upper()Kees Cook
[ Upstream commit 8a39f1c870e9d6fbac5638f3a42a6a6363829c49 ] In ovl_path_type() and ovl_is_metacopy_dentry() GCC notices that it is possible for OVL_E() to return NULL (which implies that d_inode(dentry) may be NULL). This would result in out of bounds reads via container_of(), seen with GCC 15's -Warray-bounds -fdiagnostics-details. For example: In file included from arch/x86/include/generated/asm/rwonce.h:1, from include/linux/compiler.h:339, from include/linux/export.h:5, from include/linux/linkage.h:7, from include/linux/fs.h:5, from fs/overlayfs/util.c:7: In function 'ovl_upperdentry_dereference', inlined from 'ovl_dentry_upper' at ../fs/overlayfs/util.c:305:9, inlined from 'ovl_path_type' at ../fs/overlayfs/util.c:216:6: include/asm-generic/rwonce.h:44:26: error: array subscript 0 is outside array bounds of 'struct inode[7486503276667837]' [-Werror=array-bounds=] 44 | #define __READ_ONCE(x) (*(const volatile __unqual_scalar_typeof(x) *)&(x)) | ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/asm-generic/rwonce.h:50:9: note: in expansion of macro '__READ_ONCE' 50 | __READ_ONCE(x); \ | ^~~~~~~~~~~ fs/overlayfs/ovl_entry.h:195:16: note: in expansion of macro 'READ_ONCE' 195 | return READ_ONCE(oi->__upperdentry); | ^~~~~~~~~ 'ovl_path_type': event 1 185 | return inode ? OVL_I(inode)->oe : NULL; 'ovl_path_type': event 2 Avoid this by allowing ovl_dentry_upper() to return NULL if d_inode() is NULL, as that means the problematic dereferencing can never be reached. Note that this fixes the over-eager compiler warning in an effort to being able to enable -Warray-bounds globally. There is no known behavioral bug here. Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27ovl: Fix nested backing file pathsAndré Almeida
commit 924577e4f6ca473de1528953a0e13505fae61d7b upstream. When the lowerdir of an overlayfs is a merged directory of another overlayfs, ovl_open_realfile() will fail to open the real file and point to a lower dentry copy, without the proper parent path. After this, d_path() will then display the path incorrectly as if the file is placed in the root directory. This bug can be triggered with the following setup: mkdir -p ovl-A/lower ovl-A/upper ovl-A/merge ovl-A/work mkdir -p ovl-B/upper ovl-B/merge ovl-B/work cp /bin/cat ovl-A/lower/ mount -t overlay overlay -o \ lowerdir=ovl-A/lower,upperdir=ovl-A/upper,workdir=ovl-A/work \ ovl-A/merge mount -t overlay overlay -o \ lowerdir=ovl-A/merge,upperdir=ovl-B/upper,workdir=ovl-B/work \ ovl-B/merge ovl-A/merge/cat /proc/self/maps | grep --color cat ovl-B/merge/cat /proc/self/maps | grep --color cat The first cat will correctly show `/ovl-A/merge/cat`, while the second one shows just `/cat`. To fix that, uses file_user_path() inside of backing_file_open() to get the correct file path for the dentry. Co-developed-by: John Schoenick <johns@valvesoftware.com> Signed-off-by: John Schoenick <johns@valvesoftware.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Fixes: def3ae83da02 ("fs: store real path instead of fake path in backing file f_path") Cc: <stable@vger.kernel.org> # v6.7 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-25ovl: don't allow datadir onlyMiklos Szeredi
commit eb3a04a8516ee9b5174379306f94279fc90424c4 upstream. In theory overlayfs could support upper layer directly referring to a data layer, but there's no current use case for this. Originally, when data-only layers were introduced, this wasn't allowed, only introduced by the "datadir+" feature, but without actually handling this case, resulting in an Oops. Fix by disallowing datadir without lowerdir. Reported-by: Giuseppe Scrivano <gscrivan@redhat.com> Fixes: 24e16e385f22 ("ovl: add support for appending lowerdirs one by one") Cc: <stable@vger.kernel.org> # v6.7 Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-25ovl: remove unused forward declarationGiuseppe Scrivano
[ Upstream commit a6eb9a4a69cc360b930dad9dc8513f8fd9b3577f ] The ovl_get_verity_xattr() function was never added, only its declaration. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Fixes: 184996e92e86 ("ovl: Validate verity xattr when resolving lowerdata") Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-07ovl: fix UAF in ovl_dentry_update_reval by moving dput() in ovl_link_upVasiliy Kovalev
[ Upstream commit c84e125fff2615b4d9c259e762596134eddd2f27 ] The issue was caused by dput(upper) being called before ovl_dentry_update_reval(), while upper->d_flags was still accessed in ovl_dentry_remote(). Move dput(upper) after its last use to prevent use-after-free. BUG: KASAN: slab-use-after-free in ovl_dentry_remote fs/overlayfs/util.c:162 [inline] BUG: KASAN: slab-use-after-free in ovl_dentry_update_reval+0xd2/0xf0 fs/overlayfs/util.c:167 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0xc3/0x620 mm/kasan/report.c:488 kasan_report+0xd9/0x110 mm/kasan/report.c:601 ovl_dentry_remote fs/overlayfs/util.c:162 [inline] ovl_dentry_update_reval+0xd2/0xf0 fs/overlayfs/util.c:167 ovl_link_up fs/overlayfs/copy_up.c:610 [inline] ovl_copy_up_one+0x2105/0x3490 fs/overlayfs/copy_up.c:1170 ovl_copy_up_flags+0x18d/0x200 fs/overlayfs/copy_up.c:1223 ovl_rename+0x39e/0x18c0 fs/overlayfs/dir.c:1136 vfs_rename+0xf84/0x20a0 fs/namei.c:4893 ... </TASK> Fixes: b07d5cc93e1b ("ovl: update of dentry revalidate flags after copy up") Reported-by: syzbot+316db8a1191938280eb6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=316db8a1191938280eb6 Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Link: https://lore.kernel.org/r/20250214215148.761147-1-kovalev@altlinux.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-17fs: relax assertions on failure to encode file handlesAmir Goldstein
commit 974e3fe0ac61de85015bbe5a4990cf4127b304b2 upstream. Encoding file handles is usually performed by a filesystem >encode_fh() method that may fail for various reasons. The legacy users of exportfs_encode_fh(), namely, nfsd and name_to_handle_at(2) syscall are ready to cope with the possibility of failure to encode a file handle. There are a few other users of exportfs_encode_{fh,fid}() that currently have a WARN_ON() assertion when ->encode_fh() fails. Relax those assertions because they are wrong. The second linked bug report states commit 16aac5ad1fa9 ("ovl: support encoding non-decodable file handles") in v6.6 as the regressing commit, but this is not accurate. The aforementioned commit only increases the chances of the assertion and allows triggering the assertion with the reproducer using overlayfs, inotify and drop_caches. Triggering this assertion was always possible with other filesystems and other reasons of ->encode_fh() failures and more particularly, it was also possible with the exact same reproducer using overlayfs that is mounted with options index=on,nfs_export=on also on kernels < v6.6. Therefore, I am not listing the aforementioned commit as a Fixes commit. Backport hint: this patch will have a trivial conflict applying to v6.6.y, and other trivial conflicts applying to stable kernels < v6.6. Reported-by: syzbot+ec07f6f5ce62b858579f@syzkaller.appspotmail.com Tested-by: syzbot+ec07f6f5ce62b858579f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-unionfs/671fd40c.050a0220.4735a.024f.GAE@google.com/ Reported-by: Dmitry Safonov <dima@arista.com> Closes: https://lore.kernel.org/linux-fsdevel/CAGrbwDTLt6drB9eaUagnQVgdPBmhLfqqxAf3F+Juqy_o6oP8uw@mail.gmail.com/ Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20241219115301.465396-1-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-17ovl: support encoding fid from inode with no aliasAmir Goldstein
[ Upstream commit c45beebfde34aa71afbc48b2c54cdda623515037 ] Dmitry Safonov reported that a WARN_ON() assertion can be trigered by userspace when calling inotify_show_fdinfo() for an overlayfs watched inode, whose dentry aliases were discarded with drop_caches. The WARN_ON() assertion in inotify_show_fdinfo() was removed, because it is possible for encoding file handle to fail for other reason, but the impact of failing to encode an overlayfs file handle goes beyond this assertion. As shown in the LTP test case mentioned in the link below, failure to encode an overlayfs file handle from a non-aliased inode also leads to failure to report an fid with FAN_DELETE_SELF fanotify events. As Dmitry notes in his analyzis of the problem, ovl_encode_fh() fails if it cannot find an alias for the inode, but this failure can be fixed. ovl_encode_fh() seldom uses the alias and in the case of non-decodable file handles, as is often the case with fanotify fid info, ovl_encode_fh() never needs to use the alias to encode a file handle. Defer finding an alias until it is actually needed so ovl_encode_fh() will not fail in the common case of FAN_DELETE_SELF fanotify events. Fixes: 16aac5ad1fa9 ("ovl: support encoding non-decodable file handles") Reported-by: Dmitry Safonov <dima@arista.com> Closes: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiie81voLZZi2zXS1BziXZCM24nXqPAxbu8kxXCUWdwOg@mail.gmail.com/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20250105162404.357058-3-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-17ovl: pass realinode to ovl_encode_real_fh() instead of realdentryAmir Goldstein
[ Upstream commit 07aeefae7ff44d80524375253980b1bdee2396b0 ] We want to be able to encode an fid from an inode with no alias. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20250105162404.357058-2-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Stable-dep-of: c45beebfde34 ("ovl: support encoding fid from inode with no alias") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09ovl: properly handle large files in ovl_security_fileattrOleksandr Tymoshenko
commit 3b6b99ef15ea37635604992ede9ebcccef38a239 upstream. dentry_open in ovl_security_fileattr fails for any file larger than 2GB if open method of the underlying filesystem calls generic_file_open (e.g. fusefs). The issue can be reproduce using the following script: (passthrough_ll is an example app from libfuse). $ D=/opt/test/mnt $ mkdir -p ${D}/{source,base,top/uppr,top/work,ovlfs} $ dd if=/dev/zero of=${D}/source/zero.bin bs=1G count=2 $ passthrough_ll -o source=${D}/source ${D}/base $ mount -t overlay overlay \ -olowerdir=${D}/base,upperdir=${D}/top/uppr,workdir=${D}/top/work \ ${D}/ovlfs $ chmod 0777 ${D}/mnt/ovlfs/zero.bin Running this script results in "Value too large for defined data type" error message from chmod. Signed-off-by: Oleksandr Tymoshenko <ovt@google.com> Fixes: 72db82115d2b ("ovl: copy up sync/noatime fileattr flags") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09ovl: Filter invalid inodes with missing lookup functionVasiliy Kovalev
commit c8b359dddb418c60df1a69beea01d1b3322bfe83 upstream. Add a check to the ovl_dentry_weird() function to prevent the processing of directory inodes that lack the lookup function. This is important because such inodes can cause errors in overlayfs when passed to the lowerstack. Reported-by: syzbot+a8c9d476508bd14a90e5@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=a8c9d476508bd14a90e5 Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Link: https://lore.kernel.org/linux-unionfs/CAJfpegvx-oS9XGuwpJx=Xe28_jzWx5eRo1y900_ZzWY+=gGzUg@mail.gmail.com/ Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Cc: <stable@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-16fs: pass offset and result to backing_file end_write() callbackAmir Goldstein
This is needed for extending fuse inode size after fuse passthrough write. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Link: https://lore.kernel.org/linux-fsdevel/CAJfpegs=cvZ_NYy6Q_D42XhYS=Sjj5poM1b5TzXzOVvX=R36aA@mail.gmail.com/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-09-27ovl: fix file leak in ovl_real_fdget_meta()Amir Goldstein
ovl_open_realfile() is wrongly called twice after conversion to new struct fd. Fixes: 88a2f6468d01 ("struct fd: representation change") Reported-by: syzbot+d9efec94dcbfa0de1c07@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-23Merge tag 'pull-stable-struct_fd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 'struct fd' updates from Al Viro: "Just the 'struct fd' layout change, with conversion to accessor helpers" * tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: add struct fd constructors, get rid of __to_fd() struct fd: representation change introduce fd_file(), convert all accessors to it.
2024-09-19Merge tag 'ovl-update-6.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs updates from Amir Goldstein: - Increase robustness of overlayfs to crashes in the case of underlying filesystems that to not guarantee metadata ordering to persistent storage (problem was reported with ubifs). - Deny mount inside container with features that require root privileges to work properly, instead of failing operations later. - Some clarifications to overlayfs documentation. * tag 'ovl-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: fail if trusted xattrs are needed but caller lacks permission overlayfs.rst: update metacopy section in overlayfs documentation ovl: fsync after metadata copy-up ovl: don't set the superblock's errseq_t manually
2024-09-16Merge tag 'lsm-pr-20240911' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Move the LSM framework to static calls This transitions the vast majority of the LSM callbacks into static calls. Those callbacks which haven't been converted were left as-is due to the general ugliness of the changes required to support the static call conversion; we can revisit those callbacks at a future date. - Add the Integrity Policy Enforcement (IPE) LSM This adds a new LSM, Integrity Policy Enforcement (IPE). There is plenty of documentation about IPE in this patches, so I'll refrain from going into too much detail here, but the basic motivation behind IPE is to provide a mechanism such that administrators can restrict execution to only those binaries which come from integrity protected storage, e.g. a dm-verity protected filesystem. You will notice that IPE requires additional LSM hooks in the initramfs, dm-verity, and fs-verity code, with the associated patches carrying ACK/review tags from the associated maintainers. We couldn't find an obvious maintainer for the initramfs code, but the IPE patchset has been widely posted over several years. Both Deven Bowers and Fan Wu have contributed to IPE's development over the past several years, with Fan Wu agreeing to serve as the IPE maintainer moving forward. Once IPE is accepted into your tree, I'll start working with Fan to ensure he has the necessary accounts, keys, etc. so that he can start submitting IPE pull requests to you directly during the next merge window. - Move the lifecycle management of the LSM blobs to the LSM framework Management of the LSM blobs (the LSM state buffers attached to various kernel structs, typically via a void pointer named "security" or similar) has been mixed, some blobs were allocated/managed by individual LSMs, others were managed by the LSM framework itself. Starting with this pull we move management of all the LSM blobs, minus the XFRM blob, into the framework itself, improving consistency across LSMs, and reducing the amount of duplicated code across LSMs. Due to some additional work required to migrate the XFRM blob, it has been left as a todo item for a later date; from a practical standpoint this omission should have little impact as only SELinux provides a XFRM LSM implementation. - Fix problems with the LSM's handling of F_SETOWN The LSM hook for the fcntl(F_SETOWN) operation had a couple of problems: it was racy with itself, and it was disconnected from the associated DAC related logic in such a way that the LSM state could be updated in cases where the DAC state would not. We fix both of these problems by moving the security_file_set_fowner() hook into the same section of code where the DAC attributes are updated. Not only does this resolve the DAC/LSM synchronization issue, but as that code block is protected by a lock, it also resolve the race condition. - Fix potential problems with the security_inode_free() LSM hook Due to use of RCU to protect inodes and the placement of the LSM hook associated with freeing the inode, there is a bit of a challenge when it comes to managing any LSM state associated with an inode. The VFS folks are not open to relocating the LSM hook so we have to get creative when it comes to releasing an inode's LSM state. Traditionally we have used a single LSM callback within the hook that is triggered when the inode is "marked for death", but not actually released due to RCU. Unfortunately, this causes problems for LSMs which want to take an action when the inode's associated LSM state is actually released; so we add an additional LSM callback, inode_free_security_rcu(), that is called when the inode's LSM state is released in the RCU free callback. - Refactor two LSM hooks to better fit the LSM return value patterns The vast majority of the LSM hooks follow the "return 0 on success, negative values on failure" pattern, however, there are a small handful that have unique return value behaviors which has caused confusion in the past and makes it difficult for the BPF verifier to properly vet BPF LSM programs. This includes patches to convert two of these"special" LSM hooks to the common 0/-ERRNO pattern. - Various cleanups and improvements A handful of patches to remove redundant code, better leverage the IS_ERR_OR_NULL() helper, add missing "static" markings, and do some minor style fixups. * tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (40 commits) security: Update file_set_fowner documentation fs: Fix file_set_fowner LSM hook inconsistencies lsm: Use IS_ERR_OR_NULL() helper function lsm: remove LSM_COUNT and LSM_CONFIG_COUNT ipe: Remove duplicated include in ipe.c lsm: replace indirect LSM hook calls with static calls lsm: count the LSMs enabled at compile time kernel: Add helper macros for loop unrolling init/main.c: Initialize early LSMs after arch code, static keys and calls. MAINTAINERS: add IPE entry with Fan Wu as maintainer documentation: add IPE documentation ipe: kunit test for parser scripts: add boot policy generation program ipe: enable support for fs-verity as a trust provider fsverity: expose verified fsverity built-in signatures to LSMs lsm: add security_inode_setintegrity() hook ipe: add support for dm-verity as a trust provider dm-verity: expose root hash digest and signature data to LSMs block,lsm: add LSM blob and new LSM hooks for block devices ipe: add permissive toggle ...
2024-09-08ovl: fail if trusted xattrs are needed but caller lacks permissionMike Baynton
Some overlayfs features require permission to read/write trusted.* xattrs. These include redirect_dir, verity, metacopy, and data-only layers. This patch adds additional validations at mount time to stop overlays from mounting in certain cases where the resulting mount would not function according to the user's expectations because they lack permission to access trusted.* xattrs (for example, not global root.) Similar checks in ovl_make_workdir() that disable features instead of failing are still relevant and used in cases where the resulting mount can still work "reasonably well." Generally, if the feature was enabled through kernel config or module option, any mount that worked before will still work the same; this applies to redirect_dir and metacopy. The user must explicitly request these features in order to generate a mount failure. Verity and data-only layers on the other hand must be explictly requested and have no "reasonable" disabled or degraded alternative, so mounts attempting either always fail. "lower data-only dirs require metacopy support" moved down in case userxattr is set, which disables metacopy. Cc: stable@vger.kernel.org # v6.6+ Signed-off-by: Mike Baynton <mike@mbaynton.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-08-30ovl: fsync after metadata copy-upAmir Goldstein
For upper filesystems which do not use strict ordering of persisting metadata changes (e.g. ubifs), when overlayfs file is modified for the first time, copy up will create a copy of the lower file and its parent directories in the upper layer. Permission lost of the new upper parent directory was observed during power-cut stress test. Fix by moving the fsync call to after metadata copy to make sure that the metadata copied up directory and files persists to disk before renaming from tmp to final destination. With metacopy enabled, this change will hurt performance of workloads such as chown -R, so we keep the legacy behavior of fsync only on copyup of data. Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxj-pOvmw1-uXR3qVdqtLjSkwcR9nVKcNU_vC10Zyf2miQ@mail.gmail.com/ Reported-and-tested-by: Fei Lv <feilv@asrmicro.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-08-29ovl: don't set the superblock's errseq_t manuallyHaifeng Xu
Since commit 5679897eb104 ("vfs: make sync_filesystem return errors from ->sync_fs"), the return value from sync_fs callback can be seen in sync_filesystem(). Thus the errseq_set opreation can be removed here. Depends-on: commit 5679897eb104 ("vfs: make sync_filesystem return errors from ->sync_fs") Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-08-24Merge patch series "ovl: simplify ovl_parse_param_lowerdir()"Christian Brauner
Simplify and fix overlayfs layer parsing so the maximum of 500 layers can be used. * patches from https://lore.kernel.org/r/20240705011510.794025-1-chengzhihao1@huawei.com: ovl: ovl_parse_param_lowerdir: Add missed '\n' for pr_err ovl: fix wrong lowerdir number check for parameter Opt_lowerdir ovl: pass string to ovl_parse_layer() Link: https://lore.kernel.org/r/20240705011510.794025-1-chengzhihao1@huawei.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-23ovl: ovl_parse_param_lowerdir: Add missed '\n' for pr_errZhihao Cheng
Add '\n' for pr_err in function ovl_parse_param_lowerdir(), which ensures that error message is displayed at once. Fixes: b36a5780cb44 ("ovl: modify layer parameter parsing") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20240705011510.794025-4-chengzhihao1@huawei.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-23ovl: fix wrong lowerdir number check for parameter Opt_lowerdirZhihao Cheng
The max count of lowerdir is OVL_MAX_STACK[500], which is broken by commit 37f32f526438("ovl: fix memory leak in ovl_parse_param()") for parameter Opt_lowerdir. Since commit 819829f0319a("ovl: refactor layer parsing helpers") and commit 24e16e385f22("ovl: add support for appending lowerdirs one by one") added check ovl_mount_dir_check() in function ovl_parse_param_lowerdir(), the 'ctx->nr' should be smaller than OVL_MAX_STACK, after commit 37f32f526438("ovl: fix memory leak in ovl_parse_param()") is applied, the 'ctx->nr' is updated before the check ovl_mount_dir_check(), which leads the max count of lowerdir to become 499 for parameter Opt_lowerdir. Fix it by replacing lower layers parsing code with the existing helper function ovl_parse_layer(). Fixes: 37f32f526438 ("ovl: fix memory leak in ovl_parse_param()") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20240705011510.794025-3-chengzhihao1@huawei.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-23ovl: pass string to ovl_parse_layer()Christian Brauner
So it can be used for parsing the Opt_lowerdir. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20240705011510.794025-2-chengzhihao1@huawei.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12struct fd: representation changeAl Viro
We want the compiler to see that fdput() on empty instance is a no-op. The emptiness check is that file reference is NULL, while fdput() is "fput() if FDPUT_FPUT is present in flags". The reason why fdput() on empty instance is a no-op is something compiler can't see - it's that we never generate instances with NULL file reference combined with non-zero flags. It's not that hard to deal with - the real primitives behind fdget() et.al. are returning an unsigned long value, unpacked by (inlined) __to_fd() into the current struct file * + int. The lower bits are used to store flags, while the rest encodes the pointer. Linus suggested that keeping this unsigned long around with the extractions done by inlined accessors should generate a sane code and that turns out to be the case. Namely, turning struct fd into a struct-wrapped unsinged long, with fd_empty(f) => unlikely(f.word == 0) fd_file(f) => (struct file *)(f.word & ~3) fdput(f) => if (f.word & 1) fput(fd_file(f)) ends up with compiler doing the right thing. The cost is the patch footprint, of course - we need to switch f.file to fd_file(f) all over the tree, and it's not doable with simple search and replace; there are false positives, etc. Note that the sole member of that structure is an opaque unsigned long - all accesses should be done via wrappers and I don't want to use a name that would invite manual casts to file pointers, etc. The value of that member is equal either to (unsigned long)p | flags, p being an address of some struct file instance, or to 0 for an empty fd. For now the new predicate (fd_empty(f)) has no users; all the existing checks have form (!fd_file(f)). We will convert to fd_empty() use later; here we only define it (and tell the compiler that it's unlikely to return true). This commit only deals with representation change; there will be followups. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-08-12introduce fd_file(), convert all accessors to it.Al Viro
For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are very few (3 in linux/file.h, 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in explicit initializers). Those can be dealt with in the commit converting to new layout; accesses to struct fd::file are too many for that. This commit converts (almost) all of f.file to fd_file(f). It's not entirely mechanical ('file' is used as a member name more than just in struct fd) and it does not even attempt to distinguish the uses in pointer context from those in boolean context; the latter will be eventually turned into a separate helper (fd_empty()). NOTE: mass conversion to fd_empty(), tempting as it might be, is a bad idea; better do that piecewise in commit that convert from fdget...() to CLASS(...). [conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c caught by git; fs/stat.c one got caught by git grep] [fs/xattr.c conflict] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-07-31lsm: Refactor return value of LSM hook inode_copy_up_xattrXu Kuohai
To be consistent with most LSM hooks, convert the return value of hook inode_copy_up_xattr to 0 or a negative error code. Before: - Hook inode_copy_up_xattr returns 0 when accepting xattr, 1 when discarding xattr, -EOPNOTSUPP if it does not know xattr, or any other negative error code otherwise. After: - Hook inode_copy_up_xattr returns 0 when accepting xattr, *-ECANCELED* when discarding xattr, -EOPNOTSUPP if it does not know xattr, or any other negative error code otherwise. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-06-14ovl: fix encoding fid for lower only rootMiklos Szeredi
ovl_check_encode_origin() should return a positive number if the lower dentry is to be encoded, zero otherwise. If there's no upper layer at all (read-only overlay), then it obviously needs to return positive. This was broken by commit 16aac5ad1fa9 ("ovl: support encoding non-decodable file handles"), which didn't take the lower-only configuration into account. Fix by checking the no-upper-layer case up-front. Reported-and-tested-by: Youzhong Yang <youzhong@gmail.com> Closes: https://lore.kernel.org/all/CADpNCvaBimi+zCYfRJHvCOhMih8OU0rmZkwLuh24MKKroRuT8Q@mail.gmail.com/ Fixes: 16aac5ad1fa9 ("ovl: support encoding non-decodable file handles") Cc: <stable@vger.kernel.org> # v6.6 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-05-28ovl: fix copy-up in tmpfileMiklos Szeredi
Move ovl_copy_up() call outside of ovl_want_write()/ovl_drop_write() region, since copy up may also call ovl_want_write() resulting in recursive locking on sb->s_writers. Reported-and-tested-by: syzbot+85e58cdf5b3136471d4b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000f6865106191c3e58@google.com/ Fixes: 9a87907de359 ("ovl: implement tmpfile") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-05-22Merge tag 'ovl-update-6.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs updates from Miklos Szeredi: - Add tmpfile support - Clean up include * tag 'ovl-update-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: remove duplicate included header ovl: remove upper umask handling from ovl_create_upper() ovl: implement tmpfile
2024-05-21Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull misc vfs updates from Al Viro: "Assorted commits that had missed the last merge window..." * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: remove call_{read,write}_iter() functions do_dentry_open(): kill inode argument kernel_file_open(): get rid of inode argument get_file_rcu(): no need to check for NULL separately fd_is_open(): move to fs/file.c close_on_exec(): pass files_struct instead of fdtable
2024-05-15Merge tag 'integrity-v6.10' of ↵Linus Torvalds
ssh://ra.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: "Two IMA changes, one EVM change, a use after free bug fix, and a code cleanup to address "-Wflex-array-member-not-at-end" warnings: - The existing IMA {ascii, binary}_runtime_measurements lists include a hard coded SHA1 hash. To address this limitation, define per TPM enabled hash algorithm {ascii, binary}_runtime_measurements lists - Close an IMA integrity init_module syscall measurement gap by defining a new critical-data record - Enable (partial) EVM support on stacked filesystems (overlayfs). Only EVM portable & immutable file signatures are copied up, since they do not contain filesystem specific metadata" * tag 'integrity-v6.10' of ssh://ra.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: add crypto agility support for template-hash algorithm evm: Rename is_unsupported_fs to is_unsupported_hmac_fs fs: Rename SB_I_EVM_UNSUPPORTED to SB_I_EVM_HMAC_UNSUPPORTED evm: Enforce signatures on unsupported filesystem for EVM_INIT_X509 ima: re-evaluate file integrity on file metadata change evm: Store and detect metadata inode attributes changes ima: Move file-change detection variables into new structure evm: Use the metadata inode to calculate metadata hash evm: Implement per signature type decision in security_inode_copy_up_xattr security: allow finer granularity in permitting copy-up of security xattrs ima: Rename backing_inode to real_inode integrity: Avoid -Wflex-array-member-not-at-end warnings ima: define an init_module critical data record ima: Fix use-after-free on a dentry's dname.name
2024-05-10ovl: remove duplicate included headerThorsten Blum
Remove duplicate included header file linux/posix_acl.h Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-05-02ovl: remove upper umask handling from ovl_create_upper()Miklos Szeredi
This is already done by vfs_prepare_mode() when creating the upper object by vfs_create(), vfs_mkdir() and vfs_mknod(). No regressions have been observed in xfstests run with posix acls turned off for the upper filesystem. Fixes: 1639a49ccdce ("fs: move S_ISGID stripping into the vfs_*() helpers") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-05-02ovl: implement tmpfileMiklos Szeredi
Combine inode creation with opening a file. There are six separate objects that are being set up: the backing inode, dentry and file, and the overlay inode, dentry and file. Cleanup in case of an error is a bit of a challenge and is difficult to test, so careful review is needed. All tmpfile testcases except generic/509 now run/pass, and no regressions are observed with full xfstests. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
2024-04-15kernel_file_open(): get rid of inode argumentAl Viro
always equal to ->dentry->d_inode of the path argument these days. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-04-09fs: Rename SB_I_EVM_UNSUPPORTED to SB_I_EVM_HMAC_UNSUPPORTEDStefan Berger
Now that EVM supports RSA signatures for previously completely unsupported filesystems rename the flag SB_I_EVM_UNSUPPORTED to SB_I_EVM_HMAC_UNSUPPORTED to reflect that only HMAC is not supported. Suggested-by: Amir Goldstein <amir73il@gmail.com> Suggested-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Acked-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2024-04-09security: allow finer granularity in permitting copy-up of security xattrsStefan Berger
Copying up xattrs is solely based on the security xattr name. For finer granularity add a dentry parameter to the security_inode_copy_up_xattr hook definition, allowing decisions to be based on the xattr content as well. Co-developed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Acked-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> (LSM,SELinux) Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2024-03-26fs_parser: move fsparam_string_empty() helper into headerLuis Henriques (SUSE)
Since both ext4 and overlayfs define the same macro to specify string parameters that may allow empty values, define it in an header file so that this helper can be shared. Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Link: https://lore.kernel.org/r/20240312104757.27333-1-luis.henriques@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-17ovl: relax WARN_ON in ovl_verify_area()Amir Goldstein
syzbot hit an assertion in copy up data loop which looks like it is the result of a lower file whose size is being changed underneath overlayfs. This type of use case is documented to cause undefined behavior, so returning EIO error for the copy up makes sense, but it should not be causing a WARN_ON assertion. Reported-and-tested-by: syzbot+3abd99031b42acf367ef@syzkaller.appspotmail.com Fixes: ca7ab482401c ("ovl: add permission hooks outside of do_splice_direct()") Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-03-12mm, slab: remove last vestiges of SLAB_MEM_SPREADLinus Torvalds
Yes, yes, I know the slab people were planning on going slow and letting every subsystem fight this thing on their own. But let's just rip off the band-aid and get it over and done with. I don't want to see a number of unnecessary pull requests just to get rid of a flag that no longer has any meaning. This was mainly done with a couple of 'sed' scripts and then some manual cleanup of the end result. Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-11Merge tag 'vfs-6.9.uuid' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs uuid updates from Christian Brauner: "This adds two new ioctl()s for getting the filesystem uuid and retrieving the sysfs path based on the path of a mounted filesystem. Getting the filesystem uuid has been implemented in filesystem specific code for a while it's now lifted as a generic ioctl" * tag 'vfs-6.9.uuid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: xfs: add support for FS_IOC_GETFSSYSFSPATH fs: add FS_IOC_GETFSSYSFSPATH fat: Hook up sb->s_uuid fs: FS_IOC_GETUUID ovl: convert to super_set_uuid() fs: super_set_uuid()
2024-03-11Merge tag 'vfs-6.9.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Misc features, cleanups, and fixes for vfs and individual filesystems. Features: - Support idmapped mounts for hugetlbfs. - Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug where the passed offset is ignored if the file is O_APPEND. The new flag allows a caller to enforce that the offset is honored to conform to posix even if the file was opened in append mode. - Move i_mmap_rwsem in struct address_space to avoid false sharing between i_mmap and i_mmap_rwsem. - Convert efs, qnx4, and coda to use the new mount api. - Add a generic is_dot_dotdot() helper that's used by various filesystems and the VFS code instead of open-coding it multiple times. - Recently we've added stable offsets which allows stable ordering when iterating directories exported through NFS on e.g., tmpfs filesystems. Originally an xarray was used for the offset map but that caused slab fragmentation issues over time. This switches the offset map to the maple tree which has a dense mode that handles this scenario a lot better. Includes tests. - Finally merge the case-insensitive improvement series Gabriel has been working on for a long time. This cleanly propagates case insensitive operations through ->s_d_op which in turn allows us to remove the quite ugly generic_set_encrypted_ci_d_ops() operations. It also improves performance by trying a case-sensitive comparison first and then fallback to case-insensitive lookup if that fails. This also fixes a bug where overlayfs would be able to be mounted over a case insensitive directory which would lead to all sort of odd behaviors. Cleanups: - Make file_dentry() a simple accessor now that ->d_real() is simplified because of the backing file work we did the last two cycles. - Use the dedicated file_mnt_idmap helper in ntfs3. - Use smp_load_acquire/store_release() in the i_size_read/write helpers and thus remove the hack to handle i_size reads in the filemap code. - The SLAB_MEM_SPREAD is a nop now. Remove it from various places in fs/ - It's no longer necessary to perform a second built-in initramfs unpack call because we retain the contents of the previous extraction. Remove it. - Now that we have removed various allocators kfree_rcu() always works with kmem caches and kmalloc(). So simplify various places that only use an rcu callback in order to handle the kmem cache case. - Convert the pipe code to use a lockdep comparison function instead of open-coding the nesting making lockdep validation easier. - Move code into fs-writeback.c that was located in a header but can be made static as it's only used in that one file. - Rewrite the alignment checking iterators for iovec and bvec to be easier to read, and also significantly more compact in terms of generated code. This saves 270 bytes of text on x86-64 (with clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also saves a bit of time for the same workload. - Switch various places to use KMEM_CACHE instead of kmem_cache_create(). - Use inode_set_ctime_to_ts() in inode_set_ctime_current() - Use kzalloc() in name_to_handle_at() to avoid kernel infoleak. - Various smaller cleanups for eventfds. Fixes: - Fix various comments and typos, and unneeded initializations. - Fix stack allocation hack for clang in the select code. - Improve dump_mapping() debug code on a best-effort basis. - Fix build errors in various selftests. - Avoid wrap-around instrumentation in various places. - Don't allow user namespaces without an idmapping to be used for idmapped mounts. - Fix sysv sb_read() call. - Fix fallback implementation of the get_name() export operation" * tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits) hugetlbfs: support idmapped mounts qnx4: convert qnx4 to use the new mount api fs: use inode_set_ctime_to_ts to set inode ctime to current time libfs: Drop generic_set_encrypted_ci_d_ops ubifs: Configure dentry operations at dentry-creation time f2fs: Configure dentry operations at dentry-creation time ext4: Configure dentry operations at dentry-creation time libfs: Add helper to choose dentry operations at mount-time libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops fscrypt: Drop d_revalidate once the key is added fscrypt: Drop d_revalidate for valid dentries during lookup fscrypt: Factor out a helper to configure the lookup dentry ovl: Always reject mounting over case-insensitive directories libfs: Attempt exact-match comparison first during casefolded lookup efs: remove SLAB_MEM_SPREAD flag usage jfs: remove SLAB_MEM_SPREAD flag usage minix: remove SLAB_MEM_SPREAD flag usage openpromfs: remove SLAB_MEM_SPREAD flag usage proc: remove SLAB_MEM_SPREAD flag usage qnx6: remove SLAB_MEM_SPREAD flag usage ...
2024-03-07Merge tag 'for-next-6.9' of ↵Christian Brauner
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krisman/unicode into vfs.misc Merge case-insensitive updates from Gabriel Krisman Bertazi: - Patch case-insensitive lookup by trying the case-exact comparison first, before falling back to costly utf8 casefolded comparison. - Fix to forbid using a case-insensitive directory as part of an overlayfs mount. - Patchset to ensure d_op are set at d_alloc time for fscrypt and casefold volumes, ensuring filesystem dentries will all have the correct ops, whether they come from a lookup or not. * tag 'for-next-6.9' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krisman/unicode: libfs: Drop generic_set_encrypted_ci_d_ops ubifs: Configure dentry operations at dentry-creation time f2fs: Configure dentry operations at dentry-creation time ext4: Configure dentry operations at dentry-creation time libfs: Add helper to choose dentry operations at mount-time libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops fscrypt: Drop d_revalidate once the key is added fscrypt: Drop d_revalidate for valid dentries during lookup fscrypt: Factor out a helper to configure the lookup dentry ovl: Always reject mounting over case-insensitive directories libfs: Attempt exact-match comparison first during casefolded lookup Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-27ovl: Always reject mounting over case-insensitive directoriesGabriel Krisman Bertazi
overlayfs relies on the filesystem setting DCACHE_OP_HASH or DCACHE_OP_COMPARE to reject mounting over case-insensitive directories. Since commit bb9cd9106b22 ("fscrypt: Have filesystems handle their d_ops"), we set ->d_op through a hook in ->d_lookup, which means the root dentry won't have them, causing the mount to accidentally succeed. In v6.7-rc7, the following sequence will succeed to mount, but any dentry other than the root dentry will be a "weird" dentry to ovl and fail with EREMOTE. mkfs.ext4 -O casefold lower.img mount -O loop lower.img lower mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work ovl /mnt Mounting on a subdirectory fails, as expected, because DCACHE_OP_HASH and DCACHE_OP_COMPARE are properly set by ->lookup. Fix by explicitly rejecting superblocks that allow case-insensitive dentries. Yes, this will be solved when we move d_op configuration back to ->s_d_op. Yet, we better have an explicit fix to avoid messing up again. While there, re-sort the entries to have more descriptive error messages first. Fixes: bb9cd9106b22 ("fscrypt: Have filesystems handle their d_ops") Acked-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-2-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-02-12Merge tag 'vfs-6.8-rc5.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix performance regression introduced by moving the security permission hook out of do_clone_file_range() and into its caller vfs_clone_file_range(). This causes the security hook to be called in situation were it wasn't called before as the fast permission checks were left in do_clone_file_range(). Fix this by merging the two implementations back together and restoring the old ordering: fast permission checks first, expensive ones later. - Tweak mount_setattr() permission checking so that mount properties on the real rootfs can be changed. When we added mount_setattr() we added additional checks compared to legacy mount(2). If the mount had a parent then verify that the caller and the mount namespace the mount is attached to match and if not make sure that it's an anonymous mount. But the real rootfs falls into neither category. It is neither an anoymous mount because it is obviously attached to the initial mount namespace but it also obviously doesn't have a parent mount. So that means legacy mount(2) allows changing mount properties on the real rootfs but mount_setattr(2) blocks this. This causes regressions (See the commit for details). Fix this by relaxing the check. If the mount has a parent or if it isn't a detached mount, verify that the mount namespaces of the caller and the mount are the same. Technically, we could probably write this even simpler and check that the mount namespaces match if it isn't a detached mount. But the slightly longer check makes it clearer what conditions one needs to think about. * tag 'vfs-6.8-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: relax mount_setattr() permission checks remap_range: merge do_clone_file_range() into vfs_clone_file_range()
2024-02-08ovl: convert to super_set_uuid()Kent Overstreet
We don't want to be settingc sb->s_uuid directly anymore, as there's a length field that also has to be set, and this conversion was not completely trivial. Acked-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20240207025624.1019754-3-kent.overstreet@linux.dev Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: linux-unionfs@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-06remap_range: merge do_clone_file_range() into vfs_clone_file_range()Amir Goldstein
commit dfad37051ade ("remap_range: move permission hooks out of do_clone_file_range()") moved the permission hooks from do_clone_file_range() out to its caller vfs_clone_file_range(), but left all the fast sanity checks in do_clone_file_range(). This makes the expensive security hooks be called in situations that they would not have been called before (e.g. fs does not support clone). The only reason for the do_clone_file_range() helper was that overlayfs did not use to be able to call vfs_clone_file_range() from copy up context with sb_writers lock held. However, since commit c63e56a4a652 ("ovl: do not open/llseek lower file with upper sb_writers held"), overlayfs just uses an open coded version of vfs_clone_file_range(). Merge_clone_file_range() into vfs_clone_file_range(), restoring the original order of checks as it was before the regressing commit and adapt the overlayfs code to call vfs_clone_file_range() before the permission hooks that were added by commit ca7ab482401c ("ovl: add permission hooks outside of do_splice_direct()"). Note that in the merge of do_clone_file_range(), the file_start_write() context was reduced to cover ->remap_file_range() without holding it over the permission hooks, which was the reason for doing the regressing commit in the first place. Reported-and-tested-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202401312229.eddeb9a6-oliver.sang@intel.com Fixes: dfad37051ade ("remap_range: move permission hooks out of do_clone_file_range()") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20240202102258.1582671-1-amir73il@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-06fs: remove the inode argument to ->d_real() methodAmir Goldstein
The only remaining user of ->d_real() method is d_real_inode(), which passed NULL inode argument to get the real data dentry. There are no longer any users that call ->d_real() with a non-NULL inode argument for getting a detry from a specific underlying layer. Remove the inode argument of the method and replace it with an integer 'type' argument, to allow callers to request the real metadata dentry instead of the real data dentry. All the current users of d_real_inode() (e.g. uprobe) continue to get the real data inode. Caller that need to get the real metadata inode (e.g. IMA/EVM) can use d_inode(d_real(dentry, D_REAL_METADATA)). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20240202110132.1584111-3-amir73il@gmail.com Tested-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-23ovl: mark xwhiteouts directory with overlay.opaque='x'Amir Goldstein
An opaque directory cannot have xwhiteouts, so instead of marking an xwhiteouts directory with a new xattr, overload overlay.opaque xattr for marking both opaque dir ('y') and xwhiteouts dir ('x'). This is more efficient as the overlay.opaque xattr is checked during lookup of directory anyway. This also prevents unnecessary checking the xattr when reading a directory without xwhiteouts, i.e. most of the time. Note that the xwhiteouts marker is not checked on the upper layer and on the last layer in lowerstack, where xwhiteouts are not expected. Fixes: bc8df7a3dc03 ("ovl: Add an alternative type of whiteout") Cc: <stable@vger.kernel.org> # v6.7 Reviewed-by: Alexander Larsson <alexl@redhat.com> Tested-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-01-11Merge tag 'pull-dcache' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache updates from Al Viro: "Change of locking rules for __dentry_kill(), regularized refcounting rules in that area, assorted cleanups and removal of weird corner cases (e.g. now ->d_iput() on child is always called before the parent might hit __dentry_kill(), etc)" * tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits) dcache: remove unnecessary NULL check in dget_dlock() kill DCACHE_MAY_FREE __d_unalias() doesn't use inode argument d_alloc_parallel(): in-lookup hash insertion doesn't need an RCU variant get rid of DCACHE_GENOCIDE d_genocide(): move the extern into fs/internal.h simple_fill_super(): don't bother with d_genocide() on failure nsfs: use d_make_root() d_alloc_pseudo(): move setting ->d_op there from the (sole) caller kill d_instantate_anon(), fold __d_instantiate_anon() into remaining caller retain_dentry(): introduce a trimmed-down lockless variant __dentry_kill(): new locking scheme d_prune_aliases(): use a shrink list switch select_collect{,2}() to use of to_shrink_list() to_shrink_list(): call only if refcount is 0 fold dentry_kill() into dput() don't try to cut corners in shrink_lock_dentry() fold the call of retain_dentry() into fast_dput() Call retain_dentry() with refcount 0 dentry_kill(): don't bother with retain_dentry() on slow path ...
2024-01-11Merge tag 'pull-rename' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull rename updates from Al Viro: "Fix directory locking scheme on rename This was broken in 6.5; we really can't lock two unrelated directories without holding ->s_vfs_rename_mutex first and in case of same-parent rename of a subdirectory 6.5 ends up doing just that" * tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: rename(): avoid a deadlock in the case of parents having no common ancestor kill lock_two_inodes() rename(): fix the locking of subdirectories f2fs: Avoid reading renamed directory if parent does not change ext4: don't access the source subdirectory content on same-directory rename ext2: Avoid reading renamed directory if parent does not change udf_rename(): only access the child content on cross-directory rename ocfs2: Avoid touching renamed directory if parent does not change reiserfs: Avoid touching renamed directory if parent does not change