summaryrefslogtreecommitdiff
path: root/fs/overlayfs/copy_up.c
AgeCommit message (Collapse)Author
4 daysMerge tag 'vfs-6.17-rc1.fileattr' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fileattr updates from Christian Brauner: "This introduces the new file_getattr() and file_setattr() system calls after lengthy discussions. Both system calls serve as successors and extensible companions to the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have started to show their age in addition to being named in a way that makes it easy to conflate them with extended attribute related operations. These syscalls allow userspace to set filesystem inode attributes on special files. One of the usage examples is the XFS quota projects. XFS has project quotas which could be attached to a directory. All new inodes in these directories inherit project ID set on parent directory. The project is created from userspace by opening and calling FS_IOC_FSSETXATTR on each inode. This is not possible for special files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left with empty project ID. Those inodes then are not shown in the quota accounting but still exist in the directory. This is not critical but in the case when special files are created in the directory with already existing project quota, these new inodes inherit extended attributes. This creates a mix of special files with and without attributes. Moreover, special files with attributes don't have a possibility to become clear or change the attributes. This, in turn, prevents userspace from re-creating quota project on these existing files. In addition, these new system calls allow the implementation of additional attributes that we couldn't or didn't want to fit into the legacy ioctls anymore" * tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: tighten a sanity check in file_attr_to_fileattr() tree-wide: s/struct fileattr/struct file_kattr/g fs: introduce file_getattr and file_setattr syscalls fs: prepare for extending file_get/setattr() fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP selinux: implement inode_file_[g|s]etattr hooks lsm: introduce new hooks for setting/getting inode fsxattr fs: split fileattr related helpers into separate file
2025-07-18ovl: rename ovl_cleanup_unlocked() to ovl_cleanup()NeilBrown
The only remaining user of ovl_cleanup() is ovl_cleanup_locked(), so we no longer need both. This patch renames ovl_cleanup() to ovl_cleanup_locked() and makes it static. ovl_cleanup_unlocked() is renamed to ovl_cleanup(). Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-22-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow the locked region in ovl_copy_up_workdir()NeilBrown
In ovl_copy_up_workdir() unlock immediately after the rename. There is nothing else in the function that needs the lock. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-5-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: Call ovl_create_temp() without lock held.NeilBrown
ovl currently locks a directory or two and then performs multiple actions in one or both directories. This is incompatible with proposed changes which will lock just the dentry of objects being acted on. This patch moves calls to ovl_create_temp() out of the locked regions and has it take and release the relevant lock itself. The lock that was taken before this function was called is now taken after. This means that any code between where the lock was taken and ovl_create_temp() is now unlocked. This necessitates the use of ovl_cleanup_unlocked() and the creation of ovl_lookup_upper_unlocked(). These will be used more widely in future patches. Now that the file is created before the lock is taken for rename, we need to ensure the parent wasn't changed before the lock was gained. ovl_lock_rename_workdir() is changed to optionally receive the dentries that will be involved in the rename. If either is present but has the wrong parent, an error is returned. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-4-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: change ovl_create_index() to take dir locksNeilBrown
ovl_copy_up_workdir() currently take a rename lock on two directories, then use the lock to both create a file in one directory, perform a rename, and possibly unlink the file for cleanup. This is incompatible with proposed changes which will lock just the dentry of objects being acted on. This patch moves the call to ovl_create_index() earlier in ovl_copy_up_workdir() to before the lock is taken. ovl_create_index() then takes the required lock only when needed. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-3-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: simplify an error path in ovl_copy_up_workdir()NeilBrown
If ovl_copy_up_data() fails the error is not immediately handled but the code continues on to call ovl_start_write() and lock_rename(), presumably because both of these locks are needed for the cleanup. Only then (if the lock was successful) is the error checked. This makes the code a little hard to follow and could be fragile. This patch changes to handle the error after the ovl_start_write() (which cannot fail, so there aren't multiple errors to deail with). A new ovl_cleanup_unlocked() is created which takes the required directory lock. This will be used extensively in later patches. In general we need to check the parent is still correct after taking the lock (as ovl_copy_up_workdir() does after a successful lock_rename()) so that is included in ovl_cleanup_unlocked() using new ovl_parent_lock() and ovl_parent_unlock() calls (it is planned to move this API into VFS code eventually, though in a slightly different form). Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-2-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04tree-wide: s/struct fileattr/struct file_kattr/gChristian Brauner
Now that we expose struct file_attr as our uapi struct rename all the internal struct to struct file_kattr to clearly communicate that it is a kernel internal struct. This is similar to struct mount_{k}attr and others. Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-02fs: make vfs_fileattr_[get|set] return -EOPNOTSUPPAndrey Albershteyn
Future patches will add new syscalls which use these functions. As this interface won't be used for ioctls only, the EOPNOSUPP is more appropriate return code. This patch converts return code from ENOIOCTLCMD to EOPNOSUPP for vfs_fileattr_get and vfs_fileattr_set. To save old behavior translate EOPNOSUPP back for current users - overlayfs, encryptfs and fs/ioctl.c. Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Link: https://lore.kernel.org/20250630-xattrat-syscall-v6-4-c4e3bc35227b@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-16VFS: change old_dir and new_dir in struct renamedata to dentrysNeilBrown
all users of 'struct renamedata' have the dentry for the old and new directories, and often have no use for the inode except to store it in the renamedata. This patch changes struct renamedata to hold the dentry, rather than the inode, for the old and new directories, and changes callers to match. The names are also changed from a _dir suffix to _parent. This is consistent with other usage in namei.c and elsewhere. This results in the removal of several local variables and several dereferences of ->d_inode at the cost of adding ->d_inode dereferences to vfs_rename(). Acked-by: Miklos Szeredi <miklos@szeredi.hu> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/174977089072.608730.4244531834577097454@noble.neil.brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-19ovl: fix UAF in ovl_dentry_update_reval by moving dput() in ovl_link_upVasiliy Kovalev
The issue was caused by dput(upper) being called before ovl_dentry_update_reval(), while upper->d_flags was still accessed in ovl_dentry_remote(). Move dput(upper) after its last use to prevent use-after-free. BUG: KASAN: slab-use-after-free in ovl_dentry_remote fs/overlayfs/util.c:162 [inline] BUG: KASAN: slab-use-after-free in ovl_dentry_update_reval+0xd2/0xf0 fs/overlayfs/util.c:167 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0xc3/0x620 mm/kasan/report.c:488 kasan_report+0xd9/0x110 mm/kasan/report.c:601 ovl_dentry_remote fs/overlayfs/util.c:162 [inline] ovl_dentry_update_reval+0xd2/0xf0 fs/overlayfs/util.c:167 ovl_link_up fs/overlayfs/copy_up.c:610 [inline] ovl_copy_up_one+0x2105/0x3490 fs/overlayfs/copy_up.c:1170 ovl_copy_up_flags+0x18d/0x200 fs/overlayfs/copy_up.c:1223 ovl_rename+0x39e/0x18c0 fs/overlayfs/dir.c:1136 vfs_rename+0xf84/0x20a0 fs/namei.c:4893 ... </TASK> Fixes: b07d5cc93e1b ("ovl: update of dentry revalidate flags after copy up") Reported-by: syzbot+316db8a1191938280eb6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=316db8a1191938280eb6 Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Link: https://lore.kernel.org/r/20250214215148.761147-1-kovalev@altlinux.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-06ovl: pass realinode to ovl_encode_real_fh() instead of realdentryAmir Goldstein
We want to be able to encode an fid from an inode with no alias. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20250105162404.357058-2-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-19fs: relax assertions on failure to encode file handlesAmir Goldstein
Encoding file handles is usually performed by a filesystem >encode_fh() method that may fail for various reasons. The legacy users of exportfs_encode_fh(), namely, nfsd and name_to_handle_at(2) syscall are ready to cope with the possibility of failure to encode a file handle. There are a few other users of exportfs_encode_{fh,fid}() that currently have a WARN_ON() assertion when ->encode_fh() fails. Relax those assertions because they are wrong. The second linked bug report states commit 16aac5ad1fa9 ("ovl: support encoding non-decodable file handles") in v6.6 as the regressing commit, but this is not accurate. The aforementioned commit only increases the chances of the assertion and allows triggering the assertion with the reproducer using overlayfs, inotify and drop_caches. Triggering this assertion was always possible with other filesystems and other reasons of ->encode_fh() failures and more particularly, it was also possible with the exact same reproducer using overlayfs that is mounted with options index=on,nfs_export=on also on kernels < v6.6. Therefore, I am not listing the aforementioned commit as a Fixes commit. Backport hint: this patch will have a trivial conflict applying to v6.6.y, and other trivial conflicts applying to stable kernels < v6.6. Reported-by: syzbot+ec07f6f5ce62b858579f@syzkaller.appspotmail.com Tested-by: syzbot+ec07f6f5ce62b858579f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-unionfs/671fd40c.050a0220.4735a.024f.GAE@google.com/ Reported-by: Dmitry Safonov <dima@arista.com> Closes: https://lore.kernel.org/linux-fsdevel/CAGrbwDTLt6drB9eaUagnQVgdPBmhLfqqxAf3F+Juqy_o6oP8uw@mail.gmail.com/ Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20241219115301.465396-1-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-22Merge tag 'ovl-update-6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs updates from Amir Goldstein: - Fix a syzbot reported NULL pointer deref with bfs lower layers - Fix a copy up failure of large file from lower fuse fs - Followup cleanup of backing_file API from Miklos - Introduction and use of revert/override_creds_light() helpers, that were suggested by Christian as a mitigation to cache line bouncing and false sharing of fields in overlayfs creator_cred long lived struct cred copy. - Store up to two backing file references (upper and lower) in an ovl_file container instead of storing a single backing file in file->private_data. This is used to avoid the practice of opening a short lived backing file for the duration of some file operations and to avoid the specialized use of FDPUT_FPUT in such occasions, that was getting in the way of Al's fd_file() conversions. * tag 'ovl-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: Filter invalid inodes with missing lookup function ovl: convert ovl_real_fdget() callers to ovl_real_file() ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path() ovl: store upper real file in ovl_file struct ovl: allocate a container struct ovl_file for ovl private context ovl: do not open non-data lower file for fsync ovl: Optimize override/revert creds ovl: pass an explicit reference of creators creds to callers ovl: use wrapper ovl_revert_creds() fs/backing-file: Convert to revert/override_creds_light() cred: Add a light version of override/revert_creds() backing-file: clean up the API ovl: properly handle large files in ovl_security_fileattr
2024-11-11ovl: use wrapper ovl_revert_creds()Vinicius Costa Gomes
Introduce ovl_revert_creds() wrapper of revert_creds() to match callers of ovl_override_creds(). Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-10-07remove pointless includes of <linux/fdtable.h>Al Viro
some of those used to be needed, some had been cargo-culted for no reason... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-09-19Merge tag 'ovl-update-6.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs updates from Amir Goldstein: - Increase robustness of overlayfs to crashes in the case of underlying filesystems that to not guarantee metadata ordering to persistent storage (problem was reported with ubifs). - Deny mount inside container with features that require root privileges to work properly, instead of failing operations later. - Some clarifications to overlayfs documentation. * tag 'ovl-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: fail if trusted xattrs are needed but caller lacks permission overlayfs.rst: update metacopy section in overlayfs documentation ovl: fsync after metadata copy-up ovl: don't set the superblock's errseq_t manually
2024-08-30ovl: fsync after metadata copy-upAmir Goldstein
For upper filesystems which do not use strict ordering of persisting metadata changes (e.g. ubifs), when overlayfs file is modified for the first time, copy up will create a copy of the lower file and its parent directories in the upper layer. Permission lost of the new upper parent directory was observed during power-cut stress test. Fix by moving the fsync call to after metadata copy to make sure that the metadata copied up directory and files persists to disk before renaming from tmp to final destination. With metacopy enabled, this change will hurt performance of workloads such as chown -R, so we keep the legacy behavior of fsync only on copyup of data. Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxj-pOvmw1-uXR3qVdqtLjSkwcR9nVKcNU_vC10Zyf2miQ@mail.gmail.com/ Reported-and-tested-by: Fei Lv <feilv@asrmicro.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-07-31lsm: Refactor return value of LSM hook inode_copy_up_xattrXu Kuohai
To be consistent with most LSM hooks, convert the return value of hook inode_copy_up_xattr to 0 or a negative error code. Before: - Hook inode_copy_up_xattr returns 0 when accepting xattr, 1 when discarding xattr, -EOPNOTSUPP if it does not know xattr, or any other negative error code otherwise. After: - Hook inode_copy_up_xattr returns 0 when accepting xattr, *-ECANCELED* when discarding xattr, -EOPNOTSUPP if it does not know xattr, or any other negative error code otherwise. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-04-09security: allow finer granularity in permitting copy-up of security xattrsStefan Berger
Copying up xattrs is solely based on the security xattr name. For finer granularity add a dentry parameter to the security_inode_copy_up_xattr hook definition, allowing decisions to be based on the xattr content as well. Co-developed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Acked-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> (LSM,SELinux) Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2024-03-17ovl: relax WARN_ON in ovl_verify_area()Amir Goldstein
syzbot hit an assertion in copy up data loop which looks like it is the result of a lower file whose size is being changed underneath overlayfs. This type of use case is documented to cause undefined behavior, so returning EIO error for the copy up makes sense, but it should not be causing a WARN_ON assertion. Reported-and-tested-by: syzbot+3abd99031b42acf367ef@syzkaller.appspotmail.com Fixes: ca7ab482401c ("ovl: add permission hooks outside of do_splice_direct()") Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-02-06remap_range: merge do_clone_file_range() into vfs_clone_file_range()Amir Goldstein
commit dfad37051ade ("remap_range: move permission hooks out of do_clone_file_range()") moved the permission hooks from do_clone_file_range() out to its caller vfs_clone_file_range(), but left all the fast sanity checks in do_clone_file_range(). This makes the expensive security hooks be called in situations that they would not have been called before (e.g. fs does not support clone). The only reason for the do_clone_file_range() helper was that overlayfs did not use to be able to call vfs_clone_file_range() from copy up context with sb_writers lock held. However, since commit c63e56a4a652 ("ovl: do not open/llseek lower file with upper sb_writers held"), overlayfs just uses an open coded version of vfs_clone_file_range(). Merge_clone_file_range() into vfs_clone_file_range(), restoring the original order of checks as it was before the regressing commit and adapt the overlayfs code to call vfs_clone_file_range() before the permission hooks that were added by commit ca7ab482401c ("ovl: add permission hooks outside of do_splice_direct()"). Note that in the merge of do_clone_file_range(), the file_start_write() context was reduced to cover ->remap_file_range() without holding it over the permission hooks, which was the reason for doing the regressing commit in the first place. Reported-and-tested-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202401312229.eddeb9a6-oliver.sang@intel.com Fixes: dfad37051ade ("remap_range: move permission hooks out of do_clone_file_range()") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20240202102258.1582671-1-amir73il@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-11Merge tag 'pull-rename' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull rename updates from Al Viro: "Fix directory locking scheme on rename This was broken in 6.5; we really can't lock two unrelated directories without holding ->s_vfs_rename_mutex first and in case of same-parent rename of a subdirectory 6.5 ends up doing just that" * tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: rename(): avoid a deadlock in the case of parents having no common ancestor kill lock_two_inodes() rename(): fix the locking of subdirectories f2fs: Avoid reading renamed directory if parent does not change ext4: don't access the source subdirectory content on same-directory rename ext2: Avoid reading renamed directory if parent does not change udf_rename(): only access the child content on cross-directory rename ocfs2: Avoid touching renamed directory if parent does not change reiserfs: Avoid touching renamed directory if parent does not change
2024-01-10Merge tag 'ovl-update-6.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs updates from Amir Goldstein: "This is a very small update with no bug fixes and no new features. The larger update of overlayfs for this cycle, the re-factoring of overlayfs code into generic backing_file helpers, was already merged via Christian. Summary: - Simplify/clarify some code No bug fixes here, just some changes following questions from Al about overlayfs code that could be a little more simple to follow. - Overlayfs documentation style fixes Mainly fixes for ReST formatting suggested by documentation developers" * tag 'ovl-update-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: overlayfs.rst: fix ReST formatting overlayfs.rst: use consistent feature names ovl: initialize ovl_copy_up_ctx.destname inside ovl_do_copy_up() ovl: remove redundant ofs->indexdir member
2024-01-08Merge tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds
Pull vfs rw updates from Christian Brauner: "This contains updates from Amir for read-write backing file helpers for stacking filesystems such as overlayfs: - Fanotify is currently in the process of introducing pre content events. Roughly, a new permission event will be added indicating that it is safe to write to the file being accessed. These events are used by hierarchical storage managers to e.g., fill the content of files on first access. During that work we noticed that our current permission checking is inconsistent in rw_verify_area() and remap_verify_area(). Especially in the splice code permission checking is done multiple times. For example, one time for the whole range and then again for partial ranges inside the iterator. In addition, we mostly do permission checking before we call file_start_write() except for a few places where we call it after. For pre-content events we need such permission checking to be done before file_start_write(). So this is a nice reason to clean this all up. After this series, all permission checking is done before file_start_write(). As part of this cleanup we also massaged the splice code a bit. We got rid of a few helpers because we are alredy drowning in special read-write helpers. We also cleaned up the return types for splice helpers. - Introduce generic read-write helpers for backing files. This lifts some overlayfs code to common code so it can be used by the FUSE passthrough work coming in over the next cycles. Make Amir and Miklos the maintainers for this new subsystem of the vfs" * tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits) fs: fix __sb_write_started() kerneldoc formatting fs: factor out backing_file_mmap() helper fs: factor out backing_file_splice_{read,write}() helpers fs: factor out backing_file_{read,write}_iter() helpers fs: prepare for stackable filesystems backing file helpers fsnotify: optionally pass access range in file permission hooks fsnotify: assert that file_start_write() is not held in permission hooks fsnotify: split fsnotify_perm() into two hooks fs: use splice_copy_file_range() inline helper splice: return type ssize_t from all helpers fs: use do_splice_direct() for nfsd/ksmbd server-side-copy fs: move file_start_write() into direct_splice_actor() fs: fork splice_file_range() from do_splice_direct() fs: create {sb,file}_write_not_started() helpers fs: create file_write_started() helper fs: create __sb_write_started() helper fs: move kiocb_start_write() into vfs_iocb_iter_write() fs: move permission hook out of do_iter_read() fs: move permission hook out of do_iter_write() fs: move file_start_write() into vfs_iter_write() ...
2023-12-17ovl: fix dentry reference leak after changes to underlying layersAmir Goldstein
syzbot excercised the forbidden practice of moving the workdir under lowerdir while overlayfs is mounted and tripped a dentry reference leak. Fixes: c63e56a4a652 ("ovl: do not open/llseek lower file with upper sb_writers held") Reported-and-tested-by: syzbot+8608bb4553edb8c78f41@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-12-12splice: return type ssize_t from all helpersAmir Goldstein
Not sure why some splice helpers return long, maybe historic reasons. Change them all to return ssize_t to conform to the splice methods and to the rest of the helpers. Suggested-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20231208-horchen-helium-d3ec1535ede5@brauner/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231212094440.250945-2-amir73il@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-12-01fs: move file_start_write() into direct_splice_actor()Amir Goldstein
The callers of do_splice_direct() hold file_start_write() on the output file. This may cause file permission hooks to be called indirectly on an overlayfs lower layer, which is on the same filesystem of the output file and could lead to deadlock with fanotify permission events. To fix this potential deadlock, move file_start_write() from the callers into the direct_splice_actor(), so file_start_write() will not be held while splicing from the input file. Suggested-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231128214258.GA2398475@perftesting/ Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231130141624.3338942-3-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-25rename(): avoid a deadlock in the case of parents having no common ancestorAl Viro
... and fix the directory locking documentation and proof of correctness. Holding ->s_vfs_rename_mutex *almost* prevents ->d_parent changes; the case where we really don't want it is splicing the root of disconnected tree to somewhere. In other words, ->s_vfs_rename_mutex is sufficient to stabilize "X is an ancestor of Y" only if X and Y are already in the same tree. Otherwise it can go from false to true, and one can construct a deadlock on that. Make lock_two_directories() report an error in such case and update the callers of lock_rename()/lock_rename_child() to handle such errors. And yes, such conditions are not impossible to create ;-/ Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-24ovl: add permission hooks outside of do_splice_direct()Amir Goldstein
The main callers of do_splice_direct() also call rw_verify_area() for the entire range that is being copied, e.g. by vfs_copy_file_range() or do_sendfile() before calling do_splice_direct(). The only caller that does not have those checks for entire range is ovl_copy_up_file(). In preparation for removing the checks inside do_splice_direct(), add rw_verify_area() call in ovl_copy_up_file(). For extra safety, perform minimal sanity checks from rw_verify_area() for non negative offsets also in the copy up do_splice_direct() loop without calling the file permission hooks. This is needed for fanotify "pre content" events. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-2-amir73il@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-20ovl: initialize ovl_copy_up_ctx.destname inside ovl_do_copy_up()Amir Goldstein
The ->destname member of struct ovl_copy_up_ctx is initialized inside ovl_copy_up_one() to ->d_name of the overlayfs dentry being copied up and then it may be overridden by index name inside ovl_do_copy_up(). ovl_inode_lock() in ovl_copy_up_start() and ovl_copy_up() in ovl_rename() effectively stabilze ->d_name of the overlayfs dentry being copied up, but ovl_inode_lock() is not held when ->d_name is being read. It is not a correctness bug, because if ovl_do_copy_up() races with ovl_rename() and ctx.destname is freed, we will not end up calling ovl_do_copy_up() with the dead name reference. The code becomes much easier to understand and to document if the initialization of c->destname is always done inside ovl_do_copy_up(), either to the index entry name, or to the overlay dentry ->d_name. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-31ovl: do not encode lower fh with upper sb_writers heldAmir Goldstein
When lower fs is a nested overlayfs, calling encode_fh() on a lower directory dentry may trigger copy up and take sb_writers on the upper fs of the lower nested overlayfs. The lower nested overlayfs may have the same upper fs as this overlayfs, so nested sb_writers lock is illegal. Move all the callers that encode lower fh to before ovl_want_write(). Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-31ovl: do not open/llseek lower file with upper sb_writers heldAmir Goldstein
overlayfs file open (ovl_maybe_lookup_lowerdata) and overlay file llseek take the ovl_inode_lock, without holding upper sb_writers. In case of nested lower overlay that uses same upper fs as this overlay, lockdep will warn about (possibly false positive) circular lock dependency when doing open/llseek of lower ovl file during copy up with our upper sb_writers held, because the locking ordering seems reverse to the locking order in ovl_copy_up_start(): - lower ovl_inode_lock - upper sb_writers Let the copy up "transaction" keeps an elevated mnt write count on upper mnt, but leaves taking upper sb_writers to lower level helpers only when they actually need it. This allows to avoid holding upper sb_writers during lower file open/llseek and prevents the lockdep warning. Minimizing the scope of upper sb_writers during copy up is also needed for fixing another possible deadlocks by a following patch. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-31ovl: reorder ovl_want_write() after ovl_inode_lock()Amir Goldstein
Make the locking order of ovl_inode_lock() strictly between the two vfs stacked layers, i.e.: - ovl vfs locks: sb_writers, inode_lock, ... - ovl_inode_lock - upper vfs locks: sb_writers, inode_lock, ... To that effect, move ovl_want_write() into the helpers ovl_nlink_start() and ovl_copy_up_start which currently take the ovl_inode_lock() after ovl_want_write(). Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-09-26Merge tag 'v6.6-rc4.vfs.fixes' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "This contains the usual miscellaneous fixes and cleanups for vfs and individual fses: Fixes: - Revert ki_pos on error from buffered writes for direct io fallback - Add missing documentation for block device and superblock handling for changes merged this cycle - Fix reiserfs flexible array usage - Ensure that overlayfs sets ctime when setting mtime and atime - Disable deferred caller completions with overlayfs writes until proper support exists Cleanups: - Remove duplicate initialization in pipe code - Annotate aio kioctx_table with __counted_by" * tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: overlayfs: set ctime when setting mtime and atime ntfs3: put resources during ntfs_fill_super() ovl: disable IOCB_DIO_CALLER_COMP porting: document superblock as block device holder porting: document new block device opening order fs/pipe: remove duplicate "offset" initializer fs-writeback: do not requeue a clean inode having skipped pages aio: Annotate struct kioctx_table with __counted_by direct_write_fallback(): on error revert the ->ki_pos update from buffered write reiserfs: Replace 1-element array with C99 style flex-array
2023-09-25overlayfs: set ctime when setting mtime and atimeJeff Layton
Nathan reported that he was seeing the new warning in setattr_copy_mgtime pop when starting podman containers. Overlayfs is trying to set the atime and mtime via notify_change without also setting the ctime. POSIX states that when the atime and mtime are updated via utimes() that we must also update the ctime to the current time. The situation with overlayfs copy-up is analogies, so add ATTR_CTIME to the bitmask. notify_change will fill in the value. Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Amir Goldstein <amir73il@gmail.com> Message-Id: <20230913-ctime-v1-1-c6bc509cbc27@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-04ovl: fix failed copyup of fileattr on a symlinkAmir Goldstein
Some local filesystems support setting persistent fileattr flags (e.g. FS_NOATIME_FL) on directories and regular files via ioctl. Some of those persistent fileattr flags are reflected to vfs as in-memory inode flags (e.g. S_NOATIME). Overlayfs uses the in-memory inode flags (e.g. S_NOATIME) on a lower file as an indication that a the lower file may have persistent inode fileattr flags (e.g. FS_NOATIME_FL) that need to be copied to upper file. However, in some cases, the S_NOATIME in-memory flag could be a false indication for persistent FS_NOATIME_FL fileattr. For example, with NFS and FUSE lower fs, as was the case in the two bug reports, the S_NOATIME flag is set unconditionally for all inodes. Users cannot set persistent fileattr flags on symlinks and special files, but in some local fs, such as ext4/btrfs/tmpfs, the FS_NOATIME_FL fileattr flag are inheritted to symlinks and special files from parent directory. In both cases described above, when lower symlink has the S_NOATIME flag, overlayfs will try to copy the symlink's fileattrs and fail with error ENOXIO, because it could not open the symlink for the ioctl security hook. To solve this failure, do not attempt to copyup fileattrs for anything other than directories and regular files. Reported-by: Ruiwen Zhao <ruiwen@google.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217850 Fixes: 72db82115d2b ("ovl: copy up sync/noatime fileattr flags") Cc: <stable@vger.kernel.org> # v5.15 Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-08-12ovl: make consistent use of OVL_FS()Andrea Righi
Always use OVL_FS() to retrieve the corresponding struct ovl_fs from a struct super_block. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-08-12ovl: add support for unique fsid per instanceAmir Goldstein
The legacy behavior of ovl_statfs() reports the f_fsid filled by underlying upper fs. This fsid is not unique among overlayfs instances on the same upper fs. With mount option uuid=on, generate a non-persistent uuid per overlayfs instance and use it as the seed for f_fsid, similar to tmpfs. This is useful for reporting fanotify events with fid info from different instances of overlayfs over the same upper fs. The old behavior of null uuid and upper fs fsid is retained with the mount option uuid=null, which is the default. The mount option uuid=off that disables uuid checks in underlying layers also retains the legacy behavior. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-08-12ovl: Handle verity during copy-upAlexander Larsson
During regular metacopy, if lowerdata file has fs-verity enabled, and the verity option is enabled, we add the digest to the metacopy xattr. If verity is required, and lowerdata does not have fs-verity enabled, fall back to full copy-up (or the generated metacopy would not validate). Signed-off-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-08-12ovl: Validate verity xattr when resolving lowerdataAlexander Larsson
The new digest field in the metacopy xattr is used during lookup to record whether the header contained a digest in the OVL_HAS_DIGEST flags. When accessing file data the first time, if OVL_HAS_DIGEST is set, we reload the metadata and check that the source lowerdata inode matches the specified digest in it (according to the enabled verity options). If the verity check passes we store this info in the inode flags as OVL_VERIFIED_DIGEST, so that we can avoid doing it again if the inode remains in memory. The verification is done in ovl_maybe_validate_verity() which needs to be called in the same places as ovl_maybe_lookup_lowerdata(), so there is a new ovl_verify_lowerdata() helper that calls these in the right order, and all current callers of ovl_maybe_lookup_lowerdata() are changed to call it instead. Signed-off-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-06-19ovl: implement lazy lookup of lowerdata in data-only layersAmir Goldstein
Defer lookup of lowerdata in the data-only layers to first data access or before copy up. We perform lowerdata lookup before copy up even if copy up is metadata only copy up. We can further optimize this lookup later if needed. We do best effort lazy lookup of lowerdata for d_real_inode(), because this interface does not expect errors. The only current in-tree caller of d_real_inode() is trace_uprobe and this caller is likely going to be followed reading from the file, before placing uprobes on offset within the file, so lowerdata should be available when setting the uprobe. Tested-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19ovl: update of dentry revalidate flags after copy upAmir Goldstein
After copy up, we may need to update d_flags if upper dentry is on a remote fs and lower dentries are not. Add helpers to allow incremental update of the revalidate flags. Fixes: bccece1ead36 ("ovl: allow remote upper") Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-03-06ovl: check for ->listxattr() supportChristian Brauner
We have decoupled vfs_listxattr() from IOP_XATTR. Instead we just need to check whether inode->i_op->listxattr is implemented. Cc: linux-unionfs@vger.kernel.org Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-27ovl: fail on invalid uid/gid mapping at copy upMiklos Szeredi
If st_uid/st_gid doesn't have a mapping in the mounter's user_ns, then copy-up should fail, just like it would fail if the mounter task was doing the copy using "cp -a". There's a corner case where the "cp -a" would succeed but copy up fail: if there's a mapping of the invalid uid/gid (65534 by default) in the user namespace. This is because stat(2) will return this value if the mapping doesn't exist in the current user_ns and "cp -a" will in turn be able to create a file with this uid/gid. This behavior would be inconsistent with POSIX ACL's, which return -1 for invalid uid/gid which result in a failed copy. For consistency and simplicity fail the copy of the st_uid/st_gid are invalid. Fixes: 459c7c565ac3 ("ovl: unprivieged mounts") Cc: <stable@vger.kernel.org> # v5.11 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Seth Forshee <sforshee@kernel.org>
2023-01-27ovl: fix tmpfile leakMiklos Szeredi
Missed an error cleanup. Reported-by: syzbot+fd749a7ea127a84e0ffd@syzkaller.appspotmail.com Fixes: 2b1a77461f16 ("ovl: use vfs_tmpfile_open() helper") Cc: <stable@vger.kernel.org> # v6.1 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-10-20ovl: use posix acl apiChristian Brauner
Now that posix acls have a proper api us it to copy them. All filesystems that can serve as lower or upper layers for overlayfs have gained support for the new posix acl api in previous patches. So switch all internal overlayfs codepaths for copying posix acls to the new posix acl api. Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-10Merge tag 'pull-tmpfile' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs tmpfile updates from Al Viro: "Miklos' ->tmpfile() signature change; pass an unopened struct file to it, let it open the damn thing. Allows to add tmpfile support to FUSE" * tag 'pull-tmpfile' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fuse: implement ->tmpfile() vfs: open inside ->tmpfile() vfs: move open right after ->tmpfile() vfs: make vfs_tmpfile() static ovl: use vfs_tmpfile_open() helper cachefiles: use vfs_tmpfile_open() helper cachefiles: only pass inode to *mark_inode_inuse() helpers cachefiles: tmpfile error handling cleanup hugetlbfs: cleanup mknod and tmpfile vfs: add vfs_tmpfile_open() helper
2022-09-24ovl: use vfs_tmpfile_open() helperMiklos Szeredi
If tmpfile is used for copy up, then use this helper to create the tmpfile and open it at the same time. This will later allow filesystems such as fuse to do this operation atomically. Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-09-01overlayfs: constify pathAl Viro
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-03Merge tag 'pull-work.lseek' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs lseek updates from Al Viro: "Jason's lseek series. Saner handling of 'lseek should fail with ESPIPE' - this gets rid of the magical no_llseek thing and makes checks consistent. In particular, the ad-hoc "can we do splice via internal pipe" checks got saner (and somewhat more permissive, which is what Jason had been after, AFAICT)" * tag 'pull-work.lseek' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: remove no_llseek fs: check FMODE_LSEEK to control internal pipe splicing vfio: do not set FMODE_LSEEK flag dma-buf: remove useless FMODE_LSEEK flag fs: do not compare against ->llseek fs: clear or set FMODE_LSEEK based on llseek function