summaryrefslogtreecommitdiff
path: root/fs/fuse/file.c
AgeCommit message (Collapse)Author
2024-08-29fuse: refactor out shared logic in fuse_writepages_fill() and ↵Joanne Koong
fuse_writepage_locked() This change refactors the shared logic in fuse_writepages_fill() and fuse_writepages_locked() into two separate helper functions, fuse_writepage_args_page_fill() and fuse_writepage_args_setup(). No functional changes added. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-29fuse: move fuse file initialization to wpa allocation timeJoanne Koong
Before this change, wpa->ia.ff is initialized with an acquired reference on the fuse file right before it submits the writeback request. If there are auxiliary writebacks, then the initialization and reference acquisition needs to also be set before we submit the auxiliary writeback request. To make the logic simpler and to pave the way for a subsequent refactoring of fuse_writepages_fill() and fuse_writepage_locked(), this change initializes and acquires wpa->ia.ff when the wpa is allocated. No functional changes added. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-29fuse: convert fuse_writepages_fill() to use a folio for its tmp pageJoanne Koong
To pave the way for refactoring out the shared logic in fuse_writepages_fill() and fuse_writepage_locked(), this change converts the temporary page in fuse_writepages_fill() to use the folio API. This is similar to the change in commit e0887e095a80 ("fuse: Convert fuse_writepage_locked to take a folio"), which converted the tmp page in fuse_writepage_locked() to use the folio API. inc_node_page_state() is intentionally preserved here instead of converting to node_stat_add_folio() since it is updating the stat of the underlying page and to better maintain API symmetry with dec_node_page_stat() in fuse_writepage_finish_stat(). No functional changes added. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-29fuse: move initialization of fuse_file to fuse_writepages() instead of in ↵Joanne Koong
callback Prior to this change, data->ff is checked and if not initialized then initialized in the fuse_writepages_fill() callback, which gets called for every dirty page in the address space mapping. This logic is better placed in the main fuse_writepages() caller where data.ff is initialized before walking the dirty pages. No functional changes added. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-29fuse: refactor finished writeback stats updates into helper functionJoanne Koong
Move the logic for updating the bdi and page stats for a finished writeback into a separate helper function, where it can be called from both fuse_writepage_finish() and fuse_writepage_add() (in the case where there is already an auxiliary write request for the page). No functional changes added. Suggested by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-29fuse: drop unused fuse_mount arg in fuse_writepage_finish()Joanne Koong
Drop the unused "struct fuse_mount *fm" arg in fuse_writepage_finish(). No functional changes added. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-29fuse: add fast path for fuse_range_is_writebackyangyun
In some cases, the fi->writepages may be empty. And there is no need to check fi->writepages with spin_lock, which may have an impact on performance due to lock contention. For example, in scenarios where multiple readers read the same file without any writers, or where the page cache is not enabled. Also remove the outdated comment since commit 6b2fb79963fb ("fuse: optimize writepages search") has optimize the situation by replacing list with rb-tree. Signed-off-by: yangyun <yangyun50@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-28fuse: update stats for pages in dropped aux writeback listJoanne Koong
In the case where the aux writeback list is dropped (e.g. the pages have been truncated or the connection is broken), the stats for its pages and backing device info need to be updated as well. Fixes: e2653bd53a98 ("fuse: fix leaked aux requests") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Cc: <stable@vger.kernel.org> # v5.1 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-07fs: Convert aops->write_begin to take a folioMatthew Wilcox (Oracle)
Convert all callers from working on a page to working on one page of a folio (support for working on an entire folio can come later). Removes a lot of folio->page->folio conversions. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07fs: Convert aops->write_end to take a folioMatthew Wilcox (Oracle)
Most callers have a folio, and most implementations operate on a folio, so remove the conversion from folio->page->folio to fit through this interface. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07fuse: Convert fuse_write_begin() to use a folioMatthew Wilcox (Oracle)
Fetch a folio from the page cache instead of a page and use it throughout removing several calls to compound_head() and supporting large folios (in this function). We still have to convert back to a page for calling internal fuse functions, but hopefully they will be converted soon. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07fuse: Convert fuse_write_end() to use a folioMatthew Wilcox (Oracle)
Convert the passed page to a folio and operate on that. Replaces five calls to compound_head() with one. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-08fuse: Convert fuse_readpages_end() to use folio_end_read()Matthew Wilcox (Oracle)
Nobody checks the error flag on fuse folios, so stop setting it. Optimise the (optional) setting of the uptodate flag and clearing of the lock flag by using folio_end_read(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-04-15fuse: fix wrong ff->iomode state changes from parallel dio writeAmir Goldstein
There is a confusion with fuse_file_uncached_io_{start,end} interface. These helpers do two things when called from passthrough open()/release(): 1. Take/drop negative refcount of fi->iocachectr (inode uncached io mode) 2. State change ff->iomode IOM_NONE <-> IOM_UNCACHED (file uncached open) The calls from parallel dio write path need to take a reference on fi->iocachectr, but they should not be changing ff->iomode state, because in this case, the fi->iocachectr reference does not stick around until file release(). Factor out helpers fuse_inode_uncached_io_{start,end}, to be used from parallel dio write path and rename fuse_file_*cached_io_{start,end} helpers to fuse_file_*cached_io_{open,release} to clarify the difference. Fixes: 205c1d802683 ("fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-03-15Merge tag 'fuse-update-6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Add passthrough mode for regular file I/O. This allows performing read and write (also via memory maps) on a backing file without incurring the overhead of roundtrips to userspace. For now this is only allowed to privileged servers, but this limitation will go away in the future (Amir Goldstein) - Fix interaction of direct I/O mode with memory maps (Bernd Schubert) - Export filesystem tags through sysfs for virtiofs (Stefan Hajnoczi) - Allow resending queued requests for server crash recovery (Zhao Chen) - Misc fixes and cleanups * tag 'fuse-update-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (38 commits) fuse: get rid of ff->readdir.lock fuse: remove unneeded lock which protecting update of congestion_threshold fuse: Fix missing FOLL_PIN for direct-io fuse: remove an unnecessary if statement fuse: Track process write operations in both direct and writethrough modes fuse: Use the high bit of request ID for indicating resend requests fuse: Introduce a new notification type for resend pending requests fuse: add support for explicit export disabling fuse: __kuid_val/__kgid_val helpers in fuse_fill_attr_from_inode() fuse: fix typo for fuse_permission comment fuse: Convert fuse_writepage_locked to take a folio fuse: Remove fuse_writepage virtio_fs: remove duplicate check if queue is broken fuse: use FUSE_ROOT_ID in fuse_get_root_inode() fuse: don't unhash root fuse: fix root lookup with nonzero generation fuse: replace remaining make_bad_inode() with fuse_make_bad() virtiofs: drop __exit from virtio_fs_sysfs_exit() fuse: implement passthrough for mmap fuse: implement splice read/write passthrough ...
2024-03-06fuse: get rid of ff->readdir.lockMiklos Szeredi
The same protection is provided by file->f_pos_lock. Note, this relies on the fact that file->f_mode has FMODE_ATOMIC_POS. This flag is cleared by stream_open(), which would prevent locking of f_pos_lock. Prior to commit 7de64d521bf9 ("fuse: break up fuse_open_common()") FOPEN_STREAM on a directory would cause stream_open() to be called. After this commit this is not done anymore, so f_pos_lock will always be locked. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-03-06fuse: Fix missing FOLL_PIN for direct-ioLei Huang
Our user space filesystem relies on fuse to provide POSIX interface. In our test, a known string is written into a file and the content is read back later to verify correct data returned. We observed wrong data returned in read buffer in rare cases although correct data are stored in our filesystem. Fuse kernel module calls iov_iter_get_pages2() to get the physical pages of the user-space read buffer passed in read(). The pages are not pinned to avoid page migration. When page migration occurs, the consequence are two-folds. 1) Applications do not receive correct data in read buffer. 2) fuse kernel writes data into a wrong place. Using iov_iter_extract_pages() to pin pages fixes the issue in our test. An auxiliary variable "struct page **pt_pages" is used in the patch to prepare the 2nd parameter for iov_iter_extract_pages() since iov_iter_get_pages2() uses a different type for the 2nd parameter. [SzM] add iov_iter_extract_will_pin(ii) and unpin only if true. Signed-off-by: Lei Huang <lei.huang@linux.intel.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-03-06fuse: remove an unnecessary if statementJiachen Zhang
FUSE remote locking code paths never add any locking state to inode->i_flctx, so the locks_remove_posix() function called on file close will return without calling fuse_setlk(). Therefore, as the if statement to be removed in this commit will always be false, remove it for clearness. Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-03-06fuse: Track process write operations in both direct and writethrough modesZhou Jifeng
Due to the fact that fuse does not count the write IO of processes in the direct and writethrough write modes, user processes cannot track write_bytes through the “/proc/[pid]/io” path. For example, the system tool iotop cannot count the write operations of the corresponding process. Signed-off-by: Zhou Jifeng <zhoujifeng@kylinos.com.cn> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-03-05fuse: Convert fuse_writepage_locked to take a folioMatthew Wilcox (Oracle)
The one remaining caller of fuse_writepage_locked() already has a folio, so convert this function entirely. Saves a few calls to compound_head() but no attempt is made to support large folios in this patch. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-03-05fuse: Remove fuse_writepageMatthew Wilcox (Oracle)
The writepage operation is deprecated as it leads to worse performance under high memory pressure due to folios being written out in LRU order rather than sequentially within a file. Use filemap_migrate_folio() to support dirty folio migration instead of writepage. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-03-05fuse: implement passthrough for mmapAmir Goldstein
An mmap request for a file open in passthrough mode, maps the memory directly to the backing file. An mmap of a file in direct io mode, usually uses cached mmap and puts the inode in caching io mode, which denies new passthrough opens of that inode, because caching io mode is conflicting with passthrough io mode. For the same reason, trying to mmap a direct io file, while there is a passthrough file open on the same inode will fail with -ENODEV. An mmap of a file in direct io mode, also needs to wait for parallel dio writes in-progress to complete. If a passthrough file is opened, while an mmap of another direct io file is waiting for parallel dio writes to complete, the wait is aborted and mmap fails with -ENODEV. A FUSE server that uses passthrough and direct io opens on the same inode that may also be mmaped, is advised to provide a backing fd also for the files that are open in direct io mode (i.e. use the flags combination FOPEN_DIRECT_IO | FOPEN_PASSTHROUGH), so that mmap will always use the backing file, even if read/write do not passthrough. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-03-05fuse: implement splice read/write passthroughAmir Goldstein
This allows passing fstests generic/249 and generic/591. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-03-05fuse: implement read/write passthroughAmir Goldstein
Use the backing file read/write helpers to implement read/write passthrough to a backing file. After read/write, we invalidate a/c/mtime/size attributes. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-03-05fuse: implement open in passthrough modeAmir Goldstein
After getting a backing file id with FUSE_DEV_IOC_BACKING_OPEN ioctl, a FUSE server can reply to an OPEN request with flag FOPEN_PASSTHROUGH and the backing file id. The FUSE server should reuse the same backing file id for all the open replies of the same FUSE inode and open will fail (with -EIO) if a the server attempts to open the same inode with conflicting io modes or to setup passthrough to two different backing files for the same FUSE inode. Using the same backing file id for several different inodes is allowed. Opening a new file with FOPEN_DIRECT_IO for an inode that is already open for passthrough is allowed, but only if the FOPEN_PASSTHROUGH flag and correct backing file id are specified as well. The read/write IO of such files will not use passthrough operations to the backing file, but mmap, which does not support direct_io, will use the backing file insead of using the page cache as it always did. Even though all FUSE passthrough files of the same inode use the same backing file as a backing inode reference, each FUSE file opens a unique instance of a backing_file object to store the FUSE path that was used to open the inode and the open flags of the specific open file. The per-file, backing_file object is released along with the FUSE file. The inode associated fuse_backing object is released when the last FUSE passthrough file of that inode is released AND when the backing file id is closed by the server using the FUSE_DEV_IOC_BACKING_CLOSE ioctl. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-03-05fuse: prepare for opening file in passthrough modeAmir Goldstein
In preparation for opening file in passthrough mode, store the fuse_open_out argument in ff->args to be passed into fuse_file_io_open() with the optional backing_id member. This will be used for setting up passthrough to backing file on open reply with FOPEN_PASSTHROUGH flag and a valid backing_id. Opening a file in passthrough mode may fail for several reasons, such as missing capability, conflicting open flags or inode in caching mode. Return EIO from fuse_file_io_open() in those cases. The combination of FOPEN_PASSTHROUGH and FOPEN_DIRECT_IO is allowed - it mean that read/write operations will go directly to the server, but mmap will be done to the backing file. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-02-23fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAPAmir Goldstein
Instead of denying caching mode on parallel dio open, deny caching open only while parallel dio are in-progress and wait for in-progress parallel dio writes before entering inode caching io mode. This allows executing parallel dio when inode is not in caching mode even if shared mmap is allowed, but no mmaps have been performed on the inode in question. An mmap on direct_io file now waits for all in-progress parallel dio writes to complete, so parallel dio writes together with FUSE_DIRECT_IO_ALLOW_MMAP is enabled by this commit. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-02-23fuse: introduce inode io modesAmir Goldstein
The fuse inode io mode is determined by the mode of its open files/mmaps and parallel dio opens and expressed in the value of fi->iocachectr: > 0 - caching io: files open in caching mode or mmap on direct_io file < 0 - parallel dio: direct io mode with parallel dio writes enabled == 0 - direct io: no files open in caching mode and no files mmaped Note that iocachectr value of 0 might become positive or negative, while non-parallel dio is getting processed. direct_io mmap uses page cache, so first mmap will mark the file as ff->io_opened and increment fi->iocachectr to enter the caching io mode. If the server opens the file in caching mode while it is already open for parallel dio or vice versa the open fails. This allows executing parallel dio when inode is not in caching mode and no mmaps have been performed on the inode in question. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-02-23fuse: prepare for failing open responseAmir Goldstein
In preparation for inode io modes, a server open response could fail due to conflicting inode io modes. Allow returning an error from fuse_finish_open() and handle the error in the callers. fuse_finish_open() is used as the callback of finish_open(), so that FMODE_OPENED will not be set if fuse_finish_open() fails. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-02-23fuse: break up fuse_open_common()Amir Goldstein
fuse_open_common() has a lot of code relevant only for regular files and O_TRUNC in particular. Copy the little bit of remaining code into fuse_dir_open() and stop using this common helper for directory open. Also split out fuse_dir_finish_open() from fuse_finish_open() before we add inode io modes to fuse_finish_open(). Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-02-23fuse: allocate ff->release_args only if release is neededAmir Goldstein
This removed the need to pass isdir argument to fuse_put_file(). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-02-23fuse: factor out helper fuse_truncate_update_attr()Amir Goldstein
fuse_finish_open() is called from fuse_open_common() and from fuse_create_open(). In the latter case, the O_TRUNC flag is always cleared in finish_open()m before calling into fuse_finish_open(). Move the bits that update attribute cache post O_TRUNC open into a helper and call this helper from fuse_open_common() directly. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-02-23fuse: add fuse_dio_lock/unlock helper functionsBernd Schubert
So far this is just a helper to remove complex locking logic out of fuse_direct_write_iter. Especially needed by the next patch in the series to that adds the fuse inode cache IO mode and adds in even more locking complexity. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-02-23fuse: create helper function if DIO write needs exclusive lockBernd Schubert
This makes the code a bit easier to read and allows to more easily add more conditions when an exclusive lock is needed. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-02-23fuse: fix VM_MAYSHARE and direct_io_allow_mmapBernd Schubert
There were multiple issues with direct_io_allow_mmap: - fuse_link_write_file() was missing, resulting in warnings in fuse_write_file_get() and EIO from msync() - "vma->vm_ops = &fuse_file_vm_ops" was not set, but especially fuse_page_mkwrite is needed. The semantics of invalidate_inode_pages2() is so far not clearly defined in fuse_file_mmap. It dates back to commit 3121bfe76311 ("fuse: fix "direct_io" private mmap") Though, as direct_io_allow_mmap is a new feature, that was for MAP_PRIVATE only. As invalidate_inode_pages2() is calling into fuse_launder_folio() and writes out dirty pages, it should be safe to call invalidate_inode_pages2 for MAP_PRIVATE and MAP_SHARED as well. Cc: Hao Xu <howeyxu@tencent.com> Cc: stable@vger.kernel.org Fixes: e78662e818f9 ("fuse: add a new fuse init flag to relax restrictions in no cache mode") Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-02-05fuse: adapt to breakup of struct file_lockJeff Layton
Most of the existing APIs have remained the same, but subsystems that access file_lock fields directly need to reach into struct file_lock_core now. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-39-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: split common fields into struct file_lock_coreJeff Layton
In a future patch, we're going to split file leases into their own structure. Since a lot of the underlying machinery uses the same fields move those into a new file_lock_core, and embed that inside struct file_lock. For now, add some macros to ensure that we can continue to build while the conversion is in progress. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-17-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-08Merge tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds
Pull vfs rw updates from Christian Brauner: "This contains updates from Amir for read-write backing file helpers for stacking filesystems such as overlayfs: - Fanotify is currently in the process of introducing pre content events. Roughly, a new permission event will be added indicating that it is safe to write to the file being accessed. These events are used by hierarchical storage managers to e.g., fill the content of files on first access. During that work we noticed that our current permission checking is inconsistent in rw_verify_area() and remap_verify_area(). Especially in the splice code permission checking is done multiple times. For example, one time for the whole range and then again for partial ranges inside the iterator. In addition, we mostly do permission checking before we call file_start_write() except for a few places where we call it after. For pre-content events we need such permission checking to be done before file_start_write(). So this is a nice reason to clean this all up. After this series, all permission checking is done before file_start_write(). As part of this cleanup we also massaged the splice code a bit. We got rid of a few helpers because we are alredy drowning in special read-write helpers. We also cleaned up the return types for splice helpers. - Introduce generic read-write helpers for backing files. This lifts some overlayfs code to common code so it can be used by the FUSE passthrough work coming in over the next cycles. Make Amir and Miklos the maintainers for this new subsystem of the vfs" * tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits) fs: fix __sb_write_started() kerneldoc formatting fs: factor out backing_file_mmap() helper fs: factor out backing_file_splice_{read,write}() helpers fs: factor out backing_file_{read,write}_iter() helpers fs: prepare for stackable filesystems backing file helpers fsnotify: optionally pass access range in file permission hooks fsnotify: assert that file_start_write() is not held in permission hooks fsnotify: split fsnotify_perm() into two hooks fs: use splice_copy_file_range() inline helper splice: return type ssize_t from all helpers fs: use do_splice_direct() for nfsd/ksmbd server-side-copy fs: move file_start_write() into direct_splice_actor() fs: fork splice_file_range() from do_splice_direct() fs: create {sb,file}_write_not_started() helpers fs: create file_write_started() helper fs: create __sb_write_started() helper fs: move kiocb_start_write() into vfs_iocb_iter_write() fs: move permission hook out of do_iter_read() fs: move permission hook out of do_iter_write() fs: move file_start_write() into vfs_iter_write() ...
2023-12-12fs: use splice_copy_file_range() inline helperAmir Goldstein
generic_copy_file_range() is just a wrapper around splice_file_range(), which caps the maximum copy length. The only caller of splice_file_range(), namely __ceph_copy_file_range() is already ready to cope with short copy. Move the length capping into splice_file_range() and replace the exported symbol generic_copy_file_range() with a simple inline helper. Suggested-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/linux-fsdevel/20231204083849.GC32438@lst.de/ Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231212094440.250945-3-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-12-04fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAPAmir Goldstein
The new fuse init flag FUSE_DIRECT_IO_ALLOW_MMAP breaks assumptions made by FOPEN_PARALLEL_DIRECT_WRITES and causes test generic/095 to hit BUG_ON(fi->writectr < 0) assertions in fuse_set_nowrite(): generic/095 5s ... kernel BUG at fs/fuse/dir.c:1756! ... ? fuse_set_nowrite+0x3d/0xdd ? do_raw_spin_unlock+0x88/0x8f ? _raw_spin_unlock+0x2d/0x43 ? fuse_range_is_writeback+0x71/0x84 fuse_sync_writes+0xf/0x19 fuse_direct_io+0x167/0x5bd fuse_direct_write_iter+0xf0/0x146 Auto disable FOPEN_PARALLEL_DIRECT_WRITES when server negotiated FUSE_DIRECT_IO_ALLOW_MMAP. Fixes: e78662e818f9 ("fuse: add a new fuse init flag to relax restrictions in no cache mode") Cc: <stable@vger.kernel.org> # v6.6 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-12-04fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAPTyler Fanelli
Although DIRECT_IO_RELAX's initial usage is to allow shared mmap, its description indicates a purpose of reducing memory footprint. This may imply that it could be further used to relax other DIRECT_IO operations in the future. Replace it with a flag DIRECT_IO_ALLOW_MMAP which does only one thing, allow shared mmap of DIRECT_IO files while still bypassing the cache on regular reads and writes. [Miklos] Also Keep DIRECT_IO_RELAX definition for backward compatibility. Signed-off-by: Tyler Fanelli <tfanelli@redhat.com> Fixes: e78662e818f9 ("fuse: add a new fuse init flag to relax restrictions in no cache mode") Cc: <stable@vger.kernel.org> # v6.6 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-16fuse: write back dirty pages before direct write in direct_io_relax modeHao Xu
In direct_io_relax mode, there can be shared mmaped files and thus dirty pages in its page cache. Therefore those dirty pages should be written back to backend before direct io to avoid data loss. Signed-off-by: Hao Xu <howeyxu@tencent.com> Reviewed-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-16fuse: add a new fuse init flag to relax restrictions in no cache modeHao Xu
FOPEN_DIRECT_IO is usually set by fuse daemon to indicate need of strong coherency, e.g. network filesystems. Thus shared mmap is disabled since it leverages page cache and may write to it, which may cause inconsistence. But FOPEN_DIRECT_IO can be used not for coherency but to reduce memory footprint as well, e.g. reduce guest memory usage with virtiofs. Therefore, add a new fuse init flag FUSE_DIRECT_IO_RELAX to relax restrictions in that mode, currently, it allows shared mmap. One thing to note is to make sure it doesn't break coherency in your use case. Signed-off-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-16fuse: invalidate page cache pages before direct writeHao Xu
In FOPEN_DIRECT_IO, page cache may still be there for a file since private mmap is allowed. Direct write should respect that and invalidate the corresponding pages so that page cache readers don't get stale data. Signed-off-by: Hao Xu <howeyxu@tencent.com> Tested-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-14Revert "fuse: in fuse_flush only wait if someone wants the return code"Miklos Szeredi
This reverts commit 5a8bee63b10f6f2f52f6d22e109a4a147409842a. Jürg Billeter reports the following regression: Since v6.3-rc1 commit 5a8bee63b1 ("fuse: in fuse_flush only wait if someone wants the return code") `fput()` is called asynchronously if a file is closed as part of a process exiting, i.e., if there was no explicit `close()` before exit. If the file was open for writing, also `put_write_access()` is called asynchronously as part of the async `fput()`. If that newly written file is an executable, attempting to `execve()` the new file can fail with `ETXTBSY` if it's called after the writer process exited but before the async `fput()` has run. Reported-and-tested-by: "Jürg Billeter" <j@bitron.ch> Cc: <stable@vger.kernel.org> # v6.3 Link: https://lore.kernel.org/all/4f66cded234462964899f2a661750d6798a57ec0.camel@bitron.ch/ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-28Merge tag 'mm-stable-2023-06-24-19-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: - Yosry Ahmed brought back some cgroup v1 stats in OOM logs - Yosry has also eliminated cgroup's atomic rstat flushing - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree - Johannes Weiner has done some cleanup work on the compaction code - David Hildenbrand has contributed additional selftests for get_user_pages() - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code - Huang Ying has done some maintenance on the swap code's usage of device refcounting - Christoph Hellwig has some cleanups for the filemap/directio code - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings - John Hubbard has a series of fixes to the MM selftesting code - ZhangPeng continues the folio conversion campaign - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8 - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code - Vishal Moola also has done some folio conversion work - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch * tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits) mm/hugetlb: remove hugetlb_set_page_subpool() mm: nommu: correct the range of mmap_sem_read_lock in task_mem() hugetlb: revert use of page_cache_next_miss() Revert "page cache: fix page_cache_next/prev_miss off by one" mm/vmscan: fix root proactive reclaim unthrottling unbalanced node mm: memcg: rename and document global_reclaim() mm: kill [add|del]_page_to_lru_list() mm: compaction: convert to use a folio in isolate_migratepages_block() mm: zswap: fix double invalidate with exclusive loads mm: remove unnecessary pagevec includes mm: remove references to pagevec mm: rename invalidate_mapping_pagevec to mapping_try_invalidate mm: remove struct pagevec net: convert sunrpc from pagevec to folio_batch i915: convert i915_gpu_error to use a folio_batch pagevec: rename fbatch_count() mm: remove check_move_unevictable_pages() drm: convert drm_gem_put_pages() to use a folio_batch i915: convert shmem_sg_free_table() to use a folio_batch scatterlist: add sg_set_folio() ...
2023-06-09fuse: use direct_write_fallbackChristoph Hellwig
Use the generic direct_write_fallback helper instead of duplicating the logic. Link: https://lkml.kernel.org/r/20230601145904.1385409-13-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09fuse: drop redundant arguments to fuse_perform_writeChristoph Hellwig
pos is always equal to iocb->ki_pos, and mapping is always equal to iocb->ki_filp->f_mapping. Link: https://lkml.kernel.org/r/20230601145904.1385409-12-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Miklos Szeredi <mszeredi@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Xiubo Li <xiubli@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09fuse: update ki_pos in fuse_perform_writeChristoph Hellwig
Both callers of fuse_perform_write need to updated ki_pos, move it into common code. Link: https://lkml.kernel.org/r/20230601145904.1385409-11-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Hannes Reinecke <hare@suse.de> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Xiubo Li <xiubli@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09backing_dev: remove current->backing_dev_infoChristoph Hellwig
Patch series "cleanup the filemap / direct I/O interaction", v4. This series cleans up some of the generic write helper calling conventions and the page cache writeback / invalidation for direct I/O. This is a spinoff from the no-bufferhead kernel project, for which we'll want to an use iomap based buffered write path in the block layer. This patch (of 12): The last user of current->backing_dev_info disappeared in commit b9b1335e6403 ("remove bdi_congested() and wb_congested() and related functions"). Remove the field and all assignments to it. Link: https://lkml.kernel.org/r/20230601145904.1385409-1-hch@lst.de Link: https://lkml.kernel.org/r/20230601145904.1385409-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Theodore Ts'o <tytso@mit.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Xiubo Li <xiubli@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>