summaryrefslogtreecommitdiff
path: root/fs/f2fs/file.c
AgeCommit message (Collapse)Author
4 daysMerge tag 'vfs-6.17-rc1.fileattr' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fileattr updates from Christian Brauner: "This introduces the new file_getattr() and file_setattr() system calls after lengthy discussions. Both system calls serve as successors and extensible companions to the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have started to show their age in addition to being named in a way that makes it easy to conflate them with extended attribute related operations. These syscalls allow userspace to set filesystem inode attributes on special files. One of the usage examples is the XFS quota projects. XFS has project quotas which could be attached to a directory. All new inodes in these directories inherit project ID set on parent directory. The project is created from userspace by opening and calling FS_IOC_FSSETXATTR on each inode. This is not possible for special files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left with empty project ID. Those inodes then are not shown in the quota accounting but still exist in the directory. This is not critical but in the case when special files are created in the directory with already existing project quota, these new inodes inherit extended attributes. This creates a mix of special files with and without attributes. Moreover, special files with attributes don't have a possibility to become clear or change the attributes. This, in turn, prevents userspace from re-creating quota project on these existing files. In addition, these new system calls allow the implementation of additional attributes that we couldn't or didn't want to fit into the legacy ioctls anymore" * tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: tighten a sanity check in file_attr_to_fileattr() tree-wide: s/struct fileattr/struct file_kattr/g fs: introduce file_getattr and file_setattr syscalls fs: prepare for extending file_get/setattr() fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP selinux: implement inode_file_[g|s]etattr hooks lsm: introduce new hooks for setting/getting inode fsxattr fs: split fileattr related helpers into separate file
4 daysMerge tag 'vfs-6.17-rc1.mmap_prepare' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull mmap_prepare updates from Christian Brauner: "Last cycle we introduce f_op->mmap_prepare() in c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"). This is preferred to the existing f_op->mmap() hook as it does require a VMA to be established yet, thus allowing the mmap logic to invoke this hook far, far earlier, prior to inserting a VMA into the virtual address space, or performing any other heavy handed operations. This allows for much simpler unwinding on error, and for there to be a single attempt at merging a VMA rather than having to possibly reattempt a merge based on potentially altered VMA state. Far more importantly, it prevents inappropriate manipulation of incompletely initialised VMA state, which is something that has been the cause of bugs and complexity in the past. The intent is to gradually deprecate f_op->mmap, and in that vein this series coverts the majority of file systems to using f_op->mmap_prepare. Prerequisite steps are taken - firstly ensuring all checks for mmap capabilities use the file_has_valid_mmap_hooks() helper rather than directly checking for f_op->mmap (which is now not a valid check) and secondly updating daxdev_mapping_supported() to not require a VMA parameter to allow ext4 and xfs to be converted. Commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for nested file systems") handles the nasty edge-case of nested file systems like overlayfs, which introduces a compatibility shim to allow f_op->mmap_prepare() to be invoked from an f_op->mmap() callback. This allows for nested filesystems to continue to function correctly with all file systems regardless of which callback is used. Once we finally convert all file systems, this shim can be removed. As a result, ecryptfs, fuse, and overlayfs remain unaltered so they can nest all other file systems. We additionally do not update resctl - as this requires an update to remap_pfn_range() (or an alternative to it) which we defer to a later series, equally we do not update cramfs which needs a mixed mapping insertion with the same issue, nor do we update procfs, hugetlbfs, syfs or kernfs all of which require VMAs for internal state and hooks. We shall return to all of these later" * tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: doc: update porting, vfs documentation to describe mmap_prepare() fs: replace mmap hook with .mmap_prepare for simple mappings fs: convert most other generic_file_*mmap() users to .mmap_prepare() fs: convert simple use of generic_file_*_mmap() to .mmap_prepare() mm/filemap: introduce generic_file_*_mmap_prepare() helpers fs/xfs: transition from deprecated .mmap hook to .mmap_prepare fs/ext4: transition from deprecated .mmap hook to .mmap_prepare fs/dax: make it possible to check dev dax support without a VMA fs: consistently use can_mmap_file() helper mm/nommu: use file_has_valid_mmap_hooks() helper mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
2025-07-04tree-wide: s/struct fileattr/struct file_kattr/gChristian Brauner
Now that we expose struct file_attr as our uapi struct rename all the internal struct to struct file_kattr to clearly communicate that it is a kernel internal struct. This is similar to struct mount_{k}attr and others. Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-19fs: replace mmap hook with .mmap_prepare for simple mappingsLorenzo Stoakes
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"), the f_op->mmap() hook has been deprecated in favour of f_op->mmap_prepare(). This callback is invoked in the mmap() logic far earlier, so error handling can be performed more safely without complicated and bug-prone state unwinding required should an error arise. This hook also avoids passing a pointer to a not-yet-correctly-established VMA avoiding any issues with referencing this data structure. It rather provides a pointer to the new struct vm_area_desc descriptor type which contains all required state and allows easy setting of required parameters without any consideration needing to be paid to locking or reference counts. Note that nested filesystems like overlayfs are compatible with an .mmap_prepare() callback since commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for nested file systems"). In this patch we apply this change to file systems with relatively simple mmap() hook logic - exfat, ceph, f2fs, bcachefs, zonefs, btrfs, ocfs2, orangefs, nilfs2, romfs, ramfs and aio. Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/f528ac4f35b9378931bd800920fee53fc0c5c74d.1750099179.git.lorenzo.stoakes@oracle.com Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-18f2fs: fix to zero post-eof pageChao Yu
fstest reports a f2fs bug: generic/363 42s ... [failed, exit status 1]- output mismatch (see /share/git/fstests/results//generic/363.out.bad) --- tests/generic/363.out 2025-01-12 21:57:40.271440542 +0800 +++ /share/git/fstests/results//generic/363.out.bad 2025-05-19 19:55:58.000000000 +0800 @@ -1,2 +1,78 @@ QA output created by 363 fsx -q -S 0 -e 1 -N 100000 +READ BAD DATA: offset = 0xd6fb, size = 0xf044, fname = /mnt/f2fs/junk +OFFSET GOOD BAD RANGE +0x1540d 0x0000 0x2a25 0x0 +operation# (mod 256) for the bad data may be 37 +0x1540e 0x0000 0x2527 0x1 ... (Run 'diff -u /share/git/fstests/tests/generic/363.out /share/git/fstests/results//generic/363.out.bad' to see the entire diff) Ran: generic/363 Failures: generic/363 Failed 1 of 1 tests The root cause is user can update post-eof page via mmap [1], however, f2fs missed to zero post-eof page in below operations, so, once it expands i_size, then it will include dummy data locates previous post-eof page, so during below operations, we need to zero post-eof page. Operations which can include dummy data after previous i_size after expanding i_size: - write - mapwrite [1] - truncate - fallocate * preallocate * zero_range * insert_range * collapse_range - clone_range (doesn’t support in f2fs) - copy_range (doesn’t support in f2fs) [1] https://man7.org/linux/man-pages/man2/mmap.2.html 'BUG section' Cc: stable@kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-08f2fs: remove wbc->for_reclaim handlingChristoph Hellwig
Since commits 7ff0104a8052 ("f2fs: Remove f2fs_write_node_page()") and 3b47398d9861 ("f2fs: Remove f2fs_write_meta_page()'), f2fs can't be called from reclaim context any more. Remove all code keyed of the wbc->for_reclaim flag, which is now only set for writing out swap or shmem pages inside the swap code, but never passed to file systems. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-06f2fs: handle error cases of memory donationDaeho Jeong
In cases of removing memory donation, we need to handle some error cases like ENOENT and EACCES (indicating the range already has been donated). Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Pass a folio to get_dnode_addr()Matthew Wilcox (Oracle)
All callers except __get_inode_rdev() and __set_inode_rdev() now have a folio, but the only callers of those two functions do have a folio, so pass the folio to them and then into get_dnode_addr(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Convert dnode_of_data->node_page to node_folioMatthew Wilcox (Oracle)
All assignments to this struct member are conversions from a folio so convert it to be a folio and convert all users. At the same time, convert data_blkaddr() to take a folio as all callers now have a folio. Remove eight calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in redirty_blocks()Matthew Wilcox (Oracle)
Support large folios & simplify the loops in redirty_blocks(). Use the folio APIs and remove four calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in need_inode_page_update()Matthew Wilcox (Oracle)
Fetch a folio from the pagecache instead of a page. Removes two calls to compound_head() Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Pass a folio to f2fs_truncate_inline_inode()Matthew Wilcox (Oracle)
All callers now have a folio, so pass it in. Removes a call to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Pass folios to set_new_dnode()Matthew Wilcox (Oracle)
Removes a lot of conversions of folios into pages. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in f2fs_do_truncate_blocks()Matthew Wilcox (Oracle)
Removes two calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in fill_zero()Matthew Wilcox (Oracle)
Remove three hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in f2fs_defragment_range()Matthew Wilcox (Oracle)
Remove three hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use a folio in __clone_blkaddrs()Matthew Wilcox (Oracle)
Removes five hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28f2fs: Use f2fs_folio_wait_writeback()Matthew Wilcox (Oracle)
There were some missing conversions from f2fs_wait_on_page_writeback() to f2fs_folio_wait_writeback(). Saves a call to compound_head() at each callsite. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-10f2fs: add a fast path in finish_preallocate_blocks()Chao Yu
This patch uses i_sem to protect access/update on f2fs_inode_info.flag in finish_preallocate_blocks(), it avoids grabbing inode_lock() in each open(). Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-12f2fs: fix to avoid running out of free segmentsChao Yu
If checkpoint is disabled, GC can not reclaim any segments, we need to detect such condition and bail out from fallocate() of a pinfile, rather than letting allocator running out of free segment, which may cause f2fs to be shutdown. reproducer: mkfs.f2fs -f /dev/vda 16777216 mount -o checkpoint=disable:10% /dev/vda /mnt/f2fs for ((i=0;i<4096;i++)) do { dd if=/dev/zero of=/mnt/f2fs/$i bs=1M count=1; } done sync for ((i=0;i<4096;i+=2)) do { rm /mnt/f2fs/$i; } done sync touch /mnt/f2fs/pinfile f2fs_io pinfile set /mnt/f2fs/pinfile f2fs_io fallocate 0 0 4201644032 /mnt/f2fs/pinfile cat /sys/kernel/debug/f2fs/status output: - Free: 0 (0) Fixes: f5a53edcf01e ("f2fs: support aligned pinned file") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11f2fs: do sanity check on inode footer in f2fs_get_inode_page()Chao Yu
This patch introduces a new wrapper f2fs_get_inode_page(), then, caller can use it to load inode block to page cache, meanwhile it will do sanity check on inode footer. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04f2fs: Convert truncate_partial_data_page() to use a folioMatthew Wilcox (Oracle)
Retrieve a folio from the page cache and use it throughout. Saves five hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-13f2fs: keep POSIX_FADV_NOREUSE rangesJaegeuk Kim
This patch records POSIX_FADV_NOREUSE ranges for users to reclaim the caches instantly off from LRU. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-12f2fs: fix to avoid panic once fallocation fails for pinfileChao Yu
syzbot reports a f2fs bug as below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/segment.c:2746! CPU: 0 UID: 0 PID: 5323 Comm: syz.0.0 Not tainted 6.13.0-rc2-syzkaller-00018-g7cb1b4663150 #0 RIP: 0010:get_new_segment fs/f2fs/segment.c:2746 [inline] RIP: 0010:new_curseg+0x1f52/0x1f70 fs/f2fs/segment.c:2876 Call Trace: <TASK> __allocate_new_segment+0x1ce/0x940 fs/f2fs/segment.c:3210 f2fs_allocate_new_section fs/f2fs/segment.c:3224 [inline] f2fs_allocate_pinning_section+0xfa/0x4e0 fs/f2fs/segment.c:3238 f2fs_expand_inode_data+0x696/0xca0 fs/f2fs/file.c:1830 f2fs_fallocate+0x537/0xa10 fs/f2fs/file.c:1940 vfs_fallocate+0x569/0x6e0 fs/open.c:327 do_vfs_ioctl+0x258c/0x2e40 fs/ioctl.c:885 __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0x80/0x170 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Concurrent pinfile allocation may run out of free section, result in panic in get_new_segment(), let's expand pin_sem lock coverage to include f2fs_gc(), so that we can make sure to reclaim enough free space for following allocation. In addition, do below changes to enhance error path handling: - call f2fs_bug_on() only in non-pinfile allocation path in get_new_segment(). - call reset_curseg_fields() to reset all fields of curseg in new_curseg() Fixes: f5a53edcf01e ("f2fs: support aligned pinned file") Reported-by: syzbot+15669ec8c35ddf6c3d43@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/675cd64e.050a0220.37aaf.00bb.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-12f2fs: add ioctl to get IO priority hintJaegeuk Kim
This patch adds an ioctl to give a per-file priority hint to attach REQ_PRIO. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-22f2fs: Clean up the loop outside of f2fs_invalidate_blocks()Yi Sun
Now f2fs_invalidate_blocks() supports a continuous range of addresses, so the for loop can be omitted. Signed-off-by: Yi Sun <yi.sun@unisoc.com> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-16f2fs: Optimize f2fs_truncate_data_blocks_range()Yi Sun
Function f2fs_invalidate_blocks() can process consecutive blocks at a time, so f2fs_truncate_data_blocks_range() is optimized to use the new functionality of f2fs_invalidate_blocks(). Add two variables @blkstart and @blklen, @blkstart records the first address of the consecutive blocks, and @blkstart records the number of consecutive blocks. Signed-off-by: Yi Sun <yi.sun@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-13f2fs: add parameter @len to f2fs_invalidate_blocks()Yi Sun
New function can process some consecutive blocks at a time. Function f2fs_invalidate_blocks()->down_write() and up_write() are very time-consuming, so if f2fs_invalidate_blocks() can process consecutive blocks at one time, it will save a lot of time. Signed-off-by: Yi Sun <yi.sun@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-26Merge tag 'f2fs-for-6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This series introduces a device aliasing feature where user can carve out partitions but reclaim the space back by deleting aliased file in root dir. In addition to that, there're numerous minor bug fixes in zoned device support, checkpoint=disable, extent cache management, fiemap, and lazytime mount option. The full list of noticeable changes can be found below. Enhancements: - introduce device aliasing file - add stats in debugfs to show multiple devices - add a sysfs node to limit max read extent count per-inode - modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable - decrease spare area for pinned files for zoned devices Fixes: - Revert "f2fs: remove unreachable lazytime mount option parsing" - adjust unusable cap before checkpoint=disable mode - fix to drop all discards after creating snapshot on lvm device - fix to shrink read extent node in batches - fix changing cursegs if recovery fails on zoned device - fix to adjust appropriate length for fiemap - fix fiemap failure issue when page size is 16KB - fix to avoid forcing direct write to use buffered IO on inline_data inode - fix to map blocks correctly for direct write - fix to account dirty data in __get_secs_required() - fix null-ptr-deref in f2fs_submit_page_bio() - fix inconsistent update of i_blocks in release_compress_blocks and reserve_compress_blocks" * tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits) f2fs: fix to drop all discards after creating snapshot on lvm device f2fs: add a sysfs node to limit max read extent count per-inode f2fs: fix to shrink read extent node in batches f2fs: print message if fscorrupted was found in f2fs_new_node_page() f2fs: clear SBI_POR_DOING before initing inmem curseg f2fs: fix changing cursegs if recovery fails on zoned device f2fs: adjust unusable cap before checkpoint=disable mode f2fs: fix to requery extent which cross boundary of inquiry f2fs: fix to adjust appropriate length for fiemap f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK} f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow f2fs: replace deprecated strcpy with strscpy Revert "f2fs: remove unreachable lazytime mount option parsing" f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode f2fs: fix to map blocks correctly for direct write f2fs: fix race in concurrent f2fs_stop_gc_thread f2fs: fix fiemap failure issue when page size is 16KB f2fs: remove redundant atomic file check in defragment f2fs: fix to convert log type to segment data type correctly f2fs: clean up the unused variable additional_reserved_segments ...
2024-11-18Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull 'struct fd' class updates from Al Viro: "The bulk of struct fd memory safety stuff Making sure that struct fd instances are destroyed in the same scope where they'd been created, getting rid of reassignments and passing them by reference, converting to CLASS(fd{,_pos,_raw}). We are getting very close to having the memory safety of that stuff trivial to verify" * tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits) deal with the last remaing boolean uses of fd_file() css_set_fork(): switch to CLASS(fd_raw, ...) memcg_write_event_control(): switch to CLASS(fd) assorted variants of irqfd setup: convert to CLASS(fd) do_pollfd(): convert to CLASS(fd) convert do_select() convert vfs_dedupe_file_range(). convert cifs_ioctl_copychunk() convert media_request_get_by_fd() convert spu_run(2) switch spufs_calls_{get,put}() to CLASS() use convert cachestat(2) convert do_preadv()/do_pwritev() fdget(), more trivial conversions fdget(), trivial conversions privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() o2hb_region_dev_store(): avoid goto around fdget()/fdput() introduce "fd_pos" class, convert fdget_pos() users to it. fdget_raw() users: switch to CLASS(fd_raw) convert vmsplice() to CLASS(fd) ...
2024-11-05f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inodeChao Yu
Jinsu Lee reported a performance regression issue, after commit 5c8764f8679e ("f2fs: fix to force buffered IO on inline_data inode"), we forced direct write to use buffered IO on inline_data inode, it will cause performace regression due to memory copy and data flush. It's fine to not force direct write to use buffered IO, as it can convert inline inode before committing direct write IO. Fixes: 5c8764f8679e ("f2fs: fix to force buffered IO on inline_data inode") Reported-by: Jinsu Lee <jinsu1.lee@samsung.com> Closes: https://lore.kernel.org/linux-f2fs-devel/af03dd2c-e361-4f80-b2fd-39440766cf6e@kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-05f2fs: fix race in concurrent f2fs_stop_gc_threadLong Li
In my test case, concurrent calls to f2fs shutdown report the following stack trace: Oops: general protection fault, probably for non-canonical address 0xc6cfff63bb5513fc: 0000 [#1] PREEMPT SMP PTI CPU: 0 UID: 0 PID: 678 Comm: f2fs_rep_shutdo Not tainted 6.12.0-rc5-next-20241029-g6fb2fa9805c5-dirty #85 Call Trace: <TASK> ? show_regs+0x8b/0xa0 ? __die_body+0x26/0xa0 ? die_addr+0x54/0x90 ? exc_general_protection+0x24b/0x5c0 ? asm_exc_general_protection+0x26/0x30 ? kthread_stop+0x46/0x390 f2fs_stop_gc_thread+0x6c/0x110 f2fs_do_shutdown+0x309/0x3a0 f2fs_ioc_shutdown+0x150/0x1c0 __f2fs_ioctl+0xffd/0x2ac0 f2fs_ioctl+0x76/0xe0 vfs_ioctl+0x23/0x60 __x64_sys_ioctl+0xce/0xf0 x64_sys_call+0x2b1b/0x4540 do_syscall_64+0xa7/0x240 entry_SYSCALL_64_after_hwframe+0x76/0x7e The root cause is a race condition in f2fs_stop_gc_thread() called from different f2fs shutdown paths: [CPU0] [CPU1] ---------------------- ----------------------- f2fs_stop_gc_thread f2fs_stop_gc_thread gc_th = sbi->gc_thread gc_th = sbi->gc_thread kfree(gc_th) sbi->gc_thread = NULL < gc_th != NULL > kthread_stop(gc_th->f2fs_gc_task) //UAF The commit c7f114d864ac ("f2fs: fix to avoid use-after-free in f2fs_stop_gc_thread()") attempted to fix this issue by using a read semaphore to prevent races between shutdown and remount threads, but it fails to prevent all race conditions. Fix it by converting to write lock of s_umount in f2fs_do_shutdown(). Fixes: 7950e9ac638e ("f2fs: stop gc/discard thread after fs shutdown") Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-03fdget(), more trivial conversionsAl Viro
all failure exits prior to fdget() leave the scope, all matching fdput() are immediately followed by leaving the scope. [xfs_ioc_commit_range() chunk moved here as well] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-01f2fs: remove redundant atomic file check in defragmentZhiguo Niu
f2fs_is_atomic_file(inode) is checked in f2fs_defragment_range, so remove the redundant checking in f2fs_ioc_defragment. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01f2fs: fix to parse temperature correctly in f2fs_get_segment_temp()Chao Yu
In __get_segment_type(), __get_segment_type_6() may return CURSEG_COLD_DATA_PINNED or CURSEG_ALL_DATA_ATGC log type, but following f2fs_get_segment_temp() can only handle persistent log type, fix it. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01f2fs: introduce device aliasing fileDaeho Jeong
F2FS should understand how the device aliasing file works and support deleting the file after use. A device aliasing file can be created by mkfs.f2fs tool and it can map the whole device with an extent, not using node blocks. The file space should be pinned and normally used for read-only usages. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01f2fs: decrease spare area for pinned files for zoned devicesDaeho Jeong
Now we reclaim too much space before allocating pinned space for zoned devices. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-10-14f2fs: compress: fix inconsistent update of i_blocks in ↵Qi Han
release_compress_blocks and reserve_compress_blocks After release a file and subsequently reserve it, the FSCK flag is set when the file is deleted, as shown in the following backtrace: F2FS-fs (dm-48): Inconsistent i_blocks, ino:401231, iblocks:1448, sectors:1472 fs_rec_info_write_type+0x58/0x274 f2fs_rec_info_write+0x1c/0x2c set_sbi_flag+0x74/0x98 dec_valid_block_count+0x150/0x190 f2fs_truncate_data_blocks_range+0x2d4/0x3cc f2fs_do_truncate_blocks+0x2fc/0x5f0 f2fs_truncate_blocks+0x68/0x100 f2fs_truncate+0x80/0x128 f2fs_evict_inode+0x1a4/0x794 evict+0xd4/0x280 iput+0x238/0x284 do_unlinkat+0x1ac/0x298 __arm64_sys_unlinkat+0x48/0x68 invoke_syscall+0x58/0x11c For clusters of the following type, i_blocks are decremented by 1 and i_compr_blocks are incremented by 7 in release_compress_blocks, while updates to i_blocks and i_compr_blocks are skipped in reserve_compress_blocks. raw node: D D D D D D D D after compress: C D D D D D D D after reserve: C D D D D D D D Let's update i_blocks and i_compr_blocks properly in reserve_compress_blocks. Fixes: eb8fbaa53374 ("f2fs: compress: fix to check unreleased compressed cluster") Signed-off-by: Qi Han <hanqi@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-10-11f2fs: allow parallel DIO readsJaegeuk Kim
This fixes a regression which prevents parallel DIO reads. Fixes: 0cac51185e65 ("f2fs: fix to avoid racing in between read and OPU dio write") Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-24Merge tag 'f2fs-for-6.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "The main changes include converting major IO paths to use folio, and adding various knobs to control GC more flexibly for Zoned devices. In addition, there are several patches to address corner cases of atomic file operations and better support for file pinning on zoned device. Enhancement: - add knobs to tune foreground/background GCs for Zoned devices - convert IO paths to use folio - reduce expensive checkpoint trigger frequency - allow F2FS_IPU_NOCACHE for pinned file - forcibly migrate to secure space for zoned device file pinning - get rid of buffer_head use - add write priority option based on zone UFS - get rid of online repair on corrupted directory Bug fixes: - fix to don't panic system for no free segment fault injection - fix to don't set SB_RDONLY in f2fs_handle_critical_error() - avoid unused block when dio write in LFS mode - compress: don't redirty sparse cluster during {,de}compress - check discard support for conventional zones - atomic: prevent atomic file from being dirtied before commit - atomic: fix to check atomic_file in f2fs ioctl interfaces - atomic: fix to forbid dio in atomic_file - atomic: fix to truncate pagecache before on-disk metadata truncation - atomic: create COW inode from parent dentry - atomic: fix to avoid racing w/ GC - atomic: require FMODE_WRITE for atomic write ioctls - fix to wait page writeback before setting gcing flag - fix to avoid racing in between read and OPU dio write, dio completion - fix several potential integer overflows in file offsets and dir_block_index - fix to avoid use-after-free in f2fs_stop_gc_thread() As usual, there are several code clean-ups and refactorings" * tag 'f2fs-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (60 commits) f2fs: allow F2FS_IPU_NOCACHE for pinned file f2fs: forcibly migrate to secure space for zoned device file pinning f2fs: remove unused parameters f2fs: fix to don't panic system for no free segment fault injection f2fs: fix to don't set SB_RDONLY in f2fs_handle_critical_error() f2fs: add valid block ratio not to do excessive GC for one time GC f2fs: create gc_no_zoned_gc_percent and gc_boost_zoned_gc_percent f2fs: do FG_GC when GC boosting is required for zoned devices f2fs: increase BG GC migration window granularity when boosted for zoned devices f2fs: add reserved_segments sysfs node f2fs: introduce migration_window_granularity f2fs: make BG GC more aggressive for zoned devices f2fs: avoid unused block when dio write in LFS mode f2fs: fix to check atomic_file in f2fs ioctl interfaces f2fs: get rid of online repaire on corrupted directory f2fs: prevent atomic file from being dirtied before commit f2fs: get rid of page->index f2fs: convert read_node_page() to use folio f2fs: convert __write_node_page() to use folio f2fs: convert f2fs_write_data_page() to use folio ...
2024-09-11f2fs: fix to check atomic_file in f2fs ioctl interfacesChao Yu
Some f2fs ioctl interfaces like f2fs_ioc_set_pin_file(), f2fs_move_file_range(), and f2fs_defragment_range() missed to check atomic_write status, which may cause potential race issue, fix it. Cc: stable@vger.kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06f2fs: convert f2fs_vm_page_mkwrite() to use folioChao Yu
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/ Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: atomic: fix to forbid dio in atomic_fileChao Yu
atomic write can only be used via buffered IO, let's fail direct IO on atomic_file and return -EOPNOTSUPP. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: compress: don't redirty sparse cluster during {,de}compressYeongjin Gil
In f2fs_do_write_data_page, when the data block is NULL_ADDR, it skips writepage considering that it has been already truncated. This results in an infinite loop as the PAGECACHE_TAG_TOWRITE tag is not cleared during the writeback process for a compressed file including NULL_ADDR in compress_mode=user. This is the reproduction process: 1. dd if=/dev/zero bs=4096 count=1024 seek=1024 of=testfile 2. f2fs_io compress testfile 3. dd if=/dev/zero bs=4096 count=1 conv=notrunc of=testfile 4. f2fs_io decompress testfile To prevent the problem, let's check whether the cluster is fully allocated before redirty its pages. Fixes: 5fdb322ff2c2 ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE") Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com> Tested-by: Jaewook Kim <jw5454.kim@samsung.com> Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: fix to avoid use-after-free in f2fs_stop_gc_thread()Chao Yu
syzbot reports a f2fs bug as below: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_report+0xe8/0x550 mm/kasan/report.c:491 kasan_report+0x143/0x180 mm/kasan/report.c:601 kasan_check_range+0x282/0x290 mm/kasan/generic.c:189 instrument_atomic_read_write include/linux/instrumented.h:96 [inline] atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:252 [inline] __refcount_add include/linux/refcount.h:184 [inline] __refcount_inc include/linux/refcount.h:241 [inline] refcount_inc include/linux/refcount.h:258 [inline] get_task_struct include/linux/sched/task.h:118 [inline] kthread_stop+0xca/0x630 kernel/kthread.c:704 f2fs_stop_gc_thread+0x65/0xb0 fs/f2fs/gc.c:210 f2fs_do_shutdown+0x192/0x540 fs/f2fs/file.c:2283 f2fs_ioc_shutdown fs/f2fs/file.c:2325 [inline] __f2fs_ioctl+0x443a/0xbe60 fs/f2fs/file.c:4325 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f The root cause is below race condition, it may cause use-after-free issue in sbi->gc_th pointer. - remount - f2fs_remount - f2fs_stop_gc_thread - kfree(gc_th) - f2fs_ioc_shutdown - f2fs_do_shutdown - f2fs_stop_gc_thread - kthread_stop(gc_th->f2fs_gc_task) : sbi->gc_thread = NULL; We will call f2fs_do_shutdown() in two paths: - for f2fs_ioc_shutdown() path, we should grab sb->s_umount semaphore for fixing. - for f2fs_shutdown() path, it's safe since caller has already grabbed sb->s_umount semaphore. Reported-by: syzbot+1a8e2b31f2ac9bd3d148@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/0000000000005c7ccb061e032b9b@google.com Fixes: 7950e9ac638e ("f2fs: stop gc/discard thread after fs shutdown") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: atomic: fix to truncate pagecache before on-disk metadata truncationChao Yu
We should always truncate pagecache while truncating on-disk data. Fixes: a46bebd502fe ("f2fs: synchronize atomic write aborts") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: fix to wait page writeback before setting gcing flagChao Yu
Soft IRQ Thread - f2fs_write_end_io - f2fs_defragment_range - set_page_private_gcing - type = WB_DATA_TYPE(page, false); : assign type w/ F2FS_WB_CP_DATA due to page_private_gcing() is true - dec_page_count() w/ wrong type - end_page_writeback() Value of F2FS_WB_CP_DATA reference count may become negative under above race condition, the root cause is we missed to wait page writeback before setting gcing page private flag, let's fix it. Fixes: 2d1fe8a86bf5 ("f2fs: fix to tag gcing flag on page during file defragment") Fixes: 4961acdd65c9 ("f2fs: fix to tag gcing flag on page during block migration") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: Create COW inode from parent dentry for atomic writeYeongjin Gil
The i_pino in f2fs_inode_info has the previous parent's i_ino when inode was renamed, which may cause f2fs_ioc_start_atomic_write to fail. If file_wrong_pino is true and i_nlink is 1, then to find a valid pino, we should refer to the dentry from inode. To resolve this issue, let's get parent inode using parent dentry directly. Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com> Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com> Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: Require FMODE_WRITE for atomic write ioctlsJann Horn
The F2FS ioctls for starting and committing atomic writes check for inode_owner_or_capable(), but this does not give LSMs like SELinux or Landlock an opportunity to deny the write access - if the caller's FSUID matches the inode's UID, inode_owner_or_capable() immediately returns true. There are scenarios where LSMs want to deny a process the ability to write particular files, even files that the FSUID of the process owns; but this can currently partially be bypassed using atomic write ioctls in two ways: - F2FS_IOC_START_ATOMIC_REPLACE + F2FS_IOC_COMMIT_ATOMIC_WRITE can truncate an inode to size 0 - F2FS_IOC_START_ATOMIC_WRITE + F2FS_IOC_ABORT_ATOMIC_WRITE can revert changes another process concurrently made to a file Fix it by requiring FMODE_WRITE for these operations, just like for F2FS_IOC_MOVE_RANGE. Since any legitimate caller should only be using these ioctls when intending to write into the file, that seems unlikely to break anything. Fixes: 88b88a667971 ("f2fs: support atomic writes") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-21f2fs: clean up val{>>,<<}F2FS_BLKSIZE_BITSZhiguo Niu
Use F2FS_BYTES_TO_BLK(bytes) and F2FS_BLK_TO_BYTES(blk) for cleanup Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>