summaryrefslogtreecommitdiff
path: root/fs/f2fs
AgeCommit message (Collapse)Author
2025-07-10f2fs: zone: fix to calculate first_zoned_segno correctlyChao Yu
[ Upstream commit dc6d9ef57fcf42fac1b3be4bff5ac5b3f1e8f9f3 ] A zoned device can has both conventional zones and sequential zones, so we should not treat first segment of zoned device as first_zoned_segno, instead, we need to check zone type for each zone during traversing zoned device to find first_zoned_segno. Otherwise, for below case, first_zoned_segno will be 0, which could be wrong. create_null_blk 512 2 1024 1024 mkfs.f2fs -m /dev/nullb0 Testcase: export SCRIPTS_PATH=/share/git/scripts test multiple devices w/ zoned device for ((i=0;i<8;i++)) do { zonesize=$((2<<$i)) conzone=$((4096/$zonesize)) seqzone=$((4096/$zonesize)) $SCRIPTS_PATH/nullblk_create.sh 512 $zonesize $conzone $seqzone mkfs.f2fs -f -m /dev/vdb -c /dev/nullb0 mount /dev/vdb /mnt/f2fs touch /mnt/f2fs/file f2fs_io pinfile set /mnt/f2fs/file $((8589934592*2)) stat /mnt/f2fs/file df cat /proc/fs/f2fs/vdb/segment_info umount /mnt/f2fs $SCRIPTS_PATH/nullblk_remove.sh 0 } done test single zoned device for ((i=0;i<8;i++)) do { zonesize=$((2<<$i)) conzone=$((4096/$zonesize)) seqzone=$((4096/$zonesize)) $SCRIPTS_PATH/nullblk_create.sh 512 $zonesize $conzone $seqzone mkfs.f2fs -f -m /dev/nullb0 mount /dev/nullb0 /mnt/f2fs touch /mnt/f2fs/file f2fs_io pinfile set /mnt/f2fs/file $((8589934592*2)) stat /mnt/f2fs/file df cat /proc/fs/f2fs/nullb0/segment_info umount /mnt/f2fs $SCRIPTS_PATH/nullblk_remove.sh 0 } done Fixes: 9703d69d9d15 ("f2fs: support file pinning for zoned devices") Cc: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-10f2fs: zone: introduce first_zoned_segno in f2fs_sb_infoChao Yu
[ Upstream commit 5bc5aae843128aefb1c55d769d057c92dd8a32c9 ] first_zoned_segno() returns a fixed value, let's cache it in structure f2fs_sb_info to avoid redundant calculation. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Stable-dep-of: dc6d9ef57fcf ("f2fs: zone: fix to calculate first_zoned_segno correctly") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-10f2fs: decrease spare area for pinned files for zoned devicesDaeho Jeong
[ Upstream commit fa08972bcb7baaf5f1f4fdf251dc08bdd3ab1cf0 ] Now we reclaim too much space before allocating pinned space for zoned devices. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Stable-dep-of: dc6d9ef57fcf ("f2fs: zone: fix to calculate first_zoned_segno correctly") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-06f2fs: fix to zero post-eof pageChao Yu
commit ba8dac350faf16afc129ce6303ca4feaf083ccb1 upstream. fstest reports a f2fs bug: #generic/363 42s ... [failed, exit status 1]- output mismatch (see /share/git/fstests/results//generic/363.out.bad) # --- tests/generic/363.out 2025-01-12 21:57:40.271440542 +0800 # +++ /share/git/fstests/results//generic/363.out.bad 2025-05-19 19:55:58.000000000 +0800 # @@ -1,2 +1,78 @@ # QA output created by 363 # fsx -q -S 0 -e 1 -N 100000 # +READ BAD DATA: offset = 0xd6fb, size = 0xf044, fname = /mnt/f2fs/junk # +OFFSET GOOD BAD RANGE # +0x1540d 0x0000 0x2a25 0x0 # +operation# (mod 256) for the bad data may be 37 # +0x1540e 0x0000 0x2527 0x1 # ... # (Run 'diff -u /share/git/fstests/tests/generic/363.out /share/git/fstests/results//generic/363.out.bad' to see the entire diff) Ran: generic/363 Failures: generic/363 Failed 1 of 1 tests The root cause is user can update post-eof page via mmap [1], however, f2fs missed to zero post-eof page in below operations, so, once it expands i_size, then it will include dummy data locates previous post-eof page, so during below operations, we need to zero post-eof page. Operations which can include dummy data after previous i_size after expanding i_size: - write - mapwrite [1] - truncate - fallocate * preallocate * zero_range * insert_range * collapse_range - clone_range (doesn’t support in f2fs) - copy_range (doesn’t support in f2fs) [1] https://man7.org/linux/man-pages/man2/mmap.2.html 'BUG section' Cc: stable@kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06f2fs: don't over-report free space or inodes in statvfsChao Yu
[ Upstream commit a9201960623287927bf5776de3f70fb2fbde7e02 ] This fixes an analogus bug that was fixed in modern filesystems: a) xfs in commit 4b8d867ca6e2 ("xfs: don't over-report free space or inodes in statvfs") b) ext4 in commit f87d3af74193 ("ext4: don't over-report free space or inodes in statvfs") where statfs can report misleading / incorrect information where project quota is enabled, and the free space is less than the remaining quota. This commit will resolve a test failure in generic/762 which tests for this bug. generic/762 - output mismatch (see /share/git/fstests/results//generic/762.out.bad) # --- tests/generic/762.out 2025-04-15 10:21:53.371067071 +0800 # +++ /share/git/fstests/results//generic/762.out.bad 2025-05-13 16:13:37.000000000 +0800 # @@ -6,8 +6,10 @@ # root blocks2 is in range # dir blocks2 is in range # root bavail2 is in range # -dir bavail2 is in range # +dir bavail2 has value of 1539066 # +dir bavail2 is NOT in range 304734.87 .. 310891.13 # root blocks3 is in range # ... # (Run 'diff -u /share/git/fstests/tests/generic/762.out /share/git/fstests/results//generic/762.out.bad' to see the entire diff) HINT: You _MAY_ be missing kernel fix: XXXXXXXXXXXXXX xfs: don't over-report free space or inodes in statvfs Cc: stable@kernel.org Fixes: ddc34e328d06 ("f2fs: introduce f2fs_statfs_project") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27f2fs: fix to set atomic write status more clearChao Yu
[ Upstream commit db03c20c0850dc8d2bcabfa54b9438f7d666c863 ] 1. After we start atomic write in a database file, before committing all data, we'd better not set inode w/ vfs dirty status to avoid redundant updates, instead, we only set inode w/ atomic dirty status. 2. After we commit all data, before committing metadata, we need to clear atomic dirty status, and set vfs dirty status to allow vfs flush dirty inode. Cc: Daeho Jeong <daehojeong@google.com> Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27f2fs: fix to bail out in get_new_segment()Chao Yu
[ Upstream commit bb5eb8a5b222fa5092f60d5555867a05ebc3bdf2 ] ------------[ cut here ]------------ WARNING: CPU: 3 PID: 579 at fs/f2fs/segment.c:2832 new_curseg+0x5e8/0x6dc pc : new_curseg+0x5e8/0x6dc Call trace: new_curseg+0x5e8/0x6dc f2fs_allocate_data_block+0xa54/0xe28 do_write_page+0x6c/0x194 f2fs_do_write_node_page+0x38/0x78 __write_node_page+0x248/0x6d4 f2fs_sync_node_pages+0x524/0x72c f2fs_write_checkpoint+0x4bc/0x9b0 __checkpoint_and_complete_reqs+0x80/0x244 issue_checkpoint_thread+0x8c/0xec kthread+0x114/0x1bc ret_from_fork+0x10/0x20 get_new_segment() detects inconsistent status in between free_segmap and free_secmap, let's record such error into super block, and bail out get_new_segment() instead of continue using the segment. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27f2fs: use vmalloc instead of kvmalloc in .init_{,de}compress_ctxChao Yu
[ Upstream commit 70dd07c888451503c3e93b6821e10d1ea1ec9930 ] .init_{,de}compress_ctx uses kvmalloc() to alloc memory, it will try to allocate physically continuous page first, it may cause more memory allocation pressure, let's use vmalloc instead to mitigate it. [Test] cd /data/local/tmp touch file f2fs_io setflags compression file f2fs_io getflags file for i in $(seq 1 10); do sync; echo 3 > /proc/sys/vm/drop_caches;\ time f2fs_io write 512 0 4096 zero osync file; truncate -s 0 file;\ done [Result] Before After Delta 21.243 21.694 -2.12% For compression, we recommend to use ioctl to compress file data in background for workaround. For decompression, only zstd will be affected. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27f2fs: fix to do sanity check on sit_bitmap_sizeChao Yu
commit 5db0d252c64e91ba1929c70112352e85dc5751e7 upstream. w/ below testcase, resize will generate a corrupted image which contains inconsistent metadata, so when mounting such image, it will trigger kernel panic: touch img truncate -s $((512*1024*1024*1024)) img mkfs.f2fs -f img $((256*1024*1024)) resize.f2fs -s -i img -t $((1024*1024*1024)) mount img /mnt/f2fs ------------[ cut here ]------------ kernel BUG at fs/f2fs/segment.h:863! Oops: invalid opcode: 0000 [#1] SMP PTI CPU: 11 UID: 0 PID: 3922 Comm: mount Not tainted 6.15.0-rc1+ #191 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:f2fs_ra_meta_pages+0x47c/0x490 Call Trace: f2fs_build_segment_manager+0x11c3/0x2600 f2fs_fill_super+0xe97/0x2840 mount_bdev+0xf4/0x140 legacy_get_tree+0x2b/0x50 vfs_get_tree+0x29/0xd0 path_mount+0x487/0xaf0 __x64_sys_mount+0x116/0x150 do_syscall_64+0x82/0x190 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7fdbfde1bcfe The reaseon is: sit_i->bitmap_size is 192, so size of sit bitmap is 192*8=1536, at maximum there are 1536 sit blocks, however MAIN_SEGS is 261893, so that sit_blk_cnt is 4762, build_sit_entries() -> current_sit_addr() tries to access out-of-boundary in sit_bitmap at offset from [1536, 4762), once sit_bitmap and sit_bitmap_mirror is not the same, it will trigger f2fs_bug_on(). Let's add sanity check in f2fs_sanity_check_ckpt() to avoid panic. Cc: stable@vger.kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27f2fs: prevent kernel warning due to negative i_nlink from corrupted imageJaegeuk Kim
commit 42cb74a92adaf88061039601ddf7c874f58b554e upstream. WARNING: CPU: 1 PID: 9426 at fs/inode.c:417 drop_nlink+0xac/0xd0 home/cc/linux/fs/inode.c:417 Modules linked in: CPU: 1 UID: 0 PID: 9426 Comm: syz-executor568 Not tainted 6.14.0-12627-g94d471a4f428 #2 PREEMPT(full) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:drop_nlink+0xac/0xd0 home/cc/linux/fs/inode.c:417 Code: 48 8b 5d 28 be 08 00 00 00 48 8d bb 70 07 00 00 e8 f9 67 e6 ff f0 48 ff 83 70 07 00 00 5b 5d e9 9a 12 82 ff e8 95 12 82 ff 90 &lt;0f&gt; 0b 90 c7 45 48 ff ff ff ff 5b 5d e9 83 12 82 ff e8 fe 5f e6 ff RSP: 0018:ffffc900026b7c28 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8239710f RDX: ffff888041345a00 RSI: ffffffff8239717b RDI: 0000000000000005 RBP: ffff888054509ad0 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000000 R11: ffffffff9ab36f08 R12: ffff88804bb40000 R13: ffff8880545091e0 R14: 0000000000008000 R15: ffff8880545091e0 FS: 000055555d0c5880(0000) GS:ffff8880eb3e3000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f915c55b178 CR3: 0000000050d20000 CR4: 0000000000352ef0 Call Trace: <task> f2fs_i_links_write home/cc/linux/fs/f2fs/f2fs.h:3194 [inline] f2fs_drop_nlink+0xd1/0x3c0 home/cc/linux/fs/f2fs/dir.c:845 f2fs_delete_entry+0x542/0x1450 home/cc/linux/fs/f2fs/dir.c:909 f2fs_unlink+0x45c/0x890 home/cc/linux/fs/f2fs/namei.c:581 vfs_unlink+0x2fb/0x9b0 home/cc/linux/fs/namei.c:4544 do_unlinkat+0x4c5/0x6a0 home/cc/linux/fs/namei.c:4608 __do_sys_unlink home/cc/linux/fs/namei.c:4654 [inline] __se_sys_unlink home/cc/linux/fs/namei.c:4652 [inline] __x64_sys_unlink+0xc5/0x110 home/cc/linux/fs/namei.c:4652 do_syscall_x64 home/cc/linux/arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xc7/0x250 home/cc/linux/arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fb3d092324b Code: 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 57 00 00 00 0f 05 &lt;48&gt; 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffdc232d938 EFLAGS: 00000206 ORIG_RAX: 0000000000000057 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb3d092324b RDX: 00007ffdc232d960 RSI: 00007ffdc232d960 RDI: 00007ffdc232d9f0 RBP: 00007ffdc232d9f0 R08: 0000000000000001 R09: 00007ffdc232d7c0 R10: 00000000fffffffd R11: 0000000000000206 R12: 00007ffdc232eaf0 R13: 000055555d0cebb0 R14: 00007ffdc232d958 R15: 0000000000000001 </task> Cc: stable@vger.kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27f2fs: fix to do sanity check on ino and xnidChao Yu
commit 061cf3a84bde038708eb0f1d065b31b7c2456533 upstream. syzbot reported a f2fs bug as below: INFO: task syz-executor140:5308 blocked for more than 143 seconds. Not tainted 6.14.0-rc7-syzkaller-00069-g81e4f8d68c66 #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor140 state:D stack:24016 pid:5308 tgid:5308 ppid:5306 task_flags:0x400140 flags:0x00000006 Call Trace: <TASK> context_switch kernel/sched/core.c:5378 [inline] __schedule+0x190e/0x4c90 kernel/sched/core.c:6765 __schedule_loop kernel/sched/core.c:6842 [inline] schedule+0x14b/0x320 kernel/sched/core.c:6857 io_schedule+0x8d/0x110 kernel/sched/core.c:7690 folio_wait_bit_common+0x839/0xee0 mm/filemap.c:1317 __folio_lock mm/filemap.c:1664 [inline] folio_lock include/linux/pagemap.h:1163 [inline] __filemap_get_folio+0x147/0xb40 mm/filemap.c:1917 pagecache_get_page+0x2c/0x130 mm/folio-compat.c:87 find_get_page_flags include/linux/pagemap.h:842 [inline] f2fs_grab_cache_page+0x2b/0x320 fs/f2fs/f2fs.h:2776 __get_node_page+0x131/0x11b0 fs/f2fs/node.c:1463 read_xattr_block+0xfb/0x190 fs/f2fs/xattr.c:306 lookup_all_xattrs fs/f2fs/xattr.c:355 [inline] f2fs_getxattr+0x676/0xf70 fs/f2fs/xattr.c:533 __f2fs_get_acl+0x52/0x870 fs/f2fs/acl.c:179 f2fs_acl_create fs/f2fs/acl.c:375 [inline] f2fs_init_acl+0xd7/0x9b0 fs/f2fs/acl.c:418 f2fs_init_inode_metadata+0xa0f/0x1050 fs/f2fs/dir.c:539 f2fs_add_inline_entry+0x448/0x860 fs/f2fs/inline.c:666 f2fs_add_dentry+0xba/0x1e0 fs/f2fs/dir.c:765 f2fs_do_add_link+0x28c/0x3a0 fs/f2fs/dir.c:808 f2fs_add_link fs/f2fs/f2fs.h:3616 [inline] f2fs_mknod+0x2e8/0x5b0 fs/f2fs/namei.c:766 vfs_mknod+0x36d/0x3b0 fs/namei.c:4191 unix_bind_bsd net/unix/af_unix.c:1286 [inline] unix_bind+0x563/0xe30 net/unix/af_unix.c:1379 __sys_bind_socket net/socket.c:1817 [inline] __sys_bind+0x1e4/0x290 net/socket.c:1848 __do_sys_bind net/socket.c:1853 [inline] __se_sys_bind net/socket.c:1851 [inline] __x64_sys_bind+0x7a/0x90 net/socket.c:1851 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Let's dump and check metadata of corrupted inode, it shows its xattr_nid is the same to its i_ino. dump.f2fs -i 3 chaseyu.img.raw i_xattr_nid [0x 3 : 3] So that, during mknod in the corrupted directory, it tries to get and lock inode page twice, result in deadlock. - f2fs_mknod - f2fs_add_inline_entry - f2fs_get_inode_page --- lock dir's inode page - f2fs_init_acl - f2fs_acl_create(dir,..) - __f2fs_get_acl - f2fs_getxattr - lookup_all_xattrs - __get_node_page --- try to lock dir's inode page In order to fix this, let's add sanity check on ino and xnid. Cc: stable@vger.kernel.org Reported-by: syzbot+cc448dcdc7ae0b4e4ffa@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/67e06150.050a0220.21942d.0005.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-19f2fs: fix to correct check conditions in f2fs_cross_renameZhiguo Niu
[ Upstream commit 9883494c45a13dc88d27dde4f988c04823b42a2f ] Should be "old_dir" here. Fixes: 5c57132eaf52 ("f2fs: support project quota") Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19f2fs: use d_inode(dentry) cleanup dentry->d_inodeZhiguo Niu
[ Upstream commit a6c397a31f58a1d577c2c8d04b624e9baa31951c ] no logic changes. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19f2fs: fix to detect gcing page in f2fs_is_cp_guaranteed()Chao Yu
[ Upstream commit aa1be8dd64163eca4dde7fd2557eb19927a06a47 ] Jan Prusakowski reported a f2fs bug as below: f2fs/007 will hang kernel during testing w/ below configs: kernel 6.12.18 (from pixel-kernel/android16-6.12) export MKFS_OPTIONS="-O encrypt -O extra_attr -O project_quota -O quota" export F2FS_MOUNT_OPTIONS="test_dummy_encryption,discard,fsync_mode=nobarrier,reserve_root=32768,checkpoint_merge,atgc" cat /proc/<umount_proc_id>/stack f2fs_wait_on_all_pages+0xa3/0x130 do_checkpoint+0x40c/0x5d0 f2fs_write_checkpoint+0x258/0x550 kill_f2fs_super+0x14f/0x190 deactivate_locked_super+0x30/0xb0 cleanup_mnt+0xba/0x150 task_work_run+0x59/0xa0 syscall_exit_to_user_mode+0x12d/0x130 do_syscall_64+0x57/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e cat /sys/kernel/debug/f2fs/status - IO_W (CP: -256, Data: 256, Flush: ( 0 0 1), Discard: ( 0 0)) cmd: 0 undiscard: 0 CP IOs reference count becomes negative. The root cause is: After 4961acdd65c9 ("f2fs: fix to tag gcing flag on page during block migration"), we will tag page w/ gcing flag for raw page of cluster during its migration. However, if the inode is both encrypted and compressed, during ioc_decompress(), it will tag page w/ gcing flag, and it increase F2FS_WB_DATA reference count: - f2fs_write_multi_page - f2fs_write_raw_page - f2fs_write_single_page - do_write_page - f2fs_submit_page_write - WB_DATA_TYPE(bio_page, fio->compressed_page) : bio_page is encrypted, so mapping is NULL, and fio->compressed_page is NULL, it returns F2FS_WB_DATA - inc_page_count(.., F2FS_WB_DATA) Then, during end_io(), it decrease F2FS_WB_CP_DATA reference count: - f2fs_write_end_io - f2fs_compress_write_end_io - fscrypt_pagecache_folio : get raw page from encrypted page - WB_DATA_TYPE(&folio->page, false) : raw page has gcing flag, it returns F2FS_WB_CP_DATA - dec_page_count(.., F2FS_WB_CP_DATA) In order to fix this issue, we need to detect gcing flag in raw page in f2fs_is_cp_guaranteed(). Fixes: 4961acdd65c9 ("f2fs: fix to tag gcing flag on page during block migration") Reported-by: Jan Prusakowski <jprusakowski@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19f2fs: clean up w/ fscrypt_is_bounce_page()Chao Yu
[ Upstream commit 0c708e35cf26449ca317fcbfc274704660b6d269 ] Just cleanup, no logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19f2fs: fix to do sanity check on sbi->total_valid_block_countChao Yu
[ Upstream commit 05872a167c2cab80ef186ef23cc34a6776a1a30c ] syzbot reported a f2fs bug as below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/f2fs.h:2521! RIP: 0010:dec_valid_block_count+0x3b2/0x3c0 fs/f2fs/f2fs.h:2521 Call Trace: f2fs_truncate_data_blocks_range+0xc8c/0x11a0 fs/f2fs/file.c:695 truncate_dnode+0x417/0x740 fs/f2fs/node.c:973 truncate_nodes+0x3ec/0xf50 fs/f2fs/node.c:1014 f2fs_truncate_inode_blocks+0x8e3/0x1370 fs/f2fs/node.c:1197 f2fs_do_truncate_blocks+0x840/0x12b0 fs/f2fs/file.c:810 f2fs_truncate_blocks+0x10d/0x300 fs/f2fs/file.c:838 f2fs_truncate+0x417/0x720 fs/f2fs/file.c:888 f2fs_setattr+0xc4f/0x12f0 fs/f2fs/file.c:1112 notify_change+0xbca/0xe90 fs/attr.c:552 do_truncate+0x222/0x310 fs/open.c:65 handle_truncate fs/namei.c:3466 [inline] do_open fs/namei.c:3849 [inline] path_openat+0x2e4f/0x35d0 fs/namei.c:4004 do_filp_open+0x284/0x4e0 fs/namei.c:4031 do_sys_openat2+0x12b/0x1d0 fs/open.c:1429 do_sys_open fs/open.c:1444 [inline] __do_sys_creat fs/open.c:1522 [inline] __se_sys_creat fs/open.c:1516 [inline] __x64_sys_creat+0x124/0x170 fs/open.c:1516 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/syscall_64.c:94 The reason is: in fuzzed image, sbi->total_valid_block_count is inconsistent w/ mapped blocks indexed by inode, so, we should not trigger panic for such case, instead, let's print log and set fsck flag. Fixes: 39a53e0ce0df ("f2fs: add superblock and major in-memory structure") Reported-by: syzbot+8b376a77b2f364097fbe@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/67f3c0b2.050a0220.396535.0547.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19f2fs: prevent the current section from being selected as a victim during GCyohan.joung
[ Upstream commit d26fecb03e1f1069480d41fa2a6cea87ebbb89b8 ] When selecting a victim using next_victim_seg in a large section, the selected section might already have been cleared and designated as the new current section, making it actively in use. This behavior causes inconsistency between the SIT and SSA. F2FS-fs (dm-54): Inconsistent segment (70961) type [0, 1] in SSA and SIT Call trace: dump_backtrace+0xe8/0x10c show_stack+0x18/0x28 dump_stack_lvl+0x50/0x6c dump_stack+0x18/0x28 f2fs_stop_checkpoint+0x1c/0x3c do_garbage_collect+0x41c/0x271c f2fs_gc+0x27c/0x828 gc_thread_func+0x290/0x88c kthread+0x11c/0x164 ret_from_fork+0x10/0x20 issue scenario segs_per_sec=2 - seg#0 and seg#1 are all dirty - all valid blocks are removed in seg#1 - gc select this sec and next_victim_seg=seg#0 - migrate seg#0, next_victim_seg=seg#1 - checkpoint -> sec(seg#0, seg#1) becomes free - allocator assigns sec(seg#0, seg#1) to curseg - gc tries to migrate seg#1 Fixes: e3080b0120a1 ("f2fs: support subsectional garbage collection") Signed-off-by: yohan.joung <yohan.joung@sk.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19f2fs: clean up unnecessary indentationJaegeuk Kim
[ Upstream commit 05d3273ad03fa5ea1177b4f3dfeeb6de4899b504 ] No functional change. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19f2fs: zone: fix to avoid inconsistence in between SIT and SSAChao Yu
[ Upstream commit 773704c1ef96a8b70d0d186ab725f50548de82c4 ] w/ below testcase, it will cause inconsistence in between SIT and SSA. create_null_blk 512 2 1024 1024 mkfs.f2fs -m /dev/nullb0 mount /dev/nullb0 /mnt/f2fs/ touch /mnt/f2fs/file f2fs_io pinfile set /mnt/f2fs/file fallocate -l 4GiB /mnt/f2fs/file F2FS-fs (nullb0): Inconsistent segment (0) type [1, 0] in SSA and SIT CPU: 5 UID: 0 PID: 2398 Comm: fallocate Tainted: G O 6.13.0-rc1 #84 Tainted: [O]=OOT_MODULE Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 Call Trace: <TASK> dump_stack_lvl+0xb3/0xd0 dump_stack+0x14/0x20 f2fs_handle_critical_error+0x18c/0x220 [f2fs] f2fs_stop_checkpoint+0x38/0x50 [f2fs] do_garbage_collect+0x674/0x6e0 [f2fs] f2fs_gc_range+0x12b/0x230 [f2fs] f2fs_allocate_pinning_section+0x5c/0x150 [f2fs] f2fs_expand_inode_data+0x1cc/0x3c0 [f2fs] f2fs_fallocate+0x3c3/0x410 [f2fs] vfs_fallocate+0x15f/0x4b0 __x64_sys_fallocate+0x4a/0x80 x64_sys_call+0x15e8/0x1b80 do_syscall_64+0x68/0x130 entry_SYSCALL_64_after_hwframe+0x67/0x6f RIP: 0033:0x7f9dba5197ca F2FS-fs (nullb0): Stopped filesystem due to reason: 4 The reason is f2fs_gc_range() may try to migrate block in curseg, however, its SSA block is not uptodate due to the last summary block data is still in cache of curseg. In this patch, we add a condition in f2fs_gc_range() to check whether section is opened or not, and skip block migration for opened section. Fixes: 9703d69d9d15 ("f2fs: support file pinning for zoned devices") Reviewed-by: Daeho Jeong <daehojeong@google.com> Cc: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-10f2fs: fix to avoid accessing uninitialized cursegChao Yu
commit 986c50f6bca109c6cf362b4e2babcb85aba958f6 upstream. syzbot reports a f2fs bug as below: F2FS-fs (loop3): Stopped filesystem due to reason: 7 kworker/u8:7: attempt to access beyond end of device BUG: unable to handle page fault for address: ffffed1604ea3dfa RIP: 0010:get_ckpt_valid_blocks fs/f2fs/segment.h:361 [inline] RIP: 0010:has_curseg_enough_space fs/f2fs/segment.h:570 [inline] RIP: 0010:__get_secs_required fs/f2fs/segment.h:620 [inline] RIP: 0010:has_not_enough_free_secs fs/f2fs/segment.h:633 [inline] RIP: 0010:has_enough_free_secs+0x575/0x1660 fs/f2fs/segment.h:649 <TASK> f2fs_is_checkpoint_ready fs/f2fs/segment.h:671 [inline] f2fs_write_inode+0x425/0x540 fs/f2fs/inode.c:791 write_inode fs/fs-writeback.c:1525 [inline] __writeback_single_inode+0x708/0x10d0 fs/fs-writeback.c:1745 writeback_sb_inodes+0x820/0x1360 fs/fs-writeback.c:1976 wb_writeback+0x413/0xb80 fs/fs-writeback.c:2156 wb_do_writeback fs/fs-writeback.c:2303 [inline] wb_workfn+0x410/0x1080 fs/fs-writeback.c:2343 process_one_work kernel/workqueue.c:3236 [inline] process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317 worker_thread+0x870/0xd30 kernel/workqueue.c:3398 kthread+0x7a9/0x920 kernel/kthread.c:464 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Commit 8b10d3653735 ("f2fs: introduce FAULT_NO_SEGMENT") allows to trigger no free segment fault in allocator, then it will update curseg->segno to NULL_SEGNO, though, CP_ERROR_FLAG has been set, f2fs_write_inode() missed to check the flag, and access invalid curseg->segno directly in below call path, then resulting in panic: - f2fs_write_inode - f2fs_is_checkpoint_ready - has_enough_free_secs - has_not_enough_free_secs - __get_secs_required - has_curseg_enough_space - get_ckpt_valid_blocks : access invalid curseg->segno To avoid this issue, let's: - check CP_ERROR_FLAG flag in prior to f2fs_is_checkpoint_ready() in f2fs_write_inode(). - in has_curseg_enough_space(), save curseg->segno into a temp variable, and verify its validation before use. Fixes: 8b10d3653735 ("f2fs: introduce FAULT_NO_SEGMENT") Reported-by: syzbot+b6b347b7a4ea1b2e29b6@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67973c2b.050a0220.11b1bb.0089.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Alva Lan <alvalan9@foxmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-29f2fs: introduce f2fs_base_attr for global sysfs entriesJaegeuk Kim
[ Upstream commit 21925ede449e038ed6f9efdfe0e79f15bddc34bc ] In /sys/fs/f2fs/features, there's no f2fs_sb_info, so let's avoid to get the pointer. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20f2fs: fix to avoid atomicity corruption of atomic fileYeongjin Gil
commit f098aeba04c9328571567dca45159358a250240c upstream. In the case of the following call stack for an atomic file, FI_DIRTY_INODE is set, but FI_ATOMIC_DIRTIED is not subsequently set. f2fs_file_write_iter f2fs_map_blocks f2fs_reserve_new_blocks inc_valid_block_count __mark_inode_dirty(dquot) f2fs_dirty_inode If FI_ATOMIC_DIRTIED is not set, atomic file can encounter corruption due to a mismatch between old file size and new data. To resolve this issue, I changed to set FI_ATOMIC_DIRTIED when FI_DIRTY_INODE is set. This ensures that FI_DIRTY_INODE, which was previously cleared by the Writeback thread during the commit atomic, is set and i_size is updated. Cc: <stable@vger.kernel.org> Fixes: fccaa81de87e ("f2fs: prevent atomic file from being dirtied before commit") Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com> Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com> Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-20Revert "f2fs: rebuild nat_bits during umount"Chao Yu
[ Upstream commit 19426c4988aa85298c1b4caf2889d37ec5c80fea ] This reverts commit 94c821fb286b545d37549ff30a0c341e066f0d6c. It reports that there is potential corruption in node footer, the most suspious feature is nat_bits, let's revert recovery related code. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20f2fs: fix to avoid out-of-bounds access in f2fs_truncate_inode_blocks()Chao Yu
[ Upstream commit e6494977bd4a83862118a05f57a8df40256951c0 ] syzbot reports an UBSAN issue as below: ------------[ cut here ]------------ UBSAN: array-index-out-of-bounds in fs/f2fs/node.h:381:10 index 18446744073709550692 is out of range for type '__le32[5]' (aka 'unsigned int[5]') CPU: 0 UID: 0 PID: 5318 Comm: syz.0.0 Not tainted 6.14.0-rc3-syzkaller-00060-g6537cfb395f3 #0 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 ubsan_epilogue lib/ubsan.c:231 [inline] __ubsan_handle_out_of_bounds+0x121/0x150 lib/ubsan.c:429 get_nid fs/f2fs/node.h:381 [inline] f2fs_truncate_inode_blocks+0xa5e/0xf60 fs/f2fs/node.c:1181 f2fs_do_truncate_blocks+0x782/0x1030 fs/f2fs/file.c:808 f2fs_truncate_blocks+0x10d/0x300 fs/f2fs/file.c:836 f2fs_truncate+0x417/0x720 fs/f2fs/file.c:886 f2fs_file_write_iter+0x1bdb/0x2550 fs/f2fs/file.c:5093 aio_write+0x56b/0x7c0 fs/aio.c:1633 io_submit_one+0x8a7/0x18a0 fs/aio.c:2052 __do_sys_io_submit fs/aio.c:2111 [inline] __se_sys_io_submit+0x171/0x2e0 fs/aio.c:2081 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f238798cde9 index 18446744073709550692 (decimal, unsigned long long) = 0xfffffffffffffc64 (hexadecimal, unsigned long long) = -924 (decimal, long long) In f2fs_truncate_inode_blocks(), UBSAN detects that get_nid() tries to access .i_nid[-924], it means both offset[0] and level should zero. The possible case should be in f2fs_do_truncate_blocks(), we try to truncate inode size to zero, however, dn.ofs_in_node is zero and dn.node_page is not an inode page, so it fails to truncate inode page, and then pass zeroed free_from to f2fs_truncate_inode_blocks(), result in this issue. if (dn.ofs_in_node || IS_INODE(dn.node_page)) { f2fs_truncate_data_blocks_range(&dn, count); free_from += count; } I guess the reason why dn.node_page is not an inode page could be: there are multiple nat entries share the same node block address, once the node block address was reused, f2fs_get_node_page() may load a non-inode block. Let's add a sanity check for such condition to avoid out-of-bounds access issue. Reported-by: syzbot+6653f10281a1badc749e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/66fdcdf3.050a0220.40bef.0025.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20f2fs: don't retry IO for corrupted data scenarioChao Yu
[ Upstream commit 1534747d3170646ddeb9ea5f7caaac90359707cf ] F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942] F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942] F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942] F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942] F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942] F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942] F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942] F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942] F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942] F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942] If node block is loaded successfully, but its content is inconsistent, it doesn't need to retry IO. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08f2fs: Introduce linear search for dentriesDaniel Lee
commit 91b587ba79e1b68bb718d12b0758dbcdab4e9cb7 upstream. This patch addresses an issue where some files in case-insensitive directories become inaccessible due to changes in how the kernel function, utf8_casefold(), generates case-folded strings from the commit 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points"). F2FS uses these case-folded names to calculate hash values for locating dentries and stores them on disk. Since utf8_casefold() can produce different output across kernel versions, stored hash values and newly calculated hash values may differ. This results in affected files no longer being found via the hash-based lookup. To resolve this, the patch introduces a linear search fallback. If the initial hash-based search fails, F2FS will sequentially scan the directory entries. Fixes: 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points") Link: https://bugzilla.kernel.org/show_bug.cgi?id=219586 Signed-off-by: Daniel Lee <chullee@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Daniel Rosenberg <drosen@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-17fs/writeback: convert wbc_account_cgroup_owner to take a folioPankaj Raghav
[ Upstream commit 30dac24e14b52e1787572d1d4e06eeabe8a63630 ] Most of the callers of wbc_account_cgroup_owner() are converting a folio to page before calling the function. wbc_account_cgroup_owner() is converting the page back to a folio to call mem_cgroup_css_from_folio(). Convert wbc_account_cgroup_owner() to take a folio instead of a page, and convert all callers to pass a folio directly except f2fs. Convert the page to folio for all the callers from f2fs as they were the only callers calling wbc_account_cgroup_owner() with a page. As f2fs is already in the process of converting to folios, these call sites might also soon be calling wbc_account_cgroup_owner() with a folio directly in the future. No functional changes. Only compile tested. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240926140121.203821-1-kernel@pankajraghav.com Acked-by: David Sterba <dsterba@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Stable-dep-of: 51d20d1dacbe ("iomap: fix zero padding data issue in concurrent append writes") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14f2fs: add a sysfs node to limit max read extent count per-inodeChao Yu
[ Upstream commit 009a8241a8e5a14ea2dd0b8db42dbf283527dd44 ] Quoted: "at this time, there are still 1086911 extent nodes in this zombie extent tree that need to be cleaned up. crash_arm64_sprd_v8.0.3++> extent_tree.node_cnt ffffff80896cc500 node_cnt = { counter = 1086911 }, " As reported by Xiuhong, there will be a huge number of extent nodes in extent tree, it may potentially cause: - slab memory fragments - extreme long time shrink on extent tree - low mapping efficiency Let's add a sysfs node to limit max read extent count for each inode, by default, value of this threshold is 10240, it can be updated according to user's requirement. Reported-by: Xiuhong Wang <xiuhong.wang@unisoc.com> Closes: https://lore.kernel.org/linux-f2fs-devel/20241112110627.1314632-1-xiuhong.wang@unisoc.com/ Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14f2fs: fix to shrink read extent node in batchesChao Yu
[ Upstream commit 3fc5d5a182f6a1f8bd4dc775feb54c369dd2c343 ] We use rwlock to protect core structure data of extent tree during its shrink, however, if there is a huge number of extent nodes in extent tree, during shrink of extent tree, it may hold rwlock for a very long time, which may trigger kernel hang issue. This patch fixes to shrink read extent node in batches, so that, critical region of the rwlock can be shrunk to avoid its extreme long time hold. Reported-by: Xiuhong Wang <xiuhong.wang@unisoc.com> Closes: https://lore.kernel.org/linux-f2fs-devel/20241112110627.1314632-1-xiuhong.wang@unisoc.com/ Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14f2fs: print message if fscorrupted was found in f2fs_new_node_page()Chao Yu
[ Upstream commit 81520c684ca67aea6a589461a3caebb9b11dcc90 ] If fs corruption occurs in f2fs_new_node_page(), let's print more information about corrupted metadata into kernel log. Meanwhile, it updates to record ERROR_INCONSISTENT_NAT instead of ERROR_INVALID_BLKADDR if blkaddr in nat entry is not NULL_ADDR which means nat bitmap and nat entry is inconsistent. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14f2fs: fix f2fs_bug_on when uninstalling filesystem call f2fs_evict_inode.Qi Han
[ Upstream commit d5c367ef8287fb4d235c46a2f8c8d68715f3a0ca ] creating a large files during checkpoint disable until it runs out of space and then delete it, then remount to enable checkpoint again, and then unmount the filesystem triggers the f2fs_bug_on as below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/inode.c:896! CPU: 2 UID: 0 PID: 1286 Comm: umount Not tainted 6.11.0-rc7-dirty #360 Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:f2fs_evict_inode+0x58c/0x610 Call Trace: __die_body+0x15/0x60 die+0x33/0x50 do_trap+0x10a/0x120 f2fs_evict_inode+0x58c/0x610 do_error_trap+0x60/0x80 f2fs_evict_inode+0x58c/0x610 exc_invalid_op+0x53/0x60 f2fs_evict_inode+0x58c/0x610 asm_exc_invalid_op+0x16/0x20 f2fs_evict_inode+0x58c/0x610 evict+0x101/0x260 dispose_list+0x30/0x50 evict_inodes+0x140/0x190 generic_shutdown_super+0x2f/0x150 kill_block_super+0x11/0x40 kill_f2fs_super+0x7d/0x140 deactivate_locked_super+0x2a/0x70 cleanup_mnt+0xb3/0x140 task_work_run+0x61/0x90 The root cause is: creating large files during disable checkpoint period results in not enough free segments, so when writing back root inode will failed in f2fs_enable_checkpoint. When umount the file system after enabling checkpoint, the root inode is dirty in f2fs_evict_inode function, which triggers BUG_ON. The steps to reproduce are as follows: dd if=/dev/zero of=f2fs.img bs=1M count=55 mount f2fs.img f2fs_dir -o checkpoint=disable:10% dd if=/dev/zero of=big bs=1M count=50 sync rm big mount -o remount,checkpoint=enable f2fs_dir umount f2fs_dir Let's redirty inode when there is not free segments during checkpoint is disable. Signed-off-by: Qi Han <hanqi@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14f2fs: fix to requery extent which cross boundary of inquiryChao Yu
[ Upstream commit 6787a82245857271133b63ae7f72f1dc9f29e985 ] dd if=/dev/zero of=file bs=4k count=5 xfs_io file -c "fiemap -v 2 16384" file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..31]: 139272..139303 32 0x1000 1: [32..39]: 139304..139311 8 0x1001 xfs_io file -c "fiemap -v 0 16384" file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..31]: 139272..139303 32 0x1000 xfs_io file -c "fiemap -v 0 16385" file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..39]: 139272..139311 40 0x1001 There are two problems: - continuous extent is split to two - FIEMAP_EXTENT_LAST is missing in last extent The root cause is: if upper boundary of inquiry crosses extent, f2fs_map_blocks() will truncate length of returned extent to F2FS_BYTES_TO_BLK(len), and also, it will stop to query latter extent or hole to make sure current extent is last or not. In order to fix this issue, once we found an extent locates in the end of inquiry range by f2fs_map_blocks(), we need to expand inquiry range to requiry. Cc: stable@vger.kernel.org Fixes: 7f63eb77af7b ("f2fs: report unwritten area in f2fs_fiemap") Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14f2fs: fix to adjust appropriate length for fiemapZhiguo Niu
[ Upstream commit 77569f785c8624fa4189795fb52e635a973672e5 ] If user give a file size as "length" parameter for fiemap operations, but if this size is non-block size aligned, it will show 2 segments fiemap results even this whole file is contiguous on disk, such as the following results: ./f2fs_io fiemap 0 19034 ylog/analyzer.py Fiemap: offset = 0 len = 19034 logical addr. physical addr. length flags 0 0000000000000000 0000000020baa000 0000000000004000 00001000 1 0000000000004000 0000000020bae000 0000000000001000 00001001 after this patch: ./f2fs_io fiemap 0 19034 ylog/analyzer.py Fiemap: offset = 0 len = 19034 logical addr. physical addr. length flags 0 0000000000000000 00000000315f3000 0000000000005000 00001001 Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Stable-dep-of: 6787a8224585 ("f2fs: fix to requery extent which cross boundary of inquiry") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK}Chao Yu
[ Upstream commit 7461f37094180200cb2f98e60ef99a0cea97beec ] f2fs doesn't support different blksize in one instance, so bytes_to_blks() and blks_to_bytes() are equal to F2FS_BYTES_TO_BLK and F2FS_BLK_TO_BYTES, let's use F2FS_BYTES_TO_BLK/F2FS_BLK_TO_BYTES instead for cleanup. Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Stable-dep-of: 6787a8224585 ("f2fs: fix to requery extent which cross boundary of inquiry") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09f2fs: fix to drop all discards after creating snapshot on lvm deviceChao Yu
commit bc8aeb04fd80cb8cfae3058445c84410fd0beb5e upstream. Piergiorgio reported a bug in bugzilla as below: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 969 at fs/f2fs/segment.c:1330 RIP: 0010:__submit_discard_cmd+0x27d/0x400 [f2fs] Call Trace: __issue_discard_cmd+0x1ca/0x350 [f2fs] issue_discard_thread+0x191/0x480 [f2fs] kthread+0xcf/0x100 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x1a/0x30 w/ below testcase, it can reproduce this bug quickly: - pvcreate /dev/vdb - vgcreate myvg1 /dev/vdb - lvcreate -L 1024m -n mylv1 myvg1 - mount /dev/myvg1/mylv1 /mnt/f2fs - dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=20 - sync - rm /mnt/f2fs/file - sync - lvcreate -L 1024m -s -n mylv1-snapshot /dev/myvg1/mylv1 - umount /mnt/f2fs The root cause is: it will update discard_max_bytes of mounted lvm device to zero after creating snapshot on this lvm device, then, __submit_discard_cmd() will pass parameter @nr_sects w/ zero value to __blkdev_issue_discard(), it returns a NULL bio pointer, result in panic. This patch changes as below for fixing: 1. Let's drop all remained discards in f2fs_unfreeze() if snapshot of lvm device is created. 2. Checking discard_max_bytes before submitting discard during __submit_discard_cmd(). Cc: stable@vger.kernel.org Fixes: 35ec7d574884 ("f2fs: split discard command in prior to block layer") Reported-by: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219484 Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-05f2fs: fix to do sanity check on node blkaddr in truncate_node()Chao Yu
commit 6babe00ccd34fc65b78ef8b99754e32b4385f23d upstream. syzbot reports a f2fs bug as below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/segment.c:2534! RIP: 0010:f2fs_invalidate_blocks+0x35f/0x370 fs/f2fs/segment.c:2534 Call Trace: truncate_node+0x1ae/0x8c0 fs/f2fs/node.c:909 f2fs_remove_inode_page+0x5c2/0x870 fs/f2fs/node.c:1288 f2fs_evict_inode+0x879/0x15c0 fs/f2fs/inode.c:856 evict+0x4e8/0x9b0 fs/inode.c:723 f2fs_handle_failed_inode+0x271/0x2e0 fs/f2fs/inode.c:986 f2fs_create+0x357/0x530 fs/f2fs/namei.c:394 lookup_open fs/namei.c:3595 [inline] open_last_lookups fs/namei.c:3694 [inline] path_openat+0x1c03/0x3590 fs/namei.c:3930 do_filp_open+0x235/0x490 fs/namei.c:3960 do_sys_openat2+0x13e/0x1d0 fs/open.c:1415 do_sys_open fs/open.c:1430 [inline] __do_sys_openat fs/open.c:1446 [inline] __se_sys_openat fs/open.c:1441 [inline] __x64_sys_openat+0x247/0x2a0 fs/open.c:1441 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0010:f2fs_invalidate_blocks+0x35f/0x370 fs/f2fs/segment.c:2534 The root cause is: on a fuzzed image, blkaddr in nat entry may be corrupted, then it will cause system panic when using it in f2fs_invalidate_blocks(), to avoid this, let's add sanity check on nat blkaddr in truncate_node(). Reported-by: syzbot+33379ce4ac76acf7d0c7@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/0000000000009a6cd706224ca720@google.com/ Cc: stable@vger.kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-05Revert "f2fs: remove unreachable lazytime mount option parsing"Jaegeuk Kim
commit acff9409dd40beaca2bd982678d222e2740ad84b upstream. This reverts commit 54f43a10fa257ad4af02a1d157fefef6ebcfa7dc. The above commit broke the lazytime mount, given mount("/dev/vdb", "/mnt/test", "f2fs", 0, "lazytime"); CC: stable@vger.kernel.org # 6.11+ Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-05f2fs: fix fiemap failure issue when page size is 16KBXiuhong Wang
commit a7a7c1d423a6351a6541e95c797da5358e5ad1ea upstream. After enable 16K page size, an infinite loop may occur in fiemap (fm_length=UINT64_MAX) on a file, such as the 16KB scratch.img during the remount operation in Android. The condition for whether fiemap continues to map is to check whether the number of bytes corresponding to the next map.m_lblk exceeds blks_to_bytes(inode,max_inode_blocks(inode)) if there are HOLE. The latter does not take into account the maximum size of a file with 16KB page size, so the loop cannot be jumped out. The following is the fail trace: When f2fs_map_blocks reaches map.m_lblk=3936, it needs to go to the first direct node block, so the map is 3936 + 4090 = 8026, The next map is the second direct node block, that is, 8026 + 4090 = 12116, The next map is the first indirect node block, that is, 12116 + 4090 * 4090 = 16740216, The next map is the second indirect node block, that is, 16740216 + 4090 * 4090 = 33468316, The next map is the first double indirect node block, that is, 33468316 + 4090 * 4090 * 4090 = 68451397316 Since map.m_lblk represents the address of a block, which is 32 bits, truncation will occur, that is, 68451397316 becomes 4026887876, and the number of bytes corresponding to the block number does not exceed blks_to_bytes(inode,max_inode_blocks(inode)), so the loop will not be jumped out. The next time, it will be considered that it should still be a double indirect node block, that is, 4026887876 + 4090 * 4090 * 4090 = 72444816876, which will be truncated to 3725340140, and the loop will not be jumped out. 156.374871: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 0, start blkaddr = 0x8e00, len = 0x200, flags = 2,seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0 156.374916: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 512, start blkaddr = 0x0, len = 0x0, flags = 0 , seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0 156.374920: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 513, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0 ...... 156.385747: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 3935, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0 156.385752: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 3936, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0 156.385755: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 8026, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0 156.385758: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 12116, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0 156.385761: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 16740216, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0 156.385764: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 33468316, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0 156.385767: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 4026887876, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0 156.385770: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 3725340140, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0 156.385772: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 4026887876, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0 156.385775: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 3725340140, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0 Commit a6a010f5def5 ("f2fs: Restrict max filesize for 16K f2fs") has set the maximum allowed file size to (U32_MAX + 1) * F2FS_BLKSIZE, so max_file_blocks should be used here to limit it, that is, maxbytes defined above. And the max_inode_blocks function is not called by other functions except here, so cleanup it. Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Daniel Rosenberg <drosen@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-05f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflowChao Yu
[ Upstream commit 3273d8ad947dea925a65a78ca29e5351c960c801 ] It missed to cast variable to unsigned long long type before bit shift, which will cause overflow, fix it. Fixes: f7ef9b83b583 ("f2fs: introduce macros to convert bytes and blocks in f2fs") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inodeChao Yu
[ Upstream commit 26e6f59d0bbaac76fa3413462d780bd2b5f9f653 ] Jinsu Lee reported a performance regression issue, after commit 5c8764f8679e ("f2fs: fix to force buffered IO on inline_data inode"), we forced direct write to use buffered IO on inline_data inode, it will cause performace regression due to memory copy and data flush. It's fine to not force direct write to use buffered IO, as it can convert inline inode before committing direct write IO. Fixes: 5c8764f8679e ("f2fs: fix to force buffered IO on inline_data inode") Reported-by: Jinsu Lee <jinsu1.lee@samsung.com> Closes: https://lore.kernel.org/linux-f2fs-devel/af03dd2c-e361-4f80-b2fd-39440766cf6e@kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05f2fs: fix to map blocks correctly for direct writeChao Yu
[ Upstream commit 5dd00ebda337b9295e7027691fa70540da369ff2 ] f2fs_map_blocks() supports to map continuous holes or preallocated address, we should avoid setting F2FS_MAP_MAPPED for these cases only, otherwise, it may fail f2fs_iomap_begin(), and make direct write fallbacking to use buffered IO and flush, result in performance regression. Fixes: 9f0f6bf42714 ("f2fs: support to map continuous holes or preallocated address") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202409122103.e45aa13b-oliver.sang@intel.com Cc: Cyril Hrubis <chrubis@suse.cz> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05f2fs: fix race in concurrent f2fs_stop_gc_threadLong Li
[ Upstream commit 7b0033dbc48340a1c1c3f12448ba17d6587ca092 ] In my test case, concurrent calls to f2fs shutdown report the following stack trace: Oops: general protection fault, probably for non-canonical address 0xc6cfff63bb5513fc: 0000 [#1] PREEMPT SMP PTI CPU: 0 UID: 0 PID: 678 Comm: f2fs_rep_shutdo Not tainted 6.12.0-rc5-next-20241029-g6fb2fa9805c5-dirty #85 Call Trace: <TASK> ? show_regs+0x8b/0xa0 ? __die_body+0x26/0xa0 ? die_addr+0x54/0x90 ? exc_general_protection+0x24b/0x5c0 ? asm_exc_general_protection+0x26/0x30 ? kthread_stop+0x46/0x390 f2fs_stop_gc_thread+0x6c/0x110 f2fs_do_shutdown+0x309/0x3a0 f2fs_ioc_shutdown+0x150/0x1c0 __f2fs_ioctl+0xffd/0x2ac0 f2fs_ioctl+0x76/0xe0 vfs_ioctl+0x23/0x60 __x64_sys_ioctl+0xce/0xf0 x64_sys_call+0x2b1b/0x4540 do_syscall_64+0xa7/0x240 entry_SYSCALL_64_after_hwframe+0x76/0x7e The root cause is a race condition in f2fs_stop_gc_thread() called from different f2fs shutdown paths: [CPU0] [CPU1] ---------------------- ----------------------- f2fs_stop_gc_thread f2fs_stop_gc_thread gc_th = sbi->gc_thread gc_th = sbi->gc_thread kfree(gc_th) sbi->gc_thread = NULL < gc_th != NULL > kthread_stop(gc_th->f2fs_gc_task) //UAF The commit c7f114d864ac ("f2fs: fix to avoid use-after-free in f2fs_stop_gc_thread()") attempted to fix this issue by using a read semaphore to prevent races between shutdown and remount threads, but it fails to prevent all race conditions. Fix it by converting to write lock of s_umount in f2fs_do_shutdown(). Fixes: 7950e9ac638e ("f2fs: stop gc/discard thread after fs shutdown") Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05f2fs: fix to avoid use GC_AT when setting gc_mode as GC_URGENT_LOW or ↵Zhiguo Niu
GC_URGENT_MID [ Upstream commit 296b8cb34e65fa93382cf919be5a056f719c9a26 ] If gc_mode is set to GC_URGENT_LOW or GC_URGENT_MID, cost benefit GC approach should be used, but if ATGC is enabled at the same time, Age-threshold approach will be selected, which can only do amount of GC and it is much less than the numbers of CB approach. some traces: f2fs_gc-254:48-396 [007] ..... 2311600.684028: f2fs_gc_begin: dev = (254,48), gc_type = Background GC, no_background_GC = 0, nr_free_secs = 0, nodes = 1053, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:0 f2fs_gc-254:48-396 [007] ..... 2311600.684527: f2fs_get_victim: dev = (254,48), type = No TYPE, policy = (Background GC, LFS-mode, Age-threshold), victim = 10, cost = 4294364975, ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 44898 f2fs_gc-254:48-396 [007] ..... 2311600.714835: f2fs_gc_end: dev = (254,48), ret = 0, seg_freed = 0, sec_freed = 0, nodes = 1562, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:0 f2fs_gc-254:48-396 [007] ..... 2311600.714843: f2fs_background_gc: dev = (254,48), wait_ms = 50, prefree = 0, free = 44898 f2fs_gc-254:48-396 [007] ..... 2311600.771785: f2fs_gc_begin: dev = (254,48), gc_type = Background GC, no_background_GC = 0, nr_free_secs = 0, nodes = 1562, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg: f2fs_gc-254:48-396 [007] ..... 2311600.772275: f2fs_gc_end: dev = (254,48), ret = -61, seg_freed = 0, sec_freed = 0, nodes = 1562, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:0 Fixes: 0e5e81114de1 ("f2fs: add GC_URGENT_LOW mode in gc_urgent") Fixes: d98af5f45520 ("f2fs: introduce gc_urgent_mid mode") Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05f2fs: fix to avoid potential deadlock in f2fs_record_stop_reason()Chao Yu
[ Upstream commit f10a890308a7cd8794e21f646f09827c6cb4bf5d ] syzbot reports deadlock issue of f2fs as below: ====================================================== WARNING: possible circular locking dependency detected 6.12.0-rc3-syzkaller-00087-gc964ced77262 #0 Not tainted ------------------------------------------------------ kswapd0/79 is trying to acquire lock: ffff888011824088 (&sbi->sb_lock){++++}-{3:3}, at: f2fs_down_write fs/f2fs/f2fs.h:2199 [inline] ffff888011824088 (&sbi->sb_lock){++++}-{3:3}, at: f2fs_record_stop_reason+0x52/0x1d0 fs/f2fs/super.c:4068 but task is already holding lock: ffff88804bd92610 (sb_internal#2){.+.+}-{0:0}, at: f2fs_evict_inode+0x662/0x15c0 fs/f2fs/inode.c:842 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (sb_internal#2){.+.+}-{0:0}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825 percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write include/linux/fs.h:1716 [inline] sb_start_intwrite+0x4d/0x1c0 include/linux/fs.h:1899 f2fs_evict_inode+0x662/0x15c0 fs/f2fs/inode.c:842 evict+0x4e8/0x9b0 fs/inode.c:725 f2fs_evict_inode+0x1a4/0x15c0 fs/f2fs/inode.c:807 evict+0x4e8/0x9b0 fs/inode.c:725 dispose_list fs/inode.c:774 [inline] prune_icache_sb+0x239/0x2f0 fs/inode.c:963 super_cache_scan+0x38c/0x4b0 fs/super.c:223 do_shrink_slab+0x701/0x1160 mm/shrinker.c:435 shrink_slab+0x1093/0x14d0 mm/shrinker.c:662 shrink_one+0x43b/0x850 mm/vmscan.c:4818 shrink_many mm/vmscan.c:4879 [inline] lru_gen_shrink_node mm/vmscan.c:4957 [inline] shrink_node+0x3799/0x3de0 mm/vmscan.c:5937 kswapd_shrink_node mm/vmscan.c:6765 [inline] balance_pgdat mm/vmscan.c:6957 [inline] kswapd+0x1ca3/0x3700 mm/vmscan.c:7226 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 -> #1 (fs_reclaim){+.+.}-{0:0}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825 __fs_reclaim_acquire mm/page_alloc.c:3834 [inline] fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3848 might_alloc include/linux/sched/mm.h:318 [inline] prepare_alloc_pages+0x147/0x5b0 mm/page_alloc.c:4493 __alloc_pages_noprof+0x16f/0x710 mm/page_alloc.c:4722 alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265 alloc_pages_noprof mm/mempolicy.c:2345 [inline] folio_alloc_noprof+0x128/0x180 mm/mempolicy.c:2352 filemap_alloc_folio_noprof+0xdf/0x500 mm/filemap.c:1010 do_read_cache_folio+0x2eb/0x850 mm/filemap.c:3787 read_mapping_folio include/linux/pagemap.h:1011 [inline] f2fs_commit_super+0x3c0/0x7d0 fs/f2fs/super.c:4032 f2fs_record_stop_reason+0x13b/0x1d0 fs/f2fs/super.c:4079 f2fs_handle_critical_error+0x2ac/0x5c0 fs/f2fs/super.c:4174 f2fs_write_inode+0x35f/0x4d0 fs/f2fs/inode.c:785 write_inode fs/fs-writeback.c:1503 [inline] __writeback_single_inode+0x711/0x10d0 fs/fs-writeback.c:1723 writeback_single_inode+0x1f3/0x660 fs/fs-writeback.c:1779 sync_inode_metadata+0xc4/0x120 fs/fs-writeback.c:2849 f2fs_release_file+0xa8/0x100 fs/f2fs/file.c:1941 __fput+0x23f/0x880 fs/file_table.c:431 task_work_run+0x24f/0x310 kernel/task_work.c:228 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop kernel/entry/common.c:114 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x168/0x370 kernel/entry/common.c:218 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&sbi->sb_lock){++++}-{3:3}: check_prev_add kernel/locking/lockdep.c:3161 [inline] check_prevs_add kernel/locking/lockdep.c:3280 [inline] validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904 __lock_acquire+0x1384/0x2050 kernel/locking/lockdep.c:5202 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825 down_write+0x99/0x220 kernel/locking/rwsem.c:1577 f2fs_down_write fs/f2fs/f2fs.h:2199 [inline] f2fs_record_stop_reason+0x52/0x1d0 fs/f2fs/super.c:4068 f2fs_handle_critical_error+0x2ac/0x5c0 fs/f2fs/super.c:4174 f2fs_evict_inode+0xa61/0x15c0 fs/f2fs/inode.c:883 evict+0x4e8/0x9b0 fs/inode.c:725 f2fs_evict_inode+0x1a4/0x15c0 fs/f2fs/inode.c:807 evict+0x4e8/0x9b0 fs/inode.c:725 dispose_list fs/inode.c:774 [inline] prune_icache_sb+0x239/0x2f0 fs/inode.c:963 super_cache_scan+0x38c/0x4b0 fs/super.c:223 do_shrink_slab+0x701/0x1160 mm/shrinker.c:435 shrink_slab+0x1093/0x14d0 mm/shrinker.c:662 shrink_one+0x43b/0x850 mm/vmscan.c:4818 shrink_many mm/vmscan.c:4879 [inline] lru_gen_shrink_node mm/vmscan.c:4957 [inline] shrink_node+0x3799/0x3de0 mm/vmscan.c:5937 kswapd_shrink_node mm/vmscan.c:6765 [inline] balance_pgdat mm/vmscan.c:6957 [inline] kswapd+0x1ca3/0x3700 mm/vmscan.c:7226 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 other info that might help us debug this: Chain exists of: &sbi->sb_lock --> fs_reclaim --> sb_internal#2 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock(sb_internal#2); lock(fs_reclaim); lock(sb_internal#2); lock(&sbi->sb_lock); Root cause is there will be potential deadlock in between below tasks: Thread A Kswapd - f2fs_ioc_commit_atomic_write - mnt_want_write_file -- down_read lock A - balance_pgdat - __fs_reclaim_acquire -- lock B - shrink_node - prune_icache_sb - dispose_list - f2fs_evict_inode - sb_start_intwrite -- down_read lock A - f2fs_do_sync_file - f2fs_write_inode - f2fs_handle_critical_error - f2fs_record_stop_reason - f2fs_commit_super - read_mapping_folio - filemap_alloc_folio_noprof - fs_reclaim_acquire -- lock B Both threads try to acquire read lock of lock A, then its upcoming write lock grabber will trigger deadlock. Let's always create an asynchronous task in f2fs_handle_critical_error() rather than calling f2fs_record_stop_reason() synchronously to avoid this potential deadlock issue. Fixes: b62e71be2110 ("f2fs: support errors=remount-ro|continue|panic mountoption") Reported-by: syzbot+be4a9983e95a5e25c8d3@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/6704d667.050a0220.1e4d62.0081.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05f2fs: Fix not used variable 'index'Zeng Heng
[ Upstream commit 0c3a38a4b442893f8baca72e44a2a27d52d6cc75 ] Fix the following compilation warning: fs/f2fs/data.c:2391:10: warning: variable ‘index’ set but not used [-Wunused-but-set-variable] 2391 | pgoff_t index; Only define and set the variable index when the CONFIG_F2FS_FS_COMPRESSION is enabled. Fixes: db92e6c729d8 ("f2fs: convert f2fs_mpage_readpages() to use folio") Signed-off-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05f2fs: check curseg->inited before write_sum_page in change_cursegYongpeng Yang
[ Upstream commit 43563069e1c1df417d2eed6eca8a22fc6b04691d ] In the __f2fs_init_atgc_curseg->get_atssr_segment calling, curseg->segno is NULL_SEGNO, indicating that there is no summary block that needs to be written. Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection") Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05f2fs: fix the wrong f2fs_bug_on condition in f2fs_do_replace_blockLongPing Wei
[ Upstream commit c3af1f13476ec23fd99c98d060a89be28c1e8871 ] This f2fs_bug_on was introduced by commit 2c1905042c8c ("f2fs: check segment type in __f2fs_replace_block") when there were only 6 curseg types. After commit d0b9e42ab615 ("f2fs: introduce inmem curseg") was introduced, the condition should be changed to checking curseg->seg_type. Fixes: d0b9e42ab615 ("f2fs: introduce inmem curseg") Signed-off-by: LongPing Wei <weilongping@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05f2fs: fix to account dirty data in __get_secs_required()Chao Yu
[ Upstream commit 1acd73edbbfef2c3c5b43cba4006a7797eca7050 ] It will trigger system panic w/ testcase in [1]: ------------[ cut here ]------------ kernel BUG at fs/f2fs/segment.c:2752! RIP: 0010:new_curseg+0xc81/0x2110 Call Trace: f2fs_allocate_data_block+0x1c91/0x4540 do_write_page+0x163/0xdf0 f2fs_outplace_write_data+0x1aa/0x340 f2fs_do_write_data_page+0x797/0x2280 f2fs_write_single_data_page+0x16cd/0x2190 f2fs_write_cache_pages+0x994/0x1c80 f2fs_write_data_pages+0x9cc/0xea0 do_writepages+0x194/0x7a0 filemap_fdatawrite_wbc+0x12b/0x1a0 __filemap_fdatawrite_range+0xbb/0xf0 file_write_and_wait_range+0xa1/0x110 f2fs_do_sync_file+0x26f/0x1c50 f2fs_sync_file+0x12b/0x1d0 vfs_fsync_range+0xfa/0x230 do_fsync+0x3d/0x80 __x64_sys_fsync+0x37/0x50 x64_sys_call+0x1e88/0x20d0 do_syscall_64+0x4b/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e The root cause is if checkpoint_disabling and lfs_mode are both on, it will trigger OPU for all overwritten data, it may cost more free segment than expected, so f2fs must account those data correctly to calculate cosumed free segments later, and return ENOSPC earlier to avoid run out of free segment during block allocation. [1] https://lore.kernel.org/fstests/20241015025106.3203676-1-chao@kernel.org/ Fixes: 4354994f097d ("f2fs: checkpoint disabling") Cc: Daniel Rosenberg <drosen@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05f2fs: fix null-ptr-deref in f2fs_submit_page_bio()Ye Bin
[ Upstream commit b7d0a97b28083084ebdd8e5c6bccd12e6ec18faa ] There's issue as follows when concurrently installing the f2fs.ko module and mounting the f2fs file system: KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027] RIP: 0010:__bio_alloc+0x2fb/0x6c0 [f2fs] Call Trace: <TASK> f2fs_submit_page_bio+0x126/0x8b0 [f2fs] __get_meta_page+0x1d4/0x920 [f2fs] get_checkpoint_version.constprop.0+0x2b/0x3c0 [f2fs] validate_checkpoint+0xac/0x290 [f2fs] f2fs_get_valid_checkpoint+0x207/0x950 [f2fs] f2fs_fill_super+0x1007/0x39b0 [f2fs] mount_bdev+0x183/0x250 legacy_get_tree+0xf4/0x1e0 vfs_get_tree+0x88/0x340 do_new_mount+0x283/0x5e0 path_mount+0x2b2/0x15b0 __x64_sys_mount+0x1fe/0x270 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Above issue happens as the biset of the f2fs file system is not initialized before register "f2fs_fs_type". To address above issue just register "f2fs_fs_type" at the last in init_f2fs_fs(). Ensure that all f2fs file system resources are initialized. Fixes: f543805fcd60 ("f2fs: introduce private bioset") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05f2fs: compress: fix inconsistent update of i_blocks in ↵Qi Han
release_compress_blocks and reserve_compress_blocks [ Upstream commit 26413ce18e85de3dda2cd3d72c3c3e8ab8f4f996 ] After release a file and subsequently reserve it, the FSCK flag is set when the file is deleted, as shown in the following backtrace: F2FS-fs (dm-48): Inconsistent i_blocks, ino:401231, iblocks:1448, sectors:1472 fs_rec_info_write_type+0x58/0x274 f2fs_rec_info_write+0x1c/0x2c set_sbi_flag+0x74/0x98 dec_valid_block_count+0x150/0x190 f2fs_truncate_data_blocks_range+0x2d4/0x3cc f2fs_do_truncate_blocks+0x2fc/0x5f0 f2fs_truncate_blocks+0x68/0x100 f2fs_truncate+0x80/0x128 f2fs_evict_inode+0x1a4/0x794 evict+0xd4/0x280 iput+0x238/0x284 do_unlinkat+0x1ac/0x298 __arm64_sys_unlinkat+0x48/0x68 invoke_syscall+0x58/0x11c For clusters of the following type, i_blocks are decremented by 1 and i_compr_blocks are incremented by 7 in release_compress_blocks, while updates to i_blocks and i_compr_blocks are skipped in reserve_compress_blocks. raw node: D D D D D D D D after compress: C D D D D D D D after reserve: C D D D D D D D Let's update i_blocks and i_compr_blocks properly in reserve_compress_blocks. Fixes: eb8fbaa53374 ("f2fs: compress: fix to check unreleased compressed cluster") Signed-off-by: Qi Han <hanqi@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>