summaryrefslogtreecommitdiff
path: root/fs/ext4
AgeCommit message (Collapse)Author
2022-05-11ext4: fix spelling errors in commentsChin Yik Ming
'functoin' and 'entres' should be 'function' and 'entries' respectively Signed-off-by: Chin Yik Ming <yikming2222@gmail.com> Link: https://lore.kernel.org/r/20220402090744.8918-1-yikming2222@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-11ext4: remove unnecessary type castingsYu Zhe
remove unnecessary void* type castings. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Link: https://lore.kernel.org/r/20220401081321.73735-1-yuzhe@nfschina.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-11ext4: fix warning in ext4_handle_inode_extensionYe Bin
We got issue as follows: EXT4-fs error (device loop0) in ext4_reserve_inode_write:5741: Out of memory EXT4-fs error (device loop0): ext4_setattr:5462: inode #13: comm syz-executor.0: mark_inode_dirty error EXT4-fs error (device loop0) in ext4_setattr:5519: Out of memory EXT4-fs error (device loop0): ext4_ind_map_blocks:595: inode #13: comm syz-executor.0: Can't allocate blocks for non-extent mapped inodes with bigalloc ------------[ cut here ]------------ WARNING: CPU: 1 PID: 4361 at fs/ext4/file.c:301 ext4_file_write_iter+0x11c9/0x1220 Modules linked in: CPU: 1 PID: 4361 Comm: syz-executor.0 Not tainted 5.10.0+ #1 RIP: 0010:ext4_file_write_iter+0x11c9/0x1220 RSP: 0018:ffff924d80b27c00 EFLAGS: 00010282 RAX: ffffffff815a3379 RBX: 0000000000000000 RCX: 000000003b000000 RDX: ffff924d81601000 RSI: 00000000000009cc RDI: 00000000000009cd RBP: 000000000000000d R08: ffffffffbc5a2c6b R09: 0000902e0e52a96f R10: ffff902e2b7c1b40 R11: ffff902e2b7c1b40 R12: 000000000000000a R13: 0000000000000001 R14: ffff902e0e52aa10 R15: ffffffffffffff8b FS: 00007f81a7f65700(0000) GS:ffff902e3bc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffff600400 CR3: 000000012db88001 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: do_iter_readv_writev+0x2e5/0x360 do_iter_write+0x112/0x4c0 do_pwritev+0x1e5/0x390 __x64_sys_pwritev2+0x7e/0xa0 do_syscall_64+0x37/0x50 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Above issue may happen as follows: Assume inode.i_size=4096 EXT4_I(inode)->i_disksize=4096 step 1: set inode->i_isize = 8192 ext4_setattr if (attr->ia_size != inode->i_size) EXT4_I(inode)->i_disksize = attr->ia_size; rc = ext4_mark_inode_dirty ext4_reserve_inode_write ext4_get_inode_loc __ext4_get_inode_loc sb_getblk --> return -ENOMEM ... if (!error) ->will not update i_size i_size_write(inode, attr->ia_size); Now: inode.i_size=4096 EXT4_I(inode)->i_disksize=8192 step 2: Direct write 4096 bytes ext4_file_write_iter ext4_dio_write_iter iomap_dio_rw ->return error if (extend) ext4_handle_inode_extension WARN_ON_ONCE(i_size_read(inode) < EXT4_I(inode)->i_disksize); ->Then trigger warning. To solve above issue, if mark inode dirty failed in ext4_setattr just set 'EXT4_I(inode)->i_disksize' with old value. Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20220326065351.761952-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2022-05-11ext4: get rid of unused DEFAULT_MB_OPTIMIZE_SCANOjaswin Mujoo
After recent changes to the mb_optimize_scan mount option the DEFAULT_MB_OPTIMIZE_SCAN is no longer needed so get rid of it. Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20220315114454.104182-1-ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-09fs: Change try_to_free_buffers() to take a folioMatthew Wilcox (Oracle)
All but two of the callers already have a folio; pass a folio into try_to_free_buffers(). This removes the last user of cancel_dirty_page() so remove that wrapper function too. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-05-09jbd2: Convert jbd2_journal_try_to_free_buffers to take a folioMatthew Wilcox (Oracle)
Also convert it to return a bool since it's called from release_folio(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-05-09ext4: Convert to release_folioMatthew Wilcox (Oracle)
The use of folios should be pushed deeper into ext4 from here. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-05-09ext4: Convert ext4 to read_folioMatthew Wilcox (Oracle)
This is a "weak" conversion which converts straight back to using pages. A full conversion should be performed at some point, hopefully by someone familiar with the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09fs: Convert block_read_full_page() to block_read_full_folio()Matthew Wilcox (Oracle)
This function is NOT converted to handle large folios, so include an assert that the filesystem isn't passing one in. Otherwise, use the folio functions instead of the page functions, where they exist. Convert all filesystems which use block_read_full_page(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-08ext4: Call aops write_begin() and write_end() directlyMatthew Wilcox (Oracle)
pagecache_write_begin() and pagecache_write_end() are now trivial wrappers, so call the aops directly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-05-08fs: Remove flags parameter from aops->write_beginMatthew Wilcox (Oracle)
There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-05-08fs: Remove aop flags parameter from grab_cache_page_write_begin()Matthew Wilcox (Oracle)
There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-05-08ext4: Use scoped memory APIs in ext4_write_begin()Matthew Wilcox (Oracle)
Instead of setting AOP_FLAG_NOFS, use memalloc_nofs_save() and memalloc_nofs_restore() to prevent GFP_FS allocations recursing into the filesystem with a journal already started. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Theodore Ts'o <tytso@mit.edu>
2022-05-08ext4: Use scoped memory APIs in ext4_da_write_begin()Matthew Wilcox (Oracle)
Instead of setting AOP_FLAG_NOFS, use memalloc_nofs_save() and memalloc_nofs_restore() to prevent GFP_FS allocations recursing into the filesystem with a journal already started. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Theodore Ts'o <tytso@mit.edu>
2022-05-08ext4: Use scoped memory API in mext_page_double_lock()Matthew Wilcox (Oracle)
Replace use of AOP_FLAG_NOFS with calls to memalloc_nofs_save() and memalloc_nofs_restore(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Theodore Ts'o <tytso@mit.edu>
2022-05-08ext4: Allow GFP_FS allocations in ext4_da_convert_inline_data_to_extent()Matthew Wilcox (Oracle)
Since commit 8bc1379b82b8, the transaction is stopped before calling ext4_da_convert_inline_data_to_extent(), which means we can do GFP_FS allocations and recurse into the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Theodore Ts'o <tytso@mit.edu>
2022-05-08ext4: Use page_symlink() instead of __page_symlink()Matthew Wilcox (Oracle)
By using the memalloc_nofs_save() functionality, we can call page_symlink(), safe in the knowledge that it won't recurse into the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-04-22Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix some syzbot-detected bugs, as well as other bugs found by I/O injection testing. Change ext4's fallocate to consistently drop set[ug]id bits when an fallocate operation might possibly change the user-visible contents of a file. Also, improve handling of potentially invalid values in the the s_overhead_cluster superblock field to avoid ext4 returning a negative number of free blocks" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: fix a potential race while discarding reserved buffers after an abort ext4: update the cached overhead value in the superblock ext4: force overhead calculation if the s_overhead_cluster makes no sense ext4: fix overhead calculation to account for the reserved gdt blocks ext4, doc: fix incorrect h_reserved size ext4: limit length to bitmap_maxbytes - blocksize in punch_hole ext4: fix use-after-free in ext4_search_dir ext4: fix bug_on in start_this_handle during umount filesystem ext4: fix symlink file size not match to file content ext4: fix fallocate to use file_modified to update permissions consistently
2022-04-17block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARDChristoph Hellwig
Secure erase is a very different operation from discard in that it is a data integrity operation vs hint. Fully split the limits and helper infrastructure to make the separation more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nifs2] Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> [f2fs] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Chao Yu <chao@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-27-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: add a bdev_discard_granularity helperChristoph Hellwig
Abstract away implementation details from file systems by providing a block_device based helper to retrieve the discard granularity. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Link: https://lore.kernel.org/r/20220415045258.199825-26-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: remove QUEUE_FLAG_DISCARDChristoph Hellwig
Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only places where needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: add a bdev_nonrot helperChristoph Hellwig
Add a helper to check the nonrot flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-14ext4: update the cached overhead value in the superblockTheodore Ts'o
If we (re-)calculate the file system overhead amount and it's different from the on-disk s_overhead_clusters value, update the on-disk version since this can take potentially quite a while on bigalloc file systems. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2022-04-14ext4: force overhead calculation if the s_overhead_cluster makes no senseTheodore Ts'o
If the file system does not use bigalloc, calculating the overhead is cheap, so force the recalculation of the overhead so we don't have to trust the precalculated overhead in the superblock. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2022-04-14ext4: fix overhead calculation to account for the reserved gdt blocksTheodore Ts'o
The kernel calculation was underestimating the overhead by not taking into account the reserved gdt blocks. With this change, the overhead calculated by the kernel matches the overhead calculation in mke2fs. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2022-04-12ext4: limit length to bitmap_maxbytes - blocksize in punch_holeTadeusz Struk
Syzbot found an issue [1] in ext4_fallocate(). The C reproducer [2] calls fallocate(), passing size 0xffeffeff000ul, and offset 0x1000000ul, which, when added together exceed the bitmap_maxbytes for the inode. This triggers a BUG in ext4_ind_remove_space(). According to the comments in this function the 'end' parameter needs to be one block after the last block to be removed. In the case when the BUG is triggered it points to the last block. Modify the ext4_punch_hole() function and add constraint that caps the length to satisfy the one before laster block requirement. LINK: [1] https://syzkaller.appspot.com/bug?id=b80bd9cf348aac724a4f4dff251800106d721331 LINK: [2] https://syzkaller.appspot.com/text?tag=ReproC&x=14ba0238700000 Fixes: a4bb6b64e39a ("ext4: enable "punch hole" functionality") Reported-by: syzbot+7a806094edd5d07ba029@syzkaller.appspotmail.com Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Link: https://lore.kernel.org/r/20220331200515.153214-1-tadeusz.struk@linaro.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2022-04-12ext4: fix use-after-free in ext4_search_dirYe Bin
We got issue as follows: EXT4-fs (loop0): mounted filesystem without journal. Opts: ,errors=continue ================================================================== BUG: KASAN: use-after-free in ext4_search_dir fs/ext4/namei.c:1394 [inline] BUG: KASAN: use-after-free in search_dirblock fs/ext4/namei.c:1199 [inline] BUG: KASAN: use-after-free in __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553 Read of size 1 at addr ffff8881317c3005 by task syz-executor117/2331 CPU: 1 PID: 2331 Comm: syz-executor117 Not tainted 5.10.0+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:83 [inline] dump_stack+0x144/0x187 lib/dump_stack.c:124 print_address_description+0x7d/0x630 mm/kasan/report.c:387 __kasan_report+0x132/0x190 mm/kasan/report.c:547 kasan_report+0x47/0x60 mm/kasan/report.c:564 ext4_search_dir fs/ext4/namei.c:1394 [inline] search_dirblock fs/ext4/namei.c:1199 [inline] __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553 ext4_lookup_entry fs/ext4/namei.c:1622 [inline] ext4_lookup+0xb8/0x3a0 fs/ext4/namei.c:1690 __lookup_hash+0xc5/0x190 fs/namei.c:1451 do_rmdir+0x19e/0x310 fs/namei.c:3760 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x445e59 Code: 4d c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fff2277fac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000054 RAX: ffffffffffffffda RBX: 0000000000400280 RCX: 0000000000445e59 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200000c0 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002 R10: 00007fff2277f990 R11: 0000000000000246 R12: 0000000000000000 R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000 The buggy address belongs to the page: page:0000000048cd3304 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1317c3 flags: 0x200000000000000() raw: 0200000000000000 ffffea0004526588 ffffea0004528088 0000000000000000 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881317c2f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8881317c2f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8881317c3000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff8881317c3080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8881317c3100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== ext4_search_dir: ... de = (struct ext4_dir_entry_2 *)search_buf; dlimit = search_buf + buf_size; while ((char *) de < dlimit) { ... if ((char *) de + de->name_len <= dlimit && ext4_match(dir, fname, de)) { ... } ... de_len = ext4_rec_len_from_disk(de->rec_len, dir->i_sb->s_blocksize); if (de_len <= 0) return -1; offset += de_len; de = (struct ext4_dir_entry_2 *) ((char *) de + de_len); } Assume: de=0xffff8881317c2fff dlimit=0x0xffff8881317c3000 If read 'de->name_len' which address is 0xffff8881317c3005, obviously is out of range, then will trigger use-after-free. To solve this issue, 'dlimit' must reserve 8 bytes, as we will read 'de->name_len' to judge if '(char *) de + de->name_len' out of range. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220324064816.1209985-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2022-04-12ext4: fix bug_on in start_this_handle during umount filesystemYe Bin
We got issue as follows: ------------[ cut here ]------------ kernel BUG at fs/jbd2/transaction.c:389! invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 9 PID: 131 Comm: kworker/9:1 Not tainted 5.17.0-862.14.0.6.x86_64-00001-g23f87daf7d74-dirty #197 Workqueue: events flush_stashed_error_work RIP: 0010:start_this_handle+0x41c/0x1160 RSP: 0018:ffff888106b47c20 EFLAGS: 00010202 RAX: ffffed10251b8400 RBX: ffff888128dc204c RCX: ffffffffb52972ac RDX: 0000000000000200 RSI: 0000000000000004 RDI: ffff888128dc2050 RBP: 0000000000000039 R08: 0000000000000001 R09: ffffed10251b840a R10: ffff888128dc204f R11: ffffed10251b8409 R12: ffff888116d78000 R13: 0000000000000000 R14: dffffc0000000000 R15: ffff888128dc2000 FS: 0000000000000000(0000) GS:ffff88839d680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000001620068 CR3: 0000000376c0e000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> jbd2__journal_start+0x38a/0x790 jbd2_journal_start+0x19/0x20 flush_stashed_error_work+0x110/0x2b3 process_one_work+0x688/0x1080 worker_thread+0x8b/0xc50 kthread+0x26f/0x310 ret_from_fork+0x22/0x30 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- Above issue may happen as follows: umount read procfs error_work ext4_put_super flush_work(&sbi->s_error_work); ext4_mb_seq_groups_show ext4_mb_load_buddy_gfp ext4_mb_init_group ext4_mb_init_cache ext4_read_block_bitmap_nowait ext4_validate_block_bitmap ext4_error ext4_handle_error schedule_work(&EXT4_SB(sb)->s_error_work); ext4_unregister_sysfs(sb); jbd2_journal_destroy(sbi->s_journal); journal_kill_thread journal->j_flags |= JBD2_UNMOUNT; flush_stashed_error_work jbd2_journal_start start_this_handle BUG_ON(journal->j_flags & JBD2_UNMOUNT); To solve this issue, we call 'ext4_unregister_sysfs() before flushing s_error_work in ext4_put_super(). Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20220322012419.725457-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-04-12ext4: fix symlink file size not match to file contentYe Bin
We got issue as follows: [home]# fsck.ext4 -fn ram0yb e2fsck 1.45.6 (20-Mar-2020) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Symlink /p3/d14/d1a/l3d (inode #3494) is invalid. Clear? no Entry 'l3d' in /p3/d14/d1a (3383) has an incorrect filetype (was 7, should be 0). Fix? no As the symlink file size does not match the file content. If the writeback of the symlink data block failed, ext4_finish_bio() handles the end of IO. However this function fails to mark the buffer with BH_write_io_error and so when unmount does journal checkpoint it cannot detect the writeback error and will cleanup the journal. Thus we've lost the correct data in the journal area. To solve this issue, mark the buffer as BH_write_io_error in ext4_finish_bio(). Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220321144438.201685-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-04-12ext4: fix fallocate to use file_modified to update permissions consistentlyDarrick J. Wong
Since the initial introduction of (posix) fallocate back at the turn of the century, it has been possible to use this syscall to change the user-visible contents of files. This can happen by extending the file size during a preallocation, or through any of the newer modes (punch, zero, collapse, insert range). Because the call can be used to change file contents, we should treat it like we do any other modification to a file -- update the mtime, and drop set[ug]id privileges/capabilities. The VFS function file_modified() does all this for us if pass it a locked inode, so let's make fallocate drop permissions correctly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20220308185043.GA117678@magnolia Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2022-04-01ext4: Correct ext4_journalled_dirty_folio() conversionMatthew Wilcox (Oracle)
This should use the new folio_buffers() instead of page_has_buffers(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01fs: Pass an iocb to generic_perform_write()Matthew Wilcox (Oracle)
We can extract both the file pointer and the pos from the iocb. This simplifies each caller as well as allowing generic_perform_write() to see more of the iocb contents in the future. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01fs: Remove ->readpages address space operationMatthew Wilcox (Oracle)
All filesystems have now been converted to use ->readahead, so remove the ->readpages operation and fix all the comments that used to refer to it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-03-26Merge tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull NVMe write streams removal from Jens Axboe: "This removes the write streams support in NVMe. No vendor ever really shipped working support for this, and they are not interested in supporting it. With the NVMe support gone, we have nothing in the tree that supports this. Remove passing around of the hints. The only discussion point in this patchset imho is the fact that the file specific write hint setting/getting fcntl helpers will now return -1/EINVAL like they did before we supported write hints. No known applications use these functions, I only know of one prototype that I help do for RocksDB, and that's not used. That said, with a change like this, it's always a bit controversial. Alternatively, we could just make them return 0 and pretend it worked. It's placement based hints after all" * tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block: fs: remove fs.f_write_hint fs: remove kiocb.ki_hint block: remove the per-bio/request write hint nvme: remove support or stream based temperature hint
2022-03-24Merge tag 'flexible-array-transformations-5.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull flexible-array transformations from Gustavo Silva: "Treewide patch that replaces zero-length arrays with flexible-array members. This has been baking in linux-next for a whole development cycle" * tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: treewide: Replace zero-length arrays with flexible-array members
2022-03-22Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds
Pull filesystem folio updates from Matthew Wilcox: "Primarily this series converts some of the address_space operations to take a folio instead of a page. Notably: - a_ops->is_partially_uptodate() takes a folio instead of a page and changes the type of the 'from' and 'count' arguments to make it obvious they're bytes. - a_ops->invalidatepage() becomes ->invalidate_folio() and has a similar type change. - a_ops->launder_page() becomes ->launder_folio() - a_ops->set_page_dirty() becomes ->dirty_folio() and adds the address_space as an argument. There are a couple of other misc changes up front that weren't worth separating into their own pull request" * tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache: (53 commits) fs: Remove aops ->set_page_dirty fb_defio: Use noop_dirty_folio() fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio fs: Convert __set_page_dirty_buffers to block_dirty_folio nilfs: Convert nilfs_set_page_dirty() to nilfs_dirty_folio() mm: Convert swap_set_page_dirty() to swap_dirty_folio() ubifs: Convert ubifs_set_page_dirty to ubifs_dirty_folio f2fs: Convert f2fs_set_node_page_dirty to f2fs_dirty_node_folio f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio f2fs: Convert f2fs_set_meta_page_dirty to f2fs_dirty_meta_folio afs: Convert afs_dir_set_page_dirty() to afs_dir_dirty_folio() btrfs: Convert extent_range_redirty_for_io() to use folios fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio btrfs: Convert from set_page_dirty to dirty_folio fscache: Convert fscache_set_page_dirty() to fscache_dirty_folio() fs: Add aops->dirty_folio fs: Remove aops->launder_page orangefs: Convert launder_page to launder_folio nfs: Convert from launder_page to launder_folio fuse: Convert from launder_page to launder_folio ...
2022-03-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs - Most the MM patches which precede the patches in Willy's tree: kasan, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb, userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp, cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap, zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits) mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Docs/ABI/testing: add DAMON sysfs interface ABI document Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface selftests/damon: add a test for DAMON sysfs interface mm/damon/sysfs: support DAMOS stats mm/damon/sysfs: support DAMOS watermarks mm/damon/sysfs: support schemes prioritization mm/damon/sysfs: support DAMOS quotas mm/damon/sysfs: support DAMON-based Operation Schemes mm/damon/sysfs: support the physical address space monitoring mm/damon/sysfs: link DAMON for virtual address spaces monitoring mm/damon: implement a minimal stub for sysfs-based DAMON interface mm/damon/core: add number of each enum type values mm/damon/core: allow non-exclusive DAMON start/stop Docs/damon: update outdated term 'regions update interval' Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Docs/vm/damon: call low level monitoring primitives the operations mm/damon: remove unnecessary CONFIG_DAMON option mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() mm/damon/dbgfs-test: fix is_target_id() change ...
2022-03-22fs: allocate inode by using alloc_inode_sb()Muchun Song
The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() of all filesystems to alloc_inode_sb(). Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4] Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Fix some bugs in converting ext4 to use the new mount API, as well as more bug fixes and clean ups in the ext4 fast_commit feature (most notably, in the tracepoints). In the jbd2 layer, the t_handle_lock spinlock has been removed, with the last place where it was actually needed replaced with an atomic cmpxchg" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (35 commits) ext4: fix kernel doc warnings ext4: fix remaining two trace events to use same printk convention ext4: add commit tid info in ext4_fc_commit_start/stop trace events ext4: add commit_tid info in jbd debug log ext4: add transaction tid info in fc_track events ext4: add new trace event in ext4_fc_cleanup ext4: return early for non-eligible fast_commit track events ext4: do not call FC trace event in ext4_fc_commit() if FS does not support FC ext4: convert ext4_fc_track_dentry type events to use event class ext4: fix ext4_fc_stats trace point ext4: remove unused enum EXT4_FC_COMMIT_FAILED ext4: warn when dirtying page w/o buffers in data=journal mode doc: fixed a typo in ext4 documentation ext4: make mb_optimize_scan performance mount option work with extents ext4: make mb_optimize_scan option work with set/unset mount cmd ext4: don't BUG if someone dirty pages without asking ext4 first ext4: remove redundant assignment to variable split_flag1 ext4: fix underflow in ext4_max_bitmap_size() ext4: fix ext4_mb_clear_bb() kernel-doc comment ext4: fix fs corruption when tring to remove a non-empty directory with IO error ...
2022-03-22Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds
Pull fscrypt updates from Eric Biggers: "Add support for direct I/O on encrypted files when blk-crypto (inline encryption) is being used for file contents encryption" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fscrypt: update documentation for direct I/O support f2fs: support direct I/O with fscrypt using blk-crypto ext4: support direct I/O with fscrypt using blk-crypto iomap: support direct I/O with fscrypt using blk-crypto fscrypt: add functions for direct I/O support
2022-03-21Merge tag 'for-5.18/alloc-cleanups-2022-03-18' of ↵Linus Torvalds
git://git.kernel.dk/linux-block Pull bio_alloc() cleanups from Jens Axboe: "Filesystem cleanups to pass the bio op to bio_alloc() instead of setting it just before bio submission". * tag 'for-5.18/alloc-cleanups-2022-03-18' of git://git.kernel.dk/linux-block: f2fs: pass the bio operation to bio_alloc_bioset f2fs: don't pass a bio to f2fs_target_device nilfs2: pass the operation to bio_alloc ext4: pass the operation to bio_alloc mpage: pass the operation to bio_alloc
2022-03-21Merge tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: - BFQ cleanups and fixes (Yu, Zhang, Yahu, Paolo) - blk-rq-qos completion fix (Tejun) - blk-cgroup merge fix (Tejun) - Add offline error return value to distinguish it from an IO error on the device (Song) - IO stats fixes (Zhang, Christoph) - blkcg refcount fixes (Ming, Yu) - Fix for indefinite dispatch loop softlockup (Shin'ichiro) - blk-mq hardware queue management improvements (Ming) - sbitmap dead code removal (Ming, John) - Plugging merge improvements (me) - Show blk-crypto capabilities in sysfs (Eric) - Multiple delayed queue run improvement (David) - Block throttling fixes (Ming) - Start deprecating auto module loading based on dev_t (Christoph) - bio allocation improvements (Christoph, Chaitanya) - Get rid of bio_devname (Christoph) - bio clone improvements (Christoph) - Block plugging improvements (Christoph) - Get rid of genhd.h header (Christoph) - Ensure drivers use appropriate flush helpers (Christoph) - Refcounting improvements (Christoph) - Queue initialization and teardown improvements (Ming, Christoph) - Misc fixes/improvements (Barry, Chaitanya, Colin, Dan, Jiapeng, Lukas, Nian, Yang, Eric, Chengming) * tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block: (127 commits) block: cancel all throttled bios in del_gendisk() block: let blkcg_gq grab request queue's refcnt block: avoid use-after-free on throttle data block: limit request dispatch loop duration block/bfq-iosched: Fix spelling mistake "tenative" -> "tentative" sr: simplify the local variable initialization in sr_block_open() block: don't merge across cgroup boundaries if blkcg is enabled block: fix rq-qos breakage from skipping rq_qos_done_bio() block: flush plug based on hardware and software queue order block: ensure plug merging checks the correct queue at least once block: move rq_qos_exit() into disk_release() block: do more work in elevator_exit block: move blk_exit_queue into disk_release block: move q_usage_counter release into blk_queue_release block: don't remove hctx debugfs dir from blk_mq_exit_queue block: move blkcg initialization/destroy into disk allocation/release handler sr: implement ->free_disk to simplify refcounting sd: implement ->free_disk to simplify refcounting sd: delay calling free_opal_dev sd: call sd_zbc_release_disk before releasing the scsi_device reference ...
2022-03-16fs: Convert __set_page_dirty_no_writeback to noop_dirty_folioMatthew Wilcox (Oracle)
This is a mechanical change. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-16fs: Convert __set_page_dirty_buffers to block_dirty_folioMatthew Wilcox (Oracle)
Convert all callers; mostly this is just changing the aops to point at it, but a few implementations need a little more work. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-15ext4: fix kernel doc warningsTheodore Ts'o
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-15ext4: add commit tid info in ext4_fc_commit_start/stop trace eventsRitesh Harjani
This adds commit_tid info in ext4_fc_commit_start/stop which is helpful in debugging fast_commit issues. For e.g. issues where due to jbd2 journal full commit, FC miss to commit updates to a file. Also improves TP_prink format string i.e. all ext4 and jbd2 trace events starts with "dev MAjOR,MINOR". Let's follow the same convention while we are still at it. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/ebcd6b9ab5b718db30f90854497886801ce38c63.1647057583.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-15ext4: add commit_tid info in jbd debug logRitesh Harjani
This adds commit_tid argument in ext4_fc_update_stats() so that we can add this information too in jbd_debug logs. This is also required in a later patch to pass the commit_tid info in ext4_fc_commit_start/stop() trace events. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/dabda3f2919a60e01887e798bf5915216b451733.1647057583.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-15ext4: add transaction tid info in fc_track eventsRitesh Harjani
This patch adds the transaction & inode tid info in trace events for callers of ext4_fc_track_template(). This is helpful in debugging race conditions where an inode could belong to two different transaction tids. It also fixes the checkpatch warnings which says use tabs instead of spaces. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/c203c09dc11bb372803c430f621f25a4b8c2c8b4.1647057583.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-15ext4: add new trace event in ext4_fc_cleanupRitesh Harjani
This adds a new trace event in ext4_fc_cleanup() which is helpful in debugging some fast_commit issues. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/794cdb1d5d3622f3f80d30c222ff6652ea68c375.1647057583.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-15ext4: return early for non-eligible fast_commit track eventsRitesh Harjani
Currently ext4_fc_track_template() checks, whether the trace event path belongs to replay or does sb has ineligible set, if yes it simply returns. This patch pulls those checks before calling ext4_fc_track_template() in the callers of ext4_fc_track_template(). [ Add checks to ext4_rename() which calls the __ext4_fc_track_*() functions directly. -- TYT ] Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/3cd025d9c490218a92e6d8fb30b6123e693373e3.1647057583.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>