summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-04-13btrfs: export: handle invalid inode or root reference in btrfs_get_parent()David Sterba
[ Upstream commit 26b66d1d366a375745755ca7365f67110bbf6bd5 ] The get_parent handler looks up a parent of a given dentry, this can be either a subvolume or a directory. The search is set up with offset -1 but it's never expected to find such item, as it would break allowed range of inode number or a root id. This means it's a corruption (ext4 also returns this error code). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks()David Sterba
[ Upstream commit 7411055db5ce64f836aaffd422396af0075fdc99 ] The unhandled case in btrfs_relocate_sys_chunks() loop is a corruption, as it could be caused only by two impossible conditions: - at first the search key is set up to look for a chunk tree item, with offset -1, this is an inexact search and the key->offset will contain the correct offset upon a successful search, a valid chunk tree item cannot have an offset -1 - after first successful search, the found_key corresponds to a chunk item, the offset is decremented by 1 before the next loop, it's impossible to find a chunk item there due to alignment and size constraints Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13fs: add a vfs_fchmod helperChristoph Hellwig
[ Upstream commit 9e96c8c0e94eea2f69a9705f5d0f51928ea26c17 ] Add a helper for struct file based chmode operations. To be used by the initramfs code soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Stable-dep-of: 4624b346cf67 ("init: open /initrd.image with O_LARGEFILE") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13fs: add a vfs_fchown helperChristoph Hellwig
[ Upstream commit c04011fe8cbd80af1be6e12b53193bf3846750d7 ] Add a helper for struct file based chown operations. To be used by the initramfs code soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Stable-dep-of: 4624b346cf67 ("init: open /initrd.image with O_LARGEFILE") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13btrfs: allocate btrfs_ioctl_defrag_range_args on stackGoldwyn Rodrigues
commit c853a5783ebe123847886d432354931874367292 upstream. Instead of using kmalloc() to allocate btrfs_ioctl_defrag_range_args, allocate btrfs_ioctl_defrag_range_args on stack, the size is reasonably small and ioctls are called in process context. sizeof(btrfs_ioctl_defrag_range_args) = 48 Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> [ This patch is needed to fix a memory leak of "range" that was introduced when commit 173431b274a9 ("btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args") was backported to kernels lacking this patch. Now with these two patches applied in reverse order, range->flags needed to change back to range.flags. This bug was discovered and resolved using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc.] Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-13fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversionBart Van Assche
commit 961ebd120565cb60cebe21cb634fbc456022db4a upstream. The first kiocb_set_cancel_fn() argument may point at a struct kiocb that is not embedded inside struct aio_kiocb. With the current code, depending on the compiler, the req->ki_ctx read happens either before the IOCB_AIO_RW test or after that test. Move the req->ki_ctx read such that it is guaranteed that the IOCB_AIO_RW test happens first. Reported-by: Eric Biggers <ebiggers@kernel.org> Cc: Benjamin LaHaise <ben@communityfibre.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Avi Kivity <avi@scylladb.com> Cc: Sandeep Dhavale <dhavale@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: stable@vger.kernel.org Fixes: b820de741ae4 ("fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240304235715.3790858-1-bvanassche@acm.org Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-13exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack()Max Filippov
commit 2aea94ac14d1e0a8ae9e34febebe208213ba72f7 upstream. In NOMMU kernel the value of linux_binprm::p is the offset inside the temporary program arguments array maintained in separate pages in the linux_binprm::page. linux_binprm::exec being a copy of linux_binprm::p thus must be adjusted when that array is copied to the user stack. Without that adjustment the value passed by the NOMMU kernel to the ELF program in the AT_EXECFN entry of the aux array doesn't make any sense and it may break programs that try to access memory pointed to by that entry. Adjust linux_binprm::exec before the successful return from the transfer_args_to_stack(). Cc: <stable@vger.kernel.org> Fixes: b6a2fea39318 ("mm: variable length argument support") Fixes: 5edc2a5123a7 ("binfmt_elf_fdpic: wire up AT_EXECFD, AT_EXECFN, AT_SECURE") Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Link: https://lore.kernel.org/r/20240320182607.1472887-1-jcmvbkbc@gmail.com Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-13nilfs2: prevent kernel bug at submit_bh_wbc()Ryusuke Konishi
[ Upstream commit 269cdf353b5bdd15f1a079671b0f889113865f20 ] Fix a bug where nilfs_get_block() returns a successful status when searching and inserting the specified block both fail inconsistently. If this inconsistent behavior is not due to a previously fixed bug, then an unexpected race is occurring, so return a temporary error -EAGAIN instead. This prevents callers such as __block_write_begin_int() from requesting a read into a buffer that is not mapped, which would cause the BUG_ON check for the BH_Mapped flag in submit_bh_wbc() to fail. Link: https://lkml.kernel.org/r/20240313105827.5296-3-konishi.ryusuke@gmail.com Fixes: 1f5abe7e7dbc ("nilfs2: replace BUG_ON and BUG calls triggerable from ioctl") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13nilfs2: use a more common logging styleJoe Perches
[ Upstream commit a1d0747a393a079631130d61faa2a61027d1c789 ] Add macros for nilfs_<level>(sb, fmt, ...) and convert the uses of 'nilfs_msg(sb, KERN_<LEVEL>, ...)' to 'nilfs_<level>(sb, ...)' so nilfs2 uses a logging style more like the typical kernel logging style. Miscellanea: o Realign arguments for these uses Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1595860111-3920-4-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Stable-dep-of: 269cdf353b5b ("nilfs2: prevent kernel bug at submit_bh_wbc()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13nilfs2: fix failure to detect DAT corruption in btree and direct mappingsRyusuke Konishi
[ Upstream commit f2f26b4a84a0ef41791bd2d70861c8eac748f4ba ] Patch series "nilfs2: fix kernel bug at submit_bh_wbc()". This resolves a kernel BUG reported by syzbot. Since there are two flaws involved, I've made each one a separate patch. The first patch alone resolves the syzbot-reported bug, but I think both fixes should be sent to stable, so I've tagged them as such. This patch (of 2): Syzbot has reported a kernel bug in submit_bh_wbc() when writing file data to a nilfs2 file system whose metadata is corrupted. There are two flaws involved in this issue. The first flaw is that when nilfs_get_block() locates a data block using btree or direct mapping, if the disk address translation routine nilfs_dat_translate() fails with internal code -ENOENT due to DAT metadata corruption, it can be passed back to nilfs_get_block(). This causes nilfs_get_block() to misidentify an existing block as non-existent, causing both data block lookup and insertion to fail inconsistently. The second flaw is that nilfs_get_block() returns a successful status in this inconsistent state. This causes the caller __block_write_begin_int() or others to request a read even though the buffer is not mapped, resulting in a BUG_ON check for the BH_Mapped flag in submit_bh_wbc() failing. This fixes the first issue by changing the return value to code -EINVAL when a conversion using DAT fails with code -ENOENT, avoiding the conflicting condition that leads to the kernel bug described above. Here, code -EINVAL indicates that metadata corruption was detected during the block lookup, which will be properly handled as a file system error and converted to -EIO when passing through the nilfs2 bmap layer. Link: https://lkml.kernel.org/r/20240313105827.5296-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240313105827.5296-2-konishi.ryusuke@gmail.com Fixes: c3a7abf06ce7 ("nilfs2: support contiguous lookup of blocks") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+cfed5b56649bddf80d6e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=cfed5b56649bddf80d6e Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13ext4: fix corruption during on-line resizeMaximilian Heyne
[ Upstream commit a6b3bfe176e8a5b05ec4447404e412c2a3fc92cc ] We observed a corruption during on-line resize of a file system that is larger than 16 TiB with 4k block size. With having more then 2^32 blocks resize_inode is turned off by default by mke2fs. The issue can be reproduced on a smaller file system for convenience by explicitly turning off resize_inode. An on-line resize across an 8 GiB boundary (the size of a meta block group in this setup) then leads to a corruption: dev=/dev/<some_dev> # should be >= 16 GiB mkdir -p /corruption /sbin/mke2fs -t ext4 -b 4096 -O ^resize_inode $dev $((2 * 2**21 - 2**15)) mount -t ext4 $dev /corruption dd if=/dev/zero bs=4096 of=/corruption/test count=$((2*2**21 - 4*2**15)) sha1sum /corruption/test # 79d2658b39dcfd77274e435b0934028adafaab11 /corruption/test /sbin/resize2fs $dev $((2*2**21)) # drop page cache to force reload the block from disk echo 1 > /proc/sys/vm/drop_caches sha1sum /corruption/test # 3c2abc63cbf1a94c9e6977e0fbd72cd832c4d5c3 /corruption/test 2^21 = 2^15*2^6 equals 8 GiB whereof 2^15 is the number of blocks per block group and 2^6 are the number of block groups that make a meta block group. The last checksum might be different depending on how the file is laid out across the physical blocks. The actual corruption occurs at physical block 63*2^15 = 2064384 which would be the location of the backup of the meta block group's block descriptor. During the on-line resize the file system will be converted to meta_bg starting at s_first_meta_bg which is 2 in the example - meaning all block groups after 16 GiB. However, in ext4_flex_group_add we might add block groups that are not part of the first meta block group yet. In the reproducer we achieved this by substracting the size of a whole block group from the point where the meta block group would start. This must be considered when updating the backup block group descriptors to follow the non-meta_bg layout. The fix is to add a test whether the group to add is already part of the meta block group or not. Fixes: 01f795f9e0d67 ("ext4: add online resizing support for meta_bg and 64-bit file systems") Cc: <stable@vger.kernel.org> Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Tested-by: Srivathsa Dara <srivathsa.d.dara@oracle.com> Reviewed-by: Srivathsa Dara <srivathsa.d.dara@oracle.com> Link: https://lore.kernel.org/r/20240215155009.94493-1-mheyne@amazon.de Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13fuse: don't unhash rootMiklos Szeredi
[ Upstream commit b1fe686a765e6c0d71811d825b5a1585a202b777 ] The root inode is assumed to be always hashed. Do not unhash the root inode even if it is marked BAD. Fixes: 5d069dbe8aaf ("fuse: fix bad inode") Cc: <stable@vger.kernel.org> # v5.11 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13ubifs: Set page uptodate in the correct placeMatthew Wilcox (Oracle)
[ Upstream commit 723012cab779eee8228376754e22c6594229bf8f ] Page cache reads are lockless, so setting the freshly allocated page uptodate before we've overwritten it with the data it's supposed to have in it will allow a simultaneous reader to see old data. Move the call to SetPageUptodate into ubifs_write_end(), which is after we copied the new data into the page. Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Cc: stable@vger.kernel.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13fat: fix uninitialized field in nostale filehandlesJan Kara
[ Upstream commit fde2497d2bc3a063d8af88b258dbadc86bd7b57c ] When fat_encode_fh_nostale() encodes file handle without a parent it stores only first 10 bytes of the file handle. However the length of the file handle must be a multiple of 4 so the file handle is actually 12 bytes long and the last two bytes remain uninitialized. This is not great at we potentially leak uninitialized information with the handle to userspace. Properly initialize the full handle length. Link: https://lkml.kernel.org/r/20240205122626.13701-1-jack@suse.cz Reported-by: syzbot+3ce5dea5b1539ff36769@syzkaller.appspotmail.com Fixes: ea3983ace6b7 ("fat: restructure export_operations") Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Amir Goldstein <amir73il@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26NFS: Fix an off by one in root_nfs_cat()Christophe JAILLET
[ Upstream commit 698ad1a538da0b6bf969cfee630b4e3a026afb87 ] The intent is to check if 'dest' is truncated or not. So, >= should be used instead of >, because strlcat() returns the length of 'dest' and 'src' excluding the trailing NULL. Fixes: 56463e50d1fc ("NFS: Use super.c for NFSROOT mount option parsing") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26quota: Fix rcu annotations of inode dquot pointersJan Kara
[ Upstream commit 179b8c97ebf63429589f5afeba59a181fe70603e ] Dquot pointers in i_dquot array in the inode are protected by dquot_srcu. Annotate the array pointers with __rcu, perform the locked dereferences with srcu_dereference_check() instead of plain reads, and set the array elements with rcu_assign_pointer(). Fixes: b9ba6f94b238 ("quota: remove dqptr_sem") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402061900.rTuYDlo6-lkp@intel.com/ Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26quota: Fix potential NULL pointer dereferenceWang Jianjian
[ Upstream commit d0aa72604fbd80c8aabb46eda00535ed35570f1f ] Below race may cause NULL pointer dereference P1 P2 dquot_free_inode quota_off drop_dquot_ref remove_dquot_ref dquots = i_dquot(inode) dquots = i_dquot(inode) srcu_read_lock dquots[cnt]) != NULL (1) dquots[type] = NULL (2) spin_lock(&dquots[cnt]->dq_dqb_lock) (3) .... If dquot_free_inode(or other routines) checks inode's quota pointers (1) before quota_off sets it to NULL(2) and use it (3) after that, NULL pointer dereference will be triggered. So let's fix it by using a temporary pointer to avoid this issue. Signed-off-by: Wang Jianjian <wangjianjian3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240202081852.2514092-1-wangjianjian3@huawei.com> Stable-dep-of: 179b8c97ebf6 ("quota: Fix rcu annotations of inode dquot pointers") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26quota: simplify drop_dquot_ref()Baokun Li
[ Upstream commit 7bce48f0fec602b3b6c335963b26d9eefa417788 ] As Honza said, remove_inode_dquot_ref() currently does not release the last dquot reference but instead adds the dquot to tofree_head list. This is because dqput() can sleep while dropping of the last dquot reference (writing back the dquot and calling ->release_dquot()) and that must not happen under dq_list_lock. Now that dqput() queues the final dquot cleanup into a workqueue, remove_inode_dquot_ref() can call dqput() unconditionally and we can significantly simplify it. Here we open code the simplified code of remove_inode_dquot_ref() into remove_dquot_ref() and remove the function put_dquot_list() which is no longer used. Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230630110822.3881712-6-libaokun1@huawei.com> Stable-dep-of: 179b8c97ebf6 ("quota: Fix rcu annotations of inode dquot pointers") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26quota: check time limit when back out space/inode changeChengguang Xu
[ Upstream commit 632a9f3acd6687376cbb0b178df6048e19cbacc9 ] When we fail from allocating inode/space, we back out the change we already did. In a special case which has exceeded soft limit by the change, we should also check time limit and reset it properly. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jan Kara <jack@suse.cz> Stable-dep-of: 179b8c97ebf6 ("quota: Fix rcu annotations of inode dquot pointers") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26fs/quota: erase unused but set variable warningJiang Biao
[ Upstream commit 78bc3334a69ff289dbc973a9db7c52a2d7757e5b ] Local variable *reserved* of remove_dquot_ref() is only used if define CONFIG_QUOTA_DEBUG, but not ebraced in CONFIG_QUOTA_DEBUG macro, which leads to unused-but-set-variable warning when compiling. This patch ebrace it into CONFIG_QUOTA_DEBUG macro like what is done in add_dquot_ref(). Signed-off-by: Jiang Biao <benbjiang@tencent.com> Signed-off-by: Jan Kara <jack@suse.cz> Stable-dep-of: 179b8c97ebf6 ("quota: Fix rcu annotations of inode dquot pointers") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26quota: code cleanup for __dquot_alloc_space()Chengguang Xu
[ Upstream commit df15a2a59d0b29d86e17140b83ed231adaded12f ] Replace (flags & DQUOT_SPACE_RESERVE) with variable reserve. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jan Kara <jack@suse.cz> Stable-dep-of: 179b8c97ebf6 ("quota: Fix rcu annotations of inode dquot pointers") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26fs/select: rework stack allocation hack for clangArnd Bergmann
[ Upstream commit ddb9fd7a544088ed70eccbb9f85e9cc9952131c1 ] A while ago, we changed the way that select() and poll() preallocate a temporary buffer just under the size of the static warning limit of 1024 bytes, as clang was frequently going slightly above that limit. The warnings have recently returned and I took another look. As it turns out, clang is not actually inherently worse at reserving stack space, it just happens to inline do_select() into core_sys_select(), while gcc never inlines it. Annotate do_select() to never be inlined and in turn remove the special case for the allocation size. This should give the same behavior for both clang and gcc all the time and once more avoids those warnings. Fixes: ad312f95d41c ("fs/select: avoid clang stack usage warning") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240216202352.2492798-1-arnd@kernel.org Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26do_sys_name_to_handle(): use kzalloc() to fix kernel-infoleakNikita Zhandarovich
[ Upstream commit 3948abaa4e2be938ccdfc289385a27342fb13d43 ] syzbot identified a kernel information leak vulnerability in do_sys_name_to_handle() and issued the following report [1]. [1] "BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_user+0xbc/0x100 lib/usercopy.c:40 instrument_copy_to_user include/linux/instrumented.h:114 [inline] _copy_to_user+0xbc/0x100 lib/usercopy.c:40 copy_to_user include/linux/uaccess.h:191 [inline] do_sys_name_to_handle fs/fhandle.c:73 [inline] __do_sys_name_to_handle_at fs/fhandle.c:112 [inline] __se_sys_name_to_handle_at+0x949/0xb10 fs/fhandle.c:94 __x64_sys_name_to_handle_at+0xe4/0x140 fs/fhandle.c:94 ... Uninit was created at: slab_post_alloc_hook+0x129/0xa70 mm/slab.h:768 slab_alloc_node mm/slub.c:3478 [inline] __kmem_cache_alloc_node+0x5c9/0x970 mm/slub.c:3517 __do_kmalloc_node mm/slab_common.c:1006 [inline] __kmalloc+0x121/0x3c0 mm/slab_common.c:1020 kmalloc include/linux/slab.h:604 [inline] do_sys_name_to_handle fs/fhandle.c:39 [inline] __do_sys_name_to_handle_at fs/fhandle.c:112 [inline] __se_sys_name_to_handle_at+0x441/0xb10 fs/fhandle.c:94 __x64_sys_name_to_handle_at+0xe4/0x140 fs/fhandle.c:94 ... Bytes 18-19 of 20 are uninitialized Memory access of size 20 starts at ffff888128a46380 Data copied to user address 0000000020000240" Per Chuck Lever's suggestion, use kzalloc() instead of kmalloc() to solve the problem. Fixes: 990d6c2d7aee ("vfs: Add name to file handle conversion support") Suggested-by: Chuck Lever III <chuck.lever@oracle.com> Reported-and-tested-by: <syzbot+09b349b3066c2e0b1e96@syzkaller.appspotmail.com> Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://lore.kernel.org/r/20240119153906.4367-1-n.zhandarovich@fintech.ru Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-15btrfs: ref-verify: free ref cache before clearing mount optFedor Pchelkin
[ Upstream commit f03e274a8b29d1d1c1bbd7f764766cb5ca537ab7 ] As clearing REF_VERIFY mount option indicates there were some errors in a ref-verify process, a ref cache is not relevant anymore and should be freed. btrfs_free_ref_cache() requires REF_VERIFY option being set so call it just before clearing the mount option. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Reported-by: syzbot+be14ed7728594dc8bd42@syzkaller.appspotmail.com Fixes: fd708b81d972 ("Btrfs: add a extent ref verify tool") CC: stable@vger.kernel.org # 5.4+ Closes: https://lore.kernel.org/lkml/000000000000e5a65c05ee832054@google.com/ Reported-by: syzbot+c563a3c79927971f950f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/0000000000007fe09705fdc6086c@google.com/ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-06cachefiles: fix memory leak in cachefiles_add_cache()Baokun Li
commit e21a2f17566cbd64926fb8f16323972f7a064444 upstream. The following memory leak was reported after unbinding /dev/cachefiles: ================================================================== unreferenced object 0xffff9b674176e3c0 (size 192): comm "cachefilesd2", pid 680, jiffies 4294881224 hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc ea38a44b): [<ffffffff8eb8a1a5>] kmem_cache_alloc+0x2d5/0x370 [<ffffffff8e917f86>] prepare_creds+0x26/0x2e0 [<ffffffffc002eeef>] cachefiles_determine_cache_security+0x1f/0x120 [<ffffffffc00243ec>] cachefiles_add_cache+0x13c/0x3a0 [<ffffffffc0025216>] cachefiles_daemon_write+0x146/0x1c0 [<ffffffff8ebc4a3b>] vfs_write+0xcb/0x520 [<ffffffff8ebc5069>] ksys_write+0x69/0xf0 [<ffffffff8f6d4662>] do_syscall_64+0x72/0x140 [<ffffffff8f8000aa>] entry_SYSCALL_64_after_hwframe+0x6e/0x76 ================================================================== Put the reference count of cache_cred in cachefiles_daemon_unbind() to fix the problem. And also put cache_cred in cachefiles_add_cache() error branch to avoid memory leaks. Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem") CC: stable@vger.kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240217081431.796809-1-libaokun1@huawei.com Acked-by: David Howells <dhowells@redhat.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06btrfs: dev-replace: properly validate device namesDavid Sterba
commit 9845664b9ee47ce7ee7ea93caf47d39a9d4552c4 upstream. There's a syzbot report that device name buffers passed to device replace are not properly checked for string termination which could lead to a read out of bounds in getname_kernel(). Add a helper that validates both source and target device name buffers. For devid as the source initialize the buffer to empty string in case something tries to read it later. This was originally analyzed and fixed in a different way by Edward Adam Davis (see links). Link: https://lore.kernel.org/linux-btrfs/000000000000d1a1d1060cc9c5e7@google.com/ Link: https://lore.kernel.org/linux-btrfs/tencent_44CA0665C9836EF9EEC80CB9E7E206DF5206@qq.com/ CC: stable@vger.kernel.org # 4.19+ CC: Edward Adam Davis <eadavis@qq.com> Reported-and-tested-by: syzbot+33f23b49ac24f986c9e8@syzkaller.appspotmail.com Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaioBart Van Assche
commit b820de741ae48ccf50dd95e297889c286ff4f760 upstream. If kiocb_set_cancel_fn() is called for I/O submitted via io_uring, the following kernel warning appears: WARNING: CPU: 3 PID: 368 at fs/aio.c:598 kiocb_set_cancel_fn+0x9c/0xa8 Call trace: kiocb_set_cancel_fn+0x9c/0xa8 ffs_epfile_read_iter+0x144/0x1d0 io_read+0x19c/0x498 io_issue_sqe+0x118/0x27c io_submit_sqes+0x25c/0x5fc __arm64_sys_io_uring_enter+0x104/0xab0 invoke_syscall+0x58/0x11c el0_svc_common+0xb4/0xf4 do_el0_svc+0x2c/0xb0 el0_svc+0x2c/0xa4 el0t_64_sync_handler+0x68/0xb4 el0t_64_sync+0x1a4/0x1a8 Fix this by setting the IOCB_AIO_RW flag for read and write I/O that is submitted by libaio. Suggested-by: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Avi Kivity <avi@scylladb.com> Cc: Sandeep Dhavale <dhavale@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: stable@vger.kernel.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240215204739.2677806-2-bvanassche@acm.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()Baokun Li
[ Upstream commit 832698373a25950942c04a512daa652c18a9b513 ] Places the logic for checking if the group's block bitmap is corrupt under the protection of the group lock to avoid allocating blocks from the group with a corrupted block bitmap. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-8-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()Baokun Li
[ Upstream commit 4530b3660d396a646aad91a787b6ab37cf604b53 ] Determine if the group block bitmap is corrupted before using ac_b_ex in ext4_mb_try_best_found() to avoid allocating blocks from a group with a corrupted block bitmap in the following concurrency and making the situation worse. ext4_mb_regular_allocator ext4_lock_group(sb, group) ext4_mb_good_group // check if the group bbitmap is corrupted ext4_mb_complex_scan_group // Scan group gets ac_b_ex but doesn't use it ext4_unlock_group(sb, group) ext4_mark_group_bitmap_corrupted(group) // The block bitmap was corrupted during // the group unlock gap. ext4_mb_try_best_found ext4_lock_group(ac->ac_sb, group) ext4_mb_use_best_found mb_mark_used // Allocating blocks in block bitmap corrupted group Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-7-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01nilfs2: replace WARN_ONs for invalid DAT metadata block requestsRyusuke Konishi
commit 5124a0a549857c4b87173280e192eea24dea72ad upstream. If DAT metadata file block access fails due to corruption of the DAT file or abnormal virtual block numbers held by b-trees or inodes, a kernel warning is generated. This replaces the WARN_ONs by error output, so that a kernel, booted with panic_on_warn, does not panic. This patch also replaces the detected return code -ENOENT with another internal code -EINVAL to notify the bmap layer of metadata corruption. When the bmap layer sees -EINVAL, it handles the abnormal situation with nilfs_bmap_convert_error() and finally returns code -EIO as it should. Link: https://lkml.kernel.org/r/0000000000005cc3d205ea23ddcf@google.com Link: https://lkml.kernel.org/r/20230126164114.6911-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: <syzbot+5d5d25f90f195a3cfcb4@syzkaller.appspotmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23lsm: new security_file_ioctl_compat() hookAlfred Piccioni
commit f1bb47a31dff6d4b34fb14e99850860ee74bb003 upstream. Some ioctl commands do not require ioctl permission, but are routed to other permissions such as FILE_GETATTR or FILE_SETATTR. This routing is done by comparing the ioctl cmd to a set of 64-bit flags (FS_IOC_*). However, if a 32-bit process is running on a 64-bit kernel, it emits 32-bit flags (FS_IOC32_*) for certain ioctl operations. These flags are being checked erroneously, which leads to these ioctl operations being routed to the ioctl permission, rather than the correct file permissions. This was also noted in a RED-PEN finding from a while back - "/* RED-PEN how should LSM module know it's handling 32bit? */". This patch introduces a new hook, security_file_ioctl_compat(), that is called from the compat ioctl syscall. All current LSMs have been changed to support this hook. Reviewing the three places where we are currently using security_file_ioctl(), it appears that only SELinux needs a dedicated compat change; TOMOYO and SMACK appear to be functional without any change. Cc: stable@vger.kernel.org Fixes: 0b24dcb7f2f7 ("Revert "selinux: simplify ioctl checking"") Signed-off-by: Alfred Piccioni <alpic@google.com> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com> [PM: subject tweak, line length fixes, and alignment corrections] Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23nilfs2: fix potential bug in end_buffer_async_writeRyusuke Konishi
commit 5bc09b397cbf1221f8a8aacb1152650c9195b02b upstream. According to a syzbot report, end_buffer_async_write(), which handles the completion of block device writes, may detect abnormal condition of the buffer async_write flag and cause a BUG_ON failure when using nilfs2. Nilfs2 itself does not use end_buffer_async_write(). But, the async_write flag is now used as a marker by commit 7f42ec394156 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks") as a means of resolving double list insertion of dirty blocks in nilfs_lookup_dirty_data_buffers() and nilfs_lookup_node_buffers() and the resulting crash. This modification is safe as long as it is used for file data and b-tree node blocks where the page caches are independent. However, it was irrelevant and redundant to also introduce async_write for segment summary and super root blocks that share buffers with the backing device. This led to the possibility that the BUG_ON check in end_buffer_async_write would fail as described above, if independent writebacks of the backing device occurred in parallel. The use of async_write for segment summary buffers has already been removed in a previous change. Fix this issue by removing the manipulation of the async_write flag for the remaining super root block buffer. Link: https://lkml.kernel.org/r/20240203161645.4992-1-konishi.ryusuke@gmail.com Fixes: 7f42ec394156 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+5c04210f7c7f897c1e7f@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/00000000000019a97c05fd42f8c8@google.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()Ryusuke Konishi
commit 38296afe3c6ee07319e01bb249aa4bb47c07b534 upstream. Syzbot reported a hang issue in migrate_pages_batch() called by mbind() and nilfs_lookup_dirty_data_buffers() called in the log writer of nilfs2. While migrate_pages_batch() locks a folio and waits for the writeback to complete, the log writer thread that should bring the writeback to completion picks up the folio being written back in nilfs_lookup_dirty_data_buffers() that it calls for subsequent log creation and was trying to lock the folio. Thus causing a deadlock. In the first place, it is unexpected that folios/pages in the middle of writeback will be updated and become dirty. Nilfs2 adds a checksum to verify the validity of the log being written and uses it for recovery at mount, so data changes during writeback are suppressed. Since this is broken, an unclean shutdown could potentially cause recovery to fail. Investigation revealed that the root cause is that the wait for writeback completion in nilfs_page_mkwrite() is conditional, and if the backing device does not require stable writes, data may be modified without waiting. Fix these issues by making nilfs_page_mkwrite() wait for writeback to finish regardless of the stable write requirement of the backing device. Link: https://lkml.kernel.org/r/20240131145657.4209-1-konishi.ryusuke@gmail.com Fixes: 1d1d1a767206 ("mm: only enforce stable page writes if the backing device requires it") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+ee2ae68da3b22d04cd8d@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/00000000000047d819061004ad6c@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23nilfs2: fix data corruption in dsync block recovery for small block sizesRyusuke Konishi
commit 67b8bcbaed4777871bb0dcc888fb02a614a98ab1 upstream. The helper function nilfs_recovery_copy_block() of nilfs_recovery_dsync_blocks(), which recovers data from logs created by data sync writes during a mount after an unclean shutdown, incorrectly calculates the on-page offset when copying repair data to the file's page cache. In environments where the block size is smaller than the page size, this flaw can cause data corruption and leak uninitialized memory bytes during the recovery process. Fix these issues by correcting this byte offset calculation on the page. Link: https://lkml.kernel.org/r/20240124121936.10575-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23ext4: fix double-free of blocks due to wrong extents moved_lenBaokun Li
commit 55583e899a5357308274601364741a83e78d6ac4 upstream. In ext4_move_extents(), moved_len is only updated when all moves are successfully executed, and only discards orig_inode and donor_inode preallocations when moved_len is not zero. When the loop fails to exit after successfully moving some extents, moved_len is not updated and remains at 0, so it does not discard the preallocations. If the moved extents overlap with the preallocated extents, the overlapped extents are freed twice in ext4_mb_release_inode_pa() and ext4_process_freed_data() (as described in commit 94d7c16cbbbd ("ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT")), and bb_free is incremented twice. Hence when trim is executed, a zero-division bug is triggered in mb_update_avg_fragment_size() because bb_free is not zero and bb_fragments is zero. Therefore, update move_len after each extent move to avoid the issue. Reported-by: Wei Chen <harperchen1110@gmail.com> Reported-by: xingwei lee <xrivendell7@gmail.com> Closes: https://lore.kernel.org/r/CAO4mrferzqBUnCag8R3m2zf897ts9UEuhjFQGPtODT92rYyR2Q@mail.gmail.com Fixes: fcf6b1b729bc ("ext4: refactor ext4_move_extents code base") CC: <stable@vger.kernel.org> # 3.18 Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23btrfs: send: return EOPNOTSUPP on unknown flagsDavid Sterba
commit f884a9f9e59206a2d41f265e7e403f080d10b493 upstream. When some ioctl flags are checked we return EOPNOTSUPP, like for BTRFS_SCRUB_SUPPORTED_FLAGS, BTRFS_SUBVOL_CREATE_ARGS_MASK or fallocate modes. The EINVAL is supposed to be for a supported but invalid values or combination of options. Fix that when checking send flags so it's consistent with the rest. CC: stable@vger.kernel.org # 4.14+ Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5rryOLzp3EKq8RTbjMHMHeaJubfpsVLF6H4qJnKCUR1w@mail.gmail.com/ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23btrfs: forbid creating subvol qgroupsBoris Burkov
commit 0c309d66dacddf8ce939b891d9ead4a8e21ad6f0 upstream. Creating a qgroup 0/subvolid leads to various races and it isn't helpful, because you can't specify a subvol id when creating a subvol, so you can't be sure it will be the right one. Any requirements on the automatic subvol can be gratified by using a higher level qgroup and the inheritance parameters of subvol creation. Fixes: cecbb533b5fc ("btrfs: record simple quota deltas in delayed refs") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23ceph: fix deadlock or deadcode of misusing dget()Xiubo Li
[ Upstream commit b493ad718b1f0357394d2cdecbf00a44a36fa085 ] The lock order is incorrect between denty and its parent, we should always make sure that the parent get the lock first. But since this deadcode is never used and the parent dir will always be set from the callers, let's just remove it. Link: https://lore.kernel.org/r/20231116081919.GZ1957730@ZenIV Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23fast_dput(): handle underflows gracefullyAl Viro
[ Upstream commit 504e08cebe1d4e1efe25f915234f646e74a364a8 ] If refcount is less than 1, we should just warn, unlock dentry and return true, so that the caller doesn't try to do anything else. Taking care of that leaves the rest of "lockref_put_return() has failed" case equivalent to "decrement refcount and rejoin the normal slow path after the point where we grab ->d_lock". NOTE: lockref_put_return() is strictly a fastpath thing - unlike the rest of lockref primitives, it does not contain a fallback. Caller (and it looks like fast_dput() is the only legitimate one in the entire kernel) has to do that itself. Reasons for lockref_put_return() failures: * ->d_lock held by somebody * refcount <= 0 * ... or an architecture not supporting lockref use of cmpxchg - sparc, anything non-SMP, config with spinlock debugging... We could add a fallback, but it would be a clumsy API - we'd have to distinguish between: (1) refcount > 1 - decremented, lock not held on return (2) refcount < 1 - left alone, probably no sense to hold the lock (3) refcount is 1, no cmphxcg - decremented, lock held on return (4) refcount is 1, cmphxcg supported - decremented, lock *NOT* held on return. We want to return with no lock held in case (4); that's the whole point of that thing. We very much do not want to have the fallback in case (3) return without a lock, since the caller might have to retake it in that case. So it wouldn't be more convenient than doing the fallback in the caller and it would be very easy to screw up, especially since the test coverage would suck - no way to test (3) and (4) on the same kernel build. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23f2fs: fix to check return value of f2fs_reserve_new_block()Chao Yu
[ Upstream commit 956fa1ddc132e028f3b7d4cf17e6bfc8cb36c7fd ] Let's check return value of f2fs_reserve_new_block() in do_recover_data() rather than letting it fails silently. Also refactoring check condition on return value of f2fs_reserve_new_block() as below: - trigger f2fs_bug_on() only for ENOSPC case; - use do-while statement to avoid redundant codes; Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23ext4: avoid online resizing failures due to oversized flex bgBaokun Li
[ Upstream commit 5d1935ac02ca5aee364a449a35e2977ea84509b0 ] When we online resize an ext4 filesystem with a oversized flexbg_size, mkfs.ext4 -F -G 67108864 $dev -b 4096 100M mount $dev $dir resize2fs $dev 16G the following WARN_ON is triggered: ================================================================== WARNING: CPU: 0 PID: 427 at mm/page_alloc.c:4402 __alloc_pages+0x411/0x550 Modules linked in: sg(E) CPU: 0 PID: 427 Comm: resize2fs Tainted: G E 6.6.0-rc5+ #314 RIP: 0010:__alloc_pages+0x411/0x550 Call Trace: <TASK> __kmalloc_large_node+0xa2/0x200 __kmalloc+0x16e/0x290 ext4_resize_fs+0x481/0xd80 __ext4_ioctl+0x1616/0x1d90 ext4_ioctl+0x12/0x20 __x64_sys_ioctl+0xf0/0x150 do_syscall_64+0x3b/0x90 ================================================================== This is because flexbg_size is too large and the size of the new_group_data array to be allocated exceeds MAX_ORDER. Currently, the minimum value of MAX_ORDER is 8, the minimum value of PAGE_SIZE is 4096, the corresponding maximum number of groups that can be allocated is: (PAGE_SIZE << MAX_ORDER) / sizeof(struct ext4_new_group_data) ≈ 21845 And the value that is down-aligned to the power of 2 is 16384. Therefore, this value is defined as MAX_RESIZE_BG, and the number of groups added each time does not exceed this value during resizing, and is added multiple times to complete the online resizing. The difference is that the metadata in a flex_bg may be more dispersed. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231023013057.2117948-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23ext4: remove unnecessary check from alloc_flex_gd()Baokun Li
[ Upstream commit b099eb87de105cf07cad731ded6fb40b2675108b ] In commit 967ac8af4475 ("ext4: fix potential integer overflow in alloc_flex_gd()"), an overflow check is added to alloc_flex_gd() to prevent the allocated memory from being smaller than expected due to the overflow. However, after kmalloc() is replaced with kmalloc_array() in commit 6da2ec56059c ("treewide: kmalloc() -> kmalloc_array()"), the kmalloc_array() function has an overflow check, so the above problem will not occur. Therefore, the extra check is removed. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231023013057.2117948-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23ext4: unify the type of flexbg_size to unsigned intBaokun Li
[ Upstream commit 658a52344fb139f9531e7543a6e0015b630feb38 ] The maximum value of flexbg_size is 2^31, but the maximum value of int is (2^31 - 1), so overflow may occur when the type of flexbg_size is declared as int. For example, when uninit_mask is initialized in ext4_alloc_group_tables(), if flexbg_size == 2^31, the initialized uninit_mask is incorrect, and this may causes set_flexbg_block_bitmap() to trigger a BUG_ON(). Therefore, the flexbg_size type is declared as unsigned int to avoid overflow and memory waste. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231023013057.2117948-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23ext4: fix inconsistent between segment fstrim and full fstrimYe Bin
[ Upstream commit 68da4c44b994aea797eb9821acb3a4a36015293e ] Suppose we issue two FITRIM ioctls for ranges [0,15] and [16,31] with mininum length of trimmed range set to 8 blocks. If we have say a range of blocks 10-22 free, this range will not be trimmed because it straddles the boundary of the two FITRIM ranges and neither part is big enough. This is a bit surprising to some users that call FITRIM on smaller ranges of blocks to limit impact on the system. Also XFS trims all free space extents that overlap with the specified range so we are inconsistent among filesystems. Let's change ext4_try_to_trim_range() to consider for trimming the whole free space extent that straddles the end of specified range, not just the part of it within the range. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231216010919.1995851-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23jfs: fix array-index-out-of-bounds in diNewExtEdward Adam Davis
[ Upstream commit 49f9637aafa6e63ba686c13cb8549bf5e6920402 ] [Syz report] UBSAN: array-index-out-of-bounds in fs/jfs/jfs_imap.c:2360:2 index -878706688 is out of range for type 'struct iagctl[128]' CPU: 1 PID: 5065 Comm: syz-executor282 Not tainted 6.7.0-rc4-syzkaller-00009-gbee0e7762ad2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:217 [inline] __ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348 diNewExt+0x3cf3/0x4000 fs/jfs/jfs_imap.c:2360 diAllocExt fs/jfs/jfs_imap.c:1949 [inline] diAllocAG+0xbe8/0x1e50 fs/jfs/jfs_imap.c:1666 diAlloc+0x1d3/0x1760 fs/jfs/jfs_imap.c:1587 ialloc+0x8f/0x900 fs/jfs/jfs_inode.c:56 jfs_mkdir+0x1c5/0xb90 fs/jfs/namei.c:225 vfs_mkdir+0x2f1/0x4b0 fs/namei.c:4106 do_mkdirat+0x264/0x3a0 fs/namei.c:4129 __do_sys_mkdir fs/namei.c:4149 [inline] __se_sys_mkdir fs/namei.c:4147 [inline] __x64_sys_mkdir+0x6e/0x80 fs/namei.c:4147 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x45/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b RIP: 0033:0x7fcb7e6a0b57 Code: ff ff 77 07 31 c0 c3 0f 1f 40 00 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd83023038 EFLAGS: 00000286 ORIG_RAX: 0000000000000053 RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007fcb7e6a0b57 RDX: 00000000000a1020 RSI: 00000000000001ff RDI: 0000000020000140 RBP: 0000000020000140 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000286 R12: 00007ffd830230d0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [Analysis] When the agstart is too large, it can cause agno overflow. [Fix] After obtaining agno, if the value is invalid, exit the subsequent process. Reported-and-tested-by: syzbot+553d90297e6d2f50dbc7@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Modified the test from agno > MAXAG to agno >= MAXAG based on linux-next report by kernel test robot (Dan Carpenter). Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23afs: fix the usage of read_seqbegin_or_lock() in afs_find_server*()Oleg Nesterov
[ Upstream commit 1702e0654ca9a7bcd7c7619c8a5004db58945b71 ] David Howells says: (5) afs_find_server(). There could be a lot of servers in the list and each server can have multiple addresses, so I think this would be better with an exclusive second pass. The server list isn't likely to change all that often, but when it does change, there's a good chance several servers are going to be added/removed one after the other. Further, this is only going to be used for incoming cache management/callback requests from the server, which hopefully aren't going to happen too often - but it is remotely drivable. (6) afs_find_server_by_uuid(). Similarly to (5), there could be a lot of servers to search through, but they are in a tree not a flat list, so it should be faster to process. Again, it's not likely to change that often and, again, when it does change it's likely to involve multiple changes. This can be driven remotely by an incoming cache management request but is mostly going to be driven by setting up or reconfiguring a volume's server list - something that also isn't likely to happen often. Make the "seq" counter odd on the 2nd pass, otherwise read_seqbegin_or_lock() never takes the lock. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/20231130115614.GA21581@redhat.com/ Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23pstore/ram: Fix crash when setting number of cpus to an odd numberWeichen Chen
[ Upstream commit d49270a04623ce3c0afddbf3e984cb245aa48e9c ] When the number of cpu cores is adjusted to 7 or other odd numbers, the zone size will become an odd number. The address of the zone will become: addr of zone0 = BASE addr of zone1 = BASE + zone_size addr of zone2 = BASE + zone_size*2 ... The address of zone1/3/5/7 will be mapped to non-alignment va. Eventually crashes will occur when accessing these va. So, use ALIGN_DOWN() to make sure the zone size is even to avoid this bug. Signed-off-by: Weichen Chen <weichen.chen@mediatek.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Tested-by: "Guilherme G. Piccoli" <gpiccoli@igalia.com> Link: https://lore.kernel.org/r/20230224023632.6840-1-weichen.chen@mediatek.com Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23jfs: fix uaf in jfs_evict_inodeEdward Adam Davis
[ Upstream commit e0e1958f4c365e380b17ccb35617345b31ef7bf3 ] When the execution of diMount(ipimap) fails, the object ipimap that has been released may be accessed in diFreeSpecial(). Asynchronous ipimap release occurs when rcu_core() calls jfs_free_node(). Therefore, when diMount(ipimap) fails, sbi->ipimap should not be initialized as ipimap. Reported-and-tested-by: syzbot+01cf2dbcbe2022454388@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23jfs: fix array-index-out-of-bounds in dbAdjTreeManas Ghandat
[ Upstream commit 74ecdda68242b174920fe7c6133a856fb7d8559b ] Currently there is a bound check missing in the dbAdjTree while accessing the dmt_stree. To add the required check added the bool is_ctl which is required to determine the size as suggest in the following commit. https://lore.kernel.org/linux-kernel-mentees/f9475918-2186-49b8-b801-6f0f9e75f4fa@oracle.com/ Reported-by: syzbot+39ba34a099ac2e9bd3cb@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=39ba34a099ac2e9bd3cb Signed-off-by: Manas Ghandat <ghandatmanas@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23jfs: fix slab-out-of-bounds Read in dtSearchManas Ghandat
[ Upstream commit fa5492ee89463a7590a1449358002ff7ef63529f ] Currently while searching for current page in the sorted entry table of the page there is a out of bound access. Added a bound check to fix the error. Dave: Set return code to -EIO Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202310241724.Ed02yUz9-lkp@intel.com/ Signed-off-by: Manas Ghandat <ghandatmanas@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>