summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2025-03-07afs: Fix the server_list to unuse a displaced server rather than putting itDavid Howells
[ Upstream commit add117e48df4788a86a21bd0515833c0a6db1ad1 ] When allocating and building an afs_server_list struct object from a VLDB record, we look up each server address to get the server record for it - but a server may have more than one entry in the record and we discard the duplicate pointers. Currently, however, when we discard, we only put a server record, not unuse it - but the lookup got as an active-user count. The active-user count on an afs_server_list object determines its lifetime whereas the refcount keeps the memory backing it around. Failing to reduce the active-user counter prevents the record from being cleaned up and can lead to multiple copied being seen - and pointing to deleted afs_cell objects and other such things. Fix this by switching the incorrect 'put' to an 'unuse' instead. Without this, occasionally, a dead server record can be seen in /proc/net/afs/servers and list corruption may be observed: list_del corruption. prev->next should be ffff888102423e40, but was 0000000000000000. (prev=ffff88810140cd38) Fixes: 977e5f8ed0ab ("afs: Split the usage count on struct afs_server") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250218192250.296870-5-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-07afs: Make it possible to find the volumes that are using a serverDavid Howells
[ Upstream commit ca0e79a46097d54e4af46c67c852479d97af35bb ] Make it possible to find the afs_volume structs that are using an afs_server struct to aid in breaking volume callbacks. The way this is done is that each afs_volume already has an array of afs_server_entry records that point to the servers where that volume might be found. An afs_volume backpointer and a list node is added to each entry and each entry is then added to an RCU-traversable list on the afs_server to which it points. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Stable-dep-of: add117e48df4 ("afs: Fix the server_list to unuse a displaced server rather than putting it") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-07ovl: fix UAF in ovl_dentry_update_reval by moving dput() in ovl_link_upVasiliy Kovalev
[ Upstream commit c84e125fff2615b4d9c259e762596134eddd2f27 ] The issue was caused by dput(upper) being called before ovl_dentry_update_reval(), while upper->d_flags was still accessed in ovl_dentry_remote(). Move dput(upper) after its last use to prevent use-after-free. BUG: KASAN: slab-use-after-free in ovl_dentry_remote fs/overlayfs/util.c:162 [inline] BUG: KASAN: slab-use-after-free in ovl_dentry_update_reval+0xd2/0xf0 fs/overlayfs/util.c:167 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0xc3/0x620 mm/kasan/report.c:488 kasan_report+0xd9/0x110 mm/kasan/report.c:601 ovl_dentry_remote fs/overlayfs/util.c:162 [inline] ovl_dentry_update_reval+0xd2/0xf0 fs/overlayfs/util.c:167 ovl_link_up fs/overlayfs/copy_up.c:610 [inline] ovl_copy_up_one+0x2105/0x3490 fs/overlayfs/copy_up.c:1170 ovl_copy_up_flags+0x18d/0x200 fs/overlayfs/copy_up.c:1223 ovl_rename+0x39e/0x18c0 fs/overlayfs/dir.c:1136 vfs_rename+0xf84/0x20a0 fs/namei.c:4893 ... </TASK> Fixes: b07d5cc93e1b ("ovl: update of dentry revalidate flags after copy up") Reported-by: syzbot+316db8a1191938280eb6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=316db8a1191938280eb6 Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Link: https://lore.kernel.org/r/20250214215148.761147-1-kovalev@altlinux.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-27nilfs2: handle errors that nilfs_prepare_chunk() may returnRyusuke Konishi
commit ee70999a988b8abc3490609142f50ebaa8344432 upstream. Patch series "nilfs2: fix issues with rename operations". This series fixes BUG_ON check failures reported by syzbot around rename operations, and a minor behavioral issue where the mtime of a child directory changes when it is renamed instead of moved. This patch (of 2): The directory manipulation routines nilfs_set_link() and nilfs_delete_entry() rewrite the directory entry in the folio/page previously read by nilfs_find_entry(), so error handling is omitted on the assumption that nilfs_prepare_chunk(), which prepares the buffer for rewriting, will always succeed for these. And if an error is returned, it triggers the legacy BUG_ON() checks in each routine. This assumption is wrong, as proven by syzbot: the buffer layer called by nilfs_prepare_chunk() may call nilfs_get_block() if necessary, which may fail due to metadata corruption or other reasons. This has been there all along, but improved sanity checks and error handling may have made it more reproducible in fuzzing tests. Fix this issue by adding missing error paths in nilfs_set_link(), nilfs_delete_entry(), and their caller nilfs_rename(). [konishi.ryusuke@gmail.com: adjusted for page/folio conversion] Link: https://lkml.kernel.org/r/20250111143518.7901-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20250111143518.7901-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+32c3706ebf5d95046ea1@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=32c3706ebf5d95046ea1 Reported-by: syzbot+1097e95f134f37d9395c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1097e95f134f37d9395c Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations") Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27nilfs2: eliminate staggered calls to kunmap in nilfs_renameRyusuke Konishi
commit 8cf57c6df818f58fdad16a909506be213623a88e upstream. In nilfs_rename(), calls to nilfs_put_page() to release pages obtained with nilfs_find_entry() or nilfs_dotdot() are alternated in the normal path. When replacing the kernel memory mapping method from kmap to kmap_local_{page,folio}, this violates the constraint on the calling order of kunmap_local(). Swap the order of nilfs_put_page calls where the kmap sections of multiple pages overlap so that they are nested, allowing direct replacement of nilfs_put_page() -> unmap_and_put_page(). Without this reordering, that replacement will cause a kernel WARNING in kunmap_local_indexed() on architectures with high memory mapping. Link: https://lkml.kernel.org/r/20231127143036.2425-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: ee70999a988b ("nilfs2: handle errors that nilfs_prepare_chunk() may return") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27nilfs2: move page release outside of nilfs_delete_entry and nilfs_set_linkRyusuke Konishi
commit 584db20c181f5e28c0386d7987406ace7fbd3e49 upstream. Patch series "nilfs2: Folio conversions for directory paths". This series applies page->folio conversions to nilfs2 directory operations. This reduces hidden compound_head() calls and also converts deprecated kmap calls to kmap_local in the directory code. Although nilfs2 does not yet support large folios, Matthew has done his best here to include support for large folios, which will be needed for devices with large block sizes. This series corresponds to the second half of the original post [1], but with two complementary patches inserted at the beginning and some adjustments, to prevent a kmap_local constraint violation found during testing with highmem mapping. [1] https://lkml.kernel.org/r/20231106173903.1734114-1-willy@infradead.org I have reviewed all changes and tested this for regular and small block sizes, both on machines with and without highmem mapping. No issues found. This patch (of 17): In a few directory operations, the call to nilfs_put_page() for a page obtained using nilfs_find_entry() or nilfs_dotdot() is hidden in nilfs_set_link() and nilfs_delete_entry(), making it difficult to track page release and preventing change of its call position. By moving nilfs_put_page() out of these functions, this makes the page get/put correspondence clearer and makes it easier to swap nilfs_put_page() calls (and kunmap calls within them) when modifying multiple directory entries simultaneously in nilfs_rename(). Also, update comments for nilfs_set_link() and nilfs_delete_entry() to reflect changes in their behavior. To make nilfs_put_page() visible from namei.c, this moves its definition to nilfs.h and replaces existing equivalents to use it, but the exposure of that definition is temporary and will be removed on a later kmap -> kmap_local conversion. Link: https://lkml.kernel.org/r/20231127143036.2425-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20231127143036.2425-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: ee70999a988b ("nilfs2: handle errors that nilfs_prepare_chunk() may return") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27smb: client: Add check for next_buffer in receive_encrypted_standard()Haoxiang Li
commit 860ca5e50f73c2a1cef7eefc9d39d04e275417f7 upstream. Add check for the return value of cifs_buf_get() and cifs_small_buf_get() in receive_encrypted_standard() to prevent null pointer dereference. Fixes: eec04ea11969 ("smb: client: fix OOB in receive_encrypted_standard()") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: don't over-report free space or inodes in statvfsDarrick J. Wong
[ Upstream commit 4b8d867ca6e2fc6d152f629fdaf027053b81765a ] Emmanual Florac reports a strange occurrence when project quota limits are enabled, free space is lower than the remaining quota, and someone runs statvfs: # mkfs.xfs -f /dev/sda # mount /dev/sda /mnt -o prjquota # xfs_quota -x -c 'limit -p bhard=2G 55' /mnt # mkdir /mnt/dir # xfs_io -c 'chproj 55' -c 'chattr +P' -c 'stat -vvvv' /mnt/dir # fallocate -l 19g /mnt/a # df /mnt /mnt/dir Filesystem Size Used Avail Use% Mounted on /dev/sda 20G 20G 345M 99% /mnt /dev/sda 2.0G 0 2.0G 0% /mnt I think the bug here is that xfs_fill_statvfs_from_dquot unconditionally assigns to f_bfree without checking that the filesystem has enough free space to fill the remaining project quota. However, this is a longstanding behavior of xfs so it's unclear what to do here. Cc: <stable@vger.kernel.org> # v2.6.18 Fixes: 932f2c323196c2 ("[XFS] statvfs component of directory/project quota support, code originally by Glen.") Reported-by: Emmanuel Florac <eflorac@intellique.com> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-27xfs: report realtime block quota limits on realtime directoriesDarrick J. Wong
[ Upstream commit 9a17ebfea9d0c7e0bb7409dcf655bf982a5d6e52 ] On the data device, calling statvfs on a projinherit directory results in the block and avail counts being curtailed to the project quota block limits, if any are set. Do the same for realtime files or directories, only use the project quota rt block limits. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Stable-dep-of: 4b8d867ca6e2 ("xfs: don't over-report free space or inodes in statvfs") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-27xfs: Check for delayed allocations before setting extsizeOjaswin Mujoo
commit 2a492ff66673c38a77d0815d67b9a8cce2ef57f8 upstream. Extsize should only be allowed to be set on files with no data in it. For this, we check if the files have extents but miss to check if delayed extents are present. This patch adds that check. While we are at it, also refactor this check into a helper since it's used in some other places as well like xfs_inactive() or xfs_ioctl_setattr_xflags() **Without the patch (SUCCEEDS)** $ xfs_io -c 'open -f testfile' -c 'pwrite 0 1024' -c 'extsize 65536' wrote 1024/1024 bytes at offset 0 1 KiB, 1 ops; 0.0002 sec (4.628 MiB/sec and 4739.3365 ops/sec) **With the patch (FAILS as expected)** $ xfs_io -c 'open -f testfile' -c 'pwrite 0 1024' -c 'extsize 65536' wrote 1024/1024 bytes at offset 0 1 KiB, 1 ops; 0.0002 sec (4.628 MiB/sec and 4739.3365 ops/sec) xfs_io: FS_IOC_FSSETXATTR testfile: Invalid argument Fixes: e94af02a9cd7 ("[XFS] fix old xfs_setattr mis-merge from irix; mostly harmless esp if not using xfs rt") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: streamline xfs_filestream_pick_agChristoph Hellwig
commit 81a1e1c32ef474c20ccb9f730afe1ac25b1c62a4 upstream. Directly return the error from xfs_bmap_longest_free_extent instead of breaking from the loop and handling it there, and use a done label to directly jump to the exist when we found a suitable perag structure to reduce the indentation level and pag/max_pag check complexity in the tail of the function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: Reduce unnecessary searches when searching for the best extentsChi Zhiling
commit 3ef22684038aa577c10972ee9c6a2455f5fac941 upstream. Recently, we found that the CPU spent a lot of time in xfs_alloc_ag_vextent_size when the filesystem has millions of fragmented spaces. The reason is that we conducted much extra searching for extents that could not yield a better result, and these searches would cost a lot of time when there were millions of extents to search through. Even if we get the same result length, we don't switch our choice to the new one, so we can definitely terminate the search early. Since the result length cannot exceed the found length, when the found length equals the best result length we already have, we can conclude the search. We did a test in that filesystem: [root@localhost ~]# xfs_db -c freesp /dev/vdb from to extents blocks pct 1 1 215 215 0.01 2 3 994476 1988952 99.99 Before this patch: 0) | xfs_alloc_ag_vextent_size [xfs]() { 0) * 15597.94 us | } After this patch: 0) | xfs_alloc_ag_vextent_size [xfs]() { 0) 19.176 us | } Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: update the pag for the last AG at recovery timeChristoph Hellwig
commit 4a201dcfa1ff0dcfe4348c40f3ad8bd68b97eb6c upstream. Currently log recovery never updates the in-core perag values for the last allocation group when they were grown by growfs. This leads to btree record validation failures for the alloc, ialloc or finotbt trees if a transaction references this new space. Found by Brian's new growfs recovery stress test. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: don't use __GFP_RETRY_MAYFAIL in xfs_initialize_peragChristoph Hellwig
commit 069cf5e32b700f94c6ac60f6171662bdfb04f325 upstream. [backport: uses kmem_zalloc instead of kzalloc] __GFP_RETRY_MAYFAIL increases the likelyhood of allocations to fail, which isn't really helpful during log recovery. Remove the flag and stick to the default GFP_KERNEL policies. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: error out when a superblock buffer update reduces the agcountChristoph Hellwig
commit b882b0f8138ffa935834e775953f1630f89bbb62 upstream. XFS currently does not support reducing the agcount, so error out if a logged sb buffer tries to shrink the agcount. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: update the file system geometry after recoverying superblock buffersChristoph Hellwig
commit 6a18765b54e2e52aebcdb84c3b4f4d1f7cb2c0ca upstream. Primary superblock buffers that change the file system geometry after a growfs operation can affect the operation of later CIL checkpoints that make use of the newly added space and allocation groups. Apply the changes to the in-memory structures as part of recovery pass 2, to ensure recovery works fine for such cases. In the future we should apply the logic to other updates such as features bits as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: pass the exact range to initialize to xfs_initialize_peragChristoph Hellwig
commit 82742f8c3f1a93787a05a00aca50c2a565231f84 upstream. [backport: dependency of 6a18765b] Currently only the new agcount is passed to xfs_initialize_perag, which requires lookups of existing AGs to skip them and complicates error handling. Also pass the previous agcount so that the range that xfs_initialize_perag operates on is exactly defined. That way the extra lookups can be avoided, and error handling can clean up the exact range from the old count to the last added perag structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: Remove empty declartion in header fileZhang Zekun
commit f6225eebd76f371dab98b4d1c1a7c1e255190aef upstream. The definition of xfs_attr_use_log_assist() has been removed since commit d9c61ccb3b09 ("xfs: move xfs_attr_use_log_assist out of xfs_log.c"). So, Remove the empty declartion in header files. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate()Uros Bizjak
commit 20195d011c840b01fa91a85ebcd099ca95fbf8fc upstream. Use !try_cmpxchg instead of cmpxchg (*ptr, old, new) != old in xlog_cil_insert_pcp_aggregate(). x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg. Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop. Note that the value from *ptr should be read using READ_ONCE to prevent the compiler from merging, refetching or reordering the read. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: support lowmode allocations in xfs_bmap_exact_minlen_extent_allocChristoph Hellwig
commit 6aac77059881e4419df499392c995bf02fb9630b upstream. Currently the debug-only xfs_bmap_exact_minlen_extent_alloc allocation variant fails to drop into the lowmode last resort allocator, and thus can sometimes fail allocations for which the caller has a transaction block reservation. Fix this by using xfs_bmap_btalloc_low_space to do the actual allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: call xfs_bmap_exact_minlen_extent_alloc from xfs_bmap_btallocChristoph Hellwig
commit 405ee87c6938f67e6ab62a3f8f85b3c60a093886 upstream. [backport: dependency of 6aac770] xfs_bmap_exact_minlen_extent_alloc duplicates the args setup in xfs_bmap_btalloc. Switch to call it from xfs_bmap_btalloc after doing the basic setup. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: don't ifdef around the exact minlen allocationsChristoph Hellwig
commit b611fddc0435738e64453bbf1dadd4b12a801858 upstream. Exact minlen allocations only exist as an error injection tool for debug builds. Currently this is implemented using ifdefs, which means the code isn't even compiled for non-XFS_DEBUG builds. Enhance the compile test coverage by always building the code and use the compilers' dead code elimination to remove it from the generated binary instead. The only downside is that the alloc_minlen_only field is unconditionally added to struct xfs_alloc_args now, but by moving it around and packing it tightly this doesn't actually increase the size of the structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocateChristoph Hellwig
commit 865469cd41bce2b04bef9539cbf70676878bc8df upstream. [backport: dependency of 6aac770] Userdata and metadata allocations end up in the same allocation helpers. Remove the separate xfs_bmap_alloc_userdata function to make this more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addnameChristoph Hellwig
commit b3f4e84e2f438a119b7ca8684a25452b3e57c0f0 upstream. Just like xfs_attr3_leaf_split, xfs_attr_node_try_addname can return -ENOSPC both for an actual failure to allocate a disk block, but also to signal the caller to convert the format of the attr fork. Use magic 1 to ask for the conversion here as well. Note that unlike the similar issue in xfs_attr3_leaf_split, this one was only found by code review. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_splitChristoph Hellwig
commit a5f73342abe1f796140f6585e43e2aa7bc1b7975 upstream. xfs_attr3_leaf_split propagates the need for an extra btree split as -ENOSPC to it's only caller, but the same return value can also be returned from xfs_da_grow_inode when it fails to find free space. Distinguish the two cases by returning 1 for the extra split case instead of overloading -ENOSPC. This can be triggered relatively easily with the pending realtime group support and a file system with a lot of small zones that use metadata space on the main device. In this case every about 5-10th run of xfs/538 runs into the following assert: ASSERT(oldblk->magic == XFS_ATTR_LEAF_MAGIC); in xfs_attr3_leaf_split caused by an allocation failure. Note that the allocation failure is caused by another bug that will be fixed subsequently, but this commit at least sorts out the error handling. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: return bool from xfs_attr3_leaf_addChristoph Hellwig
commit 346c1d46d4c631c0c88592d371f585214d714da4 upstream. [backport: dependency of a5f7334 and b3f4e84] xfs_attr3_leaf_add only has two potential return values, indicating if the entry could be added or not. Replace the errno return with a bool so that ENOSPC from it can't easily be confused with a real ENOSPC. Remove the return value from the xfs_attr3_leaf_add_work helper entirely, as it always return 0. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addnameChristoph Hellwig
commit b1c649da15c2e4c86344c8e5af69c8afa215efec upstream. [backport: dependency of a5f7334 and b3f4e84] xfs_attr_leaf_try_add is only called by xfs_attr_leaf_addname, and merging the two will simplify a following error handling fix. To facilitate this move the remote block state save/restore helpers up in the file so that they don't need forward declarations now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: don't free cowblocks from under dirty pagecache on unshareBrian Foster
commit 4390f019ad7866c3791c3d768d2ff185d89e8ebe upstream. fallocate unshare mode explicitly breaks extent sharing. When a command completes, it checks the data fork for any remaining shared extents to determine whether the reflink inode flag and COW fork preallocation can be removed. This logic doesn't consider in-core pagecache and I/O state, however, which means we can unsafely remove COW fork blocks that are still needed under certain conditions. For example, consider the following command sequence: xfs_io -fc "pwrite 0 1k" -c "reflink <file> 0 256k 1k" \ -c "pwrite 0 32k" -c "funshare 0 1k" <file> This allocates a data block at offset 0, shares it, and then overwrites it with a larger buffered write. The overwrite triggers COW fork preallocation, 32 blocks by default, which maps the entire 32k write to delalloc in the COW fork. All but the shared block at offset 0 remains hole mapped in the data fork. The unshare command redirties and flushes the folio at offset 0, removing the only shared extent from the inode. Since the inode no longer maps shared extents, unshare purges the COW fork before the remaining 28k may have written back. This leaves dirty pagecache backed by holes, which writeback quietly skips, thus leaving clean, non-zeroed pagecache over holes in the file. To verify, fiemap shows holes in the first 32k of the file and reads return different data across a remount: $ xfs_io -c "fiemap -v" <file> <file>: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS ... 1: [8..511]: hole 504 ... $ xfs_io -c "pread -v 4k 8" <file> 00001000: cd cd cd cd cd cd cd cd ........ $ umount <mnt>; mount <dev> <mnt> $ xfs_io -c "pread -v 4k 8" <file> 00001000: 00 00 00 00 00 00 00 00 ........ To avoid this problem, make unshare follow the same rules used for background cowblock scanning and never purge the COW fork for inodes with dirty pagecache or in-flight I/O. Fixes: 46afb0628b86347 ("xfs: only flush the unshared range in xfs_reflink_unshare") Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: skip background cowblock trims on inodes open for writeBrian Foster
commit 90a71daaf73f5d39bb0cbb3c7ab6af942fe6233e upstream. The background blockgc scanner runs on a 5m interval by default and trims preallocation (post-eof and cow fork) from inodes that are otherwise idle. Idle effectively means that iolock can be acquired without blocking and that the inode has no dirty pagecache or I/O in flight. This simple mechanism and heuristic has worked fairly well for post-eof speculative preallocations. Support for reflink and COW fork preallocations came sometime later and plugged into the same mechanism, with similar heuristics. Some recent testing has shown that COW fork preallocation may be notably more sensitive to blockgc processing than post-eof preallocation, however. For example, consider an 8GB reflinked file with a COW extent size hint of 1MB. A worst case fully randomized overwrite of this file results in ~8k extents of an average size of ~1MB. If the same workload is interrupted a couple times for blockgc processing (assuming the file goes idle), the resulting extent count explodes to over 100k extents with an average size <100kB. This is significantly worse than ideal and essentially defeats the COW extent size hint mechanism. While this particular test is instrumented, it reflects a fairly reasonable pattern in practice where random I/Os might spread out over a large period of time with varying periods of (in)activity. For example, consider a cloned disk image file for a VM or container with long uptime and variable and bursty usage. A background blockgc scan that races and processes the image file when it happens to be clean and idle can have a significant effect on the future fragmentation level of the file, even when still in use. To help combat this, update the heuristic to skip cowblocks inodes that are currently opened for write access during non-sync blockgc scans. This allows COW fork preallocations to persist for as long as possible unless otherwise needed for functional purposes (i.e. a sync scan), the file is idle and closed, or the inode is being evicted from cache. While here, update the comments to help distinguish performance oriented heuristics from the logic that exists to maintain functional correctness. Suggested-by: Darrick Wong <djwong@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: fix a typoAndrew Kreimer
commit 77bfe1b11ea0c0c4b0ce19b742cd1aa82f60e45d upstream. Fix a typo in comments. Signed-off-by: Andrew Kreimer <algonell@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: fix a sloppy memory handling bug in xfs_iroot_reallocDarrick J. Wong
commit de55149b6639e903c4d06eb0474ab2c05060e61d upstream. While refactoring code, I noticed that when xfs_iroot_realloc tries to shrink a bmbt root block, it allocates a smaller new block and then copies "records" and pointers to the new block. However, bmbt root blocks cannot ever be leaves, which means that it's not technically correct to copy records. We /should/ be copying keys. Note that this has never resulted in actual memory corruption because sizeof(bmbt_rec) == (sizeof(bmbt_key) + sizeof(bmbt_ptr)). However, this will no longer be true when we start adding realtime rmap stuff, so fix this now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: validate inumber in xfs_igetDarrick J. Wong
commit 05aba1953f4a6e2b48e13c610e8a4545ba4ef509 upstream. Actually use the inumber validator to check the argument passed in here. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27xfs: assert a valid limit in xfs_rtfind_forwChristoph Hellwig
commit 6d2db12d56a389b3e8efa236976f8dc3a8ae00f0 upstream. Protect against developers passing stupid limits when refactoring the RT code once again. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21btrfs: fix hole expansion when writing at an offset beyond EOFFilipe Manana
commit da2dccd7451de62b175fb8f0808d644959e964c7 upstream. At btrfs_write_check() if our file's i_size is not sector size aligned and we have a write that starts at an offset larger than the i_size that falls within the same page of the i_size, then we end up not zeroing the file range [i_size, write_offset). The code is this: start_pos = round_down(pos, fs_info->sectorsize); oldsize = i_size_read(inode); if (start_pos > oldsize) { /* Expand hole size to cover write data, preventing empty gap */ loff_t end_pos = round_up(pos + count, fs_info->sectorsize); ret = btrfs_cont_expand(BTRFS_I(inode), oldsize, end_pos); if (ret) return ret; } So if our file's i_size is 90269 bytes and a write at offset 90365 bytes comes in, we get 'start_pos' set to 90112 bytes, which is less than the i_size and therefore we don't zero out the range [90269, 90365) by calling btrfs_cont_expand(). This is an old bug introduced in commit 9036c10208e1 ("Btrfs: update hole handling v2"), from 2008, and the buggy code got moved around over the years. Fix this by discarding 'start_pos' and comparing against the write offset ('pos') without any alignment. This bug was recently exposed by test case generic/363 which tests this scenario by polluting ranges beyond EOF with an mmap write and than verify that after a file increases we get zeroes for the range which is supposed to be a hole and not what we wrote with the previous mmaped write. We're only seeing this exposed now because generic/363 used to run only on xfs until last Sunday's fstests update. The test was failing like this: $ ./check generic/363 FSTYP -- btrfs PLATFORM -- Linux/x86_64 debian0 6.13.0-rc7-btrfs-next-185+ #17 SMP PREEMPT_DYNAMIC Mon Feb 3 12:28:46 WET 2025 MKFS_OPTIONS -- /dev/sdc MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1 generic/363 0s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/363.out.bad) # --- tests/generic/363.out 2025-02-05 15:31:14.013646509 +0000 # +++ /home/fdmanana/git/hub/xfstests/results//generic/363.out.bad 2025-02-05 17:25:33.112630781 +0000 @@ -1 +1,46 @@ QA output created by 363 +READ BAD DATA: offset = 0xdcad, size = 0xd921, fname = /home/fdmanana/btrfs-tests/dev/junk +OFFSET GOOD BAD RANGE +0x1609d 0x0000 0x3104 0x0 +operation# (mod 256) for the bad data may be 4 +0x1609e 0x0000 0x0472 0x1 +operation# (mod 256) for the bad data may be 4 ... (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/generic/363.out /home/fdmanana/git/hub/xfstests/results//generic/363.out.bad' to see the entire diff) Ran: generic/363 Failures: generic/363 Failed 1 of 1 tests Fixes: 9036c10208e1 ("Btrfs: update hole handling v2") CC: stable@vger.kernel.org Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21orangefs: fix a oob in orangefs_debug_writeMike Marshall
[ Upstream commit f7c848431632598ff9bce57a659db6af60d75b39 ] I got a syzbot report: slab-out-of-bounds Read in orangefs_debug_write... several people suggested fixes, I tested Al Viro's suggestion and made this patch. Signed-off-by: Mike Marshall <hubcap@omnibond.com> Reported-by: syzbot+fc519d7875f2d9186c1f@syzkaller.appspotmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-21NFS: Fix potential buffer overflowin nfs_sysfs_link_rpc_client()Zichen Xie
[ Upstream commit 49fd4e34751e90e6df009b70cd0659dc839e7ca8 ] name is char[64] where the size of clnt->cl_program->name remains unknown. Invoking strcat() directly will also lead to potential buffer overflow. Change them to strscpy() and strncat() to fix potential issues. Signed-off-by: Zichen Xie <zichenxie0106@gmail.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-21NFSD: fix hang in nfsd4_shutdown_callbackDai Ngo
commit 036ac2778f7b28885814c6fbc07e156ad1624d03 upstream. If nfs4_client is in courtesy state then there is no point to send the callback. This causes nfsd4_shutdown_callback to hang since cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP notifies NFSD that the connection was dropped. This patch modifies nfsd4_run_cb_work to skip the RPC call if nfs4_client is in courtesy state. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Fixes: 66af25799940 ("NFSD: add courteous server support for thread with only delegation") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21nfsd: clear acl_access/acl_default after releasing themLi Lingfeng
commit 7faf14a7b0366f153284db0ad3347c457ea70136 upstream. If getting acl_default fails, acl_access and acl_default will be released simultaneously. However, acl_access will still retain a pointer pointing to the released posix_acl, which will trigger a WARNING in nfs3svc_release_getacl like this: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 26 PID: 3199 at lib/refcount.c:28 refcount_warn_saturate+0xb5/0x170 Modules linked in: CPU: 26 UID: 0 PID: 3199 Comm: nfsd Not tainted 6.12.0-rc6-00079-g04ae226af01f-dirty #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:refcount_warn_saturate+0xb5/0x170 Code: cc cc 0f b6 1d b3 20 a5 03 80 fb 01 0f 87 65 48 d8 00 83 e3 01 75 e4 48 c7 c7 c0 3b 9b 85 c6 05 97 20 a5 03 01 e8 fb 3e 30 ff <0f> 0b eb cd 0f b6 1d 8a3 RSP: 0018:ffffc90008637cd8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83904fde RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88871ed36380 RBP: ffff888158beeb40 R08: 0000000000000001 R09: fffff520010c6f56 R10: ffffc90008637ab7 R11: 0000000000000001 R12: 0000000000000001 R13: ffff888140e77400 R14: ffff888140e77408 R15: ffffffff858b42c0 FS: 0000000000000000(0000) GS:ffff88871ed00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000562384d32158 CR3: 000000055cc6a000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? refcount_warn_saturate+0xb5/0x170 ? __warn+0xa5/0x140 ? refcount_warn_saturate+0xb5/0x170 ? report_bug+0x1b1/0x1e0 ? handle_bug+0x53/0xa0 ? exc_invalid_op+0x17/0x40 ? asm_exc_invalid_op+0x1a/0x20 ? tick_nohz_tick_stopped+0x1e/0x40 ? refcount_warn_saturate+0xb5/0x170 ? refcount_warn_saturate+0xb5/0x170 nfs3svc_release_getacl+0xc9/0xe0 svc_process_common+0x5db/0xb60 ? __pfx_svc_process_common+0x10/0x10 ? __rcu_read_unlock+0x69/0xa0 ? __pfx_nfsd_dispatch+0x10/0x10 ? svc_xprt_received+0xa1/0x120 ? xdr_init_decode+0x11d/0x190 svc_process+0x2a7/0x330 svc_handle_xprt+0x69d/0x940 svc_recv+0x180/0x2d0 nfsd+0x168/0x200 ? __pfx_nfsd+0x10/0x10 kthread+0x1a2/0x1e0 ? kthread+0xf4/0x1e0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Kernel panic - not syncing: kernel: panic_on_warn set ... Clear acl_access/acl_default after posix_acl_release is called to prevent UAF from being triggered. Fixes: a257cdd0e217 ("[PATCH] NFSD: Add server support for NFSv3 ACLs.") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241107014705.2509463-1-lilingfeng@huaweicloud.com/ Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Rick Macklem <rmacklem@uoguelph.ca> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17cachefiles: Fix NULL pointer dereference in object->fileZizhi Wo
commit 31ad74b20227ce6b40910ff78b1c604e42975cf1 upstream. At present, the object->file has the NULL pointer dereference problem in ondemand-mode. The root cause is that the allocated fd and object->file lifetime are inconsistent, and the user-space invocation to anon_fd uses object->file. Following is the process that triggers the issue: [write fd] [umount] cachefiles_ondemand_fd_write_iter fscache_cookie_state_machine cachefiles_withdraw_cookie if (!file) return -ENOBUFS cachefiles_clean_up_object cachefiles_unmark_inode_in_use fput(object->file) object->file = NULL // file NULL pointer dereference! __cachefiles_write(..., file, ...) Fix this issue by add an additional reference count to the object->file before write/llseek, and decrement after it finished. Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Link: https://lore.kernel.org/r/20241107110649.3980193-5-wozizhi@huawei.com Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Bin Lan <lanbincn@qq.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17btrfs: avoid monopolizing a core when activating a swap fileFilipe Manana
commit 2c8507c63f5498d4ee4af404a8e44ceae4345056 upstream. This commit re-attempts the backport of the change to the linux-6.6.y branch. Commit 6e1a82259307 ("btrfs: avoid monopolizing a core when activating a swap file") on this branch was reverted. During swap activation we iterate over the extents of a file and we can have many thousands of them, so we can end up in a busy loop monopolizing a core. Avoid this by doing a voluntary reschedule after processing each extent. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Koichiro Den <koichiro.den@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17Revert "btrfs: avoid monopolizing a core when activating a swap file"Koichiro Den
This reverts commit 6e1a8225930719a9f352d56320214e33e2dde0a6. The backport for linux-6.6.y, commit 6e1a82259307 ("btrfs: avoid monopolizing a core when activating a swap file"), inserted cond_resched() in the wrong location. Revert it now; a subsequent commit will re-backport the original patch. Fixes: 6e1a82259307 ("btrfs: avoid monopolizing a core when activating a swap file") # linux-6.6.y Signed-off-by: Koichiro Den <koichiro.den@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17ocfs2: check dir i_size in ocfs2_find_entrySu Yue
commit b0fce54b8c0d8e5f2b4c243c803c5996e73baee8 upstream. syz reports an out of bounds read: ================================================================== BUG: KASAN: slab-out-of-bounds in ocfs2_match fs/ocfs2/dir.c:334 [inline] BUG: KASAN: slab-out-of-bounds in ocfs2_search_dirblock+0x283/0x6e0 fs/ocfs2/dir.c:367 Read of size 1 at addr ffff88804d8b9982 by task syz-executor.2/14802 CPU: 0 UID: 0 PID: 14802 Comm: syz-executor.2 Not tainted 6.13.0-rc4 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Sched_ext: serialise (enabled+all), task: runnable_at=-10ms Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x229/0x350 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0x164/0x530 mm/kasan/report.c:489 kasan_report+0x147/0x180 mm/kasan/report.c:602 ocfs2_match fs/ocfs2/dir.c:334 [inline] ocfs2_search_dirblock+0x283/0x6e0 fs/ocfs2/dir.c:367 ocfs2_find_entry_id fs/ocfs2/dir.c:414 [inline] ocfs2_find_entry+0x1143/0x2db0 fs/ocfs2/dir.c:1078 ocfs2_find_files_on_disk+0x18e/0x530 fs/ocfs2/dir.c:1981 ocfs2_lookup_ino_from_name+0xb6/0x110 fs/ocfs2/dir.c:2003 ocfs2_lookup+0x30a/0xd40 fs/ocfs2/namei.c:122 lookup_open fs/namei.c:3627 [inline] open_last_lookups fs/namei.c:3748 [inline] path_openat+0x145a/0x3870 fs/namei.c:3984 do_filp_open+0xe9/0x1c0 fs/namei.c:4014 do_sys_openat2+0x135/0x1d0 fs/open.c:1402 do_sys_open fs/open.c:1417 [inline] __do_sys_openat fs/open.c:1433 [inline] __se_sys_openat fs/open.c:1428 [inline] __x64_sys_openat+0x15d/0x1c0 fs/open.c:1428 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf6/0x210 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f01076903ad Code: c3 e8 a7 2b 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f01084acfc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 00007f01077cbf80 RCX: 00007f01076903ad RDX: 0000000000105042 RSI: 0000000020000080 RDI: ffffffffffffff9c RBP: 00007f01077cbf80 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000000001ff R11: 0000000000000246 R12: 0000000000000000 R13: 00007f01077cbf80 R14: 00007f010764fc90 R15: 00007f010848d000 </TASK> ================================================================== And a general protection fault in ocfs2_prepare_dir_for_insert: ================================================================== loop0: detected capacity change from 0 to 32768 JBD2: Ignoring recovery information on journal ocfs2: Mounting device (7,0) on (node local, slot 0) with ordered data mode. Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 UID: 0 PID: 5096 Comm: syz-executor792 Not tainted 6.11.0-rc4-syzkaller-00002-gb0da640826ba #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:ocfs2_find_dir_space_id fs/ocfs2/dir.c:3406 [inline] RIP: 0010:ocfs2_prepare_dir_for_insert+0x3309/0x5c70 fs/ocfs2/dir.c:4280 Code: 00 00 e8 2a 25 13 fe e9 ba 06 00 00 e8 20 25 13 fe e9 4f 01 00 00 e8 16 25 13 fe 49 8d 7f 08 49 8d 5f 09 48 89 f8 48 c1 e8 03 <42> 0f b6 04 20 84 c0 0f 85 bd 23 00 00 48 89 d8 48 c1 e8 03 42 0f RSP: 0018:ffffc9000af9f020 EFLAGS: 00010202 RAX: 0000000000000001 RBX: 0000000000000009 RCX: ffff88801e27a440 RDX: 0000000000000000 RSI: 0000000000000400 RDI: 0000000000000008 RBP: ffffc9000af9f830 R08: ffffffff8380395b R09: ffffffff838090a7 R10: 0000000000000002 R11: ffff88801e27a440 R12: dffffc0000000000 R13: ffff88803c660878 R14: f700000000000088 R15: 0000000000000000 FS: 000055555a677380(0000) GS:ffff888020800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000560bce569178 CR3: 000000001de5a000 CR4: 0000000000350ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ocfs2_mknod+0xcaf/0x2b40 fs/ocfs2/namei.c:292 vfs_mknod+0x36d/0x3b0 fs/namei.c:4088 do_mknodat+0x3ec/0x5b0 __do_sys_mknodat fs/namei.c:4166 [inline] __se_sys_mknodat fs/namei.c:4163 [inline] __x64_sys_mknodat+0xa7/0xc0 fs/namei.c:4163 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f2dafda3a99 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffe336a6658 EFLAGS: 00000246 ORIG_RAX: 0000000000000103 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2dafda3a99 RDX: 00000000000021c0 RSI: 0000000020000040 RDI: 00000000ffffff9c RBP: 00007f2dafe1b5f0 R08: 0000000000004480 R09: 000055555a6784c0 R10: 0000000000000103 R11: 0000000000000246 R12: 00007ffe336a6680 R13: 00007ffe336a68a8 R14: 431bde82d7b634db R15: 00007f2dafdec03b </TASK> ================================================================== The two reports are all caused invalid negative i_size of dir inode. For ocfs2, dir_inode can't be negative or zero. Here add a check in which is called by ocfs2_check_dir_for_entry(). It fixes the second report as ocfs2_check_dir_for_entry() must be called before ocfs2_prepare_dir_for_insert(). Also set a up limit for dir with OCFS2_INLINE_DATA_FL. The i_size can't be great than blocksize. Link: https://lkml.kernel.org/r/20250106140640.92260-1-glass.su@suse.com Reported-by: Jiacheng Xu <stitch@zju.edu.cn> Link: https://lore.kernel.org/ocfs2-devel/17a04f01.1ae74.19436d003fc.Coremail.stitch@zju.edu.cn/T/#u Reported-by: syzbot+5a64828fcc4c2ad9b04f@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/0000000000005894f3062018caf1@google.com/T/ Signed-off-by: Su Yue <glass.su@suse.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17nilfs2: fix possible int overflows in nilfs_fiemap()Nikita Zhandarovich
commit 6438ef381c183444f7f9d1de18f22661cba1e946 upstream. Since nilfs_bmap_lookup_contig() in nilfs_fiemap() calculates its result by being prepared to go through potentially maxblocks == INT_MAX blocks, the value in n may experience an overflow caused by left shift of blkbits. While it is extremely unlikely to occur, play it safe and cast right hand expression to wider type to mitigate the issue. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Link: https://lkml.kernel.org/r/20250124222133.5323-1-konishi.ryusuke@gmail.com Fixes: 622daaff0a89 ("nilfs2: fiemap support") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17ocfs2: handle a symlink read error correctlyMatthew Wilcox (Oracle)
commit 2b4c2094da6d84e69b843dd3317902e977bf64bd upstream. Patch series "Convert ocfs2 to use folios". Mark did a conversion of ocfs2 to use folios and sent it to me as a giant patch for review ;-) So I've redone it as individual patches, and credited Mark for the patches where his code is substantially the same. It's not a bad way to do it; his patch had some bugs and my patches had some bugs. Hopefully all our bugs were different from each other. And hopefully Mark likes all the changes I made to his code! This patch (of 23): If we can't read the buffer, be sure to unlock the page before returning. Link: https://lkml.kernel.org/r/20241205171653.3179945-1-willy@infradead.org Link: https://lkml.kernel.org/r/20241205171653.3179945-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Mark Tinguely <mark.tinguely@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17ocfs2: fix incorrect CPU endianness conversion causing mount failureHeming Zhao
commit f921da2c34692dfec5f72b5ae347b1bea22bb369 upstream. Commit 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()") introduced a regression bug. The blksz_bits value is already converted to CPU endian in the previous code; therefore, the code shouldn't use le32_to_cpu() anymore. Link: https://lkml.kernel.org/r/20250121112204.12834-1-heming.zhao@suse.com Fixes: 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()") Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17pnfs/flexfiles: retry getting layout segment for readsMike Snitzer
commit eb3fabde15bccdf34f1c9b35a83aa4c0dacbb4ca upstream. If ff_layout_pg_get_read()'s attempt to get a layout segment results in -EAGAIN have ff_layout_pg_init_read() retry it after sleeping. If "softerr" mount is used, use 'io_maxretrans' to limit the number of attempts to get a layout segment. This fixes a long-standing issue of O_DIRECT reads failing with -EAGAIN (11) when using flexfiles Client Side Mirroring (CSM). Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17xfs: Add error handling for xfs_reflink_cancel_cow_rangeWentao Liang
commit 26b63bee2f6e711c5a169997fd126fddcfb90848 upstream. In xfs_inactive(), xfs_reflink_cancel_cow_range() is called without error handling, risking unnoticed failures and inconsistent behavior compared to other parts of the code. Fix this issue by adding an error handling for the xfs_reflink_cancel_cow_range(), improving code robustness. Fixes: 6231848c3aa5 ("xfs: check for cow blocks before trying to clear them") Cc: stable@vger.kernel.org # v4.17 Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17xfs: Propagate errors from xfs_reflink_cancel_cow_range in ↵Wentao Liang
xfs_dax_write_iomap_end commit fb95897b8c60653805aa09daec575ca30983f768 upstream. In xfs_dax_write_iomap_end(), directly return the result of xfs_reflink_cancel_cow_range() when !written, ensuring proper error propagation and improving code robustness. Fixes: ea6c49b784f0 ("xfs: support CoW in fsdax mode") Cc: stable@vger.kernel.org # v6.0 Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17ksmbd: fix integer overflows on 32 bit systemsDan Carpenter
commit aab98e2dbd648510f8f51b83fbf4721206ccae45 upstream. On 32bit systems the addition operations in ipc_msg_alloc() can potentially overflow leading to memory corruption. Add bounds checking using KSMBD_IPC_MAX_PAYLOAD to avoid overflow. Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17binfmt_flat: Fix integer overflow bug on 32 bit systemsDan Carpenter
commit 55cf2f4b945f6a6416cc2524ba740b83cc9af25a upstream. Most of these sizes and counts are capped at 256MB so the math doesn't result in an integer overflow. The "relocs" count needs to be checked as well. Otherwise on 32bit systems the calculation of "full_data" could be wrong. full_data = data_len + relocs * sizeof(unsigned long); Fixes: c995ee28d29d ("binfmt_flat: prevent kernel dammage from corrupted executable headers") Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Nicolas Pitre <npitre@baylibre.com> Link: https://lore.kernel.org/r/5be17f6c-5338-43be-91ef-650153b975cb@stanley.mountain Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>