summaryrefslogtreecommitdiff
path: root/fs/ext4/super.c
AgeCommit message (Collapse)Author
2025-06-08treewide, timers: Rename from_timer() to timer_container_of()Ingo Molnar
Move this API to the canonical timer_*() namespace. [ tglx: Redone against pre rc1 ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-05-20ext4: Enable support for ext4 multi-fsblock atomic write using bigallocRitesh Harjani (IBM)
Last couple of patches added the needed support for multi-fsblock atomic writes using bigalloc. This patch ensures that filesystem advertizes the needed atomic write unit min and max values for enabling multi-fsblock atomic write support with bigalloc. Acked-by: Darrick J. Wong <djwong@kernel.org> Co-developed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/5e45d7ed24499024b9079436ba6698dae5298e29.1747337952.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: remove sb argument from ext4_superblock_csum()Eric Biggers
Since ext4_superblock_csum() no longer uses its sb argument, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250513053809.699974-3-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: remove sbi argument from ext4_chksum()Eric Biggers
Since ext4_chksum() no longer uses its sbi argument, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250513053809.699974-2-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: use writeback_iter in ext4_journalled_submit_inode_data_buffersChristoph Hellwig
Use writeback_iter directly instead of write_cache_pages for a nicer code structure and less indirect calls. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250505091604.3449879-1-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-08ext4: convert s_fc_lock to mutex typeHarshad Shirwadkar
This allows us to hold s_fc_lock during kmem_cache_* functions, which is needed in the following patch. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250508175908.1004880-9-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-08ext4: temporarily elevate commit thread priorityHarshad Shirwadkar
Unlike JBD2 based full commits, there is no dedicated journal thread for fast commits. Thus to reduce scheduling delays between IO submission and completion, temporarily elevate the committer thread's priority to match the configured priority of the JBD2 journal thread. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250508175908.1004880-8-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-08ext4: convert i_fc_lock to spinlockHarshad Shirwadkar
Convert ext4_inode_info->i_fc_lock to spinlock to avoid sleeping in invalid contexts. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://patch.msgid.link/20250508175908.1004880-2-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-04-05treewide: Switch/rename to timer_delete[_sync]()Thomas Gleixner
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-21ext4: on a remount, only log the ro or r/w state when it has changedNicolas Bretz
A user complained that a message such as: EXT4-fs (nvme0n1p3): re-mounted UUID ro. Quota mode: none. implied that the file system was previously mounted read/write and was now remounted read-only, when it could have been some other mount state that had changed by the "mount -o remount" operation. Fix this by only logging "ro"or "r/w" when it has changed. https://bugzilla.kernel.org/show_bug.cgi?id=219132 Signed-off-by: Nicolas Bretz <bretznic@gmail.com> Link: https://patch.msgid.link/20250319171011.8372-1-bretznic@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-21ext4: Make sb update interval tunableOjaswin Mujoo
Currently, outside error paths, we auto commit the super block after 1 hour has passed and 16MB worth of updates have been written since last commit. This is a policy decision so make this tunable while keeping the defaults same. This is useful if user wants to tweak the superblock behavior or for debugging the codepath by allowing to trigger it more frequently. We can now tweak the super block update using sb_update_sec and sb_update_kb files in /sys/fs/ext4/<dev>/ Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/950fb8c9b2905620e16f02a3b9eeea5a5b6cb87e.1742279837.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-21ext4: avoid journaling sb update on error if journal is destroyingOjaswin Mujoo
Presently we always BUG_ON if trying to start a transaction on a journal marked with JBD2_UNMOUNT, since this should never happen. However, while ltp running stress tests, it was observed that in case of some error handling paths, it is possible for update_super_work to start a transaction after the journal is destroyed eg: (umount) ext4_kill_sb kill_block_super generic_shutdown_super sync_filesystem /* commits all txns */ evict_inodes /* might start a new txn */ ext4_put_super flush_work(&sbi->s_sb_upd_work) /* flush the workqueue */ jbd2_journal_destroy journal_kill_thread journal->j_flags |= JBD2_UNMOUNT; jbd2_journal_commit_transaction jbd2_journal_get_descriptor_buffer jbd2_journal_bmap ext4_journal_bmap ext4_map_blocks ... ext4_inode_error ext4_handle_error schedule_work(&sbi->s_sb_upd_work) /* work queue kicks in */ update_super_work jbd2_journal_start start_this_handle BUG_ON(journal->j_flags & JBD2_UNMOUNT) Hence, introduce a new mount flag to indicate journal is destroying and only do a journaled (and deferred) update of sb if this flag is not set. Otherwise, just fallback to an un-journaled commit. Further, in the journal destroy path, we have the following sequence: 1. Set mount flag indicating journal is destroying 2. force a commit and wait for it 3. flush pending sb updates This sequence is important as it ensures that, after this point, there is no sb update that might be journaled so it is safe to update the sb outside the journal. (To avoid race discussed in 2d01ddc86606) Also, we don't need a similar check in ext4_grp_locked_error since it is only called from mballoc and AFAICT it would be always valid to schedule work here. Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available") Reported-by: Mahesh Kumar <maheshkumar657g@gmail.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/9613c465d6ff00cd315602f99283d5f24018c3f7.1742279837.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-21ext4: define ext4_journal_destroy wrapperOjaswin Mujoo
Define an ext4 wrapper over jbd2_journal_destroy to make sure we have consistent behavior during journal destruction. This will also come useful in the next patch where we add some ext4 specific logic in the destroy path. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/c3ba78c5c419757e6d5f2d8ebb4a8ce9d21da86a.1742279837.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-20ext4: don't over-report free space or inodes in statvfsTheodore Ts'o
This fixes an analogus bug that was fixed in xfs in commit 4b8d867ca6e2 ("xfs: don't over-report free space or inodes in statvfs") where statfs can report misleading / incorrect information where project quota is enabled, and the free space is less than the remaining quota. This commit will resolve a test failure in generic/762 which tests for this bug. Cc: stable@kernel.org Fixes: 689c958cbe6b ("ext4: add project quota support") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2025-03-18ext4: clear DISCARD flag if device does not support discardDiangang Li
commit 79add3a3f795e ("ext4: notify when discard is not supported") noted that keeping the DISCARD flag is for possibility that the underlying device might change in future even without file system remount. However, this scenario has rarely occurred in practice on the device side. Even if it does occur, it can be resolved with remount. Clearing the DISCARD flag not only prevents confusion caused by mount options but also avoids sending unnecessary discard commands. Signed-off-by: Diangang Li <lidiangang@bytedance.com> Link: https://patch.msgid.link/20250311021310.669524-1-lidiangang@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-18ext4: remove references to bh->b_pageMatthew Wilcox (Oracle)
Buffer heads are attached to folios, not to pages. Also flush_dcache_page() is now deprecated in favour of flush_dcache_folio(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250213182303.2133205-1-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-17ext4: remove redundant function ext4_has_metadata_csumEric Biggers
Since commit f2b4fa19647e ("ext4: switch to using the crc32c library"), ext4_has_metadata_csum() is just an alias for ext4_has_feature_metadata_csum(). ext4_has_feature_metadata_csum() is generated by EXT4_FEATURE_RO_COMPAT_FUNCS and uses the regular naming convention for checking a single ext4 feature. Therefore, remove ext4_has_metadata_csum() and update all its callers to use ext4_has_feature_metadata_csum() directly. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/20250207031335.42637-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13ext4: show 'shutdown' hint when ext4 is forced to shutdownBaokun Li
Now, if dmesg is cleared, we have no way of knowing if the file system has been shutdown. Moreover, ext4 allows directory reads even after the file system has been shutdown, so when reading a file returns -EIO, we cannot determine whether this is a hardware issue or if the file system has been shutdown. Therefore, when ext4 file system is shutdown, we're adding a 'shutdown' hint to commands like mount so users can easily check the file system's status. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122114130.229709-8-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13ext4: show 'emergency_ro' when EXT4_FLAGS_EMERGENCY_RO is setBaokun Li
After commit d3476f3dad4a ("ext4: don't set SB_RDONLY after filesystem errors") in v6.12-rc1, the 'errors=remount-ro' mode no longer sets SB_RDONLY on errors, which results in us seeing the filesystem is still in rw state after errors. Therefore, after setting EXT4_FLAGS_EMERGENCY_RO, display the emergency_ro option so that users can query whether the current file system has become emergency read-only due to errors through commands such as 'mount' or 'cat /proc/fs/ext4/sdx/options'. Fixes: d3476f3dad4a ("ext4: don't set SB_RDONLY after filesystem errors") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122114130.229709-7-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13ext4: correct behavior under errors=remount-ro modeBaokun Li
And after commit 95257987a638 ("ext4: drop EXT4_MF_FS_ABORTED flag") in v6.6-rc1, the EXT4_FLAGS_SHUTDOWN bit is set in ext4_handle_error() under errors=remount-ro mode. This causes the read to fail even when the error is triggered in errors=remount-ro mode. To correct the behavior under errors=remount-ro, EXT4_FLAGS_SHUTDOWN is replaced by the newly introduced EXT4_FLAGS_EMERGENCY_RO. This new flag only prevents writes, matching the previous behavior with SB_RDONLY. Fixes: 95257987a638 ("ext4: drop EXT4_MF_FS_ABORTED flag") Closes: https://lore.kernel.org/all/22d652f6-cb3c-43f5-b2fe-0a4bb6516a04@huawei.com/ Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250122114130.229709-6-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13ext4: add more ext4_emergency_state() checks around sb_rdonly()Baokun Li
Some functions check sb_rdonly() to make sure the file system isn't modified after it's read-only. Since we also don't want the file system modified if it's in an emergency state (shutdown or emergency_ro), we're adding additional ext4_emergency_state() checks where sb_rdonly() is checked. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250122114130.229709-5-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13ext4: add ext4_emergency_state() helper functionBaokun Li
Since both SHUTDOWN and EMERGENCY_RO are emergency states of the ext4 file system, and they are checked in similar locations, we have added a helper function, ext4_emergency_state(), to determine whether the current file system is in one of these two emergency states. Then, replace calls to ext4_forced_shutdown() with ext4_emergency_state() in those functions that could potentially trigger write operations. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122114130.229709-4-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13ext4: remove unused member 'i_unwritten' from 'ext4_inode_info'Baokun Li
After commit 378f32bab371 ("ext4: introduce direct I/O write using iomap infrastructure"), no one cares about the value of i_unwritten, so there is no need to maintain this variable, remove it, and clean up the associated logic. Suggested-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122110533.4116662-9-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13jbd2: drop JBD2_ABORT_ON_SYNCDATA_ERRBaokun Li
Since ext4's data_err=abort mode doesn't depend on JBD2_ABORT_ON_SYNCDATA_ERR anymore, and nobody else uses it, we can drop it and only warn in jbd2 as it used to be long ago. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122110533.4116662-7-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13ext4: extract ext4_has_journal_option() from __ext4_fill_super()Baokun Li
Extract the ext4_has_journal_option() helper function to reduce code duplication. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250122110533.4116662-5-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13ext4: reject the 'data_err=abort' option in nojournal modeBaokun Li
data_err=abort aborts the journal on I/O errors. However, this option is meaningless if journal is disabled, so it is rejected in nojournal mode to reduce unnecessary checks. Also, this option is ignored upon remount. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250122110533.4116662-4-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-05ext4: protect ext4_release_dquot against freezingOjaswin Mujoo
Protect ext4_release_dquot against freezing so that we don't try to start a transaction when FS is frozen, leading to warnings. Further, avoid taking the freeze protection if a transaction is already running so that we don't need end up in a deadlock as described in 46e294efc355 ext4: fix deadlock with fs freezing and EA inodes Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241121123855.645335-3-ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-02-10ext4: remove unneeded forward declarationKemeng Shi
Remove unneeded forward declaration of ext4_destroy_lazyinit_thread(). Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241218145414.1422946-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-02-10ext4: remove unused ext4 journal callbackKemeng Shi
Remove unused ext4 journal callback. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241218145414.1422946-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-01-23Merge tag 'fsnotify_hsm_for_v6.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify pre-content notification support from Jan Kara: "This introduces a new fsnotify event (FS_PRE_ACCESS) that gets generated before a file contents is accessed. The event is synchronous so if there is listener for this event, the kernel waits for reply. On success the execution continues as usual, on failure we propagate the error to userspace. This allows userspace to fill in file content on demand from slow storage. The context in which the events are generated has been picked so that we don't hold any locks and thus there's no risk of a deadlock for the userspace handler. The new pre-content event is available only for users with global CAP_SYS_ADMIN capability (similarly to other parts of fanotify functionality) and it is an administrator responsibility to make sure the userspace event handler doesn't do stupid stuff that can DoS the system. Based on your feedback from the last submission, fsnotify code has been improved and now file->f_mode encodes whether pre-content event needs to be generated for the file so the fast path when nobody wants pre-content event for the file just grows the additional file->f_mode check. As a bonus this also removes the checks whether the old FS_ACCESS event needs to be generated from the fast path. Also the place where the event is generated during page fault has been moved so now filemap_fault() generates the event if and only if there is no uptodate folio in the page cache. Also we have dropped FS_PRE_MODIFY event as current real-world users of the pre-content functionality don't really use it so let's start with the minimal useful feature set" * tag 'fsnotify_hsm_for_v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (21 commits) fanotify: Fix crash in fanotify_init(2) fs: don't block write during exec on pre-content watched files fs: enable pre-content events on supported file systems ext4: add pre-content fsnotify hook for DAX faults btrfs: disable defrag on pre-content watched files xfs: add pre-content fsnotify hook for DAX faults fsnotify: generate pre-content permission event on page fault mm: don't allow huge faults for files with pre content watches fanotify: disable readahead if we have pre-content watches fanotify: allow to set errno in FAN_DENY permission response fanotify: report file range info with pre-content events fanotify: introduce FAN_PRE_ACCESS permission event fsnotify: generate pre-content permission event on truncate fsnotify: pass optional file access range in pre-content event fsnotify: introduce pre-content permission events fanotify: reserve event bit of deprecated FAN_DIR_MODIFY fanotify: rename a misnamed constant fanotify: don't skip extra event info if no info_mode is set fsnotify: check if file is actually being watched for pre-content events on open fsnotify: opt-in for permission events at file open time ...
2024-12-11fs: enable pre-content events on supported file systemsJosef Bacik
Now that all the code has been added for pre-content events, and the various file systems that need the page fault hooks for fsnotify have been updated, add SB_I_ALLOW_HSM to the supported file systems. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/46960dcb2725fa0317895ed66a8409ba1c306a82.1731684329.git.josef@toxicpanda.com
2024-12-01ext4: switch to using the crc32c libraryEric Biggers
Now that the crc32c() library function directly takes advantage of architecture-specific optimizations, it is unnecessary to go through the crypto API. Just use crc32c(). This is much simpler, and it improves performance due to eliminating the crypto API overhead. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20241202010844.144356-17-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-11-18Merge tag 'ext4_for_linus-6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "A lot of miscellaneous ext4 bug fixes and cleanups this cycle, most notably in the journaling code, bufered I/O, and compiler warning cleanups" * tag 'ext4_for_linus-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (33 commits) jbd2: Fix comment describing journal_init_common() ext4: prevent an infinite loop in the lazyinit thread ext4: use struct_size() to improve ext4_htree_store_dirent() ext4: annotate struct fname with __counted_by() jbd2: avoid dozens of -Wflex-array-member-not-at-end warnings ext4: use str_yes_no() helper function ext4: prevent delalloc to nodelalloc on remount jbd2: make b_frozen_data allocation always succeed ext4: cleanup variable name in ext4_fc_del() ext4: use string choices helpers jbd2: remove the 'success' parameter from the jbd2_do_replay() function jbd2: remove useless 'block_error' variable jbd2: factor out jbd2_do_replay() jbd2: refactor JBD2_COMMIT_BLOCK process in do_one_pass() jbd2: unified release of buffer_head in do_one_pass() jbd2: remove redundant judgments for check v1 checksum ext4: use ERR_CAST to return an error-valued pointer mm: zero range of eof folio exposed by inode size extension ext4: partial zero eof block on unaligned inode size extension ext4: disambiguate the return value of ext4_dio_write_end_io() ...
2024-11-18Merge tag 'vfs-6.13.untorn.writes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs untorn write support from Christian Brauner: "An atomic write is a write issed with torn-write protection. This means for a power failure or any hardware failure all or none of the data from the write will be stored, never a mix of old and new data. This work is already supported for block devices. If a block device is opened with O_DIRECT and the block device supports atomic write, then FMODE_CAN_ATOMIC_WRITE is added to the file of the opened block device. This contains the work to expand atomic write support to filesystems, specifically ext4 and XFS. Currently, only support for writing exactly one filesystem block atomically is added. Since it's now possible to have filesystem block size > page size for XFS, it's possible to write 4K+ blocks atomically on x86" * tag 'vfs-6.13.untorn.writes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: drop an obsolete comment in iomap_dio_bio_iter ext4: Do not fallback to buffered-io for DIO atomic write ext4: Support setting FMODE_CAN_ATOMIC_WRITE ext4: Check for atomic writes support in write iter ext4: Add statx support for atomic writes xfs: Support setting FMODE_CAN_ATOMIC_WRITE xfs: Validate atomic writes xfs: Support atomic write for statx fs: iomap: Atomic write support fs: Export generic_atomic_write_valid() block: Add bdev atomic write limits helpers fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() block/fs: Pass an iocb to generic_atomic_write_valid()
2024-11-13ext4: prevent an infinite loop in the lazyinit threadMathieu Othacehe
Use ktime_get_ns instead of ktime_get_real_ns when computing the lr_timeout not to be affected by system time jumps. Use a boolean instead of the MAX_JIFFY_OFFSET value to determine whether the next_wakeup value has been set. Comparing elr->lr_next_sched to MAX_JIFFY_OFFSET can cause the lazyinit thread to loop indefinitely. Co-developed-by: Lukas Skupinski <lukas.skupinski@landisgyr.com> Signed-off-by: Lukas Skupinski <lukas.skupinski@landisgyr.com> Signed-off-by: Mathieu Othacehe <othacehe@gnu.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241106134741.26948-2-othacehe@gnu.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12ext4: prevent delalloc to nodelalloc on remountNicolas Bretz
Implemented the suggested solution mentioned in the bug https://bugzilla.kernel.org/show_bug.cgi?id=218820 Preventing the disabling of delayed allocation mode on remount. delalloc to nodelalloc not permitted anymore nodelalloc to delalloc permitted, not affected Signed-off-by: Nicolas Bretz <bretznic@gmail.com> Link: https://patch.msgid.link/20241014034143.59779-1-bretznic@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12ext4: fix race in buffer_head read fault injectionLong Li
When I enabled ext4 debug for fault injection testing, I encountered the following warning: EXT4-fs error (device sda): ext4_read_inode_bitmap:201: comm fsstress: Cannot read inode bitmap - block_group = 8, inode_bitmap = 1051 WARNING: CPU: 0 PID: 511 at fs/buffer.c:1181 mark_buffer_dirty+0x1b3/0x1d0 The root cause of the issue lies in the improper implementation of ext4's buffer_head read fault injection. The actual completion of buffer_head read and the buffer_head fault injection are not atomic, which can lead to the uptodate flag being cleared on normally used buffer_heads in race conditions. [CPU0] [CPU1] [CPU2] ext4_read_inode_bitmap ext4_read_bh() <bh read complete> ext4_read_inode_bitmap if (buffer_uptodate(bh)) return bh jbd2_journal_commit_transaction __jbd2_journal_refile_buffer __jbd2_journal_unfile_buffer __jbd2_journal_temp_unlink_buffer ext4_simulate_fail_bh() clear_buffer_uptodate mark_buffer_dirty <report warning> WARN_ON_ONCE(!buffer_uptodate(bh)) The best approach would be to perform fault injection in the IO completion callback function, rather than after IO completion. However, the IO completion callback function cannot get the fault injection code in sb. Fix it by passing the result of fault injection into the bh read function, we simulate faults within the bh read function itself. This requires adding an extra parameter to the bh read functions that need fault injection. Fixes: 46f870d690fe ("ext4: simulate various I/O and checksum errors when reading metadata") Signed-off-by: Long Li <leo.lilong@huawei.com> Link: https://patch.msgid.link/20240906091746.510163-1-leo.lilong@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12ext4: mark ctx_*_flags() with __maybe_unusedAndy Shevchenko
When ctx_set_flags() is unused, it prevents kernel builds with clang, `make W=1` and CONFIG_WERROR=y: .../ext4/super.c:2120:1: error: unused function 'ctx_set_flags' [-Werror,-Wunused-function] 2120 | EXT4_SET_CTX(flags); /* set only */ | ^~~~~~~~~~~~~~~~~~~ Fix this by marking ctx_*_flags() with __maybe_unused (mark both for the sake of symmetry). See also commit 6863f5643dd7 ("kbuild: allow Clang to find unused static inline functions for W=1 build"). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20240905163229.140522-1-andriy.shevchenko@linux.intel.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12ext4: return error on syncfs after shutdownAmir Goldstein
This is the logic behavior and one that we would like to verify using a generic fstest similar to xfs/546. Link: https://lore.kernel.org/fstests/20240830152648.GE6216@frogsfrogsfrogs/ Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240904084657.1062243-1-amir73il@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12ext4: show the default enabled prefetch_block_bitmaps optionBaokun Li
After commit 21175ca434c5 ("ext4: make prefetch_block_bitmaps default"), we enable 'prefetch_block_bitmaps' by default, but this is not shown in the '/proc/fs/ext4/sdx/options' procfs interface. This makes it impossible to distinguish whether the feature is enabled by default or not, so 'prefetch_block_bitmaps' is shown in the 'options' procfs interface when prefetch_block_bitmaps is enabled by default. This makes it easy to notice changes to the default mount options between versions through the '/proc/fs/ext4/sdx/options' procfs interface. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241008120134.3758097-1-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-05ext4: Add statx support for atomic writesRitesh Harjani (IBM)
This patch adds base support for atomic writes via statx getattr. On bs < ps systems, we can create FS with say bs of 16k. That means both atomic write min and max unit can be set to 16k for supporting atomic writes. Co-developed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz>
2024-10-30ext4: avoid remount errors with 'abort' mount optionJan Kara
When we remount filesystem with 'abort' mount option while changing other mount options as well (as is LTP test doing), we can return error from the system call after commit d3476f3dad4a ("ext4: don't set SB_RDONLY after filesystem errors") because the application of mount option changes detects shutdown filesystem and refuses to do anything. The behavior of application of other mount options in presence of 'abort' mount option is currently rather arbitary as some mount option changes are handled before 'abort' and some after it. Move aborting of the filesystem to the end of remount handling so all requested changes are properly applied before the filesystem is shutdown to have a reasonably consistent behavior. Fixes: d3476f3dad4a ("ext4: don't set SB_RDONLY after filesystem errors") Reported-by: Jan Stancek <jstancek@redhat.com> Link: https://lore.kernel.org/all/Zvp6L+oFnfASaoHl@t14s Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Jan Stancek <jstancek@redhat.com> Link: https://patch.msgid.link/20241004221556.19222-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-10-30ext4: supress data-race warnings in ext4_free_inodes_{count,set}()Jeongjun Park
find_group_other() and find_group_orlov() read *_lo, *_hi with ext4_free_inodes_count without additional locking. This can cause data-race warning, but since the lock is held for most writes and free inodes value is generally not a problem even if it is incorrect, it is more appropriate to use READ_ONCE()/WRITE_ONCE() than to add locking. ================================================================== BUG: KCSAN: data-race in ext4_free_inodes_count / ext4_free_inodes_set write to 0xffff88810404300e of 2 bytes by task 6254 on cpu 1: ext4_free_inodes_set+0x1f/0x80 fs/ext4/super.c:405 __ext4_new_inode+0x15ca/0x2200 fs/ext4/ialloc.c:1216 ext4_symlink+0x242/0x5a0 fs/ext4/namei.c:3391 vfs_symlink+0xca/0x1d0 fs/namei.c:4615 do_symlinkat+0xe3/0x340 fs/namei.c:4641 __do_sys_symlinkat fs/namei.c:4657 [inline] __se_sys_symlinkat fs/namei.c:4654 [inline] __x64_sys_symlinkat+0x5e/0x70 fs/namei.c:4654 x64_sys_call+0x1dda/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:267 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff88810404300e of 2 bytes by task 6257 on cpu 0: ext4_free_inodes_count+0x1c/0x80 fs/ext4/super.c:349 find_group_other fs/ext4/ialloc.c:594 [inline] __ext4_new_inode+0x6ec/0x2200 fs/ext4/ialloc.c:1017 ext4_symlink+0x242/0x5a0 fs/ext4/namei.c:3391 vfs_symlink+0xca/0x1d0 fs/namei.c:4615 do_symlinkat+0xe3/0x340 fs/namei.c:4641 __do_sys_symlinkat fs/namei.c:4657 [inline] __se_sys_symlinkat fs/namei.c:4654 [inline] __x64_sys_symlinkat+0x5e/0x70 fs/namei.c:4654 x64_sys_call+0x1dda/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:267 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e Cc: stable@vger.kernel.org Signed-off-by: Jeongjun Park <aha310510@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://patch.msgid.link/20241003125337.47283-1-aha310510@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-10-10Merge patch series "timekeeping/fs: multigrain timestamp redux"Christian Brauner
Jeff Layton <jlayton@kernel.org> says: The VFS has always used coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide when to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. What we need is a way to only use fine-grained timestamps when they are being actively queried. Use the (unused) top bit in inode->i_ctime_nsec as a flag that indicates whether the current timestamps have been queried via stat() or the like. When it's set, we allow the kernel to use a fine-grained timestamp iff it's necessary to make the ctime show a different value. This solves the problem of being able to distinguish the timestamp between updates, but introduces a new problem: it's now possible for a file being changed to get a fine-grained timestamp. A file that is altered just a bit later can then get a coarse-grained one that appears older than the earlier fine-grained time. This violates timestamp ordering guarantees. To remedy this, keep a global monotonic atomic64_t value that acts as a timestamp floor. When we go to stamp a file, we first get the latter of the current floor value and the current coarse-grained time. If the inode ctime hasn't been queried then we just attempt to stamp it with that value. If it has been queried, then first see whether the current coarse time is later than the existing ctime. If it is, then we accept that value. If it isn't, then we get a fine-grained time and try to swap that into the global floor. Whether that succeeds or fails, we take the resulting floor time, convert it to realtime and try to swap that into the ctime. We take the result of the ctime swap whether it succeeds or fails, since either is just as valid. Filesystems can opt into this by setting the FS_MGTIME fstype flag. Others should be unaffected (other than being subject to the same floor value as multigrain filesystems). * patches from https://lore.kernel.org/r/20241002-mgtime-v10-0-d1c4717f5284@kernel.org: tmpfs: add support for multigrain timestamps btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps Documentation: add a new file documenting multigrain timestamps fs: add percpu counters for significant multigrain timestamp events fs: tracepoints around multigrain timestamp events fs: handle delegated timestamps in setattr_copy_mgtime fs: have setattr_copy handle multigrain timestamps appropriately fs: add infrastructure for multigrain timestamps Link: https://lore.kernel.org/r/20241002-mgtime-v10-0-d1c4717f5284@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-10ext4: switch to multigrain timestampsJeff Layton
Enable multigrain timestamps, which should ensure that there is an apparent change to the timestamp whenever it has been written after being actively observed via getattr. For ext4, we only need to enable the FS_MGTIME flag. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jan Kara <jack@suse.cz> Tested-by: Randy Dunlap <rdunlap@infradead.org> # documentation bits Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20241002-mgtime-v10-10-d1c4717f5284@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-03ext4: check stripe size compatibility on remount as wellOjaswin Mujoo
We disable stripe size in __ext4_fill_super if it is not a multiple of the cluster ratio however this check is missed when trying to remount. This can leave us with cases where stripe < cluster_ratio after remount:set making EXT4_B2C(sbi->s_stripe) become 0 that can cause some unforeseen bugs like divide by 0. Fix that by adding the check in remount path as well. Reported-by: syzbot+1ad8bac5af24d01e2cbd@syzkaller.appspotmail.com Tested-by: syzbot+1ad8bac5af24d01e2cbd@syzkaller.appspotmail.com Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Fixes: c3defd99d58c ("ext4: treat stripe in block unit") Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/3a493bb503c3598e25dcfbed2936bb2dff3fece7.1725002410.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-03ext4: fix error message when rejecting the default hashGabriel Krisman Bertazi
Commit 985b67cd8639 ("ext4: filesystems without casefold feature cannot be mounted with siphash") properly rejects volumes where s_def_hash_version is set to DX_HASH_SIPHASH, but the check and the error message should not look into casefold setup - a filesystem should never have DX_HASH_SIPHASH as the default hash. Fix it and, since we are there, move the check to ext4_hash_info_init. Fixes:985b67cd8639 ("ext4: filesystems without casefold feature cannot be mounted with siphash") Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://patch.msgid.link/87jzg1en6j.fsf_-_@mailhost.krisman.be Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-08-26ext4: don't set SB_RDONLY after filesystem errorsJan Kara
When the filesystem is mounted with errors=remount-ro, we were setting SB_RDONLY flag to stop all filesystem modifications. We knew this misses proper locking (sb->s_umount) and does not go through proper filesystem remount procedure but it has been the way this worked since early ext2 days and it was good enough for catastrophic situation damage mitigation. Recently, syzbot has found a way (see link) to trigger warnings in filesystem freezing because the code got confused by SB_RDONLY changing under its hands. Since these days we set EXT4_FLAGS_SHUTDOWN on the superblock which is enough to stop all filesystem modifications, modifying SB_RDONLY shouldn't be needed. So stop doing that. Link: https://lore.kernel.org/all/000000000000b90a8e061e21d12f@google.com Reported-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://patch.msgid.link/20240805201241.27286-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-08-26ext4: fix access to uninitialised lock in fc replay pathLuis Henriques (SUSE)
The following kernel trace can be triggered with fstest generic/629 when executed against a filesystem with fast-commit feature enabled: INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. CPU: 0 PID: 866 Comm: mount Not tainted 6.10.0+ #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x66/0x90 register_lock_class+0x759/0x7d0 __lock_acquire+0x85/0x2630 ? __find_get_block+0xb4/0x380 lock_acquire+0xd1/0x2d0 ? __ext4_journal_get_write_access+0xd5/0x160 _raw_spin_lock+0x33/0x40 ? __ext4_journal_get_write_access+0xd5/0x160 __ext4_journal_get_write_access+0xd5/0x160 ext4_reserve_inode_write+0x61/0xb0 __ext4_mark_inode_dirty+0x79/0x270 ? ext4_ext_replay_set_iblocks+0x2f8/0x450 ext4_ext_replay_set_iblocks+0x330/0x450 ext4_fc_replay+0x14c8/0x1540 ? jread+0x88/0x2e0 ? rcu_is_watching+0x11/0x40 do_one_pass+0x447/0xd00 jbd2_journal_recover+0x139/0x1b0 jbd2_journal_load+0x96/0x390 ext4_load_and_init_journal+0x253/0xd40 ext4_fill_super+0x2cc6/0x3180 ... In the replay path there's an attempt to lock sbi->s_bdev_wb_lock in function ext4_check_bdev_write_error(). Unfortunately, at this point this spinlock has not been initialized yet. Moving it's initialization to an earlier point in __ext4_fill_super() fixes this splat. Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Link: https://patch.msgid.link/20240718094356.7863-1-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2024-08-26ext4: fix timer use-after-free on failed mountXiaxi Shen
Syzbot has found an ODEBUG bug in ext4_fill_super The del_timer_sync function cancels the s_err_report timer, which reminds about filesystem errors daily. We should guarantee the timer is no longer active before kfree(sbi). When filesystem mounting fails, the flow goes to failed_mount3, where an error occurs when ext4_stop_mmpd is called, causing a read I/O failure. This triggers the ext4_handle_error function that ultimately re-arms the timer, leaving the s_err_report timer active before kfree(sbi) is called. Fix the issue by canceling the s_err_report timer after calling ext4_stop_mmpd. Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com> Reported-and-tested-by: syzbot+59e0101c430934bc9a36@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=59e0101c430934bc9a36 Link: https://patch.msgid.link/20240715043336.98097-1-shenxiaxi26@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org