summaryrefslogtreecommitdiff
path: root/fs/gfs2/super.c
AgeCommit message (Collapse)Author
2025-07-09gfs2: Remove GIF_ALLOC_FAILED flagAndreas Gruenbacher
Get rid of the GIF_ALLOC_FAILED flag; we can now be confident that the additional consistency check isn't needed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-26Merge tag 'gfs2-for-6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Fix the long-standing warnings in inode_to_wb() when CONFIG_LOCKDEP is enabled: gfs2 doesn't support cgroup writeback and so inode->i_wb will never change. This is the counterpart of commit 9e888998ea4d ("writeback: fix false warning in inode_to_wb()") - Fix a hang introduced by commit 8d391972ae2d ("gfs2: Remove __gfs2_writepage()"): prevent gfs2_logd from creating transactions for jdata pages while trying to flush the log - Fix a race between gfs2_create_inode() and gfs2_evict_inode() by deallocating partially created inodes on the gfs2_create_inode() error path - Fix a bug in the journal head lookup code that could cause mount to fail after successful recovery - Various smaller fixes and cleanups from various people * tag 'gfs2-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (23 commits) gfs2: No more gfs2_find_jhead caching gfs2: Get rid of duplicate log head lookup gfs2: Simplify clean_journal gfs2: Simplify gfs2_log_pointers_init gfs2: Move gfs2_log_pointers_init gfs2: Minor comments fix gfs2: Don't start unnecessary transactions during log flush gfs2: Move gfs2_trans_add_databufs gfs2: Rename jdata_dirty_folio to gfs2_jdata_dirty_folio gfs2: avoid inefficient use of crc32_le_shift() gfs2: Do not call iomap_zero_range beyond eof gfs: don't check for AOP_WRITEPAGE_ACTIVATE in gfs2_write_jdata_batch gfs2: Fix usage of bio->bi_status in gfs2_end_log_write gfs2: deallocate inodes in gfs2_create_inode gfs2: Move GIF_ALLOC_FAILED check out of gfs2_ea_dealloc gfs2: Move gfs2_dinode_dealloc gfs2: Don't reread inodes unnecessarily gfs2: gfs2_create_inode error handling fix gfs2: Remove unnecessary NULL check before free_percpu() gfs2: check sb_min_blocksize return value ...
2025-05-22gfs2: No more gfs2_find_jhead cachingAndreas Gruenbacher
We are no longer calling gfs2_find_jhead() on the same log twice, so there is no more reason for keeping the log contents cached across those calls. In addition, log head lookup and log header writing didn't go through the same address space and so the caching wasn't even fully working, anyway. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22gfs2: Get rid of duplicate log head lookupAndreas Gruenbacher
Currently at mount time, the recovery code looks up the current log head and, if necessary, replays the log and writes a recovery header to indicate that the log is clean. It does that for each log that may need recovery. We also know that our own log will always be checked as part of that process. Then, the mount code looks up the log head of our own log again. The double log head lookup can be costly, but more importantly, it is unnecessary because we can trivially compute the position of the log head after recovery; all we need to do for that is bump the position and lh_sequence by one when writing a recovery header. With that in mind, move the call to gfs2_log_pointers_init() into gfs2_recover_func() and get rid of the double lookup in gfs2_make_fs_rw(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22gfs2: Simplify gfs2_log_pointers_initAndreas Gruenbacher
Move the initialization of sdp->sd_log_sequence and sdp->sd_log_flush_head inside gfs2_log_pointers_init(). Use gfs2_replay_incr_blk(). Before this change, the log head lookup code in freeze_go_xmote_bh() didn't update sdp->sd_log_flush_head. This is now fixed, but the code in freeze_go_xmote_bh() appears to be pretty useless in the first place: on a frozen filesystem, the log head will not change. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-09super: add filesystem freezing helpers for suspend and hibernateChristian Brauner
Allow the power subsystem to support filesystem freeze for suspend and hibernate. For some kernel subsystems it is paramount that they are guaranteed that they are the owner of the freeze to avoid any risk of deadlocks. This is the case for the power subsystem. Enable it to recognize whether it did actually freeze the filesystem. If userspace has 10 filesystems and suspend/hibernate manges to freeze 5 and then fails on the 6th for whatever odd reason (current or future) then power needs to undo the freeze of the first 5 filesystems. It can't just walk the list again because while it's unlikely that a new filesystem got added in the meantime it still cannot tell which filesystems the power subsystem actually managed to get a freeze reference count on that needs to be dropped during thaw. There's various ways out of this ugliness. For example, record the filesystems the power subsystem managed to freeze on a temporary list in the callbacks and then walk that list backwards during thaw to undo the freezing or make sure that the power subsystem just actually exclusively freezes things it can freeze and marking such filesystems as being owned by power for the duration of the suspend or resume cycle. I opted for the latter as that seemed the clean thing to do even if it means more code changes. If hibernation races with filesystem freezing (e.g. DM reconfiguration), then hibernation need not freeze a filesystem because it's already frozen but userspace may thaw the filesystem before hibernation actually happens. If the race happens the other way around, DM reconfiguration may unexpectedly fail with EBUSY. So allow FREEZE_EXCL to nest with other holders. An exclusive freezer cannot be undone by any of the other concurrent freezers. Link: https://lore.kernel.org/r/20250329-work-freeze-v2-6-a47af37ecc3d@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-24gfs2: deallocate inodes in gfs2_create_inodeAndreas Gruenbacher
When creating and destroying inodes, we are relying on the inode hash table to make sure that for a given inode number, only a single inode will exist. We then link that inode to its inode and iopen glock and let those glocks point back at the inode. However, when iget_failed() is called, the inode is removed from the inode hash table before gfs_evict_inode() is called, and uniqueness is no longer guaranteed. Commit f1046a472b70 ("gfs2: gl_object races fix") was trying to work around that problem by detaching the inode glock from the inode before calling iget_failed(), but that broke the inode deallocation code in gfs_evict_inode(). To fix that, deallocate partially created inodes in gfs2_create_inode() instead of relying on gfs_evict_inode() for doing that. This means that gfs2_evict_inode() and its helper functions will no longer see partially created inodes, and so some simplifications are possible there. Fixes: 9ffa18884cce ("gfs2: gl_object races fix") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21gfs2: Move GIF_ALLOC_FAILED check out of gfs2_ea_deallocAndreas Gruenbacher
Don't check for the GIF_ALLOC_FAILED flag in gfs2_ea_dealloc() and pass that information explicitly instead. This allows for a cleaner follow-up patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21gfs2: Move gfs2_dinode_deallocAndreas Gruenbacher
Move gfs2_dinode_dealloc() and its helper gfs2_final_release_pages() from super.c to inode.c. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21gfs2: replace sd_aspace with sd_inodeAndreas Gruenbacher
Currently, sdp->sd_aspace and the per-inode metadata address spaces use sb->s_bdev->bd_mapping->host as their ->host; folios in those address spaces will thus appear to be on bdev rather than on gfs2 filesystems. This is a problem because gfs2 doesn't support cgroup writeback (SB_I_CGROUPWB), but bdev does. Fix that by using a "dummy" gfs2 inode as ->host in those address spaces. When coming from a folio, folio->mapping->host->i_sb will then be a gfs2 super block and the SB_I_CGROUPWB flag will not be set in sb->s_iflags. Based on a previous version from Bob Peterson from several years ago. Thanks to Tetsuo Handa, Jan Kara, and Rafael Aquini for helping figure this out. Fixes: aaa2cacf8184 ("writeback: add lockdep annotation to inode_to_wb()") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-07gfs2: pass through holder from the VFS for freeze/thawChristian Brauner
The filesystem's freeze/thaw functions can be called from contexts where the holder isn't userspace but the kernel, e.g., during systemd suspend/hibernate. So pass through the freeze/thaw flags from the VFS instead of hard-coding them. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-10gfs2: skip if we cannot defer deleteAndreas Gruenbacher
In gfs2_evict_inode(), in the unlikely case that we cannot defer deleting the inode, it is not safe to fall back to deleting the inode; the only valid choice we have is to skip the delete. In addition, in evict_should_delete(), if we cannot lock the inode glock exclusively, we are in a bad enough state that skipping the delete is likely a better choice than trying to recover from the failure later. Fixes: c5b7a2400edc ("gfs2: Only defer deletes when we have an iopen glock") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: minor evict fixAndreas Gruenbacher
In evict_should_delete(), when gfs2_upgrade_iopen_glock() fails, we detach the iopen glock from the inode without calling glock_clear_object(). This leads to a warning in glock_set_object() when the same inode is recreated and the glock is reused. Fix that by only detaching the iopen glock in gfs2_evict_inode(). In addition, remove the dequeue code from evict_should_delete(); we already perform a conditional dequeue in gfs2_evict_inode(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: Replace GIF_DEFER_DELETE with GLF_DEFER_DELETEAndreas Gruenbacher
Having this flag attached to the iopen glock instead of the inode is much simpler; it eliminates a protential weird race in gfs2_try_evict(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-19gfs2: Only defer deletes when we have an iopen glockAndreas Gruenbacher
The mechanism to defer deleting unlinked inodes is tied to delete_work_func(), which is tied to iopen glocks. When we don't have an iopen glock, we must carry out deletes immediately instead. Fixes a NULL pointer dereference in gfs2_evict_inode(). Fixes: 8c21c2c71e66 ("gfs2: Call gfs2_queue_verify_delete from gfs2_evict_inode") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05gfs2: gfs2_evict_inode clarificationAndreas Gruenbacher
When function evict_should_delete() returns SHOULD_DEFER_EVICTION, gh is never initialized, but that isn't obvious; if it did initialize gh and then return SHOULD_DEFER_EVICTION, gfs2_evict_inode() would fail to release it. To clarify the code, change gfs2_evict_inode() to always check if gh needs to be released, no matter what evict_should_delete() returns. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05gfs2: Update to the evict / remote delete documentationAndreas Gruenbacher
Try to be a bit more clear and remove some duplications. We cannot actually get rid of the verification step eventually, so remove the comment saying so. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05gfs2: Call gfs2_queue_verify_delete from gfs2_evict_inodeAndreas Gruenbacher
Move calls to gfs2_queue_verify_delete() into gfs2_evict_inode(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05gfs2: Return enum evict_behavior from gfs2_upgrade_iopen_glockAndreas Gruenbacher
In case an iopen glock cannot be upgraded, function gfs2_upgrade_iopen_glock() needs to communicate to gfs2_evict_inode() whether deleting the inode should be deferred or skipped altogether. Change the function to return the appropriate enum evict_behavior value to indicate that. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05gfs2: Rename dinode_demise to evict_behaviorAndreas Gruenbacher
Rename enum dinode_demise to evict_behavior and its items SHOULD_DELETE_DINODE to EVICT_SHOULD_DELETE, SHOULD_NOT_DELETE_DINODE to EVICT_SHOULD_SKIP_DELETE, and SHOULD_DEFER_EVICTION to EVICT_SHOULD_DEFER_DELETE. In gfs2_evict_inode(), add a separate variable of type enum evict_behavior instead of implicitly casting to int. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05gfs2: Rename GIF_{DEFERRED -> DEFER}_DELETEAndreas Gruenbacher
The GIF_DEFERRED_DELETE flag indicates an action that gfs2_evict_inode() should take, so rename the flag to GIF_DEFER_DELETE to clarify. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05gfs2: Faster gfs2_upgrade_iopen_glock wakeupsAndreas Gruenbacher
Move function needs_demote() to glock.h and rename it to glock_needs_demote(). In handle_callback(), wake up the glock when setting the GLF_PENDING_DEMOTE flag as well. (Setting the GLF_DEMOTE flag already triggered a wake-up.) With that, check for glock_needs_demote() in gfs2_upgrade_iopen_glock() to wake up when either of those flags is set for the inode glock: the faster we can react to contention, the better. The GLF_PENDING_DEMOTE flag is only used for inode glocks (see gfs2_glock_cb()) so it's okay to only check for the GLF_DEMOTE flag in gfs2_drop_inode(). Still, using glock_needs_demote() there as well makes the code a little easier to read. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-10-22KMSAN: uninit-value in inode_go_dump (5)Qianqiang Liu
When mounting of a corrupted disk image fails, the error message printed can reference uninitialized inode fields. To prevent that from happening, always initialize those fields. Reported-by: syzbot+aa0730b0a42646eb1359@syzkaller.appspotmail.com Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-09-25gfs2: Fix unlinked inode cleanupAndreas Gruenbacher
Before commit f0e56edc2ec7 ("gfs2: Split the two kinds of glock "delete" work"), function delete_work_func() was used to trigger the eviction of in-memory inodes from remote as well as deleting unlinked inodes at a later point. These two kinds of work were then split into two kinds of work, and the two places in the code were deferred deletion of inodes is required accidentally ended up queuing the wrong kind of work. This caused unlinked inodes to be left behind, which could in the worst case fill up filesystems and require a filesystem check to recover. Fix that by queuing the right kind of work in try_rgrp_unlink() and gfs2_drop_inode(). Fixes: f0e56edc2ec7 ("gfs2: Split the two kinds of glock "delete" work") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-29gfs2: Revise glock reference counting modelAndreas Gruenbacher
In the current glock reference counting model, a bias of one is added to a glock's refcount when it is locked (gl->gl_state != LM_ST_UNLOCKED). A glock is removed from the lru_list when it is enqueued, and added back when it is dequeued. This isn't a very appropriate model because most glocks are held for long periods of time (for example, the inode "owns" references to its inode and iopen glocks as long as the inode is cached even when the glock state changes to LM_ST_UNLOCKED), and they can only be freed when they are no longer referenced, anyway. Fix this by getting rid of the refcount bias for locked glocks. That way, we can use lockref_put_or_lock() to efficiently drop all but the last glock reference, and put the glock onto the lru_list when the last reference is dropped. When find_insert_glock() returns a reference to a cached glock, it removes the glock from the lru_list. Dumping the "glocks" and "glstats" debugfs files also takes glock references, but instead of removing the glocks from the lru_list in that case as well, we leave them on the list. This ensures that dumping those files won't perturb the order of the glocks on the lru_list. In addition, when the last reference to an *unlocked* glock is dropped, we immediately free it; this preserves the preexisting behavior. If it later turns out that caching unlocked glocks is useful in some situations, we can change the caching strategy. It is currently unclear if a glock that has no active references can have the GLF_LFLUSH flag set. To make sure that such a glock won't accidentally be evicted due to memory pressure, we add a GLF_LFLUSH check to gfs2_dispose_glock_lru(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-07gfs2: make timeout values more explicitWolfram Sang
'timeout' is a vague name for the return value of wait_event_*_timeout because it actually returns the time left. Because the variable is never used later, just drop the return value. Since variable 'timeout' is then only used to carry a fixed timeout value, drop this in favor of a fixed function argument as in the other call to wait_event_timeout() above. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-29gfs2: gfs2_freeze_unlock cleanupAndreas Gruenbacher
Function gfs2_freeze_unlock() is always called with &sdp->sd_freeze_gh as its argument, so clean up the code by passing in sdp instead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-24gfs2: Fix potential glock use-after-free on unmountAndreas Gruenbacher
When a DLM lockspace is released and there ares still locks in that lockspace, DLM will unlock those locks automatically. Commit fb6791d100d1b started exploiting this behavior to speed up filesystem unmount: gfs2 would simply free glocks it didn't want to unlock and then release the lockspace. This didn't take the bast callbacks for asynchronous lock contention notifications into account, which remain active until until a lock is unlocked or its lockspace is released. To prevent those callbacks from accessing deallocated objects, put the glocks that should not be unlocked on the sd_dead_glocks list, release the lockspace, and only then free those glocks. As an additional measure, ignore unexpected ast and bast callbacks if the receiving glock is dead. Fixes: fb6791d100d1b ("GFS2: skip dlm_unlock calls in unmount") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: David Teigland <teigland@redhat.com>
2024-04-09gfs2: Replace gfs2_glock_queue_put with gfs2_glock_put_asyncAndreas Gruenbacher
Function gfs2_glock_queue_put() puts a glock reference by enqueuing glock work instead of putting the reference directly. This ensures that the operation won't sleep, but it is costly and really only necessary when putting the final glock reference. Replace it with a new gfs2_glock_put_async() function that only queues glock work when putting the last glock reference. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-09gfs2: Fix NULL pointer dereference in gfs2_log_flushAndreas Gruenbacher
In gfs2_jindex_free(), set sdp->sd_jdesc to NULL under the log flush lock to provide exclusion against gfs2_log_flush(). In gfs2_log_flush(), check if sdp->sd_jdesc is non-NULL before dereferencing it. Otherwise, we could run into a NULL pointer dereference when outstanding glock work races with an unmount (glock_work_func -> run_queue -> do_xmote -> inode_go_sync -> gfs2_log_flush). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-12-27gfs2: Fix freeze consistency check in log_write_headerAndreas Gruenbacher
Functions gfs2_freeze_super() and gfs2_thaw_super() are using the SDF_FROZEN flag to indicate when the filesystem is frozen, synchronized by sd_freeze_mutex. However, this doesn't prevent writes from happening between the point of calling thaw_super() and the point where the SDF_FROZEN flag is cleared, so the following assert can trigger in log_write_header(): gfs2_assert_withdraw(sdp, !test_bit(SDF_FROZEN, &sdp->sd_flags)); Fix that by checking for sb->s_writers.frozen != SB_FREEZE_COMPLETE in log_write_header() instead. To make sure that the filesystem-specific part of freezing happens before sb->s_writers.frozen is set to SB_FREEZE_COMPLETE, move that code from gfs2_freeze_locally() into gfs2_freeze_fs() and hook that up to the .freeze_fs operation. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-12-27gfs2: Refcounting fix in gfs2_thaw_superAndreas Gruenbacher
It turns out that the .freeze_super and .thaw_super operations require the filesystem to manage the superblock refcount itself. We are using the freeze_super() and thaw_super() helpers to mostly take care of that for us, but this means that the superblock may no longer be around by when thaw_super() returns, and gfs2_thaw_super() will then access freed memory. Take an extra superblock reference in gfs2_thaw_super() to fix that. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-12-27gfs2: Minor gfs2_{freeze,thaw}_super cleanupAndreas Gruenbacher
This minor cleanup to gfs2_freeze_super() and gfs2_thaw_super() prepares for the following refcounting fix. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-12-20gfs2: Rename gfs2_withdrawn to gfs2_withdrawing_or_withdrawnAndreas Gruenbacher
This function checks whether the filesystem has been been marked to be withdrawn eventually or has been withdrawn already. Rename this function to avoid confusing code like checking for gfs2_withdrawing() when gfs2_withdrawn() has already returned true. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-12-20gfs2: Mark withdraws as unlikelyAndreas Gruenbacher
Mark the gfs2_withdrawn(), gfs2_withdrawing(), and gfs2_withdraw_in_prog() inline functions as likely to return %false. This allows to get rid of likely() and unlikely() annotations at the call sites of those functions. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-12-20gfs2: use is_subdir()Al Viro
... instead of reimplementing it with misguiding name (is_ancestor(x, y) would normally imply "x is an ancestor of y", not the other way round). With races, while we are at it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-11-07Merge tag 'gfs2-v6.6-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Don't update inode timestamps for direct writes (performance regression fix) - Skip no-op quota records instead of panicing - Fix a RCU race in gfs2_permission() - Various other smaller fixes and cleanups all over the place * tag 'gfs2-v6.6-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (24 commits) gfs2: don't withdraw if init_threads() got interrupted gfs2: remove dead code in add_to_queue gfs2: Fix slab-use-after-free in gfs2_qd_dealloc gfs2: Silence "suspicious RCU usage in gfs2_permission" warning gfs2: fs: derive f_fsid from s_uuid gfs2: No longer use 'extern' in function declarations gfs2: Rename gfs2_lookup_{ simple => meta } gfs2: Convert gfs2_internal_read to folios gfs2: Convert stuffed_readpage to folios gfs2: Minor gfs2_write_jdata_batch PAGE_SIZE cleanup gfs2: Get rid of gfs2_alloc_blocks generation parameter gfs2: Add metapath_dibh helper gfs2: Clean up quota.c:print_message gfs2: Clean up gfs2_alloc_parms initializers gfs2: Two quota=account mode fixes gfs2: Stop using GFS2_BASIC_BLOCK and GFS2_BASIC_BLOCK_SHIFT gfs2: setattr_chown: Add missing initialization gfs2: fix an oops in gfs2_permission gfs2: ignore negated quota changes gfs2: Don't update inode timestamps for direct writes ...
2023-11-06gfs2: Fix slab-use-after-free in gfs2_qd_deallocJuntong Deng
In gfs2_put_super(), whether withdrawn or not, the quota should be cleaned up by gfs2_quota_cleanup(). Otherwise, struct gfs2_sbd will be freed before gfs2_qd_dealloc (rcu callback) has run for all gfs2_quota_data objects, resulting in use-after-free. Also, gfs2_destroy_threads() and gfs2_quota_cleanup() is already called by gfs2_make_fs_ro(), so in gfs2_put_super(), after calling gfs2_make_fs_ro(), there is no need to call them again. Reported-by: syzbot+29c47e9e51895928698c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=29c47e9e51895928698c Signed-off-by: Juntong Deng <juntong.deng@outlook.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-11-06gfs2: fs: derive f_fsid from s_uuidAmir Goldstein
gfs2 already has optional persistent uuid. Use that uuid to report f_fsid in statfs(2), same as ext2/ext4/zonefs. This allows gfs2 to be monitored by fanotify filesystem watch. for example, with inotify-tools 4.23.8.0, the following command can be used to watch changes over entire filesystem: fsnotifywatch --filesystem /mnt/gfs2 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-11-06gfs2: No longer use 'extern' in function declarationsAndreas Gruenbacher
For non-static function declarations, external linkage is implied and the 'extern' keyword isn't needed. Some static checkers complain about the overuse of 'extern', so clean up all the function declarations. In addition, remove 'extern' from the definition of free_local_statfs_inodes(); it isn't needed there, either. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-10-18gfs2: convert to new timestamp accessorsJeff Layton
Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-38-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-03gfs2: fix an oops in gfs2_permissionAl Viro
In RCU mode, we might race with gfs2_evict_inode(), which zeroes ->i_gl. Freeing of the object it points to is RCU-delayed, so if we manage to fetch the pointer before it's been replaced with NULL, we are fine. Check if we'd fetched NULL and treat that as "bail out and tell the caller to get out of RCU mode". Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-18gfs2: Simplify function gfs2_upgrade_iopen_glockAndreas Gruenbacher
When trying to upgrade the iopen glock, gfs2_upgrade_iopen_glock() tries to take the iopen glock with the LM_FLAG_TRY_1CB flag set before trying to take it without the LM_FLAG_TRY or LM_FLAG_TRY_1CB flags set. Both calls will cause the lock contention bast callbacks to be invoked throughout the cluster, and we really don't need them to be invoked twice. Remove the first LM_FLAG_TRY_1CB call to eliminate unnecessary dlm traffic. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05Merge tag 'gfs2-v6.5-rc5-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Fix a glock state (non-)transition bug when a dlm request times out and is canceled, and we have locking requests that can now be granted immediately - Various fixes and cleanups in how the logd and quotad daemons are woken up and terminated - Fix several bugs in the quota data reference counting and shrinking. Free quota data objects synchronously in put_super() instead of letting call_rcu() run wild - Make sure not to deallocate quota data during a withdraw; rather, defer quota data deallocation to put_super(). Withdraws can happen in contexts in which callers on the stack are holding quota data references - Many minor quota fixes and cleanups by Bob - Update the the mailing list address for gfs2 and dlm. (It's the same list for both and we are moving it to gfs2@lists.linux.dev) - Various other minor cleanups * tag 'gfs2-v6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (51 commits) MAINTAINERS: Update dlm mailing list MAINTAINERS: Update gfs2 mailing list gfs2: change qd_slot_count to qd_slot_ref gfs2: check for no eligible quota changes gfs2: Remove useless assignment gfs2: simplify slot_get gfs2: Simplify qd2offset gfs2: introduce qd_bh_get_or_undo gfs2: Remove quota allocation info from quota file gfs2: use constant for array size gfs2: Set qd_sync_gen in do_sync gfs2: Remove useless err set gfs2: Small gfs2_quota_lock cleanup gfs2: move qdsb_put and reduce redundancy gfs2: improvements to sysfs status gfs2: Don't try to sync non-changes gfs2: Simplify function need_sync gfs2: remove unneeded pg_oflow variable gfs2: remove unneeded variable done gfs2: pass sdp to gfs2_write_buf_to_page ...
2023-09-05gfs2: Introduce new quota=quiet mount optionBob Peterson
This patch adds a new mount option quota=quiet which is the same as quota=on but it suppresses gfs2 quota error messages. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05gfs2: Fix asynchronous thread destructionAndreas Gruenbacher
The kernel threads are currently stopped and destroyed synchronously by gfs2_make_fs_ro() and gfs2_put_super(), and asynchronously by signal_our_withdraw(), with no synchronization, so the synchronous and asynchronous contexts can race with each other. First, when creating the kernel threads, take an extra task struct reference so that the task struct won't go away immediately when they terminate. This allows those kthreads to terminate immediately when they're done rather than hanging around as zombies until they are reaped by kthread_stop(). When kthread_stop() is called on a terminated kthread, it will return immediately. Second, in signal_our_withdraw(), once the SDF_JOURNAL_LIVE flag has been cleared, wake up the logd and quotad wait queues instead of stopping the logd and quotad kthreads. The kthreads are then expected to terminate automatically within short time, but if they cannot, they will not block the withdraw. For example, if a user process and one of the kthread decide to withdraw at the same time, only one of them will perform the actual withdraw and the other will wait for it to be done. If the kthread ends up being the one to wait, the withdrawing user process won't be able to stop it. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05gfs2: Stop using gfs2_make_fs_ro for withdrawAndreas Gruenbacher
[ 81.372851][ T5532] CPU: 1 PID: 5532 Comm: syz-executor.0 Not tainted 6.2.0-rc1-syzkaller-dirty #0 [ 81.382080][ T5532] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023 [ 81.392343][ T5532] Call Trace: [ 81.395654][ T5532] <TASK> [ 81.398603][ T5532] dump_stack_lvl+0x1b1/0x290 [ 81.418421][ T5532] gfs2_assert_warn_i+0x19a/0x2e0 [ 81.423480][ T5532] gfs2_quota_cleanup+0x4c6/0x6b0 [ 81.428611][ T5532] gfs2_make_fs_ro+0x517/0x610 [ 81.457802][ T5532] gfs2_withdraw+0x609/0x1540 [ 81.481452][ T5532] gfs2_inode_refresh+0xb2d/0xf60 [ 81.506658][ T5532] gfs2_instantiate+0x15e/0x220 [ 81.511504][ T5532] gfs2_glock_wait+0x1d9/0x2a0 [ 81.516352][ T5532] do_sync+0x485/0xc80 [ 81.554943][ T5532] gfs2_quota_sync+0x3da/0x8b0 [ 81.559738][ T5532] gfs2_sync_fs+0x49/0xb0 [ 81.564063][ T5532] sync_filesystem+0xe8/0x220 [ 81.568740][ T5532] generic_shutdown_super+0x6b/0x310 [ 81.574112][ T5532] kill_block_super+0x79/0xd0 [ 81.578779][ T5532] deactivate_locked_super+0xa7/0xf0 [ 81.584064][ T5532] cleanup_mnt+0x494/0x520 [ 81.593753][ T5532] task_work_run+0x243/0x300 [ 81.608837][ T5532] exit_to_user_mode_loop+0x124/0x150 [ 81.614232][ T5532] exit_to_user_mode_prepare+0xb2/0x140 [ 81.619820][ T5532] syscall_exit_to_user_mode+0x26/0x60 [ 81.625287][ T5532] do_syscall_64+0x49/0xb0 [ 81.629710][ T5532] entry_SYSCALL_64_after_hwframe+0x63/0xcd In this backtrace, gfs2_quota_sync() takes quota data references and then calls do_sync(). Function do_sync() encounters filesystem corruption and withdraws the filesystem, which (among other things) calls gfs2_quota_cleanup(). Function gfs2_quota_cleanup() wrongly assumes that nobody is holding any quota data references anymore, and destroys all quota data objects. When gfs2_quota_sync() then resumes and dereferences the quota data objects it is holding, those objects are no longer there. Function gfs2_quota_cleanup() deals with resource deallocation and can easily be delayed until gfs2_put_super() in the case of a filesystem withdraw. In fact, most of the other work gfs2_make_fs_ro() does is unnecessary during a withdraw as well, so change signal_our_withdraw() to skip gfs2_make_fs_ro() and perform the necessary steps directly instead. Thanks to Edward Adam Davis <eadavis@sina.com> for the initial patches. Link: https://lore.kernel.org/all/0000000000002b5e2405f14e860f@google.com Reported-by: syzbot+3f6a670108ce43356017@syzkaller.appspotmail.com Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05gfs2: Rename SDF_DEACTIVATING to SDF_KILLAndreas Gruenbacher
Rename the SDF_DEACTIVATING flag to SDF_KILL to make it more obvious that this relates to the kill_sb filesystem operation. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-08-28Merge tag 'v6.6-vfs.super' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull superblock updates from Christian Brauner: "This contains the super rework that was ready for this cycle. The first part changes the order of how we open block devices and allocate superblocks, contains various cleanups, simplifications, and a new mechanism to wait on superblock state changes. This unblocks work to ultimately limit the number of writers to a block device. Jan has already scheduled follow-up work that will be ready for v6.7 and allows us to restrict the number of writers to a given block device. That series builds on this work right here. The second part contains filesystem freezing updates. Overview: The generic superblock changes are rougly organized as follows (ignoring additional minor cleanups): (1) Removal of the bd_super member from struct block_device. This was a very odd back pointer to struct super_block with unclear rules. For all relevant places we have other means to get the same information so just get rid of this. (2) Simplify rules for superblock cleanup. Roughly, everything that is allocated during fs_context initialization and that's stored in fs_context->s_fs_info needs to be cleaned up by the fs_context->free() implementation before the superblock allocation function has been called successfully. After sget_fc() returned fs_context->s_fs_info has been transferred to sb->s_fs_info at which point sb->kill_sb() if fully responsible for cleanup. Adhering to these rules means that cleanup of sb->s_fs_info in fill_super() is to be avoided as it's brittle and inconsistent. Cleanup shouldn't be duplicated between sb->put_super() as sb->put_super() is only called if sb->s_root has been set aka when the filesystem has been successfully born (SB_BORN). That complexity should be avoided. This also means that block devices are to be closed in sb->kill_sb() instead of sb->put_super(). More details in the lower section. (3) Make it possible to lookup or create a superblock before opening block devices There's a subtle dependency on (2) as some filesystems did rely on fill_super() to be called in order to correctly clean up sb->s_fs_info. All these filesystems have been fixed. (4) Switch most filesystem to follow the same logic as the generic mount code now does as outlined in (3). (5) Use the superblock as the holder of the block device. We can now easily go back from block device to owning superblock. (6) Export and extend the generic fs_holder_ops and use them as holder ops everywhere and remove the filesystem specific holder ops. (7) Call from the block layer up into the filesystem layer when the block device is removed, allowing to shut down the filesystem without risk of deadlocks. (8) Get rid of get_super(). We can now easily go back from the block device to owning superblock and can call up from the block layer into the filesystem layer when the device is removed. So no need to wade through all registered superblock to find the owning superblock anymore" Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/ * tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits) super: use higher-level helper for {freeze,thaw} super: wait until we passed kill super super: wait for nascent superblocks super: make locking naming consistent super: use locking helpers fs: simplify invalidate_inodes fs: remove get_super block: call into the file system for ioctl BLKFLSBUF block: call into the file system for bdev_mark_dead block: consolidate __invalidate_device and fsync_bdev block: drop the "busy inodes on changed media" log message dasd: also call __invalidate_device when setting the device offline amiflop: don't call fsync_bdev in FDFMTBEG floppy: call disk_force_media_change when changing the format block: simplify the disk_force_media_change interface nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl xfs use fs_holder_ops for the log and RT devices xfs: drop s_umount over opening the log and RT devices ext4: use fs_holder_ops for the log device ext4: drop s_umount over opening the log device ...
2023-07-24gfs2: convert to ctime accessor functionsJeff Layton
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-45-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>